CN106292288A - Model parameter correction method based on Policy-Gradient learning method and application thereof - Google Patents


Info

Publication number
CN106292288A
Authority
CN
China
Prior art keywords
robot
theta
delta
correction
model parameter
Prior art date
Legal status
Granted
Application number
CN201610841970.3A
Other languages
Chinese (zh)
Other versions
CN106292288B (en)
Inventor
陈启军
刘成菊
宁静
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
Filing date
Publication date
Application filed by Tongji University
Priority to CN201610841970.3A
Publication of CN106292288A
Application granted
Publication of CN106292288B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The present invention relates to a model parameter correction method based on a policy gradient learning method and an application thereof. The model parameter correction method comprises the following steps. S1: select the inverted pendulum input parameters and the robot trunk posture parameters as correction quantities, and establish a model parameter correction equation for them. S2: select the centroid tracking error of the robot and the error of the robot body posture relative to the upright state as the fitness indices of the robot to the current environment, and establish a fitness evaluation function. S3: according to the fitness evaluation function, optimize the gain coefficients in the model parameter correction equation with the policy gradient learning method, and substitute the optimized gain parameters into the correction equation to obtain the correction quantities. Compared with the prior art, the policy equation of the present invention converges quickly, and the robot can adjust its gait and body posture quickly and in real time under unknown disturbances, which improves the adaptability and robustness of robot walking.

Description

Model parameter correction method based on policy gradient learning method and application thereof
Technical Field
The invention relates to the technical field of robot walking control, and in particular to a model parameter correction method based on a policy gradient learning method and an application thereof.
Background
In robot walking, most current schemes generate a stable gait by abstracting the robot into a simple physical model, such as the Linear Inverted Pendulum Model (LIPM) or the cart-table model, using that model to simplify the robot's equations of motion, and planning trajectories offline. Most learning methods currently applied to robot walking instead select the key parameters that influence the gait and optimize them directly in a high-dimensional search space without abstract modeling of the robot. They therefore require a large amount of offline training or long online learning to find a local optimum that keeps the robot walking stably.
Disclosure of Invention
The purpose of the invention is to overcome the above defects of the prior art by providing a model parameter correction method based on a policy gradient learning method and an application thereof.
The purpose of the invention can be realized by the following technical scheme:
The model parameter correction method based on the policy gradient learning method comprises the following steps:
S1: selecting inverted pendulum input parameters and robot trunk posture parameters as correction quantities, and establishing a model parameter correction equation for the correction quantities, the equation containing gain coefficients to be optimized;
S2: selecting the centroid tracking error of the robot and the error of the robot body posture relative to the upright state as the fitness indices of the robot to the current environment, and establishing a fitness evaluation function;
S3: optimizing the gain coefficients in the model parameter correction equation with the policy gradient learning method according to the fitness evaluation function, and substituting the optimized gain parameters into the model parameter correction equation to obtain the correction quantities for the next single-leg support stage.
In step S1, the inverted pendulum input parameters selected as correction quantities are the x-axis step size and the y-axis step size, the robot trunk posture parameters selected as correction quantities are the x-axis trunk angle and the y-axis trunk angle, and the model parameter correction equation is specifically:
$$\Delta s_x = K_1 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,x,i} - x_{e,x,i}\right) + K_3 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,y,i} - \theta_{B,y,i}^{ref}\right)$$
$$\Delta s_y = K_2 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,y,i} - x_{e,y,i}\right) + K_4 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,x,i} - \theta_{B,x,i}^{ref}\right)$$
$$\Delta\theta_{B,x} = K_5 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(p_{LHip,z,i} - p_{RHip,z,i}\right)$$
$$\Delta\theta_{B,y} = K_6 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(p_{SuppFoot,x,i} - p_{Head,x,i}\right)$$
where the subscripts x, y and z denote the x-, y- and z-axis directions; $s$ is the step size and $\Delta s$ is its correction; $\theta_B$ is the trunk angle and $\Delta\theta_B$ is its correction; $N$ is the number of interpolation steps in a single-support stage, and the subscript $i$ denotes the i-th step of that stage; $x_f$ is the Kalman-filtered estimate of the centroid and $x_e$ is its ideal value; $\theta_B^{ref}$ is the inclination angle of the trunk when upright; $p_{RHip}$ and $p_{LHip}$ are the displacements of the right-leg and left-leg hip joints of the robot; $p_{Head}$ and $p_{SuppFoot}$ are the displacements of the robot head joint and the supporting foot, respectively; and $K_1, \dots, K_6$ are gain parameters.
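The four correction equations reduce to gain-weighted averages of per-step errors over one single-support stage. The following is a minimal Python sketch under stated assumptions, not the patent's implementation: it assumes the N interpolated samples of each signal have already been logged as equal-length lists, and the names `StageLog`, `mean_error` and `corrections` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


def mean_error(measured: Sequence[float], reference: Sequence[float]) -> float:
    """(1/N) * sum_i (measured_i - reference_i) over one single-support stage."""
    if len(measured) != len(reference) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length N")
    return sum(m - r for m, r in zip(measured, reference)) / len(measured)


@dataclass
class StageLog:
    """Hypothetical container: one list of N samples per signal for a single stage."""
    com_est_x: Sequence[float]    # x_{f,x,i}, Kalman-filtered CoM estimate
    com_ref_x: Sequence[float]    # x_{e,x,i}, ideal CoM
    com_est_y: Sequence[float]    # x_{f,y,i}
    com_ref_y: Sequence[float]    # x_{e,y,i}
    torso_x: Sequence[float]      # theta_{B,x,i}
    torso_ref_x: Sequence[float]  # theta^{ref}_{B,x,i}
    torso_y: Sequence[float]      # theta_{B,y,i}
    torso_ref_y: Sequence[float]  # theta^{ref}_{B,y,i}
    lhip_z: Sequence[float]       # p_{LHip,z,i}
    rhip_z: Sequence[float]       # p_{RHip,z,i}
    supp_foot_x: Sequence[float]  # p_{SuppFoot,x,i}
    head_x: Sequence[float]       # p_{Head,x,i}


def corrections(log: StageLog, K: Sequence[float]) -> tuple[float, float, float, float]:
    """Return (ds_x, ds_y, dtheta_Bx, dtheta_By) for gains K = (K1, ..., K6)."""
    K1, K2, K3, K4, K5, K6 = K
    ds_x = K1 * mean_error(log.com_est_x, log.com_ref_x) + K3 * mean_error(log.torso_y, log.torso_ref_y)
    ds_y = K2 * mean_error(log.com_est_y, log.com_ref_y) + K4 * mean_error(log.torso_x, log.torso_ref_x)
    dtheta_bx = K5 * mean_error(log.lhip_z, log.rhip_z)       # left/right hip height difference
    dtheta_by = K6 * mean_error(log.supp_foot_x, log.head_x)  # support foot vs head, x direction
    return ds_x, ds_y, dtheta_bx, dtheta_by
```

Because every correction is linear in its gain, the search in S3 only has to tune six scalars rather than the gait parameters themselves.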
The fitness evaluation function $F(\bar{K})$ is specifically:
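$$F(\bar{K}) = \alpha_x\left(|\Delta s_x| + |\Delta\bar{x}_x|\right) + \alpha_y\left(|\Delta s_y| + |\Delta\bar{x}_y|\right) + \beta_x\left(|\Delta\theta_{B,x}| + |\Delta\bar{\theta}_{B,x}|\right) + \beta_y\left(|\Delta\theta_{B,y}| + |\Delta\bar{\theta}_{B,y}|\right)$$
$$\Delta\bar{x}_x = \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,x,i} - x_{e,x,i}\right)$$
$$\Delta\bar{x}_y = \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,y,i} - x_{e,y,i}\right)$$
$$\Delta\bar{\theta}_{B,x} = \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,x,i} - \theta_{B,x,i}^{ref}\right)$$
$$\Delta\bar{\theta}_{B,y} = \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,y,i} - \theta_{B,y,i}^{ref}\right)$$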
where $\bar{K} = \{K_1, \dots, K_6\}$ denotes the gain parameter set, and $\alpha_x$, $\alpha_y$, $\beta_x$ and $\beta_y$ are weight factors satisfying $\alpha_x + \alpha_y = 1$ and $\beta_x + \beta_y = 1$. The smaller the value of the fitness evaluation function, the higher the fitness of the robot under the given gain parameter set.
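A direct transcription of $F(\bar{K})$ is sketched below in Python for illustration. It assumes the four corrections and the four stage-mean errors have already been computed from the equations of step S1; the function name `fitness` is hypothetical, and deriving $\alpha_y$ and $\beta_y$ from the sum-to-one constraints is a design choice of this sketch.

```python
from typing import Sequence


def fitness(corr: Sequence[float], mean_err: Sequence[float],
            alpha_x: float, beta_x: float) -> float:
    """Fitness F(K) for one single-support stage; smaller means better adapted.

    corr     = (ds_x, ds_y, dtheta_Bx, dtheta_By)                 # corrector outputs
    mean_err = (dx_x_bar, dx_y_bar, dtheta_Bx_bar, dtheta_By_bar)  # stage-mean errors
    """
    if not (0.0 <= alpha_x <= 1.0 and 0.0 <= beta_x <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    # alpha_y = 1 - alpha_x and beta_y = 1 - beta_x enforce the stated constraints.
    weights = (alpha_x, 1.0 - alpha_x, beta_x, 1.0 - beta_x)
    # Each term adds |correction| and |mean error| for one quantity, then weights it.
    return sum(w * (abs(c) + abs(e)) for w, c, e in zip(weights, corr, mean_err))
```

Including $|\Delta s|$ and $|\Delta\theta_B|$ alongside the errors means that two gain sets with equal tracking error are ranked by how aggressively they correct.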
The policy gradient learning method comprises the following specific steps:
301: In the k-th iteration, to estimate the partial derivatives of $F(\bar{K})$ with respect to each parameter at $\bar{K}_{k-1}$, the gain parameter set obtained in the previous iteration, randomly generate n policies near $\bar{K}_{k-1}$ to obtain the policy set ${}^{m}\bar{K}_{k-1}$ (m = 1, ..., n). The number n of policies is proportional to the size of the search space, and the policy set is generated by:
$${}^{m}\bar{K}_{k-1} = \bar{K}_{k-1} + {}^{m}\bar{\rho}$$
where ${}^{m}\bar{\rho}$ (m = 1, ..., n) denotes a set of perturbations; each perturbation $\rho_m$ in the set is chosen at random from $\{-e_m, 0, +e_m\}$, and $e_m$ is the perturbation gain parameter corresponding to $\rho_m$;
302: According to whether the perturbation $\rho_m$ takes the value $-e_m$, $0$ or $+e_m$, divide the policies ${}^{m}\bar{K}_{k-1}$ into three corresponding groups (the zero-perturbation group being $G_0$), substitute each policy into the fitness evaluation function, and compute the average fitness value of each group;
303: Calculate the approximated gradient value from the group averages; it is given by one expression when two conditions on the averages both hold, and by another expression otherwise;
304: Orthogonalize the approximated gradient and multiply it by a fixed step factor η to obtain the gradient value; subtract this gradient value from the policy set $\bar{K}_{k-1}$ to obtain the policy set $\bar{K}_k$ of the current iteration, and use $\bar{K}_k$ for the next iteration;
305: When the number of iterations reaches a preset value $N_{iter}$, the iteration ends.
A model parameter corrector based on the model parameter correction method outputs the step-size correction $\Delta s$ and the trunk-angle correction $\Delta\theta_B$. The output of the corrector is passed to the inverted pendulum model and the robot model for compensation according to:
$$\bar{s} = s - \Delta s$$
$$\theta_{B,i} = \theta_{B,i-1} - \Delta\theta_B$$
where $\bar{s}$ is the step size for the next single-foot support stage, compensated once per single-foot support stage, and $\theta_{B,i}$ is the trunk angle at frame i of the single-foot support stage, compensated N times per single-foot support stage.
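The two compensation laws differ in update rate: the step size is corrected once per single-foot support stage, while the trunk angle is stepped at every one of the N frames. A minimal Python sketch of that bookkeeping follows; the function names are hypothetical, and treating $\Delta\theta_B$ as a per-frame decrement follows the recursion literally.

```python
def next_step_size(s: float, ds: float) -> float:
    """s_bar = s - ds, applied once at the start of a single-foot support stage."""
    return s - ds


def torso_angle_schedule(theta_start: float, dtheta: float, n_frames: int) -> list[float]:
    """theta_{B,i} = theta_{B,i-1} - dtheta for frames i = 1..N of one stage."""
    angles = []
    theta = theta_start
    for _ in range(n_frames):
        theta -= dtheta        # one compensation per frame, N per stage
        angles.append(theta)
    return angles
```

Read literally, the recursion shifts the trunk angle by a total of $N\,\Delta\theta_B$ over one stage, so the per-frame correction should be sized with N in mind.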
A robot walking control method using the model parameter corrector comprises the following steps:
1) plan the centroid trajectory of the robot with the inverted pendulum model and, from it, the corresponding foot trajectories; then perform inverse kinematics with the resolved motion rate (decomposed velocity) control method to obtain the joint velocities of the robot, and drive the robot model with these joint velocities to make the robot walk;
2) design two closed loops. In the first loop, the centroid motion is measured from the joint-space state of the robot to obtain the actual centroid value, which is Kalman-filtered to obtain a centroid estimate; the filtered estimate is used to self-correct the inverted pendulum input parameters in the inverted pendulum model.
In the second loop, the model parameter corrector uses the Kalman-filtered centroid estimate and the measured trunk angle to compensate the inverted pendulum input parameters in the inverted pendulum model and the trunk angle of the robot in the robot model.
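The decomposed velocity control step in 1) follows the classical resolved motion rate idea: joint velocities are obtained by inverting the Jacobian that maps them to task-space velocities, $\dot{q} = J(q)^{-1}\dot{x}$. The patent does not give NAO's Jacobian, so the Python sketch below illustrates only the general technique on a hypothetical planar two-link chain with made-up link lengths; the function names and the singularity threshold are assumptions.

```python
import math


def jacobian(q1: float, q2: float, l1: float, l2: float):
    """Jacobian of the tip position of a planar 2-link chain w.r.t. (q1, q2)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return ((-l1 * s1 - l2 * s12, -l2 * s12),
            (l1 * c1 + l2 * c12, l2 * c12))


def joint_velocity(q1: float, q2: float, xdot: float, ydot: float,
                   l1: float = 0.1, l2: float = 0.1, eps: float = 1e-6):
    """Resolved-rate step: solve J(q) * qdot = (xdot, ydot) for qdot."""
    (a, b), (c, d) = jacobian(q1, q2, l1, l2)
    det = a * d - b * c                   # equals l1 * l2 * sin(q2)
    if abs(det) < eps:
        raise ValueError("Jacobian is (near) singular; resolved-rate step undefined")
    # Explicit 2x2 inverse applied to the task-space velocity.
    return ((d * xdot - b * ydot) / det, (-c * xdot + a * ydot) / det)
```

For a redundant chain such as a full humanoid leg, the plain inverse would typically give way to a (damped) pseudo-inverse; the text does not say which variant is used.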
Compared with the prior art, the invention has the following advantages:
1) To avoid heavy online computation, the invention selects the inverted pendulum input parameters and the robot body angles as the key parameters affecting the gait, identifies the intrinsic relation between these correction quantities and the robot model, and establishes a correction equation linking them to the robot model parameters, thereby optimizing and learning the robot's gait parameters indirectly.
2) The invention lets the robot adapt to the external environment: the centroid tracking error and the body upright error are chosen as the robot's fitness indices to the current environment, and a fitness evaluation function is established so that the fitness of the robot under the gain parameter set can be improved.
3) Based on the robot's fitness evaluation function, the policy gradient learning method optimizes the gain coefficients in the correction equation and thereby corrects the key gait parameters. The resulting model parameter corrector is applied to robot walking control with the walking task separated from the optimization loop, which greatly increases the computation speed.
4) Gait planning based on the inverted pendulum model ensures walking stability in advance, and the robot's gait can be adjusted in real time, which effectively improves the adaptability and robustness of the robot and makes it suitable for walking under unknown disturbances.
Drawings
FIG. 1 is the fitness function curve of the policy gradient learning method;
FIG. 2 is a block diagram of a humanoid robot walking control using a model parameter corrector;
FIG. 3 is a diagram showing the effect of using a model parameter corrector to correct the body angle of a robot;
FIG. 4 is a diagram of the effect of correcting the planned trajectory of the centroid of the inverted pendulum by using the model parameter corrector, wherein (4a) is a diagram of the effect of correcting the planned trajectory of the centroid of the inverted pendulum at the x-axis position, and (4b) is a diagram of the effect of correcting the planned trajectory of the centroid of the inverted pendulum at the y-axis position;
FIG. 5 shows the effect of correcting the robot foot placement points with the model parameter corrector, where (5a) shows the effect in the x-direction positions and (5b) the effect in the y-direction positions.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Taking the humanoid robot NAO as an example, the following explains the model parameter correction method based on the policy gradient learning method provided by the invention, the design of the model parameter corrector, and the application of the corrector in NAO walking control.
A model parameter correction method based on the policy gradient learning method, used to correct key parameters in the inverted pendulum model and the robot model, comprises the following steps:
S1: establishing the model parameter correction equation for the correction quantities:
When a three-dimensional inverted pendulum model is used to abstract and simplify the robot model and to plan the centroid trajectory and the corresponding foot trajectories of the robot, many of the inverted pendulum input parameters influence the robot's gait, such as the step sizes $s_x$ and $s_y$ in the x and y directions, the foot lift height $s_z$ during walking, and the pendulum height $h$. In addition, the robot is a high-dimensional model with multiple degrees of freedom that the inverted pendulum model cannot fully describe, for example the trunk posture $\theta_B$. The x-axis step size $s_x$, the y-axis step size $s_y$, the x-axis trunk angle $\theta_{B,x}$ and the y-axis trunk angle $\theta_{B,y}$ are selected as the key parameters influencing the gait, and the model parameter correction equation is established as follows:
$$\Delta s_x = K_1 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,x,i} - x_{e,x,i}\right) + K_3 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,y,i} - \theta_{B,y,i}^{ref}\right)$$
$$\Delta s_y = K_2 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,y,i} - x_{e,y,i}\right) + K_4 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,x,i} - \theta_{B,x,i}^{ref}\right)$$
$$\Delta\theta_{B,x} = K_5 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(p_{LHip,z,i} - p_{RHip,z,i}\right)$$
$$\Delta\theta_{B,y} = K_6 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(p_{SuppFoot,x,i} - p_{Head,x,i}\right)$$
where the subscripts x, y and z denote the x-, y- and z-axis directions; $s$ is the step size and $\Delta s$ is its correction; $\theta_B$ is the trunk angle and $\Delta\theta_B$ is its correction; $N = (t_e - t_b)/\Delta T$ is the number of interpolation steps in a single-foot support stage, where $t_b$ and $t_e$ are the start and end times of the stage and $\Delta T$ is the sampling time; the subscript $i$ denotes the i-th step of the single-foot support stage; $x_f$ is the Kalman-filtered estimate of the centroid and $x_e$ is its ideal value; $\theta_B^{ref}$ is the inclination angle of the trunk when upright; $p_{RHip}$ and $p_{LHip}$ are the displacements of the hip joints of the right and left legs of the robot; $p_{Head}$ and $p_{SuppFoot}$ are the displacements of the robot head joint and the supporting foot, respectively; and $K_1, \dots, K_6$ are gain parameters.
The correction equation is designed to correct the robot's foot placement in the x and y directions and the uprightness of its trunk. The x-direction step correction $\Delta s_x$ is related to the x-axis centroid error and the y-axis trunk angle, because the y-axis trunk angle represents the forward-backward tilt of the body. The y-direction step correction $\Delta s_y$ is related to the y-direction centroid error and the x-direction trunk angle, because the x-direction trunk angle represents the left-right tilt of the body. The x-axis trunk angle correction $\Delta\theta_{B,x}$ is related to the height difference between the left and right hip joints: when the robot keeps both feet on the ground or performs a walking task, a left-right tilt of the body produces a hip height difference, and for a robot without an actuated waist joint the body tilt is in fact produced mainly by this hip height difference. The y-axis trunk angle correction $\Delta\theta_{B,y}$ is related to the x-direction distance between the head joint and the supporting foot: for a robot without an actuated waist joint, a forward-backward tilt of the body is approximately equivalent to the rotation of a link pivoting on the supporting leg, with its origin at the supporting leg and its end at the head, so the x-direction distance between the head joint and the supporting leg indicates the degree of body tilt.
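The text relies on a three-dimensional inverted pendulum without stating its dynamics. Assuming the standard linear inverted pendulum model (CoM held at a constant height h, as in Kajita's 3D-LIPM), the x and y motions decouple into identical equations $\ddot{x} = (g/h)\,x$, with x measured from the support foot and $T_c = \sqrt{h/g}$. The Python sketch below uses the closed-form solution to sample one single-foot support stage at the sampling time $\Delta T$, taking $N = (t_e - t_b)/\Delta T$ samples; the function names and numbers are illustrative only.

```python
import math


def lipm_state(x0: float, v0: float, t: float, h: float, g: float = 9.81) -> tuple[float, float]:
    """CoM position and velocity (relative to the support foot) after time t under the LIPM."""
    tc = math.sqrt(h / g)                  # time constant T_c = sqrt(h / g)
    ch, sh = math.cosh(t / tc), math.sinh(t / tc)
    return x0 * ch + tc * v0 * sh, x0 / tc * sh + v0 * ch


def sample_stage(x0: float, v0: float, t_b: float, t_e: float,
                 dt: float, h: float) -> list[float]:
    """Planned CoM positions at the N interpolation steps of one single-foot support stage."""
    n = round((t_e - t_b) / dt)            # N = (t_e - t_b) / dT
    return [lipm_state(x0, v0, (i + 1) * dt, h)[0] for i in range(n)]


def orbital_energy(x: float, v: float, h: float, g: float = 9.81) -> float:
    """Quantity v^2/2 - g*x^2/(2h), conserved by the LIPM; useful as a sanity check."""
    return 0.5 * v * v - g * x * x / (2.0 * h)
```

In this model the step size enters as the next support-foot position, which resets x for the following stage; that is why correcting $s_x$ and $s_y$ reshapes the planned centroid trajectory.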
S2: establishing a fitness evaluation function:
After an unknown disturbance, the robot is considered better adapted if, through adjustment, it maintains high centroid tracking accuracy and a good upright body posture. The centroid tracking error of the robot and the error of its body posture relative to the upright state are therefore selected as the fitness indices of the robot to the current environment, and the fitness evaluation function $F(\bar{K})$ is established as follows:
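$$F(\bar{K}) = \alpha_x\left(|\Delta s_x| + |\Delta\bar{x}_x|\right) + \alpha_y\left(|\Delta s_y| + |\Delta\bar{x}_y|\right) + \beta_x\left(|\Delta\theta_{B,x}| + |\Delta\bar{\theta}_{B,x}|\right) + \beta_y\left(|\Delta\theta_{B,y}| + |\Delta\bar{\theta}_{B,y}|\right)$$
$$\Delta\bar{x}_x = \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,x,i} - x_{e,x,i}\right)$$
$$\Delta\bar{x}_y = \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,y,i} - x_{e,y,i}\right)$$
$$\Delta\bar{\theta}_{B,x} = \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,x,i} - \theta_{B,x,i}^{ref}\right)$$
$$\Delta\bar{\theta}_{B,y} = \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,y,i} - \theta_{B,y,i}^{ref}\right)$$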
where $\bar{K} = \{K_1, \dots, K_6\}$ denotes the gain parameter set; $\alpha_x$ and $\alpha_y$ are the weight factors of the centroid errors in the x and y directions, and $\beta_x$ and $\beta_y$ are the weight factors of the body inclination errors in the x and y directions, satisfying $\alpha_x + \alpha_y = 1$ and $\beta_x + \beta_y = 1$. The smaller the value of the fitness evaluation function, the higher the fitness of the robot under the given gain parameter set.
Each term of the fitness evaluation function combines the absolute value of a compensation quantity with the absolute value of a mean error: the absolute values of the corrector outputs, of the centroid errors and of the body inclination errors are linearly superposed with different weights, the first two terms representing centroid tracking and the last two representing body uprightness. During policy gradient learning, a gradual decrease of the fitness evaluation function means that the centroid tracking error and the body inclination deviation are decreasing, so the fitness of the robot under the current parameter set is increasing. The compensation terms also keep the corrections of the inverted pendulum input parameters and of the body inclination angle from becoming too large, so that the robot's joint-space motion stays within a reasonable range. Note that the objective function does not depend directly on the robot's walking time; however, the better the body stays upright, the longer the robot can continue tracking the centroid.
S3: model parameter optimization learning:
The policy gradient learning method is used to optimize the gain parameter set $\bar{K}$: each parameter in the set is assigned a value in turn, and the fitness function value is calculated from the gain parameter set and the current state of the robot. The basic principle of the method is that, assuming the objective function $F(\bar{K})$ is differentiable with respect to each parameter in $\bar{K}$, a locally optimal solution $\bar{K}^*$ is obtained by computing the gradient of $F(\bar{K})$. The specific steps of the policy gradient learning method are as follows:
301: In the k-th iteration, to estimate the partial derivatives of $F(\bar{K})$ with respect to each parameter at $\bar{K}_{k-1}$, the gain parameter set obtained in the previous iteration, randomly generate n policies near $\bar{K}_{k-1}$ to obtain the policy set ${}^{m}\bar{K}_{k-1}$ (m = 1, ..., n), which is an n-dimensional vector. The number n of policies is proportional to the size of the search space, and the policy set is generated by:
$${}^{m}\bar{K}_{k-1} = \bar{K}_{k-1} + {}^{m}\bar{\rho}$$
where ${}^{m}\bar{\rho}$ (m = 1, ..., n) denotes the perturbation set, also an n-dimensional vector; each perturbation $\rho_m$ in it is chosen at random from $\{-e_m, 0, +e_m\}$, where $e_m$ is the preset perturbation gain parameter corresponding to $\rho_m$, and the perturbations correspond one-to-one to the policies in ${}^{m}\bar{K}_{k-1}$;
302: According to whether $\rho_m$ is negative, zero or positive, divide the policies ${}^{m}\bar{K}_{k-1}$ into three corresponding groups (the zero-perturbation group being $G_0$), substitute each policy into the fitness evaluation function, and compute the average fitness value of each group;
303: Calculate the approximated gradient value from the group averages; it is given by one expression when two conditions on the averages both hold, and by another expression otherwise;
304: Orthogonalize the approximated gradient and multiply it by a fixed step factor η to obtain the gradient value; subtract this gradient value from the policy set $\bar{K}_{k-1}$ to obtain the policy set $\bar{K}_k$ of the current iteration, and use $\bar{K}_k$ for the next iteration;
305: When the number of iterations reaches a preset value $N_{iter}$, the iteration ends; if $N_{iter}$ is large enough, the resulting solution $\bar{K}^*$ is guaranteed to be a locally optimal solution.
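Steps 301 to 305 closely resemble the finite-difference policy gradient scheme of Kohl and Stone (2004), adapted here to minimization. The Python sketch below implements that scheme under explicit assumptions not confirmed by the text: each component of a perturbation is drawn independently from $\{-e_j, 0, +e_j\}$ (one $e_j$ per gain); the per-parameter estimate is zero when the unperturbed group has the lowest average fitness and $(\text{avg}_+ - \text{avg}_-)$ otherwise; and the "orthogonalization" of step 304 is treated as normalization to unit length. All function names are hypothetical.

```python
import random
from typing import Callable, Sequence


def policy_gradient_step(F: Callable[[Sequence[float]], float], K: list[float],
                         eps: Sequence[float], n_policies: int, eta: float,
                         rng: random.Random) -> list[float]:
    d = len(K)
    # 301: n random policies K + rho, each rho component in {-e_j, 0, +e_j}
    signs = [[rng.choice((-1, 0, 1)) for _ in range(d)] for _ in range(n_policies)]
    scores = [F([k + s * e for k, s, e in zip(K, p, eps)]) for p in signs]
    grad = []
    for j in range(d):
        # 302: group policies by the sign of their j-th perturbation, average F
        groups = {-1: [], 0: [], 1: []}
        for p, f in zip(signs, scores):
            groups[p[j]].append(f)
        avg = {s: (sum(v) / len(v) if v else None) for s, v in groups.items()}
        # 303 (assumed rule): no move if a group is empty or the unperturbed group is best
        if None in avg.values() or (avg[0] <= avg[1] and avg[0] <= avg[-1]):
            grad.append(0.0)
        else:
            grad.append(avg[1] - avg[-1])
    # 304 (assumed): normalize, scale by eta, and descend because F is minimized
    norm = sum(g * g for g in grad) ** 0.5
    if norm == 0.0:
        return list(K)
    return [k - eta * g / norm for k, g in zip(K, grad)]


def optimise(F, K0, eps, n_policies=30, eta=0.05, n_iter=200, seed=0):
    rng = random.Random(seed)
    K = list(K0)
    for _ in range(n_iter):                # 305: stop after a fixed N_iter
        K = policy_gradient_step(F, K, eps, n_policies, eta, rng)
    return K
```

For example, `optimise(lambda K: (K[0] - 1.0) ** 2 + (K[1] + 0.5) ** 2, [0.0, 0.0], [0.05, 0.05])` drifts toward (1, -0.5). On the robot, each call to F would instead require executing a stage with the candidate gains, which is why n and $N_{iter}$ matter in practice.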
Once the locally optimal gain $\bar{K}^*$ is obtained, it is substituted into the model parameter correction equation to obtain the correction quantities for the next single-foot support stage; these locally optimal corrections are used to correct the inverted pendulum input parameters and the trunk angle of the robot.
A model parameter corrector is designed that outputs the step-size correction $\Delta s$ and the trunk-angle correction $\Delta\theta_B$; its output is passed to the inverted pendulum model and the robot model for compensation according to:
$$\bar{s} = s - \Delta s$$
$$\theta_{B,i} = \theta_{B,i-1} - \Delta\theta_B$$
where $\bar{s}$, the step size for the next single-foot support stage, serves as the corrected input parameter of the inverted pendulum model and is compensated once per single-foot support stage, and $\theta_{B,i}$ is the trunk angle at frame i of the single-foot support stage, compensated N times per single-foot support stage.
FIG. 2 is a block diagram of robot walking control using the model parameter corrector, in which $x_m$ is the centroid measurement, which, ignoring measurement error, can be regarded as the actual centroid value; the joint velocity of the robot is also indicated. The robot walking control method using the model parameter corrector comprises the following steps:
1) plan the centroid trajectory of the robot with the inverted pendulum model and, from it, the corresponding foot trajectories; then perform inverse kinematics with the resolved motion rate (decomposed velocity) control method to obtain the joint velocities of the robot, and drive the robot model with these joint velocities to make the robot walk;
2) separate the walking task of the robot from the optimization loop and design two closed loops. In the first loop, the centroid motion is measured from the joint-space state of the robot to obtain the actual centroid value, which is passed through a Kalman filter to obtain a centroid estimate; the filtered estimate is used to self-correct the inverted pendulum input parameters in the inverted pendulum model.
In the second loop, the model parameter corrector based on the policy gradient learning method uses the actual centroid motion and the body angle measured by the inertial unit to compensate the inverted pendulum input parameters in the inverted pendulum model and the body angle of the robot in the robot model.
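The first loop names Kalman filtering of the measured centroid $x_m$ but specifies neither the process model nor the noise model. As an assumption for illustration only, the Python sketch below runs an independent constant-velocity Kalman filter on one horizontal CoM axis with a white-acceleration process model; the class name and the noise values `q` and `r` are hypothetical tuning choices.

```python
from dataclasses import dataclass, field


@dataclass
class ComKalman1D:
    """Constant-velocity Kalman filter for one CoM axis (state: position, velocity)."""
    dt: float
    q: float = 1e-4        # white-acceleration noise intensity (hypothetical)
    r: float = 1e-4        # variance of the measurement x_m (hypothetical)
    x: float = 0.0
    v: float = 0.0
    P: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])

    def step(self, x_m: float) -> float:
        dt, P, q = self.dt, self.P, self.q
        # Predict: x <- x + v*dt and P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.x += self.v * dt
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q * dt**4 / 4
        p01 = P[0][1] + dt * P[1][1] + q * dt**3 / 2
        p10 = P[1][0] + dt * P[1][1] + q * dt**3 / 2
        p11 = P[1][1] + q * dt**2
        # Update with the position measurement (H = [1, 0])
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        innov = x_m - self.x
        self.x += k0 * innov
        self.v += k1 * innov
        # P <- (I - K H) P
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x
```

One filter per horizontal axis, called once per sampling period with the latest $x_m$, would yield the estimate $x_f$ used in the correction equations.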
When the parameter corrector is invoked at some moment during walking, the fitness function value of the policy gradient learning method evolves as shown in FIG. 1. The curve shows an overall downward trend: the centroid tracking error and the trunk angle error decrease gradually and the fitness of the robot increases, indicating that the parameter corrector corrects the model effectively.
When a force with a peak of 6.44 N and a duration of about 0.5 s is applied to the chest of the robot at 9 s, directed mainly along the positive y axis, the correction effect of the model parameter corrector on the trunk angle of the robot is as shown in FIG. 3. After the external force, the robot tilts to the left rear; after about 9 s of adjustment the x-axis trunk angle curve recovers the periodic fluctuation of normal walking, and after about 1 s of adjustment the y-axis trunk angle curve recovers its normal waveform. Because the robot's inertial unit cannot measure the z-axis trunk angle and the parameter corrector includes no z-axis trunk angle compensation, the z-axis trunk angle remains 0 throughout.
FIG. 4 shows the planned centroid trajectory of the inverted pendulum corrected by the model parameter corrector together with the actually measured centroid trajectory, where Expected CoM is the ideal centroid trajectory and Measured CoM is the measured one. Under the action of the corrector the ideal centroid trajectory changes somewhat, the centroid is tracked well, and the external force has essentially no influence on centroid tracking. Because the inverted pendulum input parameters are corrected again after the external force, the start and end times of the single-foot support stage must be recalculated, which causes the y-direction centroid to fluctuate when the support leg switches; after two steps of adjustment, however, the y-direction centroid trajectory becomes smooth again.
The effect of the model parameter corrector on the robot foot placement points is shown in FIG. 5 (positions are relative to the midpoint between the two feet in the initial standing posture; the x direction is shown on the left and the y direction on the right). Reference position (dotted line) is the foot placement of the robot without the corrector, Measured position is the actually measured position, and the robot walks forward with a fixed gait. With the parameter corrector, the x-direction step size of the robot decreases after the external force, so the x-direction walking distance after 20 steps is shorter. In the y direction, the leftward external force tilts the body to the left and shifts the centroid to the left, so the robot keeps stepping to the left; the compensation is large for the first two steps, and as the trunk returns upright in the later steps, the y-direction step size gradually returns to its value before the disturbance.

Claims (6)

1. A model parameter correction method based on a policy gradient learning method, characterized by comprising the following steps:
S1: selecting inverted pendulum input parameters and robot trunk posture parameters as correction quantities, and establishing a model parameter correction equation for the correction quantities, the equation containing gain coefficients to be optimized;
S2: selecting the centroid tracking error of the robot and the error of the robot body posture relative to the upright state as the fitness indices of the robot to the current environment, and establishing a fitness evaluation function;
S3: optimizing the gain coefficients in the model parameter correction equation with the policy gradient learning method according to the fitness evaluation function, and substituting the optimized gain parameters into the model parameter correction equation to obtain the correction quantities.
2. The method according to claim 1, wherein in step S1 the inverted pendulum input parameters selected as correction quantities comprise the x-axis step size and the y-axis step size, the robot trunk posture parameters selected as correction quantities comprise the x-axis trunk angle and the y-axis trunk angle, and the model parameter correction equation is specifically:
$$\Delta s_x = K_1 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,x,i} - x_{e,x,i}\right) + K_3 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,y,i} - \theta_{B,y,i}^{ref}\right)$$
$$\Delta s_y = K_2 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,y,i} - x_{e,y,i}\right) + K_4 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,x,i} - \theta_{B,x,i}^{ref}\right)$$
$$\Delta\theta_{B,x} = K_5 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(p_{LHip,z,i} - p_{RHip,z,i}\right)$$
$$\Delta\theta_{B,y} = K_6 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(p_{SuppFoot,x,i} - p_{Head,x,i}\right)$$
where the subscripts x, y and z denote the x-, y- and z-axis directions respectively; $s$ is the step size and $\Delta s$ its correction; $\theta_B$ is the trunk angle and $\Delta\theta_B$ its correction; $N$ is the number of interpolation frames in one single-support phase, and the index $i$ denotes the i-th frame of that phase; $x_f$ is the center-of-mass estimate after Kalman filtering and $x_e$ is the ideal center-of-mass value; $\theta_B^{\mathrm{ref}}$ is the trunk inclination angle when upright; $p_{\mathrm{RHip}}$ and $p_{\mathrm{LHip}}$ are the displacements of the right-leg and left-leg hip joints of the robot; $p_{\mathrm{Head}}$ and $p_{\mathrm{SuppFoot}}$ are the displacements of the robot's head joint and supporting foot, respectively; and $K_1, \ldots, K_6$ are gain parameters.
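The four correction equations of claim 2 are frame averages of tracking and posture errors, weighted by the gains $K_1, \ldots, K_6$. The Python sketch below evaluates them for one single-support phase. It assumes the per-frame quantities are already available as equal-length lists; the dictionary keys (`x_f_x`, `theta_B_y_ref`, and so on) and the function names are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def mean_diff(a, b):
    """(1/N) * sum_{i=1..N} (a_i - b_i) over the N frames of one single-support phase."""
    if len(a) != len(b) or not a:
        raise ValueError("both sequences must be non-empty and of the same length N")
    return fmean(ai - bi for ai, bi in zip(a, b))


def correction_amounts(K, f):
    """Evaluate the claim-2 correction equations (illustrative sketch).

    K: the six gains (K1, ..., K6).
    f: dict of per-frame lists, one list per signal (hypothetical keys).
    Returns (ds_x, ds_y, dtheta_Bx, dtheta_By).
    """
    K1, K2, K3, K4, K5, K6 = K
    # Step-size corrections: CoM tracking error plus trunk tilt about the other axis.
    ds_x = K1 * mean_diff(f["x_f_x"], f["x_e_x"]) + K3 * mean_diff(f["theta_B_y"], f["theta_B_y_ref"])
    ds_y = K2 * mean_diff(f["x_f_y"], f["x_e_y"]) + K4 * mean_diff(f["theta_B_x"], f["theta_B_x_ref"])
    # Trunk-angle corrections: left/right hip height difference (x axis)
    # and supporting-foot-to-head offset (y axis).
    dtheta_Bx = K5 * mean_diff(f["p_LHip_z"], f["p_RHip_z"])
    dtheta_By = K6 * mean_diff(f["p_SuppFoot_x"], f["p_Head_x"])
    return ds_x, ds_y, dtheta_Bx, dtheta_By
```

Pairing each step-size correction with the trunk tilt about the perpendicular axis mirrors the claim: a forward tilt (about y) feeds the x step, and a sideways tilt (about x) feeds the y step.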
3. The model parameter correction method based on the policy gradient learning method according to claim 2, wherein the fitness evaluation function $F(\bar{K})$ is specifically:
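$$F(\bar{K}) = \alpha_x\left(|\Delta s_x| + |\Delta\bar{x}_x|\right) + \alpha_y\left(|\Delta s_y| + |\Delta\bar{x}_y|\right) + \beta_x\left(|\Delta\theta_{B,x}| + |\Delta\bar{\theta}_{B,x}|\right) + \beta_y\left(|\Delta\theta_{B,y}| + |\Delta\bar{\theta}_{B,y}|\right)$$
$$\Delta\bar{x}_x = \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,x,i} - x_{e,x,i}\right)$$
$$\Delta\bar{x}_y = \frac{1}{N}\sum_{i=1}^{N}\left(x_{f,y,i} - x_{e,y,i}\right)$$
$$\Delta\bar{\theta}_{B,x} = \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,x,i} - \theta_{B,x,i}^{\mathrm{ref}}\right)$$
$$\Delta\bar{\theta}_{B,y} = \frac{1}{N}\sum_{i=1}^{N}\left(\theta_{B,y,i} - \theta_{B,y,i}^{\mathrm{ref}}\right)$$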
where $\bar{K} = \{K_1, \ldots, K_6\}$ denotes the gain parameter set, and $\alpha_x$, $\alpha_y$, $\beta_x$ and $\beta_y$ are weight factors satisfying $\alpha_x + \alpha_y = 1$ and $\beta_x + \beta_y = 1$. The smaller the value of the fitness evaluation function, the higher the robot's fitness under that gain parameter set.
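The fitness function of claim 3 adds, for each axis, the magnitude of the current correction and the magnitude of the mean error it responds to, weighted by $\alpha$ and $\beta$. A minimal Python sketch follows. The equal weights $\alpha_x = \beta_x = 0.5$ are an assumption (the claim only requires $\alpha_x + \alpha_y = 1$ and $\beta_x + \beta_y = 1$), and the function and argument names are illustrative.

```python
from statistics import fmean


def mean_diff(a, b):
    """(1/N) * sum_{i=1..N} (a_i - b_i), e.g. the mean CoM error or the mean trunk tilt."""
    if len(a) != len(b) or not a:
        raise ValueError("both sequences must be non-empty and of the same length N")
    return fmean(ai - bi for ai, bi in zip(a, b))


def fitness(corrections, mean_errors, alpha_x=0.5, beta_x=0.5):
    """F(K) from claim 3; smaller means better fitness.

    corrections: (ds_x, ds_y, dtheta_Bx, dtheta_By) produced under gain set K.
    mean_errors: (dxbar_x, dxbar_y, dthetabar_Bx, dthetabar_By), each a mean_diff value.
    alpha_x, beta_x: assumed weights; alpha_y and beta_y follow from the sum-to-one constraint.
    """
    if not (0.0 <= alpha_x <= 1.0 and 0.0 <= beta_x <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    alpha_y, beta_y = 1.0 - alpha_x, 1.0 - beta_x
    ds_x, ds_y, dth_x, dth_y = corrections
    ex, ey, tx, ty = mean_errors
    return (alpha_x * (abs(ds_x) + abs(ex))
            + alpha_y * (abs(ds_y) + abs(ey))
            + beta_x * (abs(dth_x) + abs(tx))
            + beta_y * (abs(dth_y) + abs(ty)))
```

Penalizing the corrections themselves, not only the residual errors, discourages gain sets that hold the robot on track only through large compensations.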
4. The model parameter correction method based on the policy gradient learning method according to claim 3, wherein the policy gradient learning method specifically comprises the following steps:
301: in the k-th iteration, taking the gain parameter set $\bar{K}_{k-1}$ obtained in the previous iteration and randomly generating n policies near $\bar{K}_{k-1}$ to obtain the policy set ${}^{m}\bar{K}_{k-1}$ ($m = 1, \ldots, n$), where the number n of policies is proportional to the search space and the policy set is generated by:
$${}^{m}\bar{K}_{k-1} = \bar{K}_{k-1} + {}^{m}\bar{\rho}$$
where ${}^{m}\bar{\rho}$ ($m = 1, \ldots, n$) denotes a set of perturbations, each perturbation $\rho_m$ in the set is randomly selected from $\{-e_m, 0, +e_m\}$, and $e_m$ is the perturbation gain parameter corresponding to $\rho_m$;
302: according to whether the perturbation $\rho_m$ takes the value $-e_m$, $0$ or $+e_m$, dividing the corresponding policies into three groups, $G_0$ being the group with zero perturbation; substituting ${}^{m}\bar{K}_{k-1}$ into the fitness evaluation function and obtaining the average fitness value of each group;
303: calculating the approximate gradient value: if … and …, then …; otherwise, …;
304: normalizing the approximate gradient and multiplying it by a fixed step factor η to obtain the gradient value; subtracting the gradient value from the policy set $\bar{K}_{k-1}$ to obtain the policy set $\bar{K}_k$ of the current iteration, and using $\bar{K}_k$ for the next iteration;
305: ending the iteration when the number of iterations reaches a preset value $N_{iter}$.
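The steps of claim 4 have the same shape as the finite-difference policy gradient scheme used by Kohl and Stone for quadruped gait learning: perturb the current gains, group the trials by perturbation sign, compare group averages, and take a fixed-length normalized step. Because the conditional formula of step 303 is not legible in this text, the sketch below assumes the Kohl-and-Stone rule adapted to minimization (no move along a gain when its unperturbed group scores lowest, otherwise the difference between the $+e$ and $-e$ group averages), and it draws perturbation signs per gain dimension. All names are hypothetical, and on a robot each call to `F` would be a walking trial rather than a cheap function.

```python
import math
import random


def policy_gradient(F, K0, e, eta=0.1, n=30, n_iter=200, seed=0):
    """Minimize F over a gain vector with a finite-difference policy gradient (sketch).

    F: fitness function, smaller is better.  K0: initial gains.
    e: per-gain perturbation sizes e_j.  eta: fixed step factor.
    n: policies generated per iteration.  n_iter: preset iteration count N_iter.
    """
    rng = random.Random(seed)
    K = list(K0)
    d = len(K)
    for _ in range(n_iter):  # 305: stop after N_iter iterations
        # 301: n random policies near K; each gain is shifted by -e_j, 0 or +e_j.
        signs = [[rng.choice((-1, 0, 1)) for _ in range(d)] for _ in range(n)]
        scores = [F([K[j] + s[j] * e[j] for j in range(d)]) for s in signs]
        grad = [0.0] * d
        for j in range(d):
            # 302: group trial scores by the sign applied to gain j, then average.
            groups = {-1: [], 0: [], 1: []}
            for s, score in zip(signs, scores):
                groups[s[j]].append(score)
            if not all(groups.values()):
                continue  # some group is empty this iteration: leave gain j unchanged
            avg = {g: sum(v) / len(v) for g, v in groups.items()}
            # 303 (assumed rule): hold gain j if leaving it unperturbed scores best.
            if avg[0] < avg[1] and avg[0] < avg[-1]:
                grad[j] = 0.0
            else:
                grad[j] = avg[1] - avg[-1]
        # 304: normalize, scale by eta, and subtract because F is minimized.
        norm = math.sqrt(sum(g * g for g in grad))
        if norm > 0.0:
            K = [k - eta * g / norm for k, g in zip(K, grad)]
    return K
```

Normalizing before scaling by η keeps every update the same length, so a noisy fitness measurement can change the direction of a step but not blow up its size.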
5. A model parameter corrector based on the method of claim 2, wherein the model parameter corrector outputs the step-size correction $\Delta s$ and the trunk-angle correction $\Delta\theta_B$, and its output is transmitted to the inverted pendulum model and the robot model for compensation according to:
$$\bar{s} = s - \Delta s$$
$$\theta_{B,i} = \theta_{B,i-1} - \Delta\theta_B$$
where $\bar{s}$ is the step size of the next single-support phase, compensated once per single-support phase, and $\theta_{B,i}$ is the trunk angle in frame i of the single-support phase, compensated N times per single-support phase.
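The two compensation rules of claim 5 differ in timing: the step size is corrected once per single-support phase, while the trunk angle is decremented in each of the N frames. A small Python sketch of both updates follows; the function names are hypothetical.

```python
def compensate_step(s, ds):
    """Step size for the next single-support phase: s_bar = s - ds (applied once per phase)."""
    return s - ds


def compensate_trunk(theta_start, dtheta, N):
    """Trunk angles over one single-support phase.

    Applies theta_{B,i} = theta_{B,i-1} - dtheta for frames i = 1..N, starting
    from theta_{B,0} = theta_start, and returns [theta_{B,1}, ..., theta_{B,N}].
    """
    if N < 1:
        raise ValueError("a single-support phase needs at least one frame")
    angles = []
    theta = theta_start
    for _ in range(N):
        theta -= dtheta  # one compensation per interpolation frame
        angles.append(theta)
    return angles
```

Read literally, the per-frame rule accumulates to a total trunk change of $N\,\Delta\theta_B$ over one phase, whereas the step size changes by $\Delta s$ only once.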
6. A robot walking control method using the model parameter corrector according to claim 5, comprising the following steps:
1) planning the center-of-mass trajectory of the robot with the inverted pendulum model and, from it, the corresponding foot trajectories; then performing inverse kinematics by resolved motion rate control to obtain the robot's joint velocities, and controlling the robot to walk with the robot model according to the joint velocities;
2) designing two closed loops. First closed loop: measuring the center-of-mass motion from the robot's joint-space state to obtain the actual center-of-mass value, applying Kalman filtering to this value to obtain the center-of-mass estimate, and using the Kalman-filtered estimate to self-correct the inverted pendulum input parameters in the inverted pendulum model;
Second closed loop: using the model parameter corrector, with the Kalman-filtered center-of-mass estimate and the measured trunk angle as inputs, to compensate the inverted pendulum input parameters in the inverted pendulum model and the robot's trunk angle in the robot model.
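Both loops of claim 6 rely on a Kalman-filtered center-of-mass estimate, but the claim does not state the filter model. As an assumption for illustration only, the sketch below uses a scalar Kalman filter with a random-walk state model for one CoM coordinate; the noise variances `q` and `r` are hypothetical tuning values, and a filter built on the inverted pendulum dynamics would be a natural alternative.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with an assumed random-walk model x_k = x_{k-1} + w_k."""

    def __init__(self, x0, p0, q, r):
        self.x = x0  # state estimate (e.g. CoM position along one axis)
        self.p = p0  # variance of the estimate
        self.q = q   # process-noise variance (hypothetical tuning value)
        self.r = r   # measurement-noise variance (hypothetical tuning value)

    def update(self, z):
        """Fuse one measured CoM value z and return the filtered estimate x_f."""
        # Predict: the random-walk model keeps x and inflates the variance by q.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x
```

A small ratio `q / r` gives a smooth but slower estimate; since the estimate feeds both the pendulum self-correction and the corrector, this trade-off affects both loops.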
CN201610841970.3A 2016-09-22 2016-09-22 Model parameter correction method and corrector based on Policy-Gradient learning method Active CN106292288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610841970.3A CN106292288B (en) 2016-09-22 2016-09-22 Model parameter correction method and corrector based on Policy-Gradient learning method

Publications (2)

Publication Number Publication Date
CN106292288A true CN106292288A (en) 2017-01-04
CN106292288B CN106292288B (en) 2017-10-24

Family

ID=57712212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610841970.3A Active CN106292288B (en) 2016-09-22 2016-09-22 Model parameter correction method and corrector based on Policy-Gradient learning method

Country Status (1)

Country Link
CN (1) CN106292288B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008071352A (en) * 2006-09-13 2008-03-27 Samsung Electronics Co Ltd Device and method for estimating attitude of mobile robot
CN101509781A (en) * 2009-03-20 2009-08-19 同济大学 Walking robot positioning system based on monocular cam
JP2013132731A (en) * 2011-12-27 2013-07-08 Seiko Epson Corp Robot control system, robot system and robot control method
CN104020772A (en) * 2014-06-17 2014-09-03 哈尔滨工程大学 Complex-shaped objective genetic path planning method based on kinematics
CN104318071A (en) * 2014-09-30 2015-01-28 同济大学 Robot walking control method based on linear foothold compensator
CN104898672A (en) * 2015-05-12 2015-09-09 北京理工大学 Optimized control method of humanoid robot walking track

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315346A (en) * 2017-06-23 2017-11-03 武汉工程大学 A kind of humanoid robot gait's planing method based on CPG models
CN107315346B (en) * 2017-06-23 2020-01-14 武汉工程大学 Humanoid robot gait planning method based on CPG model
CN107891920A (en) * 2017-11-08 2018-04-10 北京理工大学 A kind of leg joint offset angle automatic obtaining method for biped robot
CN108646797A (en) * 2018-04-27 2018-10-12 宁波工程学院 A kind of multi-line cutting machine tension control method based on genetic optimization
CN109976333A (en) * 2019-02-21 2019-07-05 南京邮电大学 A kind of apery Soccer robot omnidirectional walking manner
CN109976333B (en) * 2019-02-21 2022-11-01 南京邮电大学 Omnidirectional walking method of humanoid football robot
WO2022205840A1 (en) * 2021-03-30 2022-10-06 深圳市优必选科技股份有限公司 Robot and gait control method and apparatus therefor
CN113569653A (en) * 2021-06-30 2021-10-29 宁波春建电子科技有限公司 Three-dimensional head posture estimation algorithm based on facial feature information

Also Published As

Publication number Publication date
CN106292288B (en) 2017-10-24

Similar Documents

Publication Publication Date Title
CN106292288B (en) Model parameter correction method and corrector based on Policy-Gradient learning method
CN108858208B (en) Self-adaptive balance control method, device and system for humanoid robot in complex terrain
CN106886155B (en) Four-legged robot motion trajectory control method based on PSO-PD neural network
CN108572553B (en) Motion closed-loop control method of quadruped robot
CN107891920B (en) Automatic acquisition method for leg joint compensation angle of biped robot
CN113485398B (en) Gesture control method for wheeled biped robot
CN111913490A (en) Drop foot adjustment-based dynamic gait stability control method and system for quadruped robot
CN109760761B (en) Four-footed robot motion control method based on bionics principle and intuition
CN112051741A (en) Dynamic motion generation and control method for biped robot
CN114248855B (en) Biped robot space domain gait planning and control method
CN111290272B (en) Attitude stationarity adjusting method based on multi-legged robot
JP6781101B2 (en) Non-linear system control method, biped robot control device, biped robot control method and its program
CN110244714B (en) Sliding mode control-based robot single-leg swinging phase double-closed-loop control method
CN107315346B (en) Humanoid robot gait planning method based on CPG model
Buschmann et al. Experiments in fast biped walking
CN114047697B (en) Four-foot robot balance inverted pendulum control method based on deep reinforcement learning
CN103112517A (en) Method and device for regulating body posture of four-foot robot
WO2024146206A1 (en) Whole-body compliance control method applied to rapid walking of biped robot
Han et al. A heuristic gait template planning and dynamic motion control for biped robots
CN116520869A (en) Gait planning method, system and equipment for biped humanoid robot
CN115755592B (en) Multi-mode control method for adjusting motion state of three-degree-of-freedom exoskeleton and exoskeleton
WO2023165192A1 (en) Robot control method and apparatus, and robot and computer-readable storage medium
Song et al. CPG-based control design for bipedal walking on unknown slope surfaces
Kim et al. A model predictive capture point control framework for robust humanoid balancing via ankle, hip, and stepping strategies
CN115951696A (en) Pavement attitude self-adaption method for quadruped robot

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant