CN108549237A

CN108549237A - Preview-control gait planning method for humanoid robots based on deep reinforcement learning

Info

Publication number: CN108549237A
Application number: CN201810465382.3A
Authority: CN
Inventors: 毕盛; 刘云达; 董敏; 张英杰; 闵华清
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2018-05-16
Filing date: 2018-05-16
Publication date: 2018-09-18
Anticipated expiration: 2038-05-16
Also published as: CN108549237B

Abstract

The invention discloses a preview-control gait planning method for humanoid robots based on deep reinforcement learning, comprising the steps of: 1) obtaining state information from sensors mounted on the humanoid robot; 2) improving an existing deep reinforcement learning network by defining a new state vector, action vector and reward function; 3) correcting the output of the preview controller with the defined action vector, computing the angle of each servo in both legs of the humanoid robot, and guiding the robot's walking; 4) while the humanoid robot walks, updating the improved deep reinforcement learning network with the values of the state, the action vector and the reward function. The method effectively addresses the problem of humanoid robot walking in complex environments, and its validity has been demonstrated in tests on a simulation platform and on a physical robot.

Description

Preview-control gait planning method for humanoid robots based on deep reinforcement learning

Technical field

The present invention relates to the technical field of humanoid robots, and in particular to a preview-control gait planning method for humanoid robots based on deep reinforcement learning.

Background art

A basic function of a humanoid robot is stable walking. However, because of the structural complexity, strong coupling and poor module independence of a humanoid robot, stable walking is relatively difficult to achieve. The gait control and planning problem of humanoid robots has therefore become a research hotspot in the field. Traditional gait control methods fall roughly into two classes: methods based on modern control theory and methods based on walking mechanisms. Most of these methods, however, are dated and are not well suited to today's more complex models. Meanwhile, machine learning methods of all kinds are continually being proposed and refined, which has also spurred the development of dynamic gait control. Compared with traditional control theory, machine-learning-based methods do not require extensive prior knowledge of the complex model, are easy to implement, and can reach a level comparable to traditional control theory.

Deep reinforcement learning methods have proven effective on complex control problems. By learning, they address the situation in which the system designer lacks a clear understanding of the system dynamics, and they may provide solutions that go beyond the designer's own knowledge. Such methods can also learn and improve continually, adapting to complex environments.

Summary of the invention

The present invention studies gait planning for a humanoid robot walking on complex ground. Because existing control theory cannot effectively solve the problem of walking in complex environments, a preview-control gait planning method for humanoid robots based on deep reinforcement learning is proposed. The method effectively addresses humanoid robot walking in complex environments, and its validity has been demonstrated in tests on a simulation platform and on a physical robot.

To achieve the above object, the technical solution provided by the present invention is a preview-control gait planning method for humanoid robots based on deep reinforcement learning, comprising the following steps:

1) obtaining state information from sensors mounted on the humanoid robot;

2) improving an existing deep reinforcement learning network by defining a new state vector, action vector and reward function;

3) correcting the output of the preview controller with the defined action vector, computing the angle of each servo in both legs of the humanoid robot, and guiding the robot's walking;

4) while the humanoid robot walks, updating the improved deep reinforcement learning network with the values of the state, the action vector and the reward function.

In step 1), state information is obtained from sensors mounted on the humanoid robot. The stability of the humanoid robot during walking is mainly affected by the pitch-direction servos of the supporting leg, so the defined state information should provide the supporting-leg information and the angles of the pitch servos on the supporting leg. The values of acceleration and angular velocity are also needed to judge how stable the walking process is, so that the offline gait can then be adjusted in real time to adapt to uneven terrain. The state is:

[α,ω,θ_lhip,θ_rhip,θ_lankle,θ_rankle]

where α is the square root of the sum of squares of the humanoid robot's accelerations in the x-axis and y-axis directions; ω is the square root of the sum of squares of the humanoid robot's angular velocities in the x-axis and y-axis directions; and θ_lhip, θ_rhip, θ_lankle, θ_rankle are the angles of the pitch-direction servos at the hip and ankle joints of the left and right legs.
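As an illustration of how the six listed state components could be assembled from raw sensor readings, the following is a minimal sketch. The function name, the argument names and the dictionary keys are hypothetical and do not come from the patent; only the definitions of α and ω follow the text above.

```python
import math

def build_state(acc_xy, gyro_xy, pitch_angles):
    """Assemble [alpha, omega, theta_lhip, theta_rhip, theta_lankle, theta_rankle].

    acc_xy and gyro_xy are (x, y) readings; pitch_angles maps joint names
    to pitch servo angles. All names here are hypothetical.
    """
    alpha = math.hypot(acc_xy[0], acc_xy[1])    # sqrt(a_x^2 + a_y^2)
    omega = math.hypot(gyro_xy[0], gyro_xy[1])  # sqrt(w_x^2 + w_y^2)
    return [
        alpha,
        omega,
        pitch_angles["lhip"],
        pitch_angles["rhip"],
        pitch_angles["lankle"],
        pitch_angles["rankle"],
    ]

state = build_state(
    (3.0, 4.0),
    (0.6, 0.8),
    {"lhip": 0.10, "rhip": -0.10, "lankle": 0.05, "rankle": -0.05},
)
```

Collapsing the x and y readings into one magnitude each keeps the state small while still reflecting how strongly the body is being disturbed.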

In step 2), the improved deep reinforcement learning network uses the deep deterministic policy gradient method (DDPG), as follows:

2.1) Definition of the deep reinforcement learning variables

The control output of the preview controller is compensated by deep reinforcement learning. To use deep reinforcement learning, the relevant variables must first be defined: the state vector, the action vector and the reward function;

The control output of the preview controller is a two-dimensional vector whose components correspond to the output values for the x-axis and y-axis coordinates of the center of mass, so the action of the deep reinforcement learning network is defined as:

a(t) = [Δμ_x, Δμ_y]

where Δμ_x and Δμ_y are the changes to the corresponding dimensions of the preview controller output;

Considering what is expected of the walking robot, namely that it stays stable while walking farther and farther, the reward function is defined as:

where the reward is 50 if the humanoid robot successfully reaches the end point; the reward is -50 if the humanoid robot falls while walking; otherwise the reward is based on the robot's current state;

The sum of squares of the acceleration, r_α(t), is defined as:

where α_x(t) and α_y(t) are the accelerations of the humanoid robot in the x-axis and y-axis directions at time t;

The sum of squares of the angular velocity, r_ω(t), is defined as:

where ω_x(t) and ω_y(t) are the angular velocities of the humanoid robot in the x-axis and y-axis directions at time t;

x_dis is the distance the humanoid robot has walked;
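The reward structure can be sketched as below. The terminal values 50 and -50 come from the text. The expression for the intermediate case is not reproduced in this text, so the combination used here (walked distance minus the two sum-of-squares terms) is only an assumption chosen to match the stated intent of rewarding distance and stability. All function and argument names are hypothetical.

```python
def accel_term(ax, ay):
    # Sum of squares of the x and y accelerations, as the description words it.
    return ax * ax + ay * ay

def gyro_term(wx, wy):
    # Sum of squares of the x and y angular velocities.
    return wx * wx + wy * wy

def reward(reached_goal, fallen, x_dis, ax, ay, wx, wy):
    if reached_goal:
        return 50.0   # stated in the text
    if fallen:
        return -50.0  # stated in the text
    # ASSUMPTION: the exact intermediate formula is not given in this text.
    # Here: reward distance covered, penalise shaking.
    return x_dis - accel_term(ax, ay) - gyro_term(wx, wy)

r_goal = reward(True, False, 3.0, 0.0, 0.0, 0.0, 0.0)
r_fall = reward(False, True, 1.0, 2.0, 0.0, 0.0, 0.0)
r_walk = reward(False, False, 2.0, 1.0, 0.0, 0.0, 1.0)
```

Any weighting between the distance and stability terms would be a tuning choice; none is specified here.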

2.2) Construction of the deep reinforcement learning network

Implementing DDPG requires building an Actor network and a Critic network for training. The Critic network parameterizes the action-value function; the Actor network uses the value obtained from the Critic network to guide the update of the policy function. The structure of the Critic network is:

Input layer: S(t), the state input to the Q function at time t in Q-learning, 9 dimensions in total;

Hidden layers: there are 2 hidden layers; the first has 402 nodes, including 2 nodes that represent the action, and the second has 300 nodes. The activation function of every neuron is the rectified linear unit (ReLU), whose output is computed as:

y_i(t) = max(t, 0), i = 1, 2, …, n

that is, the output y_i(t) of the i-th neuron is the larger of 0 and t;

Output layer: Q(t), the output value of the value function, 1 dimension in total;

The structure of the Actor network is:

Hidden layers: there are 2 hidden layers; the first has 400 nodes and the second has 300 nodes. The activation function of every neuron is the rectified linear unit (ReLU), whose output is computed as:

y_i(t) = max(t, 0), i = 1, 2, …, n

Output layer: A(t), the output action value, 2 dimensions in total;

The Critic and Actor networks are updated using the backpropagation (BP) algorithm and gradient descent. Each neuron's output weight w_i is updated as:

w_i ← w_i − η·∂E/∂w_i

where w_i is the i-th weight, η is the learning rate, and E is the learning performance index of the two networks;
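The Critic and Actor structures described above can be sketched as plain forward passes. This is a minimal illustration in pure Python, not the patented implementation. Several points are assumptions: the 402-node first Critic layer is read as 400 ReLU units computed from the state plus the 2 action values appended to them; the Actor input is taken to be the same 9-dimensional state; the output layers are linear; and the weight initialisation is arbitrary.

```python
import random

def relu(values):
    # y = max(t, 0), applied element-wise
    return [max(v, 0.0) for v in values]

def dense(inputs, weights, biases):
    # weights[j] holds the input weights of output unit j
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    scale = 1.0 / n_in ** 0.5  # arbitrary small uniform init (assumption)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class Actor:
    def __init__(self, state_dim=9, action_dim=2, rng=None):
        rng = rng or random.Random(0)
        self.l1 = init_layer(state_dim, 400, rng)
        self.l2 = init_layer(400, 300, rng)
        self.out = init_layer(300, action_dim, rng)

    def forward(self, state):
        h1 = relu(dense(state, *self.l1))
        h2 = relu(dense(h1, *self.l2))
        return dense(h2, *self.out)  # linear output (assumption)

class Critic:
    def __init__(self, state_dim=9, action_dim=2, rng=None):
        rng = rng or random.Random(1)
        self.l1 = init_layer(state_dim, 400, rng)
        self.l2 = init_layer(400 + action_dim, 300, rng)
        self.out = init_layer(300, 1, rng)

    def forward(self, state, action):
        # 400 state-driven units plus 2 action nodes = 402 nodes
        h1 = relu(dense(state, *self.l1)) + list(action)
        h2 = relu(dense(h1, *self.l2))
        return dense(h2, *self.out)[0]

state = [0.0] * 9
action = Actor().forward(state)
q_value = Critic().forward(state, action)
```

Feeding the action into the Critic only after the first layer is a common DDPG arrangement and matches the 402-node count, but the text does not spell out the wiring.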

In step 3), the improved deep reinforcement learning network produces a correction to the output of the preview controller. Based on the corrected preview controller output, the angle of each servo in both legs of the humanoid robot is computed to guide the robot's walking. The core idea of traditional preview control is to use future information for control; here, future information means the target ZMP (zero-moment point) reference values within the next N_p steps. If the current time is k, the poses of both feet within the next N_p steps are computed from the three-dimensional walking pattern, which gives the target ZMP reference values within N_p steps: ZMP*_{k+1}, …, ZMP*_{k+N_p}. These future target ZMP reference values are stored in a FIFO buffer, whose output value serves as the current reference value. The preview controller computes the control output from the ZMP reference values in the FIFO buffer and the state of the humanoid robot. The control output is given by:

where u_k is the controller output at time k; c, K_s, K_x, … are the controller coefficients; the center-of-mass coordinate of the humanoid robot at time k also enters the formula; and [ZMP*_{k+1}, …, ZMP*_{k+N_p}]^T is the vector of reference ZMP values from time k+1 to k+N_p;

The deep reinforcement learning network is trained to produce the correction Δu_k to the preview control output, giving the corrected output:

u′_k = u_k + Δu_k

Once the control input is obtained, the center-of-mass coordinate at time k+1 is computed;

From the center-of-mass coordinate (x_{k+1}, y_{k+1}) at time k+1, the center-of-mass pose and the left and right foot poses at time k+1 are obtained:

where G_cobpresent, G_lpresent and G_rpresent are the poses of the center of mass, the left foot and the right foot at time k+1. Finally, the servo angles of both legs are computed by inverse kinematics, giving the angle of each joint servo of both legs at time k+1, which is used to guide the humanoid robot's walking.
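The correction step and the subsequent center-of-mass update can be sketched as follows. The correction itself follows the text. The text does not give the update formula for the center of mass, so `step_com` assumes the cart-table model commonly used with ZMP preview control, in which the control input is the jerk of the center of mass. That model, the time step and all names are assumptions.

```python
def corrected_control(u_k, delta_u_k):
    # u'_k = u_k + delta_u_k, applied per axis (x, y)
    return tuple(u + d for u, d in zip(u_k, delta_u_k))

def step_com(state, jerk, dt):
    """Advance one axis of the center of mass by one time step.

    state is (position, velocity, acceleration); the input is a constant
    jerk over dt. ASSUMPTION: cart-table model, not taken from the patent.
    """
    pos, vel, acc = state
    return (
        pos + vel * dt + acc * dt ** 2 / 2 + jerk * dt ** 3 / 6,
        vel + acc * dt + jerk * dt ** 2 / 2,
        acc + jerk * dt,
    )

u_prime = corrected_control((1.0, 0.5), (0.25, -0.5))
x_next = step_com((0.0, 0.0, 0.0), u_prime[0], dt=1.0)
y_next = step_com((0.0, 0.0, 0.0), u_prime[1], dt=1.0)
```

The resulting (x, y) positions would then feed the pose computation and inverse kinematics, which depend on the robot's geometry and are not sketched here.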

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. The method applies deep reinforcement learning on top of existing preview control theory, which speeds up convergence.

2. The method is simple and practical. It controls the humanoid robot's walking online and adjusts the gait in a timely manner, helping the robot walk stably on uneven ground, and thus has practical significance and application value.

Brief description of the drawings

Fig. 1 shows the structure of the Critic network.

Fig. 2 shows the structure of the Actor network.

Fig. 3 is a flow chart of preview control.

Fig. 4 is a flow chart of preview control based on deep reinforcement learning.

Fig. 5 shows the results of the walking experiment.

Detailed description

The present invention is further described below with reference to a specific embodiment.

The preview-control gait planning method for humanoid robots based on deep reinforcement learning provided by this embodiment is as follows:

1) Acquisition of the humanoid robot state

State information is obtained from sensors mounted on the humanoid robot. The stability of the humanoid robot during walking is mainly affected by the pitch-direction servos of the supporting leg, so the defined state information should provide the supporting-leg information and the angles of the pitch servos on the supporting leg. The values of acceleration and angular velocity are also needed to judge how stable the walking process is. The offline gait is then adjusted in real time to adapt to uneven terrain.

[α,ω,θ_lhip,θ_rhip,θ_lankle,θ_rankle]

2.1) Definition of the deep reinforcement learning variables

The walking pattern generation method based on the preview controller cannot guarantee the stability of motions that are difficult to describe with its simple model. Complex motions, such as large swings of the upper body or arm swinging, cause a large discrepancy between the ZMP reference value and the actual value. The control output of the preview controller therefore needs to be compensated by deep reinforcement learning. The deep reinforcement learning method used in this embodiment is the deep deterministic policy gradient (DDPG) method. Its advantage is that it can output continuous results and performs better in complex scenes than comparable methods.

To use deep reinforcement learning, the relevant variables must first be defined: the state vector, the action vector and the reward function. The state has already been described in step 1) and is not repeated here.

The control output of the preview controller is a two-dimensional vector whose components correspond to the output values for the x-axis and y-axis coordinates of the center of mass. The action of the deep reinforcement learning network is therefore defined as:

a(t) = [Δμ_x, Δμ_y]

where Δμ_x and Δμ_y are the changes to the corresponding dimensions of the preview controller output.

Considering what is expected of the walking robot, namely that it stays stable while walking farther and farther, the reward function is defined as:

where the reward is 50 if the humanoid robot successfully reaches the end point; the reward is -50 if the humanoid robot falls while walking; otherwise the reward is based on the robot's current state.

The sum of squares of the acceleration, r_α(t), is defined as:

where α_x(t) and α_y(t) are the accelerations of the humanoid robot in the x-axis and y-axis directions at time t.

The sum of squares of the angular velocity, r_ω(t), is defined as:

where ω_x(t) and ω_y(t) are the angular velocities of the humanoid robot in the x-axis and y-axis directions at time t.

x_dis is the distance the humanoid robot has walked.

2.2) Construction of the deep reinforcement learning network

Implementing DDPG requires building an Actor network and a Critic network for training. The Critic network parameterizes the action-value function; the Actor network uses the value obtained from the Critic network to guide the update of the policy function. As shown in Fig. 1, the structure of the Critic network is:

Hidden layers: there are 2 hidden layers; the first has 402 nodes, including 2 nodes that represent the action, and the second has 300 nodes. The activation function of every neuron is the rectified linear unit (ReLU), whose output is computed as:

y_i(t) = max(t, 0), i = 1, 2, …, n

that is, the output y_i(t) of the i-th neuron is the larger of 0 and t.

Output layer: Q(t), the output value of the value function, 1 dimension in total.

As shown in Fig. 2, the structure of the Actor network is:

Hidden layers: there are 2 hidden layers; the first has 400 nodes and the second has 300 nodes. The activation function of every neuron is the rectified linear unit (ReLU), whose output is computed as:

y_i(t) = max(t, 0), i = 1, 2, …, n

that is, the output y_i(t) of the i-th neuron is the larger of 0 and t.

Output layer: A(t), the output action value, 2 dimensions in total.

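The Critic and Actor networks are updated using the backpropagation (BP) algorithm and gradient descent. Each neuron's output weight w_i is updated as:

w_i ← w_i − η·∂E/∂w_i

where w_i is the i-th weight, η is the learning rate, and E is the learning performance index of the two networks.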
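A gradient descent update of the weights can be illustrated with a toy performance index. The sketch below assumes the standard rule in which each weight moves against its gradient, scaled by the learning rate. The index E(w) = Σ w_i², the learning rate 0.25 and the function name are illustrative only; in the actual networks the gradients would come from backpropagation.

```python
def gradient_step(weights, grads, learning_rate):
    # One gradient descent update: w_i <- w_i - learning_rate * dE/dw_i
    return [w - learning_rate * g for w, g in zip(weights, grads)]

# Toy performance index E(w) = sum(w_i^2), so dE/dw_i = 2 * w_i.
weights = [1.0, -2.0]
for _ in range(3):
    grads = [2.0 * w for w in weights]
    weights = gradient_step(weights, grads, learning_rate=0.25)
```

With this learning rate each step halves every weight, so three steps shrink the weights to one eighth of their starting values.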

3) The improved deep reinforcement learning network produces a correction to the output of the preview controller. Based on the corrected preview controller output, the angle of each servo in both legs of the humanoid robot is computed to guide the robot's walking.

The core idea of traditional preview control is to use future information for control. In this embodiment, future information means the target ZMP (zero-moment point) reference values within the next N_p steps. If the current time is k, these are the target ZMP reference values (ZMP*_{k+1}, …, ZMP*_{k+N_p}). The future target ZMP reference values are stored in a FIFO (first in, first out) buffer, whose output value serves as the current reference value. The preview controller computes the control output from the ZMP reference values in the FIFO buffer and the state of the humanoid robot. The control output is given by:

where u_k is the controller output at time k; c, K_s, K_x, … are the controller coefficients; the center-of-mass coordinate of the humanoid robot at time k also enters the formula; and [ZMP*_{k+1}, …, ZMP*_{k+N_p}]^T is the vector of reference ZMP values from time k+1 to k+N_p.
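The FIFO handling of the ZMP references described above can be sketched with a double-ended queue. The class name, method names and sample reference values are hypothetical. The sketch only shows the buffering: the oldest reference leaves as the current reference value while a newly planned reference enters at the far end of the preview window.

```python
from collections import deque

class ZmpPreviewBuffer:
    """Holds the next N_p target ZMP references as (x, y) pairs."""

    def __init__(self, initial_refs):
        self.buffer = deque(initial_refs)

    def advance(self, new_ref):
        # Pop the oldest reference (the current reference value) and
        # append the newly planned one, keeping the window length fixed.
        current = self.buffer.popleft()
        self.buffer.append(new_ref)
        return current

    def window(self):
        # References currently visible to the preview controller.
        return list(self.buffer)

# N_p = 3 in this toy example
buf = ZmpPreviewBuffer([(0.0, 0.0), (0.0, 0.05), (0.1, -0.05)])
current_ref = buf.advance((0.2, 0.05))
```

Keeping the window length constant means the preview controller always sees the same number of future references at every step.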

The deep reinforcement learning network is trained to produce the correction Δu_k to the preview control output, giving the corrected output u′_k:

u′_k = u_k + Δu_k

Once the control input is obtained, the center-of-mass coordinate at time k+1 can be computed.

From the center-of-mass coordinate (x_{k+1}, y_{k+1}) at time k+1, the center-of-mass pose and the left and right foot poses at time k+1 are obtained:

where G_cobpresent, G_lpresent and G_rpresent are the poses of the center of mass, the left foot and the right foot at time k+1. The servo angles of both legs are then computed by inverse kinematics, giving the angle of each joint servo of both legs at time k+1, which is used to guide the humanoid robot's walking. The detailed process is shown in Fig. 3.

While the humanoid robot walks, for each output (u_x, u_y) of the preview controller, the current state is computed, deep reinforcement learning (DDPG) learns a set of corrections to that output, and the deep reinforcement learning network is updated. The corrected output of the preview controller is then used to compute the walking posture of the humanoid robot. In summary, the algorithm proceeds as follows (see Fig. 4):

1. Initialize the deep reinforcement learning (DDPG) framework and the preview controller;

2. Obtain the current state from the sensor information, and use deep reinforcement learning (DDPG) to compute a set of corrections for the preview controller;

3. Add the corrections to the output of the preview controller and, from the resulting output value combined with inverse kinematics, guide the walking of the humanoid robot;

4. Obtain the immediate reward of the current system and update the deep reinforcement learning framework;

5. Check the current state of the humanoid robot: if the robot has fallen or has reached the target, end the loop; otherwise go to step 2.
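The five steps above can be sketched as a control loop. Everything below is a stand-in written for illustration: `StubRobot`, `StubAgent` and `preview_output` are hypothetical placeholders for the simulator or real robot, the DDPG learner and the preview controller, and the intermediate reward of 0.0 is a placeholder rather than the reward defined in the patent.

```python
class StubRobot:
    """Hypothetical stand-in for the simulator or the physical robot."""

    def __init__(self, goal_steps=5):
        self.steps = 0
        self.goal_steps = goal_steps

    def read_state(self):
        return [0.0] * 9

    def apply(self, control):
        self.steps += 1  # a real robot would run inverse kinematics here

    def has_fallen(self):
        return False

    def reached_goal(self):
        return self.steps >= self.goal_steps

class StubAgent:
    """Hypothetical stand-in for the DDPG learner."""

    def __init__(self):
        self.updates = 0

    def act(self, state):
        return (0.0, 0.0)  # correction (delta_u_x, delta_u_y)

    def update(self, state, action, reward, next_state):
        self.updates += 1

def preview_output(step):
    return (0.01 * step, 0.0)  # placeholder for the preview controller

def run_episode(robot, agent, max_steps=1000):
    # Step 1 (initialisation) is done by the caller.
    for step in range(max_steps):
        state = robot.read_state()                 # step 2
        delta_u = agent.act(state)
        u = preview_output(step)
        u_corrected = tuple(a + b for a, b in zip(u, delta_u))  # step 3
        robot.apply(u_corrected)
        fallen, done = robot.has_fallen(), robot.reached_goal()
        # Placeholder reward: only the terminal values follow the text.
        reward = 50.0 if done else (-50.0 if fallen else 0.0)
        agent.update(state, delta_u, reward, robot.read_state())  # step 4
        if fallen or done:                         # step 5
            return step + 1
    return max_steps

robot, agent = StubRobot(goal_steps=5), StubAgent()
steps_taken = run_episode(robot, agent)
```

Updating the learner on every step, rather than only at the end of an episode, mirrors the online adjustment the method describes.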

The walking results of the humanoid robot in the experiment are shown in Fig. 5.

The embodiment described above is only a preferred embodiment of the invention and does not limit the scope of the invention; any change made according to the shape and principle of the invention shall fall within the scope of protection of the invention.

Claims

1. A preview-control gait planning method for humanoid robots based on deep reinforcement learning, characterized by comprising the following steps:

1) obtaining state information from sensors mounted on the humanoid robot;

2) improving an existing deep reinforcement learning network by defining a new state vector, action vector and reward function;

3) correcting the output of the preview controller with the defined action vector, computing the angle of each servo in both legs of the humanoid robot, and guiding the robot's walking;

4) while the humanoid robot walks, updating the improved deep reinforcement learning network with the values of the state, the action vector and the reward function.

2. The preview-control gait planning method for humanoid robots based on deep reinforcement learning according to claim 1, characterized in that: in step 1), state information is obtained from sensors mounted on the humanoid robot; the stability of the humanoid robot during walking is mainly affected by the pitch-direction servos of the supporting leg, so the defined state information should provide the supporting-leg information and the angles of the pitch servos on the supporting leg; the values of acceleration and angular velocity are also needed to judge how stable the walking process is, so that the offline gait can then be adjusted in real time to adapt to uneven terrain;

[α,ω,θ_lhip,θ_rhip,θ_lankle,θ_rankle]

where α is the square root of the sum of squares of the humanoid robot's accelerations in the x-axis and y-axis directions; ω is the square root of the sum of squares of the humanoid robot's angular velocities in the x-axis and y-axis directions; and θ_lhip, θ_rhip, θ_lankle, θ_rankle are the angles of the pitch-direction servos at the hip and ankle joints of the left and right legs.

3. The preview-control gait planning method for humanoid robots based on deep reinforcement learning according to claim 1, characterized in that: in step 2), the improved deep reinforcement learning network uses the deep deterministic policy gradient method (DDPG), as follows:

2.1) Definition of the deep reinforcement learning variables

The control output of the preview controller is compensated by deep reinforcement learning; to use deep reinforcement learning, the relevant variables must first be defined: the state vector, the action vector and the reward function;

The control output of the preview controller is a two-dimensional vector whose components correspond to the output values for the x-axis and y-axis coordinates of the center of mass, so the action of the deep reinforcement learning network is defined as:

Considering what is expected of the walking robot, namely that it stays stable while walking farther and farther, the reward function is defined as:

where the reward is 50 if the humanoid robot successfully reaches the end point; the reward is -50 if the humanoid robot falls while walking; otherwise the reward is based on the robot's current state;

The square root of the sum of squares of the acceleration, r_α(t), is defined as:

The square root of the sum of squares of the angular velocity, r_ω(t), is defined as:

where ω_x(t) and ω_y(t) are the angular velocities of the humanoid robot in the x-axis and y-axis directions at time t;

x_dis is the distance the humanoid robot has walked;

2.2) Construction of the deep reinforcement learning network

Implementing DDPG requires building an Actor network and a Critic network for training; the Critic network parameterizes the action-value function; the Actor network uses the value obtained from the Critic network to guide the update of the policy function; the structure of the Critic network is:

Hidden layers: there are 2 hidden layers; the first has 402 nodes, including 2 nodes that represent the action, and the second has 300 nodes; the activation function of every neuron is the rectified linear unit (ReLU), whose output is computed as:

y_i(t) = max(t, 0), i = 1, 2, …, n

The structure of the Actor network is:

Hidden layers: there are 2 hidden layers; the first has 400 nodes and the second has 300 nodes; the activation function of every neuron is the rectified linear unit (ReLU), whose output is computed as:

y_i(t) = max(t, 0), i = 1, 2, …, n

The Critic and Actor networks are updated using the backpropagation (BP) algorithm and gradient descent; each neuron's output weight w_i is updated by the following formula:

4. The preview-control gait planning method for humanoid robots based on deep reinforcement learning according to claim 1, characterized in that: in step 3), the improved deep reinforcement learning network produces a correction to the output of the preview controller; based on the corrected preview controller output, the angle of each servo in both legs of the humanoid robot is computed to guide the robot's walking; the core idea of traditional preview control is to use future information for control, where future information means the target ZMP (zero-moment point) reference values within the next N_p steps; if the current time is k, the poses of both feet within the next N_p steps are computed from the three-dimensional walking pattern, which gives the target ZMP reference values within N_p steps: ZMP*_{k+1}, …, ZMP*_{k+N_p}; these future target ZMP reference values are stored in a FIFO buffer, whose output value serves as the current reference value; the preview controller computes the control output from the ZMP reference values in the FIFO buffer and the state of the humanoid robot; the control output is given by:

where u_k is the controller output at time k; c, K_s, K_x, … are the controller coefficients; the center-of-mass coordinate of the humanoid robot at time k also enters the formula; and [ZMP*_{k+1}, …, ZMP*_{k+N_p}]^T is the vector of reference ZMP values from time k+1 to k+N_p;

u′_k = u_k + Δu_k

From the center-of-mass coordinate (x_{k+1}, y_{k+1}) at time k+1, the center-of-mass pose and the left and right foot poses at time k+1 are obtained:

where G_cobpresent, G_lpresent and G_rpresent are the poses of the center of mass, the left foot and the right foot at time k+1; finally, the servo angles of both legs are computed by inverse kinematics, giving the angle of each joint servo of both legs at time k+1, which is used to guide the humanoid robot's walking.