CN108549237A - Preview control humanoid robot gait planning method based on deep reinforcement learning - Google Patents

Preview control humanoid robot gait planning method based on deep reinforcement learning

Info

Publication number
CN108549237A
Authority
CN
China
Prior art keywords
output
walking
robot
moment
preview
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810465382.3A
Other languages
Chinese (zh)
Other versions
CN108549237B (en)
Inventor
毕盛
刘云达
董敏
张英杰
闵华清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-05-16
Filing date
2018-05-16
Publication date
2018-09-18
Application filed by South China University of Technology SCUT
Priority to CN201810465382.3A
Publication of CN108549237A
Application granted
Publication of CN108549237B
Expired - Fee Related (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a preview control humanoid robot gait planning method based on deep reinforcement learning. It comprises the following steps: 1) obtaining state information through sensors mounted on the humanoid robot; 2) improving an existing deep reinforcement learning network by defining new state and action vectors and a new reward function; 3) correcting the output of the preview controller with the defined action vector, calculating the angle of each servo in both legs of the humanoid robot, and guiding the robot's walking; 4) during walking, updating the deep reinforcement learning network with the values of the state, the action vector and the reward function. The method effectively solves the problem of humanoid robot walking in complex environments. It was tested on a simulation platform and on a physical robot, which demonstrated its effectiveness.

Description

Preview control humanoid robot gait planning method based on deep reinforcement learning
Technical field
The present invention relates to the technical field of humanoid robots, and in particular to a preview control humanoid robot gait planning method based on deep reinforcement learning.
Background art
Stable walking is a basic function of a humanoid robot. However, humanoid robots have complex structures, strong coupling between components and poor module independence, so stable walking is relatively difficult to achieve. Gait control and planning for humanoid robots has therefore become a research hotspot in the field. Traditional gait control methods fall roughly into two classes: methods based on modern control theory and methods based on walking mechanisms. Most of these methods are dated and are ill-suited to today's increasingly complex robot models. Meanwhile, new machine learning methods keep emerging, and they have spurred the development of dynamic gait control. Compared with traditional control theory, machine-learning-based methods do not need extensive prior knowledge of a complex model. They are also easy to implement and can achieve performance comparable to traditional control theory.
Deep reinforcement learning methods have proven effective on complex control problems. Because they learn, they work around the system designer's incomplete knowledge of the system dynamics, and they may provide solutions beyond the designer's own expertise. Such methods can also keep learning and improving, and so continually adapt to complex environments.
Summary of the invention
The present invention studies gait planning for a humanoid robot walking on complex ground. Existing control theory cannot effectively solve walking in complex environments, so the invention proposes a preview control humanoid robot gait planning method based on deep reinforcement learning. The method effectively solves the problem of humanoid robot walking in complex environments. It was tested on a simulation platform and on a physical robot, which demonstrated its effectiveness.
To achieve the above object, the present invention provides the following technical solution: a preview control humanoid robot gait planning method based on deep reinforcement learning, comprising the following steps:
1) obtaining state information through sensors mounted on the humanoid robot;
2) improving an existing deep reinforcement learning network by defining new state and action vectors and a new reward function;
3) correcting the output of the preview controller with the defined action vector, calculating the angle of each servo in both legs of the humanoid robot, and thereby guiding the robot's walking;
4) during walking, updating the deep reinforcement learning network with the values of the state, the action vector and the reward function.
In step 1), state information is obtained through sensors mounted on the humanoid robot. The stability of the walking robot depends mainly on the pitch servos of the supporting leg. The defined state information should therefore include the supporting-leg information and the angles of the pitch servos on the supporting leg. Acceleration and angular velocity values are also needed to judge how stable the walking process is, so that the offline gait can then be adjusted in real time to adapt to uneven terrain;
$$[\alpha, \omega, \theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}]$$
where $\alpha$ is the square root of the sum of squares of the robot's accelerations along the x- and y-axes; $\omega$ is the square root of the sum of squares of its x- and y-axis angular velocities; and $\theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}$ are the angles of the pitch servos at the left and right hip and ankle joints.
In step 2), the improved deep reinforcement learning network uses the deep deterministic policy gradient (DDPG) method, as follows:
2.1) Definition of the deep reinforcement learning variables
Deep reinforcement learning is used to compensate the control output of the preview controller. This first requires defining the relevant variables: the state vector, the action vector and the reward function;
The preview controller's output is two-dimensional, corresponding to the output values of the center-of-mass (CoM) coordinates along the x- and y-axes. The action of the deep reinforcement learning network is therefore defined as:
$$[\Delta\mu_x, \Delta\mu_y]$$
where $\Delta\mu_x$ and $\Delta\mu_y$ are the changes applied to the corresponding dimensions of the preview controller output;
The humanoid robot is expected to stay stable while walking as far as possible, so the reward function is defined as:
where the return is 50 if the humanoid robot reaches the end point and -50 if it falls during walking; in all other cases the return is computed from the robot's current state;
The sum of squares of the accelerations, $r_\alpha(t)$, is defined as:
where $\alpha_x(t)$ and $\alpha_y(t)$ are the robot's accelerations along the x- and y-axes at time t;
The sum of squares of the angular velocities, $r_\omega(t)$, is defined as:
where $\omega_x(t)$ and $\omega_y(t)$ are the robot's x- and y-axis angular velocities at time t;
X_dis is the distance walked by the humanoid robot;
2.2) Structure of the deep reinforcement learning network
Implementing DDPG requires building an Actor network and a Critic network for training. The Critic network parameterizes the action-value function. The Actor network updates the policy function under the guidance of the values obtained from the Critic network. The Critic network is structured as follows:
Input layer: S(t), the state input to the Q function at time t in Q-learning, 9 dimensions in total;
Hidden layers: 2 layers. The first has 402 nodes, including 2 nodes that represent the action; the second has 300 nodes. Every neuron uses the rectified linear unit (ReLU) activation function, whose output is computed as:
$$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: Q(t), the output value of the Q function, 1 dimension in total;
The Actor network is structured as follows:
Input layer: S(t), the state at time t, 9 dimensions in total;
Hidden layers: 2 layers. The first has 400 nodes and the second has 300 nodes. Every neuron uses the ReLU activation function, whose output is computed as:
$$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: A(t), the output action value, 2 dimensions in total;
The Critic and Actor networks are updated with the backpropagation (BP) algorithm and gradient descent. The output weight $w_i$ of each neuron is updated as:
$$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$$
where $w_i$ is the i-th weight, $\eta$ is the learning rate, and E is the learning performance index of the two networks;
In step 3), the correction output by the improved deep reinforcement learning network is applied to the preview controller output. The angle of each servo in both legs of the humanoid robot is then calculated from the corrected preview controller to guide the robot's walking. The key idea of traditional preview control is to use future information for control. Here the future information is the target ZMP reference values within the next $N_p$ steps. If the current time is k, the poses of both feet within the next $N_p$ steps are computed with a three-dimensional walking pattern. This gives the target ZMP reference values within $N_p$ steps: $ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}$. These future target ZMP reference values are stored in a FIFO buffer, whose output serves as the current reference value. The preview controller computes its control output from the ZMP reference values in the FIFO buffer and the state of the humanoid robot. The control output is:
where $u_k$ is the controller output at time k; C, $K_s$, $K_x$ and the preview gain are controller coefficients; the humanoid robot's CoM coordinates at time k form the state; and $[ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}]^T$ are the reference ZMPs from time k+1 to $k+N_p$;
The deep reinforcement learning network is trained to produce the correction $\Delta u_k$ to the preview control output:
$$u'_k = u_k + \Delta u_k$$
After the control input is obtained, the CoM coordinates at time k+1 are calculated;
The CoM coordinates $(x_{k+1}, y_{k+1})$ at time k+1 give the CoM pose and the left and right foot poses at time k+1:
where $G_{cobpresent}$, $G_{lpresent}$ and $G_{rpresent}$ are the poses of the CoM, left foot and right foot at time k+1. Finally, inverse kinematics gives the angle of each leg joint servo at time k+1, and these angles guide the humanoid robot's walking.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The method applies deep reinforcement learning on top of existing preview control theory, which speeds up convergence.
2. The method is simple and practical. It controls the humanoid robot's walking online and adjusts the gait in time, helping the robot walk stably on uneven ground. It therefore has practical significance and application value.
Description of the drawings
Fig. 1 shows the structure of the Critic network.
Fig. 2 shows the structure of the Actor network.
Fig. 3 is the preview control flowchart.
Fig. 4 is the flowchart of preview control based on deep reinforcement learning.
Fig. 5 shows the results of the walking experiment.
Detailed description of the embodiments
The present invention is further explained below with reference to a specific embodiment.
The preview control humanoid robot gait planning method based on deep reinforcement learning provided by this embodiment proceeds as follows:
1) Acquisition of the humanoid robot state
State information is obtained through sensors mounted on the humanoid robot. The stability of the walking robot depends mainly on the pitch servos of the supporting leg. The defined state information should therefore include the supporting-leg information and the angles of the pitch servos on the supporting leg. Acceleration and angular velocity values are also needed to judge how stable the walking process is. The offline gait is then adjusted in real time so that the robot can adapt to uneven terrain.
$$[\alpha, \omega, \theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}]$$
where $\alpha$ is the square root of the sum of squares of the robot's accelerations along the x- and y-axes; $\omega$ is the square root of the sum of squares of its x- and y-axis angular velocities; and $\theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}$ are the angles of the pitch servos at the left and right hip and ankle joints.
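The six quantities listed above can be packed into a vector as in the following sketch. The function name, argument names and units (m/s², rad/s, rad) are assumptions for illustration. The networks below take a 9-dimensional input, so the remaining components (for example the supporting-leg information) would have to be appended in a way this text does not specify.

```python
import math

def walking_state(acc_x, acc_y, gyro_x, gyro_y,
                  theta_lhip, theta_rhip, theta_lankle, theta_rankle):
    """Hypothetical helper: build [alpha, omega, hip/ankle pitch angles].

    acc_* are IMU accelerations (assumed m/s^2), gyro_* are angular
    velocities (assumed rad/s), theta_* are pitch-servo angles (assumed rad).
    """
    alpha = math.hypot(acc_x, acc_y)    # sqrt(a_x^2 + a_y^2)
    omega = math.hypot(gyro_x, gyro_y)  # sqrt(w_x^2 + w_y^2)
    return [alpha, omega, theta_lhip, theta_rhip, theta_lankle, theta_rankle]
```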
2.1) Definition of the deep reinforcement learning variables
A walking pattern generated by a preview controller cannot guarantee stability for motions that this simple model describes poorly. Complex motions, such as large swaying of the upper body or arm swinging, cause large discrepancies between the reference ZMP and the actual ZMP. The control output of the preview controller therefore needs to be compensated by deep reinforcement learning. This embodiment uses the deep deterministic policy gradient (DDPG) method. Its advantage is that it produces continuous outputs and performs better than similar methods in complex scenarios.
To use deep reinforcement learning, the relevant variables must first be defined: the state vector, the action vector and the reward function. The state was described in step 1) and is not repeated here.
The preview controller's output is two-dimensional, corresponding to the output values of the CoM coordinates along the x- and y-axes. The action of the deep reinforcement learning network is therefore defined as:
$$[\Delta\mu_x, \Delta\mu_y]$$
where $\Delta\mu_x$ and $\Delta\mu_y$ are the changes applied to the corresponding dimensions of the preview controller output.
The humanoid robot is expected to stay stable while walking as far as possible, so the reward function is defined as:
where the return is 50 if the humanoid robot reaches the end point and -50 if it falls during walking. In all other cases the return is computed from the robot's current state.
The sum of squares of the accelerations, $r_\alpha(t)$, is defined as:
where $\alpha_x(t)$ and $\alpha_y(t)$ are the robot's accelerations along the x- and y-axes at time t.
The sum of squares of the angular velocities, $r_\omega(t)$, is defined as:
where $\omega_x(t)$ and $\omega_y(t)$ are the robot's x- and y-axis angular velocities at time t.
X_dis is the distance walked by the humanoid robot.
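The following sketch shows how the reward could be coded. Only the terminal values (+50 for reaching the end point, -50 for falling) come from the text. The expression for intermediate steps is not recoverable here. The weighted combination of X_dis, $r_\alpha$ and $r_\omega$ below and its weights are hypothetical placeholders. The sketch treats $r_\alpha$ and $r_\omega$ as plain sums of squares, following the wording of this section.

```python
def reward(reached_goal, fell, acc_xy, gyro_xy, x_dis,
           w_dist=1.0, w_acc=0.01, w_gyro=0.1):
    """Hypothetical reward for one time step t.

    acc_xy = (alpha_x(t), alpha_y(t)), gyro_xy = (omega_x(t), omega_y(t)),
    x_dis = distance walked. The weights w_* are illustrative assumptions.
    """
    if reached_goal:
        return 50.0   # terminal: walked to the end point (from the text)
    if fell:
        return -50.0  # terminal: fell during walking (from the text)
    r_alpha = acc_xy[0] ** 2 + acc_xy[1] ** 2     # r_alpha(t)
    r_omega = gyro_xy[0] ** 2 + gyro_xy[1] ** 2   # r_omega(t)
    # ASSUMED shaping: reward distance walked, penalise body shaking
    return w_dist * x_dis - w_acc * r_alpha - w_gyro * r_omega
```

Keeping the terminal cases first means a fall always dominates whatever the shaping term would have contributed on that step.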
2.2) Structure of the deep reinforcement learning network
Implementing DDPG requires building an Actor network and a Critic network for training. The Critic network parameterizes the action-value function. The Actor network updates the policy function under the guidance of the values obtained from the Critic network. As shown in Fig. 1, the Critic network is structured as follows:
Input layer: S(t), the state input to the Q function at time t in Q-learning, 9 dimensions in total;
Hidden layers: 2 layers. The first has 402 nodes, including 2 nodes that represent the action; the second has 300 nodes. Every neuron uses the rectified linear unit (ReLU) activation function, whose output is computed as:
$$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t.
Output layer: Q(t), the output value of the Q function, 1 dimension in total.
As shown in Fig. 2, the Actor network is structured as follows:
Input layer: S(t), the state at time t, 9 dimensions in total;
Hidden layers: 2 layers. The first has 400 nodes and the second has 300 nodes. Every neuron uses the ReLU activation function, whose output is computed as:
$$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t.
Output layer: A(t), the output action value, 2 dimensions in total.
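The layer sizes above can be illustrated with a NumPy forward pass. Several details here are assumptions: the random initialisation, the bounded tanh output of the Actor (scaled by a hypothetical `max_correction`), and the reading of "402 nodes including 2 action nodes" as 400 state-driven ReLU units concatenated with the 2-dimensional action. Training is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 9, 2

def dense(n_in, n_out):
    """Randomly initialised fully connected layer: (weights, bias)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

def relu(t):
    return np.maximum(t, 0.0)  # y_i(t) = max(t, 0)

# Actor: 9 -> 400 -> 300 -> 2
actor = [dense(STATE_DIM, 400), dense(400, 300), dense(300, ACTION_DIM)]
# Critic: 9 -> 400, concatenated with the 2 action inputs (402) -> 300 -> 1
critic = [dense(STATE_DIM, 400), dense(400 + ACTION_DIM, 300), dense(300, 1)]

def actor_forward(s, max_correction=0.01):
    h = relu(s @ actor[0][0] + actor[0][1])
    h = relu(h @ actor[1][0] + actor[1][1])
    # ASSUMED: tanh output scaled to a bounded correction of the preview output
    return max_correction * np.tanh(h @ actor[2][0] + actor[2][1])

def critic_forward(s, a):
    h = relu(s @ critic[0][0] + critic[0][1])
    h = np.concatenate([h, a], axis=-1)        # 400 + 2 = 402 units
    h = relu(h @ critic[1][0] + critic[1][1])
    return h @ critic[2][0] + critic[2][1]     # linear Q(t) output
```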
The Critic and Actor networks are updated with the backpropagation (BP) algorithm and gradient descent. The output weight $w_i$ of each neuron is updated as:
$$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$$
where $w_i$ is the i-th weight, $\eta$ is the learning rate, and E is the learning performance index of the two networks.
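The next example applies the update rule $w_i \leftarrow w_i - \eta\,\partial E/\partial w_i$ to a single linear neuron. The text does not define E, so a mean squared error is assumed here, and the toy data are illustrative only.

```python
import numpy as np

def loss(w, x, target):
    """ASSUMED performance index E = 0.5 * mean((x @ w - target)^2)."""
    return 0.5 * np.mean((x @ w - target) ** 2)

def gradient_step(w, x, target, lr=0.1):
    """One update w_i <- w_i - lr * dE/dw_i for the linear neuron y = x @ w."""
    grad = x.T @ (x @ w - target) / len(target)  # dE/dw
    return w - lr * grad
```

Usage: repeated steps drive the weights toward the values that minimise E. In DDPG the same rule is applied to every weight of both networks, with the gradients supplied by backpropagation.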
3) The correction output by the improved deep reinforcement learning network is applied to the preview controller output. The angle of each servo in both legs of the humanoid robot is then calculated from the corrected preview controller to guide the robot's walking.
The key idea of traditional preview control is to use future information for control. In this embodiment, the future information is the target ZMP reference values within the next $N_p$ steps. If the current time is k, these are $(ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p})$. These future target ZMP reference values are stored in a FIFO (first-in, first-out) buffer, whose output serves as the current reference value. The preview controller computes its control output from the ZMP reference values in the FIFO buffer and the state of the humanoid robot. The control output is:
where $u_k$ is the controller output at time k; C, $K_s$, $K_x$ and the preview gain are controller coefficients; the humanoid robot's CoM coordinates at time k form the state; and $[ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}]^T$ are the reference ZMPs from time k+1 to $k+N_p$.
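The control formula itself is missing from this text. The following sketch therefore follows the standard integral-type ZMP preview servo (Katayama's formulation, as used by Kajita et al. for walking pattern generation) on one axis of the cart-table model. This is an assumption, not the document's own formula. In the model the state is CoM position, velocity and acceleration, and the input $u_k$ is CoM jerk. `Ks` plays the role of $K_s$, `Kx` of $K_x$, and `Gp` holds the preview gains applied to the $N_p$ references in the FIFO, which is modelled with `collections.deque`. The sampling period, CoM height, weights and the fixed-point Riccati solver are illustrative choices.

```python
from collections import deque
import numpy as np

def preview_gains(T=0.01, zc=0.8, g=9.81, Qe=1.0, R=1e-6, N=100):
    """Gains of an integral-type ZMP preview servo, one axis, cart-table model."""
    A = np.array([[1.0, T, T**2 / 2], [0.0, 1.0, T], [0.0, 0.0, 1.0]])
    B = np.array([[T**3 / 6], [T**2 / 2], [T]])
    C = np.array([[1.0, 0.0, -zc / g]])        # ZMP = CoM - (zc / g) * CoM accel
    # Augmented system, state [tracking error, state increment]
    At = np.block([[np.eye(1), C @ A], [np.zeros((3, 1)), A]])
    Bt = np.vstack([C @ B, B])
    It = np.array([[1.0], [0.0], [0.0], [0.0]])
    Ft = np.vstack([C @ A, A])
    Qt = np.zeros((4, 4))
    Qt[0, 0] = Qe
    P = Qt.copy()                              # DARE by fixed-point iteration
    for _ in range(20000):
        S = R + (Bt.T @ P @ Bt).item()
        P_next = Qt + At.T @ P @ At - (At.T @ P @ Bt) @ (Bt.T @ P @ At) / S
        converged = np.allclose(P_next, P, rtol=1e-12, atol=1e-14)
        P = P_next
        if converged:
            break
    S = R + (Bt.T @ P @ Bt).item()
    Ks = (Bt.T @ P @ It).item() / S           # integral gain
    Kx = (Bt.T @ P @ Ft) / S                  # state gain, shape (1, 3)
    Ac = At - Bt @ (Bt.T @ P @ At) / S        # closed-loop matrix
    X = -Ac.T @ P @ It
    Gp = np.zeros(N)                          # gains for ZMP*_{k+1..k+N}
    Gp[0] = -Ks
    for j in range(1, N):
        Gp[j] = (Bt.T @ X).item() / S
        X = Ac.T @ X
    return A, B, C, Ks, Kx, Gp

def simulate(zmp_ref, T=0.01, N=100):
    """Track a ZMP reference; return the ZMP history and the final CoM state."""
    A, B, C, Ks, Kx, Gp = preview_gains(T=T, N=N)
    ref = np.concatenate([zmp_ref, np.full(N + 1, zmp_ref[-1])])  # pad the tail
    fifo = deque(ref[1:N + 1], maxlen=N)      # ZMP*_{k+1}, ..., ZMP*_{k+N}
    p_ref_now = ref[0]
    x = np.zeros((3, 1))                      # [CoM position, velocity, accel]
    sum_e, zmp = 0.0, []
    for k in range(len(zmp_ref)):
        p = (C @ x).item()
        sum_e += p - p_ref_now                # accumulated ZMP tracking error
        u = -Ks * sum_e - (Kx @ x).item() - Gp @ np.array(fifo)
        x = A @ x + B * u                     # apply jerk u_k
        zmp.append(p)
        p_ref_now = fifo[0]                   # oldest entry becomes current
        fifo.append(ref[k + N + 1])           # maxlen drops it; newest enters
    return np.array(zmp), x
```

The integral term makes the ZMP settle on a constant reference even though the model is only approximate. The preview gains let the CoM start moving before a step in the reference arrives.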
The deep reinforcement learning network is trained to produce the correction $\Delta u_k$, which gives the corrected preview control output $u'_k$:
$$u'_k = u_k + \Delta u_k$$
After the control input is obtained, the CoM coordinates at time k+1 can be calculated.
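The sketch below applies the correction $u'_k = u_k + \Delta u_k$ on both axes, where $\Delta u_k$ is the DDPG action $(\Delta\mu_x, \Delta\mu_y)$, and then advances the CoM one step. The cart-table model (input = CoM jerk) and the sampling period are assumptions carried over from typical ZMP preview control, since the document does not state its model.

```python
import numpy as np

T = 0.01  # ASSUMED sampling period (s)
# ASSUMED cart-table model: state [position, velocity, acceleration], input jerk
A = np.array([[1.0, T, T**2 / 2], [0.0, 1.0, T], [0.0, 0.0, 1.0]])
B = np.array([T**3 / 6, T**2 / 2, T])

def corrected_com_step(state_x, state_y, u, delta_u):
    """Apply u'_k = u_k + delta_u_k on each axis and return the states at k+1.

    u = (u_x, u_y) is the preview controller output; delta_u is the DDPG action.
    """
    ux = u[0] + delta_u[0]
    uy = u[1] + delta_u[1]
    next_x = A @ state_x + B * ux
    next_y = A @ state_y + B * uy
    return next_x, next_y  # CoM coordinates (x_{k+1}, y_{k+1}) = (next_x[0], next_y[0])
```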
The CoM coordinates $(x_{k+1}, y_{k+1})$ at time k+1 give the CoM pose and the left and right foot poses at time k+1:
where $G_{cobpresent}$, $G_{lpresent}$ and $G_{rpresent}$ are the poses of the CoM, left foot and right foot at time k+1. Inverse kinematics then gives the angle of each leg joint servo at time k+1, and these angles guide the humanoid robot's walking. The detailed process is shown in Fig. 3.
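The document computes the joint angles from the full 3D poses with inverse kinematics but gives no equations. The function below is a much simplified sketch: it solves a planar (sagittal) two-link leg for hip, knee and ankle pitch, given the ankle position relative to the hip. The link lengths, the function name and the sign convention are all assumptions. The convention is that hip pitch and knee flexion are positive forward, and the sole is kept parallel to the ground with an upright torso. A real robot also needs its roll and yaw joints.

```python
import math

def sagittal_leg_ik(dx, dz, thigh=0.1, shank=0.1):
    """Hypothetical planar IK for one leg.

    dx: forward offset of the ankle from the hip (m); dz: vertical offset,
    negative when the ankle is below the hip. Requires (dx, dz) != (0, 0).
    Returns (hip_pitch, knee, ankle_pitch) in radians; knee = 0 when straight.
    """
    d = min(math.hypot(dx, dz), thigh + shank)  # clamp to a straight leg
    # Knee flexion from the law of cosines (pi minus the interior knee angle)
    cos_knee = (thigh**2 + shank**2 - d**2) / (2 * thigh * shank)
    knee = math.pi - math.acos(max(-1.0, min(1.0, cos_knee)))
    # Angle of the hip->ankle line from the downward vertical
    leg_angle = math.atan2(dx, -dz)
    # Angle between the thigh and the hip->ankle line
    cos_a = (thigh**2 + d**2 - shank**2) / (2 * thigh * d)
    hip_pitch = leg_angle + math.acos(max(-1.0, min(1.0, cos_a)))
    # Keep the sole flat: hip_pitch - knee + ankle_pitch = 0
    ankle_pitch = knee - hip_pitch
    return hip_pitch, knee, ankle_pitch
```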
While the humanoid robot walks, the current state is computed for each preview controller output $(u_x, u_y)$. DDPG learns a set of corrections to that output, and the deep reinforcement learning network is updated. The preview controller output is also used to compute the robot's walking posture. In summary, the algorithm steps are as follows (see Fig. 4):
1. Initialize the DDPG framework and the preview controller;
2. Obtain the current state from the sensor information, and use DDPG to compute a set of corrections for the preview controller;
3. Add the corrections to the preview controller output, and guide the humanoid robot's walking from the resulting output using inverse kinematics;
4. Obtain the immediate return of the current system and update the deep reinforcement learning framework;
5. Check the humanoid robot's current state. If it has fallen or reached the target, end the loop; otherwise go to step 2.
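Steps 2 to 5 can be sketched as the loop below. The `Agent` and `Robot` interfaces and all their method names are hypothetical; they stand in for a DDPG implementation and for the robot or simulator, which wraps the preview controller and the inverse kinematics. Step 1 (initialisation) happens before `run_episode` is called. The toy stand-ins at the end only show the call pattern.

```python
from typing import List, Protocol, Tuple

class Agent(Protocol):
    """Hypothetical DDPG agent interface."""
    def act(self, state: List[float]) -> Tuple[float, float]: ...
    def update(self, state, action, reward, next_state, done) -> None: ...

class Robot(Protocol):
    """Hypothetical robot/simulator wrapping the preview controller and IK."""
    def read_state(self) -> List[float]: ...
    def preview_output(self) -> Tuple[float, float]: ...
    def walk(self, u: Tuple[float, float]) -> None: ...
    def reward(self) -> float: ...
    def fallen(self) -> bool: ...
    def reached_goal(self) -> bool: ...

def run_episode(agent: Agent, robot: Robot, max_steps: int = 10_000) -> int:
    """Steps 2-5 of the algorithm; returns the number of control steps taken."""
    state = robot.read_state()
    for step in range(1, max_steps + 1):
        du = agent.act(state)                          # step 2: DDPG correction
        ux, uy = robot.preview_output()
        robot.walk((ux + du[0], uy + du[1]))           # step 3: corrected output -> IK
        r = robot.reward()                             # step 4: immediate return
        next_state = robot.read_state()
        done = robot.fallen() or robot.reached_goal()  # step 5: fallen or at target
        agent.update(state, du, r, next_state, done)   # step 4: update the networks
        if done:
            return step
        state = next_state
    return max_steps

# Toy stand-ins, only to show the call pattern
class ZeroAgent:
    def __init__(self):
        self.updates = 0
    def act(self, state):
        return (0.0, 0.0)
    def update(self, state, action, reward, next_state, done):
        self.updates += 1

class ToyRobot:
    def __init__(self, goal_steps=3):
        self.t, self.goal_steps, self.last_u = 0, goal_steps, None
    def read_state(self):
        return [0.0] * 9
    def preview_output(self):
        return (0.1, 0.2)
    def walk(self, u):
        self.t, self.last_u = self.t + 1, u
    def reward(self):
        return 0.0
    def fallen(self):
        return False
    def reached_goal(self):
        return self.t >= self.goal_steps
```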
The walking results of the humanoid robot in the experiment are shown in Fig. 5.
The embodiment described above is only a preferred embodiment of the invention and does not limit its scope. Any change made according to the shape and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A preview control humanoid robot gait planning method based on deep reinforcement learning, characterized by comprising the following steps:
1) obtaining state information through sensors mounted on the humanoid robot;
2) improving an existing deep reinforcement learning network by defining new state and action vectors and a new reward function;
3) correcting the output of the preview controller with the defined action vector, calculating the angle of each servo in both legs of the humanoid robot, and guiding the robot's walking;
4) during walking, updating the deep reinforcement learning network with the values of the state, the action vector and the reward function.
2. The preview control humanoid robot gait planning method based on deep reinforcement learning according to claim 1, characterized in that: in step 1), state information is obtained through sensors mounted on the humanoid robot; the stability of the walking robot depends mainly on the pitch servos of the supporting leg, so the defined state information should include the supporting-leg information and the angles of the pitch servos on the supporting leg; acceleration and angular velocity values are also needed to judge how stable the walking process is, so that the offline gait can then be adjusted in real time to adapt to uneven terrain;
$$[\alpha, \omega, \theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}]$$
where $\alpha$ is the square root of the sum of squares of the robot's accelerations along the x- and y-axes; $\omega$ is the square root of the sum of squares of its x- and y-axis angular velocities; and $\theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}$ are the angles of the pitch servos at the left and right hip and ankle joints.
3. The preview control humanoid robot gait planning method based on deep reinforcement learning according to claim 1, characterized in that: in step 2), the improved deep reinforcement learning network uses the deep deterministic policy gradient (DDPG) method, specifically as follows:
2.1) Definition of the deep reinforcement learning variables
Deep reinforcement learning is used to compensate the control output of the preview controller; this first requires defining the relevant variables: the state vector, the action vector and the reward function;
The preview controller's output is two-dimensional, corresponding to the output values of the CoM coordinates along the x- and y-axes, so the action of the deep reinforcement learning network is defined as:
$$[\Delta\mu_x, \Delta\mu_y]$$
where $\Delta\mu_x$ and $\Delta\mu_y$ are the changes applied to the corresponding dimensions of the preview controller output;
The humanoid robot is expected to stay stable while walking as far as possible, so the reward function is defined as:
where the return is 50 if the humanoid robot reaches the end point and -50 if it falls during walking; in all other cases the return is computed from the robot's current state;
The square root of the sum of squares of the accelerations, $r_\alpha(t)$, is defined as:
where $\alpha_x(t)$ and $\alpha_y(t)$ are the robot's accelerations along the x- and y-axes at time t;
The square root of the sum of squares of the angular velocities, $r_\omega(t)$, is defined as:
where $\omega_x(t)$ and $\omega_y(t)$ are the robot's x- and y-axis angular velocities at time t;
X_dis is the distance walked by the humanoid robot;
2.2) Structure of the deep reinforcement learning network
Implementing DDPG requires building an Actor network and a Critic network for training; the Critic network parameterizes the action-value function, and the Actor network updates the policy function under the guidance of the values obtained from the Critic network; the Critic network is structured as follows:
Input layer: S(t), the state input to the Q function at time t in Q-learning, 9 dimensions in total;
Hidden layers: 2 layers; the first has 402 nodes, including 2 nodes that represent the action; the second has 300 nodes; every neuron uses the rectified linear unit (ReLU) activation function, whose output is computed as:
$$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: Q(t), the output value of the Q function, 1 dimension in total;
The Actor network is structured as follows:
Input layer: S(t), the state at time t, 9 dimensions in total;
Hidden layers: 2 layers; the first has 400 nodes and the second has 300 nodes; every neuron uses the ReLU activation function, whose output is computed as:
$$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: A(t), the output action value, 2 dimensions in total;
The Critic and Actor networks are updated with the backpropagation (BP) algorithm and gradient descent; the output weight $w_i$ of each neuron is updated as:
$$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$$
where $w_i$ is the i-th weight, $\eta$ is the learning rate, and E is the learning performance index of the two networks.
4. The preview control humanoid robot gait planning method based on deep reinforcement learning according to claim 1, characterized in that: in step 3), the correction output by the improved deep reinforcement learning network is applied to the preview controller output, and the angle of each servo in both legs of the humanoid robot is calculated from the corrected preview controller to guide the robot's walking; the key idea of traditional preview control is to use future information for control, where the future information is the target ZMP reference values within the next $N_p$ steps; if the current time is k, the poses of both feet within the next $N_p$ steps are computed with a three-dimensional walking pattern, giving the target ZMP reference values within $N_p$ steps: $ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}$; these future target ZMP reference values are stored in a FIFO buffer, whose output serves as the current reference value; the preview controller computes its control output from the ZMP reference values in the FIFO buffer and the state of the humanoid robot; the control output is:
where $u_k$ is the controller output at time k; C, $K_s$, $K_x$ and the preview gain are controller coefficients; the humanoid robot's CoM coordinates at time k form the state; and $[ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}]^T$ are the reference ZMPs from time k+1 to $k+N_p$;
The deep reinforcement learning network is trained to produce the correction $\Delta u_k$ to the preview control output;
$$u'_k = u_k + \Delta u_k$$
After the control input is obtained, the CoM coordinates at time k+1 are calculated;
The CoM coordinates $(x_{k+1}, y_{k+1})$ at time k+1 give the CoM pose and the left and right foot poses at time k+1:
where $G_{cobpresent}$, $G_{lpresent}$ and $G_{rpresent}$ are the poses of the CoM, left foot and right foot at time k+1; finally, inverse kinematics gives the angle of each leg joint servo at time k+1, and these angles guide the humanoid robot's walking.
CN201810465382.3A 2018-05-16 2018-05-16 Preset control humanoid robot gait planning method based on deep reinforcement learning Expired - Fee Related CN108549237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810465382.3A CN108549237B (en) 2018-05-16 2018-05-16 Preset control humanoid robot gait planning method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN108549237A true CN108549237A (en) 2018-09-18
CN108549237B CN108549237B (en) 2020-04-28

Family

ID=63495020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810465382.3A Expired - Fee Related CN108549237B (en) 2018-05-16 2018-05-16 Preset control humanoid robot gait planning method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN108549237B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1393866A1 (en) * 2001-06-07 2004-03-03 Japan Science and Technology Corporation Apparatus walking with two legs; walking control apparatus; and walking control method thereof
CN104217107A (en) * 2014-08-27 2014-12-17 华南理工大学 Method for detecting tumbling state of humanoid robot based on multi-sensor information
CN106094817A (en) * 2016-06-14 2016-11-09 华南理工大学 Intensified learning humanoid robot gait's planing method based on big data mode
CN106584460A (en) * 2016-12-16 2017-04-26 浙江大学 Vibration suppression method in walking of humanoid robot
CN106842925A (en) * 2017-01-20 2017-06-13 清华大学 A kind of locomotive smart steering method and system based on deeply study
CN107944476A (en) * 2017-11-10 2018-04-20 大连理工大学 A kind of yellow peach stoning machine device people's behaviour control method based on deeply study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
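Ma Qiongxiong et al.: "Optimal trajectory control of underwater robots based on deep reinforcement learning", Journal of South China Normal University (Natural Science Edition) *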

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109483530A (en) * 2018-10-18 2019-03-19 北京控制工程研究所 A kind of legged type robot motion control method and system based on deeply study
CN109719721B (en) * 2018-12-26 2020-07-24 北京化工大学 Adaptive gait autonomous emerging method of snake-like search and rescue robot
CN109719721A (en) * 2018-12-26 2019-05-07 北京化工大学 A kind of autonomous emergence of imitative snake search and rescue robot adaptability gait
CN109709967A (en) * 2019-01-22 2019-05-03 深圳市幻尔科技有限公司 The implementation method for the dynamic gait that the low operation of robot requires
CN109871892A (en) * 2019-02-18 2019-06-11 华南理工大学 A kind of robot vision cognitive system based on small sample metric learning
CN110238839A (en) * 2019-04-11 2019-09-17 清华大学 It is a kind of to optimize non-molding machine people multi peg-in-hole control method using environmental forecasting
CN110238839B (en) * 2019-04-11 2020-10-20 清华大学 Multi-shaft-hole assembly control method for optimizing non-model robot by utilizing environment prediction
CN110308727A (en) * 2019-07-12 2019-10-08 沈阳城市学院 A kind of control method for eliminating biped robot's upper body posture shaking
CN110562301A (en) * 2019-08-16 2019-12-13 北京交通大学 Subway train energy-saving driving curve calculation method based on Q learning
CN110496377A (en) * 2019-08-19 2019-11-26 华南理工大学 A kind of virtual table tennis forehand hit training method based on intensified learning
CN110496377B (en) * 2019-08-19 2020-07-28 华南理工大学 Virtual table tennis player ball hitting training method based on reinforcement learning
CN110764415A (en) * 2019-10-31 2020-02-07 清华大学深圳国际研究生院 Gait planning method for leg movement of quadruped robot
CN110764415B (en) * 2019-10-31 2022-04-15 清华大学深圳国际研究生院 Gait planning method for leg movement of quadruped robot
CN112782973A (en) * 2019-11-07 2021-05-11 四川省桑瑞光辉标识系统股份有限公司 Biped robot walking control method and system based on double-agent cooperative game
CN110764416A (en) * 2019-11-11 2020-02-07 河海大学 Humanoid robot gait optimization control method based on deep Q network
CN110909859A (en) * 2019-11-29 2020-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN111027143A (en) * 2019-12-18 2020-04-17 四川大学 Shipboard aircraft approach guiding method based on deep reinforcement learning
CN111191399A (en) * 2019-12-24 2020-05-22 北京航空航天大学 Control method, device and equipment of robot fish and storage medium
CN111191399B (en) * 2019-12-24 2021-11-05 北京航空航天大学 Control method, device and equipment of robot fish and storage medium
CN111142378A (en) * 2020-01-07 2020-05-12 四川省桑瑞光辉标识系统股份有限公司 Neural network optimization method of biped robot neural network controller
CN111360834A (en) * 2020-03-25 2020-07-03 中南大学 Humanoid robot motion control method and system based on deep reinforcement learning
CN113627584B (en) * 2020-05-08 2024-04-12 南京大学 Mechanical arm inverse kinematics solving method based on neural network, electronic equipment and storage medium
CN113627584A (en) * 2020-05-08 2021-11-09 南京大学 Neural network-based inverse kinematics solving method for mechanical arm, electronic equipment and storage medium
CN112162554B (en) * 2020-09-23 2021-10-01 吉林大学 Data storage and backtracking platform for N3 sweeper
CN112162554A (en) * 2020-09-23 2021-01-01 吉林大学 Data storage and backtracking platform for N3 sweeper
CN112666939A (en) * 2020-12-09 2021-04-16 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN112666939B (en) * 2020-12-09 2021-09-10 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN113031528B (en) * 2021-02-25 2022-03-15 电子科技大学 Multi-legged robot non-structural ground motion control method based on depth certainty strategy gradient
CN113031528A (en) * 2021-02-25 2021-06-25 电子科技大学 Multi-legged robot motion control method based on depth certainty strategy gradient
CN113156892A (en) * 2021-04-16 2021-07-23 西湖大学 Four-footed robot simulated motion control method based on deep reinforcement learning
CN117565023A (en) * 2022-12-30 2024-02-20 爱布(上海)人工智能科技有限公司 Muscle movement sensing system for grasping walking intention and implementation method thereof
CN117565023B (en) * 2022-12-30 2024-05-17 爱布(上海)人工智能科技有限公司 Muscle movement sensing system for grasping walking intention and implementation method thereof
CN117062280A (en) * 2023-08-17 2023-11-14 北京美中爱瑞肿瘤医院有限责任公司 Automatic following system of neurosurgery self-service operating lamp
CN117062280B (en) * 2023-08-17 2024-03-08 北京美中爱瑞肿瘤医院有限责任公司 Automatic following system of neurosurgery self-service operating lamp

Also Published As

Publication number Publication date
CN108549237B (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN108549237A (en) Preview based on depth enhancing study controls humanoid robot gait's planing method
CN104932264B (en) The apery robot stabilized control method of Q learning frameworks based on RBF networks
CN109483530B (en) Foot type robot motion control method and system based on deep reinforcement learning
Miura et al. Human-like walking with toe supporting for humanoids
CN109991979B (en) Lower limb robot anthropomorphic gait planning method oriented to complex environment
Yi et al. Online learning of a full body push recovery controller for omnidirectional walking
US8428780B2 (en) External force target generating device of legged mobile robot
US8442680B2 (en) Motion state evaluation apparatus of legged mobile robot
WO2019218805A1 (en) Motion closed-loop control method for quadruped robot
CN109760761B (en) Four-footed robot motion control method based on bionics principle and intuition
CN104898672B (en) A kind of optimal control method of Humanoid Robot Based on Walking track
US8396593B2 (en) Gait generating device of legged mobile robot
CN106094817B (en) Intensified learning humanoid robot gait's planing method based on big data mode
JP6781101B2 (en) Non-linear system control method, biped robot control device, biped robot control method and its program
CN103750927B (en) Artificial leg knee joint adaptive iterative learning control method
CN106019950A (en) Mobile phone satellite self-adaptive attitude control method
CN114397810A (en) Four-legged robot motion control method based on adaptive virtual model control
US20110213498A1 (en) Desired motion evaluation apparatus of legged mobile robot
CN113568422B (en) Four-foot robot control method based on model predictive control optimization reinforcement learning
Dong et al. On-line gait adjustment for humanoid robot robust walking based on divergence component of motion
CN109857146B (en) Layered unmanned aerial vehicle tracking control method based on feedforward and weight distribution
CN116237943A (en) Four-foot robot control method combined with terrain constraint
CN104793621A (en) Muscular viscoelastic behavior imitated humanoid robot walking stability control method
Xie et al. Online whole-stage gait planning method for biped robots based on improved Variable Spring-Loaded Inverted Pendulum with Finite-sized Foot (VSLIP-FF) model
Kim et al. A model predictive capture point control framework for robust humanoid balancing via ankle, hip, and stepping strategies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200428