CN108549237A - Preview-control humanoid robot gait planning method based on deep reinforcement learning - Google Patents
- Publication number
- CN108549237A (application CN201810465382.3A)
- Authority
- CN
- China
- Prior art keywords
- output
- walking
- robot
- moment
- preview
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Manipulator (AREA)
Abstract
The invention discloses a preview-control gait planning method for humanoid robots based on deep reinforcement learning, comprising the steps of: 1) obtaining state information from sensors mounted on the humanoid robot; 2) improving an existing deep reinforcement learning network by defining a new state, action vector, and reward function; 3) correcting the output of the preview controller with the defined action vector and computing the angle of each servo in the robot's two legs to guide its walking; 4) while the robot walks, updating the improved deep reinforcement learning network with the values of the state, action vector, and reward function. The method can effectively address humanoid walking in complex environments, and tests on a simulation platform and on a physical robot verified its effectiveness.
Description
Technical field
The present invention relates to the technical field of humanoid robots, and in particular to a preview-control humanoid robot gait planning method based on deep reinforcement learning.
Background
A basic function of a humanoid robot is stable walking. However, because the structure of a humanoid robot is complex, strongly coupled, and poorly modular, stable walking is difficult to achieve. Gait control and planning for humanoid robots has therefore become a research focus in the field. Traditional gait control methods fall roughly into two classes: methods based on modern control theory and methods based on the walking mechanism. Most of these methods are dated and do not suit today's more complex models. Meanwhile, the steady stream of new machine learning methods has spurred the development of dynamic gait control. Compared with traditional control theory, machine-learning-based methods do not need extensive prior knowledge of the complex model, are easy to implement, and can reach a level comparable to traditional control theory.
Deep reinforcement learning has proven effective on complex control problems. Learning compensates for the system designer's incomplete understanding of the system dynamics, and such methods may provide solutions beyond the designer's own knowledge. They can also keep learning and improving, continually adapting to complex environments.
Summary of the invention
The present invention addresses gait planning for a humanoid robot walking on complex ground. Because existing control theory cannot effectively solve walking in complex environments, the invention proposes a preview-control humanoid robot gait planning method based on deep reinforcement learning. The method can effectively address humanoid walking in complex environments, and tests on a simulation platform and on a physical robot verified its effectiveness.
To achieve this object, the technical solution provided by the invention is a preview-control humanoid robot gait planning method based on deep reinforcement learning, comprising the following steps:
1) obtaining state information from sensors mounted on the humanoid robot;
2) improving an existing deep reinforcement learning network by defining a new state, action vector, and reward function;
3) correcting the output of the preview controller with the defined action vector, and computing the angle of each servo in the robot's two legs to guide its walking;
4) while the robot walks, updating the improved deep reinforcement learning network with the values of the state, action vector, and reward function.
In step 1), state information is obtained from sensors mounted on the humanoid robot. The stability of a walking humanoid robot is affected mainly by the pitch servos of the supporting leg, so the state information should include the supporting-leg information and the angles of the pitch servos in the supporting leg. The acceleration and angular velocity are also needed to judge how stable the walking process is, so that the offline gait can be adjusted in real time to adapt to uneven terrain. The state is:
$[\alpha, \omega, \theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}]$
where $\alpha$ is the square root of the sum of squares of the robot's accelerations in the x-axis and y-axis directions; $\omega$ is the square root of the sum of squares of its angular velocities in the x-axis and y-axis directions; and $\theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}$ are the angles of the pitch servos at the hip and ankle joints of the left and right legs.
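The state components listed above can be assembled as in the following minimal sketch. The function name `build_state` and the tuple-based sensor readings are assumptions made for illustration; the patent does not specify a sensor interface, and only the six components named in the text are included here.

```python
import math

def build_state(acc_xy, gyro_xy, hip_pitch_lr, ankle_pitch_lr):
    """Assemble [alpha, omega, theta_lhip, theta_rhip, theta_lankle, theta_rankle].

    All argument names are hypothetical; each is a pair of floats.
    """
    ax, ay = acc_xy
    wx, wy = gyro_xy
    alpha = math.hypot(ax, ay)   # sqrt(ax^2 + ay^2)
    omega = math.hypot(wx, wy)   # sqrt(wx^2 + wy^2)
    theta_lhip, theta_rhip = hip_pitch_lr
    theta_lankle, theta_rankle = ankle_pitch_lr
    return [alpha, omega, theta_lhip, theta_rhip, theta_lankle, theta_rankle]

# Example readings (made-up values, not measured data).
state = build_state((3.0, 4.0), (0.6, 0.8), (0.10, -0.10), (0.05, -0.05))
```

Combining the x and y readings into a single magnitude keeps the state compact while still reflecting how strongly the body is being disturbed.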
In step 2), the improved deep reinforcement learning network uses the deep deterministic policy gradient method (DDPG), as follows:
2.1) Definition of the deep reinforcement learning variables
Deep reinforcement learning is used to compensate the control output of the preview controller. To use it, the relevant variables must first be defined: the state vector, the action vector, and the reward function.
The control output of the preview controller is a two-dimensional vector whose components are the output values for the center-of-mass coordinates in the x-axis and y-axis directions. The action of the deep reinforcement learning network is therefore defined as:
$[\Delta\mu_x, \Delta\mu_y]$
where $\Delta\mu_x$ and $\Delta\mu_y$ are the changes applied to the corresponding output dimensions of the preview controller.
The robot is expected to remain stable while walking as far as possible, and the reward function is defined with this in mind. If the robot walks to the end point, the reward is 50; if it falls while walking, the reward is −50; in all other cases the reward depends on the robot's current state through the following quantities.
$r_\alpha(t)$, the sum of squares of the acceleration, is taken over $\alpha_x(t)$ and $\alpha_y(t)$, the robot's accelerations in the x-axis and y-axis directions at time t.
$r_\omega(t)$, the sum of squares of the angular velocity, is taken over $\omega_x(t)$ and $\omega_y(t)$, the robot's angular velocities in the x-axis and y-axis directions at time t.
x_dis is the distance the humanoid robot has walked.
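The reward described above might be sketched as follows. The terminal values 50 and −50 come from the text. The formula for the intermediate case is not recoverable from this text, so the combination used here (distance walked minus weighted acceleration and angular-velocity terms) and the weights `w_acc`, `w_gyro`, `w_dis` are assumptions for illustration only, not the patent's definition.

```python
def reward(reached_goal, fell_down, acc_xy, gyro_xy, x_dis,
           w_acc=1.0, w_gyro=1.0, w_dis=1.0):
    """Piecewise reward: +50 at the end point, -50 on a fall, shaped otherwise."""
    if reached_goal:
        return 50.0
    if fell_down:
        return -50.0
    # Sums of squares of acceleration and angular velocity, as named in the text.
    r_alpha = acc_xy[0] ** 2 + acc_xy[1] ** 2
    r_omega = gyro_xy[0] ** 2 + gyro_xy[1] ** 2
    # ASSUMPTION: reward distance, penalize shaking. The real formula may differ.
    return w_dis * x_dis - w_acc * r_alpha - w_gyro * r_omega

# Made-up example: 10 units walked with moderate body motion.
r = reward(False, False, (1.0, 2.0), (0.0, 1.0), 10.0)
```

Whatever the exact intermediate formula, the design intent stated in the text is the same: walking farther should pay off only while the body stays steady.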
2.2) Structure of the deep reinforcement learning network
Implementing DDPG requires building an Actor network and a Critic network for training. The Critic network parameterizes the action-value function; the Actor network guides the update of the policy function using the values obtained from the Critic network. The structure of the Critic network is:
Input layer: S(t), the state input to the Q function at time t, 9 dimensions in total;
Hidden layers: two layers. The first has 402 nodes, including 2 nodes that represent the action; the second has 300 nodes. Every neuron uses the rectified linear (ReLU) activation function, whose output is:
$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: Q(t), the output value, 1 dimension in total.
The structure of the Actor network is:
Input layer: S(t), the state input at time t, 9 dimensions in total;
Hidden layers: two layers. The first has 400 nodes and the second has 300 nodes. Every neuron uses the rectified linear (ReLU) activation function, whose output is:
$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: A(t), the output action value, 2 dimensions in total.
The Critic and Actor networks are updated with the backpropagation (BP) algorithm and gradient descent. Each neuron's output weight $w_i$ is updated as:
$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$
where $w_i$ is the i-th weight, $\eta$ is the learning rate, and E is the learning performance index of the two networks.
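The layer sizes above can be illustrated with a small forward-pass sketch in plain Python. It assumes that the Critic's "402 nodes" mean 400 state-driven units with the 2 action values appended, which is one common DDPG layout but is an interpretation, not something the text states outright. The random initialization, the linear output layers, and all helper names are likewise assumptions.

```python
import random

def relu(v):
    """Rectified linear activation: max(t, 0) for each element."""
    return [max(x, 0.0) for x in v]

def dense(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / n_in ** 0.5
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def forward_dense(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

STATE_DIM, ACTION_DIM = 9, 2
rng = random.Random(0)

# Actor: 9 -> 400 -> 300 -> 2
actor = [dense(STATE_DIM, 400, rng), dense(400, 300, rng), dense(300, ACTION_DIM, rng)]
# Critic: 9 -> 400, append the 2 action values (402 nodes) -> 300 -> 1
critic = [dense(STATE_DIM, 400, rng), dense(400 + ACTION_DIM, 300, rng), dense(300, 1, rng)]

def actor_forward(s):
    h = relu(forward_dense(actor[0], s))
    h = relu(forward_dense(actor[1], h))
    return forward_dense(actor[2], h)          # output activation is not specified in the text

def critic_forward(s, a):
    h = relu(forward_dense(critic[0], s)) + list(a)   # 400 + 2 = 402 nodes
    h = relu(forward_dense(critic[1], h))
    return forward_dense(critic[2], h)[0]      # scalar Q value

s = [0.1] * STATE_DIM
a = actor_forward(s)
q = critic_forward(s, a)
```

In practice a deep learning framework would replace these helpers; the pure-Python version is only meant to make the dimensions concrete.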
In step 3), the correction produced by the improved deep reinforcement learning network is applied to the output of the preview controller, and the angles of the servos in both legs are computed from the corrected output to guide the robot's walking. The key idea of conventional preview control is to use future information for control. Here the future information is the target ZMP reference values for the next $N_p$ steps. With the current time step denoted k, the foot poses for the next $N_p$ steps are computed from the three-dimensional walking pattern, giving the target ZMP reference values $ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}$. These future target ZMP references are stored in a FIFO buffer, whose output value serves as the current reference. The preview controller computes the control output from the ZMP references in the FIFO buffer and the state of the robot. The control output $u_k$ at time k is computed from the controller coefficients (C, $K_s$, $K_x$, and one further gain), the robot's center-of-mass coordinate at time k, and the reference ZMP vector $[ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}]^T$ from time k+1 to time $k+N_p$.
The deep reinforcement learning network is trained to produce the correction $\Delta u_k$ to the preview control output:
$u'_k = u_k + \Delta u_k$
Once the control input is obtained, the center-of-mass coordinate at time k+1 is computed.
From the center-of-mass coordinate $(x_{k+1}, y_{k+1})$ at time k+1, the center-of-mass pose $G_{cobpresent}$ and the left and right foot poses $G_{lpresent}$ and $G_{rpresent}$ at time k+1 are obtained. Finally, the servo angles of both legs are computed by inverse kinematics, giving the angle of each joint servo at time k+1, which guides the robot's walking.
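The FIFO handling of the ZMP references and the correction $u'_k = u_k + \Delta u_k$ can be sketched for a single axis as follows. The control law in `preview_output` is a hypothetical linear stand-in, since the patent's own formula is not recoverable from this text; the horizon `N_P`, the gains, and all numeric values are made up for illustration.

```python
from collections import deque

N_P = 4  # hypothetical preview horizon (number of future steps)

def preview_output(com_state, zmp_refs, k_x, gains):
    """Stand-in preview law: state feedback plus weighted future ZMP references."""
    return -k_x * com_state + sum(g * z for g, z in zip(gains, zmp_refs))

# Planned ZMP references along one axis (made-up values).
zmp_plan = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10]

# The FIFO buffer holds the next N_P references.
fifo = deque(zmp_plan[:N_P], maxlen=N_P)

u_k = preview_output(com_state=0.01, zmp_refs=fifo, k_x=2.0,
                     gains=[0.4, 0.3, 0.2, 0.1])
delta_u_k = 0.005                 # would come from the DDPG Actor network
u_corrected = u_k + delta_u_k     # u'_k = u_k + delta u_k

# Advance one time step: the oldest reference leaves, the next one enters.
fifo.append(zmp_plan[N_P])
```

A bounded `deque` matches the first-in, first-out behaviour described in the text, because appending a new reference automatically discards the oldest one.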
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The method applies deep reinforcement learning on top of existing preview control theory, which speeds up convergence.
2. The method is simple and practical. It controls the humanoid robot's walking online and adjusts its gait in time, helping the robot walk stably on uneven ground, which gives it practical significance and application value.
Description of the drawings
Fig. 1 shows the structure of the Critic network.
Fig. 2 shows the structure of the Actor network.
Fig. 3 is the preview control flowchart.
Fig. 4 is the flowchart of preview control based on deep reinforcement learning.
Fig. 5 shows the results of the walking experiment.
Detailed description of the embodiments
The present invention is further explained below with reference to a specific embodiment.
The preview-control humanoid robot gait planning method based on deep reinforcement learning provided by this embodiment is as follows:
1) Acquiring the humanoid robot's state
State information is obtained from sensors mounted on the humanoid robot. The stability of a walking humanoid robot is affected mainly by the pitch servos of the supporting leg, so the state information should include the supporting-leg information and the angles of the pitch servos in the supporting leg. The acceleration and angular velocity are also needed to judge how stable the walking process is. The offline gait is then adjusted in real time to adapt to uneven terrain. The state is:
$[\alpha, \omega, \theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}]$
where $\alpha$ is the square root of the sum of squares of the robot's accelerations in the x-axis and y-axis directions; $\omega$ is the square root of the sum of squares of its angular velocities in the x-axis and y-axis directions; and $\theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}$ are the angles of the pitch servos at the hip and ankle joints of the left and right legs.
2.1) Definition of the deep reinforcement learning variables
Walking pattern generation based on the preview controller cannot guarantee the stability of motions that are hard to describe with its simple model. Complex motions, such as large swings of the upper body or arm swinging, cause a large discrepancy between the reference ZMP and the actual ZMP. Deep reinforcement learning is therefore needed to compensate the control output of the preview controller. The method used in this embodiment is the deep deterministic policy gradient (DDPG). Its advantage is that it outputs continuous values, and it performs better in complex scenes than comparable methods.
To use deep reinforcement learning, the relevant variables must first be defined: the state vector, the action vector, and the reward function. The state has already been described in step 1) and is not repeated here.
The control output of the preview controller is a two-dimensional vector whose components are the output values for the center-of-mass coordinates in the x-axis and y-axis directions. The action of the deep reinforcement learning network is therefore defined as:
$[\Delta\mu_x, \Delta\mu_y]$
where $\Delta\mu_x$ and $\Delta\mu_y$ are the changes applied to the corresponding output dimensions of the preview controller.
The robot is expected to remain stable while walking as far as possible, and the reward function is defined with this in mind. If the robot walks to the end point, the reward is 50; if it falls while walking, the reward is −50; in all other cases the reward depends on the robot's current state through the following quantities.
$r_\alpha(t)$, the sum of squares of the acceleration, is taken over $\alpha_x(t)$ and $\alpha_y(t)$, the robot's accelerations in the x-axis and y-axis directions at time t.
$r_\omega(t)$, the sum of squares of the angular velocity, is taken over $\omega_x(t)$ and $\omega_y(t)$, the robot's angular velocities in the x-axis and y-axis directions at time t.
x_dis is the distance the humanoid robot has walked.
2.2) Structure of the deep reinforcement learning network
Implementing DDPG requires building an Actor network and a Critic network for training. The Critic network parameterizes the action-value function; the Actor network guides the update of the policy function using the values obtained from the Critic network. As shown in Fig. 1, the structure of the Critic network is:
Input layer: S(t), the state input to the Q function at time t, 9 dimensions in total;
Hidden layers: two layers. The first has 402 nodes, including 2 nodes that represent the action; the second has 300 nodes. Every neuron uses the rectified linear (ReLU) activation function, whose output is:
$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t.
Output layer: Q(t), the output value, 1 dimension in total.
As shown in Fig. 2, the structure of the Actor network is:
Input layer: S(t), the state input at time t, 9 dimensions in total;
Hidden layers: two layers. The first has 400 nodes and the second has 300 nodes. Every neuron uses the rectified linear (ReLU) activation function, whose output is:
$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t.
Output layer: A(t), the output action value, 2 dimensions in total.
The Critic and Actor networks are updated with the backpropagation (BP) algorithm and gradient descent. Each neuron's output weight $w_i$ is updated as:
$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$
where $w_i$ is the i-th weight, $\eta$ is the learning rate, and E is the learning performance index of the two networks.
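As a worked illustration of the gradient descent weight update, the sketch below applies the standard rule (each weight moves against its gradient, scaled by the learning rate) to a toy quadratic performance index. The index E, the target values, and the learning rate are all made up; in the actual method the gradients would come from backpropagation through the Actor and Critic networks.

```python
def gradient_step(weights, grads, lr):
    """One gradient descent update: w_i <- w_i - lr * dE/dw_i."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Toy performance index E(w) = sum_i (w_i - target_i)^2, so dE/dw_i = 2 (w_i - target_i).
target = [1.0, -2.0]
w = [0.0, 0.0]
for _ in range(100):
    grads = [2.0 * (wi - ti) for wi, ti in zip(w, target)]
    w = gradient_step(w, grads, lr=0.1)
```

With this learning rate each step shrinks the remaining error by a constant factor, so the weights settle close to the target values after the loop.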
3) The correction produced by the improved deep reinforcement learning network is applied to the output of the preview controller, and the angles of the servos in both legs are computed from the corrected output to guide the robot's walking.
The key idea of conventional preview control is to use future information for control. In this embodiment, the future information is the target ZMP reference values for the next $N_p$ steps. With the current time step denoted k, the target ZMP reference values for the next $N_p$ steps are $(ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p})$. These future target ZMP references are stored in a FIFO (first in, first out) buffer, whose output value serves as the current reference. The preview controller computes the control output from the ZMP references in the FIFO buffer and the state of the robot. The control output $u_k$ at time k is computed from the controller coefficients (C, $K_s$, $K_x$, and one further gain), the robot's center-of-mass coordinate at time k, and the reference ZMP vector $[ZMP^*_{k+1}, \ldots, ZMP^*_{k+N_p}]^T$ from time k+1 to time $k+N_p$.
The deep reinforcement learning network is trained to produce the correction $\Delta u_k$, which gives the corrected preview control output $u'_k$:
$u'_k = u_k + \Delta u_k$
Once the control input is obtained, the center-of-mass coordinate at time k+1 can be computed.
From the center-of-mass coordinate $(x_{k+1}, y_{k+1})$ at time k+1, the center-of-mass pose $G_{cobpresent}$ and the left and right foot poses $G_{lpresent}$ and $G_{rpresent}$ at time k+1 are obtained. The servo angles of both legs are then computed by inverse kinematics, giving the angle of each joint servo at time k+1, which guides the robot's walking. The detailed procedure is shown in Fig. 3.
While the humanoid robot walks, for each output $(u_x, u_y)$ of the preview controller, the current state is computed, the DDPG deep reinforcement learning network produces a correction for that output, and the network is updated. At the same time, the corrected output of the preview controller is used to compute the robot's walking posture. In summary, the algorithm proceeds as follows, as shown in Fig. 4:
1. Initialize the DDPG deep reinforcement learning framework and the preview controller;
2. Obtain the current state from the sensor information, and use DDPG to compute a correction for the preview controller;
3. Add the correction to the output of the preview controller, and use the resulting value together with inverse kinematics to guide the robot's walking;
4. Obtain the immediate reward from the system and update the deep reinforcement learning framework;
5. Check the robot's current state. If the robot has fallen or has reached the target, end the loop; otherwise go to step 2.
The robot's walking in the experiment is shown in Fig. 5.
The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of practice. Any change made according to the shape and principle of the present invention shall fall within its scope of protection.
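The five steps above map onto a simple control loop, sketched below. `StubAgent` and `StubRobot` are hypothetical placeholders invented for this example: a real implementation would substitute a trained DDPG agent, the preview controller, inverse kinematics, and the robot or simulator interface.

```python
import random

class StubAgent:
    """Hypothetical stand-in for a DDPG agent."""
    def act(self, state):
        # Returns a correction (delta u_x, delta u_y); random here for illustration.
        return (random.uniform(-0.01, 0.01), random.uniform(-0.01, 0.01))

    def update(self, state, action, reward, next_state, done):
        pass  # a real agent would store the transition and train its networks

class StubRobot:
    """Hypothetical robot that reaches the target after a fixed number of steps."""
    def __init__(self, steps_to_goal=5):
        self.steps_left = steps_to_goal

    def read_state(self):
        return [0.0] * 6

    def preview_output(self):
        return (0.0, 0.0)

    def walk(self, u):
        # A real robot would run inverse kinematics on u and drive its servos.
        self.steps_left -= 1
        reached = self.steps_left == 0
        return (50.0 if reached else 0.0), False, reached

def run_episode(agent, robot, max_steps=1000):
    """Steps 2 to 5 of the procedure; step 1 is constructing agent and robot."""
    steps = 0
    for _ in range(max_steps):
        state = robot.read_state()                       # step 2: current state
        du = agent.act(state)                            # step 2: correction
        u = robot.preview_output()
        u_corrected = (u[0] + du[0], u[1] + du[1])       # step 3: apply correction
        reward, fell, reached = robot.walk(u_corrected)  # step 3: walk
        agent.update(state, du, reward, robot.read_state(), fell or reached)  # step 4
        steps += 1
        if fell or reached:                              # step 5: stop condition
            break
    return steps

steps_taken = run_episode(StubAgent(), StubRobot(steps_to_goal=3))
```

Keeping the agent and robot behind small interfaces like these makes it straightforward to run the same loop against a simulator first and a physical robot later, which mirrors the two test settings the patent mentions.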
Claims (4)
1. A preview-control humanoid robot gait planning method based on deep reinforcement learning, characterized by comprising the following steps:
1) obtaining state information from sensors mounted on the humanoid robot;
2) improving an existing deep reinforcement learning network by defining a new state, action vector, and reward function;
3) correcting the output of the preview controller with the defined action vector, and computing the angle of each servo in the robot's two legs to guide its walking;
4) while the robot walks, updating the improved deep reinforcement learning network with the values of the state, action vector, and reward function.
2. The preview-control humanoid robot gait planning method based on deep reinforcement learning according to claim 1, characterized in that: in step 1), state information is obtained from sensors mounted on the humanoid robot; the stability of a walking humanoid robot is affected mainly by the pitch servos of the supporting leg, so the defined state information should include the supporting-leg information and the angles of the pitch servos in the supporting leg; the acceleration and angular velocity are also needed to judge how stable the walking process is, so that the offline gait can be adjusted in real time to adapt to uneven terrain; the state is:
$[\alpha, \omega, \theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}]$
where $\alpha$ is the square root of the sum of squares of the robot's accelerations in the x-axis and y-axis directions; $\omega$ is the square root of the sum of squares of its angular velocities in the x-axis and y-axis directions; and $\theta_{lhip}, \theta_{rhip}, \theta_{lankle}, \theta_{rankle}$ are the angles of the pitch servos at the hip and ankle joints of the left and right legs.
3. The preview-control humanoid robot gait planning method based on deep reinforcement learning according to claim 1, characterized in that: in step 2), the improved deep reinforcement learning network uses the deep deterministic policy gradient method (DDPG), as follows:
2.1) Definition of the deep reinforcement learning variables
Deep reinforcement learning is used to compensate the control output of the preview controller; to use it, the relevant variables must first be defined: the state vector, the action vector, and the reward function;
the control output of the preview controller is a two-dimensional vector whose components are the output values for the center-of-mass coordinates in the x-axis and y-axis directions, so the action of the deep reinforcement learning network is defined as:
$[\Delta\mu_x, \Delta\mu_y]$
where $\Delta\mu_x$ and $\Delta\mu_y$ are the changes applied to the corresponding output dimensions of the preview controller;
the robot is expected to remain stable while walking as far as possible, and the reward function is defined with this in mind: if the robot walks to the end point, the reward is 50; if it falls while walking, the reward is −50; in all other cases the reward depends on the robot's current state through the following quantities;
$r_\alpha(t)$, the square root of the sum of squares of the acceleration, is taken over $\alpha_x(t)$ and $\alpha_y(t)$, the robot's accelerations in the x-axis and y-axis directions at time t;
$r_\omega(t)$, the square root of the sum of squares of the angular velocity, is taken over $\omega_x(t)$ and $\omega_y(t)$, the robot's angular velocities in the x-axis and y-axis directions at time t;
x_dis is the distance the humanoid robot has walked;
2.2) Structure of the deep reinforcement learning network
Implementing DDPG requires building an Actor network and a Critic network for training; the Critic network parameterizes the action-value function; the Actor network guides the update of the policy function using the values obtained from the Critic network; the structure of the Critic network is:
Input layer: S(t), the state input to the Q function at time t, 9 dimensions in total;
Hidden layers: two layers; the first has 402 nodes, including 2 nodes that represent the action; the second has 300 nodes; every neuron uses the rectified linear (ReLU) activation function, whose output is:
$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: Q(t), the output value, 1 dimension in total;
the structure of the Actor network is:
Input layer: S(t), the state input at time t, 9 dimensions in total;
Hidden layers: two layers; the first has 400 nodes and the second has 300 nodes; every neuron uses the rectified linear (ReLU) activation function, whose output is:
$y_i(t) = \max(t, 0), \quad i = 1, 2, \ldots, n$
that is, the output $y_i(t)$ of the i-th neuron is the larger of 0 and t;
Output layer: A(t), the output action value, 2 dimensions in total;
the Critic and Actor networks are updated with the backpropagation (BP) algorithm and gradient descent; each neuron's output weight $w_i$ is updated as:
$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$
where $w_i$ is the i-th weight, $\eta$ is the learning rate, and E is the learning performance index of the two networks.
4. the preview according to claim 1 based on depth enhancing study controls humanoid robot gait's planing method,
It is characterized in that:In step 3), the correction amount exported to preview controller using improved deeply learning network is repaiied
Just, on the basis of revised preview controller, the angle of each steering engine of anthropomorphic robot both legs, guidance machine people are calculated
Walking;Wherein, the theoretical emphasis of traditional preview controller is exactly to be controlled using following information, and Future Information refers to future
NpTarget ZMP reference values within step, if current point in time is k, then future NpDouble-legged pose within step passes through three-dimensional walking
Mode computation obtains, and then obtains NpTarget ZMP reference values within step:ZMP* k+1,…,ZMP* k+Np;Then in these futures
Target ZMP reference values are stored in fifo buffer, and output valve is buffered as current reference value, preview controller with FIFO
The state computation control output of ZMP reference values and anthropomorphic robot in device, the formula for controlling output are:
Wherein, ukIt is exported for k moment controllers;C, Ks, Kx,Device coefficient in order to control;For the k moment
Anthropomorphic robot center-of-mass coordinate, [ZMP* k+1,…,ZMP* k+Np]TFor the k+1 moment to k+NpReference ZMP;
The correction $\Delta u_k$ to the preview control output is obtained by training the deep reinforcement learning network:
$$u'_k = u_k + \Delta u_k$$
After the control input is obtained, the center-of-mass coordinate at time $k+1$ is calculated;
Using the center-of-mass coordinate $(x_{k+1}, y_{k+1})$ at time $k+1$, the center-of-mass pose and the left and right foot poses at time $k+1$ are obtained, where $G_{cobpresent}$, $G_{lpresent}$ and $G_{rpresent}$ are the poses of the center of mass, the left foot and the right foot at time $k+1$; finally, according to the principles of inverse kinematics, the servo angles of the humanoid robot's two legs are calculated, giving the servo angle of each leg joint at time $k+1$, which is used to guide the humanoid robot's walking.
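One control cycle of the scheme in this claim can be sketched as follows. This is a hedged illustration under several assumptions that are not stated in the claim text: a one-dimensional cart-table (linear inverted pendulum) model with state (position, velocity, acceleration) and jerk input, a control law of the common integral-plus-state-feedback-plus-preview form, placeholder gain values, and a stub `correction` function standing in for the trained network's output $\Delta u_k$. All function names and numbers are hypothetical.

```python
from collections import deque


def cart_table_step(state, u, dt):
    """Advance (pos, vel, acc) by one step under jerk input u (assumed model)."""
    x, v, a = state
    return (x + v * dt + a * dt * dt / 2 + u * dt ** 3 / 6,
            v + a * dt + u * dt * dt / 2,
            a + u * dt)


def zmp_of(state, com_height, g=9.81):
    """ZMP of the cart-table model: p = x - (z_c / g) * acc."""
    x, _, a = state
    return x - com_height / g * a


def preview_output(err_sum, state, zmp_refs, Ks, Kx, G):
    """Assumed control law: integral term + state feedback + preview term."""
    feedback = sum(k * s for k, s in zip(Kx, state))
    preview = sum(g_j * r for g_j, r in zip(G, zmp_refs))
    return -Ks * err_sum - feedback - preview


def correction(state):
    """Stub for the network's correction delta_u_k (always 0 in this sketch)."""
    return 0.0


N_P = 3
# FIFO buffer holding ZMP*_{k+1}, ..., ZMP*_{k+Np}; the oldest entry is fifo[0].
fifo = deque([0.00, 0.02, 0.04], maxlen=N_P)
state = (0.0, 0.0, 0.0)
err_sum = 0.0
Ks, Kx, G = 1.0, (1.0, 0.5, 0.1), (-0.5, -0.3, -0.2)  # placeholder gains

err_sum += zmp_of(state, com_height=0.3) - fifo[0]   # accumulated ZMP error
u = preview_output(err_sum, state, fifo, Ks, Kx, G)  # u_k
u_corrected = u + correction(state)                  # u'_k = u_k + delta_u_k
state = cart_table_step(state, u_corrected, dt=0.01) # CoM state at k+1
fifo.append(0.06)  # the newest reference pushes out the oldest one
```

A real controller would obtain the gains by solving the associated optimal-control problem and would run one such loop per horizontal axis; the resulting center-of-mass coordinates would then feed the pose computation and inverse kinematics described in the claim.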
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810465382.3A CN108549237B (en) | 2018-05-16 | 2018-05-16 | Preset control humanoid robot gait planning method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810465382.3A CN108549237B (en) | 2018-05-16 | 2018-05-16 | Preset control humanoid robot gait planning method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108549237A (en) | 2018-09-18 |
CN108549237B (en) | 2020-04-28 |
Family
ID=63495020
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810465382.3A Expired - Fee Related CN108549237B (en) | 2018-05-16 | 2018-05-16 | Preset control humanoid robot gait planning method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108549237B (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109483530A (en) * | 2018-10-18 | 2019-03-19 | 北京控制工程研究所 | A kind of legged type robot motion control method and system based on deeply study |
CN109709967A (en) * | 2019-01-22 | 2019-05-03 | 深圳市幻尔科技有限公司 | The implementation method for the dynamic gait that the low operation of robot requires |
CN109719721A (en) * | 2018-12-26 | 2019-05-07 | 北京化工大学 | A kind of autonomous emergence of imitative snake search and rescue robot adaptability gait |
CN109871892A (en) * | 2019-02-18 | 2019-06-11 | 华南理工大学 | A kind of robot vision cognitive system based on small sample metric learning |
CN110238839A (en) * | 2019-04-11 | 2019-09-17 | 清华大学 | It is a kind of to optimize non-molding machine people multi peg-in-hole control method using environmental forecasting |
CN110308727A (en) * | 2019-07-12 | 2019-10-08 | 沈阳城市学院 | A kind of control method for eliminating biped robot's upper body posture shaking |
CN110496377A (en) * | 2019-08-19 | 2019-11-26 | 华南理工大学 | A kind of virtual table tennis forehand hit training method based on intensified learning |
CN110562301A (en) * | 2019-08-16 | 2019-12-13 | 北京交通大学 | Subway train energy-saving driving curve calculation method based on Q learning |
CN110764415A (en) * | 2019-10-31 | 2020-02-07 | 清华大学深圳国际研究生院 | Gait planning method for leg movement of quadruped robot |
CN110764416A (en) * | 2019-11-11 | 2020-02-07 | 河海大学 | Humanoid robot gait optimization control method based on deep Q network |
CN110909859A (en) * | 2019-11-29 | 2020-03-24 | 中国科学院自动化研究所 | Bionic robot fish motion control method and system based on antagonistic structured control |
CN111027143A (en) * | 2019-12-18 | 2020-04-17 | 四川大学 | Shipboard aircraft approach guiding method based on deep reinforcement learning |
CN111142378A (en) * | 2020-01-07 | 2020-05-12 | 四川省桑瑞光辉标识系统股份有限公司 | Neural network optimization method of biped robot neural network controller |
CN111191399A (en) * | 2019-12-24 | 2020-05-22 | 北京航空航天大学 | Control method, device and equipment of robot fish and storage medium |
CN111360834A (en) * | 2020-03-25 | 2020-07-03 | 中南大学 | Humanoid robot motion control method and system based on deep reinforcement learning |
CN112162554A (en) * | 2020-09-23 | 2021-01-01 | 吉林大学 | Data storage and backtracking platform for N3 sweeper |
CN112666939A (en) * | 2020-12-09 | 2021-04-16 | 深圳先进技术研究院 | Robot path planning algorithm based on deep reinforcement learning |
CN112782973A (en) * | 2019-11-07 | 2021-05-11 | 四川省桑瑞光辉标识系统股份有限公司 | Biped robot walking control method and system based on double-agent cooperative game |
CN113031528A (en) * | 2021-02-25 | 2021-06-25 | 电子科技大学 | Multi-legged robot motion control method based on depth certainty strategy gradient |
CN113156892A (en) * | 2021-04-16 | 2021-07-23 | 西湖大学 | Four-footed robot simulated motion control method based on deep reinforcement learning |
CN113627584A (en) * | 2020-05-08 | 2021-11-09 | 南京大学 | Neural network-based inverse kinematics solving method for mechanical arm, electronic equipment and storage medium |
CN117062280A (en) * | 2023-08-17 | 2023-11-14 | 北京美中爱瑞肿瘤医院有限责任公司 | Automatic following system of neurosurgery self-service operating lamp |
CN117565023A (en) * | 2022-12-30 | 2024-02-20 | 爱布(上海)人工智能科技有限公司 | Muscle movement sensing system for grasping walking intention and implementation method thereof |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1393866A1 (en) * | 2001-06-07 | 2004-03-03 | Japan Science and Technology Corporation | Apparatus walking with two legs; walking control apparatus; and walking control method thereof |
CN104217107A (en) * | 2014-08-27 | 2014-12-17 | 华南理工大学 | Method for detecting tumbling state of humanoid robot based on multi-sensor information |
CN106094817A (en) * | 2016-06-14 | 2016-11-09 | 华南理工大学 | Intensified learning humanoid robot gait's planing method based on big data mode |
CN106584460A (en) * | 2016-12-16 | 2017-04-26 | 浙江大学 | Vibration suppression method in walking of humanoid robot |
CN106842925A (en) * | 2017-01-20 | 2017-06-13 | 清华大学 | A kind of locomotive smart steering method and system based on deeply study |
CN107944476A (en) * | 2017-11-10 | 2018-04-20 | 大连理工大学 | A kind of yellow peach stoning machine device people's behaviour control method based on deeply study |
Non-Patent Citations (1)
Title |
---|
马琼雄等: "基于深度强化学习的水下机器人最优轨迹控制", 《华南师范大学学报(自然科学版)》 * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109483530A (en) * | 2018-10-18 | 2019-03-19 | 北京控制工程研究所 | A kind of legged type robot motion control method and system based on deeply study |
CN109719721B (en) * | 2018-12-26 | 2020-07-24 | 北京化工大学 | Adaptive gait autonomous emerging method of snake-like search and rescue robot |
CN109719721A (en) * | 2018-12-26 | 2019-05-07 | 北京化工大学 | A kind of autonomous emergence of imitative snake search and rescue robot adaptability gait |
CN109709967A (en) * | 2019-01-22 | 2019-05-03 | 深圳市幻尔科技有限公司 | The implementation method for the dynamic gait that the low operation of robot requires |
CN109871892A (en) * | 2019-02-18 | 2019-06-11 | 华南理工大学 | A kind of robot vision cognitive system based on small sample metric learning |
CN110238839A (en) * | 2019-04-11 | 2019-09-17 | 清华大学 | It is a kind of to optimize non-molding machine people multi peg-in-hole control method using environmental forecasting |
CN110238839B (en) * | 2019-04-11 | 2020-10-20 | 清华大学 | Multi-shaft-hole assembly control method for optimizing non-model robot by utilizing environment prediction |
CN110308727A (en) * | 2019-07-12 | 2019-10-08 | 沈阳城市学院 | A kind of control method for eliminating biped robot's upper body posture shaking |
CN110562301A (en) * | 2019-08-16 | 2019-12-13 | 北京交通大学 | Subway train energy-saving driving curve calculation method based on Q learning |
CN110496377A (en) * | 2019-08-19 | 2019-11-26 | 华南理工大学 | A kind of virtual table tennis forehand hit training method based on intensified learning |
CN110496377B (en) * | 2019-08-19 | 2020-07-28 | 华南理工大学 | Virtual table tennis player ball hitting training method based on reinforcement learning |
CN110764415A (en) * | 2019-10-31 | 2020-02-07 | 清华大学深圳国际研究生院 | Gait planning method for leg movement of quadruped robot |
CN110764415B (en) * | 2019-10-31 | 2022-04-15 | 清华大学深圳国际研究生院 | Gait planning method for leg movement of quadruped robot |
CN112782973A (en) * | 2019-11-07 | 2021-05-11 | 四川省桑瑞光辉标识系统股份有限公司 | Biped robot walking control method and system based on double-agent cooperative game |
CN110764416A (en) * | 2019-11-11 | 2020-02-07 | 河海大学 | Humanoid robot gait optimization control method based on deep Q network |
CN110909859A (en) * | 2019-11-29 | 2020-03-24 | 中国科学院自动化研究所 | Bionic robot fish motion control method and system based on antagonistic structured control |
CN111027143A (en) * | 2019-12-18 | 2020-04-17 | 四川大学 | Shipboard aircraft approach guiding method based on deep reinforcement learning |
CN111191399A (en) * | 2019-12-24 | 2020-05-22 | 北京航空航天大学 | Control method, device and equipment of robot fish and storage medium |
CN111191399B (en) * | 2019-12-24 | 2021-11-05 | 北京航空航天大学 | Control method, device and equipment of robot fish and storage medium |
CN111142378A (en) * | 2020-01-07 | 2020-05-12 | 四川省桑瑞光辉标识系统股份有限公司 | Neural network optimization method of biped robot neural network controller |
CN111360834A (en) * | 2020-03-25 | 2020-07-03 | 中南大学 | Humanoid robot motion control method and system based on deep reinforcement learning |
CN113627584B (en) * | 2020-05-08 | 2024-04-12 | 南京大学 | Mechanical arm inverse kinematics solving method based on neural network, electronic equipment and storage medium |
CN113627584A (en) * | 2020-05-08 | 2021-11-09 | 南京大学 | Neural network-based inverse kinematics solving method for mechanical arm, electronic equipment and storage medium |
CN112162554B (en) * | 2020-09-23 | 2021-10-01 | 吉林大学 | Data storage and backtracking platform for N3 sweeper |
CN112162554A (en) * | 2020-09-23 | 2021-01-01 | 吉林大学 | Data storage and backtracking platform for N3 sweeper |
CN112666939A (en) * | 2020-12-09 | 2021-04-16 | 深圳先进技术研究院 | Robot path planning algorithm based on deep reinforcement learning |
CN112666939B (en) * | 2020-12-09 | 2021-09-10 | 深圳先进技术研究院 | Robot path planning algorithm based on deep reinforcement learning |
CN113031528B (en) * | 2021-02-25 | 2022-03-15 | 电子科技大学 | Multi-legged robot non-structural ground motion control method based on depth certainty strategy gradient |
CN113031528A (en) * | 2021-02-25 | 2021-06-25 | 电子科技大学 | Multi-legged robot motion control method based on depth certainty strategy gradient |
CN113156892A (en) * | 2021-04-16 | 2021-07-23 | 西湖大学 | Four-footed robot simulated motion control method based on deep reinforcement learning |
CN117565023A (en) * | 2022-12-30 | 2024-02-20 | 爱布(上海)人工智能科技有限公司 | Muscle movement sensing system for grasping walking intention and implementation method thereof |
CN117565023B (en) * | 2022-12-30 | 2024-05-17 | 爱布(上海)人工智能科技有限公司 | Muscle movement sensing system for grasping walking intention and implementation method thereof |
CN117062280A (en) * | 2023-08-17 | 2023-11-14 | 北京美中爱瑞肿瘤医院有限责任公司 | Automatic following system of neurosurgery self-service operating lamp |
CN117062280B (en) * | 2023-08-17 | 2024-03-08 | 北京美中爱瑞肿瘤医院有限责任公司 | Automatic following system of neurosurgery self-service operating lamp |
Also Published As
Publication number | Publication date |
---|---|
CN108549237B (en) | 2020-04-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108549237A (en) | Preview based on depth enhancing study controls humanoid robot gait's planing method | |
CN104932264B (en) | The apery robot stabilized control method of Q learning frameworks based on RBF networks | |
CN109483530B (en) | Foot type robot motion control method and system based on deep reinforcement learning | |
Miura et al. | Human-like walking with toe supporting for humanoids | |
CN109991979B (en) | Lower limb robot anthropomorphic gait planning method oriented to complex environment | |
Yi et al. | Online learning of a full body push recovery controller for omnidirectional walking | |
US8428780B2 (en) | External force target generating device of legged mobile robot | |
US8442680B2 (en) | Motion state evaluation apparatus of legged mobile robot | |
WO2019218805A1 (en) | Motion closed-loop control method for quadruped robot | |
CN109760761B (en) | Four-footed robot motion control method based on bionics principle and intuition | |
CN104898672B (en) | A kind of optimal control method of Humanoid Robot Based on Walking track | |
US8396593B2 (en) | Gait generating device of legged mobile robot | |
CN106094817B (en) | Intensified learning humanoid robot gait's planing method based on big data mode | |
JP6781101B2 (en) | Non-linear system control method, biped robot control device, biped robot control method and its program | |
CN103750927B (en) | Artificial leg knee joint adaptive iterative learning control method | |
CN106019950A (en) | Mobile phone satellite self-adaptive attitude control method | |
CN114397810A (en) | Four-legged robot motion control method based on adaptive virtual model control | |
US20110213498A1 (en) | Desired motion evaluation apparatus of legged mobile robot | |
CN113568422B (en) | Four-foot robot control method based on model predictive control optimization reinforcement learning | |
Dong et al. | On-line gait adjustment for humanoid robot robust walking based on divergence component of motion | |
CN109857146B (en) | Layered unmanned aerial vehicle tracking control method based on feedforward and weight distribution | |
CN116237943A (en) | Four-foot robot control method combined with terrain constraint | |
CN104793621A (en) | Muscular viscoelastic behavior imitated humanoid robot walking stability control method | |
Xie et al. | Online whole-stage gait planning method for biped robots based on improved Variable Spring-Loaded Inverted Pendulum with Finite-sized Foot (VSLIP-FF) model | |
Kim et al. | A model predictive capture point control framework for robust humanoid balancing via ankle, hip, and stepping strategies | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200428 |