CN107450555A - Real-time gait planning method for a hexapod robot based on deep reinforcement learning - Google Patents
Real-time gait planning method for a hexapod robot based on deep reinforcement learning Download PDF Info
- Publication number
- CN107450555A (application CN201710763223.7A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0246—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
- G05D1/0251—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
Abstract
The invention provides a real-time gait planning method for a hexapod robot based on deep reinforcement learning, comprising the steps of: the hexapod robot obtains environmental road-condition information and formulates an overall motion trajectory; a camera captures photos of the environment, from which the position information of the target trajectory is calculated by binocular ranging, and the calculated trajectory information is used to navigate the robot's center-of-mass motion trajectory; within the swing space of the robot's foot tips, photos of the road-condition environment are taken, and data dimensionality reduction and feature extraction are performed on them by a pre-trained deep reinforcement learning network based on the deep deterministic policy gradient (DDPG); the control strategy of the hexapod robot is derived from the feature-extraction result, and the robot places its feet according to this strategy, achieving real-time walking. The gait planning method can plan in real time for complex unstructured road environments and is significant for improving the environmental adaptability of hexapod robots.
Description
Technical field
The present invention relates to a method of real-time gait planning for a hexapod robot, and in particular to a real-time gait planning method for a hexapod robot based on deep reinforcement learning.
Background technology
Robot technology is a highly integrated combination of materials science, mechanism theory, bionics, mechatronics, control technology, sensor technology, artificial intelligence and other disciplines, and is an important embodiment of a nation's industrial development level and scientific strength. A multi-legged bio-inspired robot that autonomously completes gait planning is a highly intelligent mobile robot, able to learn its external environment autonomously and complete its gait planning. Road environments are complex and varied, and the traditional pre-programmed gait planning methods of hexapod robots have significant limitations. To improve environmental adaptability, a hexapod robot must complete basic task functions such as overall mobile navigation, center-of-mass trajectory planning and foothold selection. By fusing satellite navigation with multi-sensor information, a multi-legged robot can perform machine learning (such as deep learning and reinforcement learning), in particular learning from experience of interaction with the external environment to improve its performance on a target, realizing functions of perception, decision-making and action. Research on hexapod robots has long attracted the attention of experts and scholars worldwide, but how to improve the locomotion ability of a hexapod robot in unstructured environments remains an open problem.
Summary of the invention
The technical problem to be solved by the present invention is that existing hexapod robot gait planning technology cannot adapt to complex terrain environments, long-distance autonomous walking, or situations where the final position is not fixed.
To solve the above technical problem, the invention provides a real-time gait planning method for a hexapod robot based on deep reinforcement learning, comprising the following steps:
Step 1: the hexapod robot obtains environmental road-condition information from a satellite map, and formulates an overall motion trajectory according to the environmental road-condition information;
Step 2: the hexapod robot obtains photos of the surrounding environment with a camera mounted on the fuselage, calculates the target position information of the motion trajectory from these photos by binocular ranging, and plans the robot's center-of-mass motion trajectory from that target position information;
Step 3: the hexapod robot moves along the center-of-mass motion trajectory; within the swing space of each leg's foot tip, the on-board camera photographs the road-condition environment, and a pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and feature extraction on the photos;
Step 4: the hexapod robot derives its control strategy from the dimensionality-reduction and feature-extraction results, and controls each joint drive mechanism according to this strategy to complete the joint degree-of-freedom motions, thereby achieving real-time gait-planned walking.
As a further limitation of the present invention, the concrete steps of calculating the real-time position information of the motion trajectory from the photos by binocular ranging in step 2 are:
Step 2.1: obtain the focal length f of the cameras, the center distance T_x of the left and right cameras, and the physical distances x_l and x_r from the projections of the target point on the motion trajectory in the left and right image planes to the left edge of the respective image plane. The left and right image planes are rectangular, lie on the same imaging plane, and the optical-center projections O_l and O_r of the two cameras lie at the centers of their respective image planes. The disparity d is then:
d = x_l − x_r (1)
Step 2.2: by the principle of similar triangles, establish the reprojection matrix Q:
Q = [ 1  0  0      −c_x
      0  1  0      −c_y
      0  0  0       f
      0  0  1/T_x  (c_x − c_x′)/T_x ] (2)
Q · [x y d 1]^T = [X Y Z W]^T (3)
In formulas (2) and (3), (X, Y, Z) are the coordinates of the target point in the three-dimensional coordinate system with the left camera's optical center as origin, W is the rotation-translation scale coefficient, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are respectively the offsets between the origins of the left and right image-plane coordinate systems and the three-dimensional coordinate system, and c_x′ is the corrected value of c_x;
Step 2.3: the spatial distance from the target point to the imaging plane is calculated as:
Z = f T_x / d (4)
Taking the optical-center position of the left camera as the robot's position, the coordinate position information (X, Y, Z) of the target point is used as the target position information of the motion trajectory.
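Steps 2.1 to 2.3 can be sketched numerically as follows. The focal length, baseline and principal point below are made-up example values (the patent specifies none), and c_x′ is taken as equal to c_x:

```python
# Sketch of the binocular-ranging reprojection of steps 2.1-2.3.
# All camera parameters here are illustrative assumptions.

def stereo_point(f, Tx, cx, cy, xl, xr, y):
    """Recover (X, Y, Z) of a target point from its left/right image
    x-coordinates using the Q-matrix reprojection (with cx' = cx)."""
    d = xl - xr          # disparity, formula (1)
    W = d / Tx           # homogeneous scale from the last row of Q
    X = (xl - cx) / W
    Y = (y - cy) / W
    Z = f / W            # equivalently Z = f * Tx / d, formula (4)
    return X, Y, Z

# Example: f = 700 px, baseline Tx = 0.12 m, principal point (320, 240),
# target seen at x = 340 px (left) and 326 px (right), y = 250 px.
X, Y, Z = stereo_point(700.0, 0.12, 320.0, 240.0, 340.0, 326.0, 250.0)
# Z ≈ 6.0 m for these values (700 * 0.12 / 14)
```

Larger disparities give smaller Z, matching the usual stereo intuition that nearby points shift more between the two views.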
As a further limitation of the present invention, the concrete steps of performing data dimensionality reduction and feature extraction on the road-condition photos with the pre-trained DDPG-based deep reinforcement learning network in step 3 are:
Step 3.1: the process in which the foot tip autonomously selects a foothold satisfies the Markov-property condition of reinforcement learning; the set of observations and actions up to time t is:
s_t = (x_1, a_1, …, a_{t−1}, x_t) = x_t (5)
In formula (5), x_t and a_t are respectively the observation at time t and the action taken;
Step 3.2: the expected return of the autonomous foothold-selection process is described by the action-value function:
Q^π(s_t, a_t) = E[R_t | s_t, a_t] (6)
In formula (6), R_t = Σ_{i=t}^{T} γ^{i−t} r(s_i, a_i) is the total discounted future return obtained from time t, γ ∈ [0, 1] is the discount factor, r(s_t, a_t) is the reward function at time t, T is the time at which the autonomous foothold selection ends, and π is the autonomous foothold-selection strategy;
Because the target strategy π of autonomous foothold selection is deterministically defined, it is denoted by a function μ: S → A, where S is the state space and A is the N-dimensional action space; applying the Bellman equation to formula (6) gives:
Q^μ(s_t, a_t) = E_{s_{t+1}∼E}[ r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1})) ] (7)
where s_{t+1} ∼ E indicates that the observation at time t+1 is obtained from the environment E, and μ(s_{t+1}) is the action mapped from the observation at time t+1 by the function μ;
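The discounted return R_t that appears in the action-value description above can be computed by a single backward pass over the per-step rewards. A minimal sketch, with illustrative rewards:

```python
# R_t = sum_{i=t}^{T} gamma^(i-t) * r_i, accumulated backwards from
# the final step T, as used in the foothold-selection value function.

def discounted_return(rewards, gamma):
    R = 0.0
    for r in reversed(rewards):  # R <- r_i + gamma * R at each step
        R = r + gamma * R
    return R

# Example with made-up rewards and gamma = 0.9:
R0 = discounted_return([1.0, 0.0, 2.0], 0.9)  # 1 + 0.9*0 + 0.81*2 = 2.62
```

The backward accumulation is exactly the recursion that the Bellman equation exploits one step at a time.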
Step 3.3: using the principle of maximum-likelihood estimation, update the policy-evaluation network Q(s, a | θ^Q) with weight parameters θ^Q by minimizing the loss function:
L(θ^Q) = E_{μ′}[ (Q(s_t, a_t | θ^Q) − y_t)^2 ] (8)
In formula (8), y_t = r(s_t, a_t) + γ Q(s_{t+1}, μ(s_{t+1}) | θ^Q) is the target of the evaluation network, and μ′ is the target strategy;
Step 3.4: for the actual policy function μ(s | θ^μ) with parameters θ^μ, the gradient obtained by the chain rule is:
∇_{θ^μ} J ≈ E_{μ′}[ ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t} ] (9)
The gradient computed by formula (9) is the policy gradient, which is then used to update the policy function μ(s | θ^μ);
Step 3.5: the network is trained with sample data; the off-policy algorithm draws samples from a common sample buffer to minimize the correlation between samples, while the neural network is trained against a target Q-value network, i.e. the target networks are updated using the experience-replay mechanism and the target Q-value network method, with the slow update strategy:
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′} (10)
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′} (11)
In formulas (10) and (11), τ is the update rate, with τ ≪ 1. This constructs the DDPG-based deep reinforcement learning network, which is a convergent neural network;
Step 3.6: perform data dimensionality reduction and feature extraction on the road-condition photos with the constructed deep reinforcement learning network.
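The three DDPG update rules of steps 3.3 and 3.5 reduce to short numerical operations once the network evaluations are given. The sketch below stubs the network outputs with toy scalars and vectors; it illustrates the arithmetic of formulas (8), (10) and (11), not the deep networks themselves:

```python
import numpy as np

# Numeric sketch of the DDPG critic target, critic loss and soft
# target-network update. Network evaluations are stubbed with toy
# values; in the patent these come from deep networks.

def td_target(r, q_next, gamma=0.99):
    # y_t = r(s_t, a_t) + gamma * Q(s_{t+1}, mu(s_{t+1}))
    return r + gamma * q_next

def critic_loss(q_pred, y):
    # L(theta_Q) = E[(Q(s_t, a_t) - y_t)^2], over a minibatch
    return float(np.mean((q_pred - y) ** 2))

def soft_update(target_params, params, tau=0.001):
    # theta' <- tau * theta + (1 - tau) * theta'   (tau << 1)
    return (1 - tau) * target_params + tau * params

y = td_target(r=1.0, q_next=2.0, gamma=0.9)            # 2.8
loss = critic_loss(np.array([2.5, 3.0]), y)            # ≈ 0.065
new_t = soft_update(np.zeros(3), np.ones(3), tau=0.1)  # 0.1 everywhere
```

Because τ ≪ 1, the target parameters trail the learned parameters slowly, which is what keeps the bootstrapped target y_t stable during training.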
As a further limitation of the present invention, the deep reinforcement learning network in step 3.6 consists of two image input layers, four convolutional layers, four fully connected layers and one output layer. The image input layers receive the images used for autonomous foothold selection; the convolutional layers extract image features, i.e. a deep representation of the two images; the fully connected layers and the output layer together form a deep network which, after training, outputs angle control commands for each joint from the input terrain features, i.e. it controls each joint drive mechanism of the hexapod robot's legs to complete the joint degree-of-freedom motions, thereby achieving real-time walking of the hexapod robot.
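The layer counts above are stated by the patent, but kernel sizes, strides and the input resolution are not; the following sketch does the shape bookkeeping for one assumed configuration of the four convolutional layers, which is the arithmetic needed to size the first fully connected layer:

```python
# Spatial-size bookkeeping through four convolutional layers.
# The 84x84 input and the (kernel, stride) pairs are illustrative
# assumptions; the patent does not specify them.

def conv_out(size, kernel, stride):
    # Output side length of a "valid" (no-padding) convolution.
    return (size - kernel) // stride + 1

size = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1), (3, 1)]:  # four convs
    size = conv_out(size, kernel, stride)
# size is now the side length of the feature map entering the first
# fully connected layer: 84 -> 20 -> 9 -> 7 -> 5
```

With two such input streams, the flattened feature vectors are concatenated before the fully connected stack, so the first dense layer's input width doubles accordingly.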
The beneficial effects of the present invention are: (1) satellite navigation is applied to the planning of the hexapod robot's center-of-mass motion trajectory, enabling the robot to complete long-distance autonomous walking gait planning; (2) the position information of the motion trajectory is calculated by binocular ranging, and the calculated trajectory information is used for navigation of the center-of-mass motion trajectory, achieving short-range gait planning; (3) the dual image input layers effectively support planning of the motion trajectory and determination of changing target points, and random sampling together with the experience-replay mechanism determines the input images during pre-training, so the input images are both mutually independent and interrelated, satisfying the neural network's requirement that input data be mutually independent; (4) the target Q-value network technique continuously adjusts the weight matrices of the neural network, realizing data dimensionality reduction and promoting convergence; (5) the pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and terrain-feature extraction on the camera's road-condition photos and directly outputs the gait motion control strategy of the hexapod robot, effectively avoiding the "curse of dimensionality" and realizing real-time gait planning.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system structure of the present invention;
Fig. 2 is a flow chart of the method of the present invention.
Embodiment
As shown in Fig. 1, the system that runs the real-time gait planning method of the present invention comprises a satellite navigation system, a machine vision and image processing system, a central control system and a basic motion system. The satellite navigation system consists mainly of satellite map GIS software on the hexapod robot; after a destination is input, it quickly completes path planning and transmits the path-planning information to the central control system. The image processing system consists mainly of a camera mounted at the front of the hexapod robot and matlab software on an industrial computer. The central control system consists mainly of a deep reinforcement learning network based on the deep deterministic policy gradient (DDPG), pre-trained on a dynamics simulation platform on the industrial computer, together with a communication module; the experience-replay and target Q-value network methods ensure that the DDPG-based deep reinforcement learning network converges during pre-training. The basic motion system consists of the hexapod robot's mechanical structure, drives and sensors; it executes the motion commands formulated by the central control system, drives the motors or cylinders of each leg joint, and completes the joint degree-of-freedom motions, thereby realizing real-time walking of the hexapod robot and feeding the motion information back to the central control system.
Within a certain distance of the hexapod robot's fuselage, photos of the environment are obtained by the camera mounted on the fuselage; the position information of the motion trajectory is then calculated from the photos by binocular ranging, and the calculated trajectory information is used for navigation of the robot's center-of-mass motion trajectory.
During pre-training of the neural network, the matlab software on the industrial computer first converts the RGB images of the environmental road-condition information into grayscale images; the experience-replay mechanism keeps the correlation between successive photos as small as possible to satisfy the neural network's requirement that input data be mutually independent, and random sampling is then used to obtain the images input to the network. Data dimensionality reduction is realized through deep learning, the weight matrices of the neural network are continuously adjusted by the target Q-value network technique, and a convergent neural network is finally obtained.
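The pre-training data handling described above combines two simple operations: grayscale conversion of the RGB frames and random minibatch sampling from a replay buffer to break temporal correlation. A minimal sketch (the luminance weights are the common ITU-R convention, an assumption here, since the patent only says matlab converts RGB to grayscale):

```python
import random

# Sketch of the pre-training data pipeline: grayscale conversion and
# decorrelation by random sampling from an experience buffer.

def to_gray(r, g, b):
    # Standard luminance-weighted grayscale (assumed coefficients).
    return 0.299 * r + 0.587 * g + 0.114 * b

def sample_minibatch(buffer, k, seed=0):
    # Draw k distinct transitions uniformly at random, so consecutive
    # frames rarely land in the same minibatch together.
    rng = random.Random(seed)  # seeded only for repeatability
    return rng.sample(buffer, k)

g = to_gray(255, 255, 255)                      # pure white -> 255.0
batch = sample_minibatch(list(range(100)), k=4)  # 4 distinct indices
```

Sampling without replacement across the whole buffer is what makes the inputs "mutually independent yet interrelated": each minibatch mixes frames from different times while all frames still come from the same environment.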
The hexapod robot moves according to the planned center-of-mass trajectory. Within the swing space of each leg's foot tip, the robot's camera again photographs the road-condition environment; the trained deep reinforcement learning network based on the deep deterministic policy gradient performs data dimensionality reduction and feature extraction and outputs the robot's real-time gait planning motion strategy; finally, the control strategy is sent through the communication system to the basic motion system to control the robot's motion, realizing real-time control of autonomous foothold selection along the motion trajectory.
In operation, the system proceeds as follows:
Step 1: start the satellite map GIS software on the hexapod robot, input the robot's destination to complete path planning, and transmit the path-planning information to the central control system;
Step 2: pre-train the deep reinforcement learning network based on the deep deterministic policy gradient (DDPG) on the dynamics simulation platform of the industrial computer, using the experience-replay mechanism and the target Q-value network method to ensure that the network converges quickly during pre-training;
Step 3: obtain images of the environmental road conditions within a certain distance of the fuselage with the camera mounted at the front of the robot, transmit the image information to the industrial computer through the communication module, calculate the position information of the motion trajectory from the photos by binocular ranging, and use the calculated trajectory information for navigation of the robot's center-of-mass motion trajectory;
Step 4: the hexapod robot moves according to the planned center-of-mass trajectory; within the swing space of each leg's foot tip, the robot's camera photographs the road-condition environment, and the pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and feature extraction on the acquired photos;
Step 5: derive the control strategy of the hexapod robot from the feature-extraction result, transmit the control information to the robot's basic motion system through the communication module, drive the motors or cylinders of each leg joint according to the control strategy, and complete the joint degree-of-freedom motions, thereby realizing real-time walking of the hexapod robot and feeding the motion information back to the central control system.
As shown in Fig. 2, the invention provides a real-time gait planning method for a hexapod robot based on deep reinforcement learning, comprising the following steps:
Step 1: the hexapod robot obtains environmental road-condition information from a satellite map, and formulates an overall motion trajectory according to the environmental road-condition information;
Step 2: the hexapod robot obtains photos of the surrounding environment with a camera mounted on the fuselage, calculates the target position information of the motion trajectory from these photos by binocular ranging, and plans the robot's center-of-mass motion trajectory from that target position information;
Step 3: the hexapod robot moves along the center-of-mass motion trajectory; within the swing space of each leg's foot tip, the on-board camera photographs the road-condition environment, and a pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and feature extraction on the photos;
Step 4: the hexapod robot derives its control strategy from the dimensionality-reduction and feature-extraction results, and controls each joint drive mechanism according to this strategy to complete the joint degree-of-freedom motions, thereby achieving real-time gait-planned walking.
As a further limitation of the present invention, the concrete steps of calculating the real-time position information of the motion trajectory from the photos by binocular ranging in step 2 are:
Step 2.1: obtain the focal length f of the cameras, the center distance T_x of the left and right cameras, and the physical distances x_l and x_r from the projections of the target point on the motion trajectory in the left and right image planes to the left edge of the respective image plane. The left and right image planes are rectangular, lie on the same imaging plane, and the optical-center projections O_l and O_r of the two cameras lie at the centers of their respective image planes. The disparity d is then:
d = x_l − x_r (1)
Step 2.2: by the principle of similar triangles, establish the reprojection matrix Q:
Q = [ 1  0  0      −c_x
      0  1  0      −c_y
      0  0  0       f
      0  0  1/T_x  (c_x − c_x′)/T_x ] (2)
Q · [x y d 1]^T = [X Y Z W]^T (3)
In formulas (2) and (3), (X, Y, Z) are the coordinates of the target point in the three-dimensional coordinate system with the left camera's optical center as origin, W is the rotation-translation scale coefficient, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are respectively the offsets between the origins of the left and right image-plane coordinate systems and the three-dimensional coordinate system, and c_x′ is the corrected value of c_x; the two values are generally close, and for convenience the present invention treats them as approximately equal;
Step 2.3: the spatial distance from the target point to the imaging plane is calculated as:
Z = f T_x / d (4)
Taking the optical-center position of the left camera as the robot's position, the coordinate position information (X, Y, Z) of the target point is used as the target position information of the motion trajectory.
As a further limitation of the present invention, the concrete steps of performing data dimensionality reduction and feature extraction on the road-condition photos with the pre-trained DDPG-based deep reinforcement learning network in step 3 are:
Step 3.1: the process in which the foot tip autonomously selects a foothold satisfies the Markov-property condition of reinforcement learning; the set of observations and actions up to time t is:
s_t = (x_1, a_1, …, a_{t−1}, x_t) = x_t (5)
In formula (5), x_t and a_t are respectively the observation at time t and the action taken;
Step 3.2: the expected return of the autonomous foothold-selection process is described by the action-value function:
Q^π(s_t, a_t) = E[R_t | s_t, a_t] (6)
In formula (6), R_t = Σ_{i=t}^{T} γ^{i−t} r(s_i, a_i) is the total discounted future return obtained from time t, γ ∈ [0, 1] is the discount factor, r(s_t, a_t) is the reward function at time t, T is the time at which the autonomous foothold selection ends, and π is the autonomous foothold-selection strategy;
Because the target strategy π of autonomous foothold selection is deterministically defined, it is denoted by a function μ: S → A, where S is the state space and A is the N-dimensional action space; applying the Bellman equation to formula (6) gives:
Q^μ(s_t, a_t) = E_{s_{t+1}∼E}[ r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1})) ] (7)
where s_{t+1} ∼ E indicates that the observation at time t+1 is obtained from the environment E, and μ(s_{t+1}) is the action mapped from the observation at time t+1 by the function μ;
Step 3.3: using the principle of maximum-likelihood estimation, update the policy-evaluation network Q(s, a | θ^Q) with weight parameters θ^Q by minimizing the loss function:
L(θ^Q) = E_{μ′}[ (Q(s_t, a_t | θ^Q) − y_t)^2 ] (8)
In formula (8), y_t = r(s_t, a_t) + γ Q(s_{t+1}, μ(s_{t+1}) | θ^Q) is the target of the evaluation network, and μ′ is the target strategy;
Step 3.4: for the actual policy function μ(s | θ^μ) with parameters θ^μ, the gradient obtained by the chain rule is:
∇_{θ^μ} J ≈ E_{μ′}[ ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t} ] (9)
The gradient computed by formula (9) is the policy gradient, which is then used to update the policy function μ(s | θ^μ);
Step 3.5: the network is trained with sample data; the off-policy algorithm draws samples from a common sample buffer to minimize the correlation between samples, while the neural network is trained against a target Q-value network, i.e. the target networks are updated using the experience-replay mechanism and the target Q-value network method, with the slow update strategy:
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′} (10)
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′} (11)
In formulas (10) and (11), τ is the update rate, with τ ≪ 1. This constructs the DDPG-based deep reinforcement learning network, which is a convergent neural network;
Step 3.6: perform data dimensionality reduction and feature extraction on the road-condition photos with the constructed deep reinforcement learning network.
As a further limitation of the present invention, the deep reinforcement learning network in step 3.6 consists of two image input layers, four convolutional layers, four fully connected layers and one output layer. There are two image input layers because both the motion trajectory and the current target point must be determined; four convolutional and four fully connected layers are used so that image-feature extraction is effective while the network still converges quickly during training. The image input layers receive the images used for autonomous foothold selection; the convolutional layers extract image features, i.e. a deep representation of the two images, such as points, lines and arcs; the fully connected layers and the output layer together form a deep network which, after training, outputs angle control commands for each joint from the input terrain features, i.e. it controls each joint drive mechanism of the legs to complete the joint degree-of-freedom motions, thereby realizing real-time walking of the hexapod robot.
The present invention applies satellite navigation to the planning of the hexapod robot's center-of-mass motion trajectory, enabling the robot to complete long-distance autonomous walking gait planning. The position information of the motion trajectory is calculated by binocular ranging, and the calculated trajectory information is used for navigation of the center-of-mass motion trajectory, realizing short-range gait planning. The dual image input layers effectively support planning of the motion trajectory and determination of changing target points; random sampling and the experience-replay mechanism determine the input images during pre-training, so the input images are both mutually independent and interrelated, satisfying the neural network's requirement for mutually independent input data. The target Q-value network technique continuously adjusts the weight matrices of the neural network, realizing data dimensionality reduction and promoting convergence. The pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and terrain-feature extraction on the camera's road-condition photos and directly outputs the robot's gait motion control strategy, effectively avoiding the "curse of dimensionality" and realizing real-time gait planning of the hexapod robot.
Claims (4)
1. A real-time gait planning method for a hexapod robot based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: the hexapod robot obtains environmental road-condition information from a satellite map and formulates an overall motion trajectory according to that information;
Step 2: the hexapod robot obtains photographs of the surrounding environment using cameras mounted on its body, calculates the target position information of the motion trajectory from those photographs by binocular ranging, and plans the robot's center-of-mass motion trajectory from the overall motion trajectory and the target position information;
Step 3: the hexapod robot moves along the planned center-of-mass motion trajectory; within the swing range of each leg's foot end, the on-body cameras photograph the road-condition environment, and the pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and feature extraction on the road-condition photographs;
Step 4: from the dimensionality-reduction and feature-extraction results the hexapod robot derives its control strategy, and according to that strategy each joint drive mechanism of the hexapod robot is controlled to complete its joint-degree-of-freedom motion, thereby realizing real-time gait-planned walking.
2. The real-time gait planning method for a hexapod robot based on deep reinforcement learning according to claim 1, characterized in that calculating the position information of the motion trajectory from the photographs by binocular ranging in Step 2 comprises the following specific steps:
Step 2.1: obtain the focal length f of the cameras, the center-to-center distance Tx of the left and right cameras, and the physical distances xl and xr from the projections of the target point (a point on the motion trajectory in the road conditions) on the image planes of the left and right cameras to the left edge of the respective image plane; the image planes of the left and right cameras are rectangular, lie in the same imaging plane, and the optical-center projections of the two cameras lie at the centers of the corresponding image planes, i.e. Ol and Or are the projection points on the imaging plane; the parallax d is then:
d = xl - xr    (1)
Step 2.2: using the principle of similar triangles, the matrix Q is established as:
$$
Q=\begin{bmatrix}
1 & 0 & 0 & -c_x \\
0 & 1 & 0 & -c_y \\
0 & 0 & 0 & f \\
0 & 0 & -\dfrac{1}{T_x} & \dfrac{c_x-c_x'}{T_x}
\end{bmatrix}
\qquad (2)
$$
$$
Q\begin{bmatrix}x\\ y\\ d\\ 1\end{bmatrix}
=\begin{bmatrix}x-c_x\\ y-c_y\\ f\\ \dfrac{-d+c_x-c_x'}{T_x}\end{bmatrix}
=\begin{bmatrix}X\\ Y\\ Z\\ W\end{bmatrix}
\qquad (3)
$$
In formulas (2) and (3), (X, Y, Z) are the coordinates of the target point in the three-dimensional coordinate system whose origin is the optical center of the left camera, and W is the scale factor of the rotation-translation conversion; (x, y) are the coordinates of the target point in the left image plane; cx and cy are the offsets between the origins of the left/right image-plane coordinate systems and the three-dimensional coordinate system; cx' is the corrected value of cx;
Step 2.3: calculate the spatial distance from the target point to the imaging plane:
$$
Z=\dfrac{-T_x f}{d-(c_x-c_x')}
\qquad (4)
$$
Taking the optical-center position of the left camera as the robot's own position, the coordinate position information (X, Y, Z) of the target point is used as the target position information of the motion trajectory.
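As an illustrative sketch (not part of the claims), the reprojection of formulas (2)-(4) can be carried out numerically. The camera parameters and image coordinates below are hypothetical values chosen only for the example; sign conventions follow formula (4) as written.

```python
# Sketch of the binocular reprojection in formulas (2)-(4); the camera
# parameters (f, Tx, cx, cy, cx') below are hypothetical example values.

def reproject(x, y, d, f, Tx, cx, cy, cxp):
    """Map an image point (x, y) with disparity d to 3-D coordinates.

    Applies Q [x y d 1]^T = [X Y Z W]^T from formula (3) and divides by
    the scale factor W, so Z matches formula (4):
    Z = -Tx*f / (d - (cx - cxp)).
    """
    W = (-d + cx - cxp) / Tx          # fourth row of Q applied to [x y d 1]
    return ((x - cx) / W, (y - cy) / W, f / W)

# Hypothetical stereo rig: f = 700 px, baseline term Tx = -60 mm,
# principal point (320, 240), no correction (cx' = cx).
X, Y, Z = reproject(x=390, y=275, d=35, f=700, Tx=-60,
                    cx=320, cy=240, cxp=320)
print(X, Y, Z)  # target point expressed in the left-camera frame
```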
3. The real-time gait planning method for a hexapod robot based on deep reinforcement learning according to claim 1 or 2, characterized in that the data dimensionality reduction and feature extraction performed on the road-condition photographs in Step 3 by the pre-trained DDPG-based deep reinforcement learning network comprises the following specific steps:
Step 3.1: the process by which the foot end of a target leg autonomously selects its foothold satisfies the Markov property required by reinforcement learning; the set of observations and actions before time t is:
st = (x1, a1, ..., at-1, xt) = xt    (5)
In formula (5), xt and at are respectively the observation at time t and the action taken;
Step 3.2: the strategy-value function is used to describe the expected return of the autonomous foothold-selection process of the foot end:
Qπ(st, at) = E[Rt | st, at]    (6)
In formula (6), $R_t=\sum_{i=t}^{T}\gamma^{\,i-t}r(s_i,a_i)$ is the discounted sum of future rewards obtained from time t onward, γ ∈ [0,1] is the discount factor, r(st, at) is the reward function at time t, T is the time at which the autonomous foothold selection of the foot end terminates, and π is the autonomous foothold-selection strategy of the foot end;
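The discounted return Rt of formula (6) can be sketched directly; the reward sequence and discount factor below are illustrative values, not taken from the patent.

```python
# Illustrative computation of the discounted return from formula (6):
# R_t = sum_{i=t}^{T} gamma^(i-t) * r(s_i, a_i).

def discounted_return(rewards, gamma):
    """Sum the rewards r(s_t, a_t), ..., r(s_T, a_T) with discount gamma."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Hypothetical per-step rewards from time t to termination time T:
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```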
Because the target strategy π for autonomous foothold selection of the foot end is deterministic, it is written as a function μ: S → A, where S is the state space and A is the N-dimensional action space; applying the Bellman equation to formula (6) gives:
$$
Q^{\mu}(s_t,a_t)=E_{s_{t+1}\sim E}\bigl[r(s_t,a_t)+\gamma Q^{\mu}\bigl(s_{t+1},\mu(s_{t+1})\bigr)\bigr]
\qquad (7)
$$
where st+1 ~ E indicates that the observation at time t+1 is obtained from the environment E, and μ(st+1) is the action to which the observation at time t+1 is mapped by the function μ;
Step 3.3: using the principle of maximum-likelihood estimation, the policy-evaluation network Q(s, a | θ^Q) with network weight parameters θ^Q is updated by minimizing the loss function:
L(θ^Q) = E_μ'[(Q(st, at | θ^Q) - yt)^2]    (8)
In formula (8), yt = r(st, at) + γQ(st+1, μ(st+1) | θ^Q) is the target of the policy-evaluation network, and μ' is the target strategy;
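A minimal sketch of the critic loss of formula (8) over a sampled minibatch, assuming the Q-values and rewards below are hypothetical numbers and the target yt = r + γ·Q(st+1, μ(st+1)) is computed per sample:

```python
# Sketch of the loss of formula (8): y_t = r(s_t, a_t) + gamma * Q_next
# and L = mean((Q(s_t, a_t) - y_t)^2). All numbers are illustrative.

def critic_loss(q_values, rewards, next_q_values, gamma):
    """Mean squared error between Q(s_t, a_t) and the target y_t."""
    targets = [r + gamma * nq for r, nq in zip(rewards, next_q_values)]
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)

loss = critic_loss(q_values=[1.0, 2.0], rewards=[0.5, 0.5],
                   next_q_values=[1.0, 1.0], gamma=0.9)
print(loss)  # targets are 1.4 each -> ((1-1.4)^2 + (2-1.4)^2) / 2 ≈ 0.26
```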
Step 3.4: for the actual strategy function μ(s | θ^μ) with parameters θ^μ, the gradient obtained by the chain rule is:
$$
\begin{aligned}
\nabla_{\theta^{\mu}}\mu
&\approx E_{\mu'}\Bigl[\nabla_{\theta^{\mu}} Q(s,a\mid\theta^{Q})\big|_{s=s_t,\,a=\mu(s_t\mid\theta^{\mu})}\Bigr] \\
&= E_{\mu'}\Bigl[\nabla_{a} Q(s,a\mid\theta^{Q})\big|_{s=s_t,\,a=\mu(s_t)}\;\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_t}\Bigr]
\end{aligned}
\qquad (9)
$$
The gradient computed by formula (9) is the policy gradient, which is then used to update the strategy function μ(s | θ^μ);
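The chain rule of formula (9) can be checked on a one-dimensional toy case; the choices of Q, μ and the numbers below are hypothetical, used only to verify the product form ∇_a Q · ∇_θ μ.

```python
# One-dimensional check of the chain rule in formula (9), with toy choices
# Q(s, a) = -(a - s)^2 and mu(s|theta) = theta * s, so
# d/dtheta Q(s, mu(s|theta)) = grad_a Q * grad_theta mu.

def policy_gradient(s, theta):
    a = theta * s                 # action chosen by the strategy function
    dq_da = -2.0 * (a - s)        # grad_a Q(s, a) evaluated at a = mu(s|theta)
    dmu_dtheta = s                # grad_theta mu(s|theta)
    return dq_da * dmu_dtheta     # product form on the right of formula (9)

# Analytically, Q(theta) = -(theta*s - s)^2 has derivative
# -2*(theta*s - s)*s, which at s = 2, theta = 0.5 equals 4.
print(policy_gradient(s=2.0, theta=0.5))  # 4.0
```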
Step 3.5: the network is trained with sample data; the off-policy algorithm used in training draws samples from the same sample buffer in order to minimize the correlation between samples, and at the same time a target Q-value network is used to train the neural network, i.e. the target networks are updated by means of the experience-replay mechanism and the target Q-value network method, with the slow update strategy:
θ^Q' ← τθ^Q + (1-τ)θ^Q'    (10)
θ^μ' ← τθ^μ + (1-τ)θ^μ'    (11)
In formulas (10) and (11), τ is the update rate, with τ << 1; a DDPG-based deep reinforcement learning network is thus constructed, and the neural network is convergent;
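The slow ("soft") target updates of formulas (10)-(11) amount to moving each target weight a small step toward the corresponding online weight; the weight vectors and τ below are illustrative.

```python
# Sketch of the slow target-network update of formulas (10)-(11):
# theta' <- tau * theta + (1 - tau) * theta', with tau << 1.

def soft_update(target_weights, online_weights, tau):
    """Move each target weight a fraction tau toward the online weight."""
    return [tau * w + (1.0 - tau) * wp
            for w, wp in zip(online_weights, target_weights)]

target = [0.0, 1.0]   # illustrative target-network weights theta'
online = [1.0, 3.0]   # illustrative online-network weights theta
target = soft_update(target, online, tau=0.01)
print(target)         # each target weight moves 1% toward the online weight
```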
Step 3.6: perform data dimensionality reduction and feature extraction on the road-condition photographs using the constructed deep reinforcement learning network.
4. The real-time gait planning method for a hexapod robot based on deep reinforcement learning according to claim 3, characterized in that the deep reinforcement learning network in Step 3.6 consists of two image input layers, four convolutional layers, four fully connected layers and one output layer; the image input layers are used to input the images used by the foot ends for autonomous foothold selection; the convolutional layers extract image features, i.e. the deep-level representation of the two images; the fully connected layers and the output layer are combined to form a deep network which, after training is complete, outputs the angle control command of each joint from the input terrain feature information, controlling each joint drive mechanism of the hexapod robot's legs to complete its joint-degree-of-freedom motion, thereby realizing real-time walking of the hexapod robot.
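Claim 4 fixes only the layer counts (two image inputs, four convolutional layers, four fully connected layers, one output layer); the input size, kernel sizes and strides below are hypothetical, used only to illustrate how the feature-map size shrinks through four convolutional layers:

```python
# Feature-map size bookkeeping through four convolutional layers.
# The claim specifies only layer counts; input size, kernels and strides
# here are hypothetical illustration values.

def conv_out(size, kernel, stride):
    """Output edge length of a valid (unpadded) convolution."""
    return (size - kernel) // stride + 1

size = 84                                    # hypothetical input image edge
for kernel, stride in [(8, 4), (4, 2), (3, 1), (3, 1)]:
    size = conv_out(size, kernel, stride)
    print(size)                              # 20, 9, 7, 5
```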
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710763223.7A CN107450555A (en) | 2017-08-30 | 2017-08-30 | A kind of Hexapod Robot real-time gait planing method based on deeply study |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710763223.7A CN107450555A (en) | 2017-08-30 | 2017-08-30 | A kind of Hexapod Robot real-time gait planing method based on deeply study |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107450555A true CN107450555A (en) | 2017-12-08 |
Family
ID=60493631
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710763223.7A Pending CN107450555A (en) | 2017-08-30 | 2017-08-30 | A kind of Hexapod Robot real-time gait planing method based on deeply study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107450555A (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108161934A (en) * | 2017-12-25 | 2018-06-15 | 清华大学 | A kind of method for learning to realize robot multi peg-in-hole using deeply |
CN108321795A (en) * | 2018-01-19 | 2018-07-24 | 上海交通大学 | Start-stop of generator set configuration method based on depth deterministic policy algorithm and system |
CN108536011A (en) * | 2018-03-19 | 2018-09-14 | 中山大学 | A kind of Hexapod Robot complicated landform adaptive motion control method based on deeply study |
CN108549928A (en) * | 2018-03-19 | 2018-09-18 | 清华大学 | Visual tracking method and device based on continuous moving under deeply learning guide |
CN109116854A (en) * | 2018-09-16 | 2019-01-01 | 南京大学 | A kind of robot cooperated control method of multiple groups based on intensified learning and control system |
CN109242099A (en) * | 2018-08-07 | 2019-01-18 | 中国科学院深圳先进技术研究院 | Training method, device, training equipment and the storage medium of intensified learning network |
CN109483530A (en) * | 2018-10-18 | 2019-03-19 | 北京控制工程研究所 | A kind of legged type robot motion control method and system based on deeply study |
CN109521774A (en) * | 2018-12-27 | 2019-03-26 | 南京芊玥机器人科技有限公司 | A kind of spray robot track optimizing method based on intensified learning |
CN109855616A (en) * | 2019-01-16 | 2019-06-07 | 电子科技大学 | A kind of multiple sensor robot air navigation aid based on virtual environment and intensified learning |
CN109871011A (en) * | 2019-01-15 | 2019-06-11 | 哈尔滨工业大学(深圳) | A kind of robot navigation method based on pretreatment layer and deeply study |
CN110307848A (en) * | 2019-07-04 | 2019-10-08 | 南京大学 | A kind of Mobile Robotics Navigation method |
CN110442129A (en) * | 2019-07-26 | 2019-11-12 | 中南大学 | A kind of control method and system that multiple agent is formed into columns |
CN110618678A (en) * | 2018-06-19 | 2019-12-27 | 辉达公司 | Behavioral guided path planning in autonomous machine applications |
CN110764415A (en) * | 2019-10-31 | 2020-02-07 | 清华大学深圳国际研究生院 | Gait planning method for leg movement of quadruped robot |
CN110861084A (en) * | 2019-11-18 | 2020-03-06 | 东南大学 | Four-legged robot falling self-resetting control method based on deep reinforcement learning |
CN110908384A (en) * | 2019-12-05 | 2020-03-24 | 中山大学 | Formation navigation method for distributed multi-robot collaborative unknown random maze |
CN111487864A (en) * | 2020-05-14 | 2020-08-04 | 山东师范大学 | Robot path navigation method and system based on deep reinforcement learning |
CN111667513A (en) * | 2020-06-01 | 2020-09-15 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
CN111796514A (en) * | 2019-04-09 | 2020-10-20 | 罗伯特·博世有限公司 | Controlling and monitoring a physical system based on a trained bayesian neural network |
CN112161630A (en) * | 2020-10-12 | 2021-01-01 | 北京化工大学 | AGV (automatic guided vehicle) online collision-free path planning method suitable for large-scale storage system |
CN112684794A (en) * | 2020-12-07 | 2021-04-20 | 杭州未名信科科技有限公司 | Foot type robot motion control method, device and medium based on meta reinforcement learning |
CN112859851A (en) * | 2021-01-08 | 2021-05-28 | 广州视源电子科技股份有限公司 | Multi-legged robot control system and multi-legged robot |
CN113110459A (en) * | 2021-04-20 | 2021-07-13 | 上海交通大学 | Motion planning method for multi-legged robot |
CN113406957A (en) * | 2021-05-19 | 2021-09-17 | 成都理工大学 | Mobile robot autonomous navigation method based on immune deep reinforcement learning |
WO2022048472A1 (en) * | 2020-09-07 | 2022-03-10 | 腾讯科技(深圳)有限公司 | Legged robot movement control method, apparatus and device, and medium |
CN115542913A (en) * | 2022-10-05 | 2022-12-30 | 哈尔滨理工大学 | Hexapod robot fault-tolerant free gait planning method based on geometric and physical feature map |
CN116151359A (en) * | 2022-11-29 | 2023-05-23 | 哈尔滨理工大学 | Deep neural network-based layered training method for six-foot robot driver decision model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004028757A1 (en) * | 2002-09-26 | 2004-04-08 | National Institute Of Advanced Industrial Science And Technology | Walking gait producing device for walking robot |
CN106094516A (en) * | 2016-06-08 | 2016-11-09 | 南京大学 | A kind of robot self-adapting grasping method based on deeply study |
CN106094813A (en) * | 2016-05-26 | 2016-11-09 | 华南理工大学 | It is correlated with based on model humanoid robot gait's control method of intensified learning |
CN106444780A (en) * | 2016-11-10 | 2017-02-22 | 速感科技(北京)有限公司 | Robot autonomous navigation method and system based on vision positioning algorithm |
- 2017-08-30: CN201710763223.7A patent application filed (published as CN107450555A), status Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004028757A1 (en) * | 2002-09-26 | 2004-04-08 | National Institute Of Advanced Industrial Science And Technology | Walking gait producing device for walking robot |
CN106094813A (en) * | 2016-05-26 | 2016-11-09 | 华南理工大学 | It is correlated with based on model humanoid robot gait's control method of intensified learning |
CN106094516A (en) * | 2016-06-08 | 2016-11-09 | 南京大学 | A kind of robot self-adapting grasping method based on deeply study |
CN106444780A (en) * | 2016-11-10 | 2017-02-22 | 速感科技(北京)有限公司 | Robot autonomous navigation method and system based on vision positioning algorithm |
Non-Patent Citations (3)
Title |
---|
CHANGJIU ZHOU 等: ""Reinforcement Learning with Fuzzy Evaluative Feedback for a Biped Robot"", 《PROCEEDINGS OF THE 2000 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS & AUTOMATION》 * |
TANG KAIQIANG ET AL.: ""GAIT PLANNING OF A HEXAPOD ROBOT BASED ON REINFORCEMENT LEARNING UNDER CONSTRAINTS"", 《PROCEEDINGS OF THE 18TH CHINA ANNUAL CONFERENCE ON SYSTEM SIMULATION TECHNOLOGY AND ITS APPLICATIONS》 * |
GUO ZUHUA ET AL.: ""MOTION PLANNING ALGORITHM FOR A HEXAPOD ROBOT BASED ON GLOBAL TRAJECTORY"", 《JOURNAL OF SYSTEM SIMULATION》 * |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108161934B (en) * | 2017-12-25 | 2020-06-09 | 清华大学 | Method for realizing robot multi-axis hole assembly by utilizing deep reinforcement learning |
CN108161934A (en) * | 2017-12-25 | 2018-06-15 | 清华大学 | A kind of method for learning to realize robot multi peg-in-hole using deeply |
CN108321795A (en) * | 2018-01-19 | 2018-07-24 | 上海交通大学 | Start-stop of generator set configuration method based on depth deterministic policy algorithm and system |
CN108321795B (en) * | 2018-01-19 | 2021-01-22 | 上海交通大学 | Generator set start-stop configuration method and system based on deep certainty strategy algorithm |
CN108536011A (en) * | 2018-03-19 | 2018-09-14 | 中山大学 | A kind of Hexapod Robot complicated landform adaptive motion control method based on deeply study |
CN108549928A (en) * | 2018-03-19 | 2018-09-18 | 清华大学 | Visual tracking method and device based on continuous moving under deeply learning guide |
CN108549928B (en) * | 2018-03-19 | 2020-09-25 | 清华大学 | Continuous movement-based visual tracking method and device under deep reinforcement learning guidance |
US11966838B2 (en) | 2018-06-19 | 2024-04-23 | Nvidia Corporation | Behavior-guided path planning in autonomous machine applications |
CN110618678A (en) * | 2018-06-19 | 2019-12-27 | 辉达公司 | Behavioral guided path planning in autonomous machine applications |
CN109242099A (en) * | 2018-08-07 | 2019-01-18 | 中国科学院深圳先进技术研究院 | Training method, device, training equipment and the storage medium of intensified learning network |
CN109242099B (en) * | 2018-08-07 | 2020-11-10 | 中国科学院深圳先进技术研究院 | Training method and device of reinforcement learning network, training equipment and storage medium |
CN109116854A (en) * | 2018-09-16 | 2019-01-01 | 南京大学 | A kind of robot cooperated control method of multiple groups based on intensified learning and control system |
CN109483530A (en) * | 2018-10-18 | 2019-03-19 | 北京控制工程研究所 | A kind of legged type robot motion control method and system based on deeply study |
CN109521774B (en) * | 2018-12-27 | 2023-04-07 | 南京芊玥机器人科技有限公司 | Spraying robot track optimization method based on reinforcement learning |
CN109521774A (en) * | 2018-12-27 | 2019-03-26 | 南京芊玥机器人科技有限公司 | A kind of spray robot track optimizing method based on intensified learning |
CN109871011A (en) * | 2019-01-15 | 2019-06-11 | 哈尔滨工业大学(深圳) | A kind of robot navigation method based on pretreatment layer and deeply study |
CN109855616A (en) * | 2019-01-16 | 2019-06-07 | 电子科技大学 | A kind of multiple sensor robot air navigation aid based on virtual environment and intensified learning |
CN111796514A (en) * | 2019-04-09 | 2020-10-20 | 罗伯特·博世有限公司 | Controlling and monitoring a physical system based on a trained bayesian neural network |
CN110307848A (en) * | 2019-07-04 | 2019-10-08 | 南京大学 | A kind of Mobile Robotics Navigation method |
CN110442129B (en) * | 2019-07-26 | 2021-10-22 | 中南大学 | Control method and system for multi-agent formation |
CN110442129A (en) * | 2019-07-26 | 2019-11-12 | 中南大学 | A kind of control method and system that multiple agent is formed into columns |
CN110764415A (en) * | 2019-10-31 | 2020-02-07 | 清华大学深圳国际研究生院 | Gait planning method for leg movement of quadruped robot |
CN110764415B (en) * | 2019-10-31 | 2022-04-15 | 清华大学深圳国际研究生院 | Gait planning method for leg movement of quadruped robot |
CN110861084A (en) * | 2019-11-18 | 2020-03-06 | 东南大学 | Four-legged robot falling self-resetting control method based on deep reinforcement learning |
CN110861084B (en) * | 2019-11-18 | 2022-04-05 | 东南大学 | Four-legged robot falling self-resetting control method based on deep reinforcement learning |
CN110908384A (en) * | 2019-12-05 | 2020-03-24 | 中山大学 | Formation navigation method for distributed multi-robot collaborative unknown random maze |
CN110908384B (en) * | 2019-12-05 | 2022-09-23 | 中山大学 | Formation navigation method for distributed multi-robot collaborative unknown random maze |
CN111487864A (en) * | 2020-05-14 | 2020-08-04 | 山东师范大学 | Robot path navigation method and system based on deep reinforcement learning |
CN111667513B (en) * | 2020-06-01 | 2022-02-18 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
CN111667513A (en) * | 2020-06-01 | 2020-09-15 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
WO2022048472A1 (en) * | 2020-09-07 | 2022-03-10 | 腾讯科技(深圳)有限公司 | Legged robot movement control method, apparatus and device, and medium |
CN112161630A (en) * | 2020-10-12 | 2021-01-01 | 北京化工大学 | AGV (automatic guided vehicle) online collision-free path planning method suitable for large-scale storage system |
CN112684794A (en) * | 2020-12-07 | 2021-04-20 | 杭州未名信科科技有限公司 | Foot type robot motion control method, device and medium based on meta reinforcement learning |
CN112859851A (en) * | 2021-01-08 | 2021-05-28 | 广州视源电子科技股份有限公司 | Multi-legged robot control system and multi-legged robot |
CN112859851B (en) * | 2021-01-08 | 2023-02-21 | 广州视源电子科技股份有限公司 | Multi-legged robot control system and multi-legged robot |
CN113110459A (en) * | 2021-04-20 | 2021-07-13 | 上海交通大学 | Motion planning method for multi-legged robot |
CN113406957A (en) * | 2021-05-19 | 2021-09-17 | 成都理工大学 | Mobile robot autonomous navigation method based on immune deep reinforcement learning |
CN113406957B (en) * | 2021-05-19 | 2022-07-08 | 成都理工大学 | Mobile robot autonomous navigation method based on immune deep reinforcement learning |
CN115542913A (en) * | 2022-10-05 | 2022-12-30 | 哈尔滨理工大学 | Hexapod robot fault-tolerant free gait planning method based on geometric and physical feature map |
CN115542913B (en) * | 2022-10-05 | 2023-09-12 | 哈尔滨理工大学 | Six-foot robot fault-tolerant free gait planning method based on geometric and physical feature map |
CN116151359B (en) * | 2022-11-29 | 2023-09-29 | 哈尔滨理工大学 | Deep neural network-based layered training method for six-foot robot driver decision model |
CN116151359A (en) * | 2022-11-29 | 2023-05-23 | 哈尔滨理工大学 | Deep neural network-based layered training method for six-foot robot driver decision model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107450555A (en) | A kind of Hexapod Robot real-time gait planing method based on deeply study | |
CN106444780B (en) | A kind of autonomous navigation method and system of the robot of view-based access control model location algorithm | |
JP7082416B2 (en) | Real-time 3D that expresses the real world Two-way real-time 3D interactive operation of real-time 3D virtual objects in a virtual world | |
Chen et al. | Stabilization approaches for reinforcement learning-based end-to-end autonomous driving | |
CN108227735B (en) | Method, computer readable medium and system for self-stabilization based on visual flight | |
CN110666793B (en) | Method for realizing robot square part assembly based on deep reinforcement learning | |
CN107562052A (en) | A kind of Hexapod Robot gait planning method based on deeply study | |
CN106094516A (en) | A kind of robot self-adapting grasping method based on deeply study | |
EP3547267A1 (en) | Robot control system, machine control system, robot control method, machine control method, and recording medium | |
Zhou et al. | A deep Q-network (DQN) based path planning method for mobile robots | |
CN106648116A (en) | Virtual reality integrated system based on action capture | |
CN108780325A (en) | System and method for adjusting unmanned vehicle track | |
CN105027030A (en) | Wireless wrist computing and control device and method for 3d imaging, mapping, networking and interfacing | |
CN206497423U (en) | A kind of virtual reality integrated system with inertia action trap setting | |
CN106078752A (en) | Method is imitated in a kind of anthropomorphic robot human body behavior based on Kinect | |
CN107085422A (en) | A kind of tele-control system of the multi-functional Hexapod Robot based on Xtion equipment | |
CN101610877A (en) | The method and apparatus that is used for sense of touch control | |
Jain et al. | From pixels to legs: Hierarchical learning of quadruped locomotion | |
CN113076615B (en) | High-robustness mechanical arm operation method and system based on antagonistic deep reinforcement learning | |
Felbrich et al. | Autonomous robotic additive manufacturing through distributed model‐free deep reinforcement learning in computational design environments | |
CN103991077A (en) | Robot hand controller shared control method based on force fusion | |
WO2018198909A1 (en) | Information processing device, information processing method, and program | |
Mahmoudi et al. | MRL team description paper for humanoid KidSize league of RoboCup 2019 | |
Murhij et al. | Hand gestures recognition model for Augmented reality robotic applications | |
Yoo et al. | Recent progress and development of the humanoid robot HanSaRam |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20171208 |
|