CN107450555A - A hexapod robot real-time gait planning method based on deep reinforcement learning - Google Patents

A hexapod robot real-time gait planning method based on deep reinforcement learning

Info

Publication number
CN107450555A
CN107450555A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710763223.7A
Other languages
Chinese (zh)
Inventor
唐开强
刘佳生
洪俊
孙建
侯跃南
钱勇
潘东旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201710763223.7A
Publication of CN107450555A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 - Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246 - Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0251 - Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276 - Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Electromagnetism (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a real-time gait planning method for a hexapod robot based on deep reinforcement learning. The steps include: the hexapod robot obtains environment road-condition information and formulates an overall motion trajectory; a camera captures photographs of the environment, from which the position information of the target trajectory is calculated by binocular ranging, and the calculated trajectory information is used to navigate the robot's center-of-mass motion trajectory; within the swing space of the robot's foot ends, photographs of the terrain are taken and passed through a pre-trained deep reinforcement learning network based on the deep deterministic policy gradient (DDPG) algorithm for data dimensionality reduction and feature extraction; a control strategy for the hexapod robot is derived from the feature extraction results, and the robot places its feet according to this strategy, realizing real-time walking of the hexapod robot. This gait planning method can plan in real time over complex unstructured road environments and is significant for improving the environmental adaptability of hexapod robots.

Description

A hexapod robot real-time gait planning method based on deep reinforcement learning
Technical field
The present invention relates to a method of real-time gait planning for hexapod robots, and in particular to a hexapod robot real-time gait planning method based on deep reinforcement learning.
Background technology
Robotics is a highly integrated discipline spanning materials science, mechanism theory, bionics, mechatronics, control engineering, sensor technology, artificial intelligence, and related subjects, and is an important embodiment of a nation's industrial development level and scientific strength. A multi-legged bio-inspired robot that completes gait planning autonomously is a highly intelligent mobile robot, capable of learning the external environment on its own and completing gait planning. Road-condition environments are complex and varied, and the traditional pre-programmed gait planning methods of hexapod robots have significant limitations. To improve the environmental adaptability of a hexapod robot, the robot needs to complete basic task functions such as overall mobile navigation, center-of-mass trajectory planning, and foothold selection. By fusing satellite navigation with multi-sensor information, a multi-legged robot can apply machine learning (such as deep learning and reinforcement learning), in particular learning from the experience gained by interacting with the external environment so as to improve task performance, thereby realizing functions such as perception, decision-making, and action. Research on hexapod robots has long attracted the attention of experts and scholars worldwide, but how to improve the mobility of hexapod robots in unstructured environments remains an open problem.
Content of the invention
The technical problem to be solved by the present invention is that existing hexapod robot gait planning technology cannot adapt to complex terrain environments, to long-distance autonomous walking, or to situations in which the final position is not fixed.
To solve the above technical problem, the invention provides a hexapod robot real-time gait planning method based on deep reinforcement learning, comprising the following steps:
Step 1, the hexapod robot obtains environment road-condition information from a satellite map, and formulates an overall motion trajectory according to the environment road-condition information;
Step 2, the hexapod robot uses a camera mounted on the fuselage to obtain photographs of the surrounding environment, calculates the target position information of the motion trajectory from the surrounding-environment photographs using binocular ranging, and plans the robot's center-of-mass motion trajectory according to the target position information of the motion trajectory;
Step 3, the hexapod robot moves according to the robot's center-of-mass motion trajectory; within the swing space of each leg's foot end, the camera on the fuselage photographs the road-condition environment, and a pre-trained deep reinforcement learning network based on DDPG performs data dimensionality reduction and feature extraction on the road-condition photographs;
Step 4, the hexapod robot derives the control strategy of the hexapod robot from the data dimensionality reduction and feature extraction results, and controls each joint actuator of the hexapod robot according to the control strategy to complete the joint degree-of-freedom motions, thereby realizing real-time gait planning and walking of the hexapod robot.
As a further refinement of the present invention, the specific steps in Step 2 for calculating the real-time position information of the motion trajectory from the photographs using binocular ranging are:
Step 2.1, obtain the focal length f of the cameras, the center distance T_x between the left and right cameras, and the physical distances x_l and x_r from the projections of the target point on the motion trajectory onto the left and right image planes to the left edge of the respective image plane; the left and right image planes corresponding to the two cameras are rectangular and lie on the same imaging plane, and the optical centers of the two cameras project onto the centers of the respective image planes, i.e. O_l and O_r are the projection points on the imaging plane; the disparity d is then:
d = x_l - x_r      (1)
Step 2.2, using the principle of similar triangles, establish the reprojection matrix Q:

Q = [ 1   0    0        -c_x
      0   1    0        -c_y
      0   0    0          f
      0   0   -1/T_x    (c_x - c_x')/T_x ]      (2)

Q [x, y, d, 1]^T = [x - c_x, y - c_y, f, (-d + c_x - c_x')/T_x]^T = [X, Y, Z, W]^T      (3)

In formulas (2) and (3), (X, Y, Z) are the coordinates of the target point in the three-dimensional coordinate system with the optical center of the left camera as origin, W is the rotation-translation scale coefficient, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are the offsets of the origin between the coordinate systems of the left and right image planes and the three-dimensional coordinate system, and c_x' is the corrected value of c_x;
Step 2.3, the spatial distance from the target point to the imaging plane is obtained by dividing the homogeneous coordinates of formula (3) by W; taking c_x ≈ c_x', the depth reduces to the classical binocular relation:

Z_depth = f · T_x / d      (4)
Taking the optical-center position of the left camera as the robot's position, the coordinate position information (X, Y, Z) of the target point is used as the target position information of the motion trajectory.
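As a concrete illustration, the triangulation of Steps 2.1-2.3 can be sketched as follows. All numeric values (focal length, baseline, principal point, and pixel coordinates) are hypothetical, chosen only to exercise formulas (1)-(4); the patent does not specify them.

```python
# Minimal sketch of the binocular ranging of Steps 2.1-2.3, applying the
# reprojection matrix Q of formula (2) to one image point.
f = 700.0                 # focal length, in pixels (hypothetical)
Tx = 0.12                 # center distance of the two cameras, in meters
cx, cy = 320.0, 240.0     # principal point of the image planes
cx_p = cx                 # c_x' ~= c_x, the approximation used in the patent

x_l, x_r = 400.0, 358.0   # projections of the target point, left/right image
d = x_l - x_r             # disparity, formula (1)

# Reprojection matrix Q of formula (2)
Q = [
    [1.0, 0.0, 0.0,        -cx],
    [0.0, 1.0, 0.0,        -cy],
    [0.0, 0.0, 0.0,          f],
    [0.0, 0.0, -1.0 / Tx,  (cx - cx_p) / Tx],
]

# Formula (3): Q @ [x, y, d, 1]^T = [X, Y, Z, W]^T
x, y = x_l, 260.0
v = [x, y, d, 1.0]
X, Y, Z, W = [sum(q * vi for q, vi in zip(row, v)) for row in Q]

# The Euclidean depth is |Z / W|, which with c_x ~= c_x' reduces to the
# classical relation of formula (4): depth = f * Tx / d
depth = f * Tx / d
assert abs(abs(Z / W) - depth) < 1e-9
```

Dividing (X, Y, Z) by W yields the Euclidean coordinates of the target point in the left camera's frame, which is the target position information used to navigate the center-of-mass trajectory.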
As a further refinement of the present invention, the specific steps in Step 3 for performing data dimensionality reduction and feature extraction on the road-condition photographs with the pre-trained DDPG-based deep reinforcement learning network are:
Step 3.1, the process by which the target foot end autonomously selects a foothold satisfies the Markov property required by reinforcement learning; the combination of observations and actions before time t is:

s_t = (x_1, a_1, ..., a_{t-1}, x_t) = x_t      (5)

In formula (5), x_t and a_t are, respectively, the observation at time t and the action taken;
Step 3.2, the expected return of the autonomous foothold-selection process is described by the action-value function:

Q^π(s_t, a_t) = E[R_t | s_t, a_t]      (6)

In formula (6), R_t = Σ_{i=t}^{T} γ^{i-t} r(s_i, a_i) is the total discounted future return obtained from time t onward, γ ∈ [0, 1] is the discount factor, r(s_t, a_t) is the reward function at time t, T is the time at which the autonomous foothold selection ends, and π is the autonomous foothold-selection policy;
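As a minimal illustration of the return defined under formula (6), the total discounted return R_t can be computed from a reward sequence as follows; the reward values and discount factor are hypothetical.

```python
# Minimal sketch of the discounted return of formula (6):
# R_t = sum_{i=t}^{T} gamma^(i-t) * r(s_i, a_i).
def discounted_return(rewards, gamma):
    """Total discounted return of a reward sequence r_t, ..., r_T."""
    total = 0.0
    for i, r in enumerate(rewards):   # i = 0 corresponds to time t
        total += (gamma ** i) * r
    return total

rewards = [1.0, 0.0, 2.0]   # r_t, r_{t+1}, r_{t+2} (hypothetical values)
R_t = discounted_return(rewards, gamma=0.5)   # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```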
Since the target policy π for autonomous foothold selection is deterministic by design, it is written as a function μ: S → A, where S is the state space and A is the N-dimensional action space. Processing formula (6) with the Bellman equation gives:

Q^μ(s_t, a_t) = E_{s_{t+1} ~ E} [ r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1})) ]      (7)

where s_{t+1} ~ E indicates that the observation at time t+1 is obtained from the environment E, and μ(s_{t+1}) denotes the action to which the observation at time t+1 is mapped by the function μ;
Step 3.3, using the principle of maximum likelihood estimation, the critic network Q(s, a | θ^Q) with weight parameters θ^Q is updated by minimizing a loss function; the loss function used is:

L(θ^Q) = E_{μ'} [ (Q(s_t, a_t | θ^Q) - y_t)^2 ]      (8)

In formula (8), y_t = r(s_t, a_t) + γ Q(s_{t+1}, μ(s_{t+1}) | θ^Q) is the target value produced by the target networks, and μ' is the target policy;
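A minimal numeric sketch of the loss of formula (8) on a single transition, with plain numbers standing in for the critic's network outputs (all values hypothetical):

```python
# Sketch of the critic loss of formula (8): y_t is the TD target, and the
# loss is the squared error between the critic's estimate and y_t.
gamma = 0.99
r_t = 0.5                       # reward r(s_t, a_t)
q_next = 2.0                    # stand-in for Q(s_{t+1}, mu(s_{t+1}) | theta^Q)
y_t = r_t + gamma * q_next      # TD target used in formula (8)

q_est = 2.3                     # stand-in for Q(s_t, a_t | theta^Q)
loss = (q_est - y_t) ** 2       # squared TD error to be minimized
```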
Step 3.4, for the actor (policy) function μ(s | θ^μ) with parameters θ^μ, the gradient obtained by the chain rule is:

∇_{θ^μ} J ≈ E_{μ'} [ ∇_a Q(s, a | θ^Q) |_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s | θ^μ) |_{s=s_t} ]      (9)

The gradient computed by formula (9) is the policy gradient, which is then used to update the policy function μ(s | θ^μ);
Step 3.5, the network is trained with sample data; because the algorithm is off-policy, the training samples are drawn from a replay buffer so as to minimize the correlation between samples, and the networks are trained against a target Q-value network, i.e. the target networks are updated using the experience replay mechanism together with the target Q-value network method; the slow update strategy used is:

θ^{Q'} ← τ θ^Q + (1 - τ) θ^{Q'}      (10)

θ^{μ'} ← τ θ^μ + (1 - τ) θ^{μ'}      (11)

In formulas (10) and (11), τ is the update rate, with τ << 1; this completes the construction of the DDPG-based deep reinforcement learning network, which is a convergent neural network;
Step 3.6, the constructed deep reinforcement learning network performs data dimensionality reduction and feature extraction on the road-condition photographs.
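The soft target-network update of formulas (10) and (11) can be sketched as follows, with network weights represented as plain lists of floats (all values hypothetical):

```python
# Sketch of the slow ("soft") target update of formulas (10)-(11):
# theta' <- tau * theta + (1 - tau) * theta'.
def soft_update(target, online, tau):
    """Move each target parameter a fraction tau toward the online one."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]

theta_q = [1.0, -2.0]          # online critic weights theta^Q (hypothetical)
theta_q_target = [0.0, 0.0]    # target critic weights theta^Q'
tau = 0.01                     # update rate, tau << 1

theta_q_target = soft_update(theta_q_target, theta_q, tau)
# The target network tracks the online network slowly, which keeps the
# TD targets of formula (8) stable and promotes convergence.
```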
As a further refinement of the present invention, the deep reinforcement learning network in Step 3.6 consists of two image input layers, four convolutional layers, four fully connected layers, and one output layer. The image input layers receive the images used for autonomous foothold selection by the foot ends; the convolutional layers extract image features, i.e. the deep-level representation of the two images; the fully connected layers and the output layer together form a deep network. After training is complete, given input terrain features, the network outputs angle control commands for each joint, i.e. it drives each joint actuator of the hexapod robot's legs to complete the joint degree-of-freedom motions, thereby realizing real-time walking of the hexapod robot.
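For illustration only, the following sketch computes how four convolutional layers progressively shrink an input image before the fully connected layers; the input size, kernel sizes, and strides are assumptions, since the patent does not specify the network's hyperparameters.

```python
# Sketch of the dimensionality reduction performed by the four
# convolutional layers of Step 3.6, under hypothetical hyperparameters.
def conv_out(size, kernel, stride):
    """Output side length of a valid (unpadded) convolution."""
    return (size - kernel) // stride + 1

size = 84                      # side length of a square input image (assumed)
for kernel, stride in [(8, 4), (4, 2), (3, 1), (3, 1)]:
    size = conv_out(size, kernel, stride)
# 84 -> 20 -> 9 -> 7 -> 5: the flattened feature maps then feed the four
# fully connected layers, which output the joint-angle commands.
```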
The beneficial effects of the present invention are: (1) satellite navigation is applied to the planning of the hexapod robot's center-of-mass trajectory, enabling the robot to complete gait planning for long-distance autonomous walking; (2) the position information of the motion trajectory is calculated by binocular ranging, and the calculated trajectory information is used to navigate the robot's center-of-mass trajectory, realizing short-range gait planning; (3) the dual image input layers effectively support the planning of the motion trajectory and the determination of changing target points, and random sampling together with the experience replay mechanism is used during pre-training to determine the input images, so that the inputs are both mutually independent and mutually related, satisfying the neural network's requirement of mutually independent input data; (4) the target Q-value network technique continually adjusts the network's weight matrices, realizing data dimensionality reduction and promoting convergence of the neural network; (5) the pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and terrain feature extraction on the photographs of the road-condition environment and directly outputs the hexapod robot's gait control strategy, effectively alleviating the "curse of dimensionality" and realizing real-time gait planning of the hexapod robot.
Brief description of the drawings
Fig. 1 is the system structure diagram of the present invention;
Fig. 2 is the flow chart of the method of the present invention.
Embodiment
As shown in Fig. 1, the system running the hexapod robot real-time gait planning method based on deep reinforcement learning of the present invention comprises: a satellite navigation system, a machine vision and image processing system, a central control system, and a basic motion system.
The satellite navigation system consists mainly of satellite map software on the hexapod robot; after the destination is entered, it quickly completes path planning and transmits the planned path information to the central control system. The image processing system consists mainly of a camera mounted at the front of the hexapod robot and MATLAB software on an industrial computer. The central control system consists mainly of a deep reinforcement learning network based on the deep deterministic policy gradient (DDPG), pre-trained on a dynamics simulation platform on the industrial computer, together with a communication module; the experience replay mechanism and the target Q-value network together ensure that the DDPG-based deep reinforcement learning network converges during pre-training. The basic motion system consists of the hexapod robot's mechanical structure, actuators, and sensors; it executes the motion commands formulated by the central control system, drives the motors or cylinders of each joint of the hexapod robot's legs to complete the joint degree-of-freedom motions, thereby realizing real-time walking of the hexapod robot, and feeds the motion information back to the central control system.
Within a certain distance of the hexapod robot's fuselage, photographs of the environment are obtained by the camera mounted on the fuselage; the position information of the motion trajectory is then calculated from the photographs by binocular ranging, and the calculated trajectory road-condition information is used to navigate the robot's center-of-mass trajectory.
During pre-training of the neural network, the MATLAB software on the industrial computer first converts the RGB images of the environment road conditions to grayscale; the experience replay mechanism keeps the correlation between successive photographs as small as possible so as to satisfy the neural network's requirement of mutually independent input data, and random sampling is then used to obtain the images fed into the network. Deep learning realizes the data dimensionality reduction, the target Q-value network technique continually adjusts the network's weight matrices, and a convergent neural network is finally obtained.
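The experience replay mechanism described above can be sketched as a buffer with uniform random sampling; the buffer capacity, batch size, and stored transitions are hypothetical.

```python
import random

# Sketch of experience replay: transitions are stored in a buffer and
# training minibatches are drawn by uniform random sampling, which breaks
# the correlation between consecutive frames.
class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def push(self, transition):
        """Store a (state, action, reward, next_state) transition."""
        if len(self.data) >= self.capacity:
            self.data.pop(0)          # discard the oldest transition
        self.data.append(transition)

    def sample(self, batch_size):
        """Draw a decorrelated minibatch by uniform random sampling."""
        return random.sample(self.data, batch_size)

buf = ReplayBuffer(capacity=1000)
for t in range(100):                   # a hypothetical rollout
    buf.push((f"s{t}", f"a{t}", 0.0, f"s{t+1}"))
batch = buf.sample(8)                  # minibatch fed to the networks
```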
The hexapod robot moves according to the planned center-of-mass trajectory. Within the swing space of each leg's foot end, the robot's camera photographs the road-condition environment; the trained deep reinforcement learning network based on the deep deterministic policy gradient performs data dimensionality reduction and feature extraction and outputs the hexapod robot's real-time gait planning motion strategy. Finally, the control strategy is sent via the communication system to the basic motion system, which controls the robot's motion state, realizing real-time control of autonomous foothold selection along the motion trajectory.
In operation, the steps of the system are as follows:
Step 1, start the satellite map software on the hexapod robot, enter the robot's destination to complete path planning, and transmit the planned path information to the central control system;
Step 2, pre-train the deep reinforcement learning network based on the deep deterministic policy gradient (DDPG) on the dynamics simulation platform of the industrial computer, using the experience replay mechanism and the target Q-value network to ensure that the DDPG-based network converges quickly during pre-training;
Step 3, obtain photographs of the environment road conditions within a certain distance of the fuselage with the camera at the front of the robot, transmit the image information to the industrial computer via the communication module, calculate the position information of the motion trajectory from the photographs by binocular ranging, and use the calculated trajectory road-condition information to navigate the robot's center-of-mass trajectory;
Step 4, the hexapod robot moves according to the planned center-of-mass trajectory; within the swing space of each leg's foot end, the robot's camera photographs the road-condition environment, and the pre-trained DDPG-based deep reinforcement learning network performs data dimensionality reduction and feature extraction on the acquired photographs;
Step 5, derive the robot's control strategy from the feature extraction results, transmit the control information to the robot's basic motion system via the communication module, drive the motors or cylinders of each joint of the legs according to the strategy to complete the joint degree-of-freedom motions, thereby realizing real-time walking of the hexapod robot, and feed the motion information back to the central control system.
As shown in Fig. 2, the invention provides the hexapod robot real-time gait planning method based on deep reinforcement learning described above; Steps 1 to 4 and their sub-steps 2.1-2.3 and 3.1-3.6 proceed as set out in the content of the invention. Two details of the embodiment deserve note. First, in Step 2.2, the value of c_x and its corrected value c_x' are typically close; for convenience, the present invention treats them as approximately equal. Second, in the network of Step 3.6, two image input layers are used so that both the motion trajectory and the changing target points can be determined, while four convolutional layers and four fully connected layers are used so that the extracted image features (the deep-level representation of the two images, such as points, lines, and arcs) are effective and the network still converges quickly during training. After training is complete, given input terrain features, the network outputs the angle control commands for each joint, drives each joint actuator of the legs to complete the joint degree-of-freedom motions, and thereby realizes real-time walking of the hexapod robot. The remaining advantages of the method are as summarized in the beneficial effects above.

Claims (4)

1. A hexapod robot real-time gait planning method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1, the hexapod robot obtains environment road-condition information from a satellite map, and formulates an overall motion trajectory according to the environment road-condition information;
Step 2, the hexapod robot uses a camera mounted on the fuselage to obtain photographs of the surrounding environment, calculates the target position information of the motion trajectory from the surrounding-environment photographs using binocular ranging, and plans the robot's center-of-mass motion trajectory according to the target position information of the motion trajectory;
Step 3, the hexapod robot moves according to the robot's center-of-mass motion trajectory; within the swing space of each leg's foot end, the camera on the fuselage photographs the road-condition environment, and a pre-trained deep reinforcement learning network based on DDPG performs data dimensionality reduction and feature extraction on the road-condition photographs;
Step 4, the hexapod robot derives the control strategy of the hexapod robot from the data dimensionality reduction and feature extraction results, and controls each joint actuator of the hexapod robot according to the control strategy to complete the joint degree-of-freedom motions, thereby realizing real-time gait planning and walking of the hexapod robot.
2. The hexapod robot real-time gait planning method based on deep reinforcement learning according to claim 1, characterized in that the specific steps of calculating the real-time position information of the motion trajectory from the photographs using binocular ranging in Step 2 are:
Step 2.1, obtain the focal length f of the cameras, the center distance T_x between the left and right cameras, and the physical distances x_l and x_r from the projections of the target point on the motion trajectory onto the left and right image planes to the left edge of the respective image plane; the left and right image planes corresponding to the two cameras are rectangular and lie on the same imaging plane, and the optical centers of the two cameras project onto the centers of the respective image planes, i.e. O_l and O_r are the projection points on the imaging plane; the disparity d is then:
d = x_l - x_r      (1)
Step 2.2, using the principle of similar triangles, establish the matrix Q:
$$Q=\begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -\dfrac{1}{T_x} & \dfrac{c_x-c_x'}{T_x} \end{bmatrix} \qquad (2)$$
$$Q\begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix}=\begin{bmatrix} x-c_x \\ y-c_y \\ f \\ \dfrac{-d+c_x-c_x'}{T_x} \end{bmatrix}=\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} \qquad (3)$$
In formulas (2) and (3), $(X, Y, Z)$ are the coordinates of the target point in the three-dimensional coordinate system whose origin is the optical center of the left camera, $W$ is the scale coefficient of the rotation-translation transformation, $(x, y)$ are the coordinates of the target point in the left image plane, $c_x$ and $c_y$ are the offsets between the origins of the left and right image-plane coordinate systems and the three-dimensional coordinate system, and $c_x'$ is the corrected value of $c_x$;
Step 2.3: compute the spatial distance from the target point to the imaging plane:
$$Z=\frac{-T_x f}{d-(c_x-c_x')} \qquad (4)$$
Taking the optical-center position of the left camera as the robot's own position, the coordinate position information $(X, Y, Z)$ of the target point serves as the target position information of the motion trajectory.
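For illustration, the ranging procedure of equations (1)–(4) can be sketched numerically. The calibration values below (focal length, baseline, principal-point offsets) are illustrative assumptions, not values from the patent; the homogeneous output of formula (3) is divided by the scale $W$ to obtain metric coordinates, which reproduces formula (4):

```python
import numpy as np

# Assumed calibration values (illustrative only):
f, Tx = 700.0, 0.12                      # focal length (px), camera baseline (m)
cx, cy, cx_prime = 320.0, 240.0, 320.0   # principal-point offsets, cx' = corrected cx

# Reprojection matrix of equation (2):
Q = np.array([
    [1.0, 0.0, 0.0,      -cx],
    [0.0, 1.0, 0.0,      -cy],
    [0.0, 0.0, 0.0,        f],
    [0.0, 0.0, -1.0 / Tx, (cx - cx_prime) / Tx],
])

def triangulate(x, y, xl, xr):
    """Map a left-image pixel (x, y) with disparity d = xl - xr to 3-D coordinates."""
    d = xl - xr                                   # equation (1)
    X, Y, Z, W = Q @ np.array([x, y, d, 1.0])     # equation (3), homogeneous
    return X / W, Y / W, Z / W                    # divide out the scale coefficient W

X, Y, Z = triangulate(400.0, 260.0, 400.0, 330.0)
# Z agrees with equation (4): Z = -Tx * f / (d - (cx - cx_prime))
```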
3. The hexapod robot real-time gait planning method based on deep reinforcement learning according to claim 1 or 2, characterized in that performing data dimensionality reduction and feature extraction on the road-condition photographs with the pre-trained DDPG-based deep reinforcement learning network in step 3 comprises the following steps:
Step 3.1: the autonomous foothold-selection process of the target foot tip satisfies the conditions of reinforcement learning and has the Markov property; the set of observations and actions before time $t$ is:
$$s_t=(x_1,a_1,\dots,a_{t-1},x_t)=x_t \qquad (5)$$
In formula (5), $x_t$ and $a_t$ are, respectively, the observation at time $t$ and the action taken;
Step 3.2: the expected return of the autonomous foothold-selection process of the foot tip is described by the action-value function:
$$Q^{\pi}(s_t,a_t)=E[R_t\mid s_t,a_t] \qquad (6)$$
In formula (6), $R_t=\sum_{i=t}^{T}\gamma^{\,i-t}\,r(s_i,a_i)$ is the discounted sum of future returns obtained from time $t$ onward, $\gamma\in[0,1]$ is the discount factor, $r(s_t,a_t)$ is the reward function at time $t$, $T$ is the time at which autonomous foothold selection terminates, and $\pi$ is the autonomous foothold-selection policy of the foot tip;
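For illustration, the discounted return $R_t$ underlying formula (6) can be computed directly from a reward sequence; the reward values below are illustrative only:

```python
def discounted_return(rewards, gamma):
    """R_t = sum over i >= t of gamma**(i - t) * r_i, evaluated at t = 0."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

# Illustrative rewards for three successive foothold selections:
R0 = discounted_return([1.0, 0.0, 2.0], gamma=0.5)   # 1 + 0 + 0.25 * 2 = 1.5
```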
Since the target policy $\pi$ for autonomous foothold selection is deterministic, it is written as a function $\mu: S \to A$, where $S$ is the state space and $A$ is the $N$-dimensional action space; applying the Bellman equation to formula (6) gives:
$$Q^{\mu}(s_t,a_t)=E_{s_{t+1}\sim E}\left[\,r(s_t,a_t)+\gamma\,Q^{\mu}\big(s_{t+1},\mu(s_{t+1})\big)\right] \qquad (7)$$
where $s_{t+1}\sim E$ denotes that the observation at time $t+1$ is obtained from the environment $E$, and $\mu(s_{t+1})$ is the action to which the observation at time $t+1$ is mapped by the function $\mu$;
Step 3.3: following the principle of maximum likelihood estimation, the policy evaluation network $Q(s,a\mid\theta^{Q})$ with network weight parameters $\theta^{Q}$ is updated by minimizing a loss function; the loss function used is:
$$L(\theta^{Q})=E_{\mu'}\big[(Q(s_t,a_t\mid\theta^{Q})-y_t)^2\big] \qquad (8)$$
In formula (8), $y_t=r(s_t,a_t)+\gamma\,Q\big(s_{t+1},\mu(s_{t+1})\mid\theta^{Q}\big)$ is the target for the policy evaluation network, and $\mu'$ is the target policy;
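For illustration, the target value $y_t$ and the squared-error loss of formula (8) can be sketched with stand-in functions for the actor and critic; the linear "networks" and the single-transition batch below are illustrative assumptions, not the patent's deep networks:

```python
def td_target(r, s_next, gamma, mu, Q_target):
    """y_t = r(s_t, a_t) + gamma * Q(s_{t+1}, mu(s_{t+1})), as in formula (8)."""
    return r + gamma * Q_target(s_next, mu(s_next))

def critic_loss(batch, gamma, mu, Q, Q_target):
    """L = mean over the batch of (Q(s_t, a_t) - y_t)**2."""
    errs = [Q(s, a) - td_target(r, s2, gamma, mu, Q_target)
            for (s, a, r, s2) in batch]
    return sum(e * e for e in errs) / len(errs)

# Illustrative stand-ins for the actor and critic networks:
mu = lambda s: 0.5 * s
Q = lambda s, a: s + a
Q_target = Q                                  # target network, here shared for brevity
batch = [(1.0, 0.5, 1.0, 2.0)]                # one (s_t, a_t, r_t, s_{t+1}) transition
loss = critic_loss(batch, 0.9, mu, Q, Q_target)
```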
Step 3.4: for the actual policy function $\mu(s\mid\theta^{\mu})$ with parameters $\theta^{\mu}$, the gradient obtained by the chain rule is:
$$\nabla_{\theta^{\mu}}\mu \approx E_{\mu'}\Big[\nabla_{\theta^{\mu}}Q(s,a\mid\theta^{Q})\big|_{s=s_t,\,a=\mu(s_t\mid\theta^{\mu})}\Big] = E_{\mu'}\Big[\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_t,\,a=\mu(s_t)}\;\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_t}\Big] \qquad (9)$$
The gradient computed by formula (9) is the policy gradient, which is then used to update the policy function $\mu(s\mid\theta^{\mu})$;
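For illustration, the chain-rule factorization in formula (9) (the gradient of $Q$ with respect to the actor parameters equals $\nabla_a Q$ times $\nabla_{\theta^\mu}\mu$) can be checked numerically with finite differences; the quadratic critic and linear actor below are illustrative stand-ins:

```python
# Illustrative one-dimensional stand-ins for the critic and the actor:
Q = lambda s, a: -(a - 2.0 * s) ** 2      # critic: peaked at a = 2s
mu = lambda s, theta: theta * s           # actor: linear in the parameter theta

def policy_gradient(s, theta, eps=1e-6):
    """Finite-difference version of formula (9): (dQ/da) * (dmu/dtheta)."""
    a = mu(s, theta)
    dQ_da = (Q(s, a + eps) - Q(s, a - eps)) / (2 * eps)               # grad_a Q
    dmu_dtheta = (mu(s, theta + eps) - mu(s, theta - eps)) / (2 * eps)  # grad_theta mu
    return dQ_da * dmu_dtheta

g = policy_gradient(s=1.0, theta=1.0)
# analytic check: dQ/da = -2(a - 2s) = 2 at a = 1, s = 1; dmu/dtheta = s = 1
```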
Step 3.5: train the network with sample data. Because an off-policy algorithm is used, training samples are all drawn from the same replay buffer so as to minimize the correlation between samples, and the neural network is trained with a target Q-value network, i.e. the target networks are updated by the experience replay mechanism and the target Q-value network method. The slow (soft) update rule used is:
$$\theta^{Q'}\leftarrow\tau\theta^{Q}+(1-\tau)\theta^{Q'} \qquad (10)$$
$$\theta^{\mu'}\leftarrow\tau\theta^{\mu}+(1-\tau)\theta^{\mu'} \qquad (11)$$
In formulas (10) and (11), $\tau$ is the update rate, with $\tau\ll 1$. This constructs the DDPG-based deep reinforcement learning network, and the resulting neural network is convergent;
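For illustration, the soft target-network updates (10) and (11) can be applied parameter by parameter; the parameter vectors and the value of τ below are illustrative only:

```python
def soft_update(target_params, source_params, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta', as in formulas (10) and (11)."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_params, source_params)]

# Illustrative target and online parameter vectors:
theta_target = soft_update(target_params=[0.0, 1.0],
                           source_params=[1.0, 0.0],
                           tau=0.1)
```

Because τ ≪ 1, the target networks track the online networks slowly, which is what stabilizes the training described in step 3.5.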
Step 3.6: perform data dimensionality reduction and feature extraction on the road-condition photographs with the constructed deep reinforcement learning network.
4. The hexapod robot real-time gait planning method based on deep reinforcement learning according to claim 3, characterized in that the deep reinforcement learning network in step 3.6 consists of two image input layers, four convolutional layers, four fully connected layers and one output layer. The image input layers receive the images used by the foot tips to autonomously select footholds; the convolutional layers extract image features, i.e. the deep representation of the two images; the fully connected layers and the output layer together form a deep network. After training is complete, given input terrain features the network outputs the angle control command for each joint, driving each leg-joint actuator of the hexapod robot to complete the joint degree-of-freedom motions and thereby realizing real-time walking of the hexapod robot.
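Claim 4 fixes only the layer counts (two image inputs, four convolutional layers, four fully connected layers, one output layer). A structural sketch follows, in which every layer size and the 18-joint output (6 legs × 3 joints, a common hexapod configuration) are assumptions not stated in the patent:

```python
# Structural sketch of the claim-4 network; only the layer COUNTS come from
# the claim -- filter sizes, widths, and the joint count are assumptions.
NETWORK = {
    "image_inputs": 2,                   # left/right road-condition photographs
    "conv_layers": [(32, 8, 4), (64, 4, 2), (64, 3, 1), (64, 3, 1)],
    #                ^ (filters, kernel, stride) per layer, assumed values
    "fc_layers": [512, 256, 128, 64],    # widths assumed
    "outputs": 18,                       # joint-angle commands: 6 legs x 3 joints (assumed)
}

def total_layers(net):
    """Count layers exactly as the claim enumerates them: 2 + 4 + 4 + 1."""
    return (net["image_inputs"] + len(net["conv_layers"])
            + len(net["fc_layers"]) + 1)
```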
CN201710763223.7A 2017-08-30 2017-08-30 A kind of Hexapod Robot real-time gait planing method based on deeply study Pending CN107450555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710763223.7A CN107450555A (en) 2017-08-30 2017-08-30 A kind of Hexapod Robot real-time gait planing method based on deeply study

Publications (1)

Publication Number Publication Date
CN107450555A true CN107450555A (en) 2017-12-08

Family

ID=60493631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710763223.7A Pending CN107450555A (en) 2017-08-30 2017-08-30 A kind of Hexapod Robot real-time gait planing method based on deeply study

Country Status (1)

Country Link
CN (1) CN107450555A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108161934A (en) * 2017-12-25 2018-06-15 清华大学 A kind of method for learning to realize robot multi peg-in-hole using deeply
CN108321795A (en) * 2018-01-19 2018-07-24 上海交通大学 Start-stop of generator set configuration method based on depth deterministic policy algorithm and system
CN108536011A (en) * 2018-03-19 2018-09-14 中山大学 A kind of Hexapod Robot complicated landform adaptive motion control method based on deeply study
CN108549928A (en) * 2018-03-19 2018-09-18 清华大学 Visual tracking method and device based on continuous moving under deeply learning guide
CN109116854A (en) * 2018-09-16 2019-01-01 南京大学 A kind of robot cooperated control method of multiple groups based on intensified learning and control system
CN109242099A (en) * 2018-08-07 2019-01-18 中国科学院深圳先进技术研究院 Training method, device, training equipment and the storage medium of intensified learning network
CN109483530A (en) * 2018-10-18 2019-03-19 北京控制工程研究所 A kind of legged type robot motion control method and system based on deeply study
CN109521774A (en) * 2018-12-27 2019-03-26 南京芊玥机器人科技有限公司 A kind of spray robot track optimizing method based on intensified learning
CN109855616A (en) * 2019-01-16 2019-06-07 电子科技大学 A kind of multiple sensor robot air navigation aid based on virtual environment and intensified learning
CN109871011A (en) * 2019-01-15 2019-06-11 哈尔滨工业大学(深圳) A kind of robot navigation method based on pretreatment layer and deeply study
CN110307848A (en) * 2019-07-04 2019-10-08 南京大学 A kind of Mobile Robotics Navigation method
CN110442129A (en) * 2019-07-26 2019-11-12 中南大学 A kind of control method and system that multiple agent is formed into columns
CN110618678A (en) * 2018-06-19 2019-12-27 辉达公司 Behavioral guided path planning in autonomous machine applications
CN110764415A (en) * 2019-10-31 2020-02-07 清华大学深圳国际研究生院 Gait planning method for leg movement of quadruped robot
CN110861084A (en) * 2019-11-18 2020-03-06 东南大学 Four-legged robot falling self-resetting control method based on deep reinforcement learning
CN110908384A (en) * 2019-12-05 2020-03-24 中山大学 Formation navigation method for distributed multi-robot collaborative unknown random maze
CN111487864A (en) * 2020-05-14 2020-08-04 山东师范大学 Robot path navigation method and system based on deep reinforcement learning
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111796514A (en) * 2019-04-09 2020-10-20 罗伯特·博世有限公司 Controlling and monitoring a physical system based on a trained bayesian neural network
CN112161630A (en) * 2020-10-12 2021-01-01 北京化工大学 AGV (automatic guided vehicle) online collision-free path planning method suitable for large-scale storage system
CN112684794A (en) * 2020-12-07 2021-04-20 杭州未名信科科技有限公司 Foot type robot motion control method, device and medium based on meta reinforcement learning
CN112859851A (en) * 2021-01-08 2021-05-28 广州视源电子科技股份有限公司 Multi-legged robot control system and multi-legged robot
CN113110459A (en) * 2021-04-20 2021-07-13 上海交通大学 Motion planning method for multi-legged robot
CN113406957A (en) * 2021-05-19 2021-09-17 成都理工大学 Mobile robot autonomous navigation method based on immune deep reinforcement learning
WO2022048472A1 (en) * 2020-09-07 2022-03-10 腾讯科技(深圳)有限公司 Legged robot movement control method, apparatus and device, and medium
CN115542913A (en) * 2022-10-05 2022-12-30 哈尔滨理工大学 Hexapod robot fault-tolerant free gait planning method based on geometric and physical feature map
CN116151359A (en) * 2022-11-29 2023-05-23 哈尔滨理工大学 Deep neural network-based layered training method for six-foot robot driver decision model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004028757A1 (en) * 2002-09-26 2004-04-08 National Institute Of Advanced Industrial Science And Technology Walking gait producing device for walking robot
CN106094516A (en) * 2016-06-08 2016-11-09 南京大学 A kind of robot self-adapting grasping method based on deeply study
CN106094813A (en) * 2016-05-26 2016-11-09 华南理工大学 It is correlated with based on model humanoid robot gait's control method of intensified learning
CN106444780A (en) * 2016-11-10 2017-02-22 速感科技(北京)有限公司 Robot autonomous navigation method and system based on vision positioning algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHANGJIU ZHOU et al.: "Reinforcement Learning with Fuzzy Evaluative Feedback for a Biped Robot", Proceedings of the 2000 IEEE International Conference on Robotics & Automation *
TANG Kaiqiang et al.: "Gait planning for a hexapod robot based on reinforcement learning under constraints" (in Chinese), Proceedings of the 18th China Annual Conference on System Simulation Technology and its Applications *
GUO Zuhua et al.: "Motion planning algorithm for a hexapod robot based on global trajectory" (in Chinese), Journal of System Simulation *


Similar Documents

Publication Publication Date Title
CN107450555A (en) A kind of Hexapod Robot real-time gait planing method based on deeply study
CN106444780B (en) A kind of autonomous navigation method and system of the robot of view-based access control model location algorithm
JP7082416B2 (en) Real-time 3D that expresses the real world Two-way real-time 3D interactive operation of real-time 3D virtual objects in a virtual world
Chen et al. Stabilization approaches for reinforcement learning-based end-to-end autonomous driving
CN108227735B (en) Method, computer readable medium and system for self-stabilization based on visual flight
CN110666793B (en) Method for realizing robot square part assembly based on deep reinforcement learning
CN107562052A (en) A kind of Hexapod Robot gait planning method based on deeply study
CN106094516A (en) A kind of robot self-adapting grasping method based on deeply study
EP3547267A1 (en) Robot control system, machine control system, robot control method, machine control method, and recording medium
Zhou et al. A deep Q-network (DQN) based path planning method for mobile robots
CN106648116A (en) Virtual reality integrated system based on action capture
CN108780325A (en) System and method for adjusting unmanned vehicle track
CN105027030A (en) Wireless wrist computing and control device and method for 3d imaging, mapping, networking and interfacing
CN206497423U (en) A kind of virtual reality integrated system with inertia action trap setting
CN106078752A (en) Method is imitated in a kind of anthropomorphic robot human body behavior based on Kinect
CN107085422A (en) A kind of tele-control system of the multi-functional Hexapod Robot based on Xtion equipment
CN101610877A (en) The method and apparatus that is used for sense of touch control
Jain et al. From pixels to legs: Hierarchical learning of quadruped locomotion
CN113076615B (en) High-robustness mechanical arm operation method and system based on antagonistic deep reinforcement learning
Felbrich et al. Autonomous robotic additive manufacturing through distributed model‐free deep reinforcement learning in computational design environments
CN103991077A (en) Robot hand controller shared control method based on force fusion
WO2018198909A1 (en) Information processing device, information processing method, and program
Mahmoudi et al. MRL team description paper for humanoid KidSize league of RoboCup 2019
Murhij et al. Hand gestures recognition model for Augmented reality robotic applications
Yoo et al. Recent progress and development of the humanoid robot HanSaRam

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171208