CN106950969A - Continuous control method for a mobile robot based on a mapless motion planner - Google Patents
- Publication number
- CN106950969A CN106950969A CN201710294685.9A CN201710294685A CN106950969A CN 106950969 A CN106950969 A CN 106950969A CN 201710294685 A CN201710294685 A CN 201710294685A CN 106950969 A CN106950969 A CN 106950969A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0219—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Manipulator (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The present invention proposes a continuous control method for a mobile robot based on a mapless motion planner. Its main contents include: the mapless motion planner, the asynchronous deep deterministic policy gradient, reinforcement learning, the critic network, and the reward function. The process is as follows: the mapless motion planner is trained end to end, and a transfer function is defined for it so that the control frequency is guaranteed; the original deep deterministic policy gradient is modified into an asynchronous deep deterministic policy gradient; reinforcement learning is carried out, with training and sample collection executed in parallel; the motion planner is evaluated with the critic network, and a reward function is defined that checks whether the target has been reached. The present invention uses a high-precision laser range sensor and can compute paths accurately and efficiently; it demonstrates that, without any manual design or prior map, a feasible optimized path can be found efficiently, navigating the robot to the target position without colliding with obstacles in the environment.
Description
Technical field
The present invention relates to the field of robot control, and in particular to a continuous control method for a mobile robot based on a mapless motion planner.
Background technology
With the development of science and technology, mobile robot navigation has increasingly become one of the hot research topics in robotics and artificial intelligence, and is also an embodiment of the intelligence level of a fully autonomous robot. Ideally, when working in an unknown environment, a mobile robot can obtain local environmental information from its own sensors, build a map of the environment autonomously, and, according to the map thus built, plan a feasible collision-free path to the destination. In this way mobile robots can be applied to fields such as everyday navigation and path planning, bringing convenience to people's travel and work. However, traditional methods realize navigation by simultaneous localization and mapping (SLAM), which is not only time-consuming but also strongly dependent on the map.
The present invention proposes a continuous control method for a mobile robot based on a mapless motion planner. The mapless motion planner is trained end to end, and a transfer function is defined for it so that the control frequency is guaranteed and the robot can react to new observations immediately. The original deep deterministic policy gradient is modified into an asynchronous deep deterministic policy gradient; reinforcement learning is carried out, with training and sample collection executed in parallel; the motion planner is evaluated with the critic network, and a reward function is defined that checks whether the target has been reached. The present invention uses a high-precision laser range sensor and can compute paths accurately and efficiently; it demonstrates that, without any manual design or prior map, a feasible optimized path can be found efficiently, navigating the robot to the target position without colliding with obstacles in the environment.
The content of the invention
In view of problems such as the time cost of navigation, the object of the present invention is to provide a continuous control method for a mobile robot based on a mapless motion planner: the planner is trained end to end, and a transfer function is defined for it so that the control frequency is guaranteed and the robot can react to new observations immediately; the original deep deterministic policy gradient is modified into an asynchronous deep deterministic policy gradient; reinforcement learning is carried out, with training and sample collection executed in parallel; the motion planner is evaluated with the critic network, and a reward function is defined that checks whether the target has been reached.
To solve the above problems, the present invention provides a continuous control method for a mobile robot based on a mapless motion planner, whose main contents include:
(1) the mapless motion planner;
(2) the asynchronous deep deterministic policy gradient;
(3) reinforcement learning;
(4) the critic network;
(5) the reward function.
In the described continuous control method for a mobile robot based on a mapless motion planner, only 10-dimensional range findings and the position of the target relative to the robot are extracted as references; the mapless motion planner is trained end to end from scratch by an asynchronous deep reinforcement learning method, and can directly output continuous linear and angular velocities.
The described mapless motion planner takes the 10-dimensional range findings and the target position as input and continuous steering commands as output. The planner is trained end to end and can be applied directly in both virtual and real environments; it can navigate the mobile robot to the desired target without colliding with any obstacle.
Further, regarding the described transfer function, a transfer function is defined for the mapless motion planner:
v_t = f(x_t, p_t, v_{t-1}) (1)
where x_t is the observation of the raw sensor data, p_t is the relative position of the target, and v_{t-1} is the velocity of the mobile robot in the last time step. Together they can be regarded as the instantaneous state of the mobile robot. The model maps the state directly to an action, namely the next velocity v_t. An effective motion planner must guarantee the control frequency, so that the robot can react to new observations immediately.
In the described asynchronous deep deterministic policy gradient, compared with the original deep deterministic policy gradient, the sampling process is separated into another thread. In the training thread, each iteration step updates the weights of the critic network θ^Q and the actor network θ^u from a batch collected from the replay buffer. The prediction target of the critic network is computed from the reward r_i and the estimated Q-value γQ′, where Q′ is the output of the target critic network with weights θ^{Q′} for the next state s_{t+1}, taking as input the optimal action a_{t+1} = u′(s_{t+1} | θ^{u′}) estimated by the target actor network.
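The critic's prediction target described above can be written down directly. This is a minimal sketch with hypothetical names and an assumed discount factor; the stand-in target networks are placeholders for the target actor u′ and target critic Q′:

```python
GAMMA = 0.99  # discount factor gamma (hypothetical value, not given in the text)

def critic_target(r_i, s_next, target_actor, target_critic, done):
    """Prediction target for the critic: the reward plus the discounted
    Q' of the next state, evaluated at the target actor's action.
    Terminal transitions contribute only the reward."""
    if done:
        return r_i
    a_next = target_actor(s_next)                       # a_{t+1} = u'(s_{t+1} | theta^{u'})
    return r_i + GAMMA * target_critic(s_next, a_next)  # r_i + gamma * Q'(s_{t+1}, a_{t+1})

# Stand-in target networks for illustration:
actor = lambda s: [0.5, 0.0]
critic = lambda s, a: 1.0
y = critic_target(0.1, [0.0] * 14, actor, critic, done=False)
```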
Further, regarding the described sample collection, the actor network is updated by the policy gradient over a sampled batch of transitions. The sample-collection thread runs in parallel, with actions decided by the actor network; during training, a random process N is added to the actions to encourage exploration of the action space. New transitions are saved into the replay buffer shared by the training and sampling threads. The asynchronous deep deterministic policy gradient can also be realized with multiple data-collection threads, as in other asynchronous methods. Where the original deep deterministic policy gradient collects one sample per back-propagation iteration, the parallel asynchronous version collects many more samples in each step.
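A minimal sketch of the parallel collection side, assuming Gaussian exploration noise for the random process N (the text does not specify its form) and a toy environment; all names are illustrative:

```python
import random
import threading
from collections import deque

buffer = deque(maxlen=10000)   # replay buffer shared by training and sampling threads
lock = threading.Lock()

def sample_thread(env_step, actor, n_steps):
    """Collection thread: the actor picks an action, exploration noise N
    is added, and each transition (s, a, r, s', done) is pushed into the
    shared replay buffer while the training thread consumes batches."""
    s = [0.0] * 14
    for _ in range(n_steps):
        a = actor(s)
        a = [x + random.gauss(0.0, 0.1) for x in a]   # random process N
        s2, r, done = env_step(s, a)
        with lock:
            buffer.append((s, a, r, s2, done))
        s = [0.0] * 14 if done else s2

# Toy environment and untrained actor, for illustration only:
env = lambda s, a: (s, 0.0, False)
t = threading.Thread(target=sample_thread, args=(env, lambda s: [0.0, 0.0], 50))
t.start()
t.join()
```

Running several such threads against separate environment instances, all feeding the same buffer, gives the multi-collector variant mentioned above.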
In the described reinforcement learning, the sparse 10-dimensional laser range findings, the previous action, and the relative target position are merged into a 14-dimensional input vector. The 10-dimensional range findings are sampled at a uniform angular distribution from the raw laser results between -90 and 90 degrees, and the range information is normalized to (0, 1). The two-dimensional action of each time step consists of the angular and linear velocities of the mobile robot. The two-dimensional target position is expressed in polar coordinates (distance and angle) relative to the mobile robot's coordinate frame. After three fully connected neural network layers with 512 nodes each, the input vector is transformed into the linear and angular velocity commands of the mobile robot.
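Assembling the 14-dimensional input vector from the three parts named above can be sketched as follows (hypothetical function name; normalization to (0, 1) is shown as simple clamping for illustration):

```python
import math

def build_state(ranges10, prev_action, target_polar):
    """Merge the 10 normalized range findings, the previous 2-D action
    (linear, angular), and the target's polar coordinates (distance,
    angle) into the 14-dimensional input vector of the planner."""
    assert len(ranges10) == 10 and len(prev_action) == 2 and len(target_polar) == 2
    normed = [min(max(r, 0.0), 1.0) for r in ranges10]  # ranges normalized to (0, 1)
    return normed + list(prev_action) + list(target_polar)

state = build_state([0.2] * 10, [0.3, 0.1], [1.5, math.pi / 4])
```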
Further, regarding the described laser range findings, to constrain the angular velocity to the range (-1, 1), a hyperbolic tangent function (tanh) is used as the activation function; in addition, the range of the linear velocity is constrained to (0, 1) by a sigmoid function. Because the laser findings cannot cover the area behind the mobile robot, it cannot move backwards. The output action is multiplied by two hyperparameters to determine the final linear and angular velocities directly executed by the mobile robot. Considering the real dynamics, 0.5 m/s is chosen as the maximum linear velocity and 1 rad/s as the maximum angular velocity.
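The output constraints described above amount to a small post-processing step (hypothetical function name; the two hyperparameters are the maximum speeds stated in the text):

```python
import math

V_MAX, W_MAX = 0.5, 1.0  # hyperparameters: max linear (m/s) and max angular (rad/s) speed

def scale_action(lin_raw, ang_raw):
    """Constrain the raw network outputs: a sigmoid keeps the linear
    speed in (0, 1), so no backward motion is possible (the laser cannot
    see behind the robot), and tanh keeps the angular speed in (-1, 1);
    both are then scaled by the hyperparameters to the commands the
    robot actually executes."""
    lin = 1.0 / (1.0 + math.exp(-lin_raw))  # sigmoid -> (0, 1)
    ang = math.tanh(ang_raw)                # tanh    -> (-1, 1)
    return V_MAX * lin, W_MAX * ang

v, w = scale_action(0.0, 0.0)
```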
The described critic network predicts the Q-value of a state-action pair. The input state is processed by three fully connected neural network layers; the action is merged in at the second fully connected layer. The Q-value is finally activated by a linear activation function:
y = kx + b (2)
where x is the input of the last layer, y is the predicted Q-value, and k and b are the trained weight and bias of this layer.
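The structure above, with the action fused in at the second layer and a linear output y = kx + b, can be sketched in pure Python. The tiny deterministic parameters are for illustration only (the real layers have 512 nodes and trained weights):

```python
def linear(x, W, b):
    """Fully connected layer y = W x + b over nested-list weights."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def critic_q(state, action, params):
    """Critic sketch: the state passes through the first fully connected
    layer; the action is merged (concatenated) at the second layer; the
    final layer uses the linear activation y = kx + b to output Q."""
    h1 = relu(linear(state, params["W1"], params["b1"]))
    h2 = relu(linear(h1 + list(action), params["W2"], params["b2"]))  # merge action here
    return linear(h2, params["W3"], params["b3"])[0]                  # linear output = Q

# Tiny deterministic parameters for illustration:
p = {"W1": [[0.1] * 4] * 3, "b1": [0.0] * 3,
     "W2": [[0.1] * 5] * 3, "b2": [0.0] * 3,
     "W3": [[1.0] * 3],     "b3": [0.5]}
q = critic_q([1.0, 1.0, 1.0, 1.0], [0.5, 0.5], p)
```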
For the described reward function, the mobile robot tries to reach the desired target position without colliding with obstacles. The reward function has three different conditions: if the robot is found to have reached the target, as checked by a distance threshold, it receives a positive reward r_reach; but if the minimum range measurement shows that the robot has collided with an obstacle, it receives a negative reward r_collision. Either of these two conditions terminates the training episode. Otherwise, the reward is the difference from the distance at the previous time step, d_{t-1} - d_t, multiplied by a hyperparameter c_r; this reward drives the robot toward the target position. The reward function is used directly by the critic network, without clipping or normalization.
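The three-case reward can be written directly. All numeric values here are hypothetical placeholders (the text names the quantities but not their magnitudes):

```python
C_R = 1.0            # distance-shaping hyperparameter c_r (hypothetical value)
R_REACH = 10.0       # positive reward on reaching the target (hypothetical)
R_COLLISION = -10.0  # negative reward on collision (hypothetical)
D_THRESH = 0.2       # distance threshold for "target reached" (hypothetical)
D_COLLIDE = 0.1      # minimum range reading that counts as a collision (hypothetical)

def reward(d_prev, d_now, min_range):
    """Three-case reward: r_reach if the target is within the distance
    threshold, r_collision if the minimum range reading signals a crash,
    otherwise c_r * (d_{t-1} - d_t), which pulls the robot toward the
    goal. The first two cases also terminate the episode."""
    if d_now < D_THRESH:
        return R_REACH, True
    if min_range < D_COLLIDE:
        return R_COLLISION, True
    return C_R * (d_prev - d_now), False

r, done = reward(1.0, 0.8, 0.5)
```

Note that the shaping term is positive when the robot gets closer and negative when it moves away, so it carries a dense gradient signal even before any terminal event occurs.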
Brief description of the drawings
Fig. 1 is a system framework diagram of the continuous control method for a mobile robot based on a mapless motion planner of the present invention.
Fig. 2 shows the transfer function of the mapless motion planner in the continuous control method of the present invention.
Fig. 3 illustrates the reinforcement learning in the continuous control method of the present invention.
Embodiment
It should be noted that, where no conflict arises, the embodiments of this application and the features of the embodiments can be combined with one another. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a system framework diagram of the continuous control method for a mobile robot based on a mapless motion planner of the present invention, mainly comprising the mapless motion planner, the asynchronous deep deterministic policy gradient, reinforcement learning, the critic network, and the reward function.
In the continuous control method based on the mapless motion planner, only 10-dimensional range findings and the position of the target relative to the robot are extracted as references; the mapless motion planner is trained end to end from scratch by an asynchronous deep reinforcement learning method, and can directly output continuous linear and angular velocities.
In the asynchronous deep deterministic policy gradient, compared with the original deep deterministic policy gradient, the sampling process is separated into another thread. In the training thread, each iteration step updates the weights of the critic network θ^Q and the actor network θ^u from a batch collected from the replay buffer. The prediction target of the critic network is computed from the reward r_i and the estimated Q-value γQ′, where Q′ is the output of the target critic network with weights θ^{Q′} for the next state s_{t+1}, taking as input the optimal action a_{t+1} = u′(s_{t+1} | θ^{u′}) estimated by the target actor network.
The actor network is updated by the policy gradient over a sampled batch of transitions. The sample-collection thread runs in parallel, with actions decided by the actor network; during training, a random process N is added to the actions to encourage exploration of the action space. New transitions are saved into the replay buffer shared by the training and sampling threads. The asynchronous deep deterministic policy gradient can also be realized with multiple data-collection threads, as in other asynchronous methods. Where the original deep deterministic policy gradient collects one sample per back-propagation iteration, the parallel asynchronous version collects many more samples in each step.
The critic network predicts the Q-value of a state-action pair. The input state is processed by three fully connected neural network layers; the action is merged in at the second fully connected layer. The Q-value is finally activated by a linear activation function:
y = kx + b (1)
where x is the input of the last layer, y is the predicted Q-value, and k and b are the trained weight and bias of this layer.
With the collision-free reward function, the mobile robot tries to reach the desired target position without colliding with obstacles. The reward function has three different conditions: if the robot is found to have reached the target, as checked by a distance threshold, it receives a positive reward r_reach; but if the minimum range measurement shows that the robot has collided with an obstacle, it receives a negative reward r_collision. Either of these two conditions terminates the training episode. Otherwise, the reward is the difference from the distance at the previous time step, d_{t-1} - d_t, multiplied by a hyperparameter c_r; this reward drives the robot toward the target position. The reward function is used directly by the critic network, without clipping or normalization.
Fig. 2 shows the transfer function of the mapless motion planner in the continuous control method of the present invention. The mapless motion planner takes the 10-dimensional range findings and the target position as input and continuous steering commands as output. The planner is trained end to end and can be applied directly in both virtual and real environments; it can navigate the mobile robot to the desired target without colliding with any obstacle.
A transfer function is defined for the mapless motion planner:
v_t = f(x_t, p_t, v_{t-1}) (3)
where x_t is the observation of the raw sensor data, p_t is the relative position of the target, and v_{t-1} is the velocity of the mobile robot in the last time step. Together they can be regarded as the instantaneous state of the mobile robot. The model maps the state directly to an action, namely the next velocity v_t. An effective motion planner must guarantee the control frequency, so that the robot can react to new observations immediately.
Fig. 3 illustrates the reinforcement learning in the continuous control method of the present invention. The sparse 10-dimensional laser range findings, the previous action, and the relative target position are merged into a 14-dimensional input vector. The 10-dimensional range findings are sampled at a uniform angular distribution from the raw laser results between -90 and 90 degrees, and the range information is normalized to (0, 1). The two-dimensional action of each time step consists of the angular and linear velocities of the mobile robot. The two-dimensional target position is expressed in polar coordinates (distance and angle) relative to the mobile robot's coordinate frame. After three fully connected neural network layers with 512 nodes each, the input vector is transformed into the linear and angular velocity commands of the mobile robot.
To constrain the angular velocity to the range (-1, 1), a hyperbolic tangent function (tanh) is used as the activation function; in addition, the range of the linear velocity is constrained to (0, 1) by a sigmoid function. Because the laser findings cannot cover the area behind the mobile robot, it cannot move backwards. The output action is multiplied by two hyperparameters to determine the final linear and angular velocities directly executed by the mobile robot. Considering the real dynamics, 0.5 m/s is chosen as the maximum linear velocity and 1 rad/s as the maximum angular velocity.
For those skilled in the art, the present invention is not restricted to the details of the above embodiments; it can be realized in other concrete forms without departing from its spirit and scope. Furthermore, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Claims (10)
1. A continuous control method for a mobile robot based on a mapless motion planner, characterized by mainly comprising: a mapless motion planner (1); an asynchronous deep deterministic policy gradient (2); reinforcement learning (3); a critic network (4); and a reward function (5).
2. The continuous control method for a mobile robot based on a mapless motion planner according to claim 1, characterized in that only 10-dimensional range findings and the position of the target relative to the robot are extracted as references; the mapless motion planner is trained end to end from scratch by an asynchronous deep reinforcement learning method, and can directly output continuous linear and angular velocities.
3. The mapless motion planner (1) according to claim 1, characterized in that it takes the 10-dimensional range findings and the target position as input and continuous steering commands as output; the planner is trained end to end and can be applied directly in both virtual and real environments; it can navigate the mobile robot to the desired target without colliding with any obstacle.
4. The transfer function according to claim 3, characterized in that a transfer function is defined for the mapless motion planner:
v_t = f(x_t, p_t, v_{t-1}) (1)
where x_t is the observation of the raw sensor data, p_t is the relative position of the target, and v_{t-1} is the velocity of the mobile robot in the last time step; together they can be regarded as the instantaneous state of the mobile robot; the model maps the state directly to an action, namely the next velocity v_t; an effective motion planner must guarantee the control frequency, so that the robot can react to new observations immediately.
5. The asynchronous deep deterministic policy gradient (2) according to claim 1, characterized in that, compared with the original deep deterministic policy gradient, the sampling process is separated into another thread; in the training thread, each iteration step updates the weights of the critic network θ^Q and the actor network θ^u from a batch collected from the replay buffer; the prediction target of the critic network is computed from the reward r_i and the estimated Q-value γQ′, where Q′ is the output of the target critic network with weights θ^{Q′} for the next state s_{t+1}, taking as input the optimal action a_{t+1} = u′(s_{t+1} | θ^{u′}) estimated by the target actor network.
6. The sample collection according to claim 5, characterized in that the actor network is updated by the policy gradient over a sampled batch of transitions; the sample-collection thread runs in parallel, with actions decided by the actor network; during training, a random process N is added to the actions to encourage exploration of the action space; new transitions are saved into the replay buffer shared by the training and sampling threads; the asynchronous deep deterministic policy gradient can also be realized with multiple data-collection threads, as in other asynchronous methods; where the original deep deterministic policy gradient collects one sample per back-propagation iteration, the parallel asynchronous version collects many more samples in each step.
7. The reinforcement learning (3) according to claim 1, characterized in that the sparse 10-dimensional laser range findings, the previous action, and the relative target position are merged into a 14-dimensional input vector; the 10-dimensional range findings are sampled at a uniform angular distribution from the raw laser results between -90 and 90 degrees, and the range information is normalized to (0, 1); the two-dimensional action of each time step consists of the angular and linear velocities of the mobile robot; the two-dimensional target position is expressed in polar coordinates (distance and angle) relative to the mobile robot's coordinate frame; after three fully connected neural network layers with 512 nodes each, the input vector is transformed into the linear and angular velocity commands of the mobile robot.
8. The laser range findings according to claim 7, characterized in that, to constrain the angular velocity to the range (-1, 1), a hyperbolic tangent function (tanh) is used as the activation function; in addition, the range of the linear velocity is constrained to (0, 1) by a sigmoid function; because the laser findings cannot cover the area behind the mobile robot, it cannot move backwards; the output action is multiplied by two hyperparameters to determine the final linear and angular velocities directly executed by the mobile robot; considering the real dynamics, 0.5 m/s is chosen as the maximum linear velocity and 1 rad/s as the maximum angular velocity.
9. The critic network (4) according to claim 1, characterized in that it predicts the Q-value of a state-action pair; the input state is processed by three fully connected neural network layers; the action is merged in at the second fully connected layer; the Q-value is finally activated by a linear activation function:
y = kx + b (2)
where x is the input of the last layer, y is the predicted Q-value, and k and b are the trained weight and bias of this layer.
10. The reward function (5) according to claim 1, characterized in that the mobile robot tries to reach the desired target position without colliding with obstacles; the reward function has three different conditions: if the robot is found to have reached the target, as checked by a distance threshold, it receives a positive reward r_reach; but if the minimum range measurement shows that the robot has collided with an obstacle, it receives a negative reward r_collision; either of these two conditions terminates the training episode; otherwise, the reward is the difference from the distance at the previous time step, d_{t-1} - d_t, multiplied by a hyperparameter c_r, which drives the robot toward the target position; the reward function is used directly by the critic network, without clipping or normalization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710294685.9A CN106950969A (en) | 2017-04-28 | 2017-04-28 | It is a kind of based on the mobile robot continuous control method without map movement planner |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106950969A true CN106950969A (en) | 2017-07-14 |
Family
ID=59477823
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710294685.9A Withdrawn CN106950969A (en) | 2017-04-28 | 2017-04-28 | It is a kind of based on the mobile robot continuous control method without map movement planner |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106950969A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107450593A (en) * | 2017-08-30 | 2017-12-08 | 清华大学 | A kind of unmanned plane autonomous navigation method and system |
CN107490377A (en) * | 2017-07-17 | 2017-12-19 | 五邑大学 | Indoor map-free navigation system and navigation method |
CN108287542A (en) * | 2018-01-04 | 2018-07-17 | 浙江大学 | Unmanned plane and unmanned boat cooperation control system and method based on collaboration cloud control |
CN108320051A (en) * | 2018-01-17 | 2018-07-24 | 哈尔滨工程大学 | A kind of mobile robot dynamic collision-free planning method based on GRU network models |
CN108536144A (en) * | 2018-04-10 | 2018-09-14 | 上海理工大学 | A kind of paths planning method of fusion dense convolutional network and competition framework |
CN109085825A (en) * | 2018-07-13 | 2018-12-25 | 安徽灵图壹智能科技有限公司 | A kind of unmanned mine car mining optimal route selection method |
CN109242098A (en) * | 2018-07-25 | 2019-01-18 | 深圳先进技术研究院 | Limit neural network structure searching method and Related product under cost |
CN109241552A (en) * | 2018-07-12 | 2019-01-18 | 哈尔滨工程大学 | A kind of underwater robot motion planning method based on multiple constraint target |
CN109668484A (en) * | 2019-01-18 | 2019-04-23 | 北京瀚科瑞杰科技发展有限公司 | A kind of target drone maneuvering control method and system that target drone is interacted with attack plane |
CN110147891A (en) * | 2019-05-23 | 2019-08-20 | 北京地平线机器人技术研发有限公司 | Method, apparatus and electronic equipment applied to intensified learning training process |
CN110488835A (en) * | 2019-08-28 | 2019-11-22 | 北京航空航天大学 | A kind of unmanned systems intelligence local paths planning method based on double reverse transmittance nerve networks |
CN110753936A (en) * | 2017-08-25 | 2020-02-04 | 谷歌有限责任公司 | Batch reinforcement learning |
CN110908384A (en) * | 2019-12-05 | 2020-03-24 | 中山大学 | Formation navigation method for distributed multi-robot collaborative unknown random maze |
CN111515961A (en) * | 2020-06-02 | 2020-08-11 | 南京大学 | Reinforcement learning reward method suitable for mobile mechanical arm |
CN112857370A (en) * | 2021-01-07 | 2021-05-28 | 北京大学 | Robot map-free navigation method based on time sequence information modeling |
CN113093727A (en) * | 2021-03-08 | 2021-07-09 | 哈尔滨工业大学(深圳) | Robot map-free navigation method based on deep security reinforcement learning |
CN113260936A (en) * | 2018-12-26 | 2021-08-13 | 三菱电机株式会社 | Mobile body control device, mobile body control learning device, and mobile body control method |
TWI815613B (en) * | 2022-08-16 | 2023-09-11 | 和碩聯合科技股份有限公司 | Navigation method for robot and robot thereof |
- 2017-04-28: Application CN201710294685.9A filed; published as CN106950969A; status not active (Withdrawn)
Non-Patent Citations (1)
Title |
---|
LEI TAI et al.: "Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation", published online: https://arxiv.org/abs/1703.00420 |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107490377A (en) * | 2017-07-17 | 2017-12-19 | Wuyi University | Indoor map-free navigation system and navigation method |
CN110753936A (en) * | 2017-08-25 | 2020-02-04 | Google LLC | Batch reinforcement learning |
CN107450593B (en) * | 2017-08-30 | 2020-06-12 | Tsinghua University | Unmanned aerial vehicle autonomous navigation method and system |
CN107450593A (en) * | 2017-08-30 | 2017-12-08 | Tsinghua University | Unmanned aerial vehicle autonomous navigation method and system |
CN108287542A (en) * | 2018-01-04 | 2018-07-17 | Zhejiang University | Unmanned aerial vehicle and unmanned ship cooperative control system and method based on cooperative cloud control |
CN108287542B (en) * | 2018-01-04 | 2021-01-26 | Zhejiang University | Unmanned aerial vehicle and unmanned ship cooperative control system and method based on cooperative cloud control |
CN108320051B (en) * | 2018-01-17 | 2021-11-23 | Harbin Engineering University | Mobile robot dynamic collision avoidance planning method based on GRU network model |
CN108320051A (en) * | 2018-01-17 | 2018-07-24 | Harbin Engineering University | Mobile robot dynamic collision avoidance planning method based on GRU network model |
CN108536144A (en) * | 2018-04-10 | 2018-09-14 | University of Shanghai for Science and Technology | Path planning method fusing a dense convolutional network and a dueling architecture |
CN109241552A (en) * | 2018-07-12 | 2019-01-18 | Harbin Engineering University | Underwater robot motion planning method based on multiple constraint targets |
CN109241552B (en) * | 2018-07-12 | 2022-04-05 | Harbin Engineering University | Underwater robot motion planning method based on multiple constraint targets |
CN109085825A (en) * | 2018-07-13 | 2018-12-25 | Anhui Lingtuyi Intelligent Technology Co., Ltd. | Optimal route selection method for unmanned mining trucks |
CN109242098A (en) * | 2018-07-25 | 2019-01-18 | Shenzhen Institutes of Advanced Technology | Neural network architecture search method under cost constraints and related products |
CN113260936A (en) * | 2018-12-26 | 2021-08-13 | Mitsubishi Electric Corporation | Mobile body control device, mobile body control learning device, and mobile body control method |
CN113260936B (en) * | 2018-12-26 | 2024-05-07 | Mitsubishi Electric Corporation | Moving object control device, moving object control learning device, and moving object control method |
CN109668484A (en) * | 2019-01-18 | 2019-04-23 | Beijing Hanke Ruijie Technology Development Co., Ltd. | Target drone maneuver control method and system for interaction between a target drone and an attack aircraft |
CN109668484B (en) * | 2019-01-18 | 2023-05-02 | Beijing Hanke Technology Group Co., Ltd. | Target aircraft maneuvering flight control method and system for interaction of target aircraft and attack aircraft |
CN110147891A (en) * | 2019-05-23 | 2019-08-20 | Beijing Horizon Robotics Technology R&D Co., Ltd. | Method, apparatus, and electronic device applied to a reinforcement learning training process |
CN110488835A (en) * | 2019-08-28 | 2019-11-22 | Beihang University | Intelligent local path planning method for unmanned systems based on dual back-propagation neural networks |
CN110908384A (en) * | 2019-12-05 | 2020-03-24 | Sun Yat-sen University | Formation navigation method for distributed multi-robot collaboration in an unknown random maze |
CN110908384B (en) * | 2019-12-05 | 2022-09-23 | Sun Yat-sen University | Formation navigation method for distributed multi-robot collaboration in an unknown random maze |
CN111515961A (en) * | 2020-06-02 | 2020-08-11 | Nanjing University | Reinforcement learning reward method for a mobile manipulator |
CN111515961B (en) * | 2020-06-02 | 2022-06-21 | Nanjing University | Reinforcement learning reward method for a mobile manipulator |
CN112857370A (en) * | 2021-01-07 | 2021-05-28 | Peking University | Robot map-free navigation method based on temporal information modeling |
CN113093727A (en) * | 2021-03-08 | 2021-07-09 | Harbin Institute of Technology (Shenzhen) | Robot map-free navigation method based on deep safe reinforcement learning |
TWI815613B (en) * | 2022-08-16 | 2023-09-11 | Pegatron Corporation | Navigation method for robot and robot thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106950969A (en) | Mobile robot continuous control method based on a mapless motion planner | |
CN113110509B (en) | Warehousing system multi-robot path planning method based on deep reinforcement learning | |
CN108279692B (en) | UUV dynamic planning method based on LSTM-RNN | |
Brunner et al. | Teaching a machine to read maps with deep reinforcement learning | |
CN104155998B (en) | Path planning method based on the potential field method | |
CN108645413A (en) | Dynamic correction method for simultaneous localization and mapping of a mobile robot | |
CN106873585A (en) | Navigation route search method, robot, and system | |
Saulnier et al. | Information theoretic active exploration in signed distance fields | |
CN110095120A (en) | Bio-inspired self-organizing map path planning method for autonomous underwater vehicles under ocean circulation | |
CN110515382A (en) | Smart device and localization method thereof | |
CN114879660B (en) | Target-driven robot environment sensing method | |
Wang | Automatic control of mobile robot based on autonomous navigation algorithm | |
Klein | Data-driven meets navigation: Concepts, models, and experimental validation | |
CN107562837B (en) | Maneuvering target tracking algorithm based on road network | |
Jiang et al. | Intelligent Plant Cultivation Robot Based on Key Marker Algorithm Using Visual and Laser Sensors | |
CN114594776B (en) | Navigation obstacle avoidance method based on layering and modular learning | |
CN114153216B (en) | Lunar surface path planning system and method based on deep reinforcement learning and block planning | |
Kim et al. | Path integration mechanism with coarse coding of neurons | |
CN115690343A (en) | Robot laser radar scanning and mapping method based on visual following | |
Chauvin-Hameau | Informative path planning for algae farm surveying | |
CN112907644B (en) | Machine map-oriented visual positioning method | |
Abidin et al. | A calibration framework for swarming ASVs’ system design | |
El-Fakdi et al. | Autonomous underwater vehicle control using reinforcement learning policy search methods | |
Kashyap et al. | Modified type-2 fuzzy controller for intercollision avoidance of single and multi-humanoid robots in complex terrains | |
KR20220090732A (en) | Method and system for determining action of device for given state using model trained based on risk measure parameter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | | Application publication date: 2017-07-14 |