CN106950969A - Mobile robot continuous control method based on a mapless motion planner - Google Patents

Mobile robot continuous control method based on a mapless motion planner

Info

Publication number
CN106950969A
CN106950969A
Authority
CN
China
Prior art keywords
movement planner
mobile robot
gradient
network
map movement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710294685.9A
Other languages
Chinese (zh)
Inventor
夏春秋 (Xia Chunqiu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd filed Critical Shenzhen Vision Technology Co Ltd
Priority to CN201710294685.9A priority Critical patent/CN106950969A/en
Publication of CN106950969A publication Critical patent/CN106950969A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0219Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The present invention proposes a mobile robot continuous control method based on a mapless motion planner. Its main content comprises: a mapless motion planner, an asynchronous deep deterministic policy gradient, reinforcement learning, a critic (evaluation) network, and a reward function. The process is as follows: the mapless motion planner is trained end to end, and a transfer function is defined for it so that the control frequency is guaranteed; the original deep deterministic policy gradient is modified into an asynchronous deep deterministic policy gradient; reinforcement learning is carried out, in which training and sample collection are executed in parallel; the motion planner is evaluated with the critic network, and a reward function is defined that checks whether the target is reached. The present invention uses a high-precision laser ranging sensor and can compute a path accurately and efficiently; it is demonstrated that, without any manual design or prior demonstration, a feasible optimized path can be found efficiently, navigating the robot to the target position without colliding with obstacles in the environment.

Description

Mobile robot continuous control method based on a mapless motion planner
Technical field
The present invention relates to the field of robot control, and more particularly to a mobile robot continuous control method based on a mapless motion planner.
Background art
With the development of science and technology, mobile robot navigation has increasingly become one of the hot research topics in robotics and artificial intelligence; at the same time, it is also an embodiment of the intelligence level of a fully autonomous robot. It is desirable that, when working in an unknown environment, a mobile robot can obtain local environmental information from its own sensors, build a map of the environment autonomously, and, according to the built map, plan a feasible path that reaches the destination without collision. In this way, mobile robots can be applied to fields such as daily navigation and path planning, bringing convenience to people's travel and work. However, traditional methods realize navigation through simultaneous localization and mapping (SLAM), which is not only time-consuming but also strongly dependent on the map.
The present invention proposes a mobile robot continuous control method based on a mapless motion planner. The mapless motion planner is trained end to end, and a transfer function is defined for it so that the control frequency is guaranteed and the robot can react to new observations immediately; the original deep deterministic policy gradient is modified into an asynchronous deep deterministic policy gradient; reinforcement learning is carried out, in which training and sample collection are executed in parallel; the motion planner is evaluated with a critic network, and a reward function is defined that checks whether the target is reached. The present invention uses a high-precision laser ranging sensor and can compute a path accurately and efficiently; it is demonstrated that, without any manual design or prior demonstration, a feasible optimized path can be found efficiently, navigating the robot to the target position without colliding with obstacles in the environment.
Summary of the invention
Aiming at problems such as the time consumption of navigation, the object of the present invention is to provide a mobile robot continuous control method based on a mapless motion planner: the mapless motion planner is trained end to end, and a transfer function is defined for it so that the control frequency is guaranteed and the robot can react to new observations immediately; the original deep deterministic policy gradient is modified into an asynchronous deep deterministic policy gradient; reinforcement learning is carried out, in which training and sample collection are executed in parallel; the motion planner is evaluated with a critic network, and a reward function is defined that checks whether the target is reached.
To solve the above problems, the present invention provides a mobile robot continuous control method based on a mapless motion planner, whose main content comprises:
(1) a mapless motion planner;
(2) an asynchronous deep deterministic policy gradient;
(3) reinforcement learning;
(4) a critic network;
(5) a reward function.
Wherein, in the described mobile robot continuous control method based on a mapless motion planner, only a 10-dimensional ranging result and the relative target information are extracted as references; the mapless motion planner is trained end to end from scratch by an asynchronous deep reinforcement learning method, and can directly output continuous linear and angular velocities.
Wherein, the described mapless motion planner takes the 10-dimensional ranging result and the target position as input, and continuous steering commands as output; the mapless motion planner is trained end to end and can be applied directly in both virtual and real environments; the mapless motion planner can navigate the mobile robot to the required target without colliding with any obstacle.
Further, for the described transfer function, a transfer function is defined for the mapless motion planner:
v_t = f(x_t, p_t, v_{t-1})   (1)
wherein x_t is the observation of the raw sensor data, p_t is the relative position of the target, and v_{t-1} is the velocity of the mobile robot in the last time step; these can be regarded as the instantaneous state of the mobile robot; the model maps the state directly to the action, i.e. the next velocity v_t; an effective motion planner must guarantee the control frequency, so that the robot can react to new observations immediately.
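By way of a non-limiting illustration, the following minimal Python sketch shows how the transfer function of formula (1) could be realized as a planner interface. The class and method names, the use of NumPy, and the actor callable are illustrative assumptions, not part of the disclosed method.

```python
# Illustrative sketch of the transfer function v_t = f(x_t, p_t, v_{t-1}).
# Class name, method names and the actor callable are assumptions.
import numpy as np

class MaplessPlanner:
    def __init__(self, actor):
        self.actor = actor  # trained actor network (see the actor sketch below)

    def step(self, x_t, p_t, v_prev):
        """x_t: 10-dim normalized laser ranges; p_t: target in polar
        coordinates (distance, angle); v_prev: (linear, angular) velocity
        of the last time step."""
        state = np.concatenate([x_t, v_prev, p_t])  # 14-dim instantaneous state
        return self.actor(state)                    # next velocity command v_t
```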
Wherein, in the described asynchronous deep deterministic policy gradient, compared with the original deep deterministic policy gradient, the sampling process is separated into another thread; in the training thread, each iteration step updates the weights of the critic network θ^Q and of the actor network θ^μ from a batch collected from the replay buffer; the prediction target of the critic network is computed from the reward r_i and the estimated Q value γQ′ (i.e. y_i = r_i + γQ′); Q′ is the output of the target critic network with weights θ^{Q′} for the next state s_{t+1}, which takes the estimated optimal action a_{t+1} = μ′(s_{t+1} | θ^{μ′}) of the target actor network θ^{μ′} as input.
Further, for the described sample collection, the actor network is updated by the policy gradient of the sampled batch of transitions; the sample-collection thread is executed in parallel, with actions determined by the actor network; during training, a random process N is added to encourage exploration of the action space; new transitions are saved into the replay buffer shared by the training and sampling threads; the asynchronous deep deterministic policy gradient can also be implemented with multiple data-collection threads, as in other asynchronous methods; the original deep deterministic policy gradient collects one sample per back-propagation iteration, whereas the parallel asynchronous deep deterministic policy gradient collects more samples in each step.
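As a non-limiting illustration of this parallel structure, the sketch below separates sample collection into its own thread that shares a replay buffer with the training thread. The environment interface, the noise process, and the agent.update routine are assumptions made for the purpose of the example.

```python
# Illustrative sketch of the asynchronous structure: one thread collects
# transitions with exploration noise, another trains on batches drawn from
# the shared replay buffer. Env/agent/noise interfaces are assumptions.
import random
import threading
from collections import deque

replay_buffer = deque(maxlen=100_000)   # shared by both threads
buffer_lock = threading.Lock()

def sample_thread(env, actor, noise):
    s = env.reset()
    while True:
        a = actor(s) + noise.sample()    # add random process N for exploration
        s_next, r, done = env.step(a)
        with buffer_lock:
            replay_buffer.append((s, a, r, s_next, done))  # save new transition
        s = env.reset() if done else s_next

def train_thread(agent, batch_size=256):
    while True:
        with buffer_lock:
            if len(replay_buffer) < batch_size:
                continue                 # wait until enough transitions exist
            batch = random.sample(replay_buffer, batch_size)
        agent.update(batch)              # one critic/actor update per batch
```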
Wherein, for the described reinforcement learning, the abstracted 10-dimensional laser ranging result, the previous action and the relative target position are merged into a 14-dimensional input vector; the 10-dimensional laser ranging result is sampled at a uniform angular distribution from the raw laser readings between −90 and 90 degrees, and the ranging information is normalized to (0, 1); the two-dimensional action of each time step comprises the angular and linear velocities of the mobile robot; the two-dimensional target position is represented in polar coordinates (distance and angle) relative to the mobile robot coordinate frame; after three fully connected neural network layers with 512 nodes, the input vector is transformed into the linear and angular velocity commands of the mobile robot.
Further, for the described laser ranging result, in order to constrain the angular velocity to the range (−1, 1), a hyperbolic tangent function (tanh) is used as the activation function; in addition, the range of the linear velocity is constrained to (0, 1) by a sigmoid function; since the laser readings cannot cover the area behind the mobile robot, it cannot move backwards; the output action is multiplied by two hyperparameters to determine the final linear and angular velocities directly executed by the mobile robot; considering the real dynamics, 0.5 m/s is chosen as the maximum linear velocity and 1 rad/s as the maximum angular velocity.
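A minimal PyTorch sketch of the actor described above follows, assuming ReLU activations in the hidden layers (the hidden activation is not specified in this description); the tanh and sigmoid output heads and the 0.5 m/s and 1 rad/s scaling follow the text.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the 14-dim state to a (linear, angular) velocity command."""
    def __init__(self, state_dim=14, hidden=512, v_max=0.5, w_max=1.0):
        super().__init__()
        # three fully connected layers with 512 nodes each
        self.fc1 = nn.Linear(state_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, hidden)
        self.lin = nn.Linear(hidden, 1)        # linear-velocity head
        self.ang = nn.Linear(hidden, 1)        # angular-velocity head
        self.v_max, self.w_max = v_max, w_max  # the two output hyperparameters

    def forward(self, s):
        h = torch.relu(self.fc1(s))            # ReLU is an assumption
        h = torch.relu(self.fc2(h))
        h = torch.relu(self.fc3(h))
        v = torch.sigmoid(self.lin(h)) * self.v_max  # constrained to (0, v_max)
        w = torch.tanh(self.ang(h)) * self.w_max     # constrained to (-w_max, w_max)
        return torch.cat([v, w], dim=-1)
```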
Wherein, the described critic network predicts the Q value of a state-action pair; the input state is processed by three fully connected neural network layers; the action is merged in the second fully connected layer; the Q value is finally activated by a linear activation function:
y = kx + b   (2)
wherein x is the input of the last layer, y is the predicted Q value, and k and b are the trained weight and bias of this layer.
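The critic can be sketched in the same style, again assuming ReLU hidden activations; the action entering at the second fully connected layer and the final linear activation y = kx + b follow the text.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Predicts Q(s, a) for a state-action pair."""
    def __init__(self, state_dim=14, action_dim=2, hidden=512):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden)
        self.fc2 = nn.Linear(hidden + action_dim, hidden)  # action merged here
        self.fc3 = nn.Linear(hidden, hidden)
        self.q = nn.Linear(hidden, 1)   # linear activation: y = kx + b

    def forward(self, s, a):
        h = torch.relu(self.fc1(s))                        # ReLU assumed
        h = torch.relu(self.fc2(torch.cat([h, a], dim=-1)))
        h = torch.relu(self.fc3(h))
        return self.q(h)                                   # unclipped Q value
```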
Wherein, for the described reward function, the mobile robot attempts to reach the required target position without colliding with obstacles; the reward function has three different conditions:
If the robot is found to have reached the target by a distance threshold, a positive reward r_reach is given; but if the robot is found, by the minimum ranging result, to have collided with an obstacle, a negative reward r_collision is given; both of these conditions stop the training episode; otherwise, the reward is the difference of the distance to the target between the last time step and the current one, d_{t-1} − d_t, multiplied by a hyperparameter c_r; this reward drives the robot towards the target position; the reward is used directly by the critic network without clipping or normalization.
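A minimal sketch of the three-case reward follows; the concrete thresholds, reward magnitudes and the scale c_r are illustrative assumptions, since the description leaves them as hyperparameters.

```python
def reward(d_t, d_prev, min_range,
           d_goal=0.2, d_collide=0.25,
           r_reach=10.0, r_collision=-10.0, c_r=1.0):
    """d_t: current distance to the target; d_prev: distance one time step
    earlier; min_range: minimum laser reading of this step. Returns the
    reward and whether the episode terminates. All constants are assumed."""
    if d_t < d_goal:               # target reached: positive reward, stop
        return r_reach, True
    if min_range < d_collide:      # collision detected: negative reward, stop
        return r_collision, True
    return c_r * (d_prev - d_t), False  # dense term pulls the robot to the goal
```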
Brief description of the drawings
Fig. 1 is a system framework diagram of the mobile robot continuous control method based on a mapless motion planner according to the present invention.
Fig. 2 shows the transfer function of the mapless motion planner in the mobile robot continuous control method based on a mapless motion planner according to the present invention.
Fig. 3 shows the reinforcement learning of the mobile robot continuous control method based on a mapless motion planner according to the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other; the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a system framework diagram of the mobile robot continuous control method based on a mapless motion planner according to the present invention. The method mainly comprises a mapless motion planner, an asynchronous deep deterministic policy gradient, reinforcement learning, a critic network and a reward function.
In the mobile robot continuous control method based on a mapless motion planner, only a 10-dimensional ranging result and the relative target information are extracted as references; the mapless motion planner is trained end to end from scratch by an asynchronous deep reinforcement learning method, and can directly output continuous linear and angular velocities.
In the asynchronous deep deterministic policy gradient, compared with the original deep deterministic policy gradient, the sampling process is separated into another thread; in the training thread, each iteration step updates the weights of the critic network θ^Q and of the actor network θ^μ from a batch collected from the replay buffer; the prediction target of the critic network is computed from the reward r_i and the estimated Q value γQ′ (i.e. y_i = r_i + γQ′); Q′ is the output of the target critic network with weights θ^{Q′} for the next state s_{t+1}, which takes the estimated optimal action a_{t+1} = μ′(s_{t+1} | θ^{μ′}) of the target actor network θ^{μ′} as input.
The actor network is updated by the policy gradient of the sampled batch of transitions; the sample-collection thread is executed in parallel, with actions determined by the actor network; during training, a random process N is added to encourage exploration of the action space; new transitions are saved into the replay buffer shared by the training and sampling threads; the asynchronous deep deterministic policy gradient can also be implemented with multiple data-collection threads, as in other asynchronous methods; the original deep deterministic policy gradient collects one sample per back-propagation iteration, whereas the parallel asynchronous deep deterministic policy gradient collects more samples in each step.
The critic network predicts the Q value of a state-action pair; the input state is processed by three fully connected neural network layers; the action is merged in the second fully connected layer; the Q value is finally activated by a linear activation function:
y = kx + b   (1)
wherein x is the input of the last layer, y is the predicted Q value, and k and b are the trained weight and bias of this layer.
For the reward function, the mobile robot attempts to reach the required target position without colliding with obstacles; the reward function has three different conditions:
If the robot is found to have reached the target by a distance threshold, a positive reward r_reach is given; but if the robot is found, by the minimum ranging result, to have collided with an obstacle, a negative reward r_collision is given; both of these conditions stop the training episode; otherwise, the reward is the difference of the distance to the target between the last time step and the current one, d_{t-1} − d_t, multiplied by a hyperparameter c_r; this reward drives the robot towards the target position; the reward is used directly by the critic network without clipping or normalization.
Fig. 2 shows the transfer function of the mapless motion planner in the mobile robot continuous control method based on a mapless motion planner according to the present invention. The mapless motion planner takes the 10-dimensional ranging result and the target position as input, and continuous steering commands as output; the mapless motion planner is trained end to end and can be applied directly in both virtual and real environments; it can navigate the mobile robot to the required target without colliding with any obstacle.
A transfer function is defined for the mapless motion planner:
v_t = f(x_t, p_t, v_{t-1})   (3)
wherein x_t is the observation of the raw sensor data, p_t is the relative position of the target, and v_{t-1} is the velocity of the mobile robot in the last time step; these can be regarded as the instantaneous state of the mobile robot; the model maps the state directly to the action, i.e. the next velocity v_t; an effective motion planner must guarantee the control frequency, so that the robot can react to new observations immediately.
Fig. 3 shows the reinforcement learning of the mobile robot continuous control method based on a mapless motion planner according to the present invention. The abstracted 10-dimensional laser ranging result, the previous action and the relative target position are merged into a 14-dimensional input vector; the 10-dimensional laser ranging result is sampled at a uniform angular distribution from the raw laser readings between −90 and 90 degrees, and the ranging information is normalized to (0, 1); the two-dimensional action of each time step comprises the angular and linear velocities of the mobile robot; the two-dimensional target position is represented in polar coordinates (distance and angle) relative to the mobile robot coordinate frame; after three fully connected neural network layers with 512 nodes, the input vector is transformed into the linear and angular velocity commands of the mobile robot.
In order to constrain the angular velocity to the range (−1, 1), a hyperbolic tangent function (tanh) is used as the activation function; in addition, the range of the linear velocity is constrained to (0, 1) by a sigmoid function; since the laser readings cannot cover the area behind the mobile robot, it cannot move backwards; the output action is multiplied by two hyperparameters to determine the final linear and angular velocities directly executed by the mobile robot; considering the real dynamics, 0.5 m/s is chosen as the maximum linear velocity and 1 rad/s as the maximum angular velocity.
For those skilled in the art, the present invention is not restricted to the details of the above exemplary embodiments, and the present invention can be realized in other specific forms without departing from the spirit or scope of the present invention. In addition, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Claims (10)

1. A mobile robot continuous control method based on a mapless motion planner, characterized by mainly comprising: a mapless motion planner (one); an asynchronous deep deterministic policy gradient (two); reinforcement learning (three); a critic network (four); and a reward function (five).
2. The mobile robot continuous control method based on a mapless motion planner according to claim 1, characterized in that only a 10-dimensional ranging result and the relative target information are extracted as references; the mapless motion planner is trained end to end from scratch by an asynchronous deep reinforcement learning method, and can directly output continuous linear and angular velocities.
3. The mapless motion planner (one) according to claim 1, characterized in that the 10-dimensional ranging result and the target position are taken as input, and continuous steering commands are taken as output; the mapless motion planner is trained end to end and can be applied directly in both virtual and real environments; the mapless motion planner can navigate the mobile robot to the required target without colliding with any obstacle.
4. The transfer function according to claim 3, characterized in that a transfer function is defined for the mapless motion planner:
v_t = f(x_t, p_t, v_{t-1})   (1)
wherein x_t is the observation of the raw sensor data, p_t is the relative position of the target, and v_{t-1} is the velocity of the mobile robot in the last time step; these can be regarded as the instantaneous state of the mobile robot; the model maps the state directly to the action, i.e. the next velocity v_t; an effective motion planner must guarantee the control frequency, so that the robot can react to new observations immediately.
5. The asynchronous deep deterministic policy gradient (two) according to claim 1, characterized in that, compared with the original deep deterministic policy gradient, the sampling process is separated into another thread; in the training thread, each iteration step updates the weights of the critic network θ^Q and of the actor network θ^μ from a batch collected from the replay buffer; the prediction target of the critic network is computed from the reward r_i and the estimated Q value γQ′; Q′ is the output of the target critic network with weights θ^{Q′} for the next state s_{t+1}, which takes the estimated optimal action a_{t+1} = μ′(s_{t+1} | θ^{μ′}) of the target actor network θ^{μ′} as input.
6. The sample collection according to claim 5, characterized in that the actor network is updated by the policy gradient of the sampled batch of transitions; the sample-collection thread is executed in parallel, with actions determined by the actor network; during training, a random process N is added to encourage exploration of the action space; new transitions are saved into the replay buffer shared by the training and sampling threads; the asynchronous deep deterministic policy gradient can also be implemented with multiple data-collection threads, as in other asynchronous methods; the original deep deterministic policy gradient collects one sample per back-propagation iteration, whereas the parallel asynchronous deep deterministic policy gradient collects more samples in each step.
7. The reinforcement learning (three) according to claim 1, characterized in that the abstracted 10-dimensional laser ranging result, the previous action and the relative target position are merged into a 14-dimensional input vector; the 10-dimensional laser ranging result is sampled at a uniform angular distribution from the raw laser readings between −90 and 90 degrees, and the ranging information is normalized to (0, 1); the two-dimensional action of each time step comprises the angular and linear velocities of the mobile robot; the two-dimensional target position is represented in polar coordinates (distance and angle) relative to the mobile robot coordinate frame; after three fully connected neural network layers with 512 nodes, the input vector is transformed into the linear and angular velocity commands of the mobile robot.
8. The laser ranging result according to claim 7, characterized in that, in order to constrain the angular velocity to the range (−1, 1), a hyperbolic tangent function (tanh) is used as the activation function; in addition, the range of the linear velocity is constrained to (0, 1) by a sigmoid function; since the laser readings cannot cover the area behind the mobile robot, it cannot move backwards; the output action is multiplied by two hyperparameters to determine the final linear and angular velocities directly executed by the mobile robot; considering the real dynamics, 0.5 m/s is chosen as the maximum linear velocity and 1 rad/s as the maximum angular velocity.
9. The critic network (four) according to claim 1, characterized in that the critic network predicts the Q value of a state-action pair; the input state is processed by three fully connected neural network layers; the action is merged in the second fully connected layer; the Q value is finally activated by a linear activation function:
y = kx + b   (2)
wherein x is the input of the last layer, y is the predicted Q value, and k and b are the trained weight and bias of this layer.
10. The reward function (five) according to claim 1, characterized in that the mobile robot attempts to reach the required target position without colliding with obstacles; the reward function has three different conditions:
If the robot is found to have reached the target by a distance threshold, a positive reward r_reach is given; but if the robot is found, by the minimum ranging result, to have collided with an obstacle, a negative reward r_collision is given; both of these conditions stop the training episode; otherwise, the reward is the difference of the distance to the target between the last time step and the current one, d_{t-1} − d_t, multiplied by a hyperparameter c_r; this reward drives the robot towards the target position; the reward is used directly by the critic network without clipping or normalization.
CN201710294685.9A 2017-04-28 2017-04-28 Mobile robot continuous control method based on a mapless motion planner Withdrawn CN106950969A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710294685.9A CN106950969A (en) 2017-04-28 2017-04-28 Mobile robot continuous control method based on a mapless motion planner

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710294685.9A CN106950969A (en) 2017-04-28 2017-04-28 Mobile robot continuous control method based on a mapless motion planner

Publications (1)

Publication Number Publication Date
CN106950969A true CN106950969A (en) 2017-07-14

Family

ID=59477823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710294685.9A Withdrawn CN106950969A (en) 2017-04-28 2017-04-28 Mobile robot continuous control method based on a mapless motion planner

Country Status (1)

Country Link
CN (1) CN106950969A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107450593A (en) * 2017-08-30 2017-12-08 清华大学 A kind of unmanned plane autonomous navigation method and system
CN107490377A (en) * 2017-07-17 2017-12-19 五邑大学 Indoor map-free navigation system and navigation method
CN108287542A (en) * 2018-01-04 2018-07-17 浙江大学 Unmanned plane and unmanned boat cooperation control system and method based on collaboration cloud control
CN108320051A (en) * 2018-01-17 2018-07-24 哈尔滨工程大学 A kind of mobile robot dynamic collision-free planning method based on GRU network models
CN108536144A (en) * 2018-04-10 2018-09-14 上海理工大学 A kind of paths planning method of fusion dense convolutional network and competition framework
CN109085825A (en) * 2018-07-13 2018-12-25 安徽灵图壹智能科技有限公司 A kind of unmanned mine car mining optimal route selection method
CN109242098A (en) * 2018-07-25 2019-01-18 深圳先进技术研究院 Limit neural network structure searching method and Related product under cost
CN109241552A (en) * 2018-07-12 2019-01-18 哈尔滨工程大学 A kind of underwater robot motion planning method based on multiple constraint target
CN109668484A (en) * 2019-01-18 2019-04-23 北京瀚科瑞杰科技发展有限公司 A kind of target drone maneuvering control method and system that target drone is interacted with attack plane
CN110147891A (en) * 2019-05-23 2019-08-20 北京地平线机器人技术研发有限公司 Method, apparatus and electronic equipment applied to intensified learning training process
CN110488835A (en) * 2019-08-28 2019-11-22 北京航空航天大学 A kind of unmanned systems intelligence local paths planning method based on double reverse transmittance nerve networks
CN110753936A (en) * 2017-08-25 2020-02-04 谷歌有限责任公司 Batch reinforcement learning
CN110908384A (en) * 2019-12-05 2020-03-24 中山大学 Formation navigation method for distributed multi-robot collaborative unknown random maze
CN111515961A (en) * 2020-06-02 2020-08-11 南京大学 Reinforcement learning reward method suitable for mobile mechanical arm
CN112857370A (en) * 2021-01-07 2021-05-28 北京大学 Robot map-free navigation method based on time sequence information modeling
CN113093727A (en) * 2021-03-08 2021-07-09 哈尔滨工业大学(深圳) Robot map-free navigation method based on deep security reinforcement learning
CN113260936A (en) * 2018-12-26 2021-08-13 三菱电机株式会社 Mobile body control device, mobile body control learning device, and mobile body control method
TWI815613B (en) * 2022-08-16 2023-09-11 和碩聯合科技股份有限公司 Navigation method for robot and robot thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEI TAI et al.: "Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation", published online: https://arxiv.org/abs/1703.00420 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107490377A (en) * 2017-07-17 2017-12-19 五邑大学 Indoor map-free navigation system and navigation method
CN110753936A (en) * 2017-08-25 2020-02-04 谷歌有限责任公司 Batch reinforcement learning
CN107450593B (en) * 2017-08-30 2020-06-12 清华大学 Unmanned aerial vehicle autonomous navigation method and system
CN107450593A (en) * 2017-08-30 2017-12-08 清华大学 A kind of unmanned plane autonomous navigation method and system
CN108287542A (en) * 2018-01-04 2018-07-17 浙江大学 Unmanned plane and unmanned boat cooperation control system and method based on collaboration cloud control
CN108287542B (en) * 2018-01-04 2021-01-26 浙江大学 Unmanned aerial vehicle and unmanned ship cooperative control system and method based on cooperative cloud control
CN108320051B (en) * 2018-01-17 2021-11-23 哈尔滨工程大学 Mobile robot dynamic collision avoidance planning method based on GRU network model
CN108320051A (en) * 2018-01-17 2018-07-24 哈尔滨工程大学 A kind of mobile robot dynamic collision-free planning method based on GRU network models
CN108536144A (en) * 2018-04-10 2018-09-14 上海理工大学 A kind of paths planning method of fusion dense convolutional network and competition framework
CN109241552A (en) * 2018-07-12 2019-01-18 哈尔滨工程大学 A kind of underwater robot motion planning method based on multiple constraint target
CN109241552B (en) * 2018-07-12 2022-04-05 哈尔滨工程大学 Underwater robot motion planning method based on multiple constraint targets
CN109085825A (en) * 2018-07-13 2018-12-25 安徽灵图壹智能科技有限公司 A kind of unmanned mine car mining optimal route selection method
CN109242098A (en) * 2018-07-25 2019-01-18 深圳先进技术研究院 Limit neural network structure searching method and Related product under cost
CN113260936A (en) * 2018-12-26 2021-08-13 三菱电机株式会社 Mobile body control device, mobile body control learning device, and mobile body control method
CN113260936B (en) * 2018-12-26 2024-05-07 三菱电机株式会社 Moving object control device, moving object control learning device, and moving object control method
CN109668484A (en) * 2019-01-18 2019-04-23 北京瀚科瑞杰科技发展有限公司 A kind of target drone maneuvering control method and system that target drone is interacted with attack plane
CN109668484B (en) * 2019-01-18 2023-05-02 北京瀚科科技集团有限公司 Target aircraft maneuvering flight control method and system for interaction of target aircraft and attack aircraft
CN110147891A (en) * 2019-05-23 2019-08-20 北京地平线机器人技术研发有限公司 Method, apparatus and electronic equipment applied to intensified learning training process
CN110488835A (en) * 2019-08-28 2019-11-22 北京航空航天大学 A kind of unmanned systems intelligence local paths planning method based on double reverse transmittance nerve networks
CN110908384A (en) * 2019-12-05 2020-03-24 中山大学 Formation navigation method for distributed multi-robot collaborative unknown random maze
CN110908384B (en) * 2019-12-05 2022-09-23 中山大学 Formation navigation method for distributed multi-robot collaborative unknown random maze
CN111515961A (en) * 2020-06-02 2020-08-11 南京大学 Reinforcement learning reward method suitable for mobile mechanical arm
CN111515961B (en) * 2020-06-02 2022-06-21 南京大学 Reinforcement learning reward method suitable for mobile mechanical arm
CN112857370A (en) * 2021-01-07 2021-05-28 北京大学 Robot map-free navigation method based on time sequence information modeling
CN113093727A (en) * 2021-03-08 2021-07-09 哈尔滨工业大学(深圳) Robot map-free navigation method based on deep security reinforcement learning
TWI815613B (en) * 2022-08-16 2023-09-11 和碩聯合科技股份有限公司 Navigation method for robot and robot thereof

Similar Documents

Publication Publication Date Title
CN106950969A (en) Mobile robot continuous control method based on a mapless motion planner
CN113110509B (en) Warehousing system multi-robot path planning method based on deep reinforcement learning
CN108279692B (en) UUV dynamic planning method based on LSTM-RNN
Brunner et al. Teaching a machine to read maps with deep reinforcement learning
CN104155998B Path planning method based on the potential field method
CN108645413A Dynamic correction method for simultaneous localization and mapping of a mobile robot
CN106873585A Navigation search method, robot and system
Saulnier et al. Information theoretic active exploration in signed distance fields
CN110095120A Bio-inspired self-organizing map path planning method for an autonomous underwater vehicle under ocean currents
CN110515382A Smart device and localization method thereof
CN114879660B Target-driven robot environment sensing method
Wang Automatic control of mobile robot based on autonomous navigation algorithm
Klein Data-driven meets navigation: Concepts, models, and experimental validation
CN107562837B (en) Maneuvering target tracking algorithm based on road network
Jiang et al. Intelligent Plant Cultivation Robot Based on Key Marker Algorithm Using Visual and Laser Sensors
CN114594776B Navigation obstacle avoidance method based on hierarchical and modular learning
CN114153216B (en) Lunar surface path planning system and method based on deep reinforcement learning and block planning
Kim et al. Path integration mechanism with coarse coding of neurons
CN115690343A (en) Robot laser radar scanning and mapping method based on visual following
Chauvin-Hameau Informative path planning for algae farm surveying
CN112907644B (en) Machine map-oriented visual positioning method
Abidin et al. A calibration framework for swarming ASVs’ system design
El-Fakdi et al. Autonomous underwater vehicle control using reinforcement learning policy search methods
Kashyap et al. Modified type-2 fuzzy controller for intercollision avoidance of single and multi-humanoid robots in complex terrains
KR20220090732A (en) Method and system for determining action of device for given state using model trained based on risk measure parameter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20170714