CN109814565A - Intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data - Google Patents

Intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data

Info

Publication number
CN109814565A
Authority
CN
China
Prior art keywords
deep
network
unmanned boat
learning network
obstacle avoidance
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910091342.1A
Other languages
Chinese (zh)
Inventor
黄志坚
随博文
温家一
吴恭兴
张桂臣
刘雁集
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-30
Filing date
2019-01-30
Publication date
2019-05-28
Application filed by Shanghai Maritime University
Priority to CN201910091342.1A
Publication of CN109814565A
Legal status: Pending

Landscapes

  • Feedback Control In General (AREA)

Abstract

The present invention proposes a deep Q-learning network method, driven by spatial and temporal dual-stream big data, for autonomous intelligent navigation control of an unmanned boat under high-precision navigation. The specific steps are: sampling spatio-temporal dual-stream big data, designing a deep Q-learning network intelligent obstacle-avoidance controller, designing a reward-penalty function, designing an intelligent switching threshold function, and online learning. With the invention, the unmanned boat sails under high-precision positioning navigation in open waters; in complex waters, the deep Q-learning network intelligent obstacle-avoidance controller steers the boat autonomously in intelligent obstacle-avoidance mode; and a real-time risk estimation factor is evaluated from environment samples so that the boat switches between the two modes intelligently and in real time. In addition, the deep Q-learning network intelligent obstacle-avoidance controller has self-learning ability and a high degree of artificial intelligence. Finally, the method is compatible with existing ship navigation control systems, and the software and hardware resources needed to implement it are relatively modest.

Description

Intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data
Technical field
The present invention relates to an intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, and in particular to an intelligent navigation control method for unmanned boats that operates under high-precision positioning navigation, is driven by spatial and temporal dual-stream data sampled in real time, and is based on a deep Q-learning network. It belongs to the technical field of intelligent control of unmanned boats.
Background art
Giving a ship under high-precision positioning navigation human-like powers of observation and intelligence, so that it can cross complex waters and navigate and avoid obstacles autonomously without relying on a helmsman's lookout and steering, is not easy. Because the water surface is open and obstacle positions change, an unmanned boat cannot rely on lane detection as a driverless car does. Nor can it build a 3D model as Boston Dynamics robots do, or the 3D model is of very limited reference value. Unmanned boats such as those of Yunzhou Intelligence use high-precision positioning navigation and radar collision avoidance, and still need new theory and technology to support their intelligence.
In the past, automatic control theory and modern control theory were used to implement closed-loop feedback control and model-based control of ship navigation. Later, adaptive algorithms such as least squares, SVM, and ant colony optimization gave ships the ability to plan paths adaptively. A search of existing patents found the following. Patent applications No. 201710502348.4 and No. 201810454631.9 disclose unmanned boat obstacle-avoidance methods and devices based on image vision, but both require very complex traditional image-processing algorithms to compute the coordinates of obstacles. Patent application No. 201710458496.0 discloses a lateral control method for unmanned boats based on a reinforcement learning algorithm; its reinforcement learning controller uses an Actor-Critic structure and needs a model of the controlled system. Patent application No. 201810008481.9 discloses a collaborative cloud control system for autonomous navigation of unmanned boats, but it depends on the joint interaction of very complex systems and information, including shore-side, shipboard, communication, and collaborative cloud control systems. Patent applications No. 201710691295.5, No. 201711285895.8, and No. 201810160232.1 disclose autonomous navigation systems and methods for unmanned boats, but none of them uses artificial intelligence methods.
Summary of the invention
With the development of artificial intelligence and deep learning theory, and to overcome the shortcomings and defects of the prior art, the present invention proposes an intelligent navigation control method for unmanned boats based on a deep Q-learning network. A 360° pulsed laser rangefinder samples, in real time, spatial and temporal dual-stream big data on the distance between the unmanned boat and obstacles in the surrounding environment. The data are fed to a specially designed deep Q-learning network intelligent obstacle-avoidance controller, which undergoes extensive reinforcement learning in simulation and in online environments. According to a preset threshold, the system switches intelligently and in real time between the high-precision positioning navigation mode and the intelligent obstacle-avoidance navigation mode, and finally achieves fully autonomous intelligent navigation without a human operator, with a high degree of learning ability and artificial intelligence.
To achieve the above object, the present invention is implemented by the following technical solution:
An intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, characterized in that the method comprises the following steps:
S1, sampling spatio-temporal dual-stream big data: a 360° pulsed laser rangefinder mounted on top of the unmanned boat scans, at a preset angular resolution, the distance $d_t$ between the unmanned boat and the surrounding environment, that is, it measures N-dimensional spatial big data $d_t$ per frame. The difference between two adjacent frames, $o_t = d_t - d_{t-1}$, then gives N-dimensional temporal big data $o_t$ per frame on the relative velocity between the unmanned boat and the surrounding environment;
S2, designing a deep Q-learning network intelligent obstacle-avoidance controller: the controller consists of a spatial distance stream convolutional neural network and a temporal velocity stream convolutional neural network connected in parallel, followed in series by a fully connected neural network;
S3, designing a reward-penalty function: the reward-penalty function is represented by a scalar r. It guides the learning process of the deep Q-learning network intelligent obstacle-avoidance controller by evaluating how good or bad each action taken by the controller is;
S4, designing an intelligent switching threshold function: according to its threshold, the intelligent switching threshold function switches intelligently and in real time between the deep Q-learning network intelligent obstacle-avoidance controller and the high-precision positioning navigation controller, and finally achieves fully autonomous intelligent navigation of the unmanned boat without a human operator;
S5, online learning: to describe the online learning process of the deep Q-learning network intelligent obstacle-avoidance controller, the state variable S, the replay memory D, and the evaluation function $Q(s_t, a_t)$ are defined as follows:
$S = [d_t, o_t], \quad t = 0, 1, 2, \ldots$ (1)
$D = [(s, a, r, s', a'), \ldots], \quad t = 0, 1, 2, \ldots$ (2)
One network is designed as the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller, and a second network with exactly the same structure is designed as its target-value network.
In step S2, the N-dimensional spatial distance big data $d_t$ is the input of the parallel spatial distance stream convolutional neural network. This network has 2 layers. From input to output: the N-dimensional $d_t$ input passes through a convolution-pooling layer to give an M-dimensional middle layer, and the M-dimensional middle layer passes through a convolution-pooling layer to give an M/2-dimensional output layer;
The N-dimensional temporal velocity big data $o_t$ is the input of the parallel temporal velocity stream convolutional neural network. This network also has 2 layers. From input to output: the N-dimensional $o_t$ input passes through a convolution-pooling layer to give an M-dimensional middle layer, and the M-dimensional middle layer passes through a convolution-pooling layer to give an M/2-dimensional output layer;
The M/2-dimensional output layer of the spatial distance stream convolutional neural network and the M/2-dimensional output layer of the temporal velocity stream convolutional neural network are joined in parallel to form an M-dimensional input layer. A fully connected neural network maps this layer to the 5-dimensional output layer $Q(s, a; w)$. The 5 outputs are the Q-value estimates of the steering and motion control signals "stop, forward, backward, right, left" for the unmanned boat.
In step S3, if the unmanned boat successfully avoids an obstacle in the surrounding environment, the evaluation is r = 1; if the unmanned boat hits an obstacle in the surrounding environment, the evaluation is r = -1; for any other outcome, the evaluation is r = 0. The goal of the deep Q-learning network intelligent obstacle-avoidance controller is to maximize the sum of the reward-penalty values obtained by the unmanned boat;
In step S4, the real-time risk estimation factor ξ is first designed as:
where $k \in [0, 1]$ indicates the sensitivity to relative velocity. The larger the real-time risk estimation factor ξ, the higher the risk of collision;
The switching threshold of the intelligent switching threshold function is set to λ = 0.091. When ξ ≥ λ, the control signal $a = a_1$ output by the deep Q-learning network intelligent obstacle-avoidance controller is executed; when ξ < λ, the control signal $a = a_2$ output by the high-precision positioning navigation controller is executed, as shown below:
$$a = \begin{cases} a_1, & \xi \ge \lambda \\ a_2, & \xi < \lambda \end{cases}$$
In step S6, the learning process is executed cyclically as follows:
Step S6.1: initialize the replay memory D as an all-zero matrix, and randomly initialize, with small pseudo-random numbers, the connection weights w of the current-value network and the connection weights $w^-$ of the target-value network of the deep Q-learning network intelligent obstacle-avoidance controller;
Step S6.2: pass the big data information acquired by the 360° pulsed laser rangefinder of the unmanned boat to the inputs of the current-value network and the target-value network of the deep Q-learning network intelligent obstacle-avoidance controller, obtain the environment feature tuple (s, a, r, s', a'), and store it in the replay memory D, where s' is the state variable at the next time step and a' is the control signal output at the next time step;
Step S6.3: draw a batch of samples at random from the replay memory D as learning data;
Step S6.4: compute $Q(s, a; w)$ with the current-value network, and compute $y = r + \gamma \max_{a'} Q(s', a'; w^-)$ with the target-value network, where γ is the discount factor;
Step S6.5: take $I = \left(r + \gamma \max_{a'} Q(s', a'; w^-) - Q(s, a; w)\right)^2$ as the model loss function of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weights w of the current-value network;
Step S6.6: every N steps, assign the connection weights w of the current-value network to the connection weights $w^-$ of the target-value network.
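As a worked step for S6.5, the gradient of the loss with respect to w can be written out by treating the target y as a constant, since it depends only on $w^-$. This is a sketch of the standard derivation; the learning rate α is an assumption, as the source does not name one.

```latex
y = r + \gamma \max_{a'} Q(s', a'; w^-), \qquad
I = \bigl(y - Q(s, a; w)\bigr)^2

\nabla_w I = -2\,\bigl(y - Q(s, a; w)\bigr)\,\nabla_w Q(s, a; w)

w \leftarrow w - \alpha\,\nabla_w I
  = w + 2\alpha\,\bigl(y - Q(s, a; w)\bigr)\,\nabla_w Q(s, a; w)
```

Because $w^-$ changes only at the periodic copy in S6.6, the target y stays fixed between copies, which is what lets it be treated as a constant here.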
Compared with the prior art, the present invention has the following advantages:
By sampling, in real time, spatial and temporal dual-stream big data on the distance between the unmanned boat and obstacles in the surrounding environment, the invention provides a complete scheme and method for autonomous intelligent navigation of an unmanned boat, and implements the navigation control with a deep Q-learning network. In open waters the unmanned boat sails under high-precision positioning navigation; in complex waters the deep Q-learning network intelligent obstacle-avoidance controller steers the boat autonomously in intelligent obstacle-avoidance mode; and a real-time risk estimation factor is evaluated from environment samples so that the boat switches between the two modes intelligently and in real time. In addition, the deep Q-learning network intelligent obstacle-avoidance controller has self-learning ability. After extensive feedback learning in simulation and in online environments, it can finally achieve fully autonomous intelligent navigation of the ship in a variety of water environments without a human operator, with a high degree of artificial intelligence. Finally, the method is compatible with existing ship navigation control systems, and the software and hardware resources needed to implement it are relatively modest.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure and principle of the intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data under high-precision navigation according to the present invention.
Fig. 2 is a structural schematic diagram of the deep Q-learning network intelligent obstacle-avoidance controller of the present invention.
Fig. 3 is a schematic diagram of the online learning principle of the deep Q-learning network intelligent obstacle-avoidance controller of the present invention.
In Fig. 1: 1, unmanned boat; 2, surrounding environment; 3, position relative to high-precision navigation; 4, 360° pulsed laser rangefinder data $o_t$, $d_t$; 5, deep Q-learning network intelligent obstacle-avoidance controller; 6, spatial distance stream convolutional neural network; 7, temporal velocity stream convolutional neural network; 8, fully connected neural network; 9, stop, forward, backward, right, left; 10, high-precision positioning navigation controller; 11, intelligent switching threshold function.
In Fig. 2: 12, 200-dimensional $d_t$ input; 13, convolution-pooling layer; 14, 128-dimensional middle layer; 15, convolution-pooling layer; 16, 64-dimensional output layer; 17, 200-dimensional $o_t$ input; 18, convolution-pooling layer; 19, 128-dimensional middle layer; 20, convolution-pooling layer; 21, 64-dimensional output layer; 22, 128-dimensional input layer; 23, fully connected neural network; 24, 5-dimensional output layer $Q(s, a; w)$.
In Fig. 3: 2, surrounding environment; 25, replay memory D; 26, current-value network of the deep Q-learning network intelligent obstacle-avoidance controller; 27, target-value network of the deep Q-learning network intelligent obstacle-avoidance controller; 28, model loss function of the deep Q-learning network.
Detailed description of the embodiments
The present invention is further explained below through a detailed description of a preferred embodiment, with reference to the drawings.
1. Sampling spatio-temporal dual-stream big data
A 360° pulsed laser rangefinder mounted on top of the unmanned boat 1 scans the distance $d_t$ between the unmanned boat 1 and the surrounding environment 2 at an angular resolution of 1.8°, that is, it measures 200-dimensional spatial big data $d_t$ per frame. The difference between two adjacent frames, $o_t = d_t - d_{t-1}$, then gives 200-dimensional temporal big data $o_t$ per frame on the relative velocity between the unmanned boat 1 and the surrounding environment 2, where the subscript t denotes the sampling instant t. These are the 360° pulsed laser rangefinder data $o_t$, $d_t$ shown as 4 in Fig. 1.
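The sampling step can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: `fake_scan` is a hypothetical stand-in for the rangefinder driver, and the obstacle geometry and ranges in it are invented for the example. Only the 200-beam frame size (360° / 1.8°) and the difference $o_t = d_t - d_{t-1}$ come from the text.

```python
N_BEAMS = 200                              # 360 degrees at 1.8 degrees per beam
ANGULAR_RESOLUTION_DEG = 360.0 / N_BEAMS   # 1.8 degrees


def fake_scan(t, obstacle_range=30.0, max_range=100.0):
    """Hypothetical rangefinder frame: an obstacle ahead that closes by 1 m per frame."""
    frame = [max_range] * N_BEAMS
    for beam in range(95, 105):            # ten beams see the obstacle
        frame[beam] = obstacle_range - 1.0 * t
    return frame


def temporal_stream(d_t, d_prev):
    """o_t = d_t - d_{t-1}, element-wise; negative values mean the range is shrinking."""
    return [now - before for now, before in zip(d_t, d_prev)]


d_prev = fake_scan(0)
d_t = fake_scan(1)
o_t = temporal_stream(d_t, d_prev)
state = (d_t, o_t)                         # the state variable S = [d_t, o_t]
```

Beams that see the approaching obstacle get a negative $o_t$, while beams on open water stay at zero, so the temporal stream carries the closing-speed information that a single distance frame lacks.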
2. Designing the deep Q-learning network intelligent obstacle-avoidance controller 5
The deep Q-learning network intelligent obstacle-avoidance controller 5 consists of the spatial distance stream convolutional neural network 6 and the temporal velocity stream convolutional neural network 7 connected in parallel, followed in series by a fully connected neural network 8. Its specific structure and principle are shown in Fig. 2.
The 200-dimensional spatial distance big data $d_t$ input 12 is the input of the parallel spatial distance stream convolutional neural network 6. This network has 2 layers. From input to output: the 200-dimensional $d_t$ input 12 passes through the convolution-pooling layer 13 to give the 128-dimensional middle layer 14, and the 128-dimensional middle layer 14 passes through the convolution-pooling layer 15 to give the 64-dimensional output layer 16.
The 200-dimensional temporal velocity big data $o_t$ input 17 is the input of the parallel temporal velocity stream convolutional neural network 7. This network also has 2 layers. From input to output: the 200-dimensional $o_t$ input 17 passes through the convolution-pooling layer 18 to give the 128-dimensional middle layer 19, and the 128-dimensional middle layer 19 passes through the convolution-pooling layer 20 to give the 64-dimensional output layer 21.
The 64-dimensional output layer 16 of the spatial distance stream convolutional neural network 6 and the 64-dimensional output layer 21 of the temporal velocity stream convolutional neural network 7 are joined in parallel to form a 128-dimensional input layer 22. A fully connected neural network 23 maps this layer to the 5-dimensional output layer $Q(s, a; w)$ 24. The 5 outputs are the Q-value estimates of the steering and motion control signals "stop, forward, backward, right, left" 9 for the unmanned boat. Here s is the state variable S of formula (1), a is the control signal, and w is the set of connection weights of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller.
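The dimension flow 200 → 128 → 64 per stream, then 128 → 5, can be traced with a small pure-Python forward pass. This is only a shape-level sketch under stated assumptions: the source does not give kernel sizes, channel counts, activations, or the pooling type, so the kernel width of 3, the single channel, the ReLU, the adaptive average pooling, and the single dense layer below are all hypothetical choices made to reproduce the stated layer sizes.

```python
import random

ACTIONS = ("stop", "forward", "backward", "right", "left")


def conv_pool(x, kernel, out_len):
    """1-D convolution (zero-padded, same length) + ReLU + adaptive average pooling."""
    n, half = len(x), len(kernel) // 2
    conv = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * x[idx]
        conv.append(max(acc, 0.0))                 # ReLU (assumed activation)
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = -(-((i + 1) * n) // out_len)         # ceiling division
        window = conv[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


def stream(x, kernels, dims=(128, 64)):
    """One two-layer stream: 200 -> 128 -> 64."""
    h = x
    for kernel, out_len in zip(kernels, dims):
        h = conv_pool(h, kernel, out_len)
    return h


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def q_values(d_t, o_t, params):
    """Concatenate both 64-dim stream outputs (128 dims) and map them to 5 Q-values."""
    merged = stream(d_t, params["space"]) + stream(o_t, params["time"])
    return dense(merged, params["fc_w"], params["fc_b"])


rng = random.Random(0)


def small(n):
    """Small pseudo-random initial weights."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


params = {
    "space": [small(3), small(3)],                 # hypothetical kernel width 3
    "time": [small(3), small(3)],
    "fc_w": [small(128) for _ in ACTIONS],
    "fc_b": [0.0] * len(ACTIONS),
}
d_t = [50.0] * 200
o_t = [-0.5] * 200
q = q_values(d_t, o_t, params)
best_action = ACTIONS[max(range(len(q)), key=q.__getitem__)]
```

Keeping the two streams separate until the 128-dimensional merge means distance features and velocity features are extracted independently, and only the fully connected layer combines them into action values.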
3. Designing the reward-penalty function
The reward-penalty function is represented by a scalar r. It guides the learning process of the deep Q-learning network intelligent obstacle-avoidance controller 5 by evaluating how good or bad each action taken by the controller is.
If the unmanned boat 1 successfully avoids an obstacle in the surrounding environment 2, the evaluation is r = 1; if the unmanned boat 1 hits an obstacle in the surrounding environment 2, the evaluation is r = -1; for any other outcome, the evaluation is r = 0. The goal of the deep Q-learning network intelligent obstacle-avoidance controller 5 is to maximize the sum of the reward-penalty values obtained by the unmanned boat 1.
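The three-valued reward above maps directly to a small function. In this sketch the two boolean flags are hypothetical inputs: the source does not say how "successfully avoided" or "hit" is detected, so that detection is left to the caller.

```python
def reward(avoided_obstacle: bool, collided: bool) -> int:
    """Scalar reward r: -1 for a collision, +1 for a successful avoidance, 0 otherwise."""
    if collided:
        return -1
    if avoided_obstacle:
        return 1
    return 0


def total_reward(rewards):
    """The quantity the controller tries to maximize: the sum of reward values."""
    return sum(rewards)


episode = [reward(False, False), reward(True, False), reward(False, True)]
score = total_reward(episode)   # 0 + 1 - 1
```

Checking the collision flag first is a design choice of this sketch, so that a step flagged both ways is scored as a collision.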
4. Designing the intelligent switching threshold function 11
According to its threshold, the intelligent switching threshold function 11 switches intelligently and in real time between the deep Q-learning network intelligent obstacle-avoidance controller 5 and the high-precision positioning navigation controller 10, and finally achieves fully autonomous intelligent navigation of the unmanned boat 1 without a human operator.
The real-time risk estimation factor ξ is first designed as:
where $k \in [0, 1]$ indicates the sensitivity to relative velocity; k = 0.9 is generally used here. The larger the real-time risk estimation factor ξ, the higher the risk of collision.
The switching threshold of the intelligent switching threshold function 11 is set to λ = 0.091. When ξ ≥ λ, the control signal $a = a_1$ output by the deep Q-learning network intelligent obstacle-avoidance controller 5 is executed; when ξ < λ, the control signal $a = a_2$ output by the high-precision positioning navigation controller 10 is executed, as shown below:
$$a = \begin{cases} a_1, & \xi \ge \lambda \\ a_2, & \xi < \lambda \end{cases}$$
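The threshold rule can be sketched as a single selection function. The risk factor ξ is taken here as an already computed input, since its formula is not available in this text; the function name and the string-valued control signals are illustrative assumptions.

```python
SWITCH_THRESHOLD = 0.091   # lambda, as given in the text


def select_control(xi, a1, a2, lam=SWITCH_THRESHOLD):
    """Return a1 (obstacle-avoidance controller) when xi >= lam, else a2 (navigation controller)."""
    return a1 if xi >= lam else a2


# Hypothetical control signals from the two controllers at one time step.
a1 = "left"                # from the deep Q-learning obstacle-avoidance controller
a2 = "forward"             # from the high-precision positioning navigation controller
low_risk = select_control(0.02, a1, a2)
high_risk = select_control(0.30, a1, a2)
```

Note that the boundary case ξ = λ selects the obstacle-avoidance controller, matching the "≥" in the rule.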
5. Online learning
To describe the online learning process of the deep Q-learning network intelligent obstacle-avoidance controller 5, shown in Fig. 3, the state variable S, the replay memory D 25, and the evaluation function $Q(s_t, a_t)$ are defined as follows:
$S = [d_t, o_t], \quad t = 0, 1, 2, \ldots$ (4)
$D = [(s, a, r, s', a'), \ldots], \quad t = 0, 1, 2, \ldots$ (5)
Network 26 in Fig. 3 serves as the current-value network 26 of the deep Q-learning network intelligent obstacle-avoidance controller, and a second network with exactly the same structure is designed as its target-value network 27.
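The replay memory D of formula (5) can be sketched as a bounded buffer of (s, a, r, s', a') tuples with uniform random sampling. This is an illustrative data structure, not the patent's: the source initializes D as an all-zero matrix, whereas this sketch swaps in a `collections.deque` that simply evicts the oldest entry when full. The class name, capacity, and field names are assumptions.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["s", "a", "r", "s_next", "a_next"])


class ReplayMemory:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next, a_next):
        self.buffer.append(Transition(s, a, r, s_next, a_next))

    def sample(self, batch_size):
        """Draw a random batch without replacement (smaller if the memory is not full yet)."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=1000, seed=0)
memory.store(s=[1.0, 0.0], a=1, r=0, s_next=[0.9, -0.1], a_next=1)
memory.store(s=[0.9, -0.1], a=1, r=1, s_next=[0.8, -0.1], a_next=3)
batch = memory.sample(2)
```

Sampling at random rather than replaying in order breaks the correlation between consecutive frames, which is the usual reason for keeping such a memory.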
The learning process is executed cyclically in the following 6 steps:
1) Initialize the replay memory D 25 as an all-zero matrix, and randomly initialize, with small pseudo-random numbers, the connection weights w of the current-value network 26 and the connection weights $w^-$ of the target-value network 27 of the deep Q-learning network intelligent obstacle-avoidance controller.
2) Pass the big data information acquired by the 360° pulsed laser rangefinder of the unmanned boat 1 to the inputs of the current-value network 26 and the target-value network 27, obtain the environment feature tuple (s, a, r, s', a'), and store it in the replay memory D 25. Here s' is the state variable at the next time step and a' is the control signal output at the next time step.
3) Draw a batch of samples at random from the replay memory D 25 as learning data.
4) Compute $Q(s, a; w)$ with the current-value network 26, and compute $y = r + \gamma \max_{a'} Q(s', a'; w^-)$ with the target-value network 27. Here γ is the discount factor; γ = 0.9 is generally used.
5) Take $I = \left(r + \gamma \max_{a'} Q(s', a'; w^-) - Q(s, a; w)\right)^2$ as the model loss function 28 of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weights w of the current-value network 26.
6) Every N steps, assign the connection weights w of the current-value network 26 to the connection weights $w^-$ of the target-value network 27.
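Steps 4) to 6) above can be sketched with a deliberately tiny Q-function so that the update is easy to follow. The following is an illustration under assumptions, not the patent's network: Q is a linear function with one weight row per action, standing in for the two-stream network; the learning rate, the two-feature states, and the toy batch are invented; and the batch holds (s, a, r, s') because the max over a' makes the stored a' unnecessary for the target.

```python
import copy

GAMMA = 0.9   # discount factor, as in the text


def q_all(w, s):
    """Q(s, a; w) for every action a, with a linear stand-in: one weight row per action."""
    return [sum(wi * si for wi, si in zip(row, s)) for row in w]


def td_target(r, s_next, w_target, gamma=GAMMA):
    """y = r + gamma * max_a' Q(s', a'; w^-), computed with the target-value weights."""
    return r + gamma * max(q_all(w_target, s_next))


def sgd_step(w, w_target, batch, lr=0.1, gamma=GAMMA):
    """One pass of stochastic gradient descent on I = (y - Q(s, a; w))^2; returns the mean loss."""
    total = 0.0
    for s, a, r, s_next in batch:
        err = td_target(r, s_next, w_target, gamma) - q_all(w, s)[a]
        total += err ** 2
        # dI/dw[a] = -2 * err * s for the linear Q, so descend along +2 * err * s.
        w[a] = [wi + lr * 2.0 * err * si for wi, si in zip(w[a], s)]
    return total / len(batch)


def train(w, batch, steps, sync_every):
    """Repeat the update and copy w into w^- every `sync_every` steps."""
    w_target = copy.deepcopy(w)
    losses = []
    for step in range(1, steps + 1):
        losses.append(sgd_step(w, w_target, batch))
        if step % sync_every == 0:
            w_target = copy.deepcopy(w)
    return w_target, losses


# Toy data: two one-hot states, five actions, hypothetical rewards.
w = [[0.0, 0.0] for _ in range(5)]
batch = [([1.0, 0.0], 1, 1.0, [0.0, 1.0]),
         ([0.0, 1.0], 2, -1.0, [1.0, 0.0])]
w_target, losses = train(w, batch, steps=200, sync_every=10)
```

Holding $w^-$ fixed between copies keeps the target y from moving at every update, which is the purpose of maintaining a separate target-value network alongside the current-value network.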
Although the content of the present invention has been described in detail through the preferred embodiment above, the description should not be regarded as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. The scope of protection of the present invention is therefore defined by the appended claims.

Claims (5)

1. An intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, characterized in that the method comprises the following steps:
S1, sampling spatio-temporal dual-stream big data: a 360° pulsed laser rangefinder mounted on top of the unmanned boat scans, at a preset angular resolution, the distance $d_t$ between the unmanned boat and the surrounding environment, that is, it measures N-dimensional spatial big data $d_t$ per frame; the difference between two adjacent frames, $o_t = d_t - d_{t-1}$, then gives N-dimensional temporal big data $o_t$ per frame on the relative velocity between the unmanned boat and the surrounding environment, where the subscript t denotes the sampling instant t;
S2, designing a deep Q-learning network intelligent obstacle-avoidance controller: the controller consists of a spatial distance stream convolutional neural network and a temporal velocity stream convolutional neural network connected in parallel, followed in series by a fully connected neural network;
S3, designing a reward-penalty function: the reward-penalty function is represented by a scalar r; it guides the learning process of the deep Q-learning network intelligent obstacle-avoidance controller by evaluating how good or bad each action taken by the controller is;
S4, designing an intelligent switching threshold function: according to its threshold, the intelligent switching threshold function switches intelligently and in real time between the deep Q-learning network intelligent obstacle-avoidance controller and the high-precision positioning navigation controller, and finally achieves fully autonomous intelligent navigation of the unmanned boat without a human operator;
S5, online learning: to describe the online learning process of the deep Q-learning network intelligent obstacle-avoidance controller, the state variable S, the replay memory D, and the evaluation function $Q(s_t, a_t)$ are defined as follows:
$S = [d_t, o_t], \quad t = 0, 1, 2, \ldots$ (1)
$D = [(s, a, r, s', a'), \ldots], \quad t = 0, 1, 2, \ldots$ (2)
where the subscript t denotes the sampling instant t, so that $s_t$ is the state variable at time t and $a_t$ is the control signal at time t;
and one network is designed as the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller, while a second network with exactly the same structure is designed as its target-value network.
2. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterized in that, in step S2, the N-dimensional spatial distance big data $d_t$ is the input of the parallel spatial distance stream convolutional neural network; this network has 2 layers, from input to output: the N-dimensional $d_t$ input passes through a convolution-pooling layer to give an M-dimensional middle layer, and the M-dimensional middle layer passes through a convolution-pooling layer to give an M/2-dimensional output layer;
the N-dimensional temporal velocity big data $o_t$ is the input of the parallel temporal velocity stream convolutional neural network; this network has 2 layers, from input to output: the N-dimensional $o_t$ input passes through a convolution-pooling layer to give an M-dimensional middle layer, and the M-dimensional middle layer passes through a convolution-pooling layer to give an M/2-dimensional output layer;
the M/2-dimensional output layer of the spatial distance stream convolutional neural network and the M/2-dimensional output layer of the temporal velocity stream convolutional neural network are joined in parallel to form an M-dimensional input layer, which a fully connected neural network maps to the 5-dimensional output layer $Q(s, a; w)$; the 5 outputs are the Q-value estimates of the steering and motion control signals "stop, forward, backward, right, left" for the unmanned boat, where s is the state variable S of formula (1), a is the control signal, and w is the set of connection weights of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller.
3. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterized in that, in step S3, if the unmanned boat successfully avoids an obstacle in the surrounding environment, the evaluation is r = 1; if the unmanned boat hits an obstacle in the surrounding environment, the evaluation is r = -1; for any other outcome, the evaluation is r = 0; and the goal of the deep Q-learning network intelligent obstacle-avoidance controller is to maximize the sum of the reward-penalty values obtained by the unmanned boat;
4. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterized in that, in step S4, the real-time risk estimation factor ξ is first designed as:
where $k \in [0, 1]$ indicates the sensitivity to relative velocity, and the larger the real-time risk estimation factor ξ, the higher the risk of collision;
and the switching threshold of the intelligent switching threshold function is set to λ = 0.091; when ξ ≥ λ, the control signal $a = a_1$ output by the deep Q-learning network intelligent obstacle-avoidance controller is executed; when ξ < λ, the control signal $a = a_2$ output by the high-precision positioning navigation controller is executed, as shown below:
$$a = \begin{cases} a_1, & \xi \ge \lambda \\ a_2, & \xi < \lambda \end{cases}$$
5. The unmanned boat intelligent navigation control method based on spatio-temporal two-stream data-driven deep Q-learning according to claim 1, characterized in that, in step S6, the learning process is executed cyclically as follows:
Step S6.1: initialize the experience replay memory $D$ as an all-zero matrix, and randomly initialize, with small pseudo-random numbers, the connection weight parameters $w$ of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller and the connection weight parameters $w^-$ of its target-value network;
Step S6.2: feed the spatio-temporal big data acquired by the unmanned boat's 360° pulsed laser rangefinder to the inputs of the current-value network and the target-value network of the deep Q-learning network intelligent obstacle-avoidance controller, obtain the characteristic information $(s, a, r, s', a')$ of the environment, and store it in the experience replay memory $D$, where $s'$ is the state variable at the next time step and $a'$ is the control signal output at the next time step;
Step S6.3: randomly draw a batch of samples from the experience replay memory $D$ as learning data;
Step S6.4: compute $Q(s, a; w)$ with the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller, and compute $y = r + \gamma \max_{a'} Q(s', a'; w^-)$ with its target-value network, where $\gamma$ is the discount factor;
Step S6.5: take $I = \left(r + \gamma \max_{a'} Q(s', a'; w^-) - Q(s, a; w)\right)^2$ as the model loss function of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weight parameters $w$ of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller;
Step S6.6: every $N$ steps, assign the connection weight parameters $w$ of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller to the connection weight parameters $w^-$ of its target-value network.
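Steps S6.1 to S6.6 can be sketched in plain Python as below. This is a toy illustration under several assumptions, not the patented controller: the deep network is replaced by a linear model with one weight row per action, the state `s` is assumed to be an already extracted feature vector, the replay memory is a bounded list rather than a zero-initialized matrix, and all constants and names are hypothetical.

```python
import random

GAMMA = 0.9           # discount factor gamma (assumed value)
LEARNING_RATE = 0.05  # SGD step size (assumed value)
SYNC_EVERY = 20       # N in step S6.6 (assumed value)
N_FEATURES = 4        # length of the state feature vector s (toy size)
N_ACTIONS = 3         # number of discrete control signals (toy size)
CAPACITY = 100        # replay memory size (assumed value)


def init_weights(rng):
    """S6.1: small pseudo-random weights, one row per action."""
    return [[rng.uniform(-0.01, 0.01) for _ in range(N_FEATURES)]
            for _ in range(N_ACTIONS)]


def q_values(w, s):
    """Q(s, a; w) for every action a under the linear stand-in model."""
    return [sum(wi * si for wi, si in zip(row, s)) for row in w]


def td_target(w_minus, r, s_next):
    """S6.4: y = r + gamma * max_a' Q(s', a'; w-)."""
    return r + GAMMA * max(q_values(w_minus, s_next))


def sgd_step(w, w_minus, batch):
    """S6.5: one SGD pass over the batch, updating w in place.

    Returns the mean squared error, each sample measured before its update.
    """
    total = 0.0
    for s, a, r, s_next, _a_next in batch:  # a' is stored but unused by the max target
        err = td_target(w_minus, r, s_next) - q_values(w, s)[a]
        total += err ** 2
        # The gradient of (y - w_a . s)^2 w.r.t. w_a is -2 * err * s.
        w[a] = [wi + LEARNING_RATE * 2.0 * err * si for wi, si in zip(w[a], s)]
    return total / len(batch)


def train(transitions, steps=200, batch_size=8, seed=0):
    """Run the S6.1 to S6.6 loop over a fixed list of (s, a, r, s', a') tuples."""
    rng = random.Random(seed)
    w = init_weights(rng)        # current-value network weights
    w_minus = init_weights(rng)  # target-value network weights
    replay = []                  # memory D
    for t in range(steps):
        replay.append(transitions[t % len(transitions)])  # S6.2: store a tuple
        del replay[:-CAPACITY]                            # keep only the newest entries
        batch = rng.sample(replay, min(batch_size, len(replay)))  # S6.3
        sgd_step(w, w_minus, batch)                       # S6.4 and S6.5
        if (t + 1) % SYNC_EVERY == 0:                     # S6.6: copy w to w-
            w_minus = [row[:] for row in w]
    return w, w_minus
```

Keeping a separate, periodically synchronized copy `w_minus` means the target $y$ stays fixed between synchronizations while `w` is being updated. In a real system the transitions would come from the laser data stream rather than from a fixed list.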
CN201910091342.1A 2019-01-30 2019-01-30 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study Pending CN109814565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910091342.1A CN109814565A (en) 2019-01-30 2019-01-30 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study

Publications (1)

Publication Number Publication Date
CN109814565A true CN109814565A (en) 2019-05-28

Family

ID=66606011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910091342.1A Pending CN109814565A (en) 2019-01-30 2019-01-30 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study

Country Status (1)

Country Link
CN (1) CN109814565A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107553490A (en) * 2017-09-08 2018-01-09 深圳市唯特视科技有限公司 A kind of monocular vision barrier-avoiding method based on deep learning
CN108921037A (en) * 2018-06-07 2018-11-30 四川大学 A kind of Emotion identification method based on BN-inception binary-flow network
CN109263826A (en) * 2018-08-30 2019-01-25 武汉理工大学 Ship Intelligent Collision Avoidance system and method based on maneuverability modeling

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
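YUANDA WANG et al.: "Learning to Navigate Through Complex Dynamic Environment With Modular Deep Reinforcement Learning", 《IEEE TRANSACTIONS ON GAMES》 *
刘志荣 et al.: "Path planning for mobile robots based on deep Q-learning" (基于深度Q学习的移动机器人路径规划), 《测控技术》 *
张亚初 et al.: "Research on an obstacle-avoidance algorithm for intelligent cars based on a two-stream convolutional neural network" (基于双流卷积神经网络的智能小车避障算法研究), 《新技术新工艺》 *
张浩杰 et al.: "End-to-end robot control method based on deep Q-network learning" (基于深度Q网络学习的机器人端到端控制方法), 《仪器仪表学报》 *
翟军勇 et al.: "Research on neural-network-based multi-model adaptive switching control" (基于神经网络多模型自适应切换控制研究), 《中国电机工程学报》 *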

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110645981A (en) * 2019-10-15 2020-01-03 四方智能(武汉)控制技术有限公司 Unmanned ship navigation system and method for cleaning pile foundation type waterborne photovoltaic module
CN110826609A (en) * 2019-10-29 2020-02-21 华中科技大学 Double-flow feature fusion image identification method based on reinforcement learning
CN110826609B (en) * 2019-10-29 2023-03-24 华中科技大学 Double-current feature fusion image identification method based on reinforcement learning
CN111026127A (en) * 2019-12-27 2020-04-17 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN111026127B (en) * 2019-12-27 2021-09-28 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN111275249A (en) * 2020-01-15 2020-06-12 吉利汽车研究院(宁波)有限公司 Driving behavior optimization method based on DQN neural network and high-precision positioning

Similar Documents

Publication Publication Date Title
CN109814565A (en) The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study
Ruan et al. Mobile robot navigation based on deep reinforcement learning
CN106970615B (en) A real-time online path planning method based on deep reinforcement learning
US20190147610A1 (en) End-to-End Tracking of Objects
Wang et al. Cooperative USV–UAV marine search and rescue with visual navigation and reinforcement learning-based control
Eresen et al. Autonomous quadrotor flight with vision-based obstacle avoidance in virtual environment
CN114384920A (en) Dynamic obstacle avoidance method based on real-time construction of local grid map
Kelchtermans et al. How hard is it to cross the room?--Training (Recurrent) Neural Networks to steer a UAV
CN116263335A (en) Indoor navigation method based on vision and radar information fusion and reinforcement learning
Qu et al. Pursuit-evasion game strategy of USV based on deep reinforcement learning in complex multi-obstacle environment
Ji-Yong et al. Design and vision based autonomous capture of sea organism with absorptive type remotely operated vehicle
Sans-Muntadas et al. Learning an AUV docking maneuver with a convolutional neural network
Yan et al. Reinforcement Learning‐Based Autonomous Navigation and Obstacle Avoidance for USVs under Partially Observable Conditions
Lan et al. Path planning for underwater gliders in time-varying ocean current using deep reinforcement learning
Katyal et al. High-speed robot navigation using predicted occupancy maps
Yang et al. Autonomous UAV navigation in dynamic environments with double deep Q-networks
Pal et al. Mobile robot navigation using a neural net
CN116679711A (en) Robot obstacle avoidance method based on model-based reinforcement learning and model-free reinforcement learning
Patil et al. Deep reinforcement learning for continuous docking control of autonomous underwater vehicles: a benchmarking study
Tan et al. A local path planning method based on Q-learning
CN113467462B (en) Pedestrian accompanying control method and device for robot, mobile robot and medium
CN113674310B (en) Four-rotor unmanned aerial vehicle target tracking method based on active visual perception
de Oliveira et al. A robot architecture for outdoor competitions
Song et al. Surface path tracking method of autonomous surface underwater vehicle based on deep reinforcement learning
CN111611869B (en) End-to-end monocular vision obstacle avoidance method based on serial deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190528