CN109814565A - Intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data - Google Patents
- Publication number
- CN109814565A (application number CN201910091342.1A)
- Authority
- CN
- China
- Prior art keywords
- deep
- network
- unmanned boat
- learning network
- obstacle avoidance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Feedback Control In General (AREA)
Abstract
The invention proposes a deep Q-learning network method, driven by spatial and temporal dual-stream big data, for autonomous intelligent navigation control of an unmanned boat under high-precision positioning navigation. The steps are: sampling the spatio-temporal dual-stream big data; designing a deep Q-learning network intelligent obstacle avoidance controller; designing a reward-penalty function; designing an intelligent switching threshold function; and online learning. With this method, the unmanned boat sails under high-precision positioning navigation in open waters; in complex waters, the deep Q-learning network intelligent obstacle avoidance controller steers the boat autonomously in intelligent obstacle avoidance mode; and a real-time risk estimation factor, evaluated from environment samples, lets the boat switch between the two modes in real time. In addition, the deep Q-learning network intelligent obstacle avoidance controller has self-learning capability and a high degree of artificial intelligence. Finally, the method is compatible with existing ship navigation control systems, and its software and hardware requirements are comparatively modest.
Description
Technical field
The invention relates to an intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, and in particular to an intelligent navigation control method that, under high-precision positioning navigation, is driven by real-time sampled spatial and temporal dual-stream data and is based on a deep Q-learning network. It belongs to the technical field of intelligent control of unmanned boats.
Background art
Giving a ship under high-precision positioning navigation human-like observation ability and intelligence, so that it can cross complex waters and navigate and avoid obstacles autonomously without relying on a helmsman's lookout and steering, is not easy. Because the water surface is open and obstacle positions change, an unmanned boat cannot rely on lane detection as an unmanned vehicle does; nor can it rely on 3D modelling as Boston Dynamics robots do, or at best 3D modelling is of very limited use. Unmanned boats such as those of Yunzhou Intelligence use high-precision positioning navigation and radar collision avoidance, and still need new theory and technology to support their intelligence.
In the past, automatic control theory and modern control theory were used to implement closed-loop feedback control and model-based control of ship navigation. Later, adaptive algorithms such as least squares, SVM, and ant colony algorithms gave ships adaptive path planning capability. A search of existing patents found the following. The patents with application numbers 201710502348.4 and 201810454631.9 disclose unmanned boat obstacle avoidance methods and devices based on image vision, but they need very complex traditional image processing algorithms to compute the coordinates of obstacles. The patent with application number 201710458496.0 discloses a lateral control method for unmanned boats based on a reinforcement learning algorithm; its reinforcement learning controller uses an Actor-Critic structure and requires a model of the controlled system. The patent with application number 201810008481.9 discloses a collaborative cloud control system for autonomous navigation of unmanned boats, but it requires the joint interaction of very complex systems and information: shore side, shipborne, communication, and collaborative cloud control. The patents with application numbers 201710691295.5, 201711285895.8 and 201810160232.1 disclose autonomous navigation systems and methods for unmanned boats, but none of them uses artificial intelligence methods.
Summary of the invention
With the development of artificial intelligence and deep learning theory, and to overcome the shortcomings and defects of the prior art, the invention proposes an intelligent navigation control method for unmanned boats based on a deep Q-learning network. A 360° pulsed laser rangefinder samples in real time the spatial and temporal dual-stream big data describing the distance between the unmanned boat and obstacles in the surrounding environment, and feeds it to a specially designed deep Q-learning network intelligent obstacle avoidance controller. After extensive reinforcement learning in simulation and in online environments, the system switches in real time, according to a preset threshold, between high-precision positioning navigation mode and intelligent obstacle avoidance navigation mode, and ultimately achieves fully autonomous intelligent navigation without a human operator, with a high degree of learning ability and artificial intelligence.
To achieve the above purpose, the invention adopts the following technical solution:
An intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, characterised in that the method comprises the following steps:
S1, sample the spatio-temporal dual-stream big data: a 360° pulsed laser rangefinder mounted on top of the unmanned boat scans, at a preset angular resolution, the distance $d_t$ between the unmanned boat and the surrounding environment, giving N-dimensional spatial big data $d_t$ per frame; the difference between two adjacent frames, $o_t = d_t - d_{t-1}$, then gives N-dimensional temporal big data $o_t$ per frame, the relative velocity between the unmanned boat and the surrounding environment;
S2, design a deep Q-learning network intelligent obstacle avoidance controller: the controller consists of a spatial distance-stream convolutional neural network and a temporal velocity-stream convolutional neural network in parallel, followed in series by a fully connected neural network;
S3, design a reward-penalty function: the reward-penalty function is a scalar r that guides the learning process of the deep Q-learning network intelligent obstacle avoidance controller by evaluating how good the action taken by the controller was;
S4, design an intelligent switching threshold function: according to its threshold, the intelligent switching threshold function switches in real time between the deep Q-learning network intelligent obstacle avoidance controller and the high-precision positioning navigation controller, ultimately achieving fully autonomous intelligent navigation of the unmanned boat without a human operator;
S5, online learning: to describe the online learning process of the deep Q-learning network intelligent obstacle avoidance controller, define the state variable S, the replay memory D, and the value function $Q(s_t, a_t)$ as follows:
$S = [d_t, o_t], \quad t = 0, 1, 2, \ldots$ (1)
$D = [(s, a, r, s', a'), \ldots], \quad t = 0, 1, 2, \ldots$ (2)
One network is designed as the current value network of the deep Q-learning network intelligent obstacle avoidance controller, and a second network with an identical structure is designed as its target value network.
In step S2, the N-dimensional distance big data input $d_t$ is the input of the parallel spatial distance-stream convolutional neural network. This network has 2 layers; from input to output, the N-dimensional $d_t$ input passes through a convolution-pooling layer to give an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a second convolution-pooling layer to give an M/2-dimensional output layer;
The N-dimensional temporal velocity big data input $o_t$ is the input of the parallel temporal velocity-stream convolutional neural network. This network also has 2 layers; from input to output, the N-dimensional $o_t$ input passes through a convolution-pooling layer to give an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a second convolution-pooling layer to give an M/2-dimensional output layer;
The M/2-dimensional output layer of the spatial distance-stream convolutional neural network and the M/2-dimensional output layer of the temporal velocity-stream convolutional neural network are joined in parallel to form an M-dimensional input layer, which passes through a fully connected neural network to produce the 5-dimensional output layer $Q(s, a; w)$. The 5 outputs are the Q-value estimates for the five steering and motion control signals "stop, forward, backward, right, left" of the unmanned boat.
In step S3, if the unmanned boat successfully avoids an obstacle in the surrounding environment, the evaluation is r = 1; if the unmanned boat hits an obstacle in the surrounding environment, the evaluation is r = -1; for any other outcome the evaluation is r = 0. The goal of the deep Q-learning network intelligent obstacle avoidance controller is to maximise the sum of the reward-penalty function values obtained by the unmanned boat;
In step S4, the real-time risk estimation factor ξ is designed first:
[formula for ξ missing from the source text]
where k ∈ [0, 1] expresses the sensitivity to relative velocity; the larger the real-time risk estimation factor ξ, the higher the risk of collision;
The switching threshold of the intelligent switching threshold function is set to λ = 0.091. When ξ ≥ λ, the control signal $a = a_1$ output by the deep Q-learning network intelligent obstacle avoidance controller is executed; when ξ < λ, the control signal $a = a_2$ output by the high-precision positioning navigation controller is executed, as in the following formula:
$a = \begin{cases} a_1, & \xi \ge \lambda \\ a_2, & \xi < \lambda \end{cases}$
In step S6, the learning process executes the following steps in a loop:
Step S6.1: initialise the replay memory D as an all-zero matrix, and randomly initialise, with small pseudo-random numbers, the connection weight parameters $w$ of the current value network and $w^-$ of the target value network of the deep Q-learning network intelligent obstacle avoidance controller;
Step S6.2: pass the spatio-temporal big data acquired by the 360° pulsed laser rangefinder of the unmanned boat to the inputs of the current value network and the target value network, obtain the environment feature information $(s, a, r, s', a')$, and store it in the replay memory D, where $s'$ is the state variable at the next time step and $a'$ is the control signal output at the next time step;
Step S6.3: randomly draw a batch of samples from the replay memory D as learning data;
Step S6.4: use the current value network to compute $Q(s, a; w)$, and use the target value network to compute $y = r + \gamma \max_{a'} Q(s', a'; w^-)$, where γ is the discount factor;
Step S6.5: take $I = (r + \gamma \max_{a'} Q(s', a'; w^-) - Q(s, a; w))^2$ as the model loss function of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weight parameters $w$ of the current value network;
Step S6.6: every N steps, assign the connection weight parameters $w$ of the current value network to the connection weight parameters $w^-$ of the target value network.
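Steps S6.4 to S6.6 can be summarised as the update rules below. This is a sketch under stated assumptions: the learning rate $\alpha$ is a symbol introduced here, since the description names stochastic gradient descent but gives no step size, and the target $y$ is treated as a constant when differentiating because it depends only on $w^-$.

```latex
\begin{aligned}
y &= r + \gamma \max_{a'} Q(s', a'; w^-) \\
I(w) &= \bigl(y - Q(s, a; w)\bigr)^2 \\
w &\leftarrow w - \alpha \nabla_w I(w)
   = w + 2\alpha \bigl(y - Q(s, a; w)\bigr) \nabla_w Q(s, a; w) \\
w^- &\leftarrow w \quad \text{every } N \text{ steps}
\end{aligned}
```

Holding $w^-$ fixed between synchronisations keeps the regression target $y$ stable while $w$ is being updated.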
Compared with the prior art, the invention has the following advantages:
By sampling in real time the spatial and temporal dual-stream big data describing the distance between the unmanned boat and obstacles in the surrounding environment, the invention provides a complete scheme and method for autonomous intelligent navigation of unmanned boats, and implements the navigation control with a deep Q-learning network. In open waters, the unmanned boat sails under high-precision positioning navigation; in complex waters, the deep Q-learning network intelligent obstacle avoidance controller steers the boat autonomously in intelligent obstacle avoidance mode; and the real-time risk estimation factor is evaluated from environment samples, so that the boat switches between the two modes in real time. In addition, the deep Q-learning network intelligent obstacle avoidance controller is self-learning: after extensive feedback learning in simulation and in online environments, it can ultimately navigate fully autonomously and intelligently in a variety of water environments without a human operator, with a high degree of artificial intelligence. Finally, the method is compatible with existing ship navigation control systems, and its software and hardware requirements are comparatively modest.
Description of the drawings
Fig. 1 is a schematic diagram of the structure and principle of the intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, under high-precision navigation, according to the invention.
Fig. 2 is a structural diagram of the deep Q-learning network intelligent obstacle avoidance controller of the invention.
Fig. 3 is a schematic diagram of the online learning principle of the deep Q-learning network intelligent obstacle avoidance controller of the invention.
In Fig. 1: 1 - unmanned boat; 2 - surrounding environment; 3 - position relative to the high-precision navigation; 4 - 360° pulsed laser rangefinder data $o_t$, $d_t$; 5 - deep Q-learning network intelligent obstacle avoidance controller; 6 - spatial distance-stream convolutional neural network; 7 - temporal velocity-stream convolutional neural network; 8 - fully connected neural network; 9 - stop, forward, backward, right, left; 10 - high-precision positioning navigation controller; 11 - intelligent switching threshold function.
In Fig. 2: 12 - 200-dimensional $d_t$ input; 13 - convolution-pooling layer; 14 - 128-dimensional intermediate layer; 15 - convolution-pooling layer; 16 - 64-dimensional output layer; 17 - 200-dimensional $o_t$ input; 18 - convolution-pooling layer; 19 - 128-dimensional intermediate layer; 20 - convolution-pooling layer; 21 - 64-dimensional output layer; 22 - 128-dimensional input layer; 23 - fully connected neural network; 24 - 5-dimensional output layer $Q(s, a; w)$.
In Fig. 3: 2 - surrounding environment; 25 - replay memory D; 26 - current value network of the deep Q-learning network intelligent obstacle avoidance controller; 27 - target value network of the deep Q-learning network intelligent obstacle avoidance controller; 28 - model loss function of the deep Q-learning network.
Detailed description of embodiments
The invention is further described below through a preferred specific embodiment, with reference to the drawings.
1. Sampling the spatio-temporal dual-stream big data
A 360° pulsed laser rangefinder mounted on top of unmanned boat 1 scans, at an angular resolution of 1.8°, the distance $d_t$ between unmanned boat 1 and surrounding environment 2, giving 200-dimensional spatial big data $d_t$ per frame. The difference between two adjacent frames, $o_t = d_t - d_{t-1}$, then gives 200-dimensional temporal big data $o_t$ per frame, the relative velocity between unmanned boat 1 and surrounding environment 2, where the subscript t denotes sampling instant t. See the 360° pulsed laser rangefinder data $o_t$, $d_t$ (4) in Fig. 1.
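As a minimal sketch of this sampling step, the Python below builds the temporal stream from two consecutive range frames. The function name `temporal_stream` and the synthetic frame values are illustrative assumptions; only the frame size (200 readings at 1.8° resolution) and the difference $o_t = d_t - d_{t-1}$ come from the description.

```python
from typing import List

ANGULAR_RESOLUTION_DEG = 1.8
N = round(360 / ANGULAR_RESOLUTION_DEG)  # 200 range readings per frame


def temporal_stream(d_t: List[float], d_prev: List[float]) -> List[float]:
    """Return o_t = d_t - d_{t-1}, computed beam by beam."""
    if len(d_t) != len(d_prev):
        raise ValueError("frames must have the same number of readings")
    return [cur - prev for cur, prev in zip(d_t, d_prev)]


# Synthetic frames (assumed values): everything is 10 m away, then the
# reading on beam 0 closes to 9.5 m in the next frame.
d_prev = [10.0] * N
d_t = [10.0] * N
d_t[0] = 9.5
o_t = temporal_stream(d_t, d_prev)
```

A negative entry of `o_t` means the range along that beam is shrinking, that is, the boat and the obstacle are closing on each other.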
2. Designing the deep Q-learning network intelligent obstacle avoidance controller 5
Deep Q-learning network intelligent obstacle avoidance controller 5 consists of spatial distance-stream convolutional neural network 6 and temporal velocity-stream convolutional neural network 7 in parallel, followed in series by fully connected neural network 8. Its detailed structure and principle are shown in Fig. 2.
The 200-dimensional distance big data input $d_t$ (12) is the input of the parallel spatial distance-stream convolutional neural network 6. This network has 2 layers; from input to output, the 200-dimensional $d_t$ input 12 passes through convolution-pooling layer 13 to give the 128-dimensional intermediate layer 14, and the 128-dimensional intermediate layer 14 passes through convolution-pooling layer 15 to give the 64-dimensional output layer 16.
The 200-dimensional temporal velocity big data input $o_t$ (17) is the input of the parallel temporal velocity-stream convolutional neural network 7. This network also has 2 layers; from input to output, the 200-dimensional $o_t$ input 17 passes through convolution-pooling layer 18 to give the 128-dimensional intermediate layer 19, and the 128-dimensional intermediate layer 19 passes through convolution-pooling layer 20 to give the 64-dimensional output layer 21.
The 64-dimensional output layer 16 of spatial distance-stream convolutional neural network 6 and the 64-dimensional output layer 21 of temporal velocity-stream convolutional neural network 7 are joined in parallel to form the 128-dimensional input layer 22, which passes through fully connected neural network 23 to produce the 5-dimensional output layer $Q(s, a; w)$ 24. The 5 outputs are the Q-value estimates for the five steering and motion control signals "stop, forward, backward, right, left" (9) of the unmanned boat. Here s is the state variable S of formula (1), a is the control signal, and w is the connection weight parameter of the current value network of the deep Q-learning network intelligent obstacle avoidance controller.
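The dimension flow 200 → 128 → 64 per stream, 64 + 64 = 128 after joining, and 128 → 5 at the output can be traced with the pure-Python sketch below. The description fixes only these layer sizes; the kernel length of 3, the single channel, the ReLU activation, the adaptive max pooling, the single fully connected layer, and the random untrained weights are all assumptions made for illustration.

```python
import random
from typing import Dict, List

ACTIONS = ["stop", "forward", "backward", "right", "left"]


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 1-D convolution (odd kernel length) that keeps len(x)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def adaptive_max_pool(x: List[float], out_len: int) -> List[float]:
    """Max-pool x down to exactly out_len values using evenly spread bins."""
    n = len(x)
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = -((-(i + 1) * n) // out_len)  # ceiling division
        pooled.append(max(x[start:end]))
    return pooled


def conv_pool(x: List[float], kernel: List[float], out_len: int) -> List[float]:
    """One assumed convolution-pooling layer: conv, ReLU, then pooling."""
    activated = [max(0.0, v) for v in conv1d_same(x, kernel)]
    return adaptive_max_pool(activated, out_len)


def stream(x: List[float], k1: List[float], k2: List[float]) -> List[float]:
    """Two convolution-pooling layers: 200 -> 128 -> 64."""
    return conv_pool(conv_pool(x, k1, 128), k2, 64)


def q_values(d_t: List[float], o_t: List[float], params: Dict) -> List[float]:
    """Join both 64-d stream outputs (128-d) and map them to 5 Q-values."""
    features = stream(d_t, *params["dist"]) + stream(o_t, *params["vel"])
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(params["fc_w"], params["fc_b"])]


rng = random.Random(0)


def small(n: int) -> List[float]:
    """Small pseudo-random initial weights (assumed range)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


params = {
    "dist": (small(3), small(3)),
    "vel": (small(3), small(3)),
    "fc_w": [small(128) for _ in ACTIONS],
    "fc_b": [0.0] * len(ACTIONS),
}
d_t = [10.0] * 200   # synthetic distance frame
o_t = [0.0] * 200    # synthetic velocity frame
q = q_values(d_t, o_t, params)
best_action = ACTIONS[q.index(max(q))]
```

With untrained weights `best_action` carries no meaning; the point is only that the shapes match Fig. 2. A practical implementation would normally use a deep learning framework and multi-channel convolutions.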
3. Designing the reward-penalty function
The reward-penalty function is a scalar r that guides the learning process of deep Q-learning network intelligent obstacle avoidance controller 5 by evaluating how good the action taken by the controller was.
If unmanned boat 1 successfully avoids an obstacle in surrounding environment 2, the evaluation is r = 1; if unmanned boat 1 hits an obstacle in surrounding environment 2, the evaluation is r = -1; for any other outcome the evaluation is r = 0. The goal of deep Q-learning network intelligent obstacle avoidance controller 5 is to maximise the sum of the reward-penalty function values obtained by unmanned boat 1.
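The three-valued evaluation can be written as a small lookup. In the sketch below the `Outcome` enum and the helper `total_reward` are hypothetical names; how an outcome is detected on the boat (for example from the rangefinder readings) is not specified in the description and is left outside the sketch.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    AVOIDED = "avoided"     # obstacle successfully avoided
    COLLIDED = "collided"   # boat hit an obstacle
    OTHER = "other"         # any other result


def reward(outcome: Outcome) -> int:
    """Scalar evaluation r: +1 for avoidance, -1 for collision, 0 otherwise."""
    if outcome is Outcome.AVOIDED:
        return 1
    if outcome is Outcome.COLLIDED:
        return -1
    return 0


def total_reward(outcomes: Iterable[Outcome]) -> int:
    """Sum of r over a sequence of steps, the quantity the controller maximises."""
    return sum(reward(o) for o in outcomes)


run = [Outcome.AVOIDED, Outcome.OTHER, Outcome.COLLIDED, Outcome.AVOIDED]
score = total_reward(run)
```

Note that the discounted form of this sum, with the factor γ from the online learning section, is what the Q-value estimates.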
4. Designing the intelligent switching threshold function 11
According to its threshold, intelligent switching threshold function 11 switches in real time between deep Q-learning network intelligent obstacle avoidance controller 5 and high-precision positioning navigation controller 10, ultimately achieving fully autonomous intelligent navigation of unmanned boat 1 without a human operator.
The real-time risk estimation factor ξ is designed first:
[formula for ξ missing from the source text]
where k ∈ [0, 1] expresses the sensitivity to relative velocity; k = 0.9 is the usual choice here. The larger the real-time risk estimation factor ξ, the higher the risk of collision.
The switching threshold of intelligent switching threshold function 11 is set to λ = 0.091. When ξ ≥ λ, the control signal $a = a_1$ output by deep Q-learning network intelligent obstacle avoidance controller 5 is executed; when ξ < λ, the control signal $a = a_2$ output by high-precision positioning navigation controller 10 is executed, as in the following formula:
$a = \begin{cases} a_1, & \xi \ge \lambda \\ a_2, & \xi < \lambda \end{cases}$
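The switching rule itself is a single comparison, sketched below. Because the defining formula for ξ is not available in this text, the sketch takes ξ as an already computed input; the function name `select_control` and the example control signals are illustrative assumptions.

```python
SWITCH_THRESHOLD = 0.091  # lambda, as given in the description


def select_control(xi: float, a1: str, a2: str,
                   lam: float = SWITCH_THRESHOLD) -> str:
    """Return a1 (obstacle avoidance controller output) when xi >= lam,
    otherwise a2 (high-precision positioning navigation controller output)."""
    return a1 if xi >= lam else a2


# Hypothetical control signals from the two controllers at one instant.
a1, a2 = "left", "forward"
high_risk = select_control(0.20, a1, a2)     # risk above the threshold
boundary = select_control(0.091, a1, a2)     # equality counts as high risk
low_risk = select_control(0.05, a1, a2)      # open water
```

The boundary case follows the description, where the obstacle avoidance controller takes over when ξ equals λ.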
5. Online learning
To describe the online learning process of deep Q-learning network intelligent obstacle avoidance controller 5, refer to Fig. 3 and define the state variable S, the replay memory D (25), and the value function $Q(s_t, a_t)$ as follows:
$S = [d_t, o_t], \quad t = 0, 1, 2, \ldots$ (4)
$D = [(s, a, r, s', a'), \ldots], \quad t = 0, 1, 2, \ldots$ (5)
Network 26 in Fig. 3 serves as the current value network of the deep Q-learning network intelligent obstacle avoidance controller, and a second network with an identical structure is designed as its target value network 27.
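A replay memory holding tuples $(s, a, r, s', a')$ might look like the sketch below. It departs from the description in one respect, stated plainly: the description initialises D as an all-zero matrix, whereas this sketch uses a bounded `collections.deque` that starts empty and discards the oldest entry when full. The class name and the capacity are assumptions.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (s, a, r, s', a'): state, action index, reward, next state, next action index
Transition = Tuple[List[float], int, float, List[float], int]


class ReplayMemory:
    """Bounded store of transitions with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, s: List[float], a: int, r: float,
              s_next: List[float], a_next: int) -> None:
        """Append one transition, dropping the oldest one if full."""
        self.buffer.append((s, a, r, s_next, a_next))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Draw a batch without replacement (smaller if memory is short)."""
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self) -> int:
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for step in range(5):
    # Toy one-dimensional states; real states would be S = [d_t, o_t].
    memory.store([float(step)], 1, 0.0, [float(step + 1)], 1)
batch = memory.sample(2, random.Random(0))
```

Sampling at random from D, instead of learning from consecutive frames, reduces the correlation between the samples used in one learning step.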
The learning process executes the following 6 steps in a loop:
1) Initialise the replay memory D (25) as an all-zero matrix, and randomly initialise, with small pseudo-random numbers, the connection weight parameters $w$ of current value network 26 and $w^-$ of target value network 27 of the deep Q-learning network intelligent obstacle avoidance controller.
2) Pass the spatio-temporal big data acquired by the 360° pulsed laser rangefinder of unmanned boat 1 to the inputs of current value network 26 and target value network 27, obtain the environment feature information $(s, a, r, s', a')$, and store it in the replay memory D (25), where $s'$ is the state variable at the next time step and $a'$ is the control signal output at the next time step.
3) Randomly draw a batch of samples from the replay memory D (25) as learning data.
4) Use current value network 26 to compute $Q(s, a; w)$, and use target value network 27 to compute $y = r + \gamma \max_{a'} Q(s', a'; w^-)$, where γ is the discount factor, usually γ = 0.9.
5) Take $I = (r + \gamma \max_{a'} Q(s', a'; w^-) - Q(s, a; w))^2$ as the model loss function 28 of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weight parameters $w$ of current value network 26.
6) Every N steps, assign the connection weight parameters $w$ of current value network 26 to the connection weight parameters $w^-$ of target value network 27.
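Steps 4) to 6) are sketched below on a deliberately tiny stand-in. To keep the example self-contained, the two-stream network is replaced by a linear model $Q(s, a; w) = w_a \cdot s$ with one weight row per action; this substitution, the learning rate, the synchronisation period of 4, and the toy transition are assumptions. Only the target $y$, the squared loss, the gradient step, and the periodic copy of $w$ to $w^-$ follow the description.

```python
import copy
from typing import List

GAMMA = 0.9       # discount factor from the description
SYNC_EVERY = 4    # assumed value for the "every N steps" period


def q_all(s: List[float], w: List[List[float]]) -> List[float]:
    """Linear stand-in for the network: one Q-value per action row of w."""
    return [sum(wi * si for wi, si in zip(row, s)) for row in w]


def td_target(r: float, s_next: List[float],
              w_target: List[List[float]]) -> float:
    """y = r + gamma * max_a' Q(s', a'; w^-), computed by the target network."""
    return r + GAMMA * max(q_all(s_next, w_target))


def sgd_step(s: List[float], a: int, r: float, s_next: List[float],
             w: List[List[float]], w_target: List[List[float]],
             lr: float = 0.01) -> float:
    """One gradient step on I = (y - Q(s, a; w))^2; returns I before the step."""
    err = td_target(r, s_next, w_target) - q_all(s, w)[a]
    # dI/dw_a = -2 * err * s, so descending the gradient adds 2 * lr * err * s.
    w[a] = [wi + 2.0 * lr * err * si for wi, si in zip(w[a], s)]
    return err ** 2


w = [[0.0, 0.0] for _ in range(5)]    # current value weights, 5 actions
w_target = copy.deepcopy(w)           # target value weights w^-
s, a, r, s_next = [1.0, 0.5], 1, 1.0, [0.5, 1.0]   # one toy transition

losses = []
for step in range(1, 9):
    losses.append(sgd_step(s, a, r, s_next, w, w_target))
    if step % SYNC_EVERY == 0:
        w_target = copy.deepcopy(w)   # step 6): copy w into w^-
```

In a full implementation each step would average the loss over a batch drawn from the replay memory, and transitions that end in a collision would typically use $y = r$ without the bootstrap term; neither detail is stated in the description.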
Although the invention has been described in detail through the preferred embodiment above, that description should not be regarded as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. The scope of protection of the invention is therefore defined by the appended claims.
Claims (5)
1. An intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, characterised in that the method comprises the following steps:
S1, sample the spatio-temporal dual-stream big data: a 360° pulsed laser rangefinder mounted on top of the unmanned boat scans, at a preset angular resolution, the distance $d_t$ between the unmanned boat and the surrounding environment, giving N-dimensional spatial big data $d_t$ per frame; the difference between two adjacent frames, $o_t = d_t - d_{t-1}$, then gives N-dimensional temporal big data $o_t$ per frame, the relative velocity between the unmanned boat and the surrounding environment, where the subscript t denotes sampling instant t;
S2, design a deep Q-learning network intelligent obstacle avoidance controller: the controller consists of a spatial distance-stream convolutional neural network and a temporal velocity-stream convolutional neural network in parallel, followed in series by a fully connected neural network;
S3, design a reward-penalty function: the reward-penalty function is a scalar r that guides the learning process of the deep Q-learning network intelligent obstacle avoidance controller by evaluating how good the action taken by the controller was;
S4, design an intelligent switching threshold function: according to its threshold, the intelligent switching threshold function switches in real time between the deep Q-learning network intelligent obstacle avoidance controller and the high-precision positioning navigation controller, ultimately achieving fully autonomous intelligent navigation of the unmanned boat without a human operator;
S5, online learning: to describe the online learning process of the deep Q-learning network intelligent obstacle avoidance controller, define the state variable S, the replay memory D, and the value function $Q(s_t, a_t)$ as follows:
$S = [d_t, o_t], \quad t = 0, 1, 2, \ldots$ (1)
$D = [(s, a, r, s', a'), \ldots], \quad t = 0, 1, 2, \ldots$ (2)
where the subscript t denotes sampling instant t, so that $s_t$ is the state variable at time t and $a_t$ is the control signal at time t;
and one network is designed as the current value network of the deep Q-learning network intelligent obstacle avoidance controller, while a second network with an identical structure is designed as its target value network.
2. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterised in that, in step S2, the N-dimensional distance big data input $d_t$ is the input of the parallel spatial distance-stream convolutional neural network; this network has 2 layers and, from input to output, the N-dimensional $d_t$ input passes through a convolution-pooling layer to give an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a second convolution-pooling layer to give an M/2-dimensional output layer;
the N-dimensional temporal velocity big data input $o_t$ is the input of the parallel temporal velocity-stream convolutional neural network; this network has 2 layers and, from input to output, the N-dimensional $o_t$ input passes through a convolution-pooling layer to give an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a second convolution-pooling layer to give an M/2-dimensional output layer;
the M/2-dimensional output layer of the spatial distance-stream convolutional neural network and the M/2-dimensional output layer of the temporal velocity-stream convolutional neural network are joined in parallel to form an M-dimensional input layer, which passes through a fully connected neural network to produce the 5-dimensional output layer $Q(s, a; w)$; the 5 outputs are the Q-value estimates for the five steering and motion control signals "stop, forward, backward, right, left" of the unmanned boat, where s is the state variable S of formula (1), a is the control signal, and w is the connection weight parameter of the current value network of the deep Q-learning network intelligent obstacle avoidance controller.
3. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterised in that, in step S3, if the unmanned boat successfully avoids an obstacle in the surrounding environment, the evaluation is r = 1; if the unmanned boat hits an obstacle in the surrounding environment, the evaluation is r = -1; for any other outcome the evaluation is r = 0; and the goal of the deep Q-learning network intelligent obstacle avoidance controller is to maximise the sum of the reward-penalty function values obtained by the unmanned boat.
4. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterised in that, in step S4, the real-time risk estimation factor ξ is designed first:
[formula for ξ missing from the source text]
where k ∈ [0, 1] expresses the sensitivity to relative velocity, and the larger the real-time risk estimation factor ξ, the higher the risk of collision;
and the switching threshold of the intelligent switching threshold function is set to λ = 0.091; when ξ ≥ λ, the control signal $a = a_1$ output by the deep Q-learning network intelligent obstacle avoidance controller is executed; when ξ < λ, the control signal $a = a_2$ output by the high-precision positioning navigation controller is executed, as in the following formula:
$a = \begin{cases} a_1, & \xi \ge \lambda \\ a_2, & \xi < \lambda \end{cases}$
5. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterised in that, in step S6, the learning process executes the following steps in a loop:
Step S6.1: initialise the replay memory D as an all-zero matrix, and randomly initialise, with small pseudo-random numbers, the connection weight parameters $w$ of the current value network and $w^-$ of the target value network of the deep Q-learning network intelligent obstacle avoidance controller;
Step S6.2: pass the spatio-temporal big data acquired by the 360° pulsed laser rangefinder of the unmanned boat to the inputs of the current value network and the target value network, obtain the environment feature information $(s, a, r, s', a')$, and store it in the replay memory D, where $s'$ is the state variable at the next time step and $a'$ is the control signal output at the next time step;
Step S6.3: randomly draw a batch of samples from the replay memory D as learning data;
Step S6.4: use the current value network to compute $Q(s, a; w)$, and use the target value network to compute $y = r + \gamma \max_{a'} Q(s', a'; w^-)$, where γ is the discount factor;
Step S6.5: take $I = (r + \gamma \max_{a'} Q(s', a'; w^-) - Q(s, a; w))^2$ as the model loss function of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weight parameters $w$ of the current value network;
Step S6.6: every N steps, assign the connection weight parameters $w$ of the current value network to the connection weight parameters $w^-$ of the target value network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910091342.1A CN109814565A (en) | 2019-01-30 | 2019-01-30 | Unmanned boat intelligent navigation control method based on spatio-temporal dual-stream data-driven deep Q-learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109814565A (en) | 2019-05-28 |
Family ID=66606011
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910091342.1A (Pending) CN109814565A (en) | 2019-01-30 | 2019-01-30 | Unmanned boat intelligent navigation control method based on spatio-temporal dual-stream data-driven deep Q-learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109814565A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107553490A (en) * | 2017-09-08 | 2018-01-09 | 深圳市唯特视科技有限公司 | A kind of monocular vision barrier-avoiding method based on deep learning |
CN108921037A (en) * | 2018-06-07 | 2018-11-30 | 四川大学 | A kind of Emotion identification method based on BN-inception binary-flow network |
CN109263826A (en) * | 2018-08-30 | 2019-01-25 | 武汉理工大学 | Ship Intelligent Collision Avoidance system and method based on maneuverability modeling |
Non-Patent Citations (5)
Title |
---|
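Yuanda Wang et al.: "Learning to Navigate Through Complex Dynamic Environment With Modular Deep Reinforcement Learning", 《IEEE TRANSACTIONS ON GAMES》 * |
Liu Zhirong et al.: "Path planning for mobile robots based on deep Q-learning", 《测控技术》 * |
Zhang Yachu et al.: "Research on an obstacle avoidance algorithm for intelligent vehicles based on dual-stream convolutional neural networks", 《新技术新工艺》 * |
Zhang Haojie et al.: "End-to-end robot control method based on deep Q-network learning", 《仪器仪表学报》 * |
Zhai Junyong et al.: "Research on multi-model adaptive switching control based on neural networks", 《中国电机工程学报》 * |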
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110345948A (en) * | 2019-08-16 | 2019-10-18 | 重庆邮智机器人研究院有限公司 | Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm |
CN110645981A (en) * | 2019-10-15 | 2020-01-03 | 四方智能(武汉)控制技术有限公司 | Unmanned ship navigation system and method for cleaning pile foundation type waterborne photovoltaic module |
CN110826609A (en) * | 2019-10-29 | 2020-02-21 | 华中科技大学 | Double-flow feature fusion image identification method based on reinforcement learning |
CN110826609B (en) * | 2019-10-29 | 2023-03-24 | 华中科技大学 | Double-current feature fusion image identification method based on reinforcement learning |
CN111026127A (en) * | 2019-12-27 | 2020-04-17 | 南京大学 | Automatic driving decision method and system based on partially observable transfer reinforcement learning |
CN111026127B (en) * | 2019-12-27 | 2021-09-28 | 南京大学 | Automatic driving decision method and system based on partially observable transfer reinforcement learning |
CN111275249A (en) * | 2020-01-15 | 2020-06-12 | 吉利汽车研究院(宁波)有限公司 | Driving behavior optimization method based on DQN neural network and high-precision positioning |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109814565A (en) | Unmanned boat intelligent navigation control method based on spatio-temporal dual-stream data-driven deep Q-learning | |
Ruan et al. | Mobile robot navigation based on deep reinforcement learning | |
CN106970615B (en) | Real-time online path planning method based on deep reinforcement learning | |
US20190147610A1 (en) | End-to-End Tracking of Objects | |
Wang et al. | Cooperative USV–UAV marine search and rescue with visual navigation and reinforcement learning-based control | |
Eresen et al. | Autonomous quadrotor flight with vision-based obstacle avoidance in virtual environment | |
CN114384920A (en) | Dynamic obstacle avoidance method based on real-time construction of local grid map | |
Kelchtermans et al. | How hard is it to cross the room?--Training (Recurrent) Neural Networks to steer a UAV | |
CN116263335A (en) | Indoor navigation method based on vision and radar information fusion and reinforcement learning | |
Qu et al. | Pursuit-evasion game strategy of USV based on deep reinforcement learning in complex multi-obstacle environment | |
Ji-Yong et al. | Design and vision based autonomous capture of sea organism with absorptive type remotely operated vehicle | |
Sans-Muntadas et al. | Learning an AUV docking maneuver with a convolutional neural network | |
Yan et al. | Reinforcement Learning‐Based Autonomous Navigation and Obstacle Avoidance for USVs under Partially Observable Conditions | |
Lan et al. | Path planning for underwater gliders in time-varying ocean current using deep reinforcement learning | |
Katyal et al. | High-speed robot navigation using predicted occupancy maps | |
Yang et al. | Autonomous UAV navigation in dynamic environments with double deep Q-networks | |
Pal et al. | Mobile robot navigation using a neural net | |
CN116679711A (en) | Robot obstacle avoidance method based on model-based reinforcement learning and model-free reinforcement learning | |
Patil et al. | Deep reinforcement learning for continuous docking control of autonomous underwater vehicles: a benchmarking study | |
Tan et al. | A local path planning method based on Q-learning | |
CN113467462B (en) | Pedestrian accompanying control method and device for robot, mobile robot and medium | |
CN113674310B (en) | Four-rotor unmanned aerial vehicle target tracking method based on active visual perception | |
de Oliveira et al. | A robot architecture for outdoor competitions | |
Song et al. | Surface path tracking method of autonomous surface underwater vehicle based on deep reinforcement learning | |
CN111611869B (en) | End-to-end monocular vision obstacle avoidance method based on serial deep neural network |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190528 |