CN109814565A - Intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data

Info

Publication number: CN109814565A
Application number: CN201910091342.1A
Authority: CN
Inventors: 黄志坚; 随博文; 温家一; 吴恭兴; 张桂臣; 刘雁集
Original assignee: Shanghai Maritime University
Current assignee: Shanghai Maritime University
Priority date: 2019-01-30
Filing date: 2019-01-30
Publication date: 2019-05-28

Abstract

The present invention proposes a deep Q-learning network method, driven by spatial and temporal dual-stream big data, for autonomous intelligent navigation control of an unmanned boat under high-precision navigation. The specific steps are: sampling spatio-temporal dual-stream big data, designing a deep Q-learning network intelligent obstacle-avoidance controller, designing a reward-penalty function, designing an intelligent switching threshold function, and online learning. The invention allows the unmanned boat to sail under high-precision positioning navigation in open waters; in complex waters, the deep Q-learning network intelligent obstacle-avoidance controller steers the unmanned boat autonomously in intelligent obstacle-avoidance mode; and a real-time risk estimation factor is evaluated from samples of the environment, so that the method switches intelligently in real time between the two modes. In addition, the deep Q-learning network intelligent obstacle-avoidance controller has self-learning ability and a high degree of artificial intelligence. Finally, the method is well compatible with existing ship navigation control systems, and the software and hardware resources needed to implement it are relatively modest.

Description

Intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data

Technical field

The present invention relates to an intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, and in particular to an intelligent navigation control method for unmanned boats that, under high-precision positioning navigation, is driven by spatial and temporal dual-stream data sampled in real time and is based on a deep Q-learning network. It belongs to the technical field of intelligent control of unmanned boats.

Background art

Giving a ship under high-precision positioning navigation human-like observation ability and intelligence, so that it can pass through complex waters and navigate and avoid obstacles autonomously without relying on a pilot's lookout and steering, is no easy matter. Because the water surface is open and obstacle positions change, an unmanned boat cannot rely on lane-line detection as an unmanned vehicle does; nor can it perform 3D modeling as Boston Dynamics robots do, or else the 3D model is of very limited reference value. Unmanned boats typified by those of Yunzhou Intelligence use high-precision positioning navigation and radar collision avoidance, and still need new theory and technology to support their intelligence.

In the past, automatic control theory and modern control theory were used to realize closed-loop feedback control and model-based control of ship navigation; later, adaptive algorithms such as least squares, SVM, and ant colony algorithms gave ships the ability of adaptive path planning. A search of existing patents finds that applications No. 201710502348.4 and No. 201810454631.9 disclose an obstacle-avoidance method and device for unmanned boats based on image vision, but both require very complex traditional image processing algorithms to calculate the coordinates of obstacles. Application No. 201710458496.0 discloses a lateral control method for unmanned boats based on a reinforcement learning algorithm; its reinforcement learning controller uses an Actor-Critic structure and requires a model of the controlled system. Application No. 201810008481.9 discloses a collaborative cloud control system for autonomous navigation of unmanned boats, but it depends on the joint interaction of very complex systems and information, including shore-side, shipborne, communication, and collaborative cloud control systems. Applications No. 201710691295.5, No. 201711285895.8, and No. 201810160232.1 disclose autonomous navigation systems and methods for unmanned boats, but none of them uses artificial intelligence methods.

Summary of the invention

With the development of artificial intelligence and deep learning theory, and to overcome the shortcomings and defects of the prior art, the present invention proposes an intelligent navigation control method for unmanned boats based on a deep Q-learning network. A 360° pulsed laser rangefinder samples in real time the spatial and temporal dual-stream big data describing the distance between the unmanned boat and obstacles in the surrounding environment, and feeds it to a specially designed deep Q-learning network intelligent obstacle-avoidance controller. After extensive reinforcement learning in simulation and in online environments, the method switches intelligently in real time between the high-precision positioning navigation mode and the intelligent obstacle-avoidance mode according to a preset threshold, and ultimately realizes fully autonomous intelligent navigation without a human operator, with a high degree of learning ability and artificial intelligence.

To achieve the above object, the present invention is realized by the following technical solution:

An intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, characterized in that the method comprises the following steps:

S1, sampling spatio-temporal dual-stream big data: a 360° pulsed laser rangefinder mounted on top of the unmanned boat scans, at a preset angular resolution, the spatial big data of the distance d_t between the unmanned boat and the surrounding environment; that is, each frame measures N-dimensional spatial big data of the distance d_t between the unmanned boat and the surrounding environment. Then, from the difference of two adjacent frames of d_t data, o_t = d_t - d_{t-1}, each frame yields N-dimensional temporal big data of the relative velocity o_t between the unmanned boat and the surrounding environment;

S2, designing a deep Q-learning network intelligent obstacle-avoidance controller: the controller is built from a spatial distance stream convolutional neural network and a temporal velocity stream convolutional neural network connected in parallel, followed by one fully connected neural network connected in series;

S3, designing a reward-penalty function: the reward-penalty function is expressed as a scalar r; it guides the learning process of the deep Q-learning network intelligent obstacle-avoidance controller by evaluating how good or bad the action taken by the controller is;

S4, designing an intelligent switching threshold function: according to its threshold, the intelligent switching threshold function switches intelligently in real time between the deep Q-learning network intelligent obstacle-avoidance controller and the high-precision positioning navigation controller, ultimately realizing fully autonomous intelligent navigation of the unmanned boat without a human operator;

S5, online learning: to describe the online learning process of the deep Q-learning network intelligent obstacle-avoidance controller, the state variable S, the replay memory D, and the evaluation function Q(s_t, a_t) are defined as follows:

S = [d_t, o_t], t = 0, 1, 2, ... (1)

D = [(s, a, r, s', a'), ...], t = 0, 1, 2, ... (2)

A network is designed as the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller, and a second network with exactly the same structure is designed as the target-value network of the controller.

In step S2, the distance big data of the N-dimensional d_t input is taken as the input of the parallel spatial distance stream convolutional neural network. This network has 2 layers; from input to output, the N-dimensional d_t input passes through a convolution-pooling layer to an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a convolution-pooling layer to an M/2-dimensional output layer;

The temporal velocity big data of the N-dimensional o_t input is taken as the input of the parallel temporal velocity stream convolutional neural network. This network also has 2 layers; from input to output, the N-dimensional o_t input passes through a convolution-pooling layer to an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a convolution-pooling layer to an M/2-dimensional output layer;

The M/2-dimensional output layer of the spatial distance stream convolutional neural network and the M/2-dimensional output layer of the temporal velocity stream convolutional neural network are joined in parallel to form an M-dimensional input layer, which passes through a fully connected neural network to the 5-dimensional output layer Q(s, a; w). The 5 outputs are, respectively, the Q-value estimates of the steering and motion control signals "stop, forward, backward, right, left" applied to the unmanned boat.

In step S3, if the unmanned boat successfully avoids an obstacle in the surrounding environment, the evaluation is r = 1; if the unmanned boat hits an obstacle in the surrounding environment, the evaluation is r = -1; for any other outcome, the evaluation is r = 0. The purpose of the deep Q-learning network intelligent obstacle-avoidance controller is to maximize the sum of the reward-penalty function values obtained by the unmanned boat;

In step S4, the real-time risk estimation factor ξ is first designed as:

where k ∈ [0, 1] expresses the sensitivity to relative velocity; the larger the real-time risk estimation factor ξ, the higher the risk of collision;

The switching threshold of the intelligent switching threshold function is set to λ = 0.091. When ξ >= λ, the control signal a = a₁ output by the deep Q-learning network intelligent obstacle-avoidance controller is executed; when ξ < λ, the control signal a = a₂ output by the high-precision positioning navigation controller is executed, that is:

a = a₁, if ξ >= λ
a = a₂, if ξ < λ

In step S6, the learning process is executed cyclically as follows:

Step S6.1: initialize the replay memory D as an all-zero matrix, and randomly initialize, with small pseudo-random numbers, the connection weight parameters w of the current-value network and the connection weight parameters w^- of the target-value network of the deep Q-learning network intelligent obstacle-avoidance controller;

Step S6.2: pass the temporal big data information acquired by the 360° pulsed laser rangefinder of the unmanned boat to the inputs of the current-value network and the target-value network of the deep Q-learning network intelligent obstacle-avoidance controller, obtain the feature information (s, a, r, s', a') of the environment, and store it in the replay memory D, where s' is the state variable at the next moment and a' is the control signal output at the next moment;

Step S6.3: randomly draw a batch of samples from the replay memory D as learning data;

Step S6.4: compute Q(s, a; w) with the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller, and compute with the target-value network of the controller: y = r + γ max_{a'} Q(s', a'; w^-), where γ is the discount factor;

Step S6.5: take I = (r + γ max_{a'} Q(s', a'; w^-) - Q(s, a; w))² as the model loss function of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weight parameters w of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller;

Step S6.6: every N steps, assign the connection weight parameters w of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller to the connection weight parameters w^- of the target-value network of the controller.

Compared with the prior art, the present invention has the following advantages:

By sampling in real time the spatial and temporal dual-stream big data describing the distance between the unmanned boat and obstacles in the surrounding environment, the present invention provides a complete scheme and method for autonomous intelligent navigation of unmanned boats, and realizes the control of that navigation with a deep Q-learning network. The unmanned boat sails under high-precision positioning navigation in open waters; in complex waters, the deep Q-learning network intelligent obstacle-avoidance controller steers the unmanned boat autonomously in intelligent obstacle-avoidance mode; and the real-time risk estimation factor is evaluated from samples of the environment, so that the method switches intelligently in real time between the two modes. In addition, the deep Q-learning network intelligent obstacle-avoidance controller has self-learning ability: after extensive feedback learning in simulation and in online environments, it ultimately enables a ship to navigate fully autonomously and intelligently, without a human operator, in a variety of water environments, with a high degree of artificial intelligence. Finally, the method is well compatible with existing ship navigation control systems, and the software and hardware resources needed to implement it are relatively modest.

Brief description of the drawings

Fig. 1 is a schematic diagram of the structure and principle of the intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data under high-precision navigation according to the present invention.

Fig. 2 is a structural schematic diagram of the deep Q-learning network intelligent obstacle-avoidance controller of the present invention.

Fig. 3 is a schematic diagram of the online learning principle of the deep Q-learning network intelligent obstacle-avoidance controller of the present invention.

In Fig. 1: 1 - unmanned boat; 2 - surrounding environment; 3 - position relative to high-precision navigation; 4 - 360° pulsed laser rangefinder data o_t, d_t; 5 - deep Q-learning network intelligent obstacle-avoidance controller; 6 - spatial distance stream convolutional neural network; 7 - temporal velocity stream convolutional neural network; 8 - fully connected neural network; 9 - stop, forward, backward, right, left; 10 - high-precision positioning navigation controller; 11 - intelligent switching threshold function.

In Fig. 2: 12 - 200-dimensional d_t input; 13 - convolution-pooling layer; 14 - 128-dimensional intermediate layer; 15 - convolution-pooling layer; 16 - 64-dimensional output layer; 17 - 200-dimensional o_t input; 18 - convolution-pooling layer; 19 - 128-dimensional intermediate layer; 20 - convolution-pooling layer; 21 - 64-dimensional output layer; 22 - 128-dimensional input layer; 23 - fully connected neural network; 24 - 5-dimensional output layer Q(s, a; w).

In Fig. 3: 2 - surrounding environment; 25 - replay memory D; 26 - current-value network of the deep Q-learning network intelligent obstacle-avoidance controller; 27 - target-value network of the deep Q-learning network intelligent obstacle-avoidance controller; 28 - model loss function of the deep Q-learning network.

Detailed description of the embodiments

The present invention is further explained below through a detailed description of a preferred specific embodiment, with reference to the accompanying drawings.

1. Sampling spatio-temporal dual-stream big data

A 360° pulsed laser rangefinder mounted on top of the unmanned boat 1 scans, at an angular resolution of 1.8°, the spatial big data of the distance d_t between the unmanned boat 1 and the surrounding environment 2; that is, each frame measures 200-dimensional spatial big data of the distance d_t between the unmanned boat 1 and the surrounding environment 2 (360° / 1.8° = 200 beams). Then, from the difference of two adjacent frames of d_t data, o_t = d_t - d_{t-1}, each frame yields 200-dimensional temporal big data of the relative velocity o_t between the unmanned boat 1 and the surrounding environment 2, where the subscript t denotes sampling instant t. This is shown as the 360° pulsed laser rangefinder data o_t, d_t 4 in Fig. 1.
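The sampling step can be illustrated with a minimal Python sketch. It assumes each frame arrives as a plain list of range readings, one per beam; the function name `velocity_stream` and the example distances are hypothetical and are not taken from the patent.

```python
def velocity_stream(d_t, d_prev):
    """Per-beam difference o_t = d_t - d_{t-1} between two consecutive range frames."""
    if len(d_t) != len(d_prev):
        raise ValueError("frames must have the same number of beams")
    return [cur - prev for cur, prev in zip(d_t, d_prev)]


ANGULAR_RESOLUTION_DEG = 1.8                      # value given in the embodiment
N_BEAMS = round(360 / ANGULAR_RESOLUTION_DEG)     # 200 readings per frame

# Two hypothetical frames: open water at 50 m, one obstacle closing in on beam 10.
d_prev = [50.0] * N_BEAMS
d_t = list(d_prev)
d_t[10] = 48.5

o_t = velocity_stream(d_t, d_prev)
state = (d_t, o_t)   # the pair [d_t, o_t] that forms the state variable S
```

A negative entry in `o_t` means the range on that beam shrank between frames, i.e. the obstacle and the boat are approaching each other.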

2. Designing the deep Q-learning network intelligent obstacle-avoidance controller 5

The deep Q-learning network intelligent obstacle-avoidance controller 5 is built from the spatial distance stream convolutional neural network 6 and the temporal velocity stream convolutional neural network 7 connected in parallel, followed by one fully connected neural network 8 connected in series. The specific structure and principle of the deep Q-learning network intelligent obstacle-avoidance controller 5 are shown in Fig. 2.

The distance big data of the 200-dimensional d_t input 12 is taken as the input of the parallel spatial distance stream convolutional neural network 6. This network has 2 layers; from input to output, the 200-dimensional d_t input 12 passes through the convolution-pooling layer 13 to the 128-dimensional intermediate layer 14, and the 128-dimensional intermediate layer 14 passes through the convolution-pooling layer 15 to the 64-dimensional output layer 16.

The temporal velocity big data of the 200-dimensional o_t input 17 is taken as the input of the parallel temporal velocity stream convolutional neural network 7. This network also has 2 layers; from input to output, the 200-dimensional o_t input 17 passes through the convolution-pooling layer 18 to the 128-dimensional intermediate layer 19, and the 128-dimensional intermediate layer 19 passes through the convolution-pooling layer 20 to the 64-dimensional output layer 21.

The 64-dimensional output layer 16 of the spatial distance stream convolutional neural network 6 and the 64-dimensional output layer 21 of the temporal velocity stream convolutional neural network 7 are joined in parallel to form a 128-dimensional input layer 22, which passes through a fully connected neural network 23 to the 5-dimensional output layer Q(s, a; w) 24. The 5 outputs are, respectively, the Q-value estimates of the steering and motion control signals "stop, forward, backward, right, left" 9 applied to the unmanned boat. Here s is the state variable defined in formula (1), a is the control signal, and w is the connection weight parameter of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller.
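The dimension flow 200 → 128 → 64 in each stream, 64 + 64 = 128 after joining, and 128 → 5 at the output can be traced with a small pure-Python forward pass. The text does not give kernel sizes, channel counts, activation functions, or the pooling scheme, so everything below is an assumption made only to reproduce the stated layer sizes: a single-channel wrap-around convolution with a 5-tap kernel, ReLU, adaptive max pooling, and a single linear layer standing in for the fully connected network.

```python
import random


def circular_conv1d(x, kernel):
    """Single-channel 1-D cross-correlation with wrap-around padding, then ReLU.

    Wrap-around is assumed because beam 0 and beam N-1 of a 360-degree scan are neighbours.
    """
    n, k = len(x), len(kernel)
    half = k // 2
    return [max(0.0, sum(kernel[j] * x[(i + j - half) % n] for j in range(k)))
            for i in range(n)]


def adaptive_max_pool(x, out_len):
    """Max-pool x down to exactly out_len values using evenly spaced windows."""
    n = len(x)
    out = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = -(-((i + 1) * n) // out_len)   # ceiling division
        out.append(max(x[start:end]))
    return out


def stream(x, kernels, mid_dim=128, out_dim=64):
    """Two convolution-pooling layers: len(x) -> mid_dim -> out_dim."""
    hidden = adaptive_max_pool(circular_conv1d(x, kernels[0]), mid_dim)
    return adaptive_max_pool(circular_conv1d(hidden, kernels[1]), out_dim)


def q_values(d_t, o_t, params):
    """Join both 64-dim stream outputs into 128 features, then map them to 5 Q-values."""
    features = stream(d_t, params["dist"]) + stream(o_t, params["vel"])
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(params["fc_w"], params["fc_b"])]


def init_params(rng, kernel_size=5, n_actions=5, feat_dim=128):
    """Small pseudo-random initial weights (hypothetical sizes and ranges)."""
    def small():
        return rng.uniform(-0.1, 0.1)
    def kernels():
        return [[small() for _ in range(kernel_size)] for _ in range(2)]
    return {"dist": kernels(), "vel": kernels(),
            "fc_w": [[small() for _ in range(feat_dim)] for _ in range(n_actions)],
            "fc_b": [0.0] * n_actions}


ACTIONS = ("stop", "forward", "backward", "right", "left")
params = init_params(random.Random(0))
d_t = [50.0] * 200          # hypothetical range frame
o_t = [0.0] * 200           # hypothetical velocity frame
q = q_values(d_t, o_t, params)
best_action = ACTIONS[max(range(len(q)), key=q.__getitem__)]
```

With untrained weights the selected action is arbitrary; the sketch is only meant to show how the two streams share one structure and meet at the 128-dimensional input layer.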

3. Designing the reward-penalty function

The reward-penalty function is expressed as a scalar r; it guides the learning process of the deep Q-learning network intelligent obstacle-avoidance controller 5 by evaluating how good or bad the action taken by the controller 5 is.

If the unmanned boat 1 successfully avoids an obstacle in the surrounding environment 2, the evaluation is r = 1; if the unmanned boat 1 hits an obstacle in the surrounding environment 2, the evaluation is r = -1; for any other outcome, the evaluation is r = 0. The purpose of the deep Q-learning network intelligent obstacle-avoidance controller 5 is to maximize the sum of the reward-penalty function values obtained by the unmanned boat 1.
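The three-valued evaluation above can be written as a short Python function. How "successfully avoided" and "hit" are detected is not specified in the text, so the two boolean arguments are hypothetical inputs supplied by whatever detection logic the system uses.

```python
def reward(avoided_obstacle, collided):
    """Scalar r: -1 for a collision, +1 for a successful avoidance, 0 otherwise."""
    if collided:
        return -1
    if avoided_obstacle:
        return 1
    return 0


def total_reward(rewards):
    """Sum of reward-penalty values, the quantity the controller tries to maximize."""
    return sum(rewards)


# Hypothetical episode: two uneventful steps, one avoidance, one collision.
episode = [reward(False, False), reward(False, False),
           reward(True, False), reward(False, True)]
episode_total = total_reward(episode)
```

Checking the collision flag first is a design choice of this sketch, so that a step flagged both ways is still penalized.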

4. Designing the intelligent switching threshold function 11

According to its threshold, the intelligent switching threshold function 11 switches intelligently in real time between the deep Q-learning network intelligent obstacle-avoidance controller 5 and the high-precision positioning navigation controller 10, ultimately realizing fully autonomous intelligent navigation of the unmanned boat 1 without a human operator.

The real-time risk estimation factor ξ is first designed as:

where k ∈ [0, 1] expresses the sensitivity to relative velocity and is generally taken as k = 0.9 here. The larger the real-time risk estimation factor ξ, the higher the risk of collision.

The switching threshold of the intelligent switching threshold function 11 is set to λ = 0.091. When ξ >= λ, the control signal a = a₁ output by the deep Q-learning network intelligent obstacle-avoidance controller 5 is executed; when ξ < λ, the control signal a = a₂ output by the high-precision positioning navigation controller 10 is executed, that is:

a = a₁, if ξ >= λ
a = a₂, if ξ < λ
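The threshold rule can be sketched in Python as follows. The sketch takes the risk factor ξ as an already computed number and only shows the comparison against λ = 0.091; the function name, the mode labels, and the example control signals are hypothetical.

```python
SWITCH_THRESHOLD = 0.091   # lambda, the value given in the text


def select_control(xi, a_avoid, a_nav, threshold=SWITCH_THRESHOLD):
    """Pick the control signal to execute for the current risk factor xi.

    a_avoid is the output a1 of the obstacle-avoidance controller,
    a_nav is the output a2 of the positioning navigation controller.
    """
    if xi >= threshold:
        return "avoidance", a_avoid
    return "navigation", a_nav


# Hypothetical risk values on either side of the threshold.
high_risk = select_control(0.2, a_avoid="left", a_nav="forward")
low_risk = select_control(0.05, a_avoid="left", a_nav="forward")
```

Note that the boundary case ξ = λ selects the obstacle-avoidance controller, matching the ">=" in the rule.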

5. Online learning

To describe the online learning process of the deep Q-learning network intelligent obstacle-avoidance controller 5 with reference to Fig. 3, the state variable S, the replay memory D 25, and the evaluation function Q(s_t, a_t) are defined as follows:

S = [d_t, o_t], t = 0, 1, 2, ... (4)

D = [(s, a, r, s', a'), ...], t = 0, 1, 2, ... (5)

Network 26 in Fig. 3 serves as the current-value network 26 of the deep Q-learning network intelligent obstacle-avoidance controller, and a second network with exactly the same structure is designed as the target-value network 27 of the controller.
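A replay memory holding tuples (s, a, r, s', a') can be sketched in Python as below. The text initializes D as an all-zero matrix; this sketch swaps in a bounded `collections.deque` instead, which is an assumption made for brevity, and the class name, capacity, and seed are hypothetical.

```python
import random
from collections import deque


class ReplayMemory:
    """Bounded store of (s, a, r, s_next, a_next) transitions."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)   # oldest entries are dropped when full
        self._rng = random.Random(seed)

    def store(self, s, a, r, s_next, a_next):
        self._buffer.append((s, a, r, s_next, a_next))

    def sample(self, batch_size):
        """Draw a random batch without replacement (fewer items if the memory is small)."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)


# Hypothetical usage with integer placeholders standing in for real states.
memory = ReplayMemory(capacity=3, seed=0)
for i in range(5):
    memory.store(s=i, a=0, r=0, s_next=i + 1, a_next=0)
batch = memory.sample(2)
```

Sampling at random rather than in arrival order is what lets the learning step use transitions that are not consecutive in time.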

The learning process is executed cyclically in the following 6 steps:

1) Initialize the replay memory D 25 as an all-zero matrix, and randomly initialize, with small pseudo-random numbers, the connection weight parameters w of the current-value network 26 and the connection weight parameters w^- of the target-value network 27 of the deep Q-learning network intelligent obstacle-avoidance controller.

2) Pass the temporal big data information acquired by the 360° pulsed laser rangefinder of the unmanned boat 1 to the inputs of the current-value network 26 and the target-value network 27 of the deep Q-learning network intelligent obstacle-avoidance controller, obtain the feature information (s, a, r, s', a') of the environment, and store it in the replay memory D 25. Here s' is the state variable at the next moment and a' is the control signal output at the next moment.

3) Randomly draw a batch of samples from the replay memory D 25 as learning data.

4) Compute Q(s, a; w) with the current-value network 26 of the deep Q-learning network intelligent obstacle-avoidance controller, and compute with the target-value network 27 of the controller: y = r + γ max_{a'} Q(s', a'; w^-). Here γ is the discount factor, generally taken as γ = 0.9.

5) Take I = (r + γ max_{a'} Q(s', a'; w^-) - Q(s, a; w))² as the model loss function 28 of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weight parameters w of the current-value network 26 of the deep Q-learning network intelligent obstacle-avoidance controller.

6) Every N steps, assign the connection weight parameters w of the current-value network 26 of the deep Q-learning network intelligent obstacle-avoidance controller to the connection weight parameters w^- of the target-value network 27 of the controller.
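Steps 4) to 6) can be sketched in Python under a strong simplification: a linear Q-function Q(s, a; w) = w[a] · s stands in for the two-stream network so that the gradient can be written by hand. The learning rate, the toy two-feature state, and the name `sync_every` (the period called N in step 6) are assumptions of this sketch.

```python
import copy

GAMMA = 0.9   # discount factor, the value given in the text


def q_value(w, s, a):
    """Linear stand-in for the network: Q(s, a; w) = w[a] . s"""
    return sum(wi * si for wi, si in zip(w[a], s))


def td_target(w_target, r, s_next, gamma=GAMMA):
    """y = r + gamma * max over a' of Q(s', a'; w^-), using the target weights."""
    return r + gamma * max(q_value(w_target, s_next, a) for a in range(len(w_target)))


def sgd_step(w, w_target, batch, lr=0.01):
    """Per-sample SGD on (y - Q(s, a; w))^2.

    Returns the mean squared error, each sample measured just before its own update.
    """
    total = 0.0
    for s, a, r, s_next, _a_next in batch:
        err = td_target(w_target, r, s_next) - q_value(w, s, a)
        total += err * err
        # d(err^2)/dw[a] = -2 * err * s, so descending adds 2 * lr * err * s
        w[a] = [wi + 2.0 * lr * err * si for wi, si in zip(w[a], s)]
    return total / len(batch)


def train(w, batches, sync_every):
    """Update w batch by batch; copy w into the target weights every sync_every steps."""
    w_target = copy.deepcopy(w)
    losses = []
    for step, batch in enumerate(batches, start=1):
        losses.append(sgd_step(w, w_target, batch))
        if step % sync_every == 0:
            w_target = copy.deepcopy(w)
    return w_target, losses


# Toy run: 5 actions, 2 features, one repeated transition with reward 1.
w = [[0.0, 0.0] for _ in range(5)]
transition = ([1.0, 0.0], 1, 1.0, [0.0, 0.0], 0)
w_target, losses = train(w, [[transition]] * 10, sync_every=5)
```

Holding the target weights fixed between synchronizations keeps the regression target y from moving at every update, which is the usual motivation for maintaining two networks of identical structure.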

Although the content of the present invention has been described in detail through the above preferred embodiment, it should be understood that the above description is not to be regarded as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above content. The scope of protection of the present invention should therefore be defined by the appended claims.

Claims

1. An intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data, characterized in that the method comprises the following steps:

S1, sampling spatio-temporal dual-stream big data: a 360° pulsed laser rangefinder mounted on top of the unmanned boat scans, at a preset angular resolution, the spatial big data of the distance d_t between the unmanned boat and the surrounding environment; that is, each frame measures N-dimensional spatial big data of the distance d_t between the unmanned boat and the surrounding environment. Then, from the difference of two adjacent frames of d_t data, o_t = d_t - d_{t-1}, each frame yields N-dimensional temporal big data of the relative velocity o_t between the unmanned boat and the surrounding environment, where the subscript t denotes sampling instant t;

S2, designing a deep Q-learning network intelligent obstacle-avoidance controller: the controller is built from a spatial distance stream convolutional neural network and a temporal velocity stream convolutional neural network connected in parallel, followed by one fully connected neural network connected in series;

S3, designing a reward-penalty function: the reward-penalty function is expressed as a scalar r; it guides the learning process of the deep Q-learning network intelligent obstacle-avoidance controller by evaluating how good or bad the action taken by the controller is;

S4, designing an intelligent switching threshold function: according to its threshold, the intelligent switching threshold function switches intelligently in real time between the deep Q-learning network intelligent obstacle-avoidance controller and the high-precision positioning navigation controller, ultimately realizing fully autonomous intelligent navigation of the unmanned boat without a human operator;

S5, online learning: to describe the online learning process of the deep Q-learning network intelligent obstacle-avoidance controller, the state variable S, the replay memory D, and the evaluation function Q(s_t, a_t) are defined as follows:

S = [d_t, o_t], t = 0, 1, 2, ... (1)

D = [(s, a, r, s', a'), ...], t = 0, 1, 2, ... (2)

where the subscript t denotes sampling instant t, so that s_t is the state variable at instant t and a_t is the control signal at instant t;

and a network is designed as the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller, while a second network with exactly the same structure is designed as the target-value network of the controller.

2. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterized in that, in step S2, the distance big data of the N-dimensional d_t input is taken as the input of the parallel spatial distance stream convolutional neural network; this network has 2 layers, and from input to output the N-dimensional d_t input passes through a convolution-pooling layer to an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a convolution-pooling layer to an M/2-dimensional output layer;

the temporal velocity big data of the N-dimensional o_t input is taken as the input of the parallel temporal velocity stream convolutional neural network; this network has 2 layers, and from input to output the N-dimensional o_t input passes through a convolution-pooling layer to an M-dimensional intermediate layer, and the M-dimensional intermediate layer passes through a convolution-pooling layer to an M/2-dimensional output layer;

the M/2-dimensional output layer of the spatial distance stream convolutional neural network and the M/2-dimensional output layer of the temporal velocity stream convolutional neural network are joined in parallel to form an M-dimensional input layer, which passes through a fully connected neural network to the 5-dimensional output layer Q(s, a; w); the 5 outputs are, respectively, the Q-value estimates of the steering and motion control signals "stop, forward, backward, right, left" applied to the unmanned boat, where s is the state variable defined in formula (1), a is the control signal, and w is the connection weight parameter of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller.

3. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterized in that, in step S3, if the unmanned boat successfully avoids an obstacle in the surrounding environment, the evaluation is r = 1; if the unmanned boat hits an obstacle in the surrounding environment, the evaluation is r = -1; for any other outcome, the evaluation is r = 0; and the purpose of the deep Q-learning network intelligent obstacle-avoidance controller is to maximize the sum of the reward-penalty function values obtained by the unmanned boat;

4. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterized in that, in step S4, the real-time risk estimation factor ξ is first designed as:

where k ∈ [0, 1] expresses the sensitivity to relative velocity; the larger the real-time risk estimation factor ξ, the higher the risk of collision;

and the switching threshold of the intelligent switching threshold function is set to λ = 0.091; when ξ >= λ, the control signal a = a₁ output by the deep Q-learning network intelligent obstacle-avoidance controller is executed; when ξ < λ, the control signal a = a₂ output by the high-precision positioning navigation controller is executed, that is:

a = a₁, if ξ >= λ
a = a₂, if ξ < λ

5. The intelligent navigation control method for unmanned boats based on deep Q-learning driven by spatio-temporal dual-stream data according to claim 1, characterized in that, in step S6, the learning process is executed cyclically as follows:

Step S6.1: initialize the replay memory D as an all-zero matrix, and randomly initialize, with small pseudo-random numbers, the connection weight parameters w of the current-value network and the connection weight parameters w^- of the target-value network of the deep Q-learning network intelligent obstacle-avoidance controller;

Step S6.2: pass the temporal big data information acquired by the 360° pulsed laser rangefinder of the unmanned boat to the inputs of the current-value network and the target-value network of the deep Q-learning network intelligent obstacle-avoidance controller, obtain the feature information (s, a, r, s', a') of the environment, and store it in the replay memory D, where s' is the state variable at the next moment and a' is the control signal output at the next moment;

Step S6.4: compute Q(s, a; w) with the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller, and compute with the target-value network of the controller: y = r + γ max_{a'} Q(s', a'; w^-), where γ is the discount factor;

Step S6.5: take I = (r + γ max_{a'} Q(s', a'; w^-) - Q(s, a; w))² as the model loss function of the deep Q-learning network and, based on this loss function, learn with the stochastic gradient descent algorithm to improve the connection weight parameters w of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller;

Step S6.6: every N steps, assign the connection weight parameters w of the current-value network of the deep Q-learning network intelligent obstacle-avoidance controller to the connection weight parameters w^- of the target-value network of the controller.