CN110196605A - Method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group - Google Patents

Method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group

Info

Publication number
CN110196605A
CN110196605A (application CN201910346512.6A)
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
grid
unmanned aerial vehicle group
unmanned plane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910346512.6A
Other languages
Chinese (zh)
Other versions
CN110196605B (en)
Inventor
岳伟
关显赫
刘中常
王丽媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University
Priority to CN201910346512.6A priority Critical patent/CN110196605B/en
Publication of CN110196605A publication Critical patent/CN110196605A/en
Application granted granted Critical
Publication of CN110196605B publication Critical patent/CN110196605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/12Target-seeking control

Abstract

The invention discloses a method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle (UAV) group, comprising the following steps. S1: divide the search area with the grid method and establish a territory-awareness information map from the pheromone concentration each UAV produces within its area. S2: design a Q-value table from the UAV state information and the decision u(k). S3: select and execute each UAV's flight path with a Boltzmann distribution mechanism according to the Q values of the group's current state. S4: design a reward-penalty function for evaluating the UAV flight state from the search efficiency function, and use it to update the Q value of the new state the group reaches. S5: set the reached new state as the current state and keep making flight-path decisions until the whole Q-value table is learned; the UAV group then makes decisions according to the trained Q table and completes the search mission.

Description

Method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group
Technical field
The present invention relates to the technical field of unmanned aerial vehicle (UAV) control, and in particular to a method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning UAV group.
Background technique
With the rapid development of technologies such as sensing, wireless communication and intelligent control, unmanned swarm systems have become increasingly capable and their fields of application keep expanding. Because of their scalability, strong cooperativity and low cost, unmanned swarm systems receive growing attention and application research from academia, industry and national defense. Multi-UAV cooperative search systems can effectively improve search efficiency and hold great advantages, especially for searching dynamic targets under complex sea conditions with uncertainty and strong interference; cooperative multi-UAV sea-area search is therefore one of the important directions of unmanned swarm system research.
Traditional methods use coverage search, such as square-spiral ("回"-shaped) search and traversal search, which generally maximize coverage of the mission area so as to find as many targets as possible. In recent years, search-graph models combining target existence probability have been built and solved with distributed model predictive control, effectively reducing the solution scale of the search decision problem, but they are limited to static targets. For dynamic targets, Bayesian methods have been used to compute average detection time and average detection probability, but they apply only to a single maritime target and cannot satisfy the demands of multi-target search.
Summary of the invention
In view of the problems in the prior art, the invention discloses a method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning UAV group. The method first considers the environment, the UAV dynamics, the target dynamics and the sensor detection model to establish a multi-UAV sea-area search graph; it then updates and extends the original search graph with the concept of a territory-awareness information map; finally it applies a reinforcement learning method, designing a reward-penalty function from the search efficiency function, to generate multi-UAV cooperative search paths online.
The method specifically comprises the following steps:
S1: divide the search area with the grid method; establish a multi-UAV sea-area search graph based on the sea environment, the UAV dynamics, the dynamics of ships moving at sea and the sensor detection model; establish a territory-awareness information map from the pheromone concentration each UAV produces within its area, and use it to extend the multi-UAV sea-area search graph;
S2: design a Q-value table from the UAV state information and the decision u(k);
S3: select and execute each UAV's flight path with a Boltzmann distribution mechanism according to the Q values of the group's current state; when the group reaches a new state, obtain the search efficiency function as the weighted sum of the target detection gain Jp, the environment search gain Jχ, the execution cost C and the collision cost I;
S4: design a reward-penalty function for evaluating the UAV flight state from the search efficiency function, and use it to update the Q value of the new state the group reaches;
S5: set the reached new state as the current state and keep making flight-path decisions until the whole Q-value table is learned; the UAV group then makes decisions according to the trained Q table and completes the search mission.
S1 specifically proceeds as follows:
S11: establish the territory-awareness information map. When UAV Vi searches grid cell (m, n) it generates pheromone Hi(mn)(k), which diffuses to the other cells of the search graph; at cell (a, b) the diffusion transfer function is:
wherein ρ and β are constants;
When Nv UAVs execute the search mission, Nv kinds of pheromone are continuously generated and diffused. Taking cell (c, d) as an example, the current pheromone concentration is the sum of the concentration left after evaporation from the last moment and the newly generated pheromone diffused to the cell; the update equation is:
wherein τH ∈ [0, 1] is the evaporation factor;
When UAV Vi detects a high concentration of other UAVs' pheromone in cell (m, n), this indicates that other UAVs are frequently active at (m, n); the concentration of other UAVs' pheromone detected by Vi is:
S12: establish the target probability map. The target probability update formula is:
where Pmn(k) is the probability that a target exists at (m, n) at time k, pD is the sensor detection probability, pF is the sensor false-alarm probability, and τ ∈ [0, 1] is the target-probability dynamic information factor; ΔPmn(k) is the probability change at cell (m, n) caused by other cells being visited while (m, n) itself is not visited by a UAV:
where D(k) is the set of all cells visited at time k and Nv is the number of UAVs.
S13: establish the certainty map. The certainty update equation is:
wherein τc is the dynamic information factor of certainty and χ ∈ [0, 1] is a constant.
S14: let Hmn(k) be the total pheromone concentration at cell (m, n), a function of grid position and time; the environment search graph is then obtained as
S2 specifically proceeds as follows:
The size of the Q-value table is determined by the UAV state and the control input: there are Lx × Ly location states, z possible headings at each cell, and l selectable control inputs per UAV, so the designed Q table has Lx × Ly × z rows and l columns.
S3 specifically proceeds as follows:
S31: the collision cost I is defined as,
where the term is the territory awareness exhibited by UAV Vi, i.e. the concentration of other UAVs' pheromone it detects, calculated as follows:
In the above formula, Hmn(k) is the total pheromone generated by all UAVs at cell (m, n).
S4 specifically proceeds as follows:
S41: without considering no-fly zones, the reward-penalty function is designed as follows,
where a is a constant affecting the generalization ability of the learning process and a × J(s(k), u(k)) ∈ (−R, R); the maximum reward is R and the maximum penalty is set to −R; d is the actual distance between UAVs; J(s(k), u(k)) is the search efficiency function; D is the minimum safe distance, and d ≥ D must hold to guarantee the safe flight of each UAV.
S42: when there is a no-fly zone, let B be the distance from the UAV to the no-fly zone center; B must be greater than the no-fly zone radius D*, and the reward-penalty function is further refined as follows,
that is, a UAV that collides or flies into a no-fly zone receives the maximum penalty.
By adopting the above technical solution, the method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning UAV group provided by the invention solves the primary safety problem of multi-UAV cooperative collision avoidance, designs a new reward-penalty function from the search efficiency function, plans multi-UAV search trajectories online according to efficiency with the reinforcement learning method, and updates the search graph with the search results, greatly improving search efficiency.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the onboard sensor detection model;
Fig. 2 is a schematic diagram of the initial stage of the reinforcement-learning search;
Fig. 3 is a schematic diagram of the sea-area information learned during search;
Fig. 4 is a schematic diagram of the UAV search result;
Fig. 5 is a random-search trajectory plot;
Fig. 6 is a traversal-search trajectory plot;
Fig. 7 is the flow chart of the method.
Specific embodiment
To make the technical solution and advantages of the present invention clearer, the technical solution in the embodiments of the invention is described clearly and completely below with reference to the drawings:
As shown in Fig. 1 and Fig. 7, a method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning UAV group specifically comprises the following steps:
S1: divide the search area into Lx × Ly grid cells with the grid method. Establish the multi-UAV sea-area search graph based on the sea environment, the UAV dynamics, the dynamics of ships moving at sea and the sensor detection model, where (m, n) is the cell coordinate and k is the time instant; the specific values are calculated as follows:
S11: establish the territory-awareness information map. When UAV Vi searches grid cell (m, n) it generates pheromone Hi(mn)(k), which diffuses to the other cells of the search graph; at cell (a, b) the diffusion transfer function is:
wherein ρ and β are constants;
When Nv UAVs execute the search mission, Nv kinds of pheromone are continuously generated and diffused. Taking cell (c, d) as an example, the current pheromone concentration is the sum of the concentration left after evaporation from the last moment and the newly generated pheromone diffused to the cell; the update equation is:
wherein τH ∈ [0, 1] is the evaporation factor;
When UAV Vi detects a high concentration of other UAVs' pheromone in cell (m, n), this indicates that other UAVs are frequently active at (m, n); the concentration of other UAVs' pheromone detected by Vi is:
S12: establish the target probability map. The target probability update formula is:
where Pmn(k) is the probability that a target exists at (m, n) at time k, pD is the sensor detection probability, pF is the sensor false-alarm probability, and τ ∈ [0, 1] is the target-probability dynamic information factor; ΔPmn(k) is the probability change at cell (m, n) caused by other cells being visited while (m, n) itself is not visited by a UAV:
where D(k) is the set of all cells visited at time k and Nv is the number of UAVs.
S13: establish the certainty map. The certainty update equation is:
wherein τc is the dynamic information factor of certainty and χ ∈ [0, 1] is a constant.
S14: in the inertial coordinate frame, the UAV motion model is established as follows:
where (xi, yi) ∈ R² is the location state of Vi in the search plane, the yaw angle and the speed vi are those of Vi, ui ∈ [−1, 1] is the decision variable, and ηmaxi is the peak turn rate of Vi; constrained by UAV performance, the speed must satisfy vi ∈ [vmin, vmax].
S15: the UAV carries a visible-light sensor installed at a fixed angle and flies horizontally at a fixed altitude. As shown in Fig. 1, in the relative coordinate frame the detection width is described by the following formula:
du = 2hu · tan γu / sin αu
where hu is the flight altitude of the UAV, αu is the sensor installation angle, and γu is the horizontal field-of-view angle of the sensor.
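The detection-width formula above is straightforward to evaluate; a minimal sketch (the altitude and angle values in the example are illustrative, not from the patent):

```python
import math

def detection_width(h_u: float, gamma_u: float, alpha_u: float) -> float:
    """Ground-projected sensor detection width d_u = 2 * h_u * tan(gamma_u) / sin(alpha_u).

    h_u: UAV flight altitude; gamma_u: horizontal field-of-view angle;
    alpha_u: sensor installation angle. Angles in radians.
    """
    return 2.0 * h_u * math.tan(gamma_u) / math.sin(alpha_u)

# Example: 100 m altitude, 30-degree field of view, 45-degree installation angle
w = detection_width(100.0, math.radians(30.0), math.radians(45.0))
```

A wider field of view or a shallower installation angle enlarges the swath each UAV sweeps per time step, which in turn sets how many grid cells a single visit can update.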
S2 specifically proceeds as follows: there are z possible headings at each grid cell and l selectable control inputs per UAV, so the designed Q table has Lx × Ly × z rows and l columns; every entry of the Q-value table is initialized to 0.
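The Q-table layout described above can be sketched as a zero-initialized array; the concrete values of Lx, Ly, z and l below are hypothetical placeholders:

```python
import numpy as np

Lx, Ly = 20, 20   # grid dimensions (hypothetical values)
z = 8             # possible headings at each cell (hypothetical)
l = 3             # selectable control inputs per UAV (hypothetical)

# One row per (cell, heading) state, one column per control input,
# every entry initialized to 0 as the patent specifies
Q = np.zeros((Lx * Ly * z, l))
```

Indexing a row then amounts to flattening (cell x, cell y, heading) into a single state index.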
S3 specifically proceeds as follows:
S31: according to the Q values of the group's current state, select the decision with probability given by the Boltzmann distribution mechanism. In state s(k), the probability that strategy set u(k) is selected is:
where u ∈ A means that strategy u is one of the executable strategies in decision set A; the magnitude of T determines the ability of learning to explore the unknown space: the larger T, the stronger the ability to explore the new decision space (if T is infinite, P(u(k)) = 1/m, i.e. a random decision). T is defined as,
T = T0 n^(−1/λ)
where the parameters satisfy λ > 1 and T0 > 0.
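The Boltzmann selection and annealing schedule described above can be sketched as follows; the default T0 and λ values are illustrative, not from the patent:

```python
import numpy as np

def boltzmann_select(q_row, T, rng=np.random.default_rng()):
    """Pick an action index with P(u) = exp(Q(s,u)/T) / sum_v exp(Q(s,v)/T)."""
    # Subtract the max before exponentiating for numerical stability
    prefs = np.exp((q_row - np.max(q_row)) / T)
    probs = prefs / prefs.sum()
    return int(rng.choice(len(q_row), p=probs))

def temperature(n, T0=10.0, lam=2.0):
    """Annealing schedule T = T0 * n**(-1/lam): exploration decays with episode n."""
    return T0 * n ** (-1.0 / lam)
```

With large T the distribution is nearly uniform (pure exploration); as T decays the selection concentrates on the highest-Q action (exploitation).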
S32: after executing the decision, the group reaches a new state and generates the search efficiency function, obtained as the weighted sum of the target detection gain Jp, the environment search gain Jχ, the execution cost C and the collision cost I,
J(s(k), u(k)) = w1 Jp(k) + w2 Jχ(k) − w3 C(k) − w4 I(k)
where 0 ≤ wi ≤ 1 (i = 1, 2, 3, 4) are weights. Note that the above gains and costs have different dimensions, so each term must be normalized before summation.
S33: the target detection gain Jp is computed from the target probability, whose update formula is,
where pD is the sensor detection probability, pF is the sensor false-alarm probability and τ ∈ [0, 1] is the target-probability dynamic information factor; ΔPmn(k) is the probability change at cell (m, n) caused by other cells being visited while (m, n) itself is not visited by a UAV,
where D(k) is the set of all cells visited at time k; if a UAV platform visits (m, n), the update of Pmn(k) is related to the platform sensor's detection variable bk: bk = 1 means the onboard sensor detects a target and bk = 0 means it does not.
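The patent's exact probability-map update appears only as an image, but the sensor model it names (pD, pF, bk) fits a common Bayesian form; the sketch below is that standard form, stated as an assumption, not the patent's formula:

```python
def update_target_prob(p, detected, pD=0.9, pF=0.05):
    """One common Bayesian update for a visited cell's target probability.

    ASSUMED form (the patent's formula is not reproduced in the text):
    the prior p is revised by the sensor model, where pD is the detection
    probability, pF the false-alarm probability, and `detected` is the
    detection variable b_k (True when the onboard sensor reports a target).
    The default pD/pF values are illustrative.
    """
    if detected:          # b_k = 1: sensor reports a target
        num = pD * p
        den = pD * p + pF * (1.0 - p)
    else:                 # b_k = 0: no detection this visit
        num = (1.0 - pD) * p
        den = (1.0 - pD) * p + (1.0 - pF) * (1.0 - p)
    return num / den
```

A detection pushes the cell's probability toward 1 and a miss pushes it toward 0, at rates set by the sensor's reliability.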
The target detection gain is then,
S34: compute the environment search gain Jχ. As the UAVs search and the sensors observe, the UAVs gradually come to know the search area and the information entropy of the corresponding search graph gradually decreases, so the environment search gain is defined as the reduction of information entropy:
Jχ(k) = H(k) − H(k+1)
where H(k) is the information entropy at time k, which describes the degree of uncertainty of the current environment.
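The entropy-reduction gain Jχ(k) = H(k) − H(k+1) can be sketched as below; the per-cell binary entropy and the natural-log base are assumptions, since the patent's entropy expression is given only as an image:

```python
import numpy as np

def info_entropy(P):
    """Map entropy as the sum of per-cell binary entropies of the target
    probabilities (ASSUMED form; base and exact expression may differ
    from the patent's image formula)."""
    P = np.clip(np.asarray(P, dtype=float), 1e-12, 1.0 - 1e-12)
    return float(-(P * np.log(P) + (1.0 - P) * np.log(1.0 - P)).sum())

def environment_gain(P_before, P_after):
    """J_chi(k) = H(k) - H(k+1): searching reduces map uncertainty."""
    return info_entropy(P_before) - info_entropy(P_after)
```

Cells near probability 0.5 contribute the most entropy, so visiting poorly-known cells yields the largest gain.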
S35: compute the execution cost C, i.e. the time and fuel consumed by the UAV while flying to the target point, which can be estimated with formula (14):
S36: the collision cost I is defined as,
where the term is the territory awareness exhibited by UAV Vi, i.e. the concentration of other UAVs' pheromone it detects; its calculation formula is as follows:
When UAV Vi searches cell (m, n) it generates pheromone Hi(mn)(k), which diffuses to the other cells of the search graph; at cell (a, b) the diffusion transfer function is,
wherein ρ and β are constants. When Nv UAVs execute the search mission, Nv kinds of pheromone are continuously generated and diffused; taking cell (c, d) as an example, the current pheromone concentration is the sum of the concentration left after evaporation from the last moment and the newly generated pheromone diffused to the cell; the update equation is:
where τH ∈ [0, 1] is the evaporation factor.
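The generate–diffuse–evaporate cycle described above can be sketched as one map update; the distance-decay kernel rho * exp(-beta * d²) is an assumed form, since the patent's diffusion transfer function is given only as an image:

```python
import numpy as np

def pheromone_step(H, sources, tau_H=0.3, rho=1.0, beta=0.5):
    """One update of the territory-awareness pheromone map.

    H: current concentration grid. sources: (m, n) cells searched this step,
    each emitting pheromone that diffuses over the grid. The kernel
    rho * exp(-beta * d^2) is an ASSUMPTION standing in for the patent's
    image-only diffusion function; tau_H in [0, 1] is the evaporation factor.
    """
    Lx, Ly = H.shape
    mm, nn = np.meshgrid(np.arange(Lx), np.arange(Ly), indexing="ij")
    new = (1.0 - tau_H) * H                    # what evaporation leaves behind
    for (m, n) in sources:
        d2 = (mm - m) ** 2 + (nn - n) ** 2
        new += rho * np.exp(-beta * d2)        # diffusion from this UAV's cell
    return new
```

Cells another UAV visited recently carry high concentration, which raises the collision cost I there and steers the group apart.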
S4 specifically proceeds as follows:
S41: a higher overall efficiency J(s(k), u(k)) is desired, so after each search action a UAV that obtains higher efficiency is rewarded immediately, and one that obtains lower efficiency is punished immediately.
The reward-penalty function r(k) is designed as follows,
where a is a constant affecting the generalization ability of the learning process and a × J(s(k), u(k)) ∈ (−R, R); the maximum reward is R and the maximum penalty is set to −R; J(s(k), u(k)) is determined by formula (16); d is the actual distance between UAVs,
and D is the minimum safe distance: d ≥ D must hold to guarantee the safe flight of each UAV.
S42: considering no-fly zones, let B be the distance from the UAV to the no-fly zone center; B should be greater than the no-fly zone radius D*, and the reward-penalty function is further refined as follows,
that is, a UAV that collides or flies into a no-fly zone receives the maximum penalty. As shown in Fig. 2, searching a shallow-water zone or a no-fly zone is punished.
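The behavior described in S41–S42 can be sketched as a piecewise reward; the exact piecewise form in the patent is an image, so this reproduces only the stated rules (scaled efficiency inside the safe region, maximum penalty −R on collision risk or no-fly-zone entry), with illustrative parameter defaults:

```python
def reward(J, d, B=None, a=1.0, R=10.0, D=5.0, D_star=0.0):
    """Sketch of the reward-penalty function r(k).

    J: search efficiency J(s(k), u(k)); d: actual distance between UAVs;
    B: distance to the no-fly zone center (None if no no-fly zone);
    a scales J so that a*J stays inside (-R, R); D is the minimum safe
    distance and D_star the no-fly zone radius. Defaults are illustrative.
    """
    if d < D:                             # collision risk between UAVs
        return -R
    if B is not None and B <= D_star:     # inside the no-fly zone
        return -R
    return a * J                          # safe flight: reward tracks efficiency
```

Because both unsafe events map to the same maximum penalty −R, the learner treats collision and no-fly-zone entry as equally unacceptable.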
S43: the update rule of the Q-value function is,
where si(k) is the current state of Vi; ui(k) is the currently selected decision, i.e. the change of the UAV's yaw angle; r(k) is the immediate reward or penalty obtained after the aircraft in state si(k) executes strategy ui(k) and reaches state si(k+1); the max term is the maximum Q value attainable from state si(k+1) over strategies u; α ∈ [0, 1] is the learning rate; γ is the discount factor.
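The rule described in S43 is the standard Q-learning update; a minimal sketch over the NumPy Q table, assuming rows index states and columns index decisions:

```python
import numpy as np

def q_update(Q, s, u, r, s_next, alpha=0.1, gamma=0.9):
    """Q(s,u) <- Q(s,u) + alpha * (r + gamma * max_u' Q(s',u') - Q(s,u)).

    alpha is the learning rate and gamma the discount factor; the defaults
    here are illustrative, not the patent's values.
    """
    Q[s, u] += alpha * (r + gamma * Q[s_next].max() - Q[s, u])
    return Q
```

Repeating this update along the trajectories selected in S3 is what "completes the learning of the whole Q-value table" in S5.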
S5 specifically proceeds as follows: set the new state reached by the group as the current state and keep making decisions until the learning of the whole Q-value table is completed. After the Q table finally converges, the recorded sea-area information is as shown in Fig. 3. The UAV group makes decisions according to the trained Q table and completes the search mission; Fig. 4 shows the sea-area information grasped by the UAVs after the search mission, including all shallow-water zones, the no-fly zones, and the sea area covered by 9 vessels cruising along straight lines.
Under these experimental conditions, the search effect of the Q-learning algorithm is compared with random search and traversal search using the Monte Carlo method over 500 runs; the simulated random-search and traversal-search trajectories are shown in Fig. 5 and Fig. 6. Among the three methods, reinforcement-learning search has the highest efficiency, finding on average about one more target per time step than random search; traversal search may eventually find all targets, but its efficiency is extremely low. The simulation experiments show the validity of the algorithm, and comparative analysis verifies that it realizes cooperative multi-UAV dynamic target search more effectively than the original search methods.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the present invention, according to the technical solution of the present invention and its inventive concept, shall be covered by the scope of protection of the present invention.

Claims (5)

1. A method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group, characterized by comprising the following steps:
S1: divide the search area with the grid method; establish a multi-UAV sea-area search graph based on the sea environment, the UAV dynamics, the dynamics of ships moving at sea and the sensor detection model; establish a territory-awareness information map from the pheromone concentration each UAV produces within its area, and use it to extend the multi-UAV sea-area search graph;
S2: design a Q-value table from the UAV state information and the decision u(k);
S3: select and execute each UAV's flight path with a Boltzmann distribution mechanism according to the Q values of the group's current state; when the group reaches a new state, obtain the search efficiency function as the weighted sum of the target detection gain Jp, the environment search gain Jχ, the execution cost C and the collision cost I;
S4: design a reward-penalty function for evaluating the UAV flight state from the search efficiency function, and update the Q value of the new state the group reaches according to it;
S5: set the reached new state as the current state and keep making flight-path decisions until the whole Q-value table is learned; the UAV group then makes decisions according to the trained Q table and completes the search mission.
2. The method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group according to claim 1, further characterized in that S1 specifically proceeds as follows:
S11: establish the territory-awareness information map. When UAV Vi searches grid cell (m, n) it generates pheromone Hi(mn)(k), which diffuses to the other cells of the search graph; the pheromone diffusion transfer function at cell (a, b) is:
wherein ρ and β are constants;
When Nv UAVs execute the search mission, Nv kinds of pheromone are generated and diffused; at cell (c, d), the current pheromone concentration is the sum of the concentration left after evaporation from the last moment and the newly generated pheromone diffused to the cell; the update equation is:
wherein τH ∈ [0, 1] is the evaporation factor;
When UAV Vi detects a high concentration of other UAVs' pheromone in cell (m, n), this indicates that other UAVs are frequently active at (m, n); the concentration of other UAVs' pheromone detected by Vi is:
S12: establish the target probability map. The target probability update formula is:
where Pmn(k) is the probability that a target exists at (m, n) at time k, pD is the sensor detection probability, pF is the sensor false-alarm probability, τ ∈ [0, 1] is the target-probability dynamic information factor, and ΔPmn(k) is the probability change at cell (m, n) caused by other cells being visited while (m, n) itself is not visited by a UAV:
where D(k) is the set of all cells visited at time k and Nv is the number of UAVs;
S13: establish the certainty map. The certainty update equation is:
wherein τc is the dynamic information factor of certainty and χ ∈ [0, 1] is a constant;
S14: let Hmn(k) be the total pheromone concentration at cell (m, n), a function of grid position and time; the environment search graph is then obtained as
3. The method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group according to claim 1, further characterized in that S2 specifically proceeds as follows:
the size of the Q-value table is determined by the UAV state and the control input: there are Lx × Ly location states, z possible headings at each cell, and l selectable control inputs per UAV, so the designed Q table has Lx × Ly × z rows and l columns.
4. The method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group according to claim 1, further characterized in that S3 specifically proceeds as follows:
S31: the collision cost I is defined as,
where the term is the territory awareness exhibited by UAV Vi, i.e. the concentration of other UAVs' pheromone it detects, calculated as follows:
where Hmn(k) is the total pheromone generated by all UAVs at cell (m, n).
5. The method for cooperatively searching multiple dynamic targets in an unknown sea area by a reinforcement-learning unmanned aerial vehicle group according to claim 1, further characterized in that S4 specifically proceeds as follows:
S41: without considering no-fly zones, the reward-penalty function is designed as follows,
where a is a constant affecting the generalization ability of the learning process and a × J(s(k), u(k)) ∈ (−R, R); the maximum reward is R and the maximum penalty is set to −R; d is the actual distance between UAVs; J(s(k), u(k)) is the search efficiency function; D is the minimum safe distance, and d ≥ D must hold to guarantee the safe flight of each UAV;
S42: when there is a no-fly zone, let B be the distance from the UAV to the no-fly zone center; B is greater than the no-fly zone radius D*, and the reward-penalty function is further refined as follows,
that is, a UAV that collides or flies into a no-fly zone receives the maximum penalty.
CN201910346512.6A 2019-04-26 2019-04-26 Method for cooperatively searching multiple dynamic targets in unknown sea area by reinforcement learning unmanned aerial vehicle cluster Active CN110196605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910346512.6A CN110196605B (en) 2019-04-26 2019-04-26 Method for cooperatively searching multiple dynamic targets in unknown sea area by reinforcement learning unmanned aerial vehicle cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910346512.6A CN110196605B (en) 2019-04-26 2019-04-26 Method for cooperatively searching multiple dynamic targets in unknown sea area by reinforcement learning unmanned aerial vehicle cluster

Publications (2)

Publication Number Publication Date
CN110196605A true CN110196605A (en) 2019-09-03
CN110196605B CN110196605B (en) 2022-03-22

Family

ID=67752255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910346512.6A Active CN110196605B (en) 2019-04-26 2019-04-26 Method for cooperatively searching multiple dynamic targets in unknown sea area by reinforcement learning unmanned aerial vehicle cluster

Country Status (1)

Country Link
CN (1) CN110196605B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110806591A (en) * 2019-10-11 2020-02-18 广东工业大学 Unmanned aerial vehicle coverage search method and search device based on coherent theory
CN111045445A (en) * 2019-10-23 2020-04-21 浩亚信息科技有限公司 Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning
CN111538059A (en) * 2020-05-11 2020-08-14 东华大学 Self-adaptive rapid dynamic positioning system and method based on improved Boltzmann machine
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111680934A (en) * 2020-06-30 2020-09-18 西安电子科技大学 Unmanned aerial vehicle task allocation method based on group entropy and Q learning
CN111708355A (en) * 2020-06-19 2020-09-25 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle action decision method and device based on reinforcement learning
CN111880567A (en) * 2020-07-31 2020-11-03 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle formation coordination control method and device based on deep reinforcement learning
CN112327918A (en) * 2020-11-12 2021-02-05 大连海事大学 Multi-swarm sea area environment self-adaptive search algorithm based on elite learning
WO2021082864A1 (en) * 2019-10-30 2021-05-06 武汉理工大学 Deep reinforcement learning-based intelligent collision-avoidance method for swarm of unmanned surface vehicles
CN112947575A (en) * 2021-03-17 2021-06-11 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
CN113342030A (en) * 2021-04-27 2021-09-03 湖南科技大学 Multi-unmanned aerial vehicle cooperative self-organizing control method and system based on reinforcement learning
CN113382060A (en) * 2021-06-07 2021-09-10 北京理工大学 Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113505431A (en) * 2021-06-07 2021-10-15 中国人民解放军国防科技大学 ST-DQN-based target searching method, device, equipment and medium for marine unmanned aerial vehicle
CN113671996A (en) * 2021-10-22 2021-11-19 中国电子科技集团公司信息科学研究院 Heterogeneous unmanned aerial vehicle reconnaissance method and system based on pheromone
CN113985913A (en) * 2021-09-24 2022-01-28 大连海事大学 Integrated and separated type multi-unmanned aerial vehicle rescue system based on urban fire spread prediction
CN114200964A (en) * 2022-02-17 2022-03-18 南京信息工程大学 Unmanned aerial vehicle cluster cooperative reconnaissance coverage distributed autonomous optimization method
CN114446121A (en) * 2022-02-24 2022-05-06 汕头市快畅机器人科技有限公司 Control method of life search cluster education robot
CN115328143A (en) * 2022-08-26 2022-11-11 齐齐哈尔大学 Environment-driven recovery guiding method for a master-slave water surface robot

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107024220A (en) * 2017-04-14 2017-08-08 淮安信息职业技术学院 Robot path planning method based on a reinforcement learning cockroach algorithm
CN107729953A (en) * 2017-09-18 2018-02-23 清华大学 Robot plume tracing method based on continuous state-action space reinforcement learning
CN108319286A (en) * 2018-03-12 2018-07-24 西北工业大学 UAV air combat maneuvering decision method based on reinforcement learning
CN108594834A (en) * 2018-03-23 2018-09-28 哈尔滨工程大学 Adaptive target search and obstacle avoidance method for multiple AUVs in unknown environments
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 Real-time underwater intelligent robot decision-making method based on memory association and embedded reinforcement learning
CN109241552A (en) * 2018-07-12 2019-01-18 哈尔滨工程大学 Underwater robot motion planning method based on multi-constraint objectives
WO2019047646A1 (en) * 2017-09-05 2019-03-14 百度在线网络技术(北京)有限公司 Obstacle avoidance method and device for vehicle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Jingjing et al., "A UAV target search algorithm based on reinforcement learning", Application Research of Computers *
Hao Chuanchuan et al., "Three-dimensional UAV flight path planning algorithm based on Q-learning", Journal of Shanghai Jiao Tong University *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110806591A (en) * 2019-10-11 2020-02-18 广东工业大学 Unmanned aerial vehicle coverage search method and search device based on coherent theory
CN110806591B (en) * 2019-10-11 2022-02-11 广东工业大学 Unmanned aerial vehicle coverage search method and search device based on coherent theory
CN111045445A (en) * 2019-10-23 2020-04-21 浩亚信息科技有限公司 Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning
CN111045445B (en) * 2019-10-23 2023-11-28 浩亚信息科技有限公司 Intelligent collision avoidance method, equipment and medium for aircraft based on reinforcement learning
US20220189312A1 (en) * 2019-10-30 2022-06-16 Wuhan University Of Technology Intelligent collision avoidance method for a swarm of unmanned surface vehicles based on deep reinforcement learning
WO2021082864A1 (en) * 2019-10-30 2021-05-06 武汉理工大学 Deep reinforcement learning-based intelligent collision-avoidance method for swarm of unmanned surface vehicles
CN111538059A (en) * 2020-05-11 2020-08-14 东华大学 Self-adaptive rapid dynamic positioning system and method based on improved Boltzmann machine
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111667513B (en) * 2020-06-01 2022-02-18 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111708355A (en) * 2020-06-19 2020-09-25 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle action decision method and device based on reinforcement learning
CN111708355B (en) * 2020-06-19 2023-04-18 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle action decision method and device based on reinforcement learning
CN111680934B (en) * 2020-06-30 2023-04-07 西安电子科技大学 Unmanned aerial vehicle task allocation method based on group entropy and Q learning
CN111680934A (en) * 2020-06-30 2020-09-18 西安电子科技大学 Unmanned aerial vehicle task allocation method based on group entropy and Q learning
CN111880567A (en) * 2020-07-31 2020-11-03 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle formation coordination control method and device based on deep reinforcement learning
CN111880567B (en) * 2020-07-31 2022-09-16 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle formation coordination control method and device based on deep reinforcement learning
CN112327918A (en) * 2020-11-12 2021-02-05 大连海事大学 Multi-swarm sea area environment self-adaptive search algorithm based on elite learning
CN112327918B (en) * 2020-11-12 2023-06-02 大连海事大学 Multi-swarm sea area environment self-adaptive search algorithm based on elite learning
CN112947575A (en) * 2021-03-17 2021-06-11 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
CN113342030A (en) * 2021-04-27 2021-09-03 湖南科技大学 Multi-unmanned aerial vehicle cooperative self-organizing control method and system based on reinforcement learning
CN113505431A (en) * 2021-06-07 2021-10-15 中国人民解放军国防科技大学 ST-DQN-based target searching method, device, equipment and medium for marine unmanned aerial vehicle
CN113382060A (en) * 2021-06-07 2021-09-10 北京理工大学 Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113505431B (en) * 2021-06-07 2022-05-06 中国人民解放军国防科技大学 Method, device, equipment and medium for searching targets of maritime unmanned aerial vehicle based on ST-DQN
CN113382060B (en) * 2021-06-07 2022-03-22 北京理工大学 Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113985913A (en) * 2021-09-24 2022-01-28 大连海事大学 Integrated and separated type multi-unmanned aerial vehicle rescue system based on urban fire spread prediction
CN113985913B (en) * 2021-09-24 2024-04-12 大连海事大学 Integrated and separated type multi-unmanned aerial vehicle rescue system based on urban fire spread prediction
CN113671996A (en) * 2021-10-22 2021-11-19 中国电子科技集团公司信息科学研究院 Heterogeneous unmanned aerial vehicle reconnaissance method and system based on pheromone
CN114200964A (en) * 2022-02-17 2022-03-18 南京信息工程大学 Unmanned aerial vehicle cluster cooperative reconnaissance coverage distributed autonomous optimization method
CN114446121A (en) * 2022-02-24 2022-05-06 汕头市快畅机器人科技有限公司 Control method of life search cluster education robot
CN114446121B (en) * 2022-02-24 2024-03-05 汕头市快畅机器人科技有限公司 Control method of life search cluster education robot
CN115328143A (en) * 2022-08-26 2022-11-11 齐齐哈尔大学 Environment-driven recovery guiding method for a master-slave water surface robot
CN115328143B (en) * 2022-08-26 2023-04-18 齐齐哈尔大学 Environment-driven recovery guiding method for a master-slave water surface robot

Also Published As

Publication number Publication date
CN110196605B (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN110196605A (en) Method for cooperatively searching multiple dynamic targets in unknown sea area by reinforcement learning unmanned aerial vehicle cluster
Jensen et al. Algorithms at war: the promise, peril, and limits of artificial intelligence
CN108801266B (en) Flight path planning method for searching uncertain environment by multiple unmanned aerial vehicles
Yu et al. A knee-guided differential evolution algorithm for unmanned aerial vehicle path planning in disaster management
CN109254588A (en) UAV cluster cooperative reconnaissance method based on crossover-and-mutation pigeon-inspired optimization
CN106705970A (en) Multi-UAV (unmanned aerial vehicle) cooperative path planning method based on ant colony algorithm
CN105892480A (en) Self-organizing method for cooperative reconnaissance and strike tasks of a heterogeneous multi-UAV system
US9030347B2 (en) Preemptive signature control for vehicle survivability planning
US9240001B2 (en) Systems and methods for vehicle survivability planning
CN108318032A (en) Intelligent unmanned aerial vehicle flight path planning method considering attack and defense
US6718261B2 (en) Architecture for real-time maintenance of distributed mission plans
Johansson Evaluating the performance of TEWA systems
US8831793B2 (en) Evaluation tool for vehicle survivability planning
Leboucher et al. A two-step optimisation method for dynamic weapon target assignment problem
CN114020031B (en) Unmanned aerial vehicle cluster collaborative dynamic target searching method based on improved pigeon-inspired optimization
CN109885082B (en) Task-driven unmanned aerial vehicle trajectory planning method
CN112486200B (en) Multi-unmanned aerial vehicle cooperative confrontation online re-decision method
Chen et al. Cooperative area reconnaissance for multi-UAV in dynamic environment
CN116679751A (en) Multi-aircraft collaborative search method considering flight constraint
Su et al. An improved adaptive differential evolution algorithm for single unmanned aerial vehicle multitasking
Zhang et al. Design of the fruit fly optimization algorithm based path planner for UAV in 3D environments
He et al. Learning-based airborne sensor task assignment in unknown dynamic environments
Zheng et al. Coevolving and cooperating path planner for multiple unmanned air vehicles
Yue et al. Reinforcement learning based approach for multi-UAV cooperative searching in unknown environments
Lv et al. Maritime static target search based on particle swarm algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant