CN115718497A - Multi-unmanned-boat collision avoidance decision method - Google Patents

Multi-unmanned-boat collision avoidance decision method

Info

Publication number
CN115718497A
Authority
CN
China
Prior art keywords
collision avoidance
unmanned
speed
mutual
theta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211480755.7A
Other languages
Chinese (zh)
Inventor
吴德烽
薛德来
刘源铄
刘启俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jimei University
Original Assignee
Jimei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jimei University
Priority to CN202211480755.7A
Publication of CN115718497A
Legal status: Pending

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to a collision avoidance decision method for multiple unmanned boats. The method takes both collision risk and the COLREGs into account: environmental information is represented and environmental risk is evaluated through reciprocal velocity obstacle (RVO) regions, and proximal policy optimization (PPO) makes decisions according to the evaluated risk. The RVO algorithm is used to improve the action space and the reward function of the PPO algorithm, and a neural network built on a recurrent module maps the states of a varying number of surrounding obstacles directly to actions, which addresses collision avoidance under limited information. The method introduces a new reward function based on the RVO region and the expected time to collision, which adapts to a variety of environments and addresses the sparse-reward problem. By combining PPO with the RVO and the advantages of both, the invention achieves COLREGs-compliant collision avoidance for multiple unmanned boats and ensures that they can navigate safely while carrying out their tasks.

Description

Multi-unmanned-boat collision avoidance decision method
Technical Field
The invention concerns autonomous decision-making for multiple unmanned boats. It relates to unmanned boat technology, path planning algorithms, collision avoidance algorithms, and control methods for multiple unmanned boats, and in particular to a collision avoidance decision method for multiple unmanned boats.
Background
In recent years, the demand for resources has prompted countries to step up the exploration and utilization of the oceans, and the development of unmanned technology provides technical support for this work. As a new type of marine equipment, the unmanned boat is widely used in the exploration and utilization of ocean resources. A single unmanned boat can rarely complete a marine exploration and development task on its own, whereas a cluster of unmanned boats can effectively complete tasks such as marine monitoring, marine rescue, and assisted mooring. Unmanned boats are a new field of research in unmanned driving technology. The marine environment is more complex than the land environment, and multiple unmanned boats pose challenges to marine safety and environmental protection in marine traffic engineering, which places higher requirements on their navigation control and navigation safety. Ensuring safe navigation of unmanned boats under the international regulations for preventing collisions at sea (COLREGs) and realizing autonomous collision avoidance among them is of strategic importance.
In research on multiple unmanned boats, control methods take two main forms. 1) Centralized control: in a centralized system, a controller flexibly coordinates multiple unmanned boats in the same workspace and avoids collisions within the group, provided that the group's environment information is known. This approach achieves more accurate control, but it places higher demands on the system, is less robust, and is difficult to scale to large groups. 2) Distributed control: each boat makes decisions independently from its own sensors, which suits the deployment of large numbers of unmanned boats at relatively low computational complexity. This approach is robust to errors and sudden faults in the motion of individual boats in the cluster, but its control precision is lower and its response is slower, so a mature collision avoidance algorithm is needed to achieve safe navigation at sea. Research institutes, universities, and enterprises have carried out a great deal of research on ship path planning and collision avoidance algorithms and obtained a series of results. However, most of this work addresses collision-avoidance path planning for a single boat, and research on multiple unmanned boats is comparatively scarce. A collision avoidance decision method for multiple unmanned boats is therefore needed to realize their safe navigation and safe operation at sea.
The prior art offers low control precision for multiple unmanned boats, and its control methods generalize poorly. The artificial potential field method, the dynamic window method, and model predictive control are mostly applied to a single unmanned boat and seldom to the interaction among multiple unmanned boats. The grid map method ignores the smoothness of an unmanned boat's trajectory. The velocity obstacle method is applied more often to multiple unmanned boats, but it can cause the boats to oscillate during collision avoidance. Deep reinforcement learning offers a solution for collision avoidance in complex environments, but for multiple unmanned boats the network and the reward function must be tuned, and the results exhibit randomness. Most existing collision avoidance algorithms target a single unmanned boat and, when applied to multiple unmanned boats, are prone to oscillation and to falling into local optima.
Disclosure of Invention
The invention aims to solve the problems that existing schemes do not provide collision avoidance and path planning algorithms that conform to the COLREGs and cannot adequately realize safe navigation and safe operation of multiple unmanned boats at sea, and provides a collision avoidance decision method for multiple unmanned boats. The reciprocal velocity obstacle (RVO) algorithm is used to improve the action space and the reward function of the proximal policy optimization (PPO) algorithm, and a neural network built on a recurrent module maps the states of a varying number of surrounding obstacles directly to actions, which addresses collision avoidance under limited information. The method introduces a new reward function based on the RVO region and the expected time to collision, which adapts to a variety of environments and addresses the sparse-reward problem. Under the control of the proposed algorithm, multiple unmanned boats are capable of collision-avoidance path planning and comply with the COLREGs.
To achieve this purpose, the technical scheme of the invention is as follows. A collision avoidance decision method for multiple unmanned boats is based on the PPO algorithm and supplemented by an extension strategy built on the RVO algorithm. The RVO algorithm improves the reward function of the PPO algorithm, which addresses the sparse-reward problem in reinforcement learning, speeds up network updates, raises learning efficiency, and mitigates the drawbacks of high randomness and a low learning rate. As shown in figure 1, the method comprises the following steps:
step 1, constructing a decision model;
step 2, loading an unknown environment and training the model;
step 3, designing a test environment, and extracting the environment information that can currently be monitored;
step 4, environmental perception;
step 5, data processing;
step 6, carrying out risk assessment and checking the current risk state of the unmanned boat;
step 7, executing the decision behavior corresponding to the risk found in step 6;
step 8, calculating the reward value according to step 7;
step 9, judging whether collision avoidance has been achieved, and returning the reward value and the result.
For step 1, proximal policy optimization (PPO) is a three-network structure and a variant of the policy gradient algorithm; the algorithm structure is shown in fig. 2. The algorithm starts by initializing the neural networks. There are two actor networks, each with two layers of 256 neurons: the network π is used for sampling, and the old network π_old is used for updating. During a training cycle, π receives the current environment information, selects an action according to that information, reaches the updated state s', and receives a reward r. The two actor networks are constrained by an adaptive KL penalty. The critic network also has two layers of 256 neurons each; it evaluates the quality of the action from s' and r and updates π. This shortens the network update time and improves the efficiency of the algorithm. As shown in figs. 3 and 4, the reciprocal velocity obstacle (RVO) is a velocity-based collision avoidance algorithm: surrounding information is represented by vectors, and collision risk is evaluated from the speed and direction of motion, which makes collision avoidance more efficient than observing positions alone. PPO combined with the RVO performs well on many different tasks and better than previous algorithms.
For step 2, a training environment is designed. The optimization target of the proximal policy optimization (PPO) algorithm is to maximize the expected reward, and importance sampling is used when computing the expectation. Importance sampling is the key to updating the network with parameters θ using data collected by the network with parameters θ'; the two unmanned boats are described by two distribution functions p and q. The expectation is computed as follows:
$$E_{x\sim p}[f(x)] = E_{x\sim q}\left[f(x)\,\frac{p(x)}{q(x)}\right]$$
In theory q(x) can be an arbitrary distribution, but in practice p(x) and q(x) should be close, as can be seen from the variances of the two estimators:
$$\mathrm{Var}_{x\sim p}[f(x)] = E_{x\sim p}[f(x)^2] - \left(E_{x\sim p}[f(x)]\right)^2$$
$$\mathrm{Var}_{x\sim q}\left[f(x)\,\frac{p(x)}{q(x)}\right] = E_{x\sim p}\left[f(x)^2\,\frac{p(x)}{q(x)}\right] - \left(E_{x\sim p}[f(x)]\right)^2$$
When the number of samples reaches 1000 or more, sampling from q(x) is treated as equivalent to sampling from p(x).
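The importance-sampling identity above can be checked numerically. The following is a minimal sketch, not the patent's implementation: the Gaussian choices for p and q, the test function f(x) = x², and the helper names `normal_pdf` and `importance_estimate` are all assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, n, p_params, q_params, rng):
    """Estimate E_{x~p}[f(x)] from n samples drawn from q (hypothetical helper)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(*q_params)                                 # sample from q, not p
        w = normal_pdf(x, *p_params) / normal_pdf(x, *q_params)  # weight p(x)/q(x)
        total += f(x) * w
    return total / n

rng = random.Random(0)
p = (0.0, 1.0)   # assumed target distribution p = N(0, 1)
q = (0.5, 1.2)   # assumed sampling distribution q, chosen close to p
estimate = importance_estimate(lambda x: x * x, 5000, p, q, rng)
# The exact value of E_{x~p}[x^2] for p = N(0, 1) is 1.0.
```

Moving q further from p leaves the estimator unbiased but inflates its variance, which is the reason the text asks for p(x) and q(x) to be close.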
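Importance sampling converts the on-policy method into an off-policy method. The expectation in the policy gradient
$$\nabla \bar{R}_\theta = E_{\tau\sim p_\theta(\tau)}\left[R(\tau)\,\nabla \log p_\theta(\tau)\right]$$
is converted into
$$\nabla \bar{R}_\theta = E_{\tau\sim p_{\theta'}(\tau)}\left[\frac{p_\theta(\tau)}{p_{\theta'}(\tau)}\,R(\tau)\,\nabla \log p_\theta(\tau)\right]$$
where τ is the sampled trajectory and
$$\frac{p_\theta(\tau)}{p_{\theta'}(\tau)}$$
is the correction term.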
In the actual environment, the gradient used for the update is
$$E_{(s_t,a_t)\sim\pi_{\theta'}}\left[\frac{p_\theta(a_t\mid s_t)}{p_{\theta'}(a_t\mid s_t)}\,A^{\theta}(s_t,a_t)\,\nabla \log p_\theta(a_t\mid s_t)\right]$$
where A^θ(s_t, a_t) is the evaluation (advantage) function, which measures how good it is to select action a in state s at time t.
The new optimization function is
$$J^{\theta'}(\theta) = E_{(s_t,a_t)\sim\pi_{\theta'}}\left[\frac{p_\theta(a_t\mid s_t)}{p_{\theta'}(a_t\mid s_t)}\,A^{\theta}(s_t,a_t)\right]$$
from which the defining formula of proximal policy optimization (PPO) is obtained:
$$J_{PPO}^{\theta'}(\theta) = J^{\theta'}(\theta) - \beta\,\mathrm{KL}(\theta,\theta')$$
where β is a weight coefficient and the KL divergence measures the difference between θ and θ'; this difference is the difference between the behaviors (actors) corresponding to the parameters. β KL(θ, θ') is the constraint term.
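The penalized objective and the adaptive weight β can be sketched for a single state with a discrete action set. This is an illustration under stated assumptions, not the patent's code: the function names, the toy probabilities, the direction KL(old || new), and the doubling/halving rule with the 1.5 threshold are assumptions, the last one following the adaptive-KL variant commonly described for this family of algorithms.

```python
import math

def kl_divergence(p_old, p_new):
    """KL(p_old || p_new) for two discrete action distributions."""
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new) if po > 0.0)

def ppo_penalty_objective(p_new, p_old, actions, advantages, beta):
    """Return (surrogate - beta * KL, KL) for one state.

    p_new / p_old : action probabilities under theta and theta'
    actions       : indices of actions sampled with theta'
    advantages    : advantage estimate for each sampled action
    """
    surrogate = sum(p_new[a] / p_old[a] * adv
                    for a, adv in zip(actions, advantages)) / len(actions)
    kl = kl_divergence(p_old, p_new)
    return surrogate - beta * kl, kl

def adapt_beta(beta, kl, kl_target):
    """Assumed adaptive rule: raise beta when the policies drift apart, lower it otherwise."""
    if kl > 1.5 * kl_target:
        return beta * 2.0
    if kl < kl_target / 1.5:
        return beta / 2.0
    return beta

# Toy data (hypothetical): three actions, four sampled transitions.
p_old = [0.5, 0.3, 0.2]
p_new = [0.6, 0.25, 0.15]
objective, kl = ppo_penalty_objective(p_new, p_old, [0, 1, 0, 2],
                                      [1.0, -0.5, 0.5, -1.0], beta=1.0)
```

A larger β keeps the updated policy close to the sampling policy, which is what keeps the importance weights in the surrogate well behaved.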
The reciprocal velocity obstacle (RVO) assumes that the other party uses the same strategy rather than keeping a constant velocity, as shown in FIG. 4, and is described by equation (9):
$$RVO^A_B(v_B, v_A) = \{\, v'_A \mid 2v'_A - v_A \in VO^A_B(v_B) \,\}$$
Instead of choosing, for each unmanned boat, a new velocity outside the other boat's velocity obstacle, the RVO chooses a new velocity that is the average of the boat's current velocity and a velocity outside the other boat's velocity obstacle; v_A and v_B are the currently selected velocities of the unmanned boats. The reciprocal velocity obstacle $RVO^A_B(v_B, v_A)$ of unmanned boat B to unmanned boat A contains all velocities of A that are the average of the current velocity v_A and a velocity inside the velocity obstacle $VO^A_B(v_B)$ of unmanned boat B. Geometrically, it is the velocity obstacle $VO^A_B(v_B)$ translated so that its apex lies at $(v_A + v_B)/2$.
Because collision avoidance of unmanned boats follows the maritime collision avoidance rules, the right-hand side is chosen when the avoidance strategy is executed. Let unmanned boats A and B select new velocities v'_A and v'_B outside each other's reciprocal velocity obstacles; equation (10) demonstrates that this choice is safe.
Figure BDA0003960720020000046
For step 2, training the model proceeds as follows:
step 2.1, determining the current position and the target point of each unmanned boat in the designed unknown environment;
step 2.2, evaluating the current collision risk with the reciprocal velocity obstacle (RVO) and feeding the result back to the proximal policy optimization (PPO); the network π executes the action and updates the position state and the action state, giving the network parameters θ';
step 2.3, the network π_old makes a decision according to the environment, giving the network parameters θ;
step 2.4, updating θ from θ' subject to the KL divergence between θ and θ';
step 2.5, if the RVO evaluation detects a collision risk, predicting the velocity state of the obstacle at the next moment and changing the speed and heading of the unmanned boat accordingly so that it avoids the obstacle;
step 2.6, if the unmanned boat moves farther from the target point, feeding back a lower reward value and adjusting the heading of the unmanned boat so that it approaches the target point;
step 2.7, if the selected velocity differs greatly from the desired velocity, feeding back a lower reward value and adjusting the velocity of the unmanned boat toward the desired velocity;
step 2.8, judging whether collision avoidance is complete; if it is complete and the target point has been reached, a basic collision avoidance route is obtained;
step 2.9, if collision avoidance is not complete, returning to step 2.1 and continuing the iterative update until the target point is reached;
step 2.10, training N times to obtain the optimal collision avoidance route, which completes the training and yields the trained model.
For step 3, a test environment is designed, and preliminary information is obtained from the test environment and the current position state of the unmanned boat for making the decision at the next moment.
For step 4, the surrounding environment information is monitored and represented by reciprocal velocity obstacle (RVO) vectors.
For step 5, a GRU neural network processes the input information into a fixed dimension, see fig. 5.
For step 6, a maximum detection range is set for the sensor of each unmanned boat, and the received signals are separated into the size, current speed, current heading, and collision avoidance radius of each unmanned boat within the detection range. Once this prior information on the local environment is available, local collision-avoidance path planning can be carried out.
For step 7, collision avoidance, normal navigation, or acceleration is performed according to the RVO evaluation.
For step 8, the reward is fed back according to the distance between the current state of the unmanned boat and the target point and guides the decision of the unmanned boat at the next moment.
For step 9, the model learns the action policy by continuously interacting with the environment; the learning effect is represented by the cumulative reward of each training episode, and the total reward value and the result are computed.
Compared with the prior art, the invention has the following beneficial effects:
The method of the invention forms an extension strategy that combines the proximal policy optimization (PPO) algorithm with the reciprocal velocity obstacle (RVO) algorithm. When the algorithm is used for local collision avoidance, the RVO-improved reward function determines the decision behavior. Surrounding obstacles and other environmental information are represented uniformly by RVO vectors and used to evaluate collision risk: obstacles are detected within the detectable range, and the observed velocity (magnitude and direction) of each obstacle is used to judge whether its position at the next moment poses a collision threat. The PPO executes collision avoidance according to the collision risk, the avoidance rules conform to the COLREGs, and the collision-free navigation task is completed along an optimal path. The structure of the algorithm flow after the RVO is added is shown in FIG. 6.
The method fuses PPO with the RVO. The RVO is used to represent environmental information and to improve the reward function, which raises the collision avoidance efficiency of the algorithm, mitigates its tendency to fall into local optima and to produce oscillating motion, strengthens its collision avoidance capability, gives good generalization, and improves the overall efficiency of safe, collision-free navigation of multiple unmanned surface boats.
Drawings
Fig. 1 is a flow chart of the collision avoidance decision for multiple unmanned boats.
Fig. 2 is a diagram of the proximal policy optimization (PPO) algorithm.
Fig. 3 is a diagram of the velocity obstacle algorithm.
Fig. 4 is a diagram of the reciprocal velocity obstacle (RVO) algorithm.
Fig. 5 is a flowchart of GRU data processing.
Fig. 6 is a schematic diagram of the structure of the PPO algorithm fused with the RVO algorithm.
Fig. 7 is a structural view of the twin-propeller unmanned boat.
Fig. 8 is a verification of mutual collision avoidance among multiple unmanned boats.
Fig. 9 is a verification of collision avoidance of multiple unmanned boats in a static-obstacle scenario.
Fig. 10 is a verification of multiple unmanned boats in a scenario with dynamic and static obstacles.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The proximal policy optimization (PPO) algorithm has a low collision avoidance success rate, excessive randomness, and a tendency to fall into local optima. To address these shortcomings, the invention improves the PPO algorithm by adding the reciprocal velocity obstacle (RVO).
Collision avoidance is one of the technical problems faced by an unmanned boat cluster, and a good decision strategy is needed in a complex sea area to ensure safe navigation of the unmanned boats. PPO performs well and responds quickly when exploring unknown environments, but unmanned boat applications must account for characteristics such as low sailing speed and smooth trajectories. The RVO algorithm is therefore introduced to improve the reward mechanism and to address collision avoidance under limited information.
The PPO is improved by adding an extension strategy that incorporates the RVO, as follows:
The geometric definition of the velocity obstacle (VO) is shown in fig. 3. Let
$$A \oplus B$$
denote the Minkowski sum of two unmanned boats A and B, and let -A denote unmanned boat A reflected in its reference point:
$$A \oplus B = \{\, a + b \mid a \in A,\ b \in B \,\}, \qquad -A = \{\, -a \mid a \in A \,\}$$
Let λ(s, v) denote the ray starting at s in the direction of v:
$$\lambda(s, v) = \{\, s + t v \mid t \ge 0 \,\}$$
The VO region of unmanned boat A induced by unmanned boat B is then given by
$$VO^A_B(v_B) = \{\, v_A \mid \lambda(P_A,\ v_A - v_B) \cap (B \oplus -A) \neq \emptyset \,\}$$
where P_A is the position of unmanned boat A. A velocity v_A inside this region means that unmanned boats A and B will collide at some moment.
In actual navigation of unmanned boats, this approach can lead to undesirable oscillation when each unmanned boat treats the others as moving obstacles and selects for itself a velocity outside every velocity obstacle they induce. Consider the following case. Two unmanned boats A and B move toward each other with velocities v_A and v_B, so that
$$v_A \in VO^A_B(v_B)$$
and
$$v_B \in VO^B_A(v_A)$$
and continuing at the current velocities will lead to a collision. Unmanned boat A therefore changes its velocity to v'_A so that it lies outside the velocity obstacle of B, i.e.
$$v'_A \notin VO^A_B(v_B)$$
At the same time, unmanned boat B changes its velocity to v'_B so that it lies outside the velocity obstacle of A, i.e.
$$v'_B \notin VO^B_A(v_A)$$
In the new situation, however, the old velocities v_A and v_B lie outside the velocity obstacles of B and A respectively, i.e.
$$v_A \notin VO^A_B(v'_B)$$
and
$$v_B \notin VO^B_A(v'_A)$$
If both boats prefer their old velocities because those lead them directly to the target, they will select them again. In the next cycle these velocities again turn out to cause a collision, so v'_A and v'_B may be selected again, and so on. When the velocity obstacle method is used for mutual avoidance, the boats therefore oscillate between these two velocities.
To solve this problem, the velocity obstacle is improved into the reciprocal velocity obstacle (RVO), described by the following formula:
$$RVO^A_B(v_B, v_A) = \{\, v'_A \mid 2v'_A - v_A \in VO^A_B(v_B) \,\}$$
Instead of choosing, for each unmanned boat, a new velocity outside the other boat's velocity obstacle, a new velocity is chosen that is the average of the boat's current velocity and a velocity outside the other boat's velocity obstacle. The reciprocal velocity obstacle $RVO^A_B(v_B, v_A)$ of unmanned boat B to unmanned boat A contains all velocities of A that are the average of the current velocity v_A and a velocity inside the velocity obstacle $VO^A_B(v_B)$ of unmanned boat B. Geometrically, it is the velocity obstacle $VO^A_B(v_B)$ translated so that its apex lies at $(v_A + v_B)/2$.
Because collision avoidance of unmanned boats follows the maritime collision avoidance rules, the right-hand side is chosen when the avoidance strategy is executed. Let unmanned boats A and B select new velocities v'_A and v'_B outside each other's reciprocal velocity obstacles. The following formula demonstrates that this choice is safe.
Figure BDA0003960720020000075
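The geometric reading of the two regions (a cone with its apex at v_B for the velocity obstacle, and the same cone with its apex moved to (v_A + v_B)/2 for the reciprocal velocity obstacle) can be sketched as a membership test. This is a minimal illustration that assumes both boats are discs of radius R_A and R_B; the function names `in_cone`, `in_vo`, and `in_rvo` and the head-on example values are hypothetical.

```python
import math

def in_cone(v, apex, p_a, p_b, radius):
    """True if v lies in the cone with the given apex whose axis points from p_a to p_b
    and whose sides are tangent to the disc of the given radius around p_b."""
    rx, ry = p_b[0] - p_a[0], p_b[1] - p_a[1]
    dist = math.hypot(rx, ry)
    if dist <= radius:
        return True                      # the discs already overlap
    ux, uy = v[0] - apex[0], v[1] - apex[1]
    speed = math.hypot(ux, uy)
    if speed == 0.0:
        return False                     # no relative motion, the gap never closes
    cos_angle = (ux * rx + uy * ry) / (speed * dist)
    cos_half = math.sqrt(dist * dist - radius * radius) / dist  # cosine of the half-angle
    return cos_angle >= cos_half

def in_vo(v, v_b, p_a, p_b, r_sum):
    """Velocity obstacle: cone apex at v_b."""
    return in_cone(v, v_b, p_a, p_b, r_sum)

def in_rvo(v, v_a, v_b, p_a, p_b, r_sum):
    """Reciprocal velocity obstacle: same cone, apex at (v_a + v_b) / 2."""
    apex = (0.5 * (v_a[0] + v_b[0]), 0.5 * (v_a[1] + v_b[1]))
    return in_cone(v, apex, p_a, p_b, r_sum)

# Head-on encounter (assumed values): A at the origin, B ten units ahead, radii 1 each.
p_a, p_b, r_sum = (0.0, 0.0), (10.0, 0.0), 2.0
v_a, v_b = (1.0, 0.0), (-1.0, 0.0)
candidate = (1.0, -0.3)                  # a small turn to the right for A
candidate_in_vo = in_vo(candidate, v_b, p_a, p_b, r_sum)
candidate_in_rvo = in_rvo(candidate, v_a, v_b, p_a, p_b, r_sum)
```

In this example the candidate is still inside the velocity obstacle but already outside the reciprocal one, because the reciprocal test assumes that B takes on half of the avoidance effort.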
The simplified structure of the improved algorithm flow is shown in fig. 6. The operation steps are as follows:
Step 1, a decision model is constructed; the neural network has 2 layers of 256 neurons each.
Step 2, the model is trained and decisions are made. The unmanned boat is a twin-propeller underactuated unmanned boat, as shown in fig. 7. The centre of mass c of the unmanned boat lies at the midpoint of the twin-propeller axis, and (x_c, y_c) are the coordinates of the centre of mass; α is the heading angle, i.e. the angle between the direction of motion of the unmanned boat and the x axis. The pose vector of the unmanned boat is p = (x_c, y_c, α)^T. Here r_l is the radius of motion, Δα is the heading angle increment, v_l is the linear velocity of the left propeller, v_r is the linear velocity of the right propeller, and l is the distance between the two propellers.
From rigid-body mechanics, the kinematic equations of the twin-propeller differential-drive unmanned boat are:
Figure BDA0003960720020000076
where v is the linear velocity at the centre of mass of the unmanned boat and ω is its steering angular velocity.
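For orientation, the usual differential-drive relation between the two propeller speeds and (v, ω) can be sketched as follows. This is the textbook form, assumed here rather than read from the patent's equation image; the function name and the sign convention (ω > 0 for a counter-clockwise turn) are assumptions.

```python
def twin_propeller_velocity(v_l, v_r, l):
    """Map left/right propeller linear speeds to body velocity (v, omega).

    Assumed convention: omega > 0 is a counter-clockwise (left) turn.
    """
    v = 0.5 * (v_l + v_r)        # linear speed at the centre of mass
    omega = (v_r - v_l) / l      # steering angular velocity
    return v, omega

# Example with assumed values: the right propeller is faster, so the boat turns left.
v, omega = twin_propeller_velocity(0.8, 1.2, 0.4)
```

Equal propeller speeds give ω = 0 and straight-line motion, and the turning radius follows as v/ω whenever ω is non-zero.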
Assume that the initial pose vector of the unmanned boat is S_start = (x_0, y_0, α)^T, so that the current position is x_c = S_start[0], y_c = S_start[1], α = S_start[2].
Figure BDA0003960720020000077
where cur is the curvature and ste ∈ {-1, 0, 1}: ste = -1 means the unmanned boat turns left, ste = 0 means it moves straight, and ste = 1 means it turns right. r_min is the minimum turning radius.
Rotation angle:
δ = |ste| × l_step × cur × gea
where l_step is the step size and gea ∈ {-1, 1}: gea = -1 is reverse gear and gea = 1 is forward gear.
Distance moved:
l_trans = (1 - |ste|) × l_step × gea
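The two formulas for the rotation angle and the distance moved can be evaluated with a small helper. This is a sketch only: the curvature formula is not recoverable from the text, so `cur = ste / r_min` is an assumption made for the example, and the function name `motion_primitive` is hypothetical.

```python
def motion_primitive(ste, gea, l_step, r_min):
    """Return (delta, l_trans) for one step.

    ste : -1 turn left, 0 straight, 1 turn right
    gea : -1 reverse gear, 1 forward gear
    """
    cur = ste / r_min                         # ASSUMED curvature: signed 1 / r_min
    delta = abs(ste) * l_step * cur * gea     # rotation angle, zero when moving straight
    l_trans = (1 - abs(ste)) * l_step * gea   # distance moved, zero when turning
    return delta, l_trans

straight = motion_primitive(0, 1, 0.5, 2.0)   # forward, no turn
turning = motion_primitive(1, 1, 0.5, 2.0)    # forward, right turn
```

The factors |ste| and (1 - |ste|) make the two primitives mutually exclusive: a step is either a pure rotation or a pure translation.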
Rotation matrix
Figure BDA0003960720020000081
Translation matrix
Figure BDA0003960720020000082
If ω ≥ 0.01 or ω ≤ -0.01, the position at the next moment is
Figure BDA0003960720020000083
If ω → 0, the position at the next moment is
Figure BDA0003960720020000084
where
Figure BDA0003960720020000085
T_s is the sampling time.
After the unmanned boat moves, the position of the twin-propeller centre and the centre-of-mass coordinates are transformed as follows:
Figure BDA0003960720020000086
Figure BDA0003960720020000087
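The two cases for the next position (|ω| at or above the 0.01 threshold, and ω close to zero) can be sketched with the standard constant-velocity pose update. The closed-form arc expressions below are the usual unicycle-model integration and are an assumption; they are not taken from the patent's equation images, and the name `next_pose` is hypothetical.

```python
import math

def next_pose(x, y, alpha, v, omega, t_s, eps=0.01):
    """Advance the pose (x, y, alpha) by one sampling period t_s."""
    if abs(omega) >= eps:
        # Constant-curvature arc of radius v / omega.
        r = v / omega
        x_new = x + r * (math.sin(alpha + omega * t_s) - math.sin(alpha))
        y_new = y - r * (math.cos(alpha + omega * t_s) - math.cos(alpha))
    else:
        # omega -> 0: straight-line motion along the current heading.
        x_new = x + v * t_s * math.cos(alpha)
        y_new = y + v * t_s * math.sin(alpha)
    return x_new, y_new, alpha + omega * t_s

straight_pose = next_pose(0.0, 0.0, 0.0, 1.0, 0.0, 2.0)
quarter_turn = next_pose(0.0, 0.0, 0.0, 1.0, math.pi / 2.0, 1.0)
```

Switching to the straight-line branch below the threshold avoids dividing by a near-zero ω, which is the practical reason for treating the two cases separately.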
Step 3, different test environments are designed to demonstrate the generalization capability of the model.
Step 4, the surrounding environment information is represented uniformly by vectors, which serve as the model input for decision-making.
Step 5, during navigation each unmanned boat must observe the behavior of other boats while ensuring its own safe navigation, and all boats learn at the same time. The environment is therefore continually reshaped, the number of other boats that an unmanned boat detects around it keeps changing, and the dimension of the network's input data changes accordingly. For these variable-length input sequences, the GRU algorithm is used to extract the valid information, as shown in FIG. 5, where O_1, O_2, O_3, ..., O_n are the observations of the surrounding boats within the detection range and O_self is the boat's own state; the GRU output is concatenated with the boat's own state to form a fixed-length observation O. The GRU retains the information of each boat without distortion, the observation data are normalized to speed up training, and the optimal action is selected through network learning.
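How a GRU turns a varying number of neighbour observations into a fixed-length vector can be shown with a very small cell written in plain Python. This is an illustrative sketch only: the weights are random and untrained, biases are omitted, and the class name `TinyGRU`, the helper `build_observation`, and all sizes are assumptions rather than the network used in the patent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class TinyGRU:
    """Minimal GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, sequence):
        """Feed the observations one by one and return the final hidden state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h

def build_observation(gru, neighbours, o_self):
    """Concatenate the GRU summary of O_1..O_n with the boat's own state O_self."""
    return gru.encode(neighbours) + list(o_self)

gru = TinyGRU(input_size=4, hidden_size=8)
o_self = [1.0, 0.0, 0.5]
obs_two = build_observation(gru, [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]], o_self)
obs_five = build_observation(gru, [[0.1 * k, 0.2, 0.3, 0.4] for k in range(5)], o_self)
```

Both `obs_two` and `obs_five` have the same length (hidden size plus own-state size), so the policy network downstream always sees an input of fixed dimension regardless of how many boats are in range.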
Step 6, for collision risk evaluation, the environmental information is fed into the model through the reciprocal velocity obstacle (RVO), and the model adjusts its decisions using the position and velocity information of the obstacles around the unmanned boat.
Step 7, collision avoidance is converted into a circle and line-segment collision detection problem. From the geometric definition of the reciprocal velocity obstacle (RVO), collision avoidance between two unmanned boats can be converted into a point mass avoiding a circle: unmanned boat A is treated as a point mass, and its radius R_A is added to that of unmanned boat B. The trajectory of the point mass corresponds to the velocity ray emitted from the starting point E; assuming collision avoidance is complete after time t, the end point is denoted L. C denotes the centre of the collision circle, i.e. P_B, and R denotes the radius of the circle, i.e. R_A + R_B.
$$d = L - E$$
where d is the direction vector of the ray from the starting point to the end point; in the RVO it represents the velocity.
$$f = E - C$$
where f is the vector from the centre of the circle to the starting point of the ray.
Substituting the parametric equations
$$P_x = E_x + t\,d_x$$
$$P_y = E_y + t\,d_y$$
into the equation of the circle finally gives a quadratic equation in t:
$$t^2\,(d \cdot d) + 2t\,(f \cdot d) + (f \cdot f - R^2) = 0$$
The equation is solved case by case to determine the position of the velocity trajectory of the point mass relative to the circle.
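The case analysis of the quadratic in t can be sketched as a function that returns the first contact time along the ray. The symbols E, d, C, and R follow the text; the function name `time_to_collision` and the choice to return `None` for a miss are assumptions made for the example.

```python
import math

def time_to_collision(e, d, c, r):
    """Smallest t >= 0 at which E + t*d touches the circle (C, R), or None if it never does."""
    fx, fy = e[0] - c[0], e[1] - c[1]          # f = E - C
    a = d[0] * d[0] + d[1] * d[1]              # d . d
    b = 2.0 * (fx * d[0] + fy * d[1])          # 2 (f . d)
    cc = fx * fx + fy * fy - r * r             # f . f - R^2
    if a == 0.0:
        return 0.0 if cc <= 0.0 else None      # zero velocity: only a hit if already inside
    disc = b * b - 4.0 * a * cc
    if disc < 0.0:
        return None                            # no real root: the trajectory misses the circle
    sq = math.sqrt(disc)
    t1 = (-b - sq) / (2.0 * a)
    t2 = (-b + sq) / (2.0 * a)
    if t1 >= 0.0:
        return t1                              # the circle is entered ahead of the start point
    if t2 >= 0.0:
        return 0.0                             # the start point is already inside the circle
    return None                                # both roots negative: the circle lies behind

# Assumed example: relative velocity (2, 0), circle of radius 2 centred 10 units ahead.
t_hit = time_to_collision((0.0, 0.0), (2.0, 0.0), (10.0, 0.0), 2.0)
t_miss = time_to_collision((0.0, 0.0), (0.0, 1.0), (10.0, 0.0), 2.0)
```

A negative discriminant corresponds to a velocity outside the obstacle cone, and the smallest non-negative root is a natural candidate for the expected collision time used in the reward.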
Step 8, to address the sparse-reward problem, a reward evaluation function is applied at every action step: a positive reward is given when the unmanned boat approaches the target point and avoids obstacles, and a negative reward is given when it does not approach the target point, so that the boat reaches the target point along the optimal path in the shortest time. For this purpose, the invention defines a reward function for the reciprocal velocity obstacle algorithm, denoted R_rvo:
Figure BDA0003960720020000096
Figure BDA0003960720020000097
Figure BDA0003960720020000098
where p_1, p_2, p_3, p_4, p_5, p_6 are constants set according to the environment in the experiment and used to tune the reward function so as to improve the performance of the policy.
Figure BDA0003960720020000099
represents the distance between the selected velocity v_t and the desired velocity
Figure BDA00039607200200000910
with the maximum distance set to 3, i.e. dd_max = 3. R_dd is the velocity reward function; it takes values in (0, 1) and decreases with this distance, i.e. the closer the selected velocity is to the desired velocity, the larger the reward. R_t is the time reward function; it takes values in (0, 1) and decreases with time, i.e. the shorter the time, the larger the reward. t_min is the expected minimum time until the unmanned boat collides with an obstacle at the current velocity.
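The qualitative behavior described for R_dd and R_t can be illustrated with two stand-in functions. These are hypothetical shapes chosen only to match the stated properties (bounded by 0 and 1, decreasing in the velocity distance and in time, dd_max = 3); the patent's actual formulas and the way the constants p_1 to p_6 combine the terms into R_rvo are not recoverable from the text and are not reproduced here.

```python
import math

DD_MAX = 3.0   # maximum velocity distance stated in the text

def velocity_reward(v_selected, v_desired, dd_max=DD_MAX):
    """HYPOTHETICAL R_dd: 1 when the velocities coincide, falling linearly to 0 at dd_max."""
    dd = math.hypot(v_selected[0] - v_desired[0], v_selected[1] - v_desired[1])
    return 1.0 - min(dd, dd_max) / dd_max

def time_reward(t_min):
    """HYPOTHETICAL R_t: decreases monotonically as t_min grows, staying within (0, 1]."""
    return 1.0 / (1.0 + t_min)

r_close = velocity_reward((1.0, 0.1), (1.0, 0.0))   # selected velocity near the desired one
r_far = velocity_reward((-1.0, 0.0), (1.0, 0.0))    # selected velocity far from the desired one
```

Because both terms are evaluated at every step, the agent receives a graded signal long before it reaches the target or collides, which is how a dense reward of this kind counters sparsity.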
Step 9, after the test is finished, whether the unmanned boat has reached the target point is judged from the test result and the fed-back reward value.
By combining proximal policy optimization (PPO) with the reciprocal velocity obstacle (RVO) and the advantages of both, the invention achieves COLREGs-compliant collision avoidance for multiple unmanned boats and ensures that they can navigate safely while carrying out their tasks.
In the invention, the fusion of the algorithms and the use of the kinematic model bring the method closer to the navigation state of a real unmanned boat; the unmanned boats can act independently and can also operate cooperatively, so that collision avoidance among multiple unmanned boats is realized efficiently.
The above are preferred embodiments of the invention. All changes made according to the technical scheme of the invention whose functional effects do not go beyond the scope of that technical scheme fall within the protection scope of the invention.

Claims (10)

1. A collision avoidance decision method for multiple unmanned boats, characterized by comprising the following steps:
step 1, constructing a decision model;
step 2, loading an unknown environment and training the decision model;
step 3, designing a test environment, and extracting the current environment information that can be monitored;
step 4, sensing the environment;
step 5, processing the data;
step 6, performing risk assessment and checking the current risk state of the unmanned boat;
step 7, executing the decision behavior corresponding to the risk found in step 6;
step 8, calculating the reward value according to step 7;
step 9, judging whether collision avoidance has been achieved, and returning the reward value and the result.
2. The collision avoidance decision method for multiple unmanned boats according to claim 1, wherein in step 1 the decision model is constructed from a near-end strategy optimization algorithm (proximal policy optimization, PPO) and a mutual velocity obstacle algorithm (reciprocal velocity obstacle, RVO); the near-end strategy optimization algorithm starts by initializing the neural networks: there are two actor networks, each with two layers of 256 neurons, where the network π collects samples that are used to update the old network π_old; during the training cycle, the network π receives the current environment information, selects an action according to that information to update the state s', and returns the reward r; the two actor networks are constrained by an adaptive KL penalty; the critic network also has two layers of 256 neurons each, evaluates the quality of the action from s' and r, and updates the network π; the mutual velocity obstacle algorithm is a velocity-based collision avoidance algorithm in which the surrounding information is represented by vectors and the collision risk is evaluated from the speed and direction of motion.
3. The collision avoidance decision method for multiple unmanned boats according to claim 2, characterized in that in step 2 an unknown environment needs to be designed; the optimization objective of the near-end strategy optimization algorithm is to maximize the expected reward, and importance sampling is chosen as the sampling method when computing the expectation; importance sampling is the key to updating the θ network with data collected under the network with parameters θ'; the two unmanned boats are described by two distribution functions p and q; the expectation is computed as follows:
Figure FDA0003960720010000011
where f(x) is the sampled function, x is a sample drawn from p(x) or q(x), p = p(x), and q = q(x); in theory q can be any distribution, but in practice p and q must be close, as can be seen from the variances under the two distributions:
Var_{x~p}[f(x)] = E_{x~p}[f(x)^2] - (E_{x~p}[f(x)])^2
Figure FDA0003960720010000012
when more than 1000 samples are drawn under the distributions p(x) and q(x), p(x) = q(x);
importance sampling is used to convert the on-policy method into an off-policy method; in the policy gradient, the expectation to be computed is:
Figure FDA0003960720010000021
which is converted into:
Figure FDA0003960720010000022
where R(τ) is the reward value, τ is the sampled trajectory, p_θ and p_θ' are probability values, and
Figure FDA0003960720010000023
is the correction term;
when the method is applied in an actual environment, the gradient update is:
Figure FDA0003960720010000024
where A^θ(s_t, a_t) is the evaluation function, used to evaluate the quality of the action a selected in state s at time t; π_θ and π_θ' are the policies of the two distributions; p_θ and p_θ' are probability values; and n denotes the n-th sample;
the new objective function is:
Figure FDA0003960720010000025
from the above formula, the near-end strategy optimization algorithm is defined as:
Figure FDA0003960720010000026
where β is a weight coefficient, θ' denotes the demonstration parameters, and θ denotes the parameters to be optimized; the KL divergence measures the difference between θ and θ', meaning the difference between the behaviors (actors) corresponding to the parameters; βKL(θ, θ') is the constraint term;
the mutual velocity obstacle assumes that the other party uses the same strategy rather than maintaining uniform motion, and is described by the following formula:
Figure FDA0003960720010000027
instead of selecting for each unmanned boat a new speed outside the velocity obstacles of the other unmanned boats, the mutual velocity obstacle selects the average of the current speed and a speed outside the velocity obstacles of the other unmanned boats; v_A and v_B are the currently selected speeds of the unmanned boats, and the mutual velocity obstacle of unmanned boat B to unmanned boat A
Figure FDA0003960720010000028
contains all speeds of unmanned boat A that are the average of its current speed v_A and a speed inside the velocity obstacle of unmanned boat B,
Figure FDA0003960720010000029
and can be interpreted geometrically as the velocity obstacle
Figure FDA00039607200100000210
translated so that its vertex is located at
Figure FDA00039607200100000211
since collision avoidance of the unmanned boats follows the maritime collision avoidance rules (COLREGs), the right (starboard) side is chosen when a collision avoidance strategy is executed; let unmanned boats A and B each select a new speed, v'_A and v'_B, outside the other's mutual velocity obstacle; the following formula demonstrates that this choice is safe:
Figure FDA0003960720010000031
4. The collision avoidance decision method for multiple unmanned boats according to claim 3, wherein in step 2 the decision model is trained by the following specific steps:
step 2.1, determining the current positions of the unmanned boats and their target points according to the designed unknown environment;
step 2.2, the mutual velocity obstacle evaluates the current collision risk and feeds the result back to the near-end strategy optimization; the network π executes the action and updates the position state and the action state to obtain the network parameter θ';
step 2.3, the network π_old makes a decision according to the environment to obtain the network parameter θ;
step 2.4, updating θ from θ' using the KL divergence between θ and θ';
step 2.5, when the mutual velocity obstacle evaluates the current collision risk, if a collision risk is detected, predicting the velocity state of the obstacle at the next moment, and changing the speed and heading of the unmanned boat according to that predicted state so that the unmanned boat avoids the obstacle;
step 2.6, if the unmanned boat moves farther from the target point, feeding back a lower reward value, and adjusting the direction of motion of the unmanned boat toward the target point;
step 2.7, if the selected speed differs greatly from the desired speed, feeding back a lower reward value, and adjusting the speed of the unmanned boat toward the desired speed;
step 2.8, judging whether collision avoidance is finished; if it is finished and the target point is reached, a basic collision avoidance route is obtained;
step 2.9, if collision avoidance is not finished, returning to step 2.1 and continuing the iterative update until the target point is reached;
step 2.10, after training N times, the optimal collision avoidance route algorithm is obtained, and training ends, yielding the trained decision model.
5. The collision avoidance decision method for multiple unmanned boats according to claim 1, wherein step 3 is specifically as follows: designing a test environment, and obtaining preliminary information from the test environment and the current position state of the unmanned boat for use in the decision at the next moment.
6. The collision avoidance decision method for multiple unmanned boats according to claim 2, wherein step 4 is implemented as follows: the surrounding information is monitored and represented by mutual velocity obstacle vectors.
7. The collision avoidance decision method for multiple unmanned boats according to claim 1, wherein step 5 is implemented as follows: a GRU neural network processes the input information into a uniform dimension.
8. The collision avoidance decision method for multiple unmanned boats according to claim 2, wherein step 6 is implemented as follows: a maximum detection range is set for the sensor of each unmanned boat, and the signals to be received are divided into the size, current speed, current heading, and collision avoidance radius of the other boats within the detectable range; once this prior information about the local environment has been obtained, local collision avoidance path planning is carried out.
9. The collision avoidance decision method for multiple unmanned boats according to claim 2, wherein step 7 is implemented as follows: performing collision avoidance, normal navigation, or acceleration according to the evaluation of the mutual velocity obstacle algorithm.
10. The collision avoidance decision method for multiple unmanned boats according to claim 1, wherein step 8 is implemented as follows: feeding back the reward according to the distance between the current state of the unmanned boat and the target point, and using it to guide the decision behavior of the unmanned boat at the next moment.
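The importance-sampling step described in claim 3 can be checked numerically. The sketch below is an illustration under assumed choices: p and q are unit-variance Gaussians, the test function is f(x) = x, and all function and parameter names are hypothetical; none of these comes from the patent. It estimates the expectation under p from samples drawn under q by weighting each sample with the correction term p(x)/q(x).

```python
import math
import random


def normal_pdf(x, mean, std=1.0):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(f, p_mean, q_mean, n, rng):
    """Estimate E_{x~p}[f(x)] using n samples drawn from q (assumed unit-variance Gaussians)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(q_mean, 1.0)  # sample under q
        weight = normal_pdf(x, p_mean) / normal_pdf(x, q_mean)  # correction term p(x)/q(x)
        total += f(x) * weight
    return total / n


rng = random.Random(0)
# q is close to p, so the weighted average should be near the true mean of p, which is 0
estimate = importance_estimate(lambda x: x, p_mean=0.0, q_mean=0.2, n=20000, rng=rng)
```

If `q_mean` is moved far from `p_mean`, the weights vary over a much wider range and the estimate needs many more samples to settle, which is the practical reason the claim asks for p and q to be close.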
CN202211480755.7A 2022-11-24 2022-11-24 Multi-unmanned-boat collision avoidance decision method Pending CN115718497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211480755.7A CN115718497A (en) 2022-11-24 2022-11-24 Multi-unmanned-boat collision avoidance decision method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211480755.7A CN115718497A (en) 2022-11-24 2022-11-24 Multi-unmanned-boat collision avoidance decision method

Publications (1)

Publication Number Publication Date
CN115718497A true CN115718497A (en) 2023-02-28

Family

ID=85256246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211480755.7A Pending CN115718497A (en) 2022-11-24 2022-11-24 Multi-unmanned-boat collision avoidance decision method

Country Status (1)

Country Link
CN (1) CN115718497A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117168468A (en) * 2023-11-03 2023-12-05 安徽大学 Multi-unmanned-ship deep reinforcement learning collaborative navigation method based on near-end strategy optimization
CN117168468B (en) * 2023-11-03 2024-02-06 安徽大学 Multi-unmanned-ship deep reinforcement learning collaborative navigation method based on near-end strategy optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination