CN115718497A - Multi-unmanned-boat collision avoidance decision method - Google Patents
Multi-unmanned-boat collision avoidance decision method
- Publication number
- CN115718497A (application number CN202211480755.7A)
- Authority
- CN
- China
- Prior art keywords
- collision avoidance
- unmanned
- speed
- mutual
- theta
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention relates to a collision avoidance decision method for multiple unmanned boats. The method considers collision risk and COLREGs together: environmental information is represented and environmental risk is evaluated through mutual velocity obstacle regions, and near-end strategy optimization (proximal policy optimization, PPO) makes decisions according to the evaluated risk. The mutual velocity barrier algorithm (reciprocal velocity obstacle, RVO) is used to improve the action space and the reward function of the near-end strategy optimization algorithm, and a neural network based on a recurrent module maps the states of a varying number of surrounding obstacles directly into actions, which addresses collision avoidance under limited information. The method introduces a new reward function based on the mutual velocity obstacle region and the expected time to collision; it adapts to multiple different environments and alleviates the sparse-reward problem. By combining near-end strategy optimization with the mutual velocity barrier, the invention draws on the advantages of both to achieve COLREGs-compliant collision avoidance among multiple unmanned boats and to ensure safe navigation while they carry out their tasks.
Description
Technical Field
The invention belongs to the field of autonomous decision-making for multiple unmanned boats. It relates to unmanned boat technology, path planning algorithms, collision avoidance algorithms, and control methods for multiple unmanned boats, and in particular to a collision avoidance decision method for multiple unmanned boats.
Background
In recent years, the demand for resources has prompted countries to step up the exploration and utilization of the oceans, and the development of unmanned technology provides technical support for this work. As a new type of marine equipment, the unmanned boat is widely used in the exploration and utilization of ocean resources. A single unmanned boat can rarely complete a marine exploration and development task on its own, whereas a cluster of unmanned boats can effectively carry out tasks such as marine monitoring, marine rescue, and assisted mooring. Unmanned boats are a new field of unmanned driving research. The marine environment is more complex than the land environment, and multiple unmanned boats pose challenges to maritime safety and environmental protection in marine traffic engineering, which places higher requirements on their navigation control and navigation safety. Ensuring that unmanned boats navigate safely under the maritime collision avoidance rules (COLREGs) and achieving autonomous collision avoidance among them is of strategic importance.
In research on multiple unmanned boats, control methods mainly take two forms. 1) Centralized control: in a centralized system, a controller flexibly coordinates multiple unmanned boats in the same workspace and avoids collisions within the group when the group's environment information is known. This approach achieves more accurate control, but it places higher demands on the system, is less robust, and is difficult to scale to large groups. 2) Distributed control: each boat makes decisions independently from its own sensors, which suits the deployment of large numbers of unmanned boats at relatively low computational complexity. This approach is robust to errors and sudden faults in the motion of individual boats in the cluster, but its control precision is lower and its response is slower, so a mature collision avoidance algorithm is needed to achieve safe navigation at sea. Research institutes, universities, and enterprises have carried out a great deal of research on ship path planning and collision avoidance algorithms and have obtained a series of results. However, most of this work addresses collision avoidance path planning for a single vessel, and research on multiple unmanned boats is scarce. A collision avoidance decision method for multiple unmanned boats is therefore needed to achieve their safe navigation and safe operation at sea.
The prior art offers low control precision for multiple unmanned boats, and its control methods generalize poorly. The artificial potential field method, the dynamic window method, and model predictive control are mostly applied to single unmanned boats and are seldom applied to interaction among multiple unmanned boats. The grid map method ignores the smoothness of an unmanned boat's motion trajectory. The velocity obstacle method is more widely applied to multiple unmanned boats, but the boats may oscillate during collision avoidance. Deep reinforcement learning offers a solution for collision avoidance in complex environments, but for multiple unmanned boats it requires tuning of the network and the reward function and exhibits randomness. Most existing collision avoidance algorithms target a single unmanned boat; when applied to multiple unmanned boats they tend to oscillate and to fall into local optima.
Disclosure of Invention
The invention aims to solve the problem that existing schemes cannot provide collision avoidance and path planning that conform to COLREGs and therefore cannot adequately achieve safe navigation and safe operation of multiple unmanned boats at sea, and provides a collision avoidance decision method for multiple unmanned boats. The mutual velocity barrier algorithm is used to improve the action space and the reward function of the near-end strategy optimization algorithm, and a neural network based on a recurrent module maps the states of a varying number of surrounding obstacles directly into actions, which addresses collision avoidance under limited information. The method introduces a new reward function based on the mutual velocity obstacle region and the expected time to collision; it adapts to multiple different environments and alleviates the sparse-reward problem. Under the control of the proposed algorithm, multiple unmanned boats are capable of collision avoidance path planning and comply with COLREGs.
To achieve this purpose, the technical scheme of the invention is as follows. A collision avoidance decision method for multiple unmanned boats is based on the near-end strategy optimization algorithm (proximal policy optimization, PPO) and is extended with the mutual velocity barrier algorithm (reciprocal velocity obstacle, RVO). The mutual velocity barrier algorithm improves the reward function of the near-end strategy optimization algorithm, which alleviates the sparse-reward problem in reinforcement learning, speeds up network updates, raises learning efficiency, and mitigates the drawbacks of high randomness and slow learning. As shown in Fig. 1, the method comprises the following steps:
step 1, constructing a decision model;
step 2, loading an unknown environment and training the decision model;
step 3, designing a test environment, and extracting the currently monitorable environment information;
step 4, sensing the environment;
step 5, data processing;
step 6, performing risk assessment and checking the current risk state of the unmanned boat;
step 7, executing the corresponding decision behavior for the risk identified in step 6;
step 8, calculating the reward value according to step 7;
step 9, judging whether collision avoidance has been achieved, and returning the reward value and the result.
For step 1, near-end strategy optimization (proximal policy optimization, PPO) uses a three-network structure and is a variant of the policy gradient algorithm; its structure is shown in Fig. 2. The algorithm starts by initializing the neural networks. There are two actor networks, each with two layers of 256 neurons: the network π collects samples, and the old network π_old is used for updating. During a training cycle, π receives the current environment information and selects an action based on it, which updates the state to s' and returns a reward r. The two actor networks are constrained by an adaptive KL penalty. The critic network also has two layers of 256 neurons each; it evaluates the quality of the action from s' and r and updates π. This shortens the network update time and improves the efficiency of the algorithm. As shown in Figs. 3 and 4, the mutual velocity barrier (reciprocal velocity obstacle, RVO) is a velocity-based collision avoidance algorithm: surrounding information is represented by vectors, and collision risk is evaluated from the speed and direction of motion, which improves collision avoidance efficiency compared with observing position alone. Near-end strategy optimization combined with the mutual velocity barrier performs well on many different tasks and outperforms earlier algorithms.
For step 2, a training environment is designed. The optimization objective of the near-end strategy optimization algorithm is to maximize the expected reward, and importance sampling is used when computing this expectation. Importance sampling is the key to updating the θ network with data collected under the θ' network, and two unmanned boats are described by two distribution functions p and q. The expectation is calculated as follows:
$$E_{x \sim p}[f(x)] = \int f(x)\,p(x)\,dx = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx = E_{x \sim q}\left[f(x)\,\frac{p(x)}{q(x)}\right]$$
where f(x) is the sampled function and x is a sample drawn from p(x) or q(x). In theory q(x) can be an arbitrary distribution, but in practice p(x) and q(x) must be close, as can be seen from the variances of the two distributions:
$$\mathrm{Var}_{x \sim p}[f(x)] = E_{x \sim p}\left[f(x)^2\right] - \left(E_{x \sim p}[f(x)]\right)^2$$
When the number of samples reaches 1000 or more, p(x) = q(x) is assumed.
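The importance-sampling identity above can be checked numerically. The following is a minimal sketch, not part of the patented method: the two normal distributions, the test function f(x) = x², and the helper names `normal_pdf` and `importance_sampling_mean` are all illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_mean(f, p_pdf, q_pdf, q_sampler, n):
    """Estimate E_{x~p}[f(x)] from n samples drawn from q, weighted by p/q."""
    total = 0.0
    for _ in range(n):
        x = q_sampler()
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n


rng = random.Random(0)
# Assumed example: p = N(0, 1), q = N(0.5, 1.2^2), f(x) = x^2, so E_p[f] = 1.
estimate = importance_sampling_mean(
    f=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.5, 1.2),
    q_sampler=lambda: rng.gauss(0.5, 1.2),
    n=20000,
)
print(f"importance-sampled estimate of E_p[x^2]: {estimate:.3f}")
```

Because q is chosen close to p, the weights p(x)/q(x) stay bounded and the estimate settles near the true value of 1; a q that is far from p would need many more samples for the same accuracy.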
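Importance sampling converts the on-policy method into an off-policy method. In the policy gradient, the expectation to be solved,
$$\nabla \bar{R}_{\theta} = E_{\tau \sim p_{\theta}(\tau)}\left[R(\tau)\,\nabla \log p_{\theta}(\tau)\right]$$
is converted into
$$\nabla \bar{R}_{\theta} = E_{\tau \sim p_{\theta'}(\tau)}\left[\frac{p_{\theta}(\tau)}{p_{\theta'}(\tau)}\,R(\tau)\,\nabla \log p_{\theta}(\tau)\right]$$
where R(τ) is the reward value, τ is the sampled trajectory, and $p_{\theta}(\tau)/p_{\theta'}(\tau)$ is the correction term. Applied to the actual environment, the gradient update is
$$E_{(s_t, a_t) \sim \pi_{\theta'}}\left[\frac{p_{\theta}(a_t \mid s_t)}{p_{\theta'}(a_t \mid s_t)}\,A^{\theta}(s_t, a_t)\,\nabla \log p_{\theta}(a_t^n \mid s_t^n)\right]$$
where $A^{\theta}(s_t, a_t)$ is an evaluation function that assesses the quality of selecting action a in state s at time t, and n denotes the n-th sample.
The new optimization function is
$$J^{\theta'}(\theta) = E_{(s_t, a_t) \sim \pi_{\theta'}}\left[\frac{p_{\theta}(a_t \mid s_t)}{p_{\theta'}(a_t \mid s_t)}\,A^{\theta}(s_t, a_t)\right]$$
from which the near-end strategy optimization objective is obtained:
$$J_{PPO}^{\theta'}(\theta) = J^{\theta'}(\theta) - \beta\,\mathrm{KL}(\theta, \theta')$$
where β is a weight coefficient and the KL divergence measures the difference between θ and θ'; the difference refers to the difference between the behaviors (actors) corresponding to the parameters. β KL(θ, θ') is the constraint term.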
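The KL-penalized objective and the adaptive penalty mentioned for the actor networks can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the patent's implementation: actions are taken to be discrete, the advantage values are supplied by the caller, the KL term is computed from the sampling policy (θ') to the optimized policy (θ), and the doubling and halving rule in `adapt_beta` follows the commonly published adaptive-KL variant of PPO. All function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete action distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def penalized_objective(samples, beta):
    """Return (J - beta * KL, mean KL) averaged over the sampled steps.

    Each sample is (new_probs, old_probs, action, advantage): new_probs is the
    action distribution of the policy being optimized (theta), old_probs that
    of the policy that collected the data (theta').
    """
    surrogate = 0.0
    mean_kl = 0.0
    for new_probs, old_probs, action, advantage in samples:
        ratio = new_probs[action] / old_probs[action]  # importance weight
        surrogate += ratio * advantage
        mean_kl += kl_divergence(old_probs, new_probs)
    surrogate /= len(samples)
    mean_kl /= len(samples)
    return surrogate - beta * mean_kl, mean_kl


def adapt_beta(beta, mean_kl, kl_target):
    """Adaptive KL penalty: tighten when the policies drift apart, relax otherwise."""
    if mean_kl > 1.5 * kl_target:
        return beta * 2.0
    if mean_kl < kl_target / 1.5:
        return beta / 2.0
    return beta


samples = [([0.6, 0.4], [0.5, 0.5], 0, 1.0)]
objective, kl = penalized_objective(samples, beta=1.0)
beta = adapt_beta(1.0, kl, kl_target=0.01)
print(objective, kl, beta)
```

The penalty keeps the updated policy close to the one that generated the samples, which is the condition under which the importance weights remain reliable.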
The mutual velocity barrier assumes that the other party uses the same strategy rather than maintaining uniform motion. As shown in Fig. 4, it can be described by equation (9):
$$RVO_B^A(v_B, v_A) = \{\, v'_A \mid 2v'_A - v_A \in VO_B^A(v_B) \,\} \qquad (9)$$
Instead of selecting for each unmanned boat a new velocity outside the other boat's velocity barrier, the mutual velocity barrier selects the average of the current velocity and a velocity outside the other boat's velocity barrier, where $v_A$ and $v_B$ are the currently selected velocities of the unmanned boats. The mutual velocity barrier $RVO_B^A(v_B, v_A)$ of unmanned boat B to unmanned boat A contains all velocities of A that are the average of its current velocity $v_A$ and a velocity inside the velocity barrier $VO_B^A(v_B)$ of unmanned boat B. Geometrically, it is the velocity barrier $VO_B^A(v_B)$ translated so that its apex lies at $(v_A + v_B)/2$.
Because collision avoidance between unmanned boats follows the maritime collision avoidance rules, the right side is selected when the collision avoidance strategy is executed. Let unmanned boats A and B select new velocities $v'_A$ and $v'_B$ outside each other's mutual velocity barriers; equation (10) demonstrates that this choice is safe.
For step 2, training the model proceeds through the following steps:
step 2.1, determining the current positions of the unmanned boats and their target points according to the designed unknown environment;
step 2.2, evaluating the current collision risk with the mutual velocity barrier and feeding the result back to the near-end strategy optimization; the network π executes an action and updates the position state and the action state, yielding the network parameter θ';
step 2.3, the network π_old makes a decision according to the environment, yielding the network parameter θ;
step 2.4, updating θ from θ' subject to the KL divergence between θ and θ';
step 2.5, when the mutual velocity barrier evaluation detects a collision risk, predicting the velocity state of the obstacle at the next moment and changing the speed and heading of the unmanned boat accordingly so that it avoids the obstacle;
step 2.6, if the unmanned boat moves farther from the target point, feeding back a lower reward value and adjusting its direction of motion toward the target point;
step 2.7, if the selected speed differs greatly from the desired speed, feeding back a lower reward value and adjusting the speed of the unmanned boat toward the desired speed;
step 2.8, judging whether collision avoidance is complete; if it is complete and the target point has been reached, a basic collision avoidance route is obtained;
step 2.9, if collision avoidance is not complete, returning to step 2.1 and continuing to update iteratively until the target point is reached;
step 2.10, training N times to obtain the optimal collision avoidance route, which completes the algorithm training and yields the trained model.
For step 3, a test environment is designed, and preliminary information is obtained from the test environment and the current position state of the unmanned boats for making the decision at the next moment.
For step 4, the surrounding environment is monitored and its information is represented by mutual velocity barrier vectors.
For step 5, the GRU neural network processes the input information into the same dimension; see Fig. 5.
For step 6, a maximum detection range is set for the sensors of each unmanned boat, and the received signals are divided into the size, current speed, current heading, and collision avoidance radius of each unmanned boat within the detection range. Once this prior information about the local environment is obtained, local collision avoidance path planning can be carried out.
For step 7, collision avoidance, normal navigation, or acceleration is performed according to the evaluation of the mutual velocity barrier algorithm.
For step 8, the reward is fed back according to the distance between the current state of the unmanned boat and the target point and guides its decision at the next moment.
For step 9, the model learns the action strategy by continuously interacting with the environment; the learning effect is represented by the cumulative reward value of each training episode, and the total reward value and the outcome are calculated.
Compared with the prior art, the invention has the following beneficial effects:
The method of the invention forms an extended strategy that combines the mutual velocity barrier algorithm with the near-end strategy optimization algorithm. During local collision avoidance, the reward function improved by the mutual velocity barrier determines the decision behavior. Surrounding obstacles and other environmental information are represented uniformly by mutual velocity barrier vectors and used to evaluate collision risk: when an obstacle is found within the detectable range, its observed velocity (magnitude and direction) is used to judge whether its position at the next moment poses a collision threat. The near-end strategy optimization executes collision avoidance behavior according to the collision risk; the behavior conforms to COLREGs, and the safe navigation task is completed along an optimal path. The algorithm flow after the mutual velocity barrier is added is shown in Fig. 6.
The method fuses near-end strategy optimization with the mutual velocity barrier: the mutual velocity barrier is used to represent environmental information and to improve the reward function. This raises the collision avoidance efficiency of the algorithm, mitigates its tendency to fall into local optima and to oscillate, improves its collision avoidance capability, generalizes well, and improves the overall efficiency of safe, collision-free navigation of unmanned surface boats.
Drawings
Fig. 1 is a flow chart of collision avoidance decision of multiple unmanned boats.
Fig. 2 is a diagram of a near-end policy optimization algorithm.
Fig. 3 is a diagram of the velocity barrier algorithm.
Fig. 4 is a diagram of a mutual velocity barrier algorithm.
Fig. 5 is a flowchart of GRU data processing.
Fig. 6 is a schematic diagram of the structure that fuses the mutual velocity barrier algorithm with near-end strategy optimization.
Fig. 7 is a structure view of a double-paddle unmanned boat.
Fig. 8 is verification of mutual collision avoidance of multiple unmanned boats.
Fig. 9 is a static obstacle verification scenario for collision avoidance of multiple unmanned boats.
Fig. 10 is a verification of multiple drones in a dynamic, static barrier scenario.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
To address the low collision avoidance success rate, excessive randomness, and tendency to fall into local optima of the near-end strategy optimization algorithm, the algorithm is improved by adding the mutual velocity barrier.
Collision avoidance is one of the technical problems faced by unmanned boat clusters, and a good decision strategy is needed in complex sea areas to ensure that the boats navigate safely. Near-end strategy optimization performs well when exploring unknown environments and responds quickly, but unmanned boat applications must account for characteristics such as low speed and smooth trajectories. The mutual velocity barrier algorithm is therefore introduced to improve the reward function mechanism and to address collision avoidance under limited information.
Near-end strategy optimization is improved by adding an extended strategy combined with the mutual velocity barrier, as follows:
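The geometric definition of the velocity barrier is shown in Fig. 3. Let $A \oplus B$ denote the Minkowski sum of two unmanned boats A and B, and let $-A$ denote unmanned boat A reflected in its reference point:
$$A \oplus B = \{a + b \mid a \in A,\ b \in B\}, \qquad -A = \{-a \mid a \in A\}$$
Let $\lambda(s, v)$ denote the ray starting from s in the direction of v:
$$\lambda(s, v) = \{s + tv \mid t \ge 0\}$$
The velocity barrier (VO) region of unmanned boat A induced by unmanned boat B is given by
$$VO_B^A(v_B) = \{\, v_A \mid \lambda(p_A, v_A - v_B) \cap (B \oplus -A) \neq \emptyset \,\}$$
where $p_A$ is the position of unmanned boat A. If $v_A \in VO_B^A(v_B)$, unmanned boats A and B will collide at some moment.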
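For two disc-shaped boats the velocity barrier test reduces to a cone check, which can be sketched as follows. This is an illustrative sketch that assumes each boat is approximated by a disc (so that $B \oplus -A$ is a disc of radius r_a + r_b centred on B); the function name `in_velocity_obstacle` and the example values are hypothetical.

```python
import math


def in_velocity_obstacle(p_a, v_a, p_b, v_b, r_a, r_b):
    """True if velocity v_a of disc A lies in the velocity obstacle of disc B.

    The ray from p_a along the relative velocity v_a - v_b hits the disc of
    radius r_a + r_b centred at p_b exactly when its angle to the line of
    sight is smaller than asin((r_a + r_b) / distance).
    """
    los_x, los_y = p_b[0] - p_a[0], p_b[1] - p_a[1]      # line of sight A -> B
    rel_x, rel_y = v_a[0] - v_b[0], v_a[1] - v_b[1]      # relative velocity
    dist = math.hypot(los_x, los_y)
    radius = r_a + r_b
    if dist <= radius:
        return True  # the discs already overlap
    speed = math.hypot(rel_x, rel_y)
    if speed == 0.0:
        return False  # no relative motion, the gap never closes
    cos_angle = (los_x * rel_x + los_y * rel_y) / (dist * speed)
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) < math.asin(radius / dist)


# Head-on encounter: both boats keep their course and will collide.
print(in_velocity_obstacle((0, 0), (1, 0), (10, 0), (-1, 0), 1.0, 1.0))
# Crossing with a wide offset: the relative velocity misses the disc.
print(in_velocity_obstacle((0, 0), (0, 1), (10, 0), (-1, 0), 1.0, 1.0))
```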
In the actual navigation of USVs, this approach can cause undesirable oscillatory motion when each unmanned boat treats the others as moving obstacles and selects for itself a velocity outside every velocity barrier induced by them. Consider the following situation. Two unmanned boats A and B move toward each other with velocities $v_A$ and $v_B$ such that $v_A \in VO_B^A(v_B)$ and $v_B \in VO_A^B(v_A)$, so continuing at the current velocities would lead to a collision. Unmanned boat A therefore changes its velocity to $v'_A$ so that it lies outside the velocity barrier of B, i.e., $v'_A \notin VO_B^A(v_B)$. At the same time, unmanned boat B changes its velocity to $v'_B$ so that it lies outside the velocity barrier of A, i.e., $v'_B \notin VO_A^B(v_A)$.
However, in the new situation the old velocities $v_A$ and $v_B$ lie outside the velocity barriers of B and A, respectively (i.e., $v_A \notin VO_B^A(v'_B)$ and $v_B \notin VO_A^B(v'_A)$). If both agents prefer their old velocities, because these lead them directly to the target, they will select them again. In the next cycle these velocities again lead to a collision, so $v'_A$ and $v'_B$ may be selected again, and so on. Thus, when the velocity barrier method is used for mutual avoidance, the agents oscillate between these two velocities.
To solve the above problem, the velocity barrier is improved and described by the following formula:
$$RVO_B^A(v_B, v_A) = \{\, v'_A \mid 2v'_A - v_A \in VO_B^A(v_B) \,\}$$
Instead of selecting for each unmanned boat a new velocity outside the other boat's velocity barrier, a new velocity is selected that is the average of its current velocity and a velocity outside the other boat's velocity barrier. The mutual velocity barrier $RVO_B^A(v_B, v_A)$ of unmanned boat B to unmanned boat A contains all velocities of A that are the average of its current velocity $v_A$ and a velocity inside the velocity barrier $VO_B^A(v_B)$ of unmanned boat B. Geometrically, it is the velocity barrier $VO_B^A(v_B)$ translated so that its apex lies at $(v_A + v_B)/2$.
Because collision avoidance between unmanned boats follows the maritime collision avoidance rules, the right side is selected when the collision avoidance strategy is executed. Let unmanned boats A and B select new velocities $v'_A$ and $v'_B$ outside each other's mutual velocity barriers; the following formula demonstrates that this choice is safe.
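The geometric reading of the mutual velocity barrier (the same cone with its apex moved from $v_B$ to $(v_A + v_B)/2$) can be sketched as follows. This is an illustrative sketch assuming disc-shaped boats; the helper names `in_cone`, `in_vo`, and `in_rvo` and the numeric values are hypothetical, with the right-hand turn modelled as a small negative y component for a boat heading along +x.

```python
import math


def in_cone(apex, velocity, p_a, p_b, radius):
    """True if `velocity`, seen from `apex`, points into the collision cone
    spanned from p_a toward the disc of the given radius centred at p_b."""
    los_x, los_y = p_b[0] - p_a[0], p_b[1] - p_a[1]
    dir_x, dir_y = velocity[0] - apex[0], velocity[1] - apex[1]
    dist = math.hypot(los_x, los_y)
    norm = math.hypot(dir_x, dir_y)
    if dist <= radius:
        return True
    if norm == 0.0:
        return False
    cos_angle = max(-1.0, min(1.0, (los_x * dir_x + los_y * dir_y) / (dist * norm)))
    return math.acos(cos_angle) < math.asin(radius / dist)


def in_vo(v_new, v_b, p_a, p_b, radius):
    """Velocity barrier: cone with its apex at v_b."""
    return in_cone(v_b, v_new, p_a, p_b, radius)


def in_rvo(v_new, v_a, v_b, p_a, p_b, radius):
    """Mutual velocity barrier: the same cone with its apex at (v_a + v_b) / 2."""
    apex = ((v_a[0] + v_b[0]) / 2.0, (v_a[1] + v_b[1]) / 2.0)
    return in_cone(apex, v_new, p_a, p_b, radius)


p_a, p_b, radius = (0.0, 0.0), (10.0, 0.0), 2.0   # radius = R_A + R_B
v_a, v_b = (1.0, 0.0), (-1.0, 0.0)                # head-on encounter
candidate = (1.0, -0.3)                           # slight turn to the right
print(in_vo(candidate, v_b, p_a, p_b, radius))        # still inside the plain VO
print(in_rvo(candidate, v_a, v_b, p_a, p_b, radius))  # outside the mutual barrier
```

The candidate is rejected by the plain velocity barrier but accepted by the mutual one, because each boat is assumed to take half of the avoidance effort.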
The simplified structure of the algorithm operation flow after the improvement is shown in fig. 6. The operation steps are as follows:
Step 2, training the model and making decisions to act. The unmanned boat is a twin-paddle underactuated unmanned boat, as shown in Fig. 7. The center of mass c of the unmanned boat lies at the midpoint of the twin-paddle axis, and $(x_c, y_c)$ are the coordinates of the center of mass; α is the heading angle, i.e., the angle between the direction of motion of the unmanned boat and the x axis. The pose vector of the unmanned boat is $P = (x_c, y_c, \alpha)^T$. Here $r_l$ is the radius of motion, Δα is the heading angle increment, $v_l$ is the linear velocity of the left paddle, $v_r$ is the linear velocity of the right paddle, and l is the distance between the two paddles.
The kinematic equation of the twin-paddle differential-drive unmanned boat, obtained from rigid body mechanics, is as follows:
where v is the linear velocity at the center of mass of the unmanned boat and ω is its steering angular velocity.
Assume the initial pose vector of the unmanned boat is $S_{start} = (x_0, y_0, \alpha)^T$, with current position $x_c = S_{start}[0]$, $y_c = S_{start}[1]$, $\alpha = S_{start}[2]$.
where cur denotes the curvature and ste ∈ {−1, 0, 1}: ste = −1 means the unmanned boat turns left, ste = 0 means it moves straight, and ste = 1 means it turns right. $r_{min}$ denotes the minimum turning radius.
Rotation angle
$$\delta = |ste| \times l_{step} \times cur \times gea$$
where $l_{step}$ denotes the step size and gea ∈ {−1, 1}: gea = −1 is the reverse gear and gea = 1 is the forward gear.
Distance of movement
$$l_{trans} = (1 - |ste|) \times l_{step} \times gea$$
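The two step formulas above can be written directly as functions. This is a small sketch under the assumption that the curvature cur is supplied by the caller (its defining formula is not reproduced in the text); the function names and the example values are hypothetical.

```python
def rotation_angle(ste, l_step, cur, gea):
    """delta = |ste| * l_step * cur * gea: non-zero only when turning."""
    if ste not in (-1, 0, 1) or gea not in (-1, 1):
        raise ValueError("ste must be -1, 0 or 1 and gea must be -1 or 1")
    return abs(ste) * l_step * cur * gea


def translation_distance(ste, l_step, gea):
    """l_trans = (1 - |ste|) * l_step * gea: non-zero only when moving straight."""
    if ste not in (-1, 0, 1) or gea not in (-1, 1):
        raise ValueError("ste must be -1, 0 or 1 and gea must be -1 or 1")
    return (1 - abs(ste)) * l_step * gea


# Assumed values: step size 0.5 and curvature 0.25 (e.g. 1 / r_min with r_min = 4).
print(rotation_angle(ste=1, l_step=0.5, cur=0.25, gea=1))   # turning step
print(translation_distance(ste=0, l_step=0.5, gea=-1))      # straight step in reverse
```

With these definitions every step is either a pure rotation (|ste| = 1) or a pure translation (ste = 0), and the gear only flips the sign.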
Rotation matrix
Translation matrix
If ω ≥ 0.01 or ω ≤ −0.01, the position at the next moment is
If ω → 0, the position at the next moment is
Coordinate transformation between the twin-paddle center position and the center of mass after the unmanned boat moves
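The two position-update cases can be illustrated with the standard differential-drive model. This is a sketch under explicit assumptions, since the patent's own formulas are not reproduced in the text: it assumes v = (v_l + v_r)/2 and ω = (v_r − v_l)/l with a faster right paddle giving a positive (counter-clockwise) ω, and it uses the exact circular-arc update when |ω| ≥ 0.01 and a straight-line update otherwise. The function names are hypothetical.

```python
import math

OMEGA_EPS = 0.01  # threshold from the text: |omega| >= 0.01 counts as turning


def body_velocities(v_left, v_right, paddle_distance):
    """Linear velocity at the centre of mass and yaw rate (assumed convention)."""
    v = (v_left + v_right) / 2.0
    omega = (v_right - v_left) / paddle_distance
    return v, omega


def next_pose(pose, v, omega, dt):
    """Advance the pose (x_c, y_c, alpha) by dt at constant v and omega."""
    x, y, alpha = pose
    new_alpha = alpha + omega * dt
    if abs(omega) >= OMEGA_EPS:
        radius = v / omega  # radius of the circular arc
        x += radius * (math.sin(new_alpha) - math.sin(alpha))
        y -= radius * (math.cos(new_alpha) - math.cos(alpha))
    else:
        # omega -> 0: the arc degenerates into a straight segment
        x += v * dt * math.cos(alpha)
        y += v * dt * math.sin(alpha)
    return (x, y, new_alpha)


v, omega = body_velocities(v_left=0.5, v_right=1.5, paddle_distance=0.5)
print(v, omega)
print(next_pose((0.0, 0.0, 0.0), v, omega, dt=0.1))
```

Splitting on a small threshold avoids dividing by a near-zero ω in the arc formula.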
Step 3, different test environments are designed to demonstrate the generalization capability of the model.
Step 4, the surrounding environment information is represented uniformly by vectors, which serve as the model input for decision-making.
Step 5, during navigation the unmanned boat must observe the behavior of other vessels while ensuring its own safe navigation. Because all vessels learn simultaneously, the environment is continually reshaped, the number of other vessels detected around the unmanned boat keeps changing, and the dimension of the network's input data changes with it. For variable-length input sequences, the GRU algorithm is used to extract the valid information, as shown in Fig. 5, where $O_1, O_2, O_3, \ldots, O_n$ are the observations of the surrounding vessels within the detection range and $O_{self}$ is the boat's own state; the GRU output is concatenated with the boat's own state value to form a fixed-length observation O. The GRU algorithm retains the information of every vessel without distortion, the observation data are normalized to accelerate training, and the optimal action is selected through network learning.
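How a GRU turns a variable number of neighbour observations into a fixed-length input can be sketched in plain Python. This is an illustrative sketch, not the patent's network: the weights are random and untrained, the sizes (4 features per neighbour, 8 hidden units, 3 own-state values) are assumptions, normalization is omitted, and the names `TinyGRU` and `build_observation` are hypothetical. The update uses the convention h' = (1 − z)·h + z·n; some libraries swap the roles of z and 1 − z.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyGRU:
    """Single-layer GRU with random weights, for shape illustration only."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: update z, reset r, candidate n.
        self.params = {
            gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in "zrn"
        }

    def step(self, x, h):
        def pre_activation(gate, hidden):
            w, u, b = self.params[gate]
            return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, hidden), b)]

        z = [sigmoid(a) for a in pre_activation("z", h)]
        r = [sigmoid(a) for a in pre_activation("r", h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in pre_activation("n", gated)]
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def encode(self, sequence):
        """Fold a sequence of any length into one hidden vector."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h


def build_observation(gru, neighbour_states, own_state):
    """Fixed-length observation O: GRU summary of neighbours plus own state."""
    return gru.encode(neighbour_states) + list(own_state)


gru = TinyGRU(input_size=4, hidden_size=8)
own = [0.1, 0.2, 0.3]
two = build_observation(gru, [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, -0.2]], own)
five = build_observation(gru, [[0.1 * i, 0.0, -0.1 * i, 0.2] for i in range(5)], own)
print(len(two), len(five))
```

Both observations have the same length even though the number of detected vessels differs, which is what lets a fixed-size policy network consume them.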
Step 6, for collision risk evaluation, environmental information is input into the model through the mutual velocity barrier, and the model adjusts its decisions based on the position and velocity information of the obstacles around the unmanned boat.
Step 7, the collision avoidance algorithm is converted into a circle-segment collision detection algorithm. From the geometric definition of the mutual velocity barrier, collision avoidance between two unmanned boats can be converted into collision avoidance between a point mass and a circle: unmanned boat A is treated as a point mass, and its radius $R_A$ is added to that of unmanned boat B. The trajectory of the point mass corresponds to the velocity ray emitted from the starting point E; assuming collision avoidance is completed after time t, the end point is denoted L. C denotes the center of the collision circle, i.e., $P_B$, and R denotes the radius of the circle, i.e., $R_A + R_B$.
$$d = L - E$$
where d is the direction vector of the ray from the starting point to the end point; in the mutual velocity barrier it represents the velocity.
Substituting the parametric equations
$$P_x = E_x + t\,d_x$$
$$P_y = E_y + t\,d_y$$
into the circle equation $(P_x - C_x)^2 + (P_y - C_y)^2 = R^2$ finally gives a quadratic equation in t:
$$(d \cdot d)\,t^2 + 2\,\big(d \cdot (E - C)\big)\,t + (E - C) \cdot (E - C) - R^2 = 0$$
Solving this equation case by case determines the position of the point-mass velocity trajectory relative to the circle.
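The case analysis on the quadratic in t can be sketched as follows. This is a minimal sketch that treats the trajectory as the segment from E to L (t in [0, 1]); the function name `segment_circle_hits` and the example coordinates are hypothetical.

```python
import math


def segment_circle_hits(e, l, c, r):
    """Return the parameters t in [0, 1] at which the segment E -> L crosses
    the circle with centre C and radius r (empty list if it misses)."""
    dx, dy = l[0] - e[0], l[1] - e[1]          # d = L - E
    fx, fy = e[0] - c[0], e[1] - c[1]          # E - C
    a = dx * dx + dy * dy
    b = 2.0 * (fx * dx + fy * dy)
    k = fx * fx + fy * fy - r * r
    if a == 0.0:
        return []                              # degenerate segment, no motion
    discriminant = b * b - 4.0 * a * k
    if discriminant < 0.0:
        return []                              # the line misses the circle
    root = math.sqrt(discriminant)
    candidates = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    return [t for t in candidates if 0.0 <= t <= 1.0]


# Trajectory passing straight through the collision circle.
print(segment_circle_hits((0.0, 0.0), (10.0, 0.0), (5.0, 0.0), 1.0))
# Trajectory passing the circle at a safe lateral distance.
print(segment_circle_hits((0.0, 0.0), (10.0, 0.0), (5.0, 3.0), 1.0))
```

A negative discriminant means no contact, two roots in [0, 1] mean the trajectory enters and leaves the circle, and a single root in [0, 1] means it starts or ends inside the circle.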
Step 8, to solve the sparse-reward problem, a reward evaluation function is applied at every action step: a positive reward is given when the boat approaches the target point and avoids obstacles, and a negative reward is given when it does not approach the target point, so that the boat reaches the target point along the optimal path in the shortest time. For this purpose, the invention defines a reward function based on the mutual velocity barrier algorithm, denoted $R_{rvo}$:
where $p_1, p_2, p_3, p_4, p_5, p_6$ are constants set according to the environment in the experiment and used to tune the reward function so as to improve the performance of the policy. The distance between the selected velocity $v_t$ and the desired velocity is capped at a maximum of 3, i.e., $dd_{max} = 3$. $R_{dd}$ is the velocity reward function; it takes values in (0, 1) and is inversely related to this distance, i.e., the closer the selected velocity is to the desired velocity, the larger the reward value. $R_t$ is the time reward function; it takes values in (0, 1) and is inversely related to time, i.e., the shorter the time, the larger the reward value. $t_{min}$ is the expected minimum time for the unmanned boat to collide with an obstacle at its current velocity.
Step 9, after the test is finished, whether the unmanned boat has reached the target point is judged from the test result and the fed-back reward value.
By combining near-end strategy optimization with the mutual velocity barrier, the invention draws on the advantages of both to achieve COLREGs-compliant collision avoidance among multiple unmanned boats and to ensure safe navigation while they carry out their tasks.
In the invention, the fused algorithm and the kinematic model bring the method closer to the navigation state of an actual unmanned boat, and the multiple unmanned boats can act independently as well as cooperate, so collision avoidance among them is achieved efficiently.
The above are preferred embodiments of the invention. Any change made according to the technical scheme of the invention whose functional effect does not go beyond the scope of that scheme falls within the protection scope of the invention.
Claims (10)
1. A collision avoidance decision method for multiple unmanned boats is characterized by comprising the following steps:
step 1, constructing a decision model;
step 2, loading unknown environment and training a decision model;
step 3, designing a test environment, and extracting current environment information capable of being monitored;
step 4, sensing the environment;
step 5, data processing;
step 6, risk assessment is carried out, and the current risk state of the unmanned ship is checked;
step 7, executing corresponding decision behaviors aiming at risks according to the step 6;
step 8, calculating the reward value according to the step 7;
step 9, judging whether collision avoidance is realized or not, and returning a reward value and a result.
2. The collision avoidance decision method for the multiple unmanned boats according to claim 1, characterized in that in step 1 the decision model is constructed from a near-end strategy optimization algorithm and a mutual velocity barrier algorithm; the near-end strategy optimization algorithm starts by initializing the neural networks and has two actor networks, each with two layers of 256 neurons, wherein the network π collects samples and the old network π_old is used for updating; during a training cycle, the network π receives the current environment information and selects an action based on it, which updates the state to s' and returns the reward r; the two actor networks are constrained by an adaptive KL penalty; the critic network has two layers of 256 neurons each, evaluates the quality of the action from s' and r, and updates the network π; the mutual velocity barrier algorithm is a velocity-based collision avoidance algorithm in which surrounding information is represented by vectors and collision risk is evaluated from the speed and direction of motion.
3. The collision avoidance decision method for the multiple unmanned boats according to claim 2, characterized in that in step 2 an unknown environment needs to be designed; the optimization target of the near-end strategy optimization algorithm is to maximize the expected reward, and importance sampling is selected as the sampling method when the expectation is calculated; importance sampling is the key to updating the network with parameter θ using data collected by the network with parameter θ′, and two unmanned boats are described by two distribution functions p and q; the expectation is calculated by the following formula:
where f(x) is the sampled function, x is a sample value drawn from p(x) or q(x), p = p(x), q = q(x); in theory q can be any distribution, but in practice p and q should be close, as follows from the variances under the two distributions:
$$\mathrm{Var}_{x\sim p}[f(x)] = E_{x\sim p}\left[f(x)^2\right] - \left(E_{x\sim p}[f(x)]\right)^2$$
when more than 1000 samples have been drawn under the distributions p(x) and q(x), p(x) = q(x);
converting an online (on-policy) strategy into an offline (off-policy) strategy by using the importance sampling method; in the strategy gradient, the expectation to be solved is:
the conversion is:
where R(τ) is the reward value, τ is the sampled trajectory, p_θ and p_θ′ are probability values, and the ratio p_θ(τ)/p_θ′(τ) is a correction term;
and when the method is applied to an actual environment, gradient updating is carried out:
wherein A^θ(s_t, a_t) is an evaluation function used for evaluating the quality of the action a selected in the state s at the moment t, π_θ and π_θ′ are the strategies of the two distributions, p_θ and p_θ′ are probability values, and n denotes the n-th sample;
the new optimization function:
the near-end policy optimization algorithm definition is obtained from the above formula:
wherein β is a weight coefficient, θ′ represents the demonstration parameter, θ represents the parameter to be optimized, and the KL divergence is used as a measure of the difference between θ and θ′, the difference being the difference in the behaviors of the actors corresponding to the parameters; β·KL(θ, θ′) is a limiting condition;
the mutual speed barrier assumes that the other party uses the same strategy rather than maintaining uniform motion, and is described using the following equation:
with the mutual speed barrier, each unmanned boat does not select a new speed outside the speed obstacles of the other unmanned boats, but selects the average of its current speed and a speed outside the speed obstacles of the other unmanned boats; v_A and v_B are the currently selected speeds of the unmanned boats; the mutual speed barrier RVO_B^A(v_B, v_A) of unmanned boat B to unmanned boat A contains all speeds of unmanned boat A that are the average of its current speed v_A and a speed within the speed obstacle VO_B^A(v_B) of unmanned boat B; it can be interpreted geometrically as the speed obstacle VO_B^A(v_B) translated so that its vertex is located at (v_A + v_B)/2;
considering that collision avoidance of the unmanned boats follows the rules of sea traffic collision avoidance (COLREGs), the right side is selected when a collision avoidance strategy is executed; let unmanned boats A and B select new speeds v′_A and v′_B outside each other's mutual speed barriers; the following formula demonstrates the safety:
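The relations that claim 3 refers to have well-known textbook forms. The block below is a sketch of those standard forms written with the claim's symbols, not a reproduction of the claimed formulas. The labels $\bar{R}_\theta$, $J^{\theta'}(\theta)$ and $J_{PPO}^{\theta'}(\theta)$ and the set names $VO^A_B$ and $RVO^A_B$ are assumed notation, and the advantage is written $A^{\theta'}$ on the assumption that it is estimated from data sampled under θ′.

```latex
\begin{align}
% Importance sampling: expectation under p estimated from samples of q
E_{x\sim p}[f(x)] &= E_{x\sim q}\!\left[f(x)\,\frac{p(x)}{q(x)}\right] \\
% Policy gradient, on-policy form and its off-policy conversion
\nabla \bar{R}_\theta
  &= E_{\tau\sim p_\theta(\tau)}\!\left[R(\tau)\,\nabla\log p_\theta(\tau)\right]
   = E_{\tau\sim p_{\theta'}(\tau)}\!\left[\frac{p_\theta(\tau)}{p_{\theta'}(\tau)}\,R(\tau)\,\nabla\log p_\theta(\tau)\right] \\
% Surrogate objective optimized with data sampled under \theta'
J^{\theta'}(\theta)
  &= E_{(s_t,a_t)\sim\pi_{\theta'}}\!\left[\frac{p_\theta(a_t\mid s_t)}{p_{\theta'}(a_t\mid s_t)}\,A^{\theta'}(s_t,a_t)\right] \\
% KL-penalized objective
J_{PPO}^{\theta'}(\theta) &= J^{\theta'}(\theta) - \beta\,KL(\theta,\theta') \\
% Reciprocal velocity obstacle of boat B to boat A
RVO^A_B(v_B, v_A) &= \left\{\, v'_A \;\middle|\; 2v'_A - v_A \in VO^A_B(v_B) \,\right\}
\end{align}
```

The last line expresses the averaging rule in the claim: a candidate speed $v'_A$ lies in the mutual speed barrier exactly when the speed it is the average of, together with $v_A$, lies in the ordinary speed obstacle.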
4. The collision avoidance decision method for multiple unmanned boats according to claim 3, wherein in the step 2, the specific steps of training the decision model are as follows:
step 2.1, determining the current positions of the unmanned boats and target points of the unmanned boats according to the designed unknown environment;
step 2.2, evaluating the current collision risk by the mutual speed barriers and feeding the result back to the near-end strategy optimization; the network π executes the action and updates the position state and the action state to obtain the network parameter θ′;
step 2.3, the network π_old makes a decision according to the environment to obtain the network parameter θ;
step 2.4, updating θ with θ′ through the KL divergence between θ and θ′;
step 2.5, when the mutual speed barriers evaluate the current collision risk, if a collision risk is detected, predicting the velocity state of the obstacle at the next moment, and changing the speed and the heading of the unmanned boat according to the state of the obstacle at the next moment so that the unmanned boat avoids the obstacle;
step 2.6, if the unmanned surface vehicle is farther away from the target point, feeding back a lower reward value, and adjusting the movement direction of the unmanned surface vehicle to approach the target point;
step 2.7, if the difference between the selected speed and the expected speed is large, feeding back a lower reward value, and adjusting the speed of the unmanned ship to approach the expected speed;
step 2.8, judging whether collision avoidance is finished or not, and if the collision avoidance is finished and a target point is reached, obtaining a basic collision avoidance route;
step 2.9, if the collision avoidance behavior is not finished, returning to the step 2.1, and continuing to iteratively update until a target point is reached;
and step 2.10, training N times to obtain an optimal collision avoidance route algorithm, and finishing training to obtain the trained decision model.
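Steps 2.6 and 2.7 of claim 4 describe two reward terms: one that drops when the boat moves away from the target point, and one that drops when the selected speed departs from the expected speed. A minimal sketch of such a shaped reward follows. The function name, the linear form and the weights `w_goal` and `w_speed` are hypothetical, and the terms based on the mutual velocity obstacle area and expected collision time mentioned in the abstract are not modeled here.

```python
import math


def shaped_reward(prev_pos, pos, goal, speed, desired_speed,
                  w_goal=1.0, w_speed=0.1):
    """Illustrative per-step reward for one unmanned boat.

    prev_pos, pos, goal: (x, y) tuples in the same planar frame.
    w_goal, w_speed: hypothetical weights, not taken from the claims.
    """
    # Step 2.6: progress is positive when the boat got closer to the goal
    # during this step and negative when it moved farther away.
    progress = math.dist(prev_pos, goal) - math.dist(pos, goal)
    # Step 2.7: penalize deviation of the selected speed from the desired one.
    speed_penalty = abs(speed - desired_speed)
    return w_goal * progress - w_speed * speed_penalty
```

Because the progress term is evaluated at every step, the boat receives feedback long before it reaches the target point, which is one common way to ease a sparse-reward problem.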
5. The collision avoidance decision method for multiple unmanned boats according to claim 1, wherein the step 3 is specifically as follows: designing a test environment, and obtaining preliminary information according to the test environment and the current unmanned ship position state for making a decision at the next moment.
6. The collision avoidance decision method for multiple unmanned boats according to claim 2, wherein the step 4 is implemented in a manner that: ambient information is monitored and represented by a mutual velocity barrier vector.
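Representing the surroundings by mutual velocity barrier vectors comes down to testing candidate velocities against the obstacle set of each neighboring boat. The sketch below assumes disc-shaped boats in a plane, with `radius` standing for the sum of the two collision avoidance radii, and an unbounded time horizon. The function names are hypothetical. A candidate velocity is inside the mutual (reciprocal) velocity obstacle when the velocity it averages with the current one lies in the ordinary velocity obstacle.

```python
def in_velocity_obstacle(p_a, p_b, v_a, v_b, radius):
    """True if constant velocities v_a and v_b bring A within `radius` of B.

    Positions and velocities are (x, y) tuples; the test is a ray-versus-disc
    check on the velocity of A relative to B.
    """
    rx, ry = p_b[0] - p_a[0], p_b[1] - p_a[1]   # position of B relative to A
    wx, wy = v_a[0] - v_b[0], v_a[1] - v_b[1]   # velocity of A relative to B
    if rx * rx + ry * ry <= radius * radius:
        return True                              # the discs already overlap
    ww = wx * wx + wy * wy
    if ww == 0.0:
        return False                             # no relative motion
    t_closest = (rx * wx + ry * wy) / ww         # time of closest approach
    if t_closest <= 0.0:
        return False                             # the boats are separating
    cx, cy = rx - t_closest * wx, ry - t_closest * wy
    return cx * cx + cy * cy < radius * radius


def in_reciprocal_velocity_obstacle(p_a, p_b, v_a, v_b, v_new, radius):
    """True if candidate velocity v_new for A lies in the RVO induced by B."""
    mirrored = (2.0 * v_new[0] - v_a[0], 2.0 * v_new[1] - v_a[1])
    return in_velocity_obstacle(p_a, p_b, mirrored, v_b, radius)
```

For a head-on encounter, keeping the current velocity is rejected, while a candidate with a sufficiently large sideways component is accepted; a decision layer could then prefer accepted candidates that turn to the right, in line with the right-side rule in claim 3.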
7. The collision avoidance decision method for multiple unmanned boats according to claim 1, wherein the step 5 is implemented in a specific manner as follows: the GRU neural network processes the input information into the same dimension.
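The role of the GRU in claim 7 can be illustrated as follows: a recurrent cell reads one feature vector per detected obstacle and ends with a hidden state whose size does not depend on how many obstacles were read. This is a minimal pure-Python sketch with random, untrained weights. The class name, the four-element obstacle feature layout and the sizes are assumptions, and the update uses one common gating convention, h' = (1 - z) * h + z * n.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRU:
    """Minimal GRU cell for illustration only (random, untrained weights)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        # One (W, U, b) triple per gate: update z, reset r, candidate n.
        self.params = {
            gate: (mat(hidden_size, input_size),
                   mat(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in ("z", "r", "n")
        }

    @staticmethod
    def _affine(W, x, U, h, b):
        """Compute W x + U h + b row by row."""
        return [
            sum(w * xi for w, xi in zip(W[i], x))
            + sum(u * hi for u, hi in zip(U[i], h))
            + b[i]
            for i in range(len(b))
        ]

    def step(self, x, h):
        """Advance the hidden state h by one obstacle feature vector x."""
        Wz, Uz, bz = self.params["z"]
        Wr, Ur, br = self.params["r"]
        Wn, Un, bn = self.params["n"]
        z = [_sigmoid(a) for a in self._affine(Wz, x, Uz, h, bz)]
        r = [_sigmoid(a) for a in self._affine(Wr, x, Ur, h, br)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in self._affine(Wn, x, Un, reset_h, bn)]
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def encode(self, sequence):
        """Fold a variable-length list of obstacle vectors into one state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h
```

Encoding one obstacle or three obstacles with `TinyGRU(input_size=4, hidden_size=8)` yields an eight-element vector in both cases, so the downstream policy network can keep a fixed input size.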
8. The collision avoidance decision-making method for multiple unmanned boats according to claim 2, wherein the step 6 is implemented as follows: a maximum detection range is set for the sensor of each unmanned boat, and the signals to be received are divided into the size, the current speed, the current heading and the collision avoidance radius of the other boats within the detectable range; after the prior information of the local environment is obtained, local collision avoidance path planning is carried out.
9. The collision avoidance decision method for multiple unmanned boats according to claim 2, wherein the step 7 is implemented as follows: performing collision avoidance behavior, normal navigation or acceleration behavior according to the evaluation of the mutual velocity barrier algorithm.
10. The collision avoidance decision method for multiple unmanned boats according to claim 1, wherein the step 8 is implemented as follows: feeding back the reward according to the distance between the current state of the unmanned boat and the target point, so as to guide the decision-making behavior of the unmanned boat at the next moment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211480755.7A CN115718497A (en) | 2022-11-24 | 2022-11-24 | Multi-unmanned-boat collision avoidance decision method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115718497A true CN115718497A (en) | 2023-02-28 |
Family
ID=85256246
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211480755.7A Pending CN115718497A (en) | 2022-11-24 | 2022-11-24 | Multi-unmanned-boat collision avoidance decision method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115718497A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117168468A (en) * | 2023-11-03 | 2023-12-05 | 安徽大学 | Multi-unmanned-ship deep reinforcement learning collaborative navigation method based on near-end strategy optimization |
CN117168468B (en) * | 2023-11-03 | 2024-02-06 | 安徽大学 | Multi-unmanned-ship deep reinforcement learning collaborative navigation method based on near-end strategy optimization |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108803321B (en) | Autonomous underwater vehicle track tracking control method based on deep reinforcement learning | |
Cheng et al. | Path planning and obstacle avoidance for AUV: A review | |
JP6854549B2 (en) | AUV action planning and motion control methods based on reinforcement learning | |
CN112034711B (en) | Unmanned ship sea wave interference resistance control method based on deep reinforcement learning | |
Lin et al. | An improved recurrent neural network for unmanned underwater vehicle online obstacle avoidance | |
Grigorescu et al. | Neurotrajectory: A neuroevolutionary approach to local state trajectory learning for autonomous vehicles | |
Cao et al. | Target search control of AUV in underwater environment with deep reinforcement learning | |
CN109765929B (en) | UUV real-time obstacle avoidance planning method based on improved RNN | |
CN112558612B (en) | Heterogeneous intelligent agent formation control method based on cloud model quantum genetic algorithm | |
CN113052372B (en) | Dynamic AUV tracking path planning method based on deep reinforcement learning | |
CN112925319B (en) | Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
CN111240345A (en) | Underwater robot trajectory tracking method based on double BP network reinforcement learning framework | |
CN111240356A (en) | Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning | |
CN114967721B (en) | Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet | |
CN115718497A (en) | Multi-unmanned-boat collision avoidance decision method | |
Zhang et al. | Intelligent vector field histogram based collision avoidance method for auv | |
CN108459614B (en) | UUV real-time collision avoidance planning method based on CW-RNN network | |
CN114943168B (en) | Method and system for combining floating bridges on water | |
CN113050420B (en) | AUV path tracking method and system based on S-plane control and TD3 | |
Jose et al. | Navigating the Ocean with DRL: Path following for marine vessels | |
CN115480580A (en) | NMPC-based underwater robot path tracking and obstacle avoidance control method | |
CN116774576A (en) | Underwater vehicle dynamics black box modeling method based on neural network indirect estimation | |
Kang et al. | Fuzzy logic based behavior fusion for multi-AUV formation keeping in uncertain ocean environment | |
Sun et al. | Deep learning-based trajectory tracking control forunmanned surface vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||