CN113671994A - Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning - Google Patents

Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning

Info

Publication number
CN113671994A
CN113671994A (application CN202111020276.2A)
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, unmanned, abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111020276.2A
Other languages
Chinese (zh)
Other versions
CN113671994B (en)
Inventor
陈刚
乔永龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202111020276.2A priority Critical patent/CN113671994B/en
Publication of CN113671994A publication Critical patent/CN113671994A/en
Application granted granted Critical
Publication of CN113671994B publication Critical patent/CN113671994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying

Abstract

The invention relates to a multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning, and belongs to the field of robot control. The system comprises a plurality of unmanned aerial vehicles and a plurality of unmanned ships. During inspection, the depth camera carried by an unmanned aerial vehicle perceives an abnormal point; the unmanned aerial vehicle closest to the abnormal point serves as pilot and fuses the acquired data, then sends the fused position information to the unmanned aerial vehicles that have not perceived the abnormal point, which perform event-triggered finite-time formation control. As the laser radars carried by some of the unmanned ships perceive the abnormal point, the pilot fuses the laser radar and depth camera information again, driving all the unmanned ships to cooperatively approach the abnormal point. The abnormal point is then handled by the actuating mechanisms configured on the unmanned aerial vehicles or unmanned ships. The invention adopts a combination of heterogeneous multi-agents and applies deep learning, reinforcement learning and information fusion technologies to the inspection operation, directly realizing unmanned monitoring of the water area and handling of abnormal conditions.

Description

Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning
Technical Field
The invention belongs to the field of robot control, and relates to a multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning.
Background
For water area inspection, the existing methods mostly rely on walking or riding a boat, which is labor-intensive and inefficient, is easily limited by conditions such as river and lake terrain, and suffers from inspection blind spots and incomplete problem discovery; manually operating a boat in open water is also dangerous. A more advanced mode of water area inspection uses unmanned aerial vehicles: an operator judges abnormalities from the images acquired by the drone via remote control and wireless image transmission, and then takes corresponding measures. Although an unmanned aerial vehicle is highly flexible and mobile and can quickly discover actual problems, the actuators it can carry are limited by its structure and endurance, so its own capability to actually resolve a problem is a bottleneck.
In view of the problems faced by both manual work and unmanned aerial vehicle work, the respective advantages of unmanned ships and unmanned aerial vehicles are combined. Building a heterogeneous multi-agent system overcomes the insufficient execution capability of unmanned aerial vehicles and the low efficiency of manual working modes. Moreover, an unmanned ship can serve as a relay station for the unmanned aerial vehicles, replacing batteries or providing a temporary parking place. When an actual problem is encountered, the multiple agents work cooperatively, realizing truly unmanned operation.
Disclosure of Invention
In view of this, the present invention provides an inspection control system for multiple unmanned aerial vehicles and multiple unmanned ships based on reinforcement learning.
In order to achieve the purpose, the invention provides the following technical scheme:
a multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning comprises a plurality of unmanned aerial vehicles and a plurality of unmanned ships;
the plurality of unmanned aerial vehicles are provided with a D435i vision mechanism and a Jetson Xavier NX computing platform;
the plurality of unmanned ships are provided with an RPLIDAR-A3 laser radar and a TX2 computing platform;
the unmanned aerial vehicles and the unmanned ships are provided with inertial measurement units (IMU) and GPS;
the unmanned aerial vehicles and the unmanned ships are also provided with power batteries;
the D435i vision mechanism is in signal connection with the Jetson Xavier NX computing platform;
the RPLIDAR-A3 laser radar is in signal connection with the TX2 computing platform;
the inertial measurement unit IMU and the GPS are in signal connection with the Jetson Xavier NX computing platform and the TX2 computing platform, respectively;
the unmanned aerial vehicles and the unmanned ships are also provided with actuating mechanisms;
when the unmanned aerial vehicles find an abnormality, the unmanned aerial vehicle closest to the abnormal point serves as pilot, the position information of the object is perceived through the D435i vision mechanism, position information fusion is carried out at the pilot, and the plurality of unmanned aerial vehicles are guided to approach the abnormal point;
as the unmanned ships move, the abnormal point is perceived with the carried RPLIDAR-A3 laser radar, the data acquired by the unmanned ships are fused at the pilot, and the pilot sends the fused position information to the unmanned ships, so that the unmanned ships are driven to cooperatively approach the abnormal point and the abnormal point is handled with an actuating mechanism configured on an unmanned aerial vehicle or an unmanned ship.
Optionally, the actuating mechanisms include a mechanical arm, a water quality sampling instrument, a backup battery and a megaphone.
Optionally, the plurality of unmanned aerial vehicles and the plurality of unmanned ships carry out water area inspection; when an unmanned aerial vehicle finds an abnormality, the unmanned aerial vehicle closest to the abnormal point serves as pilot, the position information of the object is perceived through the D435i vision mechanism, position information fusion is carried out at the pilot, and guiding the plurality of unmanned aerial vehicles to approach the abnormal point specifically comprises:
the unmanned aerial vehicle closest to the abnormal point serves as pilot and acts as the information processing platform; if multiple unmanned aerial vehicles find the same target, the position information is obtained with a weighted average algorithm;
the pilot guides the remaining unmanned aerial vehicles to approach the abnormal point;
as the unmanned ships approach and perceive the abnormal point, the laser radar data of the several unmanned ships that can recognize the object are sent to the node serving as pilot, and the unmanned aerial vehicle fuses the received laser point cloud data with the vision mechanism data to calculate the final position of the abnormal point;
the unmanned ships are guided to approach the target abnormal point according to the position-fused information of the pilot, after which the subsequent abnormal point processing is carried out.
Optionally, the Jetson Xavier NX computing platform constructs a standard-size water area inspection picture data set containing a labeled training set and test set in a 3:1 ratio; the training data set is fed into the deep convolutional neural network to learn and optimize the internal structure weights;
targeted scoring is performed on the target detection results, which are screened with the non-maximum suppression method: the detection frame with the highest confidence is selected as the first output bounding box, the overlap rate of each other detection frame with it is calculated, and a frame is discarded if its overlap rate exceeds a preset threshold and retained otherwise; the prediction frame with the highest confidence among the remaining frames is then selected and the procedure repeated until no candidate frames remain, the retained frames being the target detection results in the image;
in the output result, each grid corresponds to 3 prior frames, and the prediction information of each prior frame comprises 4 frame position parameters, 1 target evaluation and 5 category predictions; the frame position parameters comprise center coordinates, width and height;
Calculating a loss function, and continuously adjusting model parameters by using a gradient descent method through back propagation to finally obtain an optimal network model;
images in the test set are input, target features are extracted with the trained model, multi-scale prediction results are output, targeted scoring is performed through the classifier, and the detection results are screened with the non-maximum suppression method, finally yielding the object recognition result of the deep convolutional neural network.
Optionally, if multiple unmanned aerial vehicles find the same target, then, exploiting the uniqueness of the target in world coordinates, GPS and D435i are combined and the object positions $(a_i, b_i)$ detected by the individual drones are fused by a weighted average into the final positioning position $(a_t, b_t)$, n being the number of unmanned aerial vehicles that recognized the abnormal point:

$$(a_t, b_t)=\frac{1}{n}\sum_{i=1}^{n}(a_i, b_i)$$

The method further comprises event-triggered finite-time formation approach control of the plurality of unmanned aerial vehicles, for which an unmanned aerial vehicle dynamic model is established:
the unmanned aerial vehicle is a four-rotor aircraft, and the established dynamic model takes the specific form

$$\begin{cases} \ddot{x}=\frac{u_1}{m}(\cos\phi\sin\theta\cos\psi+\sin\phi\sin\psi)\\ \ddot{y}=\frac{u_1}{m}(\cos\phi\sin\theta\sin\psi-\sin\phi\cos\psi)\\ \ddot{z}=\frac{u_1}{m}\cos\phi\cos\theta-g\\ \ddot{\theta}=\frac{l}{I_{yy}}u_2+\frac{I_{zz}-I_{xx}}{I_{yy}}\dot{\phi}\dot{\psi}\\ \ddot{\phi}=\frac{l}{I_{xx}}u_3+\frac{I_{yy}-I_{zz}}{I_{xx}}\dot{\theta}\dot{\psi}\\ \ddot{\psi}=\frac{1}{I_{zz}}u_4+\frac{I_{xx}-I_{yy}}{I_{zz}}\dot{\phi}\dot{\theta} \end{cases}$$

in the formula: x, y, z represent the position of the drone in space; φ, θ, ψ represent the roll, pitch and yaw angles; m represents the mass of the drone; $I_{xx}$, $I_{yy}$, $I_{zz}$ represent the moments of inertia about the x, y, z axes, respectively; l represents the distance between a motor shaft and the center of the airframe; g represents the gravitational acceleration; $u_1$, $u_2$, $u_3$, $u_4$ represent the drone control inputs, defined as

$$u_1=b(\omega_1^2+\omega_2^2+\omega_3^2+\omega_4^2),\quad u_2=b(\omega_1^2-\omega_3^2),\quad u_3=b(\omega_2^2-\omega_4^2),\quad u_4=d(\omega_2^2+\omega_4^2-\omega_1^2-\omega_3^2)$$

wherein: b represents the lift coefficient; d represents the torque coefficient; $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ represent the rotational speeds of rotors 1, 2, 3, 4, respectively; $u_1$ represents the total lift perpendicular to the fuselage; $u_2$ represents the lift difference affecting the pitch motion of the aircraft; $u_3$ represents the lift difference affecting the rolling motion of the aircraft; $u_4$ represents the torque affecting the yaw motion of the aircraft;
since cooperative processing capability matters more than attitude dynamics in water area inspection, the position rather than the attitude of the unmanned aerial vehicle is controlled;
linearizing the model yields the second-order integrator model

$$\dot{p}_i=v_i,\qquad \dot{v}_i=u_i$$

where $p_i=[x_i,y_i,z_i]^T$, $v_i=[v_{xi},v_{yi},v_{zi}]^T$ and $u_i=[u_{xi},u_{yi},u_{zi}]^T$ represent position, velocity and control input, respectively; with $\xi_i=[p_i^T,v_i^T]^T$ the matrix form is

$$\dot{\xi}_i=A\xi_i+Bu_i,\qquad A=\begin{bmatrix}0_3&I_3\\0_3&0_3\end{bmatrix},\quad B=\begin{bmatrix}0_3\\I_3\end{bmatrix}$$

the position information fused at the unmanned aerial vehicle serving as pilot is issued to the unmanned aerial vehicles to be formed, which then approach the abnormal point under an event-triggered finite-time formation control protocol so that the subsequent operation can be executed;
a formation form in space is set as $h=[h_1,h_2,\ldots,h_n]$ with $h_i=[h_{pi}^T,h_{vi}^T]^T$; letting $\tilde{\xi}_i=\xi_i-h_i$, the formation problem is translated into the consistency problem

$$\lim_{t\to\infty}\bigl\|\tilde{\xi}_i(t)-\tilde{\xi}_j(t)\bigr\|=0$$

wherein $i,j\in[1,n]$ and $i\neq j$ denote drone numbers; defining the control input vector accordingly, a new system model is obtained:

$$\dot{\tilde{\xi}}_i=A\tilde{\xi}_i+Bu_i$$

when the states of this system reach consistency, the original system achieves the corresponding formation control;
the consensus error vector is defined over the communication topology with adjacency entries $a_{ij}$ as

$$e_i(t)=\sum_{j=1}^{n}a_{ij}\bigl(\tilde{\xi}_i(t)-\tilde{\xi}_j(t)\bigr)$$

the distributed event-driven finite-time sliding mode controller $u_i(t)$ (its full expression is given as an equation image in the original) is built on an integral sliding mode surface $S_i(t)=[S_{i1}(t),S_{i2}(t),S_{i3}(t)]^T$, a three-dimensional column vector, with $\alpha\in(0,1)$, $\langle *\rangle^{\alpha}=|*|^{\alpha}\cdot\mathrm{sgn}(*)$, and controller parameters $\beta_1,\beta_2,\beta_3,\beta_4>0$;
each drone is assigned an event trigger function $\Delta_i(t)$ (given as an equation image in the original), where $\eta>0$ is a parameter for adjusting the event function: while $\Delta_i(t)<0$ the system operates normally; when $\Delta_i(t)\geq 0$ an event is triggered and the sampling error vector is reset, thereby regulating when the control system is updated; the controller and event-function parameters must satisfy a condition (given as an equation image in the original) involving $\lambda_2$, the second smallest eigenvalue of the Laplacian matrix of the undirected communication topology formed by the plurality of unmanned aerial vehicles.
Optionally, when the D435i vision mechanism and the RPLIDAR-A3 laser radar cooperate to perform 3D target detection and positioning, the deep convolutional neural network detects the two-dimensional region of the abnormal point in the RGB image and classifies the object, and the abnormal points to be processed are determined from the category library. Using the known depth camera projection relation together with the laser radar point cloud data, the 2D bounding box is lifted to a view frustum (with near and far planes set by the depth sensor range) bounding the 3D search space of the abnormal-point object; all points within the frustum form the frustum point cloud.
Nearest-neighbor clustering rests on the surface continuity of a single object, i.e., the reflection points of one object form a continuous point set. Within the formed frustum, the point cloud back-projected from the abnormal-point depth map serves as reference points for segmenting the 3D point cloud data; the 3D point cloud of the abnormal point is thus obtained and the centroid of the cluster is estimated as

$$c_{cluster}=\frac{1}{N}\sum_{k=1}^{N}p_k$$

where $p_k$ are the N segmented points.
A T-Net network estimates the real center of the whole object, and the coordinates are then translated so that the predicted center becomes the origin; the bounding box center is corrected by a residual method. The final object center combines the center residual predicted by the bounding box estimation network, the previous center residual from the T-Net, and the centroid from the nearest-neighbor clustering algorithm:

$$c_{pred}=c_{cluster}+\Delta c_{T\text{-}Net}+\Delta c_{box}$$

For the selected object's points in the 3D point cloud data, a bounding box estimation network predicts the bounding box and outputs the parameters of the three-dimensional bounding box, i.e., the box center $(c_x,c_y,c_z)$, size $(h,w,l)$ and yaw angle θ. The total optimization loss of the two networks is

$$L_{total}=\lambda\bigl(L_{c1\text{-}reg}+L_{c2\text{-}reg}+L_{h\text{-}cls}+L_{h\text{-}reg}+L_{s\text{-}cls}+L_{s\text{-}reg}+\gamma L_{corner}\bigr)$$

wherein: $L_{c1\text{-}reg}$ and $L_{c2\text{-}reg}$ are the center-translation loss of the T-Net and the center loss of the bounding box estimation network, respectively; λ, γ are model parameters; $L_{h\text{-}cls}$ and $L_{h\text{-}reg}$ are the classification and regression losses of the estimated 3D bounding box heading; $L_{s\text{-}cls}$ and $L_{s\text{-}reg}$ are the classification and regression losses of the bounding box size; the Softmax method is used for classification and the smooth-$l_1$ loss for the regression problems. Since the bounding box information is jointly determined by size and angle, the corner loss $L_{corner}$ quantifies this coupling:

$$L_{corner}=\sum_{i=1}^{NS}\sum_{j=1}^{NH}\delta_{ij}\min\Bigl\{\sum_{k=1}^{8}\bigl\|P_k^{ij}-P_k^{*}\bigr\|,\ \sum_{k=1}^{8}\bigl\|P_k^{ij}-P_k^{**}\bigr\|\Bigr\}$$

where NS is the number of bounding box size classes, NH the number of heading classes, $P_k^{ij}$ the corners of the anchor box with size class i and heading class j, $P_k^{*}$ the ground-truth corners, and $P_k^{**}$ the ground-truth corners with the heading flipped by π. From the finally obtained position of the 3D detection frame relative to the camera, the coordinate transformation yields the position of the object in the world coordinate system.
A dynamic model of the unmanned ship is established as well.
The invention has the beneficial effects that: a heterogeneous multi-agent combination is adopted, and deep learning, reinforcement learning and information fusion technologies are applied to the inspection operation, directly realizing unmanned monitoring of the water area and handling of abnormal conditions. This not only reduces the risk to personnel operating in the water area but also greatly improves working efficiency.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of routing inspection by cooperation of multiple unmanned planes and multiple unmanned ships;
FIG. 2 is a deep convolutional neural network structure;
FIG. 3 is an example of a conventional convolution process;
FIG. 4 is an example of a depth separable convolution process;
fig. 5 illustrates cooperative positioning of multiple drones;
fig. 6 is a schematic diagram of multi-drone collaboration;
FIG. 7 is laser radar and depth camera information fusion 3D target detection;
FIG. 8 is a 3D point cloud segmentation nearest neighbor clustering algorithm;
FIG. 9 is a center residual estimation T-Net network;
FIG. 10 is a bounding box evaluation network PointNet;
FIG. 11 is a reference motion coordinate system and motion variables for the motion of the unmanned ship;
FIG. 12 is a diagram of a heterogeneous multi-agent system distributed observer;
FIG. 13 is an actor neural network;
FIG. 14 is the critic neural network;
FIG. 15 is a graph of reinforcement learning algorithm solving for output synchronization control;
FIG. 16 is a depth separable convolution Conv2D structure;
FIG. 17 shows a standard Conv2D structure;
FIG. 18 is a residual base unit structure;
FIG. 19 is a combined residual block structure;
FIG. 20 is a binocular vision measurement principle;
FIG. 21 illustrates the laser radar triangulation ranging principle;
Fig. 22 is a flow chart of the operation of the multi-unmanned aerial vehicle and multi-unmanned ship system.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and their descriptions may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
The constructed multi-unmanned aerial vehicle and multi-unmanned ship system is configured as follows:
1. The multiple unmanned aerial vehicles are each configured with a D435i vision mechanism and a Jetson Xavier NX computing platform.
2. The multiple unmanned ships are each configured with an RPLIDAR-A3 laser radar and a TX2 computing platform.
3. The sensors commonly used by unmanned aerial vehicles and unmanned ships, such as GPS (global positioning system) and IMU (inertial measurement unit), are all carried and configured, and can be added or removed according to the specific scene.
4. The unmanned aerial vehicles and unmanned ships carry corresponding actuating mechanisms according to scene needs, such as mechanical arms, water quality sampling instruments, megaphones, power batteries and the like.
II. Working flow of the multiple unmanned aerial vehicles and multiple unmanned ships:
The working flow of the system is as follows: first, a plurality of (or a single) unmanned aerial vehicles and a plurality of unmanned ships carry out water area patrol. Because the visual patrol range of the unmanned aerial vehicles is wider and their speed is higher, an unmanned aerial vehicle discovers the abnormality first, and the unmanned aerial vehicle closest to the abnormal point serves as pilot. From the object position information perceived by the D435i depth camera carried by each unmanned aerial vehicle, position information fusion is carried out at the pilot, and the multiple unmanned aerial vehicles are guided to approach the abnormal point. As the unmanned ships move, multiple (or single) unmanned ships perceive the abnormal point with their carried RPLIDAR-A3 laser radar, the data collected by the unmanned ships are fused at the pilot, and the pilot sends the fused position information to the unmanned ships, thereby driving the unmanned ships to cooperatively approach the abnormal point. Finally, the abnormality is handled with the actuating mechanisms configured on the unmanned aerial vehicles or unmanned ships.
Fig. 1 is a schematic diagram of the inspection carried out cooperatively by the multiple unmanned aerial vehicles and multiple unmanned ships; the process is as follows:
The first step: the unmanned aerial vehicle closest to the abnormal point serves as pilot and acts as the information processing station. If several unmanned aerial vehicles find the same target, a weighted average algorithm is used to obtain more accurate position information.
The second step: the unmanned aerial vehicle acting as pilot guides the other unmanned aerial vehicles to approach the abnormal point.
The third step: as the multiple unmanned ships approach and perceive the abnormal point, the laser radar data of the unmanned ships that can recognize the object are sent to the pilot unmanned aerial vehicle node, and the unmanned aerial vehicle fuses the received laser point cloud data with the depth camera data to calculate the final position of the abnormal point.
The fourth step: the unmanned ships are guided to approach the target abnormal point according to the position-fused information of the pilot, and the subsequent abnormal point handling operation is then performed.
The multi-unmanned aerial vehicle and multi-unmanned ship system is explained in detail:
the first step is as follows: depth camera D435i perceives anomalous objects
Combining the actual application scenario with the real-time processing capability of the unmanned aerial vehicle computing platform (Jetson Xavier NX), the following deep-convolution-based neural network structure is designed for the object detection algorithm, achieving multi-class classification. The object classes in the database are ranked by degree of danger, and each actual tracking is handled according to the urgency of the abnormal class.
FIG. 2 is the deep convolutional neural network structure, in which:
1. A depthwise separable convolution technique is adopted in the feature extraction part to reduce the parameter and computation load, so that the Jetson Xavier NX computing power can meet the requirement.
2. In addition, to enhance the detection accuracy for small-scale objects during water area inspection, such as the color signatures of sewage when inspecting pollution sources, an additional 104 × 104 output scale is specially designed for target detection.
As shown in figs. 3 and 4, the overall object recognition process is summarized as follows (a minimal sketch of the suppression step follows this list):
1. First, a standard-size water area inspection picture data set is constructed; data enhancement techniques can make up for deficiencies of the data set in some respects. It contains a labeled training set and test set in a 3:1 ratio. The training data set is fed into the deep convolutional neural network to learn and optimize the internal structure weights.
2. Targeted scoring is performed on the target detection results, which are screened with the non-maximum suppression method: the detection frame with the highest confidence is selected as the first output bounding box, the overlap rate of each other detection frame with it is calculated, and a frame is discarded if its overlap rate exceeds a preset threshold and retained otherwise; the prediction frame with the highest confidence among the remaining frames is then selected and the procedure repeated until no candidate frames remain, the retained frames being the target detection results in the image.
3. In the output result, each grid corresponds to 3 prior frames, and the prediction information of each prior frame comprises 4 frame position parameters (center coordinates, width and height), 1 target score and 5 category predictions (categories can be added according to the actual situation).
4. The loss function is calculated, and the model parameters are continuously adjusted by gradient descent through back propagation, finally yielding the optimal network model.
5. Images in the test set are input, target features are extracted with the trained model, multi-scale prediction results are output, targeted scoring is performed through the classifier, and the detection results are screened with the non-maximum suppression method, finally yielding the object recognition result of the deep convolutional neural network. If the test performance is good, the method can be put into practical use.
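As an illustration of the suppression step in item 2, the following minimal sketch implements standard non-maximum suppression; the [x1, y1, x2, y2] box format and the 0.5 overlap threshold are assumptions chosen for the example, not values fixed by this patent.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(boxes, scores, overlap_threshold=0.5):
    """Keep the highest-confidence box, drop boxes overlapping it beyond
    the threshold, and repeat until no candidates remain."""
    order = np.argsort(scores)[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        remaining = [i for i in order[1:]
                     if iou(boxes[best], boxes[i]) <= overlap_threshold]
        order = np.array(remaining, dtype=int)
    return keep
```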
The second step is that: multiple D435i, GPS cooperative object location
Multi-drone vision (D435i) cooperative positioning
When a plurality of unmanned aerial vehicles perceive the abnormal object, the invention exploits the uniqueness of the target in world coordinates and combines GPS and D435i: the object positions $(a_i, b_i)$ perceived by the individual drones are fused by a weighted average into the final positioning position $(a_t, b_t)$, where n is the number of unmanned aerial vehicles that recognized the abnormal point, as shown in fig. 5; the principle of depth camera position perception is shown in the appendix.

$$(a_t, b_t)=\frac{1}{n}\sum_{i=1}^{n}(a_i, b_i)\qquad (1)$$
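A small sketch of this fusion step; the patent does not fix the weighting, so uniform weights (the plain mean) are the default here, and the optional weights argument (e.g. distance-based) is purely an illustrative assumption.

```python
import numpy as np

def fuse_positions(positions, weights=None):
    """Fuse the (a_i, b_i) world-coordinate detections reported by the n
    drones that recognized the same abnormal point into one (a_t, b_t)."""
    positions = np.asarray(positions, dtype=float)   # shape (n, 2)
    n = len(positions)
    weights = np.full(n, 1.0 / n) if weights is None \
        else np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalize to sum 1
    return tuple(weights @ positions)

# e.g. three drones reporting slightly different GPS + D435i fixes
print(fuse_positions([(10.2, 4.1), (10.0, 4.3), (10.1, 4.2)]))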
The third step: the multiple unmanned aerial vehicles perform event-triggered finite-time formation approach control, as shown in fig. 6. An unmanned aerial vehicle dynamic model is established:
the unmanned aerial vehicle is a four-rotor aircraft, and the established dynamic model takes the specific form

$$\begin{cases} \ddot{x}=\frac{u_1}{m}(\cos\phi\sin\theta\cos\psi+\sin\phi\sin\psi)\\ \ddot{y}=\frac{u_1}{m}(\cos\phi\sin\theta\sin\psi-\sin\phi\cos\psi)\\ \ddot{z}=\frac{u_1}{m}\cos\phi\cos\theta-g\\ \ddot{\theta}=\frac{l}{I_{yy}}u_2+\frac{I_{zz}-I_{xx}}{I_{yy}}\dot{\phi}\dot{\psi}\\ \ddot{\phi}=\frac{l}{I_{xx}}u_3+\frac{I_{yy}-I_{zz}}{I_{xx}}\dot{\theta}\dot{\psi}\\ \ddot{\psi}=\frac{1}{I_{zz}}u_4+\frac{I_{xx}-I_{yy}}{I_{zz}}\dot{\phi}\dot{\theta} \end{cases}\qquad (2)$$

in the formula: x, y, z represent the position of the drone in space; φ, θ, ψ represent the roll, pitch and yaw angles; m represents the mass of the drone; $I_{xx}$, $I_{yy}$, $I_{zz}$ represent the moments of inertia about the x, y, z axes, respectively; l represents the distance between a motor shaft and the center of the airframe; g represents the gravitational acceleration; $u_1$, $u_2$, $u_3$, $u_4$ represent the drone control inputs, defined as

$$u_1=b(\omega_1^2+\omega_2^2+\omega_3^2+\omega_4^2),\quad u_2=b(\omega_1^2-\omega_3^2),\quad u_3=b(\omega_2^2-\omega_4^2),\quad u_4=d(\omega_2^2+\omega_4^2-\omega_1^2-\omega_3^2)\qquad (3)$$

wherein: b represents the lift coefficient; d represents the torque coefficient; $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ represent the rotational speeds of rotors 1, 2, 3, 4, respectively; $u_1$ represents the total lift perpendicular to the fuselage; $u_2$ represents the lift difference affecting the pitch motion of the aircraft; $u_3$ represents the lift difference affecting the rolling motion of the aircraft; $u_4$ represents the torque affecting the yaw motion of the aircraft.
Cooperative processing capability matters more than attitude dynamics in water area inspection, so the drones' attitudes are not controlled here; all unmanned aerial vehicles participating in the inspection ignore the dynamic process of attitude control, and only their positions are considered.
Linearizing and simplifying the model yields the following second-order integrator model:

$$\dot{p}_i=v_i,\qquad \dot{v}_i=u_i\qquad (4)$$

where $p_i=[x_i,y_i,z_i]^T$, $v_i=[v_{xi},v_{yi},v_{zi}]^T$, $u_i=[u_{xi},u_{yi},u_{zi}]^T$ represent position, velocity and control input, respectively. With $\xi_i=[p_i^T,v_i^T]^T$, equation (4) can also be expressed in matrix form:

$$\dot{\xi}_i=A\xi_i+Bu_i,\qquad A=\begin{bmatrix}0_3&I_3\\0_3&0_3\end{bmatrix},\quad B=\begin{bmatrix}0_3\\I_3\end{bmatrix}\qquad (5)$$
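For concreteness, a minimal sketch of the linearized model (4) and its matrix form (5); the forward-Euler discretization and the 0.01 s step are illustration-only assumptions.

```python
import numpy as np

# Second-order integrator matrices for one drone; state xi = [p; v] in R^6.
A = np.block([[np.zeros((3, 3)), np.eye(3)],
              [np.zeros((3, 3)), np.zeros((3, 3))]])
B = np.vstack([np.zeros((3, 3)), np.eye(3)])

def integrator_step(xi, u, dt=0.01):
    """One forward-Euler step of d(xi)/dt = A xi + B u (equation (5))."""
    return xi + dt * (A @ xi + B @ u)
```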
The position information fused at the unmanned aerial vehicle serving as pilot is issued to the unmanned aerial vehicles to be formed, which then approach the abnormal point under an event-triggered finite-time formation control protocol so that the subsequent operation can be executed.
A formation form in space is set as $h=[h_1,h_2,\ldots,h_n]$, with $h_i=[h_{pi}^T,h_{vi}^T]^T$ the desired offset of drone i. Letting $\tilde{\xi}_i=\xi_i-h_i$, the formation problem is translated into the consistency problem

$$\lim_{t\to\infty}\bigl\|\tilde{\xi}_i(t)-\tilde{\xi}_j(t)\bigr\|=0\qquad (6)$$

wherein $i,j\in[1,n]$ and $i\neq j$ denote drone numbers. Defining the control input vector accordingly, the following new system model is obtained:

$$\dot{\tilde{\xi}}_i=A\tilde{\xi}_i+Bu_i\qquad (7)$$

When the states of system (7) reach consistency, system (4) achieves the corresponding formation control.
Stacking the position and velocity components of the $\tilde{\xi}_i$ over all drones, the consensus error vector is defined over the communication topology with adjacency entries $a_{ij}$ as

$$e_i(t)=\sum_{j=1}^{n}a_{ij}\bigl(\tilde{\xi}_i(t)-\tilde{\xi}_j(t)\bigr)\qquad (8)$$

The distributed event-driven finite-time sliding mode control (9) is designed on an integral sliding mode surface (10) with $S_i(t)=[S_{i1}(t),S_{i2}(t),S_{i3}(t)]^T$ a three-dimensional column vector; here $\alpha\in(0,1)$, $\langle *\rangle^{\alpha}=|*|^{\alpha}\cdot\mathrm{sgn}(*)$, and $\beta_1,\beta_2,\beta_3,\beta_4>0$ are controller parameters (the full expressions of (9) and (10) are given as equation images in the original). Each drone is assigned an event trigger function $\Delta_i(t)$ (equation (11), likewise given as an image), taking the ith drone as an example, where $\eta>0$ is a parameter for adjusting the event function: while $\Delta_i(t)<0$ the system operates normally; when $\Delta_i(t)\geq 0$ an event is triggered and the sampling error vector is reset, thereby regulating when the control system is updated.
Therefore, with system (7) under the controller protocol (9) and the event trigger function (11), when the controller and event-function parameters satisfy condition (12) (given as an equation image in the original), the states of system (7) reach agreement within finite time, i.e., the formation control of the plurality of unmanned aerial vehicles around the abnormal point is completed. In (12), $\lambda_2$ is the second smallest eigenvalue of the Laplacian matrix of the undirected communication topology formed by the multiple unmanned aerial vehicles.
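To make the trigger logic concrete, the sketch below runs a simplified one-dimensional consensus with a sampling-error trigger. The plain neighbourhood error stands in for the patent's integral sliding-mode surface (9)-(10), and the gain 1.5, the threshold parameter eta and the ring topology are assumed values for illustration.

```python
import numpy as np

# Event-triggered formation consensus for n = 4 drones (1-D for brevity).
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)   # undirected ring topology
h = np.array([0.0, 1.0, 2.0, 3.0])            # desired formation offsets
x = np.array([3.0, -1.0, 4.0, 0.5])           # initial positions
x_hat = x.copy()                              # states held since last event
eta, dt, events = 0.05, 0.01, 0

def consensus_error(pos):
    d = pos - h                               # formation-relative positions
    return adj @ d - adj.sum(axis=1) * d      # = -sum_j a_ij (d_i - d_j)

for _ in range(2000):
    err = consensus_error(x)
    for i in range(4):
        # trigger when the sampling error outgrows a fraction of the error
        if abs(x[i] - x_hat[i]) >= eta * abs(err[i]) + 1e-6:
            x_hat[i] = x[i]                   # event: re-sample own state
            events += 1
    x = x + dt * 1.5 * consensus_error(x_hat) # control uses sampled states

print("residual spread:", np.round((x - h) - (x - h).mean(), 3),
      "events:", events)
```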
The fourth step: D435i and RPLIDAR-A3 cooperate to perform 3D target detection and localization
At this point, the object depth and anchor-frame data perceived by the depth camera are fused with the laser point cloud data perceived once the unmanned ships have approached, to perform 3D target detection of the abnormal point. The invention adopts 3D abnormal-point detection, which greatly improves the precision of the subsequent actuating mechanism operations, as shown in fig. 7.
Table 1 (the fusion algorithm flow) is given as an image in the original.
as shown in table 1, the details of the fusion algorithm are:
Step1:
the deep convolutional neural network designed in this patent detects the two-dimensional region of the abnormal point in the RGB image and classifies the object, and the abnormal points to be processed are determined from the category library. Using the known depth camera projection matrix together with the laser radar point cloud data, the 2D bounding box is lifted to a view frustum (with near and far planes set by the depth sensor range) bounding the 3D search space of the abnormal-point object. All points within the view frustum form the frustum point cloud.
Step2:
The principle of nearest-neighbor clustering rests on the surface continuity of a single object, i.e., the reflection points of one object form a continuous point set. Within the view frustum formed in Step 1, the point cloud back-projected from the abnormal-point depth map serves as reference points for segmenting the 3D point cloud data. The 3D point cloud of the abnormal point is thus obtained and the centroid of the cluster is estimated as

$$c_{cluster}=\frac{1}{N}\sum_{k=1}^{N}p_k\qquad (13)$$

where $p_k$ are the N segmented points.
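A minimal sketch of this seed-based region growing under the surface-continuity assumption; the 0.2 m neighbourhood radius is an assumed value, not one fixed by the patent.

```python
import numpy as np

def segment_by_nearest_neighbor(frustum_points, seed_point, radius=0.2):
    """Grow a cluster from a seed (a point back-projected from the anomaly's
    depth map) by repeatedly absorbing points within `radius` of the current
    cluster, exploiting the surface-continuity assumption."""
    pts = np.asarray(frustum_points, dtype=float)
    in_cluster = np.zeros(len(pts), dtype=bool)
    start = int(np.argmin(np.linalg.norm(pts - seed_point, axis=1)))
    in_cluster[start] = True
    frontier = [start]
    while frontier:
        p = pts[frontier.pop()]
        near = np.linalg.norm(pts - p, axis=1) < radius
        new = near & ~in_cluster
        in_cluster |= new
        frontier.extend(np.flatnonzero(new).tolist())
    cluster = pts[in_cluster]
    return cluster, cluster.mean(axis=0)   # points and centroid (13)
```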
Fig. 8 is a 3D point cloud segmentation nearest neighbor clustering algorithm.
Step3:
The real center of the whole object is estimated with a T-Net network, and the coordinates are then translated so that the predicted center becomes the origin. The bounding box center is corrected by a residual method: the final object center combines the center residual predicted by the bounding box estimation network, the previous center residual from the T-Net, and the centroid from the nearest-neighbor clustering algorithm, as in equation (14). For the selected object's points in the 3D point cloud data, a bounding box estimation network predicts the bounding box and outputs the parameters of the three-dimensional bounding box, i.e., the box center $(c_x,c_y,c_z)$, size $(h,w,l)$ and deflection angle θ. The total optimization loss of the two networks is given in (15).

$$c_{pred}=c_{cluster}+\Delta c_{T\text{-}Net}+\Delta c_{box}\qquad (14)$$

$$L_{total}=\lambda\bigl(L_{c1\text{-}reg}+L_{c2\text{-}reg}+L_{h\text{-}cls}+L_{h\text{-}reg}+L_{s\text{-}cls}+L_{s\text{-}reg}+\gamma L_{corner}\bigr)\qquad (15)$$

wherein: $L_{c1\text{-}reg}$ and $L_{c2\text{-}reg}$ are the center-translation loss of the T-Net and the center loss of the bounding box estimation network, respectively. $L_{h\text{-}cls}$ and $L_{h\text{-}reg}$ are the classification and regression losses of the estimated 3D bounding box heading. $L_{s\text{-}cls}$ and $L_{s\text{-}reg}$ are the classification and regression losses of the bounding box size. The Softmax method is used for classification, and the smooth-$l_1$ loss for the regression problems. Since the bounding box information is jointly determined by size and angle, the corner loss $L_{corner}$ quantifies this coupling, calculated as in (16); λ, γ are model parameters, and the T-Net and bounding box estimation network structures are shown in figs. 9 and 10.

$$L_{corner}=\sum_{i=1}^{NS}\sum_{j=1}^{NH}\delta_{ij}\min\Bigl\{\sum_{k=1}^{8}\bigl\|P_k^{ij}-P_k^{*}\bigr\|,\ \sum_{k=1}^{8}\bigl\|P_k^{ij}-P_k^{**}\bigr\|\Bigr\}\qquad (16)$$

where NS is the number of bounding box size classes, NH the number of heading classes, $P_k^{ij}$ the corners of the anchor box with size class i and heading class j, $P_k^{*}$ the ground-truth corners, and $P_k^{**}$ the ground-truth corners with the heading flipped by π.
Step4:
According to the finally obtained position of the 3D detection frame relative to the camera, the coordinate transformation yields the position of the object in the world coordinate system.
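A small sketch composing the final centre as in equation (14) and mapping it into world coordinates; taking the camera pose (R_wc, t_wc) from the drone's GPS/IMU is an assumption of this example.

```python
import numpy as np

def object_world_position(c_cluster, delta_tnet, delta_box, R_wc, t_wc):
    """Compose the final box centre per equation (14), then map it from the
    camera frame into the world frame; R_wc (3x3 rotation) and t_wc (3-vector
    translation) are assumed to come from the drone's GPS/IMU pose."""
    c_cam = (np.asarray(c_cluster) + np.asarray(delta_tnet)
             + np.asarray(delta_box))
    return R_wc @ c_cam + np.asarray(t_wc)
```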
The fifth step: event trigger-based reinforcement learning unmanned aerial vehicle and multi-unmanned ship cooperative control
The heterogeneous multi-agent cooperative control is designed with the multiple unmanned ships in the water area as followers and the unmanned aerial vehicle in the air as pilot (leader).
Establishing a dynamic model of the unmanned ship:
the unmanned ship can be regarded as a rigid body described in a rectangular coordinate system fixed to the hull, with 6 degrees of freedom: linear motion along the 3 axes (surge, sway and heave) and rotary motion about the 3 axes (roll, pitch and yaw). The degrees of freedom are coupled, but in course control of a surface unmanned ship the coupling is very small and can be ignored, so only planar motion is considered. Since the motion of the unmanned ship is a complex six-degree-of-freedom motion, two coordinate systems are defined for convenience of study: an inertial coordinate system $O_0X_0Y_0Z_0$ with the inspection station as coordinate origin, and a body-fixed coordinate system Oxyz with the hull's center of gravity as coordinate origin.
Retaining only the most important factors, the heave, roll and pitch rates are set to zero (w = 0, p = 0, q = 0), and the six-degree-of-freedom motion of the unmanned ship simplifies to a three-degree-of-freedom motion: forward (surge) speed u along the X axis, transverse (sway) speed v along the Y axis, and yaw rate r about the Z axis. The transformation of the motion between the two coordinate systems is given in formula (17); the variables are defined in Table 2 (given as an image in the original).
Fig. 11 shows the reference motion coordinate system and motion variables of the unmanned ship.

$$\dot{x}=u\cos\psi-v\sin\psi,\qquad \dot{y}=u\sin\psi+v\cos\psi,\qquad \dot{\psi}=r\qquad (17)$$

Setting the controlled state of the unmanned ship to $[x\ \ y\ \ \psi]^T$, equation (17) is rewritten in the following state-space form:

$$\dot{x}_i(t)=f_i(x_i(t))+g_i(x_i(t))\,u_i(t),\qquad y_i(t)=C_i x_i(t)\qquad (18)$$

where $x_i(t)$, $u_i(t)$, $y_i(t)$ are the state, input and output of the ith unmanned ship, and $f_i$, $g_i$, $C_i$ describe its internal dynamics, input dynamics and output dynamics, respectively. In this patent the model is linearized; hull control itself uses a model-free reinforcement learning scheme, the model parameters appear only in the event-triggering condition, and since the sampling time is very small the impact on practical application is minor. The linearized form is:

$$\dot{x}_i(t)=A_i x_i(t)+B_i u_i(t),\qquad y_i(t)=C_i x_i(t)\qquad (19)$$
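A minimal sketch integrating the kinematics (17); the forward-Euler step and the 0.05 s step size are illustration-only assumptions.

```python
import numpy as np

def usv_step(state, u, v, r, dt=0.05):
    """Forward-Euler step of the 3-DOF kinematics of equation (17):
    body-frame surge u, sway v and yaw rate r drive the inertial-frame
    state [x, y, psi]."""
    x, y, psi = state
    return np.array([x + dt * (u * np.cos(psi) - v * np.sin(psi)),
                     y + dt * (u * np.sin(psi) + v * np.cos(psi)),
                     psi + dt * r])
```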
1. Designing a distributed observer of the pilot:
In the communication topology formed by the pilot (unmanned aerial vehicle) and the followers (unmanned ships), only a portion of the unmanned ships can access the pilot's information directly. This patent therefore designs a distributed observer for each unmanned ship to estimate the pilot's state and output, so that every unmanned ship knows the state and output of the unmanned aerial vehicle for the subsequent consistency control of the heterogeneous intelligent system. With $(A_0,B_0,C_0)$ the pilot system matrices according to formula (5) of the third step, the following distributed observer form is designed:

$$\dot{\xi}_i=A_0\xi_i+B_0\eta_i,\qquad \bar{y}_i=C_0\xi_i\qquad (20)$$

in which $\xi_i$, $\eta_i$, $\bar{y}_i$ are respectively the state, control input and output of the ith observer. The observer structure is shown in fig. 12, where $x_0$ denotes the actual pilot position and $\xi_i$ the position estimated by the observer.
The control signal $\eta_i$ is designed as follows:

$$\eta_i=c_1 F z_i+c_2\,\sigma(F z_i)\qquad (21)$$

with the nonlinear function σ(·) given by equation (22) (an image in the original), and the neighborhood estimation error $z_i$ defined as

$$z_i=\sum_{j=1}^{n}a_{ij}(\xi_i-\xi_j)+a_{i0}(\xi_i-x_0)\qquad (23)$$

where $a_{i0}>0$ only for the unmanned ships that can access the pilot directly. The distributed observer parameters are given by equation (24) (an image in the original), wherein P is a positive definite matrix satisfying a linear matrix inequality (also given as an image) and $\bar{u}$ is the upper bound of the control input.
The distributed observers (20) and (21) thus designed guarantee, when $c_1$, $c_2$ and the gain F satisfy the above conditions, $(A_0,B_0)$ is stabilizable and all eigenvalues of $A_0$ lie on the imaginary axis, that $z_i(t)\to 0$ as $t\to\infty$, ensuring the output of every unmanned ship agrees with the output of the pilot.
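A sketch of one observer update assuming the structure of (20)-(21); the function sigma, the gain F and the weights a_ij, a_i0 stand in for the quantities the original defines in equation images.

```python
import numpy as np

def observer_step(xi, x0, xi_all, neighbors, a_leader,
                  A0, B0, F, c1, c2, sigma, dt=0.01):
    """One forward-Euler update of ship i's distributed observer (20).
    a_leader > 0 only if ship i can access the pilot state x0 directly."""
    z = sum(xi - xi_all[j] for j in neighbors) + a_leader * (xi - x0)
    eta = c1 * (F @ z) + c2 * sigma(F @ z)   # control signal (21)
    return xi + dt * (A0 @ xi + B0 @ eta)
```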
2. Designing the event-triggered controller
Continuous sampling control is unnecessary during the tracking control that approaches the abnormal point, and wireless communication resources are very tight in a multi-drone, multi-ship scene, so the control algorithm of this patent adopts the event-triggering idea. An observer-based augmented state of the unmanned ship is defined as

$$\tilde{x}_i=\bigl[x_i^T\ \ \xi_i^T\bigr]^T$$

Combining (19) and (20) gives the augmented dynamics system of the unmanned ship:

$$\dot{\tilde{x}}_i=\tilde{A}_i\tilde{x}_i+\tilde{B}_i u_i+\tilde{G}_i\eta_i,\qquad \tilde{A}_i=\begin{bmatrix}A_i&0\\0&A_0\end{bmatrix},\quad \tilde{B}_i=\begin{bmatrix}B_i\\0\end{bmatrix},\quad \tilde{G}_i=\begin{bmatrix}0\\B_0\end{bmatrix}\qquad (25)$$

The controller $u_i$ is designed to drive $e_i$ toward 0, whereupon $z_i$ tends to 0; system (25) is then rewritten in the form (26) (given as an image in the original), and the performance function is defined as

$$V_i\bigl(\tilde{x}_i(t)\bigr)=\int_t^{\infty}\bigl(\tilde{x}_i^T\bar{Q}_i\tilde{x}_i+u_i^T R_i u_i\bigr)\,d\tau\qquad (27)$$

Between two consecutive events the augmented state is held at its sampled value, $\hat{x}_i(t)=\tilde{x}_i(t_k^i)$ for $t\in[t_k^i,t_{k+1}^i)$, where $\{t_k^i\}$ is the monotonically increasing sequence of sampling instants of the ith unmanned ship. The sampling error of the ith unmanned ship is

$$e_i(t)=\hat{x}_i(t)-\tilde{x}_i(t)$$

The following event-triggered controller is designed, holding the control between events:

$$u_i(t)=K_i\,\tilde{x}_i(t_k^i),\qquad t\in[t_k^i,t_{k+1}^i)$$

together with an event trigger function (equation (30), given as an image in the original), wherein $P_i$ is a positive definite symmetric matrix satisfying the algebraic Riccati equation

$$\tilde{A}_i^T P_i+P_i\tilde{A}_i+\bar{Q}_i-P_i\tilde{B}_i R_i^{-1}\tilde{B}_i^T P_i=0$$

and $\alpha_i$, $\beta_i$ are adjustable parameters guaranteeing that the right side of equation (30) is greater than 0.
3. A reinforcement learning algorithm realizes the cooperative consistency of the multiple unmanned ships and the unmanned aerial vehicle.
Combining the augmented state (26) with the performance function (27), the performance function becomes

$$V_i\bigl(\tilde{x}_i(t)\bigr)=\int_t^{\infty}\bigl(\tilde{x}_i^T\bar{Q}_i\tilde{x}_i+u_i^T R_i u_i\bigr)\,d\tau\qquad (31)$$

where $\bar{Q}_i=[C_i\ \ -C_0]^T Q_i\,[C_i\ \ -C_0]$.
Differentiating (31) yields the Hamiltonian, defined as

$$H_i\bigl(\tilde{x}_i,u_i,\nabla V_i\bigr)=\nabla V_i^T\,\dot{\tilde{x}}_i+\tilde{x}_i^T\bar{Q}_i\tilde{x}_i+u_i^T R_i u_i\qquad (32)$$

Equation (26) can be represented in the alternative form

$$\dot{\tilde{x}}_i=\tilde{A}_i\tilde{x}_i+\tilde{B}_i u_i^{v}+\tilde{B}_i\bigl(u_i-u_i^{v}\bigr)\qquad (33)$$

wherein $u_i$ is the exploration (behavior) policy and $u_i^{v}$ is the policy at iteration v. According to optimal control theory, the updated control at iteration v+1 is

$$u_i^{v+1}=-\tfrac{1}{2}R_i^{-1}\tilde{B}_i^T\nabla V_i^{v}\qquad (34)$$

From (32) to (34) one obtains the off-policy Bellman relation

$$\dot{V}_i^{v}=-\tilde{x}_i^T\bar{Q}_i\tilde{x}_i-\bigl(u_i^{v}\bigr)^T R_i u_i^{v}-2\bigl(u_i^{v+1}\bigr)^T R_i\bigl(u_i-u_i^{v}\bigr)\qquad (35)$$

An Actor network is designed to approximate the updated control strategy and a Critic network to approximate the value function $V_i^{v}$:

$$\hat{V}_i^{v}(\tilde{x}_i)=\bigl(W_{ci}^{v}\bigr)^T\phi_{ci}(\tilde{x}_i),\qquad \hat{u}_i^{v+1}(\tilde{x}_i)=\bigl(W_{ai}^{v+1}\bigr)^T\phi_{ai}(\tilde{x}_i)\qquad (36)$$

As shown in figs. 13 and 14, the Actor NN takes the actual state deviation as input and outputs the action driving the follower; the Critic NN takes the actual state deviation as input and outputs the evaluation value corresponding to that state. Here $\phi_{ci}$, $\phi_{ai}$ are basis functions, $W_{ci}$, $W_{ai}$ are weight vectors, $l_1$ and $l_2$ are the numbers of neurons, and $R_i=\mathrm{diag}(r_1,\ldots,r_m)$. Multiplying both sides of (35) by a suitable regressor, integrating over each sampling interval and combining with (36) yields the off-policy reinforcement learning algorithm (37) (given as an equation image in the original), in which $\omega_i(t)$ is the Bellman approximation error and the unknowns are the columns of the weight matrices. The least squares method is used to minimize the Bellman error and solve for $\hat{u}_i^{v+1}$ and $V_i^{v}$. The specific algorithm flow is shown in fig. 15.
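As a concrete analogue of the Actor-Critic iteration above, the sketch below runs least-squares policy iteration on a discrete-time LQR problem: it mirrors the off-policy idea (explore with u, evaluate the current policy from data by least squares, then improve it), but it is a discrete-time simplification, not the continuous-time integral algorithm (37).

```python
import numpy as np

def lspi_lqr(A, B, Q, R, K0, n_iters=20, n_samples=400, seed=0):
    """Least-squares policy iteration for discrete-time LQR. The 'critic'
    fits the quadratic Q-function Q_K(x,u) = z^T H z, z = [x; u], from
    off-policy transitions; the 'actor' update is K <- H_uu^{-1} H_ux.
    K0 should stabilize A - B K0 for the evaluation to be meaningful."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K, nz = K0.copy(), n + m
    iu = np.triu_indices(nz)                 # features: upper triangle of zz^T

    def feats(x, u):
        z = np.concatenate([x, u])
        M = np.outer(z, z)
        M = M + M.T - np.diag(np.diag(M))    # double off-diagonal products
        return M[iu]                         # so that h . feats = z^T H z

    for _ in range(n_iters):
        Phi, cost = [], []
        for _ in range(n_samples):
            x = rng.normal(size=n)
            u = -K @ x + 0.1 * rng.normal(size=m)   # exploratory behaviour
            x2 = A @ x + B @ u                      # one simulated step
            u2 = -K @ x2                            # on-policy successor
            Phi.append(feats(x, u) - feats(x2, u2)) # Bellman: Q(x,u)-Q(x',u')
            cost.append(x @ Q @ x + u @ R @ u)      #   equals the stage cost
        h, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
        H = np.zeros((nz, nz)); H[iu] = h
        H = H + H.T - np.diag(np.diag(H))    # rebuild the symmetric H
        K = np.linalg.solve(H[n:, n:], H[n:, :n])   # actor improvement
    return K

# e.g. a double-integrator-like plant (example values only)
A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
print(lspi_lqr(A, B, np.eye(2), np.eye(1), K0=np.array([[1.0, 1.0]])))
```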
Finally, the multiple unmanned ships approach the abnormal point along the unmanned aerial vehicle's track. After arrival, the megaphone, mechanical arm, detection devices and other mechanisms mounted on the unmanned ships are driven to further handle the abnormal point.
The unmanned ship can carry a high-capacity power supply and large precision instruments and has a strong capability for handling abnormal points, but it moves slowly, has a small field of view, and its mobility is greatly limited in areas dense with obstacles. The unmanned aerial vehicle, by contrast, moves faster and with greater spatial flexibility, need not consider the complex obstacle environment on the surface during movement, and enjoys a commanding view at high altitude. Combining these complementary advantages, the visual data acquired by the unmanned aerial vehicle acting as pilot, together with its position information, is received by the unmanned ships over the communication topology. If an emergency occurs, the multiple unmanned aerial vehicles and multiple unmanned ships cooperatively carry out the related work to handle the abnormal situation.
The heterogeneous system can handle the following water area inspection tasks (abnormal situations not yet considered can be added to the collected data set, widening the scope of application):
1. Search and rescue of drowning persons
Such as: capsized boats, drowning and similar events, for which cooperative rescue is carried out.
2. Monitoring illegal exploitation of water resources by ships
Such as: illegal fishing and exploitation of natural resources in the water area.
3. Patrol for preventing and controlling water pollution
Such as: inspecting the discharge outlets entering the water area and newly added pollution sources, e.g.: pollution from industrial and mining enterprises, urban domestic sewage, ship oil leakage, livestock and poultry breeding pollution, and the like.
4. Disposal of garbage in water area
Such as: garbage, floating objects, sundries and the like on the surface of the river.
5. Fixed point sampling inspection of water quality
Such as: water quality spot checks in key protected parts of the water area.
FIG. 16 is a depth separable convolution Conv2D structure; FIG. 17 shows a standard Conv2D structure; FIG. 18 is a residual base unit structure; FIG. 19 is a combined residual block structure; FIG. 20 is a binocular vision measurement principle;
assuming that the distance L between the left and right cameras is known (a parameter of the stereo rig), binocular vision measurement yields the two angles ∠SAB and ∠SBA, which determine the triangle SAB. The distance of the object relative to the camera baseline, i.e., the distance information of the water area target, then follows from the triangle geometry: with α = ∠SAB and β = ∠SBA,

$$D=\frac{L}{\cot\alpha+\cot\beta}=\frac{L\tan\alpha\,\tan\beta}{\tan\alpha+\tan\beta}$$
And finally, the position of the object relative to the world coordinate system can be obtained through coordinate conversion.
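A small sketch of the distance computation from the two measured base angles; the numeric inputs are example values only.

```python
import math

def stereo_distance(L, angle_A, angle_B):
    """Perpendicular distance of target S from the camera baseline AB,
    from the base angles of triangle SAB: D = L / (cot A + cot B)."""
    return L / (1.0 / math.tan(angle_A) + 1.0 / math.tan(angle_B))

print(stereo_distance(0.1, math.radians(80), math.radians(78)))  # ~0.26 m
```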
FIG. 21 illustrates the laser radar triangulation ranging principle. Laser radar:
for objects at different distances, the light spot that the emitted laser forms on the imaging sensor falls at different positions; on the other hand, the internal structure of the ranging module is fixed, so the focal length f of the receiving lens and the offset L between the optical axis of the transmitting path and the main optical axis of the receiving lens (the baseline distance) are known. By the similarity of the triangles, the distance D of the object can be calculated as

$$D=\frac{f\,L}{x}$$

where x is the measured offset of the light spot on the sensor.
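And a one-line sketch of the similar-triangles relation above; f, L and the spot offset x would come from the ranging module's calibration.

```python
def lidar_distance(f, L, x):
    """Triangulation range from the laser spot offset x on the sensor:
    by similar triangles, D = f * L / x."""
    return f * L / x
```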
Fig. 22 is a flow chart of the operation of the multi-unmanned aerial vehicle and multi-unmanned ship system.
The implementation of the present invention is expected to yield the following advantageous effects:
First, existing water area inspection is basically carried out manually; adopting this system greatly saves manpower, realizes unmanned operation, and reduces the risk personnel would face in the water area.
Second, multiple unmanned aerial vehicles and unmanned ships operating cooperatively greatly improve the capacity to handle abnormal work and avoid the insufficient working capability of a single agent.
Third, a new deep convolutional neural network is designed around the unmanned aerial vehicle's depth camera; depthwise separable convolution and multi-scale output prediction reduce the computational load so that processing runs on the Jetson Xavier NX embedded platform, and inspection abnormalities in the water area can be accurately perceived.
Fourth, the formation control design positioning multiple unmanned aerial vehicles around an abnormal point improves the cooperative working capability of the unmanned aerial vehicle and unmanned ship system.
Fifth, combining the unmanned aerial vehicle's vision with the unmanned ship's laser radar, GPS and other sensors, a new multi-information fusion algorithm realizes 3D detection and positioning of abnormal objects, which greatly improves the convenience of subsequent actuator operations.
Sixth, in the communication architecture formed by the multiple unmanned aerial vehicles and multiple unmanned ships, the unmanned ships can achieve output cooperation with the unmanned aerial vehicle even though the unmanned aerial vehicle serving as pilot is dynamic.
Seventh, combining the event-triggering and reinforcement learning ideas, an Actor-Critic neural network framework is designed to realize cooperative optimal output regulation of the unmanned aerial vehicle and unmanned ship multi-agent system.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (6)

1. A multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning, characterized in that: the system comprises a plurality of unmanned aerial vehicles and a plurality of unmanned ships;
the unmanned aerial vehicles are provided with a D435i vision mechanism and a Jetson Xavier NX computing platform;
the plurality of unmanned ships are provided with RPLIDAR-A3 laser radar and TX2 computing platforms;
the unmanned planes and the unmanned ships are provided with inertial measurement units IMU and GPS;
the unmanned planes and the unmanned ships are also provided with power batteries;
the D435i vision mechanism is in signal connection with a Jetson Xavier NX computing platform;
the RPLIDAR-A3 lidar is in signal connection with a TX2 computing platform;
the inertial measurement unit IMU and the GPS are respectively in signal connection with a Jetson Xavier NX computing platform and a TX2 computing platform;
the unmanned aerial vehicles and the unmanned ships are also provided with actuating mechanisms;
when the unmanned aerial vehicles find an abnormality, the unmanned aerial vehicle closest to the abnormal point serves as pilot, the position information of the object is perceived through the D435i vision mechanism, position information fusion is carried out at the pilot, and the plurality of unmanned aerial vehicles are guided to approach the abnormal point;
as the unmanned ships move, the abnormal point is perceived with the carried RPLIDAR-A3 laser radar, the data acquired by the unmanned ships are fused at the pilot, and the pilot sends the fused position information to the unmanned ships, so that the unmanned ships are driven to cooperatively approach the abnormal point and the abnormal point is handled with an actuating mechanism configured on an unmanned aerial vehicle or an unmanned ship.
2. The reinforcement learning-based multi-unmanned aerial vehicle and multi-unmanned ship inspection control system according to claim 1, wherein: the actuating mechanism comprises a mechanical arm, a water quality sampling instrument, a standby battery and a megaphone.
3. The reinforcement learning-based multi-unmanned aerial vehicle and multi-unmanned ship inspection control system according to claim 1, wherein: the plurality of unmanned aerial vehicles and the plurality of unmanned ships carry out water area inspection; when an unmanned aerial vehicle finds an abnormality, the unmanned aerial vehicle closest to the abnormal point serves as the pilot, the position information of the object is sensed through the D435i vision mechanism, position information fusion is carried out at the pilot, and the plurality of unmanned aerial vehicles are guided to approach the abnormal point, specifically:
the unmanned aerial vehicle closest to the abnormal point serves as the pilot and acts as the information processing platform; if a plurality of unmanned aerial vehicles find the same target, the position information is obtained by a weighted average algorithm;
the pilot guides the remaining unmanned aerial vehicles to approach the abnormal point;
as the unmanned ships approach and sense the abnormal point, the laser radar data of the unmanned ships that can identify the object are sent to the node serving as pilot; the pilot unmanned aerial vehicle fuses the received laser point cloud data with the vision mechanism data and calculates the final position of the abnormal point;
the unmanned ships are guided to approach the target abnormal point according to the position information fused at the pilot, and subsequent abnormal point processing is then carried out.
4. The reinforcement learning-based multi-unmanned aerial vehicle and multi-unmanned ship inspection control system according to claim 3, wherein: the Jetson Xavier NX computing platform constructs a water area inspection picture data set of standard size, containing an annotated training set and an annotated test set in a ratio of 3:1; the training data set is fed into a deep convolutional neural network to learn and optimize the internal structure weights;
the target detection results are scored by class, and the detection results are screened by a non-maximum suppression method: the detection frame with the highest confidence is selected as the first output bounding box; the overlap rate between each remaining detection frame and the first output bounding box is calculated, and a detection frame is discarded if its overlap rate exceeds the preset threshold and retained otherwise; the prediction frame with the highest confidence among those remaining is then selected and the procedure repeated until no detection frame remains to be screened, the retained detection frames being the target detection results in the image;
in the output result, each grid cell corresponds to 3 prior frames, and the prediction information of each prior frame comprises 4 frame position parameters, 1 object confidence score and 5 class predictions; the frame position parameters comprise the center coordinates, width and height;
calculating a loss function, and continuously adjusting model parameters by using a gradient descent method through back propagation to finally obtain an optimal network model;
inputting the images in the test set, extracting target features with the trained model, outputting multi-scale prediction results, scoring by class through a classifier, screening the detection results with the non-maximum suppression method, and finally obtaining the object recognition result of the deep convolutional neural network.
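For reference, a minimal NumPy sketch of the non-maximum suppression screening described in this claim; the box format (x1, y1, x2, y2) and the threshold default are assumptions for illustration:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-confidence box, drop boxes that overlap it beyond
    the threshold, and repeat until no candidate boxes remain."""
    order = scores.argsort()[::-1]   # indices sorted best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        # Intersection of the best box with all remaining boxes.
        x1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * \
                    (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                    (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_best + area_rest - inter)
        # Retain only boxes whose overlap with the kept box is below threshold.
        order = order[1:][iou <= iou_threshold]
    return keep
```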
5. The reinforcement learning-based multi-unmanned aerial vehicle and multi-unmanned ship inspection control system according to claim 4, wherein: if a plurality of unmanned aerial vehicles find the same target, then, exploiting the uniqueness of the target in world coordinates, the object positions $(a_i, b_i)$ sensed by combining GPS and D435i are weighted-averaged to obtain the final positioning position $(a_t, b_t)$, where n is the number of unmanned aerial vehicles that identified the abnormal point;
$$(a_t, b_t) = \Big( \sum_{i=1}^{n} w_i\, a_i,\ \sum_{i=1}^{n} w_i\, b_i \Big), \qquad \sum_{i=1}^{n} w_i = 1$$

wherein $w_i$ is the fusion weight assigned to the i-th unmanned aerial vehicle;
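A small sketch of this fusion step at the pilot; the claim only specifies a weighted average over the n drones, so the equal default weights below are an assumption:

```python
import numpy as np

def fuse_positions(positions, weights=None):
    """Fuse the (a_i, b_i) world-frame fixes from the n drones that saw the
    same target into one (a_t, b_t) estimate by weighted averaging."""
    positions = np.asarray(positions, dtype=float)   # shape (n, 2)
    if weights is None:
        weights = np.ones(len(positions))            # assumption: equal trust
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalize to sum to 1
    return weights @ positions                       # (a_t, b_t)

# e.g. three drones reporting GPS+D435i fixes of the same anomaly point:
# fuse_positions([(12.1, 4.0), (12.3, 3.9), (12.2, 4.2)])
```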
the plurality of unmanned aerial vehicles are subjected to event-triggered finite-time formation approach control, and an unmanned aerial vehicle dynamic model is established:
the unmanned aerial vehicle is a quad-rotor aircraft, and the established dynamic model takes the specific form:
$$
\begin{cases}
\ddot{x} = \dfrac{u_1}{m}\,(\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi) \\
\ddot{y} = \dfrac{u_1}{m}\,(\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi) \\
\ddot{z} = \dfrac{u_1}{m}\,\cos\phi\cos\theta - g \\
\ddot{\phi} = \dfrac{l\,u_3}{I_{xx}} + \dfrac{I_{yy} - I_{zz}}{I_{xx}}\,\dot{\theta}\dot{\psi} \\
\ddot{\theta} = \dfrac{l\,u_2}{I_{yy}} + \dfrac{I_{zz} - I_{xx}}{I_{yy}}\,\dot{\phi}\dot{\psi} \\
\ddot{\psi} = \dfrac{u_4}{I_{zz}} + \dfrac{I_{xx} - I_{yy}}{I_{zz}}\,\dot{\phi}\dot{\theta}
\end{cases}
$$
in the formula: x, y, z represent the position of the drone in space; $\phi$, $\theta$, $\psi$ represent the roll angle, pitch angle and yaw angle; m represents the mass of the drone; $I_{xx}$, $I_{yy}$, $I_{zz}$ represent the moments of inertia about the x, y, z axes, respectively; l represents the distance between the motor shaft and the center of the airframe; g represents the gravitational acceleration; $u_1$, $u_2$, $u_3$, $u_4$ represent the drone control inputs, defined as:
$$
\begin{cases}
u_1 = b\,(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) \\
u_2 = b\,(\omega_4^2 - \omega_2^2) \\
u_3 = b\,(\omega_3^2 - \omega_1^2) \\
u_4 = d\,(\omega_2^2 + \omega_4^2 - \omega_1^2 - \omega_3^2)
\end{cases}
$$
wherein: b represents the lift coefficient; d represents the torque coefficient; $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ respectively represent the rotational speeds of rotors 1, 2, 3, 4; $u_1$ represents the total lift perpendicular to the fuselage; $u_2$ represents the lift difference affecting the pitch motion of the aircraft; $u_3$ represents the lift difference affecting the roll motion of the aircraft; $u_4$ represents the torque affecting the yaw motion of the aircraft;
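A minimal simulation sketch of the translational part of the model above, under the reconstruction given; the mass, gravity and time-step values are placeholders:

```python
import numpy as np

def quadrotor_step(state, u, dt=0.01, m=1.5, g=9.81):
    """One Euler step of the translational quadrotor dynamics: position and
    velocity driven by total thrust u1 and the attitude (phi, theta, psi).
    state = [x, y, z, vx, vy, vz]; u = (u1, phi, theta, psi)."""
    x, y, z, vx, vy, vz = state
    u1, phi, theta, psi = u
    ax = (u1 / m) * (np.cos(phi) * np.sin(theta) * np.cos(psi)
                     + np.sin(phi) * np.sin(psi))
    ay = (u1 / m) * (np.cos(phi) * np.sin(theta) * np.sin(psi)
                     - np.sin(phi) * np.cos(psi))
    az = (u1 / m) * np.cos(phi) * np.cos(theta) - g
    return np.array([x + vx * dt, y + vy * dt, z + vz * dt,
                     vx + ax * dt, vy + ay * dt, vz + az * dt])
```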
since cooperative processing capability matters more in water area inspection, only the position of the unmanned aerial vehicle is controlled and its attitude is not separately controlled;
the model is linearized to obtain the following second-order integrator model:

$$\dot{p}_i = v_i, \qquad \dot{v}_i = u_i$$

wherein $p_i = [x_i, y_i, z_i]^T$, $v_i = [v_{xi}, v_{yi}, v_{zi}]^T$ and $u_i = [u_{xi}, u_{yi}, u_{zi}]^T$ respectively represent the position, velocity and control input of drone i; the matrix form is:

$$\dot{\xi}_i = A\,\xi_i + B\,u_i$$

wherein $\xi_i = [p_i^T, v_i^T]^T$, $A = \begin{bmatrix} 0_{3\times3} & I_3 \\ 0_{3\times3} & 0_{3\times3} \end{bmatrix}$, $B = \begin{bmatrix} 0_{3\times3} \\ I_3 \end{bmatrix}$;
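Written out as code, the linearized model is a per-axis double integrator; a quick sketch for checking the matrices A and B:

```python
import numpy as np

I3 = np.eye(3)
Z3 = np.zeros((3, 3))

# xi_i = [p_i; v_i] in R^6, u_i in R^3:  d(xi_i)/dt = A xi_i + B u_i
A = np.block([[Z3, I3],
              [Z3, Z3]])
B = np.vstack([Z3, I3])

def integrator_step(xi, u, dt=0.01):
    """Euler step of the linearized UAV model used for formation control."""
    return xi + (A @ xi + B @ u) * dt
```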
the position information fused at the unmanned aerial vehicle serving as pilot is issued to the unmanned aerial vehicles to be formed, and they then approach the abnormal point under the event-triggered finite-time formation control protocol, so that subsequent operations can be conveniently executed;
a formation configuration h in space is set as $h = [h_1, h_2, \ldots, h_n]$, wherein

$$h_i = [\,h_{pi}^T,\ 0^T\,]^T \in \mathbb{R}^6$$

is the desired formation offset of drone i; letting

$$\tilde{\xi}_i = \xi_i - h_i,$$

the formation problem is translated into the following consistency problem:

$$\lim_{t \to \infty} \big\| \tilde{\xi}_i(t) - \tilde{\xi}_j(t) \big\| = 0$$

wherein $i, j \in [1, n]$ and $i \neq j$ represent the drone numbers; defining the control input vector

$$\tilde{u}_i = u_i,$$

a new system model is obtained:

$$\dot{\tilde{\xi}}_i = A\,\tilde{\xi}_i + B\,\tilde{u}_i;$$
when the transformed states of all drones reach consensus, the system achieves the corresponding formation control;
letting $t_k^i$ denote the k-th trigger instant of drone i and

$$\hat{\xi}_i(t) = \tilde{\xi}_i(t_k^i), \qquad t \in [t_k^i, t_{k+1}^i),$$

the vector composed of the held states is

$$\hat{\xi} = [\hat{\xi}_1^T, \hat{\xi}_2^T, \ldots, \hat{\xi}_n^T]^T,$$

the vector composed of the current states is

$$\tilde{\xi} = [\tilde{\xi}_1^T, \tilde{\xi}_2^T, \ldots, \tilde{\xi}_n^T]^T,$$

and the error vector is defined as:

$$e_i(t) = \hat{\xi}_i(t) - \tilde{\xi}_i(t);$$
the distributed event-driven finite-time sliding mode controller is designed as follows:

[control law $u_i(t)$, given in the original as an equation image]

wherein $\alpha \in (0, 1)$, $\langle \ast \rangle^{\alpha} = |\ast|^{\alpha} \cdot \mathrm{sgn}(\ast)$, and $\beta_1, \beta_2, \beta_3, \beta_4 > 0$ are parameters of the controller; $S_i(t)$ is an integral sliding mode surface, defined as follows:

[integral sliding mode surface, given in the original as an equation image]

$S_i(t) = [S_{i1}(t), S_{i2}(t), S_{i3}(t)]^T$ is a three-dimensional column vector; the following event trigger function is designed for each drone, taking the i-th drone as an example:
[trigger function $\Delta_i(t)$, given in the original as an equation image]

wherein $\eta > 0$ is a parameter for adjusting the event function; when $\Delta_i(t) < 0$ the system operates normally; when $\Delta_i(t) \geq 0$ an event is triggered and the error vector is reset, thereby regulating the updating of the control system;

[parameter condition, given in the original as an equation image]

wherein $\lambda_2$ is the second smallest eigenvalue of the Laplacian matrix of the undirected communication topology graph formed by the plurality of unmanned aerial vehicles.
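A schematic sketch of the event-triggered update pattern this claim describes — hold the last computed control, monitor a trigger quantity, and recompute only when the trigger fires. Since the claim gives the control law and trigger function only as equation images, both the `control_law` callable and the trigger test below are stand-ins, not the patent's formulas:

```python
import numpy as np

def sgn_pow(x, alpha):
    """<x>^alpha = |x|^alpha * sgn(x), applied elementwise."""
    return np.sign(x) * np.abs(x) ** alpha

class EventTriggeredController:
    """Holds the control input between trigger instants; recomputes it only
    when the monitored error grows past the trigger threshold."""
    def __init__(self, control_law, eta=0.5):
        self.control_law = control_law   # maps sliding variable S -> u
        self.eta = eta                   # trigger sensitivity parameter
        self.S_held = None               # sliding variable at last trigger
        self.u_held = None               # control held since last trigger

    def update(self, S):
        # Stand-in trigger test: fire when the drift of S since the last
        # event exceeds eta times the held magnitude.
        if self.S_held is None or \
           np.linalg.norm(S - self.S_held) >= self.eta * np.linalg.norm(self.S_held):
            self.S_held = S.copy()
            self.u_held = self.control_law(S)   # event: recompute control
        return self.u_held                      # otherwise hold previous u
```

Between events the actuator keeps the last value, so each drone's controller updates (and communicates) only when its own trigger fires, which is what saves computation and bandwidth in the multi-agent formation.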
6. The reinforcement learning-based multi-unmanned aerial vehicle and multi-unmanned ship inspection control system according to claim 5, wherein: when the D435i vision mechanism and the RPLIDAR-A3 laser radar cooperate to detect and locate a 3D target, a deep convolutional neural network detects the two-dimensional object region of the abnormal point in the RGB image and classifies the object; the abnormal points to be processed are determined from the category library; using the known depth camera projection relation combined with the laser radar point cloud data, the view frustum of the 3D search space corresponding to the 2D bounding box of the abnormal point object is obtained (the near and far planes are specified by the depth sensor range); all points within the view frustum form the frustum point cloud.
the principle of nearest neighbor clustering rests on the continuity of the surface of a single object, i.e. the reflection points of the object form a continuous point set; within the formed view frustum, the point cloud formed from the abnormal point's depth map serves as reference points for segmenting the 3D point cloud data in the frustum; the 3D point cloud data of the abnormal point are thus obtained and the center position of the point cloud is estimated:
$$C_{cluster} = \frac{1}{N} \sum_{i=1}^{N} p_i$$

wherein $p_i$, $i = 1, \ldots, N$, are the segmented 3D points of the abnormal object;
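A sketch of the frustum extraction and centroid estimate just described; the intrinsic-matrix projection and the depth limits are simplified assumptions:

```python
import numpy as np

def frustum_centroid(points, box2d, K, depth_range=(0.3, 20.0)):
    """Keep lidar points that lie within the depth sensor's range and
    project inside the 2D detection box, then return their centroid as a
    first estimate of the object center.

    points: (N, 3) points already expressed in the camera frame
    box2d:  (xmin, ymin, xmax, ymax) in pixels
    K:      3x3 camera intrinsic matrix
    """
    xmin, ymin, xmax, ymax = box2d
    z = points[:, 2]
    in_depth = (z >= depth_range[0]) & (z <= depth_range[1])
    pts = points[in_depth]
    uv = (K @ pts.T).T                        # project to the image plane
    u, v = uv[:, 0] / uv[:, 2], uv[:, 1] / uv[:, 2]
    in_box = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    frustum_pts = pts[in_box]
    return frustum_pts, frustum_pts.mean(axis=0)   # points, C_cluster
```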
the real center of the whole object is estimated with a T-Net network, and the coordinates are then transformed so that the predicted center becomes the origin; the center of the bounding box is corrected by a residual method, and the final object center is calculated by combining the center residual of the bounding box estimation network, the preceding center residual from T-Net, and the centroid obtained by the nearest neighbor clustering algorithm:
$$C_{pred} = C_{cluster} + \Delta C_{T\text{-}Net} + \Delta C_{box\text{-}net}$$
for the point cloud of the selected object in the 3D point cloud data, the bounding box is predicted with the bounding box estimation network, the output being the parameters of the three-dimensional bounding box, i.e. the bounding box center $(c_x, c_y, c_z)$, size $(h, w, l)$ and yaw angle $\theta$; the total optimization loss of the two networks is:
Lgeneral assembly=λ(Lc1-reg+Lc2-reg+Lh-cls+Lh-reg+Ls-cls+Ls-reg+γLcorner)
Wherein: l isc1-regAnd Lc2-regRespectively estimating the coordinate translation loss of the T-Net and the loss generated by a judgment center of the network for the bounding box; λ, γ are model parameters; l ish-clsAnd Lh-regRespectively estimating the category loss and the regression loss of the corresponding orientation of the 3D bounding box; l is s-clsAnd Ls-regClass losses and regression losses representing box sizes, respectively; the Softmax method is used in the category determination process, and the smooth-l method is used in the regression problem1Loss; since a bounding box information is determined by both size and angle, LcornerThe angular loss quantifies this, and the formula is:
$$L_{corner} = \sum_{i=1}^{NS} \sum_{j=1}^{NH} \delta_{ij}\, \min\Big\{ \sum_{k=1}^{8} \big\| P_k^{ij} - P_k^{\ast} \big\|,\ \sum_{k=1}^{8} \big\| P_k^{ij} - P_k^{\ast\ast} \big\| \Big\}$$
wherein NS represents the number of bounding boxes of different sizes and NH the number of bounding boxes of different orientations; $\delta_{ij}$ selects the ground-truth size/heading combination, $P_k^{ij}$ are the corner points of the box with size i and heading j, and $P_k^{\ast}$, $P_k^{\ast\ast}$ are the corner points of the ground-truth box and of the ground-truth box rotated by $\pi$ about the vertical axis; according to the finally obtained position of the 3D detection frame relative to the camera, the position of the object in the world coordinate system is obtained by combining the coordinate transformation.
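A sketch of the corner loss under the assumption, consistent with the symbols above, that it compares the eight corners of the predicted box against the ground-truth box and its π-rotated twin; the corner ordering and the z-up yaw convention are illustrative choices:

```python
import numpy as np

def box_corners(center, size, heading):
    """Eight corners of a 3D box with yaw rotation `heading` about z."""
    cx, cy, cz = center
    h, w, l = size
    # corner offsets in the box frame
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    z = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2.0
    c, s = np.cos(heading), np.sin(heading)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (R @ np.vstack([x, y, z])).T + np.array([cx, cy, cz])

def corner_loss(pred, gt):
    """Sum of corner distances to the ground-truth box or its pi-rotated
    twin, whichever is smaller (resolves the heading ambiguity)."""
    p = box_corners(*pred)                               # pred = (c, s, theta)
    g = box_corners(gt[0], gt[1], gt[2])
    g_flip = box_corners(gt[0], gt[1], gt[2] + np.pi)    # flipped heading
    d = np.linalg.norm(p - g, axis=1).sum()
    d_flip = np.linalg.norm(p - g_flip, axis=1).sum()
    return min(d, d_flip)
```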
and a dynamic model of the unmanned ship is likewise established.
CN202111020276.2A 2021-09-01 2021-09-01 Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning Active CN113671994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111020276.2A CN113671994B (en) 2021-09-01 2021-09-01 Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN113671994A (en) 2021-11-19
CN113671994B (en) 2024-03-05

Family

ID=78548038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111020276.2A Active CN113671994B (en) 2021-09-01 2021-09-01 Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113671994B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017035516A1 (en) * 2015-08-26 2017-03-02 Peloton Technology, Inc. Devices systems and methods for vehicle monitoring and platooning
CN106405040A (en) * 2016-11-17 2017-02-15 苏州航天系统工程有限公司 Unmanned-device-based water quality patrolling, contaminant originating system and method thereof
CN108681321A (en) * 2018-04-10 2018-10-19 华南理工大学 A kind of undersea detection method that unmanned boat collaboration is formed into columns
CN110751360A (en) * 2019-08-30 2020-02-04 广州睿启智能科技有限公司 Unmanned ship region scheduling method
CN110827535A (en) * 2019-10-30 2020-02-21 中南大学 Nonlinear vehicle queue cooperative self-adaptive anti-interference longitudinal control method
CN111968128A (en) * 2020-07-10 2020-11-20 北京航空航天大学 Unmanned aerial vehicle visual attitude and position resolving method based on image markers
CN112422783A (en) * 2020-10-10 2021-02-26 广东华南水电高新技术开发有限公司 Unmanned aerial vehicle intelligent patrol system based on parking apron cluster
CN112904388A (en) * 2020-12-05 2021-06-04 哈尔滨工程大学 Fusion positioning tracking control method based on navigator strategy
CN112774073A (en) * 2021-02-05 2021-05-11 燕山大学 Unmanned aerial vehicle guided multi-machine cooperation fire extinguishing method and fire extinguishing system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ATTA DEBABRATA, et al.: "Decentralized formation control of multiple autonomous underwater vehicles", International Journal of Robotics & Automation, vol. 28, no. 4, 20 November 2013 (2013-11-20), pages 303-310 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114281101A (en) * 2021-12-03 2022-04-05 南京航空航天大学 Unmanned aerial vehicle and holder interference source joint search method based on reinforcement learning
CN114281101B (en) * 2021-12-03 2023-11-03 南京航空航天大学 Unmanned aerial vehicle and cradle head interference source joint search method based on reinforcement learning
CN114545979A (en) * 2022-03-16 2022-05-27 哈尔滨逐宇航天科技有限责任公司 Aircraft intelligent sliding mode formation control method based on reinforcement learning
CN115877718A (en) * 2023-02-23 2023-03-31 北京航空航天大学 Data-driven heterogeneous missile formation switching communication topology cooperative control method
CN116382328A (en) * 2023-03-09 2023-07-04 南通大学 Dam intelligent detection method based on cooperation of multiple robots in water and air
CN116382328B (en) * 2023-03-09 2024-04-12 南通大学 Dam intelligent detection method based on cooperation of multiple robots in water and air
CN116295507A (en) * 2023-05-26 2023-06-23 南京师范大学 Laser inertial odometer optimization method and system based on deep learning
CN116295507B (en) * 2023-05-26 2023-08-15 南京师范大学 Laser inertial odometer optimization method and system based on deep learning
CN116839570A (en) * 2023-07-13 2023-10-03 安徽农业大学 Crop interline operation navigation method based on sensor fusion target detection
CN116839570B (en) * 2023-07-13 2023-12-01 安徽农业大学 Crop interline operation navigation method based on sensor fusion target detection

Also Published As

Publication number Publication date
CN113671994B (en) 2024-03-05

Similar Documents

Publication Publication Date Title
CN113671994B (en) Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning
CN110782481B (en) Unmanned ship intelligent decision-making method and system
Bircher et al. Three-dimensional coverage path planning via viewpoint resampling and tour optimization for aerial robots
Tisdale et al. Autonomous UAV path planning and estimation
Lin et al. A robust real-time embedded vision system on an unmanned rotorcraft for ground target following
McGee et al. Obstacle detection for small autonomous aircraft using sky segmentation
CN112558608B (en) Vehicle-mounted machine cooperative control and path optimization method based on unmanned aerial vehicle assistance
CN110632941A (en) Trajectory generation method for target tracking of unmanned aerial vehicle in complex environment
Jin et al. On-board vision autonomous landing techniques for quadrotor: A survey
Pinto et al. An autonomous surface-aerial marsupial robotic team for riverine environmental monitoring: Benefiting from coordinated aerial, underwater, and surface level perception
Wang et al. Autonomous flights in dynamic environments with onboard vision
CN113228043A (en) System and method for obstacle detection and association of mobile platform based on neural network
Chen et al. Real-time identification and avoidance of simultaneous static and dynamic obstacles on point cloud for UAVs navigation
Mittal et al. Vision-based autonomous landing in catastrophe-struck environments
Chen et al. A novel unmanned surface vehicle with 2d-3d fused perception and obstacle avoidance module
Lu et al. Perception and avoidance of multiple small fast moving objects for quadrotors with only low-cost RGBD camera
Lee et al. Landing Site Inspection and Autonomous Pose Correction for Unmanned Aerial Vehicles
Harun et al. Collision avoidance control for Unmanned Autonomous Vehicles (UAV): Recent advancements and future prospects
Bui et al. A uav exploration method by detecting multiple directions with deep learning
Abbas et al. Autonomous canal following by a micro-aerial vehicle using deep CNN
Venna et al. Application of image-based visual servoing on autonomous drones
Sanchez-Lopez et al. Deep learning based semantic situation awareness system for multirotor aerial robots using LIDAR
Duan et al. Integrated localization system for autonomous unmanned aerial vehicle formation flight
Capi et al. Application of deep learning for drone obstacle avoidance and goal directed navigation
Bertoncini et al. Fixed-Wing UAV Path Planning and Collision Avoidance using Nonlinear Model Predictive Control and Sensor-based Cloud Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant