Disclosure of Invention
The invention aims to provide a method and a system for detecting and positioning a small unmanned aerial vehicle-mounted target under a complex background, so as to solve the following problems in the prior art: under a complex background, the target is difficult to separate from the background, so the positioning accuracy is low; and the imaging effect of a remote target is poor, the target occupies few pixel points in the image and is therefore difficult to identify accurately, so that accurate detection and positioning cannot be realized.
To solve the above technical problem, an embodiment of the present invention provides the following solutions:
In one aspect, a method for detecting and positioning a small unmanned aerial vehicle-mounted target under a complex background is provided, which comprises the following steps:
acquiring a video image of a small target and attitude data of an unmanned aerial vehicle and a pan-tilt head;
preprocessing the acquired video image by a multichannel optimal-selection adaptive guided filtering defogging algorithm;
detecting the preprocessed video image by using a limited pixel target space-semantic information fusion detection algorithm to obtain the pixel position of the small target;
performing associated mapping on the pixel positions of the small targets in the video images and the geographic coordinates of the unmanned aerial vehicle by using a spatial multi-parameter pixel mapping geographic coordinate positioning algorithm to obtain the longitude, latitude and height of the small targets in a geographic coordinate system;
and displaying the small target in the unmanned aerial vehicle-mounted video image and the longitude and latitude and the height of the small target under the geographic coordinate system in real time.
Preferably, the preprocessing of the acquired video image by the multichannel optimal-selection adaptive guided filtering defogging algorithm specifically includes:
performing channel segmentation on the acquired video image to obtain a dark channel image, a bright channel image and RGB three-channel images;
estimating a local atmospheric value according to the dark channel image and the bright channel image;
carrying out weighted guided filtering on the RGB three-channel image, and then estimating local transmittance;
and obtaining a defogged image by combining the estimated local atmospheric value and local transmittance with a fog interference model.
Preferably, the detecting of the preprocessed image by using the limited pixel target space-semantic information fusion detection algorithm specifically includes:
constructing a space-semantic feature extraction network, extracting multilayer spatial feature information by pooling and normalizing the region of interest, and extracting semantic feature information by cyclic convolution;
and performing regression and classification on the bounding box of the target through two fully-connected layers to obtain the detection result of the video image.
Preferably, the mapping of the pixel position of the small target in the video image and the geographic coordinate of the unmanned aerial vehicle by using the spatial multi-parameter pixel mapping geographic coordinate positioning algorithm specifically includes:
obtaining the pixel position of the small target in the video image according to the detection result of the video image;
taking the height of the unmanned aerial vehicle at the ground position as the geodetic height, mapping the pixel position of the small target in the video image to the geographic coordinates of the unmanned aerial vehicle, and solving the longitude and latitude of the small target in a geographic coordinate system;
and determining the geodetic height of the small target according to the calculated longitude and latitude, comparing the obtained geodetic height of the small target with the height of the unmanned aerial vehicle at the ground position for correction, and determining the longitude and latitude and the height of the small target.
In one aspect, a system for detecting and positioning a small unmanned aerial vehicle-mounted target under a complex background is provided, which includes:
the target vision enhancement subsystem is used for acquiring a video image of a small target and attitude data of the unmanned aerial vehicle and the pan-tilt head, and preprocessing the acquired video image through a multichannel optimal-selection adaptive guided filtering defogging algorithm;
the deep learning airborne detection positioning subsystem is used for detecting the preprocessed video image by utilizing a limited pixel target space-semantic information fusion detection algorithm to obtain the pixel position of a small target; the pixel position of the small target in the video image is associated with the geographic coordinate of the unmanned aerial vehicle by utilizing a spatial multi-parameter pixel mapping geographic coordinate positioning algorithm, so that the longitude, the latitude and the height of the small target in a geographic coordinate system are obtained;
and the data return and ground station subsystem is used for transmitting and displaying the small target in the unmanned aerial vehicle video image and the longitude and latitude and the height of the small target under the geographic coordinate system in real time.
Preferably, the target vision enhancement subsystem is specifically configured to:
performing channel segmentation on the acquired video image to obtain a dark channel image, a bright channel image and RGB three-channel images;
estimating a local atmospheric value according to the dark channel image and the bright channel image;
carrying out weighted guided filtering on the RGB three-channel image, and then estimating local transmittance;
and obtaining a defogged image by combining the estimated local atmospheric value and local transmittance with a fog interference model.
Preferably, the deep learning airborne detection positioning subsystem is specifically configured to:
constructing a space-semantic feature extraction network, extracting multilayer spatial feature information by pooling and normalizing the region of interest, and extracting semantic feature information by cyclic convolution;
and performing regression and classification on the bounding box of the target through two fully-connected layers to obtain the detection result of the video image.
Preferably, the deep learning airborne detection positioning subsystem is further specifically configured to:
obtaining the pixel position of the small target in the video image according to the detection result of the video image;
taking the height of the unmanned aerial vehicle at the ground position as the geodetic height, mapping the pixel position of the small target in the video image to the geographic coordinates of the unmanned aerial vehicle, and solving the longitude and latitude of the small target in a geographic coordinate system;
and determining the geodetic height of the small target according to the calculated longitude and latitude, comparing the obtained geodetic height of the small target with the height of the unmanned aerial vehicle at the ground position for correction, and determining the longitude and latitude and the height of the small target.
Preferably, the detection and positioning system comprises a camera, a pan-tilt head, an inertial measurement unit, a complex background image processing and spatial attitude calculation co-processing unit, and a high-performance computing unit;
the complex background image processing and spatial attitude calculation coprocessing unit comprises an I/O module, a clock control circuit, a JTAG controller and a basic programmable logic unit;
the camera, the pan-tilt head and the inertial measurement unit are connected with the I/O module; the I/O module is connected with the clock control circuit, the JTAG controller and the basic programmable logic unit; the clock control circuit and the JTAG controller are connected with the basic programmable logic unit; the basic programmable logic unit is connected with the high-performance computing unit; and the high-performance computing unit is connected with the data return and ground station subsystem.
Preferably, the inertial measurement unit includes a three-axis gyroscope, a three-axis accelerometer, a three-axis geomagnetic sensor, a barometer, and a GPS module.
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
In view of the fact that local detail loss of the traditional defogging algorithm under a complex background degrades the detection of limited pixel small targets, the invention provides a multichannel optimal-selection adaptive guided filtering defogging algorithm, which removes stray noise and retains the integral feature information of the limited pixel small target. In view of the facts that small targets in unmanned airborne video occupy few pixels and are difficult to separate from the background, a limited pixel target space-semantic fusion detection algorithm is provided, realizing effective feature extraction and detection of small targets under a complex background. In view of the facts that the central position of a small target is difficult to measure and that target positioning precision and real-time performance are poor, a spatial multi-parameter pixel mapping geographic coordinate positioning algorithm is provided, which performs associated mapping between the pixel position of the small target in the video image and the geographic coordinates of the unmanned aerial vehicle to obtain the longitude and latitude and the height of the small target under a geographic coordinate system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a method for detecting and positioning an unmanned aerial vehicle-mounted small target under a complex background. As shown in fig. 1, the method comprises the following steps:
acquiring a video image of a small target and attitude data of an unmanned aerial vehicle and a pan-tilt head;
preprocessing the acquired video image by a multichannel optimal-selection adaptive guided filtering defogging algorithm;
detecting the preprocessed video image by using a limited pixel target space-semantic information fusion detection algorithm to obtain the pixel position of the small target;
performing associated mapping on the pixel position of the small target in the video image and the geographic coordinates of the unmanned aerial vehicle by using a spatial multi-parameter pixel mapping geographic coordinate positioning algorithm to obtain the longitude and latitude and the height of the small target in a geographic coordinate system (WGS-84);
and displaying the small target in the unmanned aerial vehicle-mounted video image and the longitude and latitude and the height of the small target under the geographic coordinate system in real time.
In view of the fact that local detail loss of the traditional defogging algorithm under a complex background degrades the detection of limited pixel small targets, the invention provides a multichannel optimal-selection adaptive guided filtering defogging algorithm, which removes stray noise and retains the integral feature information of the limited pixel small target. In view of the facts that small targets in unmanned airborne video occupy few pixels and are difficult to separate from the background, a limited pixel target space-semantic fusion detection algorithm is provided, realizing effective feature extraction and detection of small targets under a complex background. In view of the facts that the central position of a small target is difficult to measure and that target positioning precision and real-time performance are poor, a spatial multi-parameter pixel mapping geographic coordinate positioning algorithm is provided, which performs associated mapping between the pixel position of the small target in the video image and the geographic coordinates of the unmanned aerial vehicle to obtain the longitude and latitude and the height of the small target under a geographic coordinate system.
Further, as shown in fig. 2, the preprocessing of the acquired video image by the multichannel optimal-selection adaptive guided filtering defogging algorithm specifically includes:
performing channel segmentation on the acquired video image to obtain a dark channel image, a bright channel image and RGB three-channel images;
estimating a local atmospheric value according to the dark channel image and the bright channel image;
carrying out weighted guided filtering on the RGB three-channel image, and then estimating local transmittance;
and obtaining a defogged image by combining the estimated local atmospheric value and local transmittance with a fog interference model.
In different scenarios, the detection algorithm may be subject to many different types of interference: in an urban complex background it can be interfered with by artificial structures such as buildings, and in a mountain environment it may be disturbed by vegetation. For such complex backgrounds, the invention constructs small target data sets of various backgrounds such as highways, cities, mountains and villages, and trains the detection network on them, enhancing the generalization capability of the unmanned aerial vehicle-mounted small target detection and positioning model under complex backgrounds.
The unmanned aerial vehicle is easily interfered with by aerial fog during flight, and local detail loss of the traditional defogging algorithm under a complex background degrades the detection of limited pixel small targets.
Specifically, the invention utilizes a classical fog model and combines prior knowledge to solve for the atmospheric light component and the transmittance, estimating the transmittance according to the fog pattern forming model and the dark channel prior theory in computer vision. The fog interference model is:
I(x)=J(x)t(x)+A(1-t(x))
the above formula is slightly modified to the following formula:
according to the dark channel prior theory, the method comprises the following steps:
therefore, the transmittance can be derived
An estimate of (2).
The above derivation assumes that the atmospheric light is known; in practice it must be estimated. The traditional algorithm takes, as the atmospheric light value, the maximum pixel value at the corresponding positions in the original hazy image, i.e., the positions of the brightest pixels in the dark channel. However, if a background area in the image is too bright or contains a high-brightness object, the atmospheric light estimate may approach 255, which causes color cast or mottling in the image after defogging. The invention provides a multichannel optimal-selection guided filtering algorithm, which estimates the atmospheric light value from the bright channel and the dark channel respectively to obtain the final atmospheric light, and then solves for the fog-free image, thereby enhancing the structure and detail information of the target.
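The defogging steps above can be sketched as follows. This is a simplified illustration, not the patented implementation: the patch-based min/max filtering and the weighted guided filtering of the transmittance are omitted, the fusion of the bright-channel and dark-channel atmospheric light estimates is assumed to be a plain average, and all names and parameters (`omega`, `t0`) are hypothetical.

```python
import numpy as np

def estimate_atmospheric_light(image):
    """Estimate atmospheric light A from both dark and bright channels.

    `image` is an HxWx3 float array in [0, 1]. Per-pixel channel min/max
    stand in for the dark/bright channels (patch size 1 for brevity).
    """
    dark = image.min(axis=2)
    bright = image.max(axis=2)
    # Mean colour of the pixels with the top 0.1% dark-channel values
    # (traditional choice), and likewise for the bright channel; the two
    # estimates are then averaged (assumed fusion rule).
    k = max(1, int(dark.size * 0.001))
    flat = image.reshape(-1, 3)
    a_dark = flat[np.argsort(dark.ravel())[-k:]].mean(axis=0)
    a_bright = flat[np.argsort(bright.ravel())[-k:]].mean(axis=0)
    return (a_dark + a_bright) / 2.0

def defog(image, omega=0.95, t0=0.1):
    """Recover a fog-free image via the fog model I = J*t + A*(1 - t)."""
    a = estimate_atmospheric_light(image)
    # Transmittance from the dark channel prior: t = 1 - omega * min_c I^c/A^c;
    # omega keeps a trace of fog for naturalness, t0 avoids division blow-up.
    t = np.clip(1.0 - omega * (image / a).min(axis=2), t0, 1.0)
    # Invert the fog model to obtain J = (I - A) / t + A.
    return np.clip((image - a) / t[..., None] + a, 0.0, 1.0)
```

In the full algorithm, the weighted guided filtering would refine `t` between the prior estimate and the clipping step.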
Further, as shown in fig. 3, the detecting of the preprocessed image by using the limited pixel target space-semantic information fusion detection algorithm specifically includes:
constructing a space-semantic feature extraction network, extracting multilayer spatial feature information by pooling and normalizing the region of interest, and extracting semantic feature information by cyclic convolution;
and performing regression and classification on the bounding box of the target through two fully-connected layers to obtain the detection result of the video image.
Typically, a target whose size is smaller than 4 × 4 pixels may be considered a small target. In this case, the imaging area of the target is small; due to factors such as the long imaging distance of the unmanned aerial vehicle and the small geometric size of the target, the proportion of the target in the imaging field of view is small, and the target pixel proportion is below 0.1% of the whole image. For images of such small targets, the invention provides a limited pixel target space-semantic information fusion detection algorithm, which fuses the spatial information and the semantic information of the target and enhances the feature expression of the target, enabling feature extraction of small targets under a complex background and thereby achieving accurate detection of small targets under a complex background.
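The two size criteria above can be expressed as a trivial predicate. The helper below is a hypothetical illustration, not part of the patented algorithm; the thresholds mirror the 4 × 4 pixel extent and 0.1% area fraction mentioned in the text.

```python
def is_small_target(box_w, box_h, img_w, img_h,
                    max_side=4, max_fraction=0.001):
    """Return True if a detection box qualifies as a 'small target'.

    A box counts as small when its extent is below `max_side` pixels in
    both dimensions, or when its area is below `max_fraction` (0.1%)
    of the whole image.
    """
    tiny = box_w < max_side and box_h < max_side
    sparse = (box_w * box_h) / float(img_w * img_h) < max_fraction
    return tiny or sparse
```

For a 1080p frame, even a 30 × 30 pixel box (900 of ~2 million pixels) still falls under the 0.1% area criterion.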
Further, the mapping of the pixel positions of the small targets in the video image and the geographic coordinates of the unmanned aerial vehicle by using the spatial multi-parameter pixel mapping geographic coordinate positioning algorithm specifically comprises:
obtaining the pixel position of the small target in the video image according to the detection result of the video image;
taking the height of the unmanned aerial vehicle at the ground position as the geodetic height, mapping the pixel position of the small target in the video image to the geographic coordinates of the unmanned aerial vehicle, and solving the longitude and latitude of the small target in a geographic coordinate system;
and determining the geodetic height of the small target according to the calculated longitude and latitude, comparing the obtained geodetic height of the small target with the height of the unmanned aerial vehicle at the ground position for correction, and determining the longitude and latitude and the height of the small target.
Specifically, as shown in fig. 4, the pixel position of the small target in the video image is obtained according to the detection result of the video image; the position of the small target in the physical coordinate system of the video image is obtained by conversion according to the resolution and pixel size of the camera; the position of the small target in the camera coordinate system is then obtained according to the focal length of the camera, and the position of the small target in the geographic coordinate system of the unmanned aerial vehicle is obtained by combining the pitch angle, yaw angle and roll angle of the pan-tilt head. During flight, the unmanned aerial vehicle obtains the pitch angle, yaw angle and roll angle of the airframe in real time, and its position in the camera coordinate system can be determined from this information. The longitude and latitude and the height of the unmanned aerial vehicle at the initial position determine the position of the airborne small target in the geodetic coordinate system, and the longitude and latitude of the small target in the geographic coordinate system can then be determined according to the Earth parameters.
Aiming at the problems of low positioning precision and poor real-time performance of airborne small targets in the prior art, the invention provides a geographic coordinate positioning algorithm using spatial multi-parameter pixel mapping. The algorithm is a real-time airborne small target positioning algorithm based on projection coordinate transformation: the attitude information of the aircraft is combined with the video image, the pixel at the center point of the small target in the video image is taken as the pixel position of the airborne small target, and this pixel position is combined with the actual ground height. The airborne small target is detected by deep learning in the field of unmanned aerial vehicle vision and positioned by spatial attitude calculation, and is made to correspond to its position in the real scene, so that the longitude and latitude and the height of the small target under the geographic coordinate system are obtained.
As a specific embodiment of the invention, the detailed flow of the geographic coordinate positioning algorithm using spatial multi-parameter pixel mapping is as follows: the position information of the small target in the image, namely its coordinates in the image pixel coordinate system (with the origin at the upper left corner of the image), is obtained from the acquired video image; these coordinates are transformed into the coordinates of the small target in the image physical coordinate system (with the origin at the image center), and then into a position in the camera coordinate system (with the origin at the center of the optical axis of the camera); the camera coordinate system is transformed into the pan-tilt coordinate system (with the origin at the center of the optical axis) by rotation and translation, giving the coordinates of the small target in the pan-tilt coordinate system; the geographic position of the small target is then estimated, the digital elevation information of the target area is extracted, the unmanned aerial vehicle flies around the small target, and the height information of the unmanned aerial vehicle is taken as the initial ground height; finally, conversion is carried out among the airborne coordinate system (with the origin at the image point of the camera), the geodetic coordinate system and the geographic coordinate system to obtain the longitude, latitude and height of the target at its actual geographic position.
The small unmanned aerial vehicle-mounted target is positioned according to the ground height to obtain the longitude and latitude and the height of the target; according to the digital elevation information of the area, the geodetic height at this moment is compared with the height of the target, and the height of the target can be determined within a certain error range.
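The coordinate transformation chain above can be sketched under strong simplifying assumptions. The function below is a hypothetical illustration, not the patented algorithm: it assumes zero roll, a flat ground plane at a known height, a pinhole camera, a pitch angle measured downward from the horizontal, a yaw/bearing measured clockwise from north, and a small-offset spherical-Earth conversion from metres to degrees; all parameter names are invented for the sketch, and the digital-elevation correction loop is omitted.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS-84 equatorial radius in metres

def pixel_to_latlon(px, py, img_w, img_h, pixel_size, focal_len,
                    gimbal_pitch, gimbal_yaw, uav_lat, uav_lon, uav_alt,
                    ground_height=0.0):
    """Map a target pixel to (lat, lon, height) under the assumptions above."""
    # Pixel coordinates (origin at upper-left) -> image physical
    # coordinates (origin at image centre), in metres on the sensor.
    x = (px - img_w / 2.0) * pixel_size
    y = (py - img_h / 2.0) * pixel_size
    # Angular offsets of the target ray from the optical axis.
    dx = math.atan2(x, focal_len)
    dy = math.atan2(y, focal_len)
    # Depression angle of the ray below horizontal, and the horizontal
    # distance to where the ray meets the ground plane.
    depression = gimbal_pitch + dy
    height_above_ground = uav_alt - ground_height
    ground_dist = height_above_ground / math.tan(depression)
    # Decompose along north/east using the bearing of the ray.
    bearing = gimbal_yaw + dx
    north = ground_dist * math.cos(bearing)
    east = ground_dist * math.sin(bearing)
    # Metres -> degrees (small-offset spherical approximation).
    lat = uav_lat + math.degrees(north / EARTH_RADIUS)
    lon = uav_lon + math.degrees(
        east / (EARTH_RADIUS * math.cos(math.radians(uav_lat))))
    return lat, lon, ground_height
```

In the full algorithm, the returned height would then be checked against the digital elevation model and the mapping re-run with the corrected ground height.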
Correspondingly, an embodiment of the present invention further provides a system for detecting and positioning a small target on board an unmanned aerial vehicle in a complex background, as shown in fig. 5, the system includes:
the target vision enhancement subsystem is used for acquiring a video image of a small target and attitude data of the unmanned aerial vehicle and the pan-tilt head, and preprocessing the acquired video image through a multichannel optimal-selection adaptive guided filtering defogging algorithm;
the deep learning airborne detection positioning subsystem is used for detecting the preprocessed video image by utilizing a limited pixel target space-semantic information fusion detection algorithm to obtain the pixel position of a small target; the pixel position of the small target in the video image is associated with the geographic coordinate of the unmanned aerial vehicle by utilizing a spatial multi-parameter pixel mapping geographic coordinate positioning algorithm, so that the longitude, the latitude and the height of the small target in a geographic coordinate system are obtained;
and the data return and ground station subsystem is used for transmitting and displaying the small target in the unmanned aerial vehicle video image and the longitude and latitude and the height of the small target under the geographic coordinate system in real time.
The unmanned aerial vehicle-mounted small target detection and positioning system under the complex background can be used for carrying out real-time online accurate detection and positioning on the unmanned aerial vehicle-mounted small target under the complex background.
Further, the target vision enhancement subsystem is specifically configured to:
performing channel segmentation on the acquired video image to obtain a dark channel image, a bright channel image and RGB three-channel images;
estimating a local atmospheric value according to the dark channel image and the bright channel image;
carrying out weighted guided filtering on the RGB three-channel image, and then estimating local transmittance;
and obtaining a defogged image by combining the estimated local atmospheric value and local transmittance with a fog interference model.
The target vision enhancement subsystem and the multichannel optimal-selection adaptive guided filtering defogging algorithm are deeply integrated, so that the algorithm is implemented in hardware, improving the processing capability of the hardware and the adaptability of the algorithm.
Further, the deep learning airborne detection positioning subsystem is specifically configured to:
constructing a space-semantic feature extraction network, extracting multilayer spatial feature information by pooling and normalizing the region of interest, and extracting semantic feature information by cyclic convolution;
and performing regression and classification on the bounding box of the target through two fully-connected layers to obtain the detection result of the video image.
Further, the deep learning airborne detection positioning subsystem is further specifically configured to:
obtaining the pixel position of the small target in the video image according to the detection result of the video image;
taking the height of the unmanned aerial vehicle at the ground position as the geodetic height, mapping the pixel position of the small target in the video image to the geographic coordinates of the unmanned aerial vehicle, and solving the longitude and latitude of the small target in a geographic coordinate system;
and determining the geodetic height of the small target according to the calculated longitude and latitude, comparing the obtained geodetic height of the small target with the height of the unmanned aerial vehicle at the ground position for correction, and determining the longitude and latitude and the height of the small target.
A specific structure of the unmanned aerial vehicle-mounted small target detection and positioning system under the complex background is shown in fig. 6. The system comprises a camera, a pan-tilt head, an inertial measurement unit, a complex background image processing and spatial attitude calculation co-processing unit, and a high-performance computing unit;
the complex background image processing and spatial attitude calculation coprocessing unit comprises an I/O module, a clock control circuit, a JTAG controller and a basic programmable logic unit;
the camera, the pan-tilt head and the inertial measurement unit are connected with the I/O module; the I/O module is connected with the clock control circuit, the JTAG controller and the basic programmable logic unit; the clock control circuit and the JTAG controller are connected with the basic programmable logic unit; the basic programmable logic unit is connected with the high-performance computing unit; and the high-performance computing unit is connected with the data return and ground station subsystem.
Specifically, the inertial measurement unit (IMU) is an integrated chip comprising a three-axis gyroscope, a three-axis accelerometer, a three-axis geomagnetic sensor, a barometer and a GPS module, and its role in the unmanned aerial vehicle is to perceive changes of attitude. The three-axis gyroscope measures the inclination angle of the unmanned aerial vehicle itself; the three-axis accelerometer measures the acceleration of the aircraft along the X, Y and Z axes; the geomagnetic sensor senses the geomagnetic field and is equivalent to an electronic compass; the barometer obtains the current height by measuring the air pressure at different positions and calculating the pressure difference; and the GPS module acquires the longitude and latitude and the height of the unmanned aerial vehicle.
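The barometric height mentioned above is conventionally derived from the International Standard Atmosphere model; the IMU chip performs this conversion internally, so the function below is only an illustrative sketch of the relationship, not part of the patented system.

```python
def barometric_altitude(pressure_pa, sea_level_pa=101325.0):
    """Altitude in metres from static pressure, ISA troposphere model.

    Inverts p = p0 * (1 - L*h/T0)^(g*M/(R*L)) to give
    h = (T0/L) * (1 - (p/p0)^(R*L/(g*M))).
    """
    T0 = 288.15      # sea-level standard temperature, K
    L = 0.0065       # temperature lapse rate, K/m
    R = 8.31447      # universal gas constant, J/(mol*K)
    g = 9.80665      # standard gravity, m/s^2
    M = 0.0289644    # molar mass of dry air, kg/mol
    return (T0 / L) * (1.0 - (pressure_pa / sea_level_pa) ** (R * L / (g * M)))
```

A relative height, as used by the drone, is the difference between two such altitudes (for example, current reading minus the reading at takeoff).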
The clock control circuit provides clock signals to the I/O module and the basic programmable logic unit, including a global clock signal, a clock reset signal and an output enable signal.
The JTAG controller controls the I/O module to read data through the bus: a serial port collects the unmanned aerial vehicle attitude data from the inertial measurement unit and the video images transmitted by the camera, and a CAN bus collects the attitude data of the pan-tilt head; after decoding, the data are stored in the data storage module of the complex background image co-processing unit.
The basic programmable logic unit comprises an image preprocessing module, which completes target vision enhancement, such as defogging, of the unmanned aerial vehicle-mounted small target under a complex background, converts serial data into parallel data through serial decoding, and outputs the parallel data to the high-performance computing unit.
The high-performance computing unit detects the video image of the unmanned aerial vehicle-mounted small target according to the output data of the complex background image processing and spatial attitude calculation co-processing unit, performs associated mapping with the geographic coordinates of the unmanned aerial vehicle coordinate system to achieve geographic positioning of the airborne small target, and transmits the final result through the data return and ground station subsystem for real-time display. The data return and ground station subsystem realizes visualization of the airborne small target and its longitude and latitude and height under the geographic coordinate system.
The detection positioning system provided by the embodiment of the invention does not need an unmanned aerial vehicle to carry laser ranging equipment, and is suitable for small-sized and light-weight unmanned aerial vehicles.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.