CN116189021B - Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method - Google Patents

Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method

Info

Publication number
CN116189021B
CN116189021B (application number CN202310166585.3A)
Authority
CN
China
Prior art keywords
channel
branch
convolution
attention
multispectral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310166585.3A
Other languages
Chinese (zh)
Other versions
CN116189021A (en)
Inventor
孙备
苏绍璟
左震
郭润泽
吴鹏
袁书东
黄泓赫
党昭洋
童小钟
李灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202310166585.3A priority Critical patent/CN116189021B/en
Publication of CN116189021A publication Critical patent/CN116189021A/en
Application granted
Publication of CN116189021B publication Critical patent/CN116189021B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01JMEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J3/00Spectrometry; Spectrophotometry; Monochromators; Measuring colours
    • G01J3/28Investigating the spectrum
    • G01J3/2823Imaging spectrometer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01JMEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J3/00Spectrometry; Spectrophotometry; Monochromators; Measuring colours
    • G01J3/28Investigating the spectrum
    • G01J3/2823Imaging spectrometer
    • G01J2003/2826Multispectral imaging, e.g. filter imaging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

In the multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method, the multispectral camera of the unmanned aerial vehicle is first calibrated to obtain its intrinsic and extrinsic parameter matrices, and gray-panel data of standard reflectivity are collected. Multispectral images of the environmental target are then acquired in real time, and the multispectral remote sensing data are undistorted and reflectivity-corrected. The corrected six channels are combined into two 3-channel images, whose features are extracted by two feature extraction modules. An attention module computes attention over the extracted convolution features, and a cross-attention module weights the attention of branch 1 onto the convolution features of the corresponding dimension of branch 2 and the attention of branch 2 onto the convolution features of the corresponding dimension of branch 1, realizing multi-branch cross-attention enhancement. Finally, the target category and target mask are predicted from the fused enhanced features. The unmanned aerial vehicle multispectral target detection method provided by the invention is efficient, low-cost and flexible to operate.

Description

Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method
Technical Field
The invention relates to the technical field of unmanned aerial vehicle target detection, in particular to a multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method.
Background
Unmanned aerial vehicle multispectral remote sensing is a new technology that performs remote sensing imaging with a multispectral camera carried by an unmanned aerial vehicle. It can fly at ultra-low altitude, avoid the influence of cloud layers, and rapidly acquire remote sensing images of small-scale targets, with the advantages of maneuverability, flexibility and economy. As a new generation of remote sensing technology, unmanned aerial vehicle multispectral remote sensing can support air-to-ground environment monitoring and target detection, complements existing aerospace remote sensing modes, and can rapidly acquire monitoring information of key areas to realize real-time tracking and dynamic supervision of targets.
At present, some research has been carried out in agriculture, forestry, animal husbandry, mining, environmental monitoring and other fields based on unmanned aerial vehicle-borne multispectral remote sensing. For example, Zhang Zhitao et al. used unmanned aerial vehicle multispectral imaging for remote sensing inversion of soil moisture content; Guilamaes et al. used neural networks for regression analysis of unmanned aerial vehicle images to predict the suspended-matter concentration of water bodies; Wei Pengfei et al. estimated the nitrogen content of summer maize leaves at different growth stages from unmanned aerial vehicle multispectral images; Hu Pingxiang et al. used remote sensing images from different periods to monitor beach topography evolution; and other researchers have achieved coastal ground-object classification that fuses spectral features and image texture based on neural networks.
However, the above work based on unmanned aerial vehicle multispectral remote sensing mainly analyses the optical characteristics of the target through traditional spectral-index inversion and is generally subject to external influences, including: (1) environmental changes such as illumination and weather, and (2) small target size and "target-background" textures that are close to each other. As a result, the target detection effect is unstable and the detection efficiency needs to be improved.
Disclosure of Invention
In order to achieve higher-precision and higher-efficiency unmanned aerial vehicle target detection and identification, the invention provides a multi-branch mutually-crossing attention-enhanced unmanned aerial vehicle multispectral target detection method.
In order to solve the above technical problems, the invention adopts the following technical scheme: a multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method comprises the following steps:
step S1, installing an airborne image data transmission station and a multispectral camera comprising six channels on an unmanned aerial vehicle, connecting a PC platform with the ground end image data transmission station, and calibrating the multispectral camera to obtain an internal reference matrix and an external reference matrix of the multispectral camera;
step S2, preparing a gray panel of standard reflectivity and placing it on the ground, holding the unmanned aerial vehicle by hand directly above the gray panel so that the multispectral camera shoots the ground at a 90-degree nadir angle, shooting the gray panel at a frequency of 1 Hz, and collecting a group of standard-reflectivity gray-panel data;
step S3, taking off the unmanned aerial vehicle, collecting multispectral images of an environmental target by using a multispectral camera, acquiring multispectral remote sensing data of airborne shooting in real time by using a ground end map data transmission station, de-distorting the multispectral remote sensing data by using the PC platform on the basis of the internal reference matrix and the external reference matrix acquired in the step S1, correcting reflectivity on the basis of gray plate data acquired in the step S2, and finally randomly combining the corrected six-channel data to generate two 3-channel images;
and S4, inputting the generated two 3-channel images into a multi-branch segmentation network as a branch 1 and a branch 2, respectively carrying out feature extraction on the 3-channel images of the two branches by utilizing two feature extraction modules, then calculating attention to the extracted convolution features by utilizing an attention module, then weighting the attention of the branch 1 to the convolution features of the corresponding dimension of the branch 2 by utilizing a cross attention module, weighting the attention of the branch 2 to the convolution features of the corresponding dimension of the branch 1, realizing multi-branch cross attention enhancement, and finally predicting a target category and a target mask based on the fused enhanced features.
Further, the wavelength of each channel of the multispectral camera is as follows: first channel 450nm, second channel 550nm, third channel 660nm, fourth channel 720nm, fifth channel 750nm, sixth channel 840nm.
Further, in step S1, when the multispectral camera is calibrated:
firstly, obtaining calibration samples: preparing a black-and-white checkerboard calibration plate, shooting N groups of checkerboard calibration plate images with the multispectral camera, and recording them as I_i^C, wherein C takes values 1 to 6 and denotes the image data of the first to sixth channels of the multispectral camera, and i denotes the i-th group and takes values 1 to N;
then solving the intrinsic matrix of each channel: performing intrinsic calibration of channel C based on the images {I_i^C};
finally, solving the extrinsic matrices between the first channel and each of the second to sixth channels: solving the extrinsic matrix of the first and second channels based on the images {I_i^1} and {I_i^2}; solving the extrinsic matrix of the first and third channels based on the images {I_i^1} and {I_i^3}; solving the extrinsic matrix of the first and fourth channels based on the images {I_i^1} and {I_i^4}; solving the extrinsic matrix of the first and fifth channels based on the images {I_i^1} and {I_i^5}; solving the extrinsic matrix of the first and sixth channels based on the images {I_i^1} and {I_i^6}; and sequentially recording the obtained multispectral camera extrinsic matrices as {C_1~2, C_1~3, C_1~4, C_1~5, C_1~6}.
In step S3, the corrected six-channel data are sent to the multi-branch segmentation network of the PC platform, and the band combination module synthesizes the extrinsically registered images of the 1st, 3rd and 5th channels into the 3-channel image of branch 1, and the extrinsically registered images of the 2nd, 4th and 6th channels into the 3-channel image of branch 2.
Still further, the two feature extraction modules in the multi-branch segmentation network are two ResNet18 backbone networks, each of which comprises 5 layers of convolution modules.
Still further, the step S4 includes:
S41, inputting the two generated 3-channel images into the multi-branch segmentation network as branch 1 and branch 2, and performing feature extraction on the 3-channel images of the two branches with the layer-2, layer-3, layer-4 and layer-5 convolution modules of the two ResNet18 backbone networks; branch 1 outputs the convolution feature maps {F_1^2, F_1^3, F_1^4, F_1^5} and branch 2 outputs the convolution feature maps {F_2^2, F_2^3, F_2^4, F_2^5}, where F_i^j denotes the convolution feature map output by the layer-j convolution module of branch i, i = 1, 2; j = 2, 3, 4, 5;
S42, inputting the convolution feature map F_1^j of branch 1 into the attention module to calculate the spatial attention weight A_1^j, and inputting the convolution feature map F_2^j of branch 2 into the attention module to calculate the spatial attention weight A_2^j, where A_i^j denotes the spatial attention weight of layer j of branch i, i = 1, 2; j = 2, 3, 4, 5;
S43, weighting and fusing the spatial attention weight A_1^j of branch 1 with the convolution feature map F_2^j of the corresponding layer of branch 2 to obtain a new convolution feature map F'_2^j that replaces the output of the original layer-j convolution module of branch 2; weighting and fusing the spatial attention weight A_2^j of branch 2 with the convolution feature map F_1^j of the corresponding layer of branch 1 to obtain a new convolution feature map F'_1^j that replaces the output of the original layer-j convolution module of branch 1; F'_i^j denotes the weighted and fused convolution feature map of layer j of branch i, i = 1, 2; j = 2, 3, 4, 5;
s44, respectively carrying out weighted fusion on the convolution feature graphs output by the convolution modules at the last layer of the two branches, and inputting the convolution feature graphs into the FPN pyramid structure to obtain the final target class and the target mask.
Further, in the step S42, when calculating the spatial attention weights of the two branches, the convolution feature maps output by the layer-2, layer-3, layer-4 and layer-5 convolution modules of the two branches are extracted, the dimension of each convolution feature map is transformed from (W0, H0, C) to (W, H, C), spatial-dimension compression is then performed to generate a channel attention weight of dimension (1, 1, C), channel attention weighting is applied with the dimension kept unchanged, and finally channel-dimension compression is performed to generate the spatial attention weight of dimension (W, H, 1).
Further, in the step S42, given an input feature I with dimension [C, W, H], where C is the channel dimension and W and H are the width and height, the feature pooling of the channel dimension is calculated as
z(w, h) = (1/C) · Σ_{k=1}^{C} I(k, w, h)   (1)
where I(k, w, h) is the value of the convolution feature of the k-th channel at coordinates (w, h) and z(w, h) is the average over all channels of the convolution feature at coordinates (w, h);
and the feature pooling of the spatial dimension is calculated as
z(c) = (1/(W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} I(c, i, j)   (2)
where I(c, i, j) is the value of the convolution feature of the c-th channel at coordinates (i, j) and z(c) is the average of the convolution feature over the c-th channel.
Preferably, in the step S43, during weighted fusion the (x, y)-th element of the spatial attention weight A_i^j is multiplied in turn by the (C, x, y)-th element of the convolution feature map F_i^j to obtain the new convolution feature map F'_i^j.
Compared with the prior art, the multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method provided by the invention is more efficient, lower in cost and more flexible to operate. Specifically:
1. The method processes multispectral images with a deep learning approach based on two backbone networks, combining and extracting features from several spectral images that carry both spectral information and spatial structure. This effectively extends conventional multispectral target detection means and improves target detection and recognition in typical camouflage and weak-contrast scenes.
2. The invention designs a mutual cross-attention module in which the multi-scale convolution features of different spectral images are mutually weighted, enhanced and combined into new features. Spectral imaging at different wavelengths is thus used to improve the target feature representation in weak-contrast scenes, and the cross-attention module does not change the feature dimension, which makes it flexible to deploy.
3. Aiming at the weak target features and high detection difficulty of unmanned aerial vehicle target detection tasks in weak-contrast scenes, and in contrast to typical RGB target detection and recognition, the invention adopts an unmanned aerial vehicle multispectral remote sensing detection method and designs a multispectral segmentation network for multispectral target detection. Spectral features of different wavelengths are extracted by several backbone networks and enhanced by mutual cross attention, which effectively fuses the multispectral image features, improves the intelligent perception capability of the unmanned aerial vehicle in weak-contrast scenes, and raises target recognition precision and detection efficiency.
Drawings
FIG. 1 is a schematic diagram of a multi-spectral target detection system for an unmanned aerial vehicle according to the present invention;
figure 2 is a flow chart of the multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method according to the present invention;
figure 3 is a schematic diagram of a multi-branch splitting network according to the present invention;
FIG. 4 is a schematic diagram of the structure of the attention module according to the present invention;
fig. 5 is a block diagram of the cross-attention module according to the present invention ((a) shows the cross-attention module applied to two branches having the same convolution dimensions, and (b) shows the cross-attention module applied to two branches having different convolution dimensions).
Detailed Description
The invention will be further described with reference to examples and drawings, to which reference is made, but which are not intended to limit the scope of the invention.
Compared with existing unmanned aerial vehicle multispectral remote sensing methods, the method disclosed by the invention focuses on the organic combination of deep learning and multispectral remote sensing, which effectively promotes the development of unmanned aerial vehicle multispectral remote sensing in the field of target detection and yields higher detection efficiency in unmanned aerial vehicle target detection and recognition applications. As mentioned in the background art, unmanned aerial vehicle target detection is generally subject to external influences, including: (1) environmental changes such as illumination and weather, and (2) small target size and "target-background" textures that are close to each other. How to exploit the advantages of deep learning in target detection when studying a deep-learning-based unmanned aerial vehicle multispectral target detection method is therefore of great academic significance and application value.
Currently, deep learning target detection methods fall into two categories: target detection networks and target segmentation networks. A target detection network usually describes position information with a polygonal frame (most commonly a rectangular frame), whereas a target segmentation network obtains pixel-level semantic information of the target and characterizes its position with a contour binary-image mask, which resolves the contour details of the target and its environment better. The invention therefore adopts a target segmentation network to realize target detection and recognition. However, existing deep learning target detection networks take 3-channel RGB images as input, and for images with more than 3 channels there are two processing ideas: (1) convert the 6-channel data into 3-channel data by downsampling, compression or similar means without changing the original network structure, and feed the 3-channel data into an RGB target detection network; (2) modify the feature extraction backbone of the original network into a multi-branch input structure so that feature extraction and fusion can be performed on several spectral channels at the same time. Idea (1) is simpler and is the one mostly adopted at present, but it ignores the detailed spectral information; idea (2) can extract the multispectral detail information, which helps to obtain higher detection efficiency, and is the idea adopted by the invention.
In view of the above thought, in order to improve the intelligent perception capability of the unmanned aerial vehicle in a weak contrast scene, the invention provides a multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method, which mainly comprises four steps, and specifically comprises the following steps.
Step S1
Before the unmanned aerial vehicle takes off, an unmanned aerial vehicle multispectral target detection system is built. As shown in fig. 1, the system comprises an unmanned aerial vehicle multispectral remote sensing platform and a ground data processing platform: the remote sensing platform comprises a multispectral camera and an airborne image data transmission radio station installed on the unmanned aerial vehicle, and the ground data processing platform comprises a ground-end image data transmission radio station and a PC platform. The multispectral camera is connected to the airborne image data transmission station through a network cable, the PC platform is connected to the ground-end image data transmission station through a network cable, and the IP addresses of the multispectral camera and the PC platform are set to the same segment. The unmanned aerial vehicle collects multispectral image data of the target through the multispectral camera and transmits it to the ground through the airborne and ground-end image data transmission radio stations. A multi-branch segmentation network is deployed on the ground-end PC platform; it is improved on the basis of SOLO_V2, comprises a band combination module, feature extraction modules, attention modules, cross-attention modules, an FPN pyramid structure and the like, and performs alignment, calibration and target detection on the multispectral data. Unlike the post-processing applied to most unmanned aerial vehicle multispectral data in the prior art, the image transmission and processing of the unmanned aerial vehicle multispectral remote sensing is designed here as a real-time online detection mode.
The multispectral camera of the unmanned aerial vehicle comprises 6 channels, and the wavelength of each channel is as follows in sequence: first channel 450nm, second channel 550nm, third channel 660nm, fourth channel 720nm, fifth channel 750nm, sixth channel 840nm.
After the unmanned aerial vehicle multispectral target detection system is built, calibrating the multispectral camera to obtain an internal reference matrix and an external reference matrix of the multispectral camera. Referring to fig. 2, the black-and-white checkers calibration board provides data support for calculating an internal reference matrix of each channel of the multispectral camera and an external reference matrix between channels, and the calibration method adopted by the invention is a Zhang Zhengyou camera calibration method, and the calibration process is as follows:
Firstly, calibration samples are obtained: a black-and-white checkerboard calibration plate is prepared, N groups of checkerboard calibration plate images are shot with the multispectral camera and recorded as I_i^C, where C takes values 1 to 6 and denotes the image data of the first to sixth channels of the multispectral camera, and i denotes the i-th group and takes values 1 to N.
Then the intrinsic matrix of each channel is solved: intrinsic calibration of channel C is performed based on the images {I_i^C}.
Finally, the extrinsic matrices between the first channel and each of the second to sixth channels are solved: the extrinsic matrix of the first and second channels is solved from the images {I_i^1} and {I_i^2}; the extrinsic matrix of the first and third channels from {I_i^1} and {I_i^3}; the extrinsic matrix of the first and fourth channels from {I_i^1} and {I_i^4}; the extrinsic matrix of the first and fifth channels from {I_i^1} and {I_i^5}; and the extrinsic matrix of the first and sixth channels from {I_i^1} and {I_i^6}. The obtained multispectral camera extrinsic matrices are recorded in turn as {C_1~2, C_1~3, C_1~4, C_1~5, C_1~6}.
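For illustration, a minimal calibration sketch in Python with OpenCV is given below. The checkerboard size, square size, file layout and the helper names detect_corners, calibrate_channel and extrinsics_to_channel1 are assumptions made for the example, not part of the patented method; the sketch only mirrors the per-channel intrinsic calibration followed by the channel-1-referenced extrinsic calibration described above.
```python
# Sketch: per-channel intrinsic calibration and channel-1-referenced extrinsics
# with OpenCV (Zhang's checkerboard method). Board size, square size and file
# layout are illustrative assumptions.
import cv2
import glob
import numpy as np

BOARD = (9, 6)      # inner corners of the checkerboard (assumption)
SQUARE = 0.025      # square edge length in metres (assumption)

objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

def detect_corners(paths):
    """Collect object/image point pairs from one channel's calibration images."""
    obj_pts, img_pts, size = [], [], None
    for p in sorted(paths):
        gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        ok, corners = cv2.findChessboardCorners(gray, BOARD)
        if ok:
            obj_pts.append(objp)
            img_pts.append(corners)
            size = gray.shape[::-1]
    return obj_pts, img_pts, size

def calibrate_channel(ch):
    obj_pts, img_pts, size = detect_corners(glob.glob(f"calib/ch{ch}/*.jpg"))
    # Intrinsic matrix K and distortion coefficients for this channel
    _, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return K, dist, obj_pts, img_pts, size

def extrinsics_to_channel1(intr):
    """Rotation/translation of channels 2..6 relative to channel 1.
    Assumes every board was detected in every channel so the views correspond."""
    K1, d1, obj1, img1, size = intr[1]
    ext = {}
    for ch in range(2, 7):
        Kc, dc, _, imgc, _ = intr[ch]
        _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
            obj1, img1, imgc, K1, d1, Kc, dc, size,
            flags=cv2.CALIB_FIX_INTRINSIC)
        ext[f"C1~{ch}"] = (R, T)
    return ext

intrinsics = {ch: calibrate_channel(ch) for ch in range(1, 7)}
extrinsics = extrinsics_to_channel1(intrinsics)
```
Using CALIB_FIX_INTRINSIC in the stereo step reuses the already-solved intrinsics, which matches the order of the steps above (intrinsics first, extrinsics second).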
Step S2
Before the unmanned aerial vehicle takes off, gray-panel data of standard reflectivity are shot: a gray panel of standard reflectivity is prepared and placed on the ground, the unmanned aerial vehicle is held by hand directly above the gray panel so that the multispectral camera points straight down at the ground (90-degree nadir view), the gray panel is shot at a frequency of 1 Hz, and a group of multispectral image data of the standard-reflectivity gray panel is collected. In a specific application, referring to fig. 2, the gray panel may be a standard-reflectivity panel or another target panel of known standard reflectivity.
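A minimal sketch of the gray-panel reflectivity correction that these data later support is shown below, assuming a simple single-point (gain-only) empirical correction per channel; the 0.5 panel reflectance default and the function names panel_gain and to_reflectance are illustrative assumptions.
```python
# Sketch: per-channel reflectance correction with a reference gray panel
# (single-point empirical correction). Panel reflectance value and the use of
# a plain mean over the panel region are illustrative assumptions.
import numpy as np

def panel_gain(panel_images, panel_reflectance=0.5):
    """Per-channel gain mapping raw digital numbers (DN) to reflectance.

    panel_images: array of shape (num_shots, H, W) for one channel, cropped to
    the gray-panel region; panel_reflectance: known reflectance of the panel.
    """
    mean_dn = float(np.mean(panel_images))
    return panel_reflectance / mean_dn

def to_reflectance(channel_image, gain):
    """Convert one undistorted channel image from DN to reflectance."""
    return np.clip(channel_image.astype(np.float32) * gain, 0.0, 1.0)
```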
Step S3
The unmanned aerial vehicle takes off and collects multispectral images of the environmental target with the multispectral camera. The PC platform acquires the airborne multispectral remote sensing data in real time through the ground-end image data transmission station, first undistorts the multispectral remote sensing data based on the intrinsic and extrinsic matrices obtained in step S1, then performs reflectivity correction based on the gray-panel data collected in step S2, and finally combines the six corrected channels to generate two 3-channel images. Specifically, in this embodiment, the corrected six-channel data are sent to the multi-branch segmentation network of the PC platform, and the band combination module synthesizes the extrinsically registered images of the 1st, 3rd and 5th channels into one 3-channel image and the extrinsically registered images of the 2nd, 4th and 6th channels into the other 3-channel image; other channel combinations may be used as required.
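The band-combination step can be sketched as follows, assuming NumPy/OpenCV; cv2.undistort is shown only as one possible way to remove distortion, and the extrinsic registration of channels 2–6 into the channel-1 frame is assumed to have been applied beforehand.
```python
# Sketch: combine corrected channels 1/3/5 into the branch-1 image and
# channels 2/4/6 into the branch-2 image. Extrinsic registration is assumed
# already done; only undistortion is illustrated here.
import cv2
import numpy as np

def combine_bands(channels, intrinsics):
    """channels: dict {1..6: HxW reflectance image}; intrinsics: {ch: (K, dist)}."""
    und = {ch: cv2.undistort(img, *intrinsics[ch]) for ch, img in channels.items()}
    branch1 = np.stack([und[1], und[3], und[5]], axis=-1)  # (H, W, 3)
    branch2 = np.stack([und[2], und[4], und[6]], axis=-1)  # (H, W, 3)
    return branch1, branch2
```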
In practical application, the construction process of the unmanned aerial vehicle multispectral image dataset is as follows: the acquired multi-channel data of the same time instant are stored in the format 't_ch.jpg', where 't' denotes the time and 'ch' denotes the channel wavelength, namely 450 nm, 550 nm, 660 nm, 720 nm, 750 nm and 840 nm; multi-channel data of the same time instant are then channel-aligned and reflectivity-corrected; and the channel images are annotated with the LabelMe annotation software, where only one arbitrarily chosen channel of each time instant needs to be labelled, generating an annotation file 't_num.json' in which 't' denotes the time and 'num' denotes the image sequence number.
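A minimal sketch of grouping the stored files by time instant is given below; the directory layout and the exact wavelength strings used in the file names are assumptions for illustration.
```python
# Sketch: group "t_ch.jpg" files by timestamp so that the six channels of the
# same moment (and the single LabelMe "t_num.json" file) can be processed
# together. Directory layout and wavelength strings are assumptions.
from collections import defaultdict
from pathlib import Path

WAVELENGTHS = ("450nm", "550nm", "660nm", "720nm", "750nm", "840nm")

def group_by_timestamp(data_dir):
    groups = defaultdict(dict)
    for p in Path(data_dir).glob("*.jpg"):
        t, ch = p.stem.rsplit("_", 1)      # "t_ch.jpg" -> timestamp, wavelength
        if ch in WAVELENGTHS:
            groups[t][ch] = p
    # keep only timestamps for which all six channels are present
    return {t: chs for t, chs in groups.items() if len(chs) == 6}
```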
Step S4
S41, as shown in fig. 3, the two generated 3-channel images are input as branch 1 and branch 2 into the feature extraction modules of the multi-branch segmentation network. The multi-branch segmentation network has two feature extraction modules, namely two ResNet18 backbone networks, each comprising a 5-layer convolution module together with pooling, activation and other structures. The layer-2, layer-3, layer-4 and layer-5 convolution modules of the two ResNet18 backbone networks perform feature extraction on the 3-channel images of the two branches; branch 1 outputs the convolution feature maps {F_1^2, F_1^3, F_1^4, F_1^5} and branch 2 outputs the convolution feature maps {F_2^2, F_2^3, F_2^4, F_2^5}, where F_i^j denotes the convolution feature map output by the layer-j convolution module of branch i, i = 1, 2; j = 2, 3, 4, 5.
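A minimal sketch of such a two-branch ResNet18 feature extractor, assuming PyTorch/torchvision, is given below; mapping torchvision's layer1–layer4 to the patent's layer-2 to layer-5 convolution modules, and the 512×512 input size, are assumptions made for the example.
```python
# Sketch: two independent ResNet18 backbones, each returning the outputs of
# its four residual stages (treated here as F^2 ... F^5 of that branch).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNet18Features(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet18()   # randomly initialised; pretrained weights could be loaded
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)          # feature map of this stage
        return feats

backbone1, backbone2 = ResNet18Features(), ResNet18Features()
img1 = torch.randn(1, 3, 512, 512)   # branch-1 3-channel image
img2 = torch.randn(1, 3, 512, 512)   # branch-2 3-channel image
F1 = backbone1(img1)                 # [F_1^2, F_1^3, F_1^4, F_1^5]
F2 = backbone2(img2)                 # [F_2^2, F_2^3, F_2^4, F_2^5]
```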
S42, as shown in fig. 4, the convolution feature map F_1^j of branch 1 is input into the attention module to calculate the spatial attention weight A_1^j, and the convolution feature map F_2^j of branch 2 is input into the attention module to calculate the spatial attention weight A_2^j, where A_i^j denotes the spatial attention weight of layer j of branch i, i = 1, 2; j = 2, 3, 4, 5;
further, when calculating the spatial attention weights of the two branches, the convolution feature graphs output by the convolution modules of the layers 2, 3, 4 and 5 of the two branches are extracted respectively, and then the dimension of the convolution feature graphs is calculated from (W 0 ,H 0 C) is transformed into (W, H, C), then space dimension compression is carried out to generate a channel attention weight with the dimension of (1, C), then channel attention weighting is carried out, the dimension is kept unchanged, and finally channel dimension compression is carried out to generate the space attention weight with the dimension of (W, H, 1).
Given an input feature I with dimension [C, W, H], where C is the channel dimension and W and H are the width and height, the feature pooling of the channel dimension and the feature pooling of the spatial dimension are calculated by formulas (1) and (2) respectively:
z(w, h) = (1/C) · Σ_{k=1}^{C} I(k, w, h)   (1)
where I(k, w, h) is the value of the convolution feature of the k-th channel at coordinates (w, h) and z(w, h) is the average over all channels of the convolution feature at coordinates (w, h);
z(c) = (1/(W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} I(c, i, j)   (2)
where I(c, i, j) is the value of the convolution feature of the c-th channel at coordinates (i, j) and z(c) is the average of the convolution feature over the c-th channel.
The two transforms aggregate features along the spatial and channel dimensions respectively and produce a pair of differently oriented feature maps z(w, h) and z(c). They allow the network to learn interactions along one dimension while preserving the information along the other which, unlike purely spatial or purely channel-based approaches, helps the network locate the objects of interest more accurately.
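A minimal sketch of this attention module is shown below, assuming PyTorch. Only the average pooling of formulas (1) and (2) is specified above; the sigmoid gating used to turn the pooled values into weights is an assumption of the sketch.
```python
# Sketch: attention module. Spatial pooling (formula 2) gives a channel weight,
# the feature is channel-weighted, and channel pooling (formula 1) then
# compresses the result to a spatial attention map of size (1, W, H).
import torch
import torch.nn as nn

class BranchAttention(nn.Module):
    def forward(self, f):                       # f: (B, C, W, H)
        z_c = f.mean(dim=(2, 3), keepdim=True)  # formula (2): (B, C, 1, 1)
        channel_w = torch.sigmoid(z_c)          # channel attention weight
        f = f * channel_w                       # channel-attention weighting, dims kept
        z_s = f.mean(dim=1, keepdim=True)       # formula (1): (B, 1, W, H)
        return torch.sigmoid(z_s)               # spatial attention A_i^j
```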
S43, as shown in fig. 5, the mutual cross-attention module weights and fuses the spatial attention weight A_1^j of branch 1 with the convolution feature map F_2^j of the corresponding layer of branch 2 to obtain a new convolution feature map F'_2^j that replaces the output of the original layer-j convolution module of branch 2; it likewise weights and fuses the spatial attention weight A_2^j of branch 2 with the convolution feature map F_1^j of the corresponding layer of branch 1 to obtain a new convolution feature map F'_1^j that replaces the output of the original layer-j convolution module of branch 1. F'_i^j denotes the weighted and fused convolution feature map of layer j of branch i, i = 1, 2; j = 2, 3, 4, 5.
The mutual cross-attention module comprises an attention calculation module and a cross mutual-weighting module. The input of the attention calculation module is the convolution feature map of branch 1 or branch 2, of dimension (C, W, H), and its output is a spatial attention weight of dimension (1, W, H). The cross mutual-weighting module weights the attention weight of branch 1 with the corresponding convolution feature map of branch 2 to obtain a fused convolution feature map of the same resolution; the spatial attention weighting rule is that the (x, y)-th element of the spatial attention weight is multiplied in turn by the (C, x, y)-th element of the convolution feature map. The mutual cross attention can be applied to layers of the two branches with the same convolution dimensions, referring to fig. 5(a), or to layers of the two branches with different convolution dimensions, referring to fig. 5(b), in which case the convolution features must first undergo dimension transformation.
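A minimal sketch of the cross mutual-weighting rule is given below, assuming PyTorch; the bilinear interpolation used here for the dimension transformation in the fig. 5(b) case is an assumption of the sketch.
```python
# Sketch: the spatial attention of one branch multiplies, element by element,
# the convolution feature map of the corresponding layer of the other branch
# (fig. 5(a)); if resolutions differ, the attention map is resized first
# (fig. 5(b), bilinear resize assumed).
import torch.nn.functional as F_nn

def cross_weight(attn_other, feat):
    """attn_other: (B, 1, W', H') from the other branch; feat: (B, C, W, H)."""
    if attn_other.shape[-2:] != feat.shape[-2:]:
        attn_other = F_nn.interpolate(attn_other, size=feat.shape[-2:],
                                      mode="bilinear", align_corners=False)
    return attn_other * feat          # new feature map replacing the layer output

# Example for layer j:
# new_F2_j = cross_weight(A1_j, F2_j)
# new_F1_j = cross_weight(A2_j, F1_j)
```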
S44, the convolution feature maps output by the last-layer convolution modules of the two branches are weighted and fused and input into the FPN pyramid structure of the multi-branch segmentation network for target category and target mask prediction. This process is consistent with the SOLO_V2 segmentation method, including the prediction branch structure, the loss function design and the model training strategy, and is therefore only summarized here. The subsequent data processing, once the final target category and target mask are obtained, is likewise the same as in SOLO_V2 and is not described further herein.
It should be noted that in practical application the weighting of the convolution feature maps output by the last-layer convolution modules of the two branches may be an element-wise weighted sum at corresponding positions, or a convolution module may be used for dimension transformation; other fusion methods may also be used.
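Both fusion variants mentioned above can be sketched as follows, assuming PyTorch; the 0.5/0.5 weights and the 1×1 convolution kernel are illustrative assumptions.
```python
# Sketch: fusing the last-layer feature maps of the two branches before the
# FPN/SOLOv2 head, either by element-wise weighted addition or by
# concatenation followed by a 1x1 convolution for dimension transformation.
import torch
import torch.nn as nn

def fuse_add(f1, f2, w1=0.5, w2=0.5):
    """Element-wise weighted sum; shapes of f1 and f2 must match."""
    return w1 * f1 + w2 * f2

class FuseConv(nn.Module):
    """Concatenate along channels and project back with a 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f1, f2):
        return self.proj(torch.cat([f1, f2], dim=1))
```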
From the foregoing, the invention designs a multi-branch structure for multi-spectral image input, designs a multi-spectral fusion enhancement mechanism based on the mutual cross attention, improves the existing RGB target detection network, and builds a real-time detection system of unmanned aerial vehicle multi-spectral remote sensing so as to realize unmanned aerial vehicle target detection and identification with higher precision and higher efficiency. Compared with the target detection method based on RGB, the target detection of the multi-band spectrum can obtain more target optical characteristics, and has higher detection efficiency in complex weak contrast and weak texture target detection scenes, so that the target detection and recognition capability of the unmanned aerial vehicle in complex scenes is improved.
The foregoing embodiments are preferred embodiments of the present invention, and in addition, the present invention may be implemented in other ways, and any obvious substitution is within the scope of the present invention without departing from the concept of the present invention.
In order to facilitate understanding of the improvements of the present invention over the prior art, some of the figures and descriptions of the present invention have been simplified, and some other elements have been omitted from this document for clarity, as will be appreciated by those of ordinary skill in the art.

Claims (8)

1. The multi-branch interdigitated attention-enhancing unmanned aerial vehicle multispectral target detection method is characterized by comprising the following steps of:
step S1, installing an airborne image data transmission station and a multispectral camera comprising six channels on an unmanned aerial vehicle, connecting a PC platform with the ground end image data transmission station, and calibrating the multispectral camera to obtain an internal reference matrix and an external reference matrix of the multispectral camera; when calibrating a multispectral camera:
firstly, obtaining calibration samples: preparing a black-and-white checkerboard calibration plate, shooting N groups of checkerboard calibration plate images with the multispectral camera, and recording them as I_i^C, wherein C takes values 1 to 6 and denotes the image data of the first to sixth channels of the multispectral camera, and i denotes the i-th group and takes values 1 to N;
then solving the intrinsic matrix of each channel: performing intrinsic calibration of channel C based on the images {I_i^C};
finally, solving the extrinsic matrices between the first channel and each of the second to sixth channels: solving the extrinsic matrix of the first and second channels based on the images {I_i^1} and {I_i^2}; solving the extrinsic matrix of the first and third channels based on the images {I_i^1} and {I_i^3}; solving the extrinsic matrix of the first and fourth channels based on the images {I_i^1} and {I_i^4}; solving the extrinsic matrix of the first and fifth channels based on the images {I_i^1} and {I_i^5}; solving the extrinsic matrix of the first and sixth channels based on the images {I_i^1} and {I_i^6}; and sequentially recording the obtained multispectral camera extrinsic matrices as {C_1~2, C_1~3, C_1~4, C_1~5, C_1~6};
Step S2, preparing a gray plate with standard reflectivity, placing the gray plate on the ground, placing a handheld unmanned aerial vehicle right above the gray plate, enabling a multispectral camera to perform 90-degree nodding shooting on the ground, shooting the gray plate with the frequency of 1Hz, and collecting a group of gray plate data with standard reflectivity;
step S3, taking off the unmanned aerial vehicle, collecting multispectral images of an environmental target by using a multispectral camera, acquiring multispectral remote sensing data of airborne shooting in real time by using a ground end map data transmission station, de-distorting the multispectral remote sensing data by using the PC platform on the basis of the internal reference matrix and the external reference matrix acquired in the step S1, correcting reflectivity on the basis of gray plate data acquired in the step S2, and finally randomly combining the corrected six-channel data to generate two 3-channel images;
and S4, inputting the generated two 3-channel images into a multi-branch segmentation network as a branch 1 and a branch 2, respectively carrying out feature extraction on the 3-channel images of the two branches by utilizing two feature extraction modules, then calculating attention to the extracted convolution features by utilizing an attention module, then weighting the attention of the branch 1 to the convolution features of the corresponding dimension of the branch 2 by utilizing a cross attention module, weighting the attention of the branch 2 to the convolution features of the corresponding dimension of the branch 1, realizing multi-branch cross attention enhancement, and finally predicting a target category and a target mask based on the fused enhanced features.
2. The multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method of claim 1, wherein: the wavelengths of the channels of the multispectral camera are, in order: first channel 450 nm, second channel 550 nm, third channel 660 nm, fourth channel 720 nm, fifth channel 750 nm, sixth channel 840 nm.
3. The multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method of claim 2, wherein: in step S3, the corrected six-channel data are sent to the multi-branch segmentation network of the PC platform, and the band combination module synthesizes the extrinsically registered images of the 1st, 3rd and 5th channels into the 3-channel image of branch 1, and the extrinsically registered images of the 2nd, 4th and 6th channels into the 3-channel image of branch 2.
4. The multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method according to claim 3, wherein: the two feature extraction modules in the multi-branch segmentation network are two ResNet18 backbone networks, each of which comprises 5 layers of convolution modules.
5. The multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method of claim 4, wherein: the step S4 includes:
S41, inputting the two generated 3-channel images into the multi-branch segmentation network as branch 1 and branch 2, and performing feature extraction on the 3-channel images of the two branches with the layer-2, layer-3, layer-4 and layer-5 convolution modules of the two ResNet18 backbone networks; branch 1 outputs the convolution feature maps {F_1^2, F_1^3, F_1^4, F_1^5} and branch 2 outputs the convolution feature maps {F_2^2, F_2^3, F_2^4, F_2^5}, where F_i^j denotes the convolution feature map output by the layer-j convolution module of branch i, i = 1, 2; j = 2, 3, 4, 5;
S42, inputting the convolution feature map F_1^j of branch 1 into the attention module to calculate the spatial attention weight A_1^j, and inputting the convolution feature map F_2^j of branch 2 into the attention module to calculate the spatial attention weight A_2^j, where A_i^j denotes the spatial attention weight of layer j of branch i, i = 1, 2; j = 2, 3, 4, 5;
S43, weighting and fusing the spatial attention weight A_1^j of branch 1 with the convolution feature map F_2^j of the corresponding layer of branch 2 to obtain a new convolution feature map F'_2^j that replaces the output of the original layer-j convolution module of branch 2; weighting and fusing the spatial attention weight A_2^j of branch 2 with the convolution feature map F_1^j of the corresponding layer of branch 1 to obtain a new convolution feature map F'_1^j that replaces the output of the original layer-j convolution module of branch 1; F'_i^j denotes the weighted and fused convolution feature map of layer j of branch i, i = 1, 2; j = 2, 3, 4, 5;
s44, respectively carrying out weighted fusion on the convolution feature graphs output by the convolution modules at the last layer of the two branches, and inputting the convolution feature graphs into the FPN pyramid structure to obtain the final target class and the target mask.
6. The multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method of claim 5, wherein: in the step S42, when calculating the spatial attention weights of the two branches, the convolution feature maps output by the layer-2, layer-3, layer-4 and layer-5 convolution modules of the two branches are extracted, the dimension of each convolution feature map is transformed from (W0, H0, C) to (W, H, C), spatial-dimension compression is then performed to generate a channel attention weight of dimension (1, 1, C), channel attention weighting is applied with the dimension kept unchanged, and finally channel-dimension compression is performed to generate the spatial attention weight of dimension (W, H, 1).
7. The multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method of claim 6, wherein in the step S42:
given an input feature I with dimension [C, W, H], where C is the channel dimension and W and H are the width and height, the feature pooling of the channel dimension is calculated as
z(w, h) = (1/C) · Σ_{k=1}^{C} I(k, w, h)   (1)
where I(k, w, h) is the value of the convolution feature of the k-th channel at coordinates (w, h) and z(w, h) is the average over all channels of the convolution feature at coordinates (w, h);
and the feature pooling of the spatial dimension is calculated as
z(c) = (1/(W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} I(c, i, j)   (2)
where I(c, i, j) is the value of the convolution feature of the c-th channel at coordinates (i, j) and z(c) is the average of the convolution feature over the c-th channel.
8. The multi-branch interdigitated attention-enhanced unmanned aerial vehicle multispectral target detection method of claim 7, wherein: in the step S43, the (x, y)-th element of the spatial attention weight A_i^j is multiplied in turn by the (C, x, y)-th element of the convolution feature map F_i^j to obtain the new convolution feature map F'_i^j.
CN202310166585.3A 2023-02-27 2023-02-27 Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method Active CN116189021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310166585.3A CN116189021B (en) 2023-02-27 2023-02-27 Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310166585.3A CN116189021B (en) 2023-02-27 2023-02-27 Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method

Publications (2)

Publication Number Publication Date
CN116189021A CN116189021A (en) 2023-05-30
CN116189021B true CN116189021B (en) 2024-04-09

Family

ID=86450368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310166585.3A Active CN116189021B (en) 2023-02-27 2023-02-27 Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method

Country Status (1)

Country Link
CN (1) CN116189021B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611978B (en) * 2024-01-23 2024-05-03 日照市自然资源和规划局 Construction method and system of land resource mapping database

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443143A (en) * 2019-07-09 2019-11-12 武汉科技大学 The remote sensing images scene classification method of multiple-limb convolutional neural networks fusion
CN111667489A (en) * 2020-04-30 2020-09-15 华东师范大学 Cancer hyperspectral image segmentation method and system based on double-branch attention deep learning
CN113192124A (en) * 2021-03-15 2021-07-30 大连海事大学 Image target positioning method based on twin network
CN113449680A (en) * 2021-07-15 2021-09-28 北京理工大学 Knowledge distillation-based multimode small target detection method
CN113887645A (en) * 2021-10-13 2022-01-04 西北工业大学 Remote sensing image fusion classification method based on joint attention twin network
CN114612835A (en) * 2022-03-15 2022-06-10 中国科学院计算技术研究所 Unmanned aerial vehicle target detection model based on YOLOv5 network
CN114648684A (en) * 2022-03-24 2022-06-21 南京邮电大学 Lightweight double-branch convolutional neural network for image target detection and detection method thereof
CN114694024A (en) * 2022-03-21 2022-07-01 滨州学院 Unmanned aerial vehicle ground target tracking method based on multilayer feature self-attention transformation network
WO2022198050A1 (en) * 2021-03-19 2022-09-22 Cedars-Sinai Medical Center Convolutional long short-term memory networks for rapid medical image segmentation
CN115588140A (en) * 2022-10-24 2023-01-10 北京市遥感信息研究所 Multi-spectral remote sensing image multi-directional target detection method
GB202217717D0 (en) * 2022-05-23 2023-01-11 Univ Zhengzhou Light Ind Object detection method based on attention-enhanced bidirectional feature pyramid network (a-bifpn)
CN115713537A (en) * 2022-11-03 2023-02-24 北京理工雷科电子信息技术有限公司 Optical remote sensing image cloud and fog segmentation method based on spectral guidance and depth attention

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443143A (en) * 2019-07-09 2019-11-12 武汉科技大学 The remote sensing images scene classification method of multiple-limb convolutional neural networks fusion
CN111667489A (en) * 2020-04-30 2020-09-15 华东师范大学 Cancer hyperspectral image segmentation method and system based on double-branch attention deep learning
CN113192124A (en) * 2021-03-15 2021-07-30 大连海事大学 Image target positioning method based on twin network
WO2022198050A1 (en) * 2021-03-19 2022-09-22 Cedars-Sinai Medical Center Convolutional long short-term memory networks for rapid medical image segmentation
CN113449680A (en) * 2021-07-15 2021-09-28 北京理工大学 Knowledge distillation-based multimode small target detection method
CN113887645A (en) * 2021-10-13 2022-01-04 西北工业大学 Remote sensing image fusion classification method based on joint attention twin network
CN114612835A (en) * 2022-03-15 2022-06-10 中国科学院计算技术研究所 Unmanned aerial vehicle target detection model based on YOLOv5 network
CN114694024A (en) * 2022-03-21 2022-07-01 滨州学院 Unmanned aerial vehicle ground target tracking method based on multilayer feature self-attention transformation network
CN114648684A (en) * 2022-03-24 2022-06-21 南京邮电大学 Lightweight double-branch convolutional neural network for image target detection and detection method thereof
GB202217717D0 (en) * 2022-05-23 2023-01-11 Univ Zhengzhou Light Ind Object detection method based on attention-enhanced bidirectional feature pyramid network (a-bifpn)
CN115588140A (en) * 2022-10-24 2023-01-10 北京市遥感信息研究所 Multi-spectral remote sensing image multi-directional target detection method
CN115713537A (en) * 2022-11-03 2023-02-24 北京理工雷科电子信息技术有限公司 Optical remote sensing image cloud and fog segmentation method based on spectral guidance and depth attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAR and Multi-Spectral Data Fusion for Local Climate Zone Classification with Multi-Branch Convolutional Neural Network; Guangjun He et al; Remote Sensing; 20230111; pp. 1-16 *
Research on target detection methods based on UAV aerial images; 张斯涵; China Master's Theses Full-text Database, Engineering Science and Technology II; 20230115; pp. C031-36 *

Also Published As

Publication number Publication date
CN116189021A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
Liebel et al. Single-image super resolution for multispectral remote sensing data using convolutional neural networks
CN110110596B (en) Hyperspectral image feature extraction, classification model construction and classification method
CN110826693B (en) Three-dimensional atmospheric temperature profile inversion method and system based on DenseNet convolutional neural network
CN111402306A (en) Low-light-level/infrared image color fusion method and system based on deep learning
CN116189021B (en) Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method
CN109753996B (en) Hyperspectral image classification method based on three-dimensional lightweight depth network
Yilmaz et al. Metaheuristic pansharpening based on symbiotic organisms search optimization
CN112464745B (en) Feature identification and classification method and device based on semantic segmentation
CN114821261A (en) Image fusion algorithm
CN104517126A (en) Air quality assessment method based on image analysis
CN112052758B (en) Hyperspectral image classification method based on attention mechanism and cyclic neural network
CN112016596A (en) Evaluation method for farmland soil fertility based on convolutional neural network
CN112884672A (en) Multi-frame unmanned aerial vehicle image relative radiation correction method based on contemporaneous satellite images
CN117409339A (en) Unmanned aerial vehicle crop state visual identification method for air-ground coordination
CN112115795A (en) Hyperspectral image classification method based on Triple GAN
Zhang et al. Two-step ResUp&Down generative adversarial network to reconstruct multispectral image from aerial RGB image
CN114926732A (en) Multi-sensor fusion crop deep learning identification method and system
CN115797184B (en) Super-resolution extraction method for surface water body
CN116721385A (en) Machine learning-based RGB camera data cyanobacteria bloom monitoring method
Martínez et al. Efficient transfer learning for spectral image reconstruction from RGB images
Zhao et al. FOV expansion of bioinspired multiband polarimetric imagers with convolutional neural networks
CN116452872A (en) Forest scene tree classification method based on improved deep pavv3+
CN112560706B (en) Method and device for identifying water body target of multi-source satellite image
CN114842360A (en) Pasturing area drought identification method, system and management platform
CN114972625A (en) Hyperspectral point cloud generation method based on RGB spectrum super-resolution technology

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant