CN116433959A - Ground object classification method based on visible light and infrared satellite image fusion - Google Patents

Ground object classification method based on visible light and infrared satellite image fusion Download PDF

Info

Publication number
CN116433959A
CN116433959A (application CN202310210423.5A)
Authority
CN
China
Prior art keywords
ground
visible light
network
classification
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310210423.5A
Other languages
Chinese (zh)
Inventor
曹璐
赵炜东
刘勇
郭鹏宇
杨伟丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202310210423.5A priority Critical patent/CN116433959A/en
Publication of CN116433959A publication Critical patent/CN116433959A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a ground object classification method based on the fusion of visible light and infrared satellite images, which comprises the following steps: obtaining a plurality of groups of mutually corresponding visible light images and infrared satellite images containing ground targets; labeling the ground targets in the visible light images and the infrared satellite images to obtain labels and generate a ground object classification training data set, wherein each item of training data in the ground object classification training data set comprises a visible light image and an infrared satellite image which correspond to each other and the ground target label corresponding to these images; constructing a ground target classification network; training the ground target classification network by using the ground object classification training data set; and classifying ground targets by using the trained ground target classification network. By constructing the ground target classification network and taking the visible light image and the infrared satellite image as its input, the invention can extract the spatial features and the spectral band features of ground targets at the same time and significantly improves the detection and classification precision for multiple kinds of ground targets.

Description

Ground object classification method based on visible light and infrared satellite image fusion
Technical Field
The invention relates to the technical field of image information processing, in particular to a ground object classification method based on fusion of visible light and infrared satellite images.
Background
With the continuous development of remote sensing technology, the spatial resolution of satellite images keeps improving, providing strong support for researchers to acquire geographic information, with great application value in fields such as environmental monitoring, crop coverage and type analysis, land observation, and urban planning and management. Semantic segmentation of satellite images is one of the core tasks of remote sensing image interpretation and a key research topic in the field of computer vision: a label is assigned to each pixel, so that the image is classified at the pixel level. Traditional semantic segmentation methods include edge-based, clustering-based and threshold-based segmentation; however, when applied to satellite images they can only extract low-level features and cannot meet the requirements of practical ground object classification applications.
With the continuous development of deep learning technology, convolutional neural networks have been applied to satellite image ground object classification, and classical network architectures such as FCN, SegNet, U-Net and DeepLab have been proposed for this task.
Because satellite image ground object classification is a pixel-level classification task, every object in a satellite image carries semantic information and must be accurately separated from adjacent objects, which imposes pixel-level precision requirements on the network; moreover, compared with the semantic segmentation of everyday objects, the satellite image ground object classification task must also attend to 'background' objects such as buildings, water systems and vegetation. Existing deep-learning-based ground object classification methods classify ground objects from a single satellite image, and their classification precision and classification efficiency are correspondingly low.
Disclosure of Invention
In order to solve part or all of the technical problems in the prior art, the invention provides a ground object classification method based on fusion of visible light and infrared satellite images.
The technical scheme of the invention is as follows:
the method for classifying the ground features based on fusion of visible light and infrared satellite images comprises the following steps:
obtaining a plurality of groups of visible light images and infrared satellite images which correspond to each other and contain ground targets;
labeling ground targets in the visible light images and the infrared satellite images, obtaining labels, and generating a ground object classification training data set, wherein each item of training data in the ground object classification training data set comprises a visible light image and an infrared satellite image which correspond to each other and a ground target label corresponding to the visible light image and the infrared satellite image;
constructing a ground target classification network;
training a ground target classification network by utilizing a ground object classification training data set;
and classifying the ground targets by using the trained ground target classification network.
In some possible implementations, acquiring multiple sets of visible light images and infrared satellite images corresponding to each other and containing a terrestrial target includes:
selecting a plurality of satellite multispectral images containing ground targets;
and fusing RGB three-channel data of the satellite multispectral image to obtain a corresponding visible light image, extracting infrared channel data of the satellite multispectral image, and obtaining a corresponding infrared satellite image.
In some possible implementations, labeling ground targets in visible light images and infrared satellite images, obtaining tags, and generating a ground object classification training dataset, including:
respectively labeling different ground targets in the visible light image and the infrared satellite image by using a rectangular frame, determining the coordinates of the center point of the rectangular frame, the width and the height of the rectangular frame and the type of the targets in the rectangular frame, and obtaining a label comprising the coordinates of the center point of the rectangular frame, the width, the height and the type information;
and taking the visible light image and the infrared satellite image which correspond to each other and the ground target label corresponding to the images as training data to generate a ground feature classification training data set comprising a plurality of training data.
In some possible implementations, the ground target classification network is constructed based on the YOLOv5 algorithm.
In some possible implementations, the ground target classification network includes: the system comprises an input layer, a Focus module, a backbone network, a rapid space pyramid pooling module, a neck network, a decoding layer and an output layer which are sequentially connected, wherein the backbone network adopts a CSP structure, the neck network adopts a characteristic fusion network structure of a characteristic pyramid network and a pixel aggregation network, and the decoding layer adopts a multi-scale characteristic fusion network structure.
In some possible implementations, the rapid spatial pyramid pooling module includes:
two CBS modules, wherein the input of the first CBS module is the input of the rapid spatial pyramid pooling module, the output of the first CBS module is connected with the input of the first maximum pooling module and the input of a Concat layer respectively, the input of the second CBS module is connected with the output of the Concat layer, the output of the second CBS module is the output of the rapid spatial pyramid pooling module, and each CBS module comprises a convolution layer, a data normalization layer and an activation function layer which are sequentially connected;
the output of the first maximum pooling module is respectively connected with the input of the second maximum pooling module and the input of the Concat layer, the output of the second maximum pooling module is respectively connected with the input of the third maximum pooling module and the input of the Concat layer, and the output of the third maximum pooling module is connected with the input of the Concat layer;
and the Concat layer is used for connecting the input characteristics.
In some possible implementations, training a ground target classification network using a ground object classification training dataset includes:
and taking a visible light image and an infrared satellite image in training data in the ground object classification training data set as input, taking a ground object label in the training data as output, and training a ground object classification network.
In some possible implementations, taking as input the visible light image and the infrared satellite image in the training data in the ground object classification training data set, taking as output the ground object tag in the training data, training the ground object classification network, further comprising:
step S41, sequentially inputting visible light images and infrared satellite images in a plurality of training data into a ground target classification network to obtain a prediction result output by the ground target classification network;
step S42, comparing the prediction result output by the ground target classification network with the labels in the training data, and calculating the prediction accuracy of the ground target classification network;
step S43, judging whether the prediction accuracy obtained at least twice continuously is larger than a preset accuracy threshold, if yes, taking the current ground target classification network as a ground target classification network after training, if not, calculating a preset classification loss function, updating parameters of the ground target classification network by using the preset classification loss function, and returning to the step S41.
In some possible implementations, the preset classification loss function is:
$$L = L_{loc} + L_{cls} + L_{obj} + L_{seg}$$

wherein $L_{loc}$ denotes the positioning loss, $L_{cls}$ the classification loss, $L_{obj}$ the confidence loss, and $L_{seg}$ the segmentation loss;

the positioning loss $L_{loc}$ is expressed as:

$$L_{loc} = 1 - IoU + \frac{\Delta + \Omega}{2}$$

the classification loss $L_{cls}$ is expressed as:

$$L_{cls} = -\sum_{i=0}^{S^2}\sum_{j=0}^{M} I_{ij}^{obj}\sum_{c \in classes}\Big[p_i(c)\ln\hat{p}_i(c) + \big(1 - p_i(c)\big)\ln\big(1 - \hat{p}_i(c)\big)\Big]$$

the confidence loss $L_{obj}$ is expressed as:

$$L_{obj} = -\sum_{i=0}^{S^2}\sum_{j=0}^{M} I_{ij}^{obj}\Big[C_i\ln\hat{C}_i + \big(1 - C_i\big)\ln\big(1 - \hat{C}_i\big)\Big]$$

and the segmentation loss $L_{seg}$ is computed at the pixel level between the predicted segmentation result and the ground-truth segmentation of the ground targets;

where $S$ denotes the number of grids into which the input image is divided by the target detection network, $M$ the number of candidate target frames generated per grid, $I_{ij}^{obj}$ indicates whether a target is located in the $j$-th candidate target frame of the $i$-th grid ($I_{ij}^{obj}=1$ if yes, $I_{ij}^{obj}=0$ if no), $\hat{C}_i$ denotes the target confidence prediction in the $i$-th grid, $C_i$ the true target confidence in the $i$-th grid, $c$ a target class, $classes$ the set of all classes, $\hat{p}_i(c)$ the class-$c$ classification probability prediction in the $i$-th grid, $p_i(c)$ the true class-$c$ classification probability in the $i$-th grid, $IoU$ the intersection-over-union between the predicted and real target frames, and $\Delta$ the distance loss,

$$\Delta = \sum_{t \in \{x,y\}}\big(1 - e^{-\gamma\rho_t}\big),\qquad \gamma = 2 - \Lambda,\qquad \rho_x = \Big(\frac{b_{cx}^{gt} - b_{cx}}{c_w}\Big)^2,\qquad \rho_y = \Big(\frac{b_{cy}^{gt} - b_{cy}}{c_h}\Big)^2$$

in which $e$ denotes the natural constant, $b_{cx}^{gt}$ and $b_{cy}^{gt}$ denote the abscissa and ordinate of the center point of the real target frame, $b_{cx}$ and $b_{cy}$ the abscissa and ordinate of the center point of the predicted target frame, and $c_w$ and $c_h$ the width and height of the smallest rectangle circumscribing the predicted and real target frames; $\Lambda$ denotes the angle loss,

$$\Lambda = 1 - 2\sin^2\!\Big(\alpha - \frac{\pi}{4}\Big)$$

with $\alpha$ the angle between the line connecting the center points of the predicted and real target frames and the $x$ axis; $\Omega$ denotes the shape loss,

$$\Omega = \sum_{t \in \{w,h\}}\big(1 - e^{-\omega_t}\big)^{\theta},\qquad \omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})},\qquad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$

where $w^{gt}$ and $h^{gt}$ denote the width and height of the real target frame, $w$ and $h$ the width and height of the predicted target frame, and $\theta$ denotes the preset shape cost.
In some possible implementations, the parameters of the ground target classification network are updated using the following formula:
$$\theta \leftarrow \theta - \eta\,\Delta\!\left[\frac{\partial L}{\partial \theta}\right]$$

wherein $\theta$ represents the parameter set of the ground target classification network, $\Delta[\cdot]$ represents the optimizer, $L$ represents the preset classification loss function, and $\eta$ represents the learning rate.
The technical scheme of the invention has the main advantages that:
according to the ground object classification method based on the fusion of the visible light and the infrared satellite image, the ground object classification network is constructed, and the visible light image and the infrared satellite image are used as the input of the ground object classification network, so that the spatial features and the spectral features of the ground object can be extracted at the same time, the joint utilization of spatial information and spectral information is realized, the detection classification precision of various ground objects can be obviously improved, and the pixel-level classification of various ground objects is realized.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and without limitation to the invention. In the drawings:
FIG. 1 is a flow chart of a method for classifying ground objects based on fusion of visible light and infrared satellite images according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a ground object classification network according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a rapid spatial pyramid pooling module in the ground target classification network shown in FIG. 2;
FIG. 4 is a schematic structural diagram of the CBS structure in the rapid spatial pyramid pooling module shown in FIG. 3;
FIG. 5 is a schematic view of vector angles between a predicted target frame and a real target frame according to an embodiment of the present invention;
fig. 6 is a schematic diagram showing comparison of classification results under different classification networks according to an example of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments of the present invention and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The following describes in detail the technical scheme provided by the embodiment of the invention with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a method for classifying features based on fusion of visible light and infrared satellite images, the method comprising steps S1 to S5 of:
step S1, a plurality of groups of visible light images and infrared satellite images which correspond to each other and contain ground targets are obtained.
In one embodiment of the present invention, a plurality of sets of visible light images and infrared satellite images corresponding to each other and including a ground target are acquired by using satellite multispectral images.
Specifically, in an embodiment of the present invention, obtaining a plurality of sets of visible light images and infrared satellite images corresponding to each other and including a ground target includes:
selecting a plurality of satellite multispectral images containing ground targets;
and fusing RGB three-channel data of the satellite multispectral image to obtain a corresponding visible light image, extracting infrared channel data of the satellite multispectral image, and obtaining a corresponding infrared satellite image.
In one embodiment of the invention, the satellite multispectral images are selected from the BigEarthNet-S2 remote sensing image dataset. BigEarthNet-S2 is a large-scale remote sensing image dataset constructed by the Remote Sensing Image Analysis group and the Database Systems and Information Management group of the Technical University of Berlin. It is built from 125 Sentinel-2 satellite images acquired over ten European countries between June 2017 and May 2018, divided into 590326 non-overlapping image patches, and includes 19 label categories such as buildings, roads, water systems and vegetation.
In one embodiment of the invention, ENVI software is adopted to perform data fusion processing and data extraction processing on the satellite multispectral images so as to obtain corresponding visible light images and infrared satellite images.
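As an illustration of this data preparation step, the following Python sketch extracts a visible light image and an infrared satellite image from a multispectral GeoTIFF. It is only a sketch: the embodiment performs this processing with ENVI, and both the use of the rasterio library and the band ordering (B2/B3/B4 as blue/green/red and B8 as near infrared, typical for Sentinel-2 products) are assumptions, not part of the disclosed method.

```python
import numpy as np
import rasterio  # assumed tooling; the embodiment uses ENVI for this step

def split_visible_and_infrared(multispectral_path):
    """Return a 3-channel visible light image and a 1-channel infrared image."""
    with rasterio.open(multispectral_path) as src:
        bands = src.read()  # array of shape (channels, H, W)
    # Assumed Sentinel-2 band order B1..B12: index 1 = B2 (blue),
    # 2 = B3 (green), 3 = B4 (red), 7 = B8 (near infrared).
    blue, green, red, nir = bands[1], bands[2], bands[3], bands[7]
    visible = np.stack([red, green, blue], axis=0)   # fused RGB visible image
    infrared = nir[np.newaxis, ...]                  # infrared satellite image
    return visible, infrared
```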
And S2, labeling ground targets in the visible light images and the infrared satellite images, obtaining labels, and generating a ground object classification training data set, wherein each item of training data in the ground object classification training data set comprises a visible light image and an infrared satellite image which correspond to each other and a ground target label corresponding to the visible light image and the infrared satellite image.
In an embodiment of the present invention, labeling ground targets in visible light images and infrared satellite images to obtain labels, and generating a ground object classification training dataset includes:
respectively labeling different ground targets in the visible light image and the infrared satellite image by using a rectangular frame, determining the coordinates of the center point of the rectangular frame, the width and the height of the rectangular frame and the type of the targets in the rectangular frame, and obtaining a label comprising the coordinates of the center point of the rectangular frame, the width, the height and the type information;
and taking the visible light image and the infrared satellite image which correspond to each other and the ground target label corresponding to the images as training data to generate a ground feature classification training data set comprising a plurality of training data.
In one embodiment of the present invention, the selected ground targets include: buildings, roads, water systems, and vegetation.
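A minimal sketch of how one annotated rectangular frame could be turned into a label record follows. The normalized `class cx cy w h` text format and the class indices for the four target types are illustrative assumptions in line with common YOLO tooling; the patent only specifies that the label contains the center point coordinates, width, height and category of the rectangular frame.

```python
# Hypothetical class indices for the selected ground targets.
CLASS_IDS = {"building": 0, "road": 1, "water": 2, "vegetation": 3}

def box_to_label(class_name, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a rectangular frame to a normalized YOLO-style label line."""
    cx = (x_min + x_max) / 2 / img_w   # center point abscissa
    cy = (y_min + y_max) / 2 / img_h   # center point ordinate
    w = (x_max - x_min) / img_w        # frame width
    h = (y_max - y_min) / img_h        # frame height
    return f"{CLASS_IDS[class_name]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```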
And S3, constructing a ground target classification network.
In one embodiment of the invention, a ground target classification network is constructed based on the YOLOv5 algorithm.
Specifically, referring to fig. 2, in an embodiment of the present invention, a ground target classification network includes: the system comprises an input layer, a Focus module, a backbone network, a rapid space pyramid pooling module (SPPF), a Neck network (Neck), a decoding layer and an output layer which are sequentially connected, wherein the backbone network adopts a CSP structure, the Neck network adopts a characteristic pyramid network (FPN) and a characteristic fusion network structure of a Pixel Aggregation Network (PAN), and the decoding layer adopts a multi-scale characteristic fusion network structure.
In an embodiment of the present invention, the Focus module is configured to slice the image before it enters the backbone network of the ground target classification network. Specifically, the Focus module splits the original data input to the ground target classification network into 4 parts, which is equivalent to down-sampling the original data by a factor of 2, concatenates them along the channel dimension, and finally performs a convolution operation, obtaining a two-fold down-sampled feature map with no loss of information; this reduces the number of floating point operations (FLOPs) and increases computation speed.
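A minimal PyTorch sketch of this Focus slicing is shown below; the channel widths are illustrative assumptions, and a four-channel input is assumed to match the fused visible/infrared input used later.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice the input into 4 pixel-interleaved parts, concatenate them on the
    channel axis (a lossless 2x down-sampling), then apply one convolution."""
    def __init__(self, in_ch=4, out_ch=32):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=1)

    def forward(self, x):
        sliced = torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2],
             x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(sliced)
```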
Further, in an embodiment of the present invention, the backbone network adopts a CSPDarknet53 structure.
Further, referring to fig. 3, in an embodiment of the present invention, the rapid spatial pyramid pooling module includes:
the input of the first CBS module is the input of the rapid space pyramid pooling module, the output of the first CBS module is connected with the input of the first maximum pooling module and the input of the Concat layer respectively, the input of the second CBS module is connected with the output of the Concat layer, and the output of the second CBS module is the output of the rapid space pyramid pooling module;
the output of the first maximum pooling module is respectively connected with the input of the second maximum pooling module and the input of the Concat layer, the output of the second maximum pooling module is respectively connected with the input of the third maximum pooling module and the input of the Concat layer, and the output of the third maximum pooling module is connected with the input of the Concat layer;
and the Concat layer is used for connecting the input features.
Further, referring to fig. 4, in an embodiment of the present invention, the CBS module includes a convolution layer (Conv), a batch normalization layer (BN) and an activation function layer connected in sequence, where the activation function may be a SiLU function.
Traditional spatial pyramid pooling (SPP) applies max pooling at three different scales to the received feature map and then concatenates the results to obtain a fixed-size feature vector. It can convert feature maps of arbitrary size into fixed-size feature vectors, relieve the information loss caused by image distortion, reduce convolution operations, fuse local and global features, enrich the expressive power of the feature map, and improve detection precision when target sizes differ greatly. Compared with traditional spatial pyramid pooling, the rapid spatial pyramid pooling used in the embodiment of the invention changes the down-sampling factor of the pooling layer and replaces the original parallel structure with a cascade structure; it is mathematically equivalent, but requires fewer floating point operations (FLOPs), runs faster, and retains more information during multi-scale feature extraction.
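The following PyTorch sketch illustrates the CBS module and the cascaded rapid spatial pyramid pooling described above. The 5x5 pooling kernel and the channel widths follow the usual YOLOv5 configuration and are assumptions; a single pooling layer applied three times in cascade is used, which is equivalent to three identical max pooling modules.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Convolution, batch normalization and SiLU activation in sequence."""
    def __init__(self, in_ch, out_ch, k=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """First CBS -> three cascaded max poolings -> Concat -> second CBS."""
    def __init__(self, in_ch, out_ch, pool_k=5):
        super().__init__()
        hidden = in_ch // 2                      # assumed hidden width
        self.cbs1 = CBS(in_ch, hidden)
        self.pool = nn.MaxPool2d(pool_k, stride=1, padding=pool_k // 2)
        self.cbs2 = CBS(hidden * 4, out_ch)

    def forward(self, x):
        x = self.cbs1(x)
        p1 = self.pool(x)
        p2 = self.pool(p1)
        p3 = self.pool(p2)
        return self.cbs2(torch.cat([x, p1, p2, p3], dim=1))  # Concat layer
```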
Further, categories with large differences in target size always exist in an image. With the same feature extraction, every pixel of the feature map can in theory perceive the whole image, but in practice, because the convolution kernel perceives only a limited neighborhood and features are built up layer by layer with sliding windows, positions close to the center of a target are perceived more strongly while fewer features are extracted at target edges. In ground object classification, the typical elements include targets such as buildings, roads, water bodies and vegetation, and targets of different categories differ greatly in size and geometry. For example, building targets are much larger than water-body and vegetation targets, and long, narrow road targets differ greatly in geometry from targets such as water bodies and vegetation; the resulting imbalance in sample distribution poses a challenge for the ground object classification task. In an embodiment of the invention, a multi-scale feature fusion network is therefore arranged at the decoding layer: it extracts features with different receptive fields from the feature map and fuses them into a feature that perceives both large and small targets. Feature maps with large receptive fields can draw on context information during training, so unreasonable predictions can be adjusted in time and the network performance of the ground target classification network is further improved.
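The exact structure of this decoding-layer multi-scale feature fusion network is not specified in detail; the sketch below assumes parallel 3x3 convolutions with different dilation rates as one simple way of extracting features with different receptive fields, fused by concatenation and a 1x1 convolution.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Extract features with several receptive fields and fuse them."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations)
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))
```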
In an embodiment of the invention, the ground target classification network is constructed based on the YOLOv5 algorithm. The backbone network adopts a CSP structure inspired by the cross stage partial network (CSPNet), which splits the feature map of the base layer into two parts and then merges them through a cross-stage hierarchical structure, reducing the amount of computation while maintaining accuracy. A rapid spatial pyramid pooling structure is introduced between the backbone network and the neck network, and the neck network adopts a feature fusion structure combining a feature pyramid network (FPN) and a pixel aggregation network (PAN), which strengthens the feature fusion capability of the network and improves the detection and classification performance of the ground target classification network. A decoding layer with a multi-scale feature fusion structure is arranged after the neck network to perform multi-scale feature fusion, which further improves the detection and classification performance of the ground target classification network.
And S4, training a ground target classification network by using the ground object classification training data set.
After the ground object classification network is constructed, in order to improve the classification precision of the ground object classification network and reduce classification errors, the ground object classification network needs to be trained and updated by utilizing the ground object classification training data set.
In one embodiment of the present invention, training a ground object classification network using a ground object classification training dataset comprises:
and taking a visible light image and an infrared satellite image in training data in the ground object classification training data set as input, taking a ground object label in the training data as output, and training a ground object classification network.
Specifically, in an embodiment of the present invention, a visible light image and an infrared satellite image in training data in a ground object classification training data set are taken as inputs, a ground object tag in the training data is taken as an output, and a ground object classification network is trained, and the method further includes the following steps S41 to S43:
step S41, sequentially inputting the visible light images and the infrared satellite images in the training data into a ground target classification network to obtain a prediction result output by the ground target classification network.
In an embodiment of the present invention, the visible light image and the infrared satellite image in a piece of training data are fed into the input end of the ground target classification network, processed sequentially by the parameters of each layer of the network, and output at its output end; the information at the output end is the prediction result corresponding to the visible light image and the infrared satellite image, and specifically includes the center point coordinates, width, height and category information of the predicted target frame of each ground target. The initial ground target classification network may be an untrained network or a network whose training has not yet been completed; each layer of the initial network is provided with initialized parameters, which are continuously updated and adjusted during training.
The input of a traditional YOLO-series network is an RGB three-channel image, i.e. a visible light image; as the number of network layers increases, the deeper layers of the network progressively discard detail features of the target, which degrades classification performance.
In an embodiment of the invention, the visible light image and the infrared satellite image are both used as input to the constructed ground target classification network: an infrared image input channel is added alongside the existing visible light input channels, and the visible light image and the infrared image are combined into a four-channel image that is fed into the ground target classification network. This compensates for the loss of ground target features during detection and classification and improves the classification performance of the ground target classification network.
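A minimal sketch of this four-channel input is given below: the visible light image and the infrared image are concatenated along the channel axis, and the first convolution of the network is assumed to accept four input channels. The output channel width is an illustrative assumption.

```python
import torch
import torch.nn as nn

def fuse_inputs(rgb, ir):
    """rgb: (B, 3, H, W) visible image, ir: (B, 1, H, W) infrared image."""
    return torch.cat([rgb, ir], dim=1)            # (B, 4, H, W) fused input

# The first convolution of the ground target classification network must
# therefore take 4 input channels instead of the usual 3.
first_conv = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=3, padding=1)
```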
Step S42, comparing the prediction result output by the ground target classification network with the labels in the training data, and calculating the prediction accuracy of the ground target classification network;
in an embodiment of the present invention, the prediction accuracy of the ground target classification network may be calculated by comparing the prediction result output by the network with the label in the training data; specifically, the prediction result, comprising the center point coordinates, width, height and category information of the predicted target frame of the ground target, is compared with the label, comprising the center point coordinates, width, height and category information of the real target frame of the ground target.
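A hedged sketch of this accuracy check is given below: a prediction is counted as correct when its category matches the label and the IoU between the predicted and real target frames exceeds a threshold. The 0.5 threshold, the assumed helper `box_iou`, and the one-to-one pairing of predictions with labels are illustrative assumptions; the patent does not fix the exact matching rule.

```python
def prediction_accuracy(predictions, labels, box_iou, iou_thr=0.5):
    """predictions / labels: lists of (box, class) pairs, assumed already
    matched one-to-one; box_iou is an assumed helper returning the IoU of
    two boxes."""
    correct = 0
    for (p_box, p_cls), (t_box, t_cls) in zip(predictions, labels):
        if p_cls == t_cls and box_iou(p_box, t_box) >= iou_thr:
            correct += 1
    return correct / max(len(labels), 1)
```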
Step S43, judging whether the prediction accuracy obtained at least twice continuously is larger than a preset accuracy threshold, if yes, taking the current ground target classification network as a ground target classification network after training, if not, calculating a preset classification loss function, updating parameters of the ground target classification network by using the preset classification loss function, and returning to the step S41.
In an embodiment of the present invention, the preset classification loss function includes a positioning loss portion, a classification loss portion, a confidence loss portion, and a segmentation loss portion, which are specifically expressed as:
$$L = L_{loc} + L_{cls} + L_{obj} + L_{seg}$$

wherein $L_{loc}$ denotes the positioning loss, $L_{cls}$ the classification loss, $L_{obj}$ the confidence loss, and $L_{seg}$ the segmentation loss.
Further, referring to fig. 5, in which $b$ denotes the predicted target frame and $b^{gt}$ denotes the real target frame: in one embodiment of the invention, when determining the positioning loss, the vector angle between the predicted target frame and the real target frame is introduced, and a SIoU loss function is constructed to describe the positioning loss.
Specifically, the positioning loss $L_{loc}$ is expressed as:

$$L_{loc} = 1 - IoU + \frac{\Delta + \Omega}{2}$$

where $IoU$ denotes the intersection-over-union between the predicted target frame and the real target frame, and $\Delta$ denotes the distance loss,

$$\Delta = \sum_{t \in \{x,y\}}\big(1 - e^{-\gamma\rho_t}\big),\qquad \gamma = 2 - \Lambda,\qquad \rho_x = \Big(\frac{b_{cx}^{gt} - b_{cx}}{c_w}\Big)^2,\qquad \rho_y = \Big(\frac{b_{cy}^{gt} - b_{cy}}{c_h}\Big)^2$$

in which $e$ denotes the natural constant, $b_{cx}^{gt}$ and $b_{cy}^{gt}$ denote the abscissa and ordinate of the center point of the real target frame, $b_{cx}$ and $b_{cy}$ denote the abscissa and ordinate of the center point of the predicted target frame, and $c_w$ and $c_h$ denote the width and height of the smallest rectangle circumscribing the predicted and real target frames; $\Lambda$ denotes the angle loss,

$$\Lambda = 1 - 2\sin^2\!\Big(\alpha - \frac{\pi}{4}\Big)$$

where $\alpha$ denotes the angle between the line connecting the center points of the predicted and real target frames and the $x$ axis; $\Omega$ denotes the shape loss,

$$\Omega = \sum_{t \in \{w,h\}}\big(1 - e^{-\omega_t}\big)^{\theta},\qquad \omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})},\qquad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$

where $w^{gt}$ and $h^{gt}$ denote the width and height of the real target frame, $w$ and $h$ denote the width and height of the predicted target frame, and $\theta$ denotes the preset shape cost used to control the attention paid to the shape loss.
According to the positioning loss defined above, when $\alpha$ is 0 the distance loss tends to a constant, which reduces the degrees of freedom of network regression, accelerates network convergence and improves regression accuracy.
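A PyTorch sketch of this SIoU positioning loss is given below for boxes in (cx, cy, w, h) form. It follows the formulas above; the numerical safeguards (eps, clamping) and the batching choices are implementation assumptions.

```python
import math
import torch

def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """pred, target: (N, 4) tensors of (cx, cy, w, h); theta is the preset shape cost."""
    pcx, pcy, pw, ph = pred.unbind(-1)
    gcx, gcy, gw, gh = target.unbind(-1)

    # IoU between predicted and real target frames
    px1, py1, px2, py2 = pcx - pw / 2, pcy - ph / 2, pcx + pw / 2, pcy + ph / 2
    gx1, gy1, gx2, gy2 = gcx - gw / 2, gcy - gh / 2, gcx + gw / 2, gcy + gh / 2
    inter_w = (torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(min=0)
    inter_h = (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(min=0)
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Width/height of the smallest rectangle enclosing both frames (c_w, c_h)
    cw = torch.max(px2, gx2) - torch.min(px1, gx1) + eps
    ch = torch.max(py2, gy2) - torch.min(py1, gy1) + eps

    # Angle loss: alpha is the angle between the centre connecting line and the x axis
    dx, dy = gcx - pcx, gcy - pcy
    sigma = torch.sqrt(dx ** 2 + dy ** 2) + eps
    sin_alpha = (torch.abs(dy) / sigma).clamp(0, 1)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # Distance loss
    gamma = 2 - angle
    rho_x = (dx / cw) ** 2
    rho_y = (dy / ch) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # Shape loss
    omega_w = torch.abs(pw - gw) / torch.max(pw, gw)
    omega_h = torch.abs(ph - gh) / torch.max(ph, gh)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return (1 - iou + (dist + shape) / 2).mean()
```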
Further, in an embodiment of the present invention, binary cross entropy loss functions are used for both the classification loss $L_{cls}$ and the confidence loss $L_{obj}$, specifically expressed as:

$$L_{cls} = -\sum_{i=0}^{S^2}\sum_{j=0}^{M} I_{ij}^{obj}\sum_{c \in classes}\Big[p_i(c)\ln\hat{p}_i(c) + \big(1 - p_i(c)\big)\ln\big(1 - \hat{p}_i(c)\big)\Big]$$

$$L_{obj} = -\sum_{i=0}^{S^2}\sum_{j=0}^{M} I_{ij}^{obj}\Big[C_i\ln\hat{C}_i + \big(1 - C_i\big)\ln\big(1 - \hat{C}_i\big)\Big]$$

and the segmentation loss $L_{seg}$ is computed at the pixel level between the predicted segmentation result and the ground-truth segmentation of the ground targets;

where $S$ denotes the number of grids into which the input image is divided by the ground target classification network, $M$ the number of candidate target frames generated per grid, $I_{ij}^{obj}$ indicates whether a target is located in the $j$-th candidate target frame of the $i$-th grid ($I_{ij}^{obj}=1$ if yes, $I_{ij}^{obj}=0$ if no), $\hat{C}_i$ denotes the target confidence prediction in the $i$-th grid, $C_i$ the true target confidence in the $i$-th grid, $c$ a target class, $classes$ the set of all classes, $\hat{p}_i(c)$ the class-$c$ classification probability prediction in the $i$-th grid, and $p_i(c)$ the true class-$c$ classification probability in the $i$-th grid.
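For completeness, the classification and confidence terms can be computed with the standard binary cross entropy loss in PyTorch, as in the brief sketch below; how the target tensors are assembled from the grid and candidate-frame assignments is assumed and not shown.

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss(reduction="mean")

def cls_obj_losses(cls_logits, cls_targets, obj_logits, obj_targets):
    """All arguments are tensors aligned over grids and candidate frames."""
    l_cls = bce(cls_logits, cls_targets)   # classification loss L_cls
    l_obj = bce(obj_logits, obj_targets)   # confidence loss L_obj
    return l_cls, l_obj
```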
In one embodiment of the present invention, the parameters of the ground target classification network are optimized and updated by a gradient descent method: the derivative of the loss function with respect to the parameters is first computed using the chain rule, and the parameters are then updated using the resulting derivative and a preset learning rate.
Specifically, the parameters of the ground target classification network may be updated using the following formula:
$$\theta \leftarrow \theta - \eta\,\Delta\!\left[\frac{\partial L}{\partial \theta}\right]$$

wherein $\theta$ represents the parameter set of the ground target classification network, $\Delta[\cdot]$ represents the optimizer, $L$ represents the preset classification loss function, and $\eta$ represents the learning rate; the optimizer and the learning rate may be selected according to the actual situation, the optimizer being, for example, Adam or SGD, and the learning rate controlling the speed of parameter updates.
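A minimal training-step sketch of this update rule follows; `model` and `total_loss` (the combined loss L) are assumed to be defined elsewhere, and SGD with the hyper-parameters used later in the example section is chosen only for illustration.

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-5)

def train_step(fused_images, targets):
    optimizer.zero_grad()
    predictions = model(fused_images)          # forward pass
    loss = total_loss(predictions, targets)    # L = L_loc + L_cls + L_obj + L_seg
    loss.backward()                            # chain-rule derivative dL/dtheta
    optimizer.step()                           # theta <- theta - eta * Delta[dL/dtheta]
    return loss.item()
```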
And S5, performing ground target classification by using the trained ground target classification network.
Specifically, after training of the ground target classification network is completed, a pair of mutually corresponding visible light and infrared images to be subjected to ground target detection and classification is input into the ground target classification network, and the corresponding information of the ground targets in the images is obtained.
According to the ground object classification method based on the fusion of visible light and infrared satellite images provided by the embodiment of the invention, by constructing the ground target classification network and using the visible light image and the infrared satellite image as its input, the spatial features and the spectral features of ground targets can be extracted at the same time, spatial information and spectral information are jointly exploited, the detection and classification precision for multiple kinds of ground targets is significantly improved, and pixel-level classification of multiple kinds of ground targets is achieved.
The following describes the beneficial effects of the ground object classification method based on the fusion of visible light and infrared satellite images according to an embodiment of the present invention with reference to specific examples:
in the example, the BigEarthNet-S2 remote sensing image dataset is used to build the ground object classification training dataset, which is divided into a training set, a validation set and a test set in the ratio 7:2:1 for subsequent training, validation and testing of the target classification network; when training the target classification network, an SGD optimizer is selected, the initial learning rate is set to 0.01, the momentum to 0.9 and the weight decay to 0.00001, and mIoU, the false alarm rate and the missed alarm rate are selected as the performance evaluation indices of the target classification network.
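A small sketch of the 7:2:1 split described above is given below; the random seed and the in-memory list representation of the dataset are assumptions made only for illustration.

```python
import random

def split_dataset(samples, seed=0):
    """Split a list of training samples into train/validation/test at 7:2:1."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```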
Based on the ground object classification training dataset, parameters and performance evaluation indices set above, the target classification network was trained, validated and tested, and Table 1 below compares the performance of the ground target classification network obtained by the method provided by the embodiment of the invention with that of a single-mode target classification network based on visible light images and a single-mode target classification network based on infrared images.
Table 1 results of performance comparisons for target classification networks
According to the table, the overall performance index of the ground target classification network obtained by the method provided by the embodiment of the invention is better than that of the single-mode target classification network, and the corresponding mIoU can reach 61.24%.
Further, referring to fig. 6, fig. 6 shows an original picture, a classification result of classifying the original picture using the ground object classification network according to an embodiment of the present invention, a classification result of classifying the original picture using the visible light-based single-mode object classification network, and a real object annotation. As can be seen from the result shown in fig. 6, the ground target classification network obtained by the method according to an embodiment of the present invention can more accurately detect various ground targets than the single-mode target classification network.
Further, based on the set ground object classification training data set, parameters and performance evaluation indexes, the ground object classification networks before and after the multi-scale feature fusion network is introduced are trained, verified and tested, and performance comparison results of the ground object classification networks before and after the multi-scale feature fusion network are introduced are obtained as shown in the following table 2.
Table 2 performance comparison results of ground target classification networks before and after introducing the multiscale feature fusion network
According to the table, after the multi-scale feature fusion network is introduced into the ground target classification network, the mIoU improves from 43.15% to 61.24%; that is, introducing the multi-scale feature fusion network retains more contextual semantic information of the image and greatly improves network performance.
Further, training, verifying and testing of different target classification networks are performed based on the set ground object classification training data set, parameters and performance evaluation indexes, so as to obtain performance comparison results of the ground object classification network and the traditional various target classification networks, which are obtained based on the method provided by the embodiment of the invention and shown in the following table 3.
TABLE 3 Performance comparison results for different target classification networks
According to the table above, the performance indexes of the ground target classification network obtained by the method provided by an embodiment of the invention are superior to those of other types of target classification networks.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting thereof; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A ground object classification method based on fusion of visible light and infrared satellite images is characterized by comprising the following steps:
obtaining a plurality of groups of visible light images and infrared satellite images which correspond to each other and contain ground targets;
labeling ground targets in the visible light images and the infrared satellite images, obtaining labels, and generating a ground object classification training data set, wherein each item of training data in the ground object classification training data set comprises a visible light image and an infrared satellite image which correspond to each other and a ground target label corresponding to the visible light image and the infrared satellite image;
constructing a ground target classification network;
training a ground target classification network by utilizing a ground object classification training data set;
and classifying the ground targets by using the trained ground target classification network.
2. The method for classifying ground objects based on fusion of visible light and infrared satellite images according to claim 1, wherein obtaining a plurality of sets of visible light images and infrared satellite images which correspond to each other and contain ground objects, comprises:
selecting a plurality of satellite multispectral images containing ground targets;
and fusing RGB three-channel data of the satellite multispectral image to obtain a corresponding visible light image, extracting infrared channel data of the satellite multispectral image, and obtaining a corresponding infrared satellite image.
3. The method for classifying ground objects based on fusion of visible light and infrared satellite images according to claim 1, wherein labeling ground objects in the visible light images and the infrared satellite images, obtaining labels, and generating a ground object classification training dataset comprises:
respectively labeling different ground targets in the visible light image and the infrared satellite image by using a rectangular frame, determining the coordinates of the center point of the rectangular frame, the width and the height of the rectangular frame and the type of the targets in the rectangular frame, and obtaining a label comprising the coordinates of the center point of the rectangular frame, the width, the height and the type information;
and taking the visible light image and the infrared satellite image which correspond to each other and the ground target label corresponding to the images as training data to generate a ground feature classification training data set comprising a plurality of training data.
4. The ground object classification method based on visible light and infrared satellite image fusion according to claim 1, wherein a ground object classification network is constructed based on a YOLOv5 algorithm.
5. The method for classifying ground objects based on fusion of visible light and infrared satellite images according to claim 4, wherein the ground object classification network comprises: the system comprises an input layer, a Focus module, a backbone network, a rapid space pyramid pooling module, a neck network, a decoding layer and an output layer which are sequentially connected, wherein the backbone network adopts a CSP structure, the neck network adopts a characteristic fusion network structure of a characteristic pyramid network and a pixel aggregation network, and the decoding layer adopts a multi-scale characteristic fusion network structure.
6. The method for classifying features based on fusion of visible light and infrared satellite images according to claim 5, wherein said rapid spatial pyramid pooling structure comprises:
the system comprises two CBS modules, wherein the input of the first CBS module is the input of the rapid spatial pyramid pooling structure, the output of the first CBS module is connected with the input of the first maximum pooling module and the input of a Concat layer respectively, the input of the second CBS module is connected with the output of the Concat layer, the output of the second CBS module is the output of the rapid spatial pyramid pooling structure, and each CBS module comprises a convolution layer, a data normalization layer and an activation function layer which are sequentially connected;
the output of the first maximum pooling module is respectively connected with the input of the second maximum pooling module and the input of the Concat layer, the output of the second maximum pooling module is respectively connected with the input of the third maximum pooling module and the input of the Concat layer, and the output of the third maximum pooling module is connected with the input of the Concat layer;
and the Concat layer is used for connecting the input characteristics.
7. The method of classification of ground objects based on fusion of visible and infrared satellite images according to any one of claims 1-6, wherein training a ground object classification network using a ground object classification training dataset comprises:
and taking a visible light image and an infrared satellite image in training data in the ground object classification training data set as input, taking a ground object label in the training data as output, and training a ground object classification network.
8. The method for classifying a ground object based on fusion of visible light and infrared satellite images according to claim 7, wherein taking as input a visible light image and an infrared satellite image in training data in the ground object classification training data set and taking as output the ground target label in the training data, training the ground target classification network further comprises:
step S41, sequentially inputting visible light images and infrared satellite images in a plurality of training data into a ground target classification network to obtain a prediction result output by the ground target classification network;
step S42, comparing the prediction result output by the ground target classification network with the labels in the training data, and calculating the prediction accuracy of the ground target classification network;
step S43, judging whether the prediction accuracy obtained at least twice continuously is larger than a preset accuracy threshold, if yes, taking the current ground target classification network as a ground target classification network after training, if not, calculating a preset classification loss function, updating parameters of the ground target classification network by using the preset classification loss function, and returning to the step S41.
9. The method for classifying features based on fusion of visible light and infrared satellite images according to claim 8, wherein the predetermined classification loss function is:
$$L = L_{loc} + L_{cls} + L_{obj} + L_{seg}$$

wherein $L_{loc}$ denotes the positioning loss, $L_{cls}$ the classification loss, $L_{obj}$ the confidence loss, and $L_{seg}$ the segmentation loss;

the positioning loss $L_{loc}$ is expressed as:

$$L_{loc} = 1 - IoU + \frac{\Delta + \Omega}{2}$$

the classification loss $L_{cls}$ is expressed as:

$$L_{cls} = -\sum_{i=0}^{S^2}\sum_{j=0}^{M} I_{ij}^{obj}\sum_{c \in classes}\Big[p_i(c)\ln\hat{p}_i(c) + \big(1 - p_i(c)\big)\ln\big(1 - \hat{p}_i(c)\big)\Big]$$

the confidence loss $L_{obj}$ is expressed as:

$$L_{obj} = -\sum_{i=0}^{S^2}\sum_{j=0}^{M} I_{ij}^{obj}\Big[C_i\ln\hat{C}_i + \big(1 - C_i\big)\ln\big(1 - \hat{C}_i\big)\Big]$$

and the segmentation loss $L_{seg}$ is computed at the pixel level between the predicted segmentation result and the ground-truth segmentation of the ground targets;

where $S$ denotes the number of grids into which the input image is divided by the target detection network, $M$ the number of candidate target frames generated per grid, $I_{ij}^{obj}$ indicates whether a target is located in the $j$-th candidate target frame of the $i$-th grid ($I_{ij}^{obj}=1$ if yes, $I_{ij}^{obj}=0$ if no), $\hat{C}_i$ denotes the target confidence prediction in the $i$-th grid, $C_i$ the true target confidence in the $i$-th grid, $c$ a target class, $classes$ the set of all classes, $\hat{p}_i(c)$ the class-$c$ classification probability prediction in the $i$-th grid, $p_i(c)$ the true class-$c$ classification probability in the $i$-th grid, $IoU$ the intersection-over-union between the predicted and real target frames, and $\Delta$ the distance loss,

$$\Delta = \sum_{t \in \{x,y\}}\big(1 - e^{-\gamma\rho_t}\big),\qquad \gamma = 2 - \Lambda,\qquad \rho_x = \Big(\frac{b_{cx}^{gt} - b_{cx}}{c_w}\Big)^2,\qquad \rho_y = \Big(\frac{b_{cy}^{gt} - b_{cy}}{c_h}\Big)^2$$

in which $e$ denotes the natural constant, $b_{cx}^{gt}$ and $b_{cy}^{gt}$ denote the abscissa and ordinate of the center point of the real target frame, $b_{cx}$ and $b_{cy}$ the abscissa and ordinate of the center point of the predicted target frame, and $c_w$ and $c_h$ the width and height of the smallest rectangle circumscribing the predicted and real target frames; $\Lambda$ denotes the angle loss,

$$\Lambda = 1 - 2\sin^2\!\Big(\alpha - \frac{\pi}{4}\Big)$$

with $\alpha$ the angle between the line connecting the center points of the predicted and real target frames and the $x$ axis; $\Omega$ denotes the shape loss,

$$\Omega = \sum_{t \in \{w,h\}}\big(1 - e^{-\omega_t}\big)^{\theta},\qquad \omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})},\qquad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$

where $w^{gt}$ and $h^{gt}$ denote the width and height of the real target frame, $w$ and $h$ the width and height of the predicted target frame, and $\theta$ denotes the preset shape cost.
10. The method for classifying ground objects based on fusion of visible light and infrared satellite images according to claim 9, wherein the parameters of the ground object classification network are updated by adopting the following formula:
$$\theta \leftarrow \theta - \eta\,\Delta\!\left[\frac{\partial L}{\partial \theta}\right]$$

wherein $\theta$ represents the parameter set of the ground target classification network, $\Delta[\cdot]$ represents the optimizer, $L$ represents the preset classification loss function, and $\eta$ represents the learning rate.
CN202310210423.5A 2023-03-07 2023-03-07 Ground object classification method based on visible light and infrared satellite image fusion Pending CN116433959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310210423.5A CN116433959A (en) 2023-03-07 2023-03-07 Ground object classification method based on visible light and infrared satellite image fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310210423.5A CN116433959A (en) 2023-03-07 2023-03-07 Ground object classification method based on visible light and infrared satellite image fusion

Publications (1)

Publication Number Publication Date
CN116433959A true CN116433959A (en) 2023-07-14

Family

ID=87093345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310210423.5A Pending CN116433959A (en) 2023-03-07 2023-03-07 Ground object classification method based on visible light and infrared satellite image fusion

Country Status (1)

Country Link
CN (1) CN116433959A (en)

Similar Documents

Publication Publication Date Title
Huang et al. Urban land-use mapping using a deep convolutional neural network with high spatial resolution multispectral remote sensing imagery
Zhou et al. BOMSC-Net: Boundary optimization and multi-scale context awareness based building extraction from high-resolution remote sensing imagery
CN112966684A (en) Cooperative learning character recognition method under attention mechanism
CN112465880B (en) Target detection method based on multi-source heterogeneous data cognitive fusion
CN112287807A (en) Remote sensing image road extraction method based on multi-branch pyramid neural network
CN108427919B (en) Unsupervised oil tank target detection method based on shape-guided saliency model
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN113610070A (en) Landslide disaster identification method based on multi-source data fusion
Guo et al. Using multi-scale and hierarchical deep convolutional features for 3D semantic classification of TLS point clouds
Liu et al. Survey of road extraction methods in remote sensing images based on deep learning
CN116645592B (en) Crack detection method based on image processing and storage medium
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
Gao et al. Road extraction using a dual attention dilated-linknet based on satellite images and floating vehicle trajectory data
CN116091937A (en) High-resolution remote sensing image ground object recognition model calculation method based on deep learning
Chen et al. Exchange means change: An unsupervised single-temporal change detection framework based on intra-and inter-image patch exchange
CN115019163A (en) City factor identification method based on multi-source big data
Wang et al. Feature extraction and segmentation of pavement distress using an improved hybrid task cascade network
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN113496148A (en) Multi-source data fusion method and system
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN116433959A (en) Ground object classification method based on visible light and infrared satellite image fusion
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
Chen et al. Built-up Area Extraction Combing Densely Connected Dual-Attention Network and Multi-Scale Context
CN113012167A (en) Combined segmentation method for cell nucleus and cytoplasm
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination