CN116206210A - NAS-Swin-based remote sensing image agricultural greenhouse extraction method - Google Patents

NAS-Swin-based remote sensing image agricultural greenhouse extraction method

Info

Publication number
CN116206210A
Authority
CN
China
Prior art keywords
agricultural greenhouse
swin
layer
remote sensing
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211569653.2A
Other languages
Chinese (zh)
Inventor
佟威剑
贾淑涵
赵泉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Technical University
Original Assignee
Liaoning Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Technical University filed Critical Liaoning Technical University
Priority to CN202211569653.2A priority Critical patent/CN116206210A/en
Publication of CN116206210A publication Critical patent/CN116206210A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G06V20/13: Satellite images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
    • Y02A40/25: Greenhouse technology, e.g. cooling systems therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention designs a remote sensing image agricultural greenhouse extraction method based on NAS-Swin, belonging to the field of remote sensing images. First, the position information of agricultural greenhouse ground objects is annotated in the acquired satellite image data, and the annotated data are divided into a training set and a test set; the training set is input into a parameter-improved Swin-Transformer neural network module for training to obtain remote sensing features of the agricultural greenhouse; the remote sensing features of the agricultural greenhouse are input into a NAS-FPN network, a targeted feature pyramid is established for the agricultural greenhouses in the Landsat images, and the remote sensing features are fused; the fused remote sensing features are input into an RPN (Region Proposal Network) improved with the Approx strategy, and the position information of the agricultural greenhouse is extracted and marked, yielding the NAS-Swin neural network framework; finally the agricultural greenhouse information is obtained. The invention constructs a NAS-Swin neural network framework better suited to remote sensing agricultural greenhouse extraction, and overcomes the difficulty traditional agricultural greenhouse extraction methods have with large-scale image data.

Description

NAS-Swin-based remote sensing image agricultural greenhouse extraction method
Technical Field
The invention belongs to the field of remote sensing images, and particularly relates to a remote sensing image agricultural greenhouse extraction method based on NAS-Swin.
Background
As an emerging agricultural facility, the agricultural greenhouse is inexpensive and highly effective against insect pests and disasters; it reduces the influence of climate on cultivation, effectively raises crop yields, and relieves supply shortages. It extends the vegetable supply period in northern regions, improves land utilization, and meets the year-round demand for agricultural products. With its low investment, high return and high benefit, the agricultural greenhouse is now widely used in livestock breeding, forestry seedling cultivation, fruit tree cultivation, vegetable production and other fields; it has become an important industry within agriculture, carries very high social and economic benefits, and serves as an important index for measuring agricultural modernization. At the same time it brings problems, such as greenhouse construction that cannot be detected and controlled in time, greenhouse distribution that cannot be effectively planned, and illegal occupation of greenhouse land. Timely and accurate acquisition of information such as the geographical distribution and coverage area of agricultural greenhouses is therefore of great significance for agricultural yield evaluation, farmers' production planning and national agricultural planning. The traditional way of acquiring greenhouse distribution information still relies on manual field surveys or interviews, which demand large amounts of manpower and material resources, take a long time and are highly subjective, and the resulting statistics often suffer from omissions, double counting and miscounting.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a remote sensing image agricultural greenhouse extraction method based on NAS-Swin, which can better and more efficiently extract the agricultural greenhouse information in the remote sensing image.
The remote sensing image agricultural greenhouse extraction method based on NAS-Swin specifically comprises the following steps:
step 1: obtaining Landsat satellite data and Sentinel-2 satellite image data, and then performing image preprocessing on them; the position information of agricultural greenhouse ground objects in the image data is annotated with labelme, and the annotated data are divided into a training set and a test set at a ratio of 4:1;
step 2: inputting the training set obtained in the step 1 into a parameter-improved Swin-Transformer neural network module for training to obtain remote sensing features of the agricultural greenhouse;
the Swin-Transformer neural network module consists of 4 parts; the input image is first divided into a set of non-overlapping image blocks (patches) by a patch partition layer, where the size of each image block is 4x4 and the corresponding feature dimension of each image block is 48; in the first part, a linear embedding layer first projects the divided feature dimension, and the result is sent into a Swin Transformer Block module for computation; the second to the fourth parts are identical in structure: the set of image blocks output by the previous layer is merged by a patch merging layer over a 2x2 neighbourhood, so the merged image block size is four times that of the previous layer and the feature dimension also becomes four times that of the previous layer; a linear embedding layer then reduces this feature dimension to half, after which the result is sent into a Swin Transformer Block module to compute the self-attention of the image blocks;
the Swin Transformer Block module first normalizes the input feature Z1 from the preceding patch partition/merging layer through a Layer-Norm layer; feature learning is then performed by window-based multi-head self-attention (W-MSA), and a residual operation gives Z2; a Layer-Norm layer, an MLP layer and a residual operation then give Z3; Z3 goes through a Layer-Norm layer and shifted-window-based multi-head self-attention (SW-MSA) for feature learning, a residual operation gives Z4, and a further Layer-Norm layer, MLP and residual give Z5, the output-layer feature; the residual connection formulas in the module are as follows,
Z2=W-MSA(Layer-Norm(Z1))+Z1
Z3=MLP(Layer-Norm(Z2))+Z2
Z4=SW-MSA(Layer-Norm(Z3))+Z3
Z5=MLP(Layer-Norm(Z4))+Z4
wherein Z1 is the input feature, Z2 and Z3 are the output features of the W-MSA and MLP modules respectively, and Z4 and Z5 are the output features of the SW-MSA and MLP modules respectively; W-MSA and SW-MSA denote window multi-head self-attention and shifted (sliding) window multi-head self-attention, and information interaction between different image blocks is carried out by shifting the windows; to reduce the amount of computation, the Swin Transformer Block performs a shift-and-merge operation on the SW-MSA windows;
the calculation of self-attention is,
Attention(Q, K, V) = SoftMax(QK^T/√d + B)V
wherein Q, K and V correspond to the query, key and value of the self-attention mechanism respectively, and Q, K, V ∈ R^(m×d), a real matrix of m rows and d columns; d is the dimension of the query and key, and m is the number of patches in a window; the relative position values between different patches lie in the range [-M+1, M-1], where M is the arithmetic square root of the number of patches, so the relative position bias is parameterized by a matrix B, B ∈ R^((2M-1)×(2M-1)); the SoftMax function performs normalization and, taking the i-th patch as an example, is defined as follows:
SoftMax(x_i) = e^(x_i) / Σ_{j=1}^{m} e^(x_j)
wherein m is the number of patches;
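As a concrete illustration of the formula above, the following is a minimal PyTorch-style sketch of window self-attention with a relative position bias; the module name, tensor shapes and default hyper-parameters (window size 7, 4 heads) are illustrative assumptions rather than the exact implementation of the invention.

```python
# Minimal sketch (assumption): window self-attention with a learnable relative position
# bias B, following Attention(Q, K, V) = SoftMax(QK^T / sqrt(d) + B) V described above.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim, window_size=7, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.scale = self.head_dim ** -0.5            # 1 / sqrt(d)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        M = window_size
        # B is parameterized by a (2M-1) x (2M-1) table, one bias value per head
        self.bias_table = nn.Parameter(torch.zeros((2 * M - 1) ** 2, num_heads))
        ys = torch.arange(M).repeat_interleave(M)     # row index of each of the M*M patches
        xs = torch.arange(M).repeat(M)                # column index of each patch
        coords = torch.stack([ys, xs])                # 2 x M^2
        rel = coords[:, :, None] - coords[:, None, :] + (M - 1)  # relative positions in [0, 2M-2]
        self.register_buffer("bias_index", rel[0] * (2 * M - 1) + rel[1])  # M^2 x M^2

    def forward(self, x):                             # x: (num_windows * batch, m, dim), m = M^2
        B, m, dim = x.shape
        qkv = self.qkv(x).reshape(B, m, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each: (B, heads, m, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale # QK^T / sqrt(d)
        bias = self.bias_table[self.bias_index.reshape(-1)]
        bias = bias.reshape(m, m, -1).permute(2, 0, 1)            # (heads, m, m)
        attn = torch.softmax(attn + bias.unsqueeze(0), dim=-1)    # add B, normalize with SoftMax
        out = (attn @ v).transpose(1, 2).reshape(B, m, dim)
        return self.proj(out)
```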
aiming at the agricultural greenhouse extraction problem, the moving window size and the downsampling ratios of the Swin-Transformer neural network module are improved: the moving window size is changed from 12×12 to 7×7, and the downsampling ratios are changed from 8, 16, 32, 64 to 4, 8, 16, 32;
step 3: inputting the remote sensing features of the agricultural greenhouse extracted in the step 2 into a NAS-FPN network, establishing a targeted feature pyramid aiming at the agricultural greenhouse in the Landsat image, and fusing the remote sensing features;
the targeted feature pyramid is constructed by searching the NAS-FPN search space stacked 7 times, and the feature pyramid with the highest feature-fusion accuracy is selected by comparison;
step 4: inputting the fused remote sensing features obtained in the step 3 into an RPN (Region Proposal Network) improved with the Approx strategy to obtain pre-selected anchor box information;
the Approx strategy is used to improve the positive/negative sample assignment of the RPN network; 9 pre-selected boxes are generated at each feature point of the fused remote sensing feature map obtained in the step 3, and the Approx strategy is applied to the 9 pre-selected boxes at each feature point: the IoU values of the pre-selected boxes are computed and compared, and the maximum IoU value at each feature point is kept; the remote sensing feature map is passed through 3×3 and 1×1 convolutions and then split into two branches, one branch uses a softmax classifier to make a binary decision on whether a target is present in the candidate box (the pre-selected box is kept if a target is present and removed otherwise), and the other branch performs bounding-box regression; the bounding boxes are shifted according to the predicted offsets, whether each box is background is determined, the bboxes are then sorted by probability, the bbox with the highest probability score is kept, and the remaining bboxes are selected or rejected by soft non-maximum suppression; the area enclosed by the finally remaining bboxes is the ROI region;
step 5: extracting and marking the position information of the agricultural greenhouse based on the fusion characteristics of the step 3 and the pre-selected anchor frame information of the step 4;
step 6: calculating corresponding loss functions according to the agricultural greenhouse information obtained by prediction in the step 5 based on the sample marking information obtained in the step 1, wherein different loss functions are used according to different network modules; obtaining a NAS-Swin neural network framework;
Selection of the different loss functions: the overall loss function L_Total of the invention consists of two parts, the RPN network loss function L_RPN and the ROIAlign loss function L_ROIAlign, namely:
L_Total = L_RPN + L_ROIAlign
wherein L_RPN contains the object classification loss L_cls within the anchor boxes and the bounding-box position loss L_reg of the anchor boxes:
L_RPN = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
wherein p_i represents the probability that the i-th anchor box is predicted as the true label, p_i* is 1 for a positive sample and 0 for a negative sample, t_i represents the bounding-box regression parameters of the i-th anchor box, t_i* represents the ground-truth box corresponding to the i-th anchor box, N_cls is the total number of positive and negative anchor samples used to train the RPN network, N_reg represents the number of anchor-box positions, and λ is a balancing weight;
L_cls(p_i, p_i*) represents the classification loss, namely:
L_cls(p_i, p_i*) = -[p_i* log(p_i) + (1 - p_i*) log(1 - p_i)]
L_reg(t_i, t_i*) represents the bounding-box localization regression loss, namely:
L_reg(t_i, t_i*) = Smooth_L1(t_i - t_i*), where Smooth_L1(x) = 0.5x^2 if |x| < 1, and |x| - 0.5 otherwise;
the bounding-box localization regression loss and classification loss of ROIAlign are the same as those of the RPN network, and the mask loss adopts a binary cross-entropy loss function, namely:
L_mask = -(1/N) Σ_i [D_i* log(D_i) + (1 - D_i*) log(1 - D_i)]
wherein N is the number of pixels, D_i is the predicted probability that the i-th pixel belongs to the target pixel, and D_i* is the probability that the i-th pixel belongs to a true target pixel;
step 7: the training set obtained in the step 1 is input into the NAS-Swin neural network for training, and the test set is then input into the trained NAS-Swin neural network to finally obtain the agricultural greenhouse information.
The invention has the beneficial effects that:
according to the invention, the characteristic extraction is performed by using the Swin-transformer backbone network with the parameters adjusted aiming at the problem of extracting the agricultural greenhouse, and compared with the unmodified Swin-transformer backbone network, the method has a better effect. Through overlapping NAS-FPN search space for 7 times, a feature fusion strategy with better extraction effect aiming at the remote sensing agricultural greenhouse is found, and the problem of feature fusion of the traditional feature pyramid is solved. The invention constructs a better NAS-Swin neural network framework aiming at remote sensing agricultural greenhouse extraction, and overcomes the defect that large-scale image data are difficult to extract by the traditional agricultural greenhouse extraction method.
Drawings
FIG. 1 is a structural architecture diagram of an agricultural greenhouse extraction based on NAS-Swin in an embodiment of the invention;
FIG. 2 is a schematic diagram of a Swin-Transformer backbone network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of an embodiment Swin-Transformer block of the present invention;
FIG. 4 is a schematic diagram of an embodiment of an MLP network according to the present invention;
FIG. 5 is a schematic diagram of window multi-headed self-attention and sliding window multi-headed self-attention in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of a sliding window merging scheme according to an embodiment of the present invention;
FIG. 7 is a schematic diagram showing the operation of the building blocks of the NAS-FPN according to the embodiment of the present invention;
fig. 8 is a schematic view of the feature pyramid searched for agricultural greenhouse extraction according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples;
a remote sensing image agricultural greenhouse extraction method based on NAS-Swin is shown in figure 1. The method specifically comprises the following steps:
step 1: downloading satellite images from the Geospatial Data Cloud to obtain Landsat satellite data and Sentinel-2 satellite image data, and then performing image preprocessing on them; the position information of agricultural greenhouse ground objects in the image data is annotated with labelme, and the annotated data are divided into a training set and a test set at a ratio of 4:1;
in this embodiment, Landsat remote sensing images of a city acquired over 21 years, from 2000 to 2021, are collected as experimental data and annotated; the ratio of training images to test images is 4:1.
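A minimal sketch of the 4:1 split used in this embodiment is given below; the directory name, file extension and random seed are illustrative assumptions.

```python
# Sketch (assumption): split the labelme-annotated images 4:1 into training and test sets.
import random
from pathlib import Path

def split_dataset(image_dir, train_ratio=0.8, seed=0):
    images = sorted(Path(image_dir).glob("*.tif"))   # annotated Landsat/Sentinel-2 tiles
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_ratio)         # 4:1 -> 80% of the images for training
    return images[:n_train], images[n_train:]

train_set, test_set = split_dataset("data/greenhouse_images")
```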
step 2: inputting the training set obtained in the step 1 into a parameter-improved Swin-Transformer neural network module for training to obtain remote sensing features of the agricultural greenhouse; the Swin Transformer Block uses residual connections to prevent problems such as gradient vanishing or gradient explosion caused by the increase of network depth;
the Swin-Transformer neural network module is shown in figure 2 and consists of 4 parts; the input image is first divided into a set of non-overlapping image blocks (patches) by a patch partition layer, where the size of each image block is 4x4 and the corresponding feature dimension of each image block is 48; in the first part, a linear embedding layer first projects the divided feature dimension, and the result is sent into a Swin Transformer Block module for computation; the second to the fourth parts are identical in structure: the set of image blocks output by the previous layer is merged by a patch merging layer over a 2x2 neighbourhood, so the merged image block size is four times that of the previous layer and the feature dimension also becomes four times that of the previous layer; a linear embedding layer then reduces this feature dimension to half, after which the result is sent into a Swin Transformer Block module to compute the self-attention of the image blocks;
as shown in FIG. 3, the Swin Transformer Block module first normalizes the input feature Z1 from the preceding patch partition/merging layer through a Layer-Norm layer; feature learning is then performed by window-based multi-head self-attention (W-MSA), and a residual operation gives Z2; a Layer-Norm layer, an MLP layer and a residual operation then give Z3; the network structure of the MLP is shown in fig. 4; Z3 goes through a Layer-Norm layer and shifted-window-based multi-head self-attention (SW-MSA) for feature learning, a residual operation gives Z4, and a further Layer-Norm layer, MLP and residual give Z5, the output-layer feature; the residual connection formulas in the module are as follows,
Z2=W-MSA(Layer-Norm(Z1))+Z1
Z3=MLP(Layer-Norm(Z2))+Z2
Z4=SW-MSA(Layer-Norm(Z3))+Z3
Z5=MLP(Layer-Norm(Z4))+Z4
wherein Z1 is the input feature, Z2 and Z3 are the output features of the W-MSA and MLP modules respectively, and Z4 and Z5 are the output features of the SW-MSA and MLP modules respectively; W-MSA and SW-MSA denote window multi-head self-attention and shifted (sliding) window multi-head self-attention, and information interaction between different image blocks is carried out by shifting the windows; the specific operation is shown in fig. 5. Each black-outlined block in fig. 5 represents one image block (patch), an adjacent 4×4 group of blocks forms a fixed window, and the self-attention computation is carried out within the red box. FIG. 5 (a) is the W-MSA, and the SW-MSA of FIG. 5 (b) is obtained by shifting the windows. To reduce the amount of computation, the Swin Transformer Block performs a shift-and-merge operation on the SW-MSA windows; the specific operation is shown in fig. 6.
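The residual structure Z1 to Z5 above maps directly onto code. The following is a minimal PyTorch sketch under the assumption that the W-MSA and SW-MSA attention modules (including window partitioning and the shift-and-merge step) are supplied from outside; the MLP expansion ratio of 4 is an illustrative choice, not a value taken from the invention.

```python
# Sketch (assumption): one Swin Transformer block pair implementing
# Z2 = W-MSA(LN(Z1)) + Z1, Z3 = MLP(LN(Z2)) + Z2,
# Z4 = SW-MSA(LN(Z3)) + Z3, Z5 = MLP(LN(Z4)) + Z4.
import torch.nn as nn

class SwinBlockPair(nn.Module):
    def __init__(self, dim, w_msa, sw_msa, mlp_ratio=4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm3, self.norm4 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.w_msa, self.sw_msa = w_msa, sw_msa          # window / shifted-window attention modules
        self.mlp1 = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                  nn.Linear(mlp_ratio * dim, dim))
        self.mlp2 = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                  nn.Linear(mlp_ratio * dim, dim))

    def forward(self, z1):
        z2 = self.w_msa(self.norm1(z1)) + z1    # Z2 = W-MSA(Layer-Norm(Z1)) + Z1
        z3 = self.mlp1(self.norm2(z2)) + z2     # Z3 = MLP(Layer-Norm(Z2)) + Z2
        z4 = self.sw_msa(self.norm3(z3)) + z3   # Z4 = SW-MSA(Layer-Norm(Z3)) + Z3
        z5 = self.mlp2(self.norm4(z4)) + z4     # Z5 = MLP(Layer-Norm(Z4)) + Z4
        return z5
```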
The calculation of self-attention is,
Attention(Q, K, V) = SoftMax(QK^T/√d + B)V
wherein Q, K and V correspond to the query, key and value of the self-attention mechanism respectively, and Q, K, V ∈ R^(m×d), a real matrix of m rows and d columns; d is the dimension of the query and key, and m is the number of patches in a window; the relative position values between different patches lie in the range [-M+1, M-1], where M is the arithmetic square root of the number of patches, so the relative position bias is parameterized by a matrix B, B ∈ R^((2M-1)×(2M-1)); the SoftMax function performs normalization and, taking the i-th patch as an example, is defined as follows:
SoftMax(x_i) = e^(x_i) / Σ_{j=1}^{m} e^(x_j)
wherein m is the number of patches;
aiming at the agricultural greenhouse extraction problem, the moving window size and the downsampling ratios of the Swin-Transformer neural network module are improved: the moving window size is changed from 12×12 to 7×7, and the downsampling ratios are changed from 8, 16, 32, 64 to 4, 8, 16, 32; with these parameter improvements, the Swin-Transformer backbone network extracts information on agricultural greenhouses in remote sensing images more effectively.
Step 3: inputting the remote sensing features of the agricultural greenhouse extracted in the step 2 into a NAS-FPN network, establishing a targeted feature pyramid aiming at the agricultural greenhouse in the Landsat image, and fusing the remote sensing features;
the targeted feature pyramid is constructed by searching the NAS-FPN search space stacked 7 times, and the feature pyramid with the highest feature-fusion accuracy is selected by comparison;
the NAS-FPN consists of a number of repeated merging cells, whose structure is shown in figure 7; the detection flow can be divided into 4 steps: first, feature maps of different scales, {C1, C2, C3, C4, C5}, are selected, with strides {8, 16, 32, 64, 128}, where C1, C2 and C3 are the 3 feature layers extracted by the network, and C4 and C5 are obtained by downsampling the C3 feature layer with strides 2 and 4 respectively; second, candidate layer features are constructed from these 5 feature layers, and 2 different feature layers are selected from the candidates for feature fusion; then, feature fusion is performed with operations from the operation pool, the output of the previous merging cell is taken as the input of the next merging cell, and the merging-cell operation is repeated until the threshold is reached; finally, the output feature layers, denoted {P1, P2, P3, P4, P5}, are generated in turn, and their resolutions correspond to the 5 input candidate feature layers respectively. The 2 feature-layer fusion operations in the operation pool are sum and global pooling. For sum, the smaller of the 2 feature maps is bilinearly interpolated to the size of the larger one and the two feature maps are added pixel by pixel. For global pooling, an attention feature is computed from the smaller, higher-level semantic feature map by averaging and a sigmoid and multiplied with the larger, lower-level feature map; the smaller higher-level feature map is then bilinearly interpolated to the size of the larger feature map and added pixel by pixel, so the fused feature map has the same size as the larger feature map. The feature fusion strategy searched for the agricultural greenhouse is shown in fig. 8.
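The two fusion operations of the merging cell described above can be sketched as follows; it is assumed for illustration that the two input feature maps have the same number of channels, and only the fusion step itself (not the searched cell wiring) is shown.

```python
# Sketch (assumption): the two feature-fusion operations of the merging cells,
# "sum" and "global pooling", as described above.
import torch
import torch.nn.functional as F

def fuse_sum(small, large):
    # Upsample the smaller (higher-level) map to the larger map's size, then add pixel by pixel.
    small_up = F.interpolate(small, size=large.shape[-2:], mode="bilinear", align_corners=False)
    return small_up + large

def fuse_global_pooling(small, large):
    # Average the smaller, higher-level semantic map and squash it with a sigmoid to get a
    # per-channel attention feature, multiply it onto the larger low-level map, then
    # upsample the smaller map and add it pixel by pixel; the result keeps the larger size.
    attn = torch.sigmoid(small.mean(dim=(-2, -1), keepdim=True))   # (B, C, 1, 1)
    small_up = F.interpolate(small, size=large.shape[-2:], mode="bilinear", align_corners=False)
    return large * attn + small_up
```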
Step 4: inputting the fused remote sensing features obtained in the step 3 into an RPN (Region Proposal Network) improved with the Approx strategy to obtain pre-selected anchor box information;
the Approx strategy is used to improve the positive/negative sample assignment of the RPN network; 9 pre-selected boxes are generated at each feature point of the fused remote sensing feature map obtained in the step 3, and the Approx strategy is applied to the 9 pre-selected boxes at each feature point: the IoU values of the pre-selected boxes are computed and compared, and the maximum IoU value at each feature point is kept; the remote sensing feature map is passed through 3×3 and 1×1 convolutions and then split into two branches, one branch uses a softmax classifier to make a binary decision on whether a target is present in the candidate box (the pre-selected box is kept if a target is present and removed otherwise), and the other branch performs Bounding-Box (bbox) regression; the bounding boxes are shifted according to the predicted offsets, whether each box is background is determined, the bboxes are then sorted by probability, the bbox with the highest probability score is kept, and the remaining bboxes are selected or rejected by soft non-maximum suppression; the area enclosed by the finally remaining bboxes is the ROI (Region of Interest) region;
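The soft non-maximum suppression step used to select and reject the remaining bboxes can be sketched as follows; this is a linear-decay variant, and the IoU and score thresholds are illustrative assumptions.

```python
# Sketch (assumption): soft non-maximum suppression (linear decay) over RPN proposals.
import numpy as np

def iou(box, boxes):
    # box: (4,), boxes: (N, 4) in (x1, y1, x2, y2) format; returns IoU of box with each row.
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def soft_nms(boxes, scores, iou_thr=0.5, score_thr=0.05):
    boxes, scores = boxes.copy(), scores.copy()
    keep = []
    while len(scores) > 0:
        i = scores.argmax()                         # keep the bbox with the highest score
        keep.append(boxes[i])
        decay = np.where(iou(boxes[i], boxes) > iou_thr, 1.0 - iou(boxes[i], boxes), 1.0)
        scores = scores * decay                     # soften overlapping scores instead of removing
        mask = scores > score_thr
        mask[i] = False
        boxes, scores = boxes[mask], scores[mask]
    return np.stack(keep) if keep else np.zeros((0, 4))
```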
for the sample assignment, the invention improves the positive/negative sample distribution of the RPN through the Approx strategy. The core idea of Approx is as follows: with the 9 anchor settings at each position of the original RetinaNet, the IoU between the 9 anchors and the ground truth (gt) is calculated, the highest of the 9 IoU values at each position is selected by a max operation, and that IoU value is used in the subsequent MaxIoUAssigner calculation; in this way, which positions on each feature map are positive samples can be obtained.
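A sketch of the Approx idea described above: for every feature-map position, the IoU between its 9 anchors and the ground-truth boxes is computed and only the per-position maximum is kept for the subsequent MaxIoU assignment. The helper names and array shapes are illustrative assumptions.

```python
# Sketch (assumption): Approx-style anchor handling - keep, for each location, only the best
# IoU among its 9 anchors, which then feeds the MaxIoU positive/negative assignment.
import numpy as np

def pairwise_iou(a, b):
    # a: (N, 4), b: (G, 4), boxes as (x1, y1, x2, y2); returns an (N, G) IoU matrix.
    x1 = np.maximum(a[:, None, 0], b[None, :, 0]); y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2]); y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def approx_max_iou(anchors, gt_boxes, anchors_per_loc=9):
    # anchors: (num_locations * 9, 4), ordered location by location; gt_boxes: (G, 4)
    ious = pairwise_iou(anchors, gt_boxes)                       # (num_anchors, G)
    ious = ious.reshape(-1, anchors_per_loc, gt_boxes.shape[0])  # (locations, 9, G)
    return ious.max(axis=1)        # (locations, G): best of the 9 anchors at each location
```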
Step 5: extracting and marking the position information of the agricultural greenhouse based on the fusion characteristics of the step 3 and the pre-selected anchor frame information of the step 4;
step 6: calculating corresponding loss functions according to the agricultural greenhouse information obtained by prediction in the step 5 based on the sample marking information obtained in the step 1, wherein different loss functions are used according to different network modules; obtaining a NAS-Swin neural network framework;
Selection of the different loss functions: the overall loss function L_Total of the invention consists of two parts, the RPN network loss function L_RPN and the ROIAlign loss function L_ROIAlign, namely:
L_Total = L_RPN + L_ROIAlign
wherein L_RPN contains the object classification loss L_cls within the anchor boxes and the bounding-box position loss L_reg of the anchor boxes:
L_RPN = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
wherein p_i represents the probability that the i-th anchor box is predicted as the true label, p_i* is 1 for a positive sample and 0 for a negative sample, t_i represents the bounding-box regression parameters of the i-th anchor box, t_i* represents the ground-truth box corresponding to the i-th anchor box, N_cls is the total number of positive and negative anchor samples used to train the RPN network, N_reg represents the number of anchor-box positions, and λ is a balancing weight;
L_cls(p_i, p_i*) represents the classification loss, namely:
L_cls(p_i, p_i*) = -[p_i* log(p_i) + (1 - p_i*) log(1 - p_i)]
L_reg(t_i, t_i*) represents the bounding-box localization regression loss, expressed by the Smooth L1 loss, namely:
L_reg(t_i, t_i*) = Smooth_L1(t_i - t_i*), where Smooth_L1(x) = 0.5x^2 if |x| < 1, and |x| - 0.5 otherwise;
the bounding-box localization regression loss and classification loss of ROIAlign are the same as those of the RPN network, and the mask loss adopts a binary cross-entropy loss function (Binary Cross Entropy, BCE), namely:
L_mask = -(1/N) Σ_i [D_i* log(D_i) + (1 - D_i*) log(1 - D_i)]
wherein N is the number of pixels, D_i is the predicted probability that the i-th pixel belongs to the target pixel, and D_i* is the probability that the i-th pixel belongs to a true target pixel;
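The loss terms above can be sketched in PyTorch as follows; the reduction choices and the λ balancing weight are illustrative assumptions, not the exact implementation of the invention.

```python
# Sketch (assumption): RPN loss (binary classification + Smooth L1 regression) and the
# binary cross-entropy mask loss described above.
import torch
import torch.nn.functional as F

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    # p: predicted object probabilities, p_star: 1/0 positive/negative labels,
    # t, t_star: predicted / ground-truth box regression parameters (positive samples only)
    l_cls = F.binary_cross_entropy(p, p_star.float())                           # L_cls
    l_reg = F.smooth_l1_loss(t, t_star) if t.numel() > 0 else p.new_zeros(())   # L_reg
    return l_cls + lam * l_reg

def mask_loss(d_pred, d_star):
    # Binary cross-entropy over per-pixel target probabilities D_i and labels D_i*
    return F.binary_cross_entropy(d_pred, d_star.float())
```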
step 7: the training set is input into the NAS-Swin neural network for training, with a maximum of 30 iterations and a learning rate of 0.00001. The test set obtained in the step 1 is then input into the trained NAS-Swin neural network to obtain the agricultural greenhouse information.
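A minimal training and evaluation loop under the stated settings (at most 30 iterations, learning rate 1e-5) is sketched below; the model, data loaders, optimizer choice and compute_loss function are placeholders standing in for the components described above, not part of the original disclosure.

```python
# Sketch (assumption): train the NAS-Swin network for at most 30 epochs at lr = 1e-5,
# then run inference on the test set.
import torch

def train_and_test(model, train_loader, test_loader, compute_loss,
                   max_epochs=30, lr=1e-5, device="cuda"):
    model = model.to(device)
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        model.train()
        for images, targets in train_loader:
            optim.zero_grad()
            loss = compute_loss(model(images.to(device)), targets)
            loss.backward()
            optim.step()
    model.eval()
    with torch.no_grad():
        return [model(images.to(device)) for images, _ in test_loader]
```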
Agricultural greenhouse extraction is carried out on the Landsat remote sensing images, and the results are compared with the BoxInst and SVM algorithms. The method disclosed by the invention achieves higher greenhouse extraction accuracy, as shown in Table 1:
Table 1: Swin Transformer (this method), SVM and BoxInst greenhouse extraction accuracy results

Claims (5)

1. the remote sensing image agricultural greenhouse extraction method based on NAS-Swin is characterized by comprising the following steps of:
step 1: obtaining Landsat satellite data and Sentinel-2 satellite image data, and then performing image preprocessing on them; the position information of agricultural greenhouse ground objects in the image data is annotated with labelme, and the annotated data are divided into a training set and a test set at a ratio of 4:1;
step 2: inputting the training set obtained in the step 1 into a parameter-improved Swin-Transformer neural network module for training to obtain remote sensing features of the agricultural greenhouse;
step 3: inputting the remote sensing features of the agricultural greenhouse extracted in the step 2 into a NAS-FPN network, establishing a targeted feature pyramid aiming at the agricultural greenhouse in the Landsat image, and fusing the remote sensing features;
the targeted feature pyramid is constructed by searching the NAS-FPN search space stacked 7 times, and the feature pyramid with the highest feature-fusion accuracy is selected by comparison;
step 4: inputting the fused remote sensing features obtained in the step 3 into an RPN (Region Proposal Network) improved with the Approx strategy to obtain pre-selected anchor box information;
step 5: extracting and marking the position information of the agricultural greenhouse based on the fusion characteristics of the step 3 and the pre-selected anchor frame information of the step 4;
step 6: calculating corresponding loss functions according to the agricultural greenhouse information obtained by prediction in the step 5 based on the sample marking information obtained in the step 1, wherein different loss functions are used according to different network modules; obtaining a NAS-Swin neural network framework;
step 7: and (3) inputting the training set obtained in the step (1) into a NAS-Swin neural network for training, and then inputting the testing set into the trained NAS-Swin neural network to finally obtain the information of the agricultural greenhouse.
2. The method for extracting the agricultural greenhouse from the remote sensing image based on NAS-Swin as set forth in claim 1, wherein the Swin-Transformer neural network module in the step 2 consists of 4 parts; each part is a similar repeating unit: the image is first divided into a set of non-overlapping image blocks by a patch partition layer, where the size of each image block is 4x4 and the corresponding feature dimension of each image block is 48; in the first part, a linear embedding layer first projects the divided feature dimension, and the result is computed in a Swin Transformer Block module; the second to the fourth parts are identical in structure: the set of image blocks input from the previous layer is merged by a patch merging layer over a 2x2 neighbourhood, so the merged image block size is four times that of the previous layer and the feature dimension also increases to four times that of the previous layer; a linear embedding layer then reduces the feature dimension to half, after which the self-attention of the image blocks is calculated by a Swin Transformer Block module.
3. The remote sensing image agricultural greenhouse extraction method based on NAS-Swin as set forth in claim 2, wherein the Swin Transformer Block module first normalizes the input feature Z1 from the preceding patch partition/merging layer through a Layer-Norm layer; feature learning is then performed by window-based multi-head self-attention (W-MSA), and a residual operation gives Z2; a Layer-Norm layer, an MLP layer and a residual operation then give Z3; Z3 goes through a Layer-Norm layer and shifted-window-based multi-head self-attention (SW-MSA) for feature learning, a residual operation gives Z4, and a further Layer-Norm layer, MLP and residual give Z5, the output-layer feature; the residual connection formulas in the module are as follows,
Z2=W-MSA(Layer-Norm(Z1))+Z1
Z3=MLP(Layer-Norm(Z2))+Z2
Z4=SW-MSA(Layer-Norm(Z3))+Z3
Z5=MLP(Layer-Norm(Z4))+Z4
wherein Z1 is the input feature, Z2 and Z3 are the output features of the W-MSA and MLP modules respectively, and Z4 and Z5 are the output features of the SW-MSA and MLP modules respectively; W-MSA and SW-MSA denote window multi-head self-attention and shifted (sliding) window multi-head self-attention, and information interaction between different image blocks is carried out by shifting the windows; to reduce the amount of computation, the Swin Transformer Block performs a shift-and-merge operation on the SW-MSA windows;
the calculation of self-attention is,
Attention(Q, K, V) = SoftMax(QK^T/√d + B)V
wherein Q, K and V correspond to the query, key and value of the self-attention mechanism respectively, and Q, K, V ∈ R^(m×d), a real matrix of m rows and d columns; d is the dimension of the query and key, and m is the number of patches in a window; the relative position values between different patches lie in the range [-M+1, M-1], where M is the arithmetic square root of the number of patches, so the relative position bias is parameterized by a matrix B, B ∈ R^((2M-1)×(2M-1)); the SoftMax function performs normalization and, taking the i-th patch as an example, is defined as follows:
SoftMax(x_i) = e^(x_i) / Σ_{j=1}^{m} e^(x_j)
wherein m is the number of patches;
aiming at the agricultural greenhouse extraction problem, the moving window size and the downsampling ratios of the Swin-Transformer neural network module are improved: the moving window size is changed from 12×12 to 7×7, and the downsampling ratios are changed from 8, 16, 32, 64 to 4, 8, 16, 32.
4. The remote sensing image agricultural greenhouse extraction method based on NAS-Swin according to claim 1, wherein the Approx strategy in the step 4 is used to improve the positive/negative sample assignment of the RPN network; 9 pre-selected boxes are generated at each feature point of the fused remote sensing feature map obtained in the step 3, and the Approx strategy is applied to the 9 pre-selected boxes at each feature point: the IoU values of the pre-selected boxes are computed and compared, and the maximum IoU value at each feature point is kept; the remote sensing feature map is passed through 3×3 and 1×1 convolutions and then split into two branches, one branch uses a softmax classifier to make a binary decision on whether a target is present in the candidate box (the pre-selected box is kept if a target is present and removed otherwise), and the other branch performs bounding-box regression; the bounding boxes are shifted according to the predicted offsets, whether each box is background is determined, the bboxes are then sorted by probability, the bbox with the highest probability score is kept, and the remaining bboxes are selected or rejected by soft non-maximum suppression; the area enclosed by the finally remaining bboxes is the ROI region.
5. The method for extracting the agricultural greenhouse from the remote sensing image based on NAS-Swin as set forth in claim 1, wherein in the selection of the different loss functions in the step 6, the overall loss function L_Total consists of two parts, the RPN network loss function L_RPN and the ROIAlign loss function L_ROIAlign, namely:
L_Total = L_RPN + L_ROIAlign
wherein L_RPN contains the object classification loss L_cls within the anchor boxes and the bounding-box position loss L_reg of the anchor boxes:
L_RPN = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
wherein p_i represents the probability that the i-th anchor box is predicted as the true label, p_i* is 1 for a positive sample and 0 for a negative sample, t_i represents the bounding-box regression parameters of the i-th anchor box, t_i* represents the ground-truth box corresponding to the i-th anchor box, N_cls is the total number of positive and negative anchor samples used to train the RPN network, N_reg represents the number of anchor-box positions, and λ is a balancing weight;
L_cls(p_i, p_i*) represents the classification loss, namely:
L_cls(p_i, p_i*) = -[p_i* log(p_i) + (1 - p_i*) log(1 - p_i)]
L_reg(t_i, t_i*) represents the bounding-box localization regression loss, namely:
L_reg(t_i, t_i*) = Smooth_L1(t_i - t_i*), where Smooth_L1(x) = 0.5x^2 if |x| < 1, and |x| - 0.5 otherwise;
the bounding-box localization regression loss and classification loss of ROIAlign are the same as those of the RPN network, and the mask loss adopts a binary cross-entropy loss function, namely:
L_mask = -(1/N) Σ_i [D_i* log(D_i) + (1 - D_i*) log(1 - D_i)]
wherein N is the number of pixels, D_i is the predicted probability that the i-th pixel belongs to the target pixel, and D_i* is the probability that the i-th pixel belongs to a true target pixel.
CN202211569653.2A 2022-12-08 2022-12-08 NAS-Swin-based remote sensing image agricultural greenhouse extraction method Pending CN116206210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211569653.2A CN116206210A (en) 2022-12-08 2022-12-08 NAS-Swin-based remote sensing image agricultural greenhouse extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211569653.2A CN116206210A (en) 2022-12-08 2022-12-08 NAS-Swin-based remote sensing image agricultural greenhouse extraction method

Publications (1)

Publication Number Publication Date
CN116206210A true CN116206210A (en) 2023-06-02

Family

ID=86518098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211569653.2A Pending CN116206210A (en) 2022-12-08 2022-12-08 NAS-Swin-based remote sensing image agricultural greenhouse extraction method

Country Status (1)

Country Link
CN (1) CN116206210A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274823A (en) * 2023-11-21 2023-12-22 Chengdu University of Technology Visual Transformer landslide identification method based on DEM feature enhancement
CN117274823B (en) * 2023-11-21 2024-01-26 Chengdu University of Technology Visual Transformer landslide identification method based on DEM feature enhancement

Similar Documents

Publication Publication Date Title
Li et al. Automatic organ-level point cloud segmentation of maize shoots by integrating high-throughput data acquisition and deep learning
Zhang et al. Deep learning-based automatic recognition network of agricultural machinery images
CN112749627A (en) Method and device for dynamically monitoring tobacco based on multi-source remote sensing image
CN111028255A (en) Farmland area pre-screening method and device based on prior information and deep learning
CN115481368B (en) Vegetation coverage estimation method based on full remote sensing machine learning
CN113160150B (en) AI (Artificial intelligence) detection method and device for invasion of foreign matters in wire mesh
CN112766155A (en) Deep learning-based mariculture area extraction method
CN115527123B (en) Land cover remote sensing monitoring method based on multisource feature fusion
Wang et al. Deep segmentation and classification of complex crops using multi-feature satellite imagery
CN110705449A (en) Land utilization change remote sensing monitoring analysis method
CN115452759B (en) River and lake health index evaluation method and system based on satellite remote sensing data
CN115880487A (en) Forest laser point cloud branch and leaf separation method based on deep learning method
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN115457396A (en) Surface target ground object detection method based on remote sensing image
CN117197668A (en) Crop lodging level prediction method and system based on deep learning
CN116206210A (en) NAS-Swin-based remote sensing image agricultural greenhouse extraction method
Cheng et al. Multi-scale Feature Fusion and Transformer Network for urban green space segmentation from high-resolution remote sensing images
Cai et al. Learning spectral-spatial representations from VHR images for fine-scale crop type mapping: A case study of rice-crayfish field extraction in South China
Yan et al. High-resolution mapping of paddy rice fields from unmanned airborne vehicle images using enhanced-TransUnet
CN114387446A (en) Automatic water body extraction method for high-resolution remote sensing image
CN116188993A (en) Remote sensing image cultivated land block segmentation method based on multitask learning
Meedeniya et al. Prediction of paddy cultivation using deep learning on land cover variation for sustainable agriculture
CN115527108A (en) Method for rapidly identifying water and soil loss artificial disturbance plots based on multi-temporal Sentinel-2
CN113205543A (en) Laser radar point cloud trunk extraction method based on machine learning
Wei et al. Multispectral remote sensing and DANet model improve the precision of urban park vegetation detection: an empirical study in Jinhai Park, Shanghai

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination