CN117877034A - Remote sensing image instance segmentation method and model based on dynamic convolution enhancement

Info

Publication number
CN117877034A
Authority
CN
China
Prior art keywords
dynamic
convolution
feature
module
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410024193.8A
Other languages
Chinese (zh)
Inventor
李冠群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genyu Muxing Beijing Space Technology Co ltd
Original Assignee
Genyu Muxing Beijing Space Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image instance segmentation method and model based on dynamic convolution enhancement, belonging to the technical field of remote sensing image processing. The method comprises: performing feature extraction on a remote sensing image to be processed using a backbone feature extraction network to obtain multi-scale feature maps; performing feature fusion on the extracted multi-scale feature maps using a feature fusion network to obtain a fused feature map; and constructing a mask prediction module based on dynamic convolution enhancement to perform mask prediction on the fused feature map, wherein the mask prediction module based on dynamic convolution enhancement comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module connected in sequence. The invention improves how effectively the feature map data are used during feature decoupling and helps the model dynamically acquire higher-quality target feature information, which in turn helps the instance segmentation model understand image content more accurately and correctly identify and locate potential targets in the remote sensing image.

Description

Remote sensing image instance segmentation method and model based on dynamic convolution enhancement
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image instance segmentation method and model based on dynamic convolution enhancement.
Background
Remote sensing images have broad application prospects in fields such as national defense security, environmental monitoring and urban planning. Remote sensing image instance segmentation has long been one of the key research topics in the remote sensing field.
Current mainstream remote sensing image instance segmentation methods are mainly built on convolutional neural networks: after feature extraction, feature fusion and feature decoupling, they output the mask information, prediction box and category information of each potential target area. The feature decoupling network plays an important role and is one of the core components of the whole instance segmentation model. In current mainstream instance segmentation models, the feature decoupling network consists of a basic convolution module and a prediction function. The convolution module receives the input remote sensing image features and adjusts them to a suitable shape and channel dimension for subsequent calculation. The prediction function predicts the position and category information of potential target areas in the feature map and outputs the prediction result. Because remote sensing images have complex backgrounds and severe scale variation, traditional remote sensing image instance segmentation models based on convolutional neural networks are often limited by a fixed convolution kernel size when decoupling the feature map, so the diverse features of different targets are difficult to capture effectively and accuracy reaches a bottleneck.
Therefore, how to improve existing remote sensing image instance segmentation methods and design an efficient remote sensing image instance segmentation method and model is a problem to be solved by those skilled in the art.
Disclosure of Invention
Aiming at the problem of insufficient feature decoupling capability of models in remote sensing image instance segmentation tasks, the invention provides a novel solution.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The invention discloses a remote sensing image instance segmentation method based on dynamic convolution enhancement, which comprises the following steps:
performing feature extraction on the remote sensing image to be processed using a backbone feature extraction network to obtain multi-scale feature maps;
performing feature fusion on the extracted multi-scale feature maps using a feature fusion network to obtain a fused feature map;
and constructing a mask prediction module based on dynamic convolution enhancement to perform mask prediction on the fused feature map, wherein the mask prediction module based on dynamic convolution enhancement comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module connected in sequence.
Further, the backbone feature extraction network comprises a plurality of stacked convolution modules, wherein each convolution module comprises a basic convolution operator Conv, a normalization module and a ReLU activation function;
the basic convolution operator Conv is used for performing pixel-level calculation on the channel information of the remote sensing image to be processed;
the normalization module is used for normalizing the calculated channel information into a standard interval;
the ReLU activation function is used for activating and suppressing the feature matrix of potential target areas.
Further, performing feature fusion on the extracted multi-scale feature maps using the feature fusion network to obtain the fused feature map specifically comprises:
selecting, at every other layer, the feature maps of different scales extracted by the backbone feature extraction network;
adjusting the size of the selected feature maps of different scales using a convolution module;
fusing every two adjacent size-adjusted feature maps to obtain fused feature maps of different levels;
and splicing the fused feature maps of different levels to obtain the final fused feature map.
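The three fusion steps above (size adjustment, fusion of adjacent pairs, splicing) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: nearest-neighbour resizing stands in for the size-adjusting convolution module, channel concatenation stands in for both the pairwise fusion and the final splicing, and the names `resize_nearest`, `concat_channels` and `fuse` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def resize_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Nearest-neighbour resize (assumed stand-in for the size-adjusting convolution module)."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
         for r in range(out_h)]
        for ch in fmap
    ]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Fuse two maps of equal spatial size by stacking their channels (an assumption)."""
    assert len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0])
    return a + b


def fuse(selected: List[FeatureMap], out_h: int, out_w: int) -> FeatureMap:
    """Resize every selected map, fuse each adjacent pair, then splice the fused pairs."""
    resized = [resize_nearest(f, out_h, out_w) for f in selected]
    pairs = [concat_channels(x, y) for x, y in zip(resized, resized[1:])]
    fused: FeatureMap = []
    for pair in pairs:
        fused = fused + pair
    return fused


def constant_map(channels: int, h: int, w: int, value: float) -> FeatureMap:
    """Helper that builds a toy feature map filled with one value."""
    return [[[value] * w for _ in range(h)] for _ in range(channels)]


# Three toy feature maps of decreasing resolution, fused at the middle resolution.
maps = [constant_map(1, 8, 8, 1.0), constant_map(2, 4, 4, 2.0), constant_map(3, 2, 2, 3.0)]
fused_map = fuse(maps, out_h=4, out_w=4)
```

With channel concatenation at both stages, the middle map contributes to two adjacent pairs, so its channels appear twice in the result; other fusion operators (for example element-wise addition) would give a different channel count.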
Further, the dynamic weight sub-module comprises a spatial dynamic weight unit, a channel dynamic weight unit and a kernel dynamic weight unit.
Further, the calculation process of the dynamic weight sub-module can be expressed by the following formula:
$$Y = \left(\alpha_{s1} \otimes \alpha_{c1} \otimes \alpha_{k1} \otimes W_1 + \dots + \alpha_{sn} \otimes \alpha_{cn} \otimes \alpha_{kn} \otimes W_n\right) \otimes X$$
wherein $Y$ denotes the output feature map of the dynamic weight sub-module; $X$ denotes the input feature map of the dynamic weight sub-module; $\alpha_{si}$, $\alpha_{ci}$ and $\alpha_{ki}$ respectively denote the spatial dynamic weight, the channel dynamic weight and the kernel dynamic weight, the subscript $i = 1, 2, 3, \dots, n$ indicating the weight matrix of the $i$-th convolution kernel calculation stage; $W_n$ denotes the own weight matrix of the $n$-th convolution kernel, $n$ being a natural number; and $\otimes$ denotes matrix multiplication.
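To make the role of the three dynamic weights concrete, the following minimal sketch aggregates n candidate kernels into one kernel by scaling each with a spatial weight (one value per kernel position), a channel weight (one value per input channel) and a kernel weight (one scalar per candidate kernel), and then applies the aggregated kernel to a single input patch. Several points are assumptions not stated in the source: the weights are combined by broadcast element-wise multiplication, they are passed in as given values rather than predicted from the input feature map, and the function names `aggregate_kernels` and `apply_kernel` are hypothetical.

```python
from typing import List

# One candidate kernel, indexed as [out_channel][in_channel][kernel_position].
Kernel = List[List[List[float]]]


def aggregate_kernels(
    kernels: List[Kernel],
    spatial: List[List[float]],   # spatial[i][p]: weight of kernel position p for kernel i
    channel: List[List[float]],   # channel[i][c]: weight of input channel c for kernel i
    kernel_w: List[float],        # kernel_w[i]: scalar weight of candidate kernel i
) -> Kernel:
    """Weighted sum of the candidate kernels under the three dynamic weights."""
    c_out, c_in, k = len(kernels[0]), len(kernels[0][0]), len(kernels[0][0][0])
    agg = [[[0.0] * k for _ in range(c_in)] for _ in range(c_out)]
    for i, w in enumerate(kernels):
        for o in range(c_out):
            for c in range(c_in):
                for p in range(k):
                    agg[o][c][p] += kernel_w[i] * channel[i][c] * spatial[i][p] * w[o][c][p]
    return agg


def apply_kernel(agg: Kernel, patch: List[List[float]]) -> List[float]:
    """Apply the aggregated kernel to one patch indexed as [in_channel][kernel_position]."""
    return [
        sum(agg[o][c][p] * patch[c][p]
            for c in range(len(patch)) for p in range(len(patch[0])))
        for o in range(len(agg))
    ]


# Two candidate point (1x1) kernels with one output channel and two input channels.
w1: Kernel = [[[1.0], [2.0]]]
w2: Kernel = [[[3.0], [4.0]]]
aggregated = aggregate_kernels(
    [w1, w2],
    spatial=[[1.0], [1.0]],
    channel=[[1.0, 0.5], [1.0, 1.0]],
    kernel_w=[0.25, 0.75],
)
output = apply_kernel(aggregated, patch=[[1.0], [2.0]])
```

With a single candidate kernel and all weights equal to 1, the aggregated kernel reduces to that kernel, which is the fixed-kernel convolution the dynamic weights are meant to generalize.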
The invention further discloses a remote sensing image instance segmentation model based on dynamic convolution enhancement, which comprises:
a backbone feature extraction network, a feature fusion network and a feature decoupling head which are connected in sequence;
the feature decoupling head comprises a mask prediction module based on dynamic convolution enhancement, and the mask prediction module based on dynamic convolution enhancement comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module which are connected in sequence.
Preferably, the dynamic weight sub-module includes a spatial dynamic weight unit, a channel dynamic weight unit and a kernel dynamic weight unit.
Preferably, the calculation process of the dynamic weight sub-module can be expressed by the following formula:
$$Y = \left(\alpha_{s1} \otimes \alpha_{c1} \otimes \alpha_{k1} \otimes W_1 + \dots + \alpha_{sn} \otimes \alpha_{cn} \otimes \alpha_{kn} \otimes W_n\right) \otimes X$$
wherein $Y$ denotes the output feature map of the dynamic weight sub-module; $X$ denotes the input feature map of the dynamic weight sub-module; $\alpha_{si}$, $\alpha_{ci}$ and $\alpha_{ki}$ respectively denote the spatial dynamic weight, the channel dynamic weight and the kernel dynamic weight, the subscript $i = 1, 2, 3, \dots, n$ indicating the weight matrix of the $i$-th convolution kernel calculation stage; $W_n$ denotes the own weight matrix of the $n$-th convolution kernel, $n$ being a natural number; and $\otimes$ denotes matrix multiplication.
Preferably, the feature decoupling head further comprises a classification prediction module and a regression prediction module.
Preferably, the loss function of the remote sensing image instance segmentation model based on dynamic convolution enhancement in the training process is expressed as follows:
$$L_{total}(y, \hat{y}) = L_{focal}(y, \hat{y}) + L_{RIoU}(y, \hat{y}) + L_{CE}(y, \hat{y})$$
wherein $L_{focal}$ denotes the focal loss, corresponding to the classification prediction module in the remote sensing image instance segmentation task; $L_{RIoU}$ denotes the rotated-box IoU loss, corresponding to the regression prediction module in the remote sensing image instance segmentation task and used for calculating the distance between the prediction box and the ground-truth label of the target in the input remote sensing image; $L_{CE}$ denotes the cross-entropy loss, corresponding to the mask prediction module based on dynamic convolution enhancement in the remote sensing image instance segmentation task and used for calculating the distance between the predicted mask and the ground-truth label of the target in the input remote sensing image; $L_{total}$ denotes the total loss adopted for training the remote sensing image instance segmentation model based on dynamic convolution enhancement; $y$ denotes the real label; and $\hat{y}$ denotes the model prediction result.
Compared with the prior art, the invention discloses a remote sensing image instance segmentation method and model based on dynamic convolution enhancement, which have the following beneficial effects:
1. The remote sensing image instance segmentation method and model based on dynamic convolution enhancement improve how effectively the feature map data are used during feature decoupling and help the model dynamically acquire higher-quality target feature information, which helps the instance segmentation model understand image content more accurately and correctly identify and locate potential targets in the remote sensing image.
2. In the remote sensing image instance segmentation method and model based on dynamic convolution enhancement, at the feature decoupling head stage, the weight parameters of the pixel-channel information in the feature map and the size and shape of the convolution kernels are dynamically adjusted by a learnable mechanism according to the features of the target region. This flexibility allows the model to adapt better to targets of various sizes and shapes and to capture the boundary information and fine features of the targets, which improves segmentation accuracy and overcomes the limitations of the traditional fixed convolution kernel. The feature decoupling head can thus acquire more multi-scale and multi-dimensional information about potential target areas, achieving accurate remote sensing image instance segmentation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of the steps of a remote sensing image instance segmentation method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the overall structure of a remote sensing image instance segmentation model based on dynamic convolution enhancement according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a mask prediction module with dynamic convolution enhancement according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention first discloses a remote sensing image instance segmentation method based on dynamic convolution enhancement, which, as shown in fig. 1, comprises the following steps:
performing feature extraction on the remote sensing image to be processed using a backbone feature extraction network to obtain multi-scale feature maps;
performing feature fusion on the extracted multi-scale feature maps using a feature fusion network to obtain a fused feature map;
and constructing a mask prediction module based on dynamic convolution enhancement to perform mask prediction on the fused feature map, wherein the mask prediction module based on dynamic convolution enhancement comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module connected in sequence.
In the above steps, the backbone feature extraction network comprises a plurality of stacked convolution modules, wherein each convolution module comprises a basic convolution operator Conv, a normalization module and a ReLU activation function; the basic convolution operator Conv is used for performing pixel-level calculation on the channel information of the remote sensing image to be processed; the normalization module is used for normalizing the calculated channel information into a standard interval; and the ReLU activation function is used for activating and suppressing the feature matrix of potential target areas.
The backbone feature extraction network is formed by stacking n layers of convolution modules and can therefore extract n feature maps of different scales. The feature fusion network then selects, at every other layer, the feature maps of different scales extracted by the backbone feature extraction network for fusion: first, the selected feature maps of different scales are resized using a convolution module; then every two adjacent resized feature maps are fused to obtain fused feature maps of different levels; finally, the fused feature maps of different levels are spliced to obtain the final fused feature map.
In the invention, the feature fusion network selects the feature maps of different scales at every other layer. Numbering the levels of the backbone feature extraction network from top to bottom, the selected layers can be expressed as $[1, 3, 5, \dots, n-4, n-2, n]$ when $n$ is an odd number, or $[2, 4, 6, \dots, n-4, n-2, n]$ when $n$ is an even number.
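The layer selection rule above can be written as a small helper. This is a minimal sketch; the function name `select_layers` is hypothetical, and layers are numbered from 1 at the top of the backbone as in the text.

```python
from typing import List


def select_layers(n: int) -> List[int]:
    """Return every other layer index, always ending at layer n.

    Odd n gives [1, 3, ..., n]; even n gives [2, 4, ..., n].
    """
    if n < 1:
        raise ValueError("the backbone needs at least one layer")
    start = 1 if n % 2 == 1 else 2
    return list(range(start, n + 1, 2))


layers_for_six = select_layers(6)
layers_for_seven = select_layers(7)
```

For the 6-layer backbone used in the embodiment, this selects layers 2, 4 and 6, which matches the three feature maps passed to the feature fusion network.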
When the resized feature maps are fused, every two adjacent resized feature maps are fused first, and the results are then spliced to obtain the final fused feature map.
After the fused feature map is obtained, mask prediction is performed on it using the mask prediction module based on dynamic convolution enhancement, which comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module connected in sequence. More specifically, the dynamic weight sub-module includes a spatial dynamic weight unit, a channel dynamic weight unit and a kernel dynamic weight unit.
Taking a three-layer point convolution sub-module as an example, the mask prediction module may be formally represented as:
$$F_{dy} = \mathrm{DyConv}(\mathrm{PConv}(\mathrm{PConv}(\mathrm{PConv}(F_{fuse})))); \quad M = f_{seg}(F_{dy})$$
wherein $M$ is the pixel mask information obtained by prediction; $F_{fuse}$ is the feature map output by the feature fusion network after feature fusion; $\mathrm{PConv}$ and $\mathrm{DyConv}$ denote the point convolution sub-module and the dynamic weight sub-module, respectively; and $f_{seg}$ is the pixel segmentation prediction function.
The embodiment of the invention also discloses a remote sensing image instance segmentation system based on dynamic convolution enhancement, which is shown in fig. 2 and comprises a backbone feature extraction network, a feature fusion network and a feature decoupling head which are sequentially connected, wherein the feature decoupling head comprises a mask prediction module based on dynamic convolution enhancement, and the mask prediction module based on dynamic convolution enhancement comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module which are sequentially connected.
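The sequential connection of backbone, feature fusion network and feature decoupling head can be sketched as plain function composition. The sketch below assumes each stage is supplied as a callable; `build_model` and the toy stand-ins are hypothetical and carry no real network logic.

```python
from typing import Any, Callable, Dict, List


def build_model(
    backbone: Callable[[Any], List[Any]],
    fusion: Callable[[List[Any]], Any],
    heads: Dict[str, Callable[[Any], Any]],
) -> Callable[[Any], Dict[str, Any]]:
    """Chain the three stages: backbone, then feature fusion, then each decoupling-head branch."""
    def model(image: Any) -> Dict[str, Any]:
        multi_scale = backbone(image)       # multi-scale feature maps
        fused = fusion(multi_scale)         # single fused feature map
        return {name: head(fused) for name, head in heads.items()}
    return model


# Toy stand-ins that only demonstrate the data flow (not real networks).
toy_model = build_model(
    backbone=lambda image: [image, image * 2],
    fusion=lambda feats: sum(feats),
    heads={"mask": lambda fused: fused > 0},
)
result = toy_model(3)
```

Keeping the head as a dictionary of branches makes it straightforward to swap a conventional mask branch for the dynamic-convolution-enhanced one, or to add classification and regression branches alongside it.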
The construction process of the remote sensing image instance segmentation system based on dynamic convolution enhancement comprises the following steps:
constructing a traditional basic remote sensing image instance segmentation model based on the structure of backbone feature extraction network, feature fusion network and feature decoupling head;
constructing a mask prediction module based on dynamic convolution enhancement to improve the feature decoupling head of the basic instance segmentation model, thereby obtaining the remote sensing image instance segmentation model based on dynamic convolution enhancement.
In addition, the process further comprises: training and testing the constructed remote sensing image instance segmentation model based on dynamic convolution enhancement on the remote sensing image instance segmentation task.
First, a basic remote sensing image instance segmentation model is built based on the structure of backbone feature extraction network, feature fusion network and feature decoupling head; traditional instance segmentation models based on convolutional neural networks mostly use this structure as their basic architecture. The manner in which the basic instance segmentation model of the invention is constructed is described in detail below.
In this process, the remote sensing image and its label information are first input into the model through a data loader, whose calculation yields the pair (imgs, labels); wherein imgs denotes an input remote sensing image of size $H \times W$ with 3 channels; labels is the annotation information of the targets in the image and comprises position and category information; and the data loader function is used for loading data and for data augmentation;
the backbone network is then responsible for feature extraction from the input image, and the calculation can be formally represented as feature extraction from imgs by the lightweight backbone feature extraction network:
$$F = \mathrm{Backbone}(\text{imgs})$$
wherein $F$ denotes the feature map obtained by feature extraction, of size $h \times w$ with $c$ channels; and $\mathrm{Backbone}$ is the backbone feature extraction network, which obtains the feature information of the input remote sensing image by stacking convolution modules hierarchically. Typical backbone feature extraction networks include ResNet, VGG and EfficientNet. The invention takes a basic convolution module as an example, but the remote sensing image instance segmentation model based on dynamic convolution enhancement can be adapted to general backbone feature extraction networks through simple model replacement.
The backbone feature extraction network is formed by stacking convolution modules. Each basic convolution module comprises a basic convolution operator (Conv), a normalization module (Batch Normalization) and an activation function (ReLU). The basic convolution operator performs pixel-level calculation on the channel information of the input remote sensing image to achieve feature extraction; the normalization module normalizes the calculated channel result into a standard interval to facilitate subsequent calculation and avoid abnormal gradient optimization; and the activation function activates and suppresses the feature matrix of potential target areas to help the model distinguish foreground from background information, so that, in the backbone feature extraction network, the calculation result normalized into the standard interval is converted into the feature matrix of potential target areas. The calculation of the operation layers of each module can be formally expressed as:
$$F = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}(\text{imgs})))$$
wherein $\mathrm{Conv}$ denotes the basic convolution operation; $\mathrm{BN}$ is the normalization module, which keeps the number of channels of the received and output features unchanged; imgs is the input original remote sensing image; $F$ is the output feature map obtained by feature extraction; and the $\mathrm{ReLU}$ activation function is responsible for activating the potential target areas of the output feature map.
Defining CONV as a convolution module comprising a basic convolution operator, a normalization module and an activation function, the deep feature extraction process of the deep neural network can be formally expressed as:
$$F_n = \mathrm{CONV}_n(\mathrm{CONV}_{n-1}(\cdots \mathrm{CONV}_1(\text{imgs}) \cdots))$$
wherein $\mathrm{CONV}_n$ denotes the convolution module of the $n$-th layer, imgs is the input image and $F_n$ is the output feature map. On this basis, n feature maps of the remote sensing image after deep feature extraction can be obtained.
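As a rough illustration of how stacked convolution modules yield n feature maps of different scales, the sketch below chains convolution, normalization and ReLU on a single-channel toy image and keeps the output of every layer. It is a simplified sketch under stated assumptions, not the patented network: it uses one channel, a stride of 2 chosen here to produce different scales, and a per-map normalization without learned parameters in place of batch normalization; all function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]  # single-channel map indexed as [row][column]


def conv2d(x: Grid, kernel: Grid, stride: int = 2) -> Grid:
    """Valid 2D convolution (cross-correlation form) with a square kernel."""
    k = len(kernel)
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [sum(kernel[i][j] * x[r * stride + i][c * stride + j]
             for i in range(k) for j in range(k))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(x: Grid, eps: float = 1e-5) -> Grid:
    """Shift to zero mean and scale to unit variance over the whole map (simplified normalization)."""
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in x]


def relu(x: Grid) -> Grid:
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in x]


def conv_module(x: Grid, kernel: Grid) -> Grid:
    """One CONV module: convolution, then normalization, then ReLU."""
    return relu(normalize(conv2d(x, kernel)))


def backbone(image: Grid, kernels: List[Grid]) -> List[Grid]:
    """Apply the modules in sequence and collect the feature map of every layer."""
    features = []
    x = image
    for kernel in kernels:
        x = conv_module(x, kernel)
        features.append(x)
    return features


image = [[float((r * 3 + c * 5) % 11) for c in range(16)] for r in range(16)]
kernels = [[[0.25, 0.25], [0.25, 0.25]],
           [[1.0, -1.0], [1.0, -1.0]],
           [[0.5, 0.0], [0.0, 0.5]]]
feature_maps = backbone(image, kernels)
shapes = [(len(f), len(f[0])) for f in feature_maps]
```

Each layer halves the spatial size here, so a 16 x 16 input gives maps of size 8 x 8, 4 x 4 and 2 x 2, which is the kind of multi-scale output the feature fusion network consumes.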
The feature fusion network is responsible for receiving the feature maps from the backbone network and fusing them to obtain multi-scale feature information of the potential targets.
In a specific embodiment of the present invention, a backbone feature extraction network formed by 6 layers of convolution modules is taken as an example. As shown in fig. 2, the backbone feature extraction network has 6 layers of convolution modules ($n = 6$) arranged sequentially from top to bottom, so 6 feature maps of different scales can be extracted. The feature maps $F_{n-4}$, $F_{n-2}$ and $F_n$ extracted by layers $n-4$, $n-2$ and $n$ of the backbone feature extraction network, where $n = 6$, are selected and passed to the feature fusion network for feature fusion. The calculation process of the feature fusion network is expressed as:
$$F_{fuse} = \mathrm{FeatureFusion}(F_{n-4}, F_{n-2}, F_n)$$
wherein $\mathrm{FeatureFusion}$ is the feature fusion network used for feature fusion. It first receives the three layers of feature maps from the backbone feature extraction network, adjusts their sizes using a convolution module, and fuses adjacent adjusted feature maps using a fusion module. The fusion calculation for feature maps of different levels is expressed as:
$$P_1 = \mathrm{Concat}(\mathrm{CONV}(F_{n-4}), \mathrm{CONV}(F_{n-2})); \quad P_2 = \mathrm{Concat}(\mathrm{CONV}(F_{n-2}), \mathrm{CONV}(F_n))$$
wherein $\mathrm{Concat}$ denotes the feature fusion operator, used for fusing feature maps of different levels. Then, the fused output feature map is obtained by feature stitching:
$$F_{fuse} = \mathrm{Fusion}(P_1, P_2)$$
wherein $\mathrm{Fusion}$ denotes the feature stitching method used to obtain the fused feature map $F_{fuse}$, whose size is that of the middle-layer feature map $F_{n-2}$ and whose number of channels is the sum of the three layers.
Finally, the fused feature map is decoupled by the feature decoupling head, whose network consists of a basic convolution module and a prediction function module. The basic convolution module receives the fused remote sensing image features and adjusts them to a suitable shape and channel dimension for subsequent calculation. The prediction function module predicts the position and category information of potential target areas in the feature map and outputs the prediction result. As shown in fig. 2, the prediction function module includes the mask prediction module based on dynamic convolution enhancement, a classification prediction module and a regression prediction module.
The invention improves the traditional mask prediction module and adopts a mask prediction module based on dynamic convolution enhancement to obtain the remote sensing image instance segmentation model based on dynamic convolution enhancement;
the network architecture of the mask prediction module based on dynamic convolution enhancement is shown in fig. 3. It comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module connected in sequence. The dynamic weight sub-module further comprises a spatial dynamic weight, a channel dynamic weight and a kernel dynamic weight. By introducing these three kinds of dynamic weight, the dynamic weight sub-module allows feature extraction to dynamically adjust, through a learnable mechanism and according to the characteristics of the target region, the weight parameters of the pixel-channel information in the feature map and the size and shape of the convolution kernel. This flexibility allows the model to adapt better to targets of various sizes and shapes and to capture the boundary information and fine features of the targets, which improves segmentation accuracy and overcomes the limitation of the traditional fixed convolution kernel.
The calculation process of the dynamic weight sub-module can be formally expressed as:
$$Y = \left(\alpha_{s1} \otimes \alpha_{c1} \otimes \alpha_{k1} \otimes W_1 + \dots + \alpha_{sn} \otimes \alpha_{cn} \otimes \alpha_{kn} \otimes W_n\right) \otimes X$$
wherein $\alpha_{si}$, $\alpha_{ci}$ and $\alpha_{ki}$ respectively denote the spatial dynamic weight, the channel dynamic weight and the kernel dynamic weight, with subscripts $i = 1, 2, 3, \dots, n$; $X$ and $Y$ denote the input and output feature maps, respectively; $W_n$ denotes the own weight matrix of the $n$-th convolution kernel; and $\otimes$ denotes matrix multiplication, which enables dynamic computation under different weights. In summary, the mask prediction module based on dynamic convolution enhancement described above can replace the traditional mask prediction module to realize the remote sensing image instance segmentation model based on dynamic convolution enhancement, and the mask prediction part of the feature decoupling head network of the instance segmentation model can be formally expressed as:
$$F_{dy} = \mathrm{DyConv}(\mathrm{PConv}(F_{fuse})); \quad M = f_{seg}(F_{dy})$$
wherein $M$ is the pixel mask information predicted by the mask prediction module based on dynamic convolution enhancement; $F_{fuse}$ is the feature map output by the feature fusion network after feature fusion; $\mathrm{PConv}$ and $\mathrm{DyConv}$ denote the point convolution sub-module and the dynamic weight sub-module, respectively; and $f_{seg}$ is the pixel segmentation prediction function.
The complete instance segmentation model further comprises a classification prediction module and a regression prediction module, which can be represented by the following formulas:
$$C = f_{cls}(F_{fuse}); \quad B = f_{box}(F_{fuse})$$
wherein $C$ is the classification information predicted by the classification prediction module; $B$ is the position information predicted by the regression prediction module; $F_{fuse}$ is the fused feature map; $f_{cls}$ is the classification function; and $f_{box}$ is the rectangular box prediction function.
In addition, the invention also includes training and testing of the model. Specifically, a publicly available remote sensing image instance segmentation dataset is used for training and testing, with the training set and test set divided in a 7:3 ratio. After the remote sensing image instance segmentation model based on dynamic convolution enhancement is built, model training and testing are carried out. Focal loss (Focal Loss), rotated-box intersection-over-union loss (Rotated IoU) and cross-entropy loss (Cross Entropy Loss) are adopted as the loss functions for training on the remote sensing image instance segmentation task, and the loss function in the training process can be formally expressed as:
$$L_{total}(y, \hat{y}) = L_{focal}(y, \hat{y}) + L_{RIoU}(y, \hat{y}) + L_{CE}(y, \hat{y})$$
wherein $L_{focal}$ denotes the focal loss, corresponding to the classification prediction module in the remote sensing image instance segmentation task. $L_{RIoU}$ denotes the rotated-box IoU loss, corresponding to the regression prediction module in the remote sensing image instance segmentation task and used for calculating the distance between the prediction box and the ground-truth label of the target in the input remote sensing image. $L_{CE}$ denotes the cross-entropy loss, corresponding to the mask prediction module and used for calculating the distance between the predicted mask and the ground-truth label of the target in the input remote sensing image. $L_{total}$ denotes the total loss adopted for network training. $y$ denotes the real label and $\hat{y}$ denotes the model prediction result. Training continues until the loss no longer decreases and the network has stabilized; the training process then ends, yielding the trained remote sensing image instance segmentation model based on dynamic convolution enhancement.
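The way the three loss terms combine can be sketched in plain Python. This is a simplified sketch under clearly stated assumptions and not the training code of the invention: the box term uses an axis-aligned IoU loss as a stand-in for the rotated-box IoU loss (which requires polygon intersection), the mask term is a per-pixel binary cross-entropy, the three terms are summed with equal weight, and the default `gamma` and `alpha` values are common choices rather than values taken from the source.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned (x1, y1, x2, y2)


def _clip(p: float, eps: float = 1e-7) -> float:
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(p, eps), 1.0 - eps)


def focal_loss(y: Sequence[int], p: Sequence[float],
               gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss averaged over samples (classification branch)."""
    total = 0.0
    for t, q in zip(y, p):
        q = _clip(q)
        p_t = q if t == 1 else 1.0 - q
        a_t = alpha if t == 1 else 1.0 - alpha
        total -= a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y)


def cross_entropy_loss(y: Sequence[int], p: Sequence[float]) -> float:
    """Binary cross-entropy averaged over pixels (mask branch)."""
    total = 0.0
    for t, q in zip(y, p):
        q = _clip(q)
        total -= t * math.log(q) + (1 - t) * math.log(1.0 - q)
    return total / len(y)


def iou_loss(box_true: Box, box_pred: Box) -> float:
    """1 - IoU for axis-aligned boxes (simplified stand-in for the rotated-box IoU loss)."""
    ix = max(0.0, min(box_true[2], box_pred[2]) - max(box_true[0], box_pred[0]))
    iy = max(0.0, min(box_true[3], box_pred[3]) - max(box_true[1], box_pred[1]))
    inter = ix * iy
    area_true = (box_true[2] - box_true[0]) * (box_true[3] - box_true[1])
    area_pred = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    union = area_true + area_pred - inter
    return 1.0 - inter / union if union > 0 else 1.0


def total_loss(cls_true: Sequence[int], cls_pred: Sequence[float],
               box_true: Box, box_pred: Box,
               mask_true: Sequence[int], mask_pred: Sequence[float]) -> float:
    """Equal-weight sum of the classification, box and mask terms (weighting is an assumption)."""
    return (focal_loss(cls_true, cls_pred)
            + iou_loss(box_true, box_pred)
            + cross_entropy_loss(mask_true, mask_pred))


loss_value = total_loss(
    cls_true=[1, 0], cls_pred=[0.9, 0.2],
    box_true=(0.0, 0.0, 2.0, 2.0), box_pred=(1.0, 0.0, 3.0, 2.0),
    mask_true=[1, 1, 0, 0], mask_pred=[0.8, 0.6, 0.3, 0.1],
)
```

With `gamma = 0` the focal term reduces to an alpha-weighted cross-entropy, which is a convenient sanity check when comparing the classification and mask terms.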
Further, the trained remote sensing image instance segmentation model based on dynamic convolution enhancement is used to test the remote sensing image to be tested, which is formally expressed as:
$$R_{test} = \mathrm{Model}(I_{test})$$
wherein $\mathrm{Model}$ denotes the trained remote sensing image instance segmentation model based on dynamic convolution enhancement, and $I_{test}$ and $R_{test}$ respectively denote the remote sensing image to be tested and its corresponding instance segmentation result. Instance segmentation is then performed on the remote sensing image to be processed using the trained and tested remote sensing image instance segmentation model.
To address the insufficient feature decoupling capability of models in the remote sensing image instance segmentation task, the invention provides an efficient and innovative solution: a remote sensing image instance segmentation model based on dynamic convolution enhancement is designed which, in the feature decoupling head stage, uses a learning mechanism to dynamically adjust the weight parameters of the pixel-channel information in the feature map, as well as the size and shape of the convolution kernel, according to the features of the target region. This flexibility allows the model to adapt better to targets of various sizes and shapes and to capture target boundary information and fine features more effectively, thereby improving segmentation accuracy and overcoming the limitation of conventional fixed convolution kernels.
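The dynamic adjustment described above can be sketched, under stated assumptions, as a weighted aggregation of several candidate convolution kernels. In this sketch the spatial, channel and kernel dynamic weights are passed in as plain numbers, whereas in the model they would be produced by learned branches from the input features. The weights are applied as element-wise products along the corresponding kernel dimensions, and the aggregated kernel is applied as an ordinary valid convolution (cross-correlation) with a single output channel. These choices, and all function names, are assumptions made for illustration.

```python
def aggregate_dynamic_kernel(kernels, spatial_w, channel_w, kernel_w):
    """Combine n candidate kernels into one kernel.

    kernels[i][c][u][v]: candidate i, input channel c, kernel position (u, v)
    spatial_w[i][u][v]:  spatial dynamic weight of candidate i
    channel_w[i][c]:     channel dynamic weight of candidate i
    kernel_w[i]:         kernel dynamic weight (one scalar per candidate)
    """
    n, num_ch, k = len(kernels), len(kernels[0]), len(kernels[0][0])
    out = [[[0.0] * k for _ in range(k)] for _ in range(num_ch)]
    for i in range(n):
        for c in range(num_ch):
            for u in range(k):
                for v in range(k):
                    out[c][u][v] += (kernel_w[i] * channel_w[i][c]
                                     * spatial_w[i][u][v] * kernels[i][c][u][v])
    return out


def conv2d_valid(feature, kernel):
    """Valid cross-correlation of feature[c][h][w] with kernel[c][k][k]."""
    num_ch, height, width = len(feature), len(feature[0]), len(feature[0][0])
    k = len(kernel[0])
    out = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = 0.0
            for c in range(num_ch):
                for u in range(k):
                    for v in range(k):
                        acc += feature[c][y + u][x + v] * kernel[c][u][v]
            row.append(acc)
        out.append(row)
    return out
```

In this reading, a spatial weight of zero switches off one kernel position, which is one way the effective shape of the kernel can vary with the input.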
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A remote sensing image instance segmentation method based on dynamic convolution enhancement, characterized by comprising the following steps:
performing feature extraction on a remote sensing image to be processed by using a backbone feature extraction network to obtain multi-scale feature maps;
performing feature fusion on the extracted multi-scale feature maps by using a feature fusion network to obtain a fused feature map;
and constructing a mask prediction module based on dynamic convolution enhancement to perform mask prediction on the fused feature map, wherein the mask prediction module based on dynamic convolution enhancement comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module which are connected in sequence.
2. The remote sensing image instance segmentation method based on dynamic convolution enhancement according to claim 1, wherein the backbone feature extraction network comprises a plurality of stacked convolution modules, and each convolution module comprises a basic convolution operator Conv, a normalization module and a ReLU activation function;
the basic convolution operator Conv is used to compute the channel information of the remote sensing image to be processed pixel by pixel;
the normalization module is used to normalize the computed channel information into a standard interval;
the ReLU activation function is used to activate and deactivate the feature matrices of potential target regions.
3. The remote sensing image instance segmentation method based on dynamic convolution enhancement according to claim 1, wherein performing feature fusion on the extracted multi-scale feature maps by using the feature fusion network to obtain the fused feature map specifically comprises the following steps:
selecting the feature maps of different scales extracted by each layer of the backbone feature extraction network;
resizing the selected feature maps of different scales by using the convolution module;
fusing every two adjacent resized feature maps to obtain fused feature maps of different levels;
and concatenating the fused feature maps of different levels to obtain the final fused feature map.
4. The remote sensing image instance segmentation method based on dynamic convolution enhancement according to claim 1, wherein the dynamic weight sub-module comprises a spatial dynamic weight unit, a channel dynamic weight unit and a kernel dynamic weight unit.
5. The remote sensing image instance segmentation method based on dynamic convolution enhancement according to claim 4, wherein the calculation process of the dynamic weight sub-module can be represented by the following formula:
$$Y = \left(\alpha_{s1} \odot \alpha_{c1} \odot \alpha_{k1} \odot W_1 + \cdots + \alpha_{sn} \odot \alpha_{cn} \odot \alpha_{kn} \odot W_n\right) \odot X$$
wherein $Y$ denotes the output feature map of the dynamic weight sub-module; $X$ denotes the input feature map of the dynamic weight sub-module; $\alpha_{si}$, $\alpha_{ci}$ and $\alpha_{ki}$ denote the spatial dynamic weight, the channel dynamic weight and the kernel dynamic weight, respectively, the subscripts 1, 2, 3, ..., i, ..., n indicating the weight matrix corresponding to the i-th convolution kernel calculation stage; $W_n$ denotes the weight matrix of the n-th convolution kernel itself, n being a natural number; and $\odot$ denotes matrix multiplication.
6. A remote sensing image instance segmentation model based on dynamic convolution enhancement, comprising:
a backbone feature extraction network, a feature fusion network and a feature decoupling head which are connected in sequence;
the feature decoupling head comprises a mask prediction module based on dynamic convolution enhancement, and the mask prediction module based on dynamic convolution enhancement comprises a point convolution sub-module, a dynamic weight sub-module and a mask prediction sub-module which are connected in sequence.
7. The remote sensing image instance segmentation model based on dynamic convolution enhancement according to claim 6, wherein the dynamic weight sub-module comprises a spatial dynamic weight unit, a channel dynamic weight unit and a kernel dynamic weight unit.
8. The remote sensing image instance segmentation model based on dynamic convolution enhancement according to claim 7, wherein the calculation process of the dynamic weight sub-module can be represented by the following formula:
$$Y = \left(\alpha_{s1} \odot \alpha_{c1} \odot \alpha_{k1} \odot W_1 + \cdots + \alpha_{sn} \odot \alpha_{cn} \odot \alpha_{kn} \odot W_n\right) \odot X$$
wherein $Y$ denotes the output feature map of the dynamic weight sub-module; $X$ denotes the input feature map of the dynamic weight sub-module; $\alpha_{si}$, $\alpha_{ci}$ and $\alpha_{ki}$ denote the spatial dynamic weight, the channel dynamic weight and the kernel dynamic weight, respectively, the subscripts 1, 2, 3, ..., i, ..., n indicating the weight matrix corresponding to the i-th convolution kernel calculation stage; $W_n$ denotes the weight matrix of the n-th convolution kernel itself, n being a natural number; and $\odot$ denotes matrix multiplication.
9. The remote sensing image instance segmentation model based on dynamic convolution enhancement according to claim 6, wherein the feature decoupling head further comprises a classification prediction module and a regression prediction module.
10. The remote sensing image instance segmentation model based on dynamic convolution enhancement according to claim 9, wherein the loss function of the remote sensing image instance segmentation model based on dynamic convolution enhancement during training is expressed as:
$$L_{total} = L_{focal}(y, \hat{y}) + L_{RIoU}(y, \hat{y}) + L_{CE}(y, \hat{y})$$
wherein $L_{focal}$ denotes the focal loss, which corresponds to the classification prediction module in the remote sensing image instance segmentation task; $L_{RIoU}$ denotes the rotated-box IoU loss, which corresponds to the regression prediction module in the remote sensing image instance segmentation task and is used to calculate the distance between the predicted box and the ground-truth label of the target in the input remote sensing image; $L_{CE}$ denotes the cross-entropy loss, which corresponds to the mask prediction module based on dynamic convolution enhancement in the remote sensing image instance segmentation task and is used to calculate the distance between the predicted mask and the ground-truth label of the target in the input remote sensing image; $L_{total}$ denotes the total loss used to train the remote sensing image instance segmentation model based on dynamic convolution enhancement; $y$ denotes the ground-truth label and $\hat{y}$ denotes the model prediction.
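The fusion steps recited in claim 3 (select feature maps of different scales, resize them, fuse adjacent pairs, then concatenate) can be sketched as follows. This is a minimal sketch on single-channel feature maps stored as nested lists. Nearest-neighbour resizing stands in for the convolution-based resizing of the claim, and element-wise addition is assumed for fusing adjacent maps, since the source does not specify the fusion operator. All function names are hypothetical.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2D map (stand-in for convolutional resizing)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def fuse_pair(a, b):
    """Fuse two same-sized maps by element-wise addition (assumed operator)."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def fuse_multiscale(feature_maps, out_h, out_w):
    """Resize all maps, fuse each adjacent pair, and return the fused maps
    as a list, which plays the role of concatenation along the channel axis."""
    resized = [resize_nearest(f, out_h, out_w) for f in feature_maps]
    return [fuse_pair(resized[i], resized[i + 1])
            for i in range(len(resized) - 1)]
```

With three input scales the result has two channels, one per adjacent pair, which mirrors the "fused feature maps of different levels" that the claim concatenates into the final fused feature map.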

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410024193.8A CN117877034A (en) 2024-01-08 2024-01-08 Remote sensing image instance segmentation method and model based on dynamic convolution enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410024193.8A CN117877034A (en) 2024-01-08 2024-01-08 Remote sensing image instance segmentation method and model based on dynamic convolution enhancement

Publications (1)

Publication Number Publication Date
CN117877034A (en) 2024-04-12

Family

ID=90589694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410024193.8A Pending CN117877034A (en) 2024-01-08 2024-01-08 Remote sensing image instance segmentation method and model based on dynamic convolution enhancement

Country Status (1)

Country Link
CN (1) CN117877034A (en)

Similar Documents

Publication Publication Date Title
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN110770752A (en) Automatic pest counting method combining multi-scale feature fusion network with positioning model
CN112418392A (en) Neural network construction method and device
CN111340141A (en) Crop seedling and weed detection method and system based on deep learning
CN113221787B (en) Pedestrian multi-target tracking method based on multi-element difference fusion
CN113807399B (en) Neural network training method, neural network detection method and neural network training device
CN108921198A (en) Commodity image classification method, server and system based on deep learning
Zeng et al. Identification of maize leaf diseases by using the SKPSNet-50 convolutional neural network model
CN105631474B (en) Hyperspectral data multi-classification method based on Jeffries-Matusita distance and class-pair decision tree
Zhang et al. Global context aware RCNN for object detection
CN113592060A (en) Neural network optimization method and device
CN113298032A (en) Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning
CN113435254A (en) Sentinel-2 image-based farmland deep learning extraction method
CN116597224A (en) Potato defect detection method based on improved YOLO V8 network model
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN116452937A (en) Multi-mode characteristic target detection method based on dynamic convolution and attention mechanism
Fan et al. A novel sonar target detection and classification algorithm
Devisurya et al. Early detection of major diseases in turmeric plant using improved deep learning algorithm
CN113902966A (en) Anchor frame-free target detection network for electronic components and detection method applying same
Sun et al. Decoupled feature pyramid learning for multi-scale object detection in low-altitude remote sensing images
CN111860601B (en) Method and device for predicting type of large fungi
CN116630700A (en) Remote sensing image classification method based on an introduced channel-spatial attention mechanism
CN117132910A (en) Vehicle detection method and device for unmanned aerial vehicle and storage medium
CN116258914A (en) Remote sensing image classification method based on machine learning and local and global feature fusion
CN117877034A (en) Remote sensing image instance segmentation method and model based on dynamic convolution enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination