CN116229272B - High-precision remote sensing image detection method and system based on representative point representation - Google Patents

High-precision remote sensing image detection method and system based on representative point representation

Info

Publication number
CN116229272B
CN116229272B (application CN202310241950.2A)
Authority
CN
China
Prior art keywords
features
feature
network
resolution
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310241950.2A
Other languages
Chinese (zh)
Other versions
CN116229272A (en)
Inventor
张锦
顾因
陈锋
段晔鑫
姜伟成
蔡军
耿京
刘雁轩
杨子恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Military Transportation University of PLA Zhenjiang
Original Assignee
Army Military Transportation University of PLA Zhenjiang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Military Transportation University of PLA Zhenjiang filed Critical Army Military Transportation University of PLA Zhenjiang
Priority to CN202310241950.2A priority Critical patent/CN116229272B/en
Publication of CN116229272A publication Critical patent/CN116229272A/en
Application granted granted Critical
Publication of CN116229272B publication Critical patent/CN116229272B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application discloses a high-precision remote sensing image detection method based on representative point representation, which comprises the following steps: acquiring a remote sensing image to be detected; inputting the acquired remote sensing image into a pre-trained single feature aggregation depth network, and outputting a feature map. The backbone network is used for decomposing the remote sensing image into multi-scale features at several scales; the feature up-fusion network is used for fusing the low-resolution deep features upward with the high-resolution shallow features to obtain primary fusion features; the feature down-fusion network gradually fuses the high-resolution deep primary fusion features downward with the low-resolution shallow primary fusion features to obtain advanced fusion features; the feature transformation network is used for reducing the dimension of the features and outputting lightweight features; the convolution module is used for carrying out convolution operations on the images; the single feature aggregation depth network is used for encoding and decoding features of the original image and outputting a feature map with channel dimension na×(nc+1+2×np); the loss function comprises a classification loss term, a localization loss term, a confidence loss term, a geometric regularization term and a feature regularization term.

Description

High-precision remote sensing image detection method and system based on representative point representation
Technical Field
The application relates to a high-precision remote sensing image detection method and system based on representative point representation, and belongs to the technical field of remote sensing image detection.
Background
Remote sensing image detection is an important means of online information mining and dynamic monitoring of important targets over wide regions, and can be widely applied to scenarios such as large-area personnel search and rescue, forest fire detection, geological survey, and real-time battlefield information sensing and reconnaissance. On the one hand, current rotation detectors, which directly regress the angle of the "rotated box", face the "angle critical problem" and the "circle-like problem". The "angle critical problem" refers to the contradiction that, near the critical angle, the difference between the predicted and true angle values is small but the loss is large; the "circle-like problem" refers to circle-like objects being essentially direction-independent, while the loss remains highly sensitive to the direction prediction. On the other hand, remote sensing image detectors based on rotated boxes generally have low inference speed and can hardly meet the speed requirements of high-resolution image detection, while fast horizontal-box detection algorithms migrated directly can hardly satisfy the rotation-invariance and scale-variation requirements of remote sensing images. The present application therefore addresses the speed requirement of remote sensing image detection and the angle critical problem brought by the rotated box.
Disclosure of Invention
The application aims to provide a high-precision remote sensing image detection method and system based on representative point representation, which meet the speed requirement of remote sensing image detection and solve the angle critical problem brought by the rotated box.
A high-precision remote sensing image detection method based on representative point representation, the method comprising:
acquiring a remote sensing image to be detected;
inputting the acquired remote sensing image into a pre-trained single feature aggregation depth network, and outputting feature mapping;
the single feature aggregation depth network comprises a backbone network based on a single feature aggregation module, a feature up-fusion network, a feature down-fusion network, a feature transformation network, a representative point loss function and a convolution module;
the backbone network is used for decomposing the remote sensing image into multi-scale features at several scales; the feature up-fusion network is used for fusing the low-resolution deep features upward with the high-resolution shallow features to obtain primary fusion features; the feature down-fusion network gradually fuses the high-resolution deep primary fusion features downward with the low-resolution shallow primary fusion features to obtain advanced fusion features; the feature transformation network is used for reducing the dimension of the features and outputting lightweight features; the convolution module is used for carrying out convolution operations on the images; the single feature aggregation depth network is used for encoding and decoding features of the original image and outputting a feature map with channel dimension na×(nc+1+2×np); the representative point loss function comprises a classification loss term, a localization loss term, a confidence loss term, a geometric regularization term and a feature regularization term.
Further, the single feature aggregation depth network is trained through a representative point loss function, and the formula of the representative point loss function is as follows:
L_RP = Σ_j β_j · ( α_1·L_cls^j + α_2·L_conf^j + α_3·L_loc^j + α_4·L_geo^j + α_5·L_feat^j ), with L_cls = CE(P_cls, T_cls), L_conf = CE(P_conf, T_conf) and L_loc = ConvexGIOU(P_loc, T_loc);
wherein L_cls denotes the classification loss, L_conf the confidence loss, L_loc the localization loss, L_geo the geometric regularization term and L_feat the feature regularization term; j denotes the corresponding scale; α denotes the weights of the different loss types and β the weights of the different scales; P denotes predicted values and T denotes ground-truth values; CE denotes the classical cross-entropy loss and ConvexGIOU the polygon generalized intersection-over-union operator; L_geo and L_feat constrain the distribution of the set of representative points representing an object from the geometric and feature perspectives, respectively.
Further, the single feature aggregation depth network performs feature extraction and multi-scale feature fusion based on a single feature aggregation module.
Further, the single feature aggregation module includes:
performing convolution processing on the input features with channel number c0 to obtain 4 groups of output features, each with channel number c;
splicing the obtained 4 groups of output features with channel number c along the channel dimension into a group of features with channel number 4c, and then performing aggregation with a convolution operation to obtain a group of features with channel number 2c.
Further, the convolution module comprises a first convolution and a second convolution; the first convolution is a convolution with kernel size 3×3 and stride 1 cascaded with an activation function with parameter 0.1, and the second convolution is a convolution with kernel size 1×1, realizing fast dimensionality reduction of high-dimensional input features.
Further, the backbone network performs layer-by-layer abstraction and processing on the input remote sensing image based on a single feature aggregation module and a plurality of independent convolution and maximum pooling operations, and outputs feature mapping with different scale resolutions.
Further, the feature up-fusion network first uses convolutions with 1×1 kernels to process the low-resolution input features with one channel number and the high-resolution input features with another channel number to obtain two features with a unified channel dimension, then upsamples the low-resolution features to further unify the resolution, and fuses the two obtained features using a single feature aggregation module.
Further, the feature down-fusion network realizes downsampling of the high-resolution features through a convolution operation with a stride of 2.
Further, the feature transformation network uses a lightweight 1×1 convolution to reduce the channel dimension of the features and obtain dimension-reduced features, then applies multiple groups of pooling operations to the dimension-reduced features to encode diversified features, and finally aggregates the multi-branch features to output lightweight features.
A high-precision remote sensing image detection system based on representative point representation, the system comprising:
the acquisition module is used for acquiring a remote sensing image to be detected;
the processing module is used for processing the input remote sensing image to be detected;
the processing module comprises a backbone network based on a single feature aggregation module, an upper feature fusion network, a lower feature fusion network, a feature transformation network, a representative point loss function and a convolution module; the backbone network is used for dividing the remote sensing image into multi-scale characteristics of a plurality of scales; the above-feature fusion network is used for upwardly fusing deep features with low resolution with shallow features with high resolution to obtain primary fusion features; the feature lower fusion network gradually fuses the deep high-resolution primary fusion features with the shallow low-resolution primary fusion features downwards to obtain high-level fusion features; the feature transformation network is used for reducing the dimension of the features and outputting lightweight features; the convolution module is used for carrying out convolution operation on the image; the single feature aggregation depth network is used for carrying out feature encoding and decoding on the original image and outputting feature mapping with the channel dimension of na× (nc+1+2 x np); the representative point loss function includes a classification loss term, a location loss term, a confidence loss term, a geometric regularization term, and a feature regularization term.
Compared with the prior art, the application has the following beneficial effects: the single feature aggregation depth network of the application performs feature extraction and multi-scale feature fusion with a GPU-inference-efficient single feature aggregation module as its core component; the single feature aggregation module OFA avoids the inefficient computation and large memory footprint brought by dense feature-reuse schemes, thereby improving inference speed, and extracts more diversified feature representations than element-wise addition schemes.
The application "discards" the rectangular box representation commonly used for "object" and selects np points to represent "object". In the restoration process, a rectangular frame surrounding all representative points is generated based on a classical Jarvis March algorithm or a minimum area algorithm.
Because the representative points and the representative-point-based box generation algorithm do not involve any angle representation of rotated rectangular boxes, the angle critical problem and the circle-like problem are fundamentally avoided, which ensures stable model training and high detection precision.
Drawings
FIG. 1 is a single feature aggregation depth network of the present application;
FIG. 2 is a schematic diagram of a single feature aggregation module of the present application;
FIG. 3 is a schematic diagram of a backbone network of the present application;
FIG. 4 is a schematic diagram of a feature transformation network of the present application;
FIG. 5 is a schematic diagram of the feature up-fusion network of the present application;
FIG. 6 is a schematic diagram of the feature down-fusion network of the present application.
Detailed Description
The application is further described below in connection with specific embodiments, so that the technical means, creative features, objectives and effects of the application are easy to understand.
Example 1
A high-precision remote sensing image detection method based on representative point representation, the method comprising:
acquiring a remote sensing image to be detected;
inputting the acquired remote sensing image into a pre-trained single feature aggregation depth network, and outputting feature mapping;
the single feature aggregation depth network comprises a backbone network based on a single feature aggregation module, a feature up-fusion network, a feature down-fusion network, a feature transformation network, a representative point loss function and a convolution module;
the backbone network is used for decomposing the remote sensing image into multi-scale features at several scales; the feature up-fusion network is used for fusing the low-resolution deep features upward with the high-resolution shallow features to obtain primary fusion features; the feature down-fusion network gradually fuses the high-resolution deep primary fusion features downward with the low-resolution shallow primary fusion features to obtain advanced fusion features; the feature transformation network is used for reducing the dimension of the features and outputting lightweight features; the convolution module is used for carrying out convolution operations on the images; the single feature aggregation depth network is used for encoding and decoding features of the original image and outputting a feature map with channel dimension na×(nc+1+2×np); the representative point loss function comprises a classification loss term, a localization loss term, a confidence loss term, a geometric regularization term and a feature regularization term.
The application "discards" the rectangular box representation commonly used for "object" and selects np points to represent "object". At the time of restoration, a rectangular frame surrounding all representative points is generated based on a classical Jarvis March algorithm or a minimum area algorithm (MinAeraRect).
Because the representative points and the representative-point-based box generation algorithm do not involve any angle representation of rotated rectangular boxes, the angle critical problem and the circle-like problem are fundamentally avoided, which ensures stable model training and high detection precision. The single feature aggregation depth network OFAN performs feature extraction and multi-scale feature fusion with a GPU-inference-efficient single feature aggregation module (One-pass Feature Aggregation, OFA) as its core component. The single feature aggregation module OFA not only avoids the inefficient computation and large memory footprint brought by dense feature-reuse schemes (such as DenseNet), thereby improving inference speed, but also extracts more diversified feature representations than element-wise addition schemes (such as ResNet).
The specific implementation is as follows:
as shown in fig. 1, the single feature aggregation deep network OFAN mainly includes a Backbone network (Backbone), a fusion-on-feature network (FuseDown 2Up, fuseD 2U), a fusion-under-feature network (FuseUp 2Down, fuseU 2D), and a feature transformation network (Transition) and a convolution module (conv), wherein the Backbone network (Backbone) uses a single feature aggregation module OFA as a core component.
The workflow of the single feature aggregation depth network OFAN is as follows: the remote sensing image is fed into the Backbone network to obtain multi-scale features at 4 scales; the low-resolution deep features are then gradually fused upward with the high-resolution shallow features ("upward" means that the resolution of the fused features is consistent with that of the high-resolution shallow features), yielding a series of primary fusion features (the output features of FuseD2U); the high-resolution deep primary fusion features are then gradually fused downward with the low-resolution shallow primary fusion features ("downward" means that the resolution of the fused features is consistent with that of the low-resolution shallow features), yielding the advanced fusion features (the output features of FuseU2D).
Finally, the advanced fusion features at each scale are processed by the convolution module to obtain the final output feature map with channel dimension na×(nc+1+2×np).
A specific explanation of this channel dimension is as follows: each prediction box is covered by np representative points (called a point set); accordingly, 2×np neurons represent the x- and y-coordinate offsets of this point set; nc neurons represent the class confidence of the predicted object (the class space has size nc); 1 neuron represents the confidence that the object covered by this set of representative points belongs to the foreground; and na denotes the number of prediction boxes per position.
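For illustration only, the following sketch shows one way such a head output could be split into the three groups of neurons described above; the tensor layout and the function names are assumptions and are not prescribed by the application.

```python
# Illustrative sketch: splitting a head output of shape (B, na*(nc+1+2*np), H, W)
# into class scores, foreground confidence and the 2*np point offsets per anchor.
import torch

def decode_head(output: torch.Tensor, na: int, nc: int, np_points: int):
    b, c, h, w = output.shape
    assert c == na * (nc + 1 + 2 * np_points)
    out = output.view(b, na, nc + 1 + 2 * np_points, h, w)
    cls_scores = out[:, :, :nc]            # nc class confidences
    objectness = out[:, :, nc:nc + 1]      # 1 foreground confidence
    point_offsets = out[:, :, nc + 1:]     # 2*np (x, y) offsets of the point set
    return cls_scores, objectness, point_offsets

if __name__ == "__main__":
    na, nc, np_points = 3, 15, 9           # e.g. 15 DOTA classes, 9 representative points
    feat = torch.randn(1, na * (nc + 1 + 2 * np_points), 32, 32)
    cls_s, obj, pts = decode_head(feat, na, nc, np_points)
    print(cls_s.shape, obj.shape, pts.shape)
```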
As shown in Equation 1, the representative point loss function RP is composed of the classification loss, the confidence loss, the localization loss, the geometric regularization term and the feature regularization term. The first three losses correspond to the three groups of nc, 1 and 2×np neurons, respectively. j denotes the corresponding scale; α and β denote the weights of the different loss types and the weights of the different scales, respectively; P and T denote predicted and ground-truth values; CE and ConvexGIOU denote the classical cross-entropy loss and the polygon generalized intersection-over-union operator. The geometric and feature regularization terms constrain the distribution of the set of representative points representing an object from the geometric and feature perspectives, respectively.
The geometric regularization term guides the point set to spread out so as to cover the whole object as completely as possible; the application designs it as the inverse of the total distance ρ_kc of the points in the set from the point-set center c (Equation 5).
The feature regularization term minimizes the similarity of the features corresponding to the points, so that points on different semantic parts of the object are selected to characterize the whole object; the application designs it as the sum of the feature similarities e_kc.
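For illustration only, the following sketch gives one possible reading of the two regularization terms described above (inverse of the total distance of the points from the point-set center, and the sum of the feature similarities between points); the exact formulas, normalization and similarity measure are assumptions.

```python
# Hedged sketch of the two regularizers; formulas and the cosine-similarity
# choice are illustrative assumptions, not the application's exact definitions.
import torch
import torch.nn.functional as F

def geometric_reg(points: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """points: (np, 2). Inverse of the total distance of the points from the
    point-set center, so spreading the points out reduces the loss."""
    center = points.mean(dim=0, keepdim=True)
    total_dist = (points - center).norm(dim=1).sum()
    return 1.0 / (total_dist + eps)

def feature_reg(point_feats: torch.Tensor) -> torch.Tensor:
    """point_feats: (np, d) features sampled at the representative points.
    Sum of pairwise similarities; minimizing it pushes the points toward
    different semantic parts of the object."""
    f = F.normalize(point_feats, dim=1)
    sim = f @ f.t()                                   # (np, np) similarity matrix
    off_diag = sim - torch.diag(torch.diag(sim))      # ignore self-similarity
    return off_diag.sum() / 2                         # each pair counted once

if __name__ == "__main__":
    pts = torch.rand(9, 2) * 32
    feats = torch.randn(9, 64)
    print(geometric_reg(pts).item(), feature_reg(feats).item())
```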
As shown in fig. 2, the single feature aggregation module OFA(c0; 2c) processes input features with channel number c0 and outputs features with channel number 2c. The single feature aggregation module OFA adjusts the longest path (branch 1) and the shortest path (branch 4) in the module to increase gradient diversity, so that the network can learn more diversified features, improving precision and accelerating convergence. The specific workflow is as follows: OFA builds 4 branches based on convolution operations to extract diversified features, splices them along the channel dimension (the channel number after splicing is 4c), and then aggregates them with a convolution operation. OFA involves 2 different types of convolution operations: conv3×3,1,LeakyReLU(0.1) and conv1×1,1,LeakyReLU(0.1). The former denotes a convolution with kernel size 3×3 and stride 1 followed by a cascaded LeakyReLU activation with parameter 0.1 (positive inputs pass through unchanged, negative inputs are multiplied by 0.1); the latter uses a 1×1 convolution kernel to achieve fast dimensionality reduction of high-dimensional input features.
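For illustration only, an OFA(c0; 2c)-style block consistent with the description above could be sketched as follows; since the exact composition of the four branches is not fully specified here, the branch depths are illustrative assumptions, and only the two convolution types named above are used.

```python
# Illustrative OFA(c0; 2c)-style block: four branches, channel concatenation
# to 4c, and a 1x1 aggregation down to 2c. Branch depths are assumptions.
import torch
import torch.nn as nn

def conv3x3(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, 1, 1), nn.LeakyReLU(0.1))

def conv1x1(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 1, 1, 0), nn.LeakyReLU(0.1))

class OFA(nn.Module):
    def __init__(self, c0: int, c: int):
        super().__init__()
        self.branch1 = nn.Sequential(conv1x1(c0, c), conv3x3(c, c), conv3x3(c, c))  # longest path
        self.branch2 = nn.Sequential(conv1x1(c0, c), conv3x3(c, c))
        self.branch3 = nn.Sequential(conv1x1(c0, c), conv3x3(c, c))
        self.branch4 = conv1x1(c0, c)                                               # shortest path
        self.aggregate = conv1x1(4 * c, 2 * c)        # concatenate to 4c, aggregate to 2c

    def forward(self, x):
        feats = [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)]
        return self.aggregate(torch.cat(feats, dim=1))

if __name__ == "__main__":
    m = OFA(c0=64, c=64)
    print(m(torch.randn(1, 64, 32, 32)).shape)   # -> (1, 128, 32, 32)
```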
As shown in fig. 3, the Backbone network performs layer-by-layer abstraction and processing of the input remote sensing image (resolution h×w) based on the single feature aggregation module OFA and several independent convolution (conv) and Max Pooling operations, and finally outputs feature maps at 4 different scale resolutions (b2, b3, b4, b5). Downsampling of the features is implemented with a stride-2 convolution (conv3×3,2,LeakyReLU(0.1)) and a pooling operation with kernel size 2 (Max Pooling 2,2). Although the downsampled deep features lose part of the spatial information (localization capability is reduced), the single feature aggregation module OFA obtains a larger receptive field, so that high-level semantic features are easier to encode (object-class prediction capability is improved). Conversely, the shallow features before downsampling are rich in spatial information (strong localization capability) but weak in semantic information (insufficient object-class prediction capability). To exploit the advantages of both the high-resolution shallow features and the low-resolution deep features, feature fusion is carried out in 2 stages, "up" fusion and "down" fusion, based on the FuseD2U and FuseU2D modules.
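For illustration only, a backbone of this style could be sketched as follows; the stage layout and channel progression are assumptions chosen so that b5 has 1024 channels (consistent with the 1024-to-256 reduction described below), and the OFA block is replaced by a simple stand-in rather than repeating the sketch above.

```python
# Illustrative backbone sketch: stem, then four stages producing b2..b5.
# The stand-in block maps c0 input channels to 2c output channels, like OFA.
import torch
import torch.nn as nn

def ofa_standin(c0, c):  # stand-in for the OFA module of fig. 2 (c0 -> 2c channels)
    return nn.Sequential(nn.Conv2d(c0, 2 * c, 3, 1, 1), nn.LeakyReLU(0.1))

def down(cin, cout):     # stride-2 convolution used for downsampling, as described
    return nn.Sequential(nn.Conv2d(cin, cout, 3, 2, 1), nn.LeakyReLU(0.1))

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.LeakyReLU(0.1),
                                  nn.MaxPool2d(2, 2))                        # h/4
        self.stage2 = ofa_standin(32, 64)                                    # b2: 128 ch
        self.stage3 = nn.Sequential(down(128, 128), ofa_standin(128, 128))   # b3: 256 ch
        self.stage4 = nn.Sequential(down(256, 256), ofa_standin(256, 256))   # b4: 512 ch
        self.stage5 = nn.Sequential(down(512, 512), ofa_standin(512, 512))   # b5: 1024 ch

    def forward(self, x):
        b2 = self.stage2(self.stem(x))
        b3 = self.stage3(b2)
        b4 = self.stage4(b3)
        b5 = self.stage5(b4)
        return b2, b3, b4, b5

if __name__ == "__main__":
    shapes = [f.shape for f in Backbone()(torch.randn(1, 3, 256, 256))]
    print(shapes)   # four scales with 128/256/512/1024 channels
```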
To improve computational efficiency, the deep feature b5 is first dimension-reduced by Transition before feature fusion, aggregating the number of feature channels from 1024 to 256 (fig. 1). The Transition(4c; c) module shown in FIG. 4 processes input features with channel number 4c to obtain output features with channel number c. The feature transformation network Transition uses a lightweight 1×1 convolution to reduce the 4c-channel features to dimension-reduced features with channel number c, then applies 3 groups of parallel pooling operations with receptive fields of 5, 9 and 13 to encode diversified features, and finally aggregates the multi-branch features to output lightweight features.
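For illustration only, a Transition(4c; c)-style block consistent with this description could look as follows; the use of max-pooling for the three parallel branches and the final 1×1 aggregation are assumptions.

```python
# Illustrative Transition(4c; c)-style block: 1x1 reduction, three parallel
# poolings with receptive fields 5/9/13, then aggregation back to c channels.
import torch
import torch.nn as nn

class Transition(nn.Module):
    def __init__(self, c4: int, c: int):
        super().__init__()
        self.reduce = nn.Sequential(nn.Conv2d(c4, c, 1), nn.LeakyReLU(0.1))
        self.pools = nn.ModuleList([nn.MaxPool2d(k, stride=1, padding=k // 2)
                                    for k in (5, 9, 13)])
        self.aggregate = nn.Sequential(nn.Conv2d(4 * c, c, 1), nn.LeakyReLU(0.1))

    def forward(self, x):
        r = self.reduce(x)
        branches = [r] + [p(r) for p in self.pools]   # identity branch plus 3 pooled branches
        return self.aggregate(torch.cat(branches, dim=1))

if __name__ == "__main__":
    t = Transition(1024, 256)       # b5: 1024 -> 256 channels, as stated in the text
    print(t(torch.randn(1, 1024, 32, 32)).shape)
```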
As shown in fig. 5, the feature up-fusion network FuseDown2Up(c0, c1; c) module first uses convolutions with 1×1 kernels to process the low-resolution input features with channel number c0 and the high-resolution input features with channel number c1, obtaining two features whose channel dimension is unified to c, and then upsamples the low-resolution features to further unify the resolution. Finally, the two obtained features are fused by a single feature aggregation module OFA. The feature down-fusion network FuseUp2Down(c0, c1; c) is similar to the feature up-fusion network FuseDown2Up(c0, c1; c); the difference is that FuseUp2Down realizes downsampling of the high-resolution features through a convolution operation with a stride of 2, see fig. 6.
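For illustration only, the two fusion modules could be sketched as follows; the OFA fusion step is replaced by a simple stand-in, and concatenating the two inputs before fusion as well as nearest-neighbour upsampling are assumptions.

```python
# Illustrative FuseDown2Up / FuseUp2Down sketch under stated assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv1x1(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 1), nn.LeakyReLU(0.1))

def ofa_standin(cin, cout):  # stand-in for the OFA fusion described in fig. 2
    return nn.Sequential(nn.Conv2d(cin, cout, 3, 1, 1), nn.LeakyReLU(0.1))

class FuseDown2Up(nn.Module):
    """Fuse a low-resolution feature (c0 ch) upward into a high-resolution one (c1 ch)."""
    def __init__(self, c0, c1, c):
        super().__init__()
        self.low, self.high = conv1x1(c0, c), conv1x1(c1, c)
        self.fuse = ofa_standin(2 * c, c)

    def forward(self, low_res, high_res):
        low = F.interpolate(self.low(low_res), size=high_res.shape[-2:], mode="nearest")
        return self.fuse(torch.cat([low, self.high(high_res)], dim=1))

class FuseUp2Down(nn.Module):
    """Same idea, but the high-resolution branch is downsampled by a stride-2 conv."""
    def __init__(self, c0, c1, c):
        super().__init__()
        self.low = conv1x1(c0, c)
        self.high_down = nn.Sequential(nn.Conv2d(c1, c, 3, 2, 1), nn.LeakyReLU(0.1))
        self.fuse = ofa_standin(2 * c, c)

    def forward(self, low_res, high_res):
        return self.fuse(torch.cat([self.low(low_res), self.high_down(high_res)], dim=1))
```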
Experimental conditions: the advancement of the present application was verified experimentally on the world's largest remote sensing image detection dataset (DOTA). The DOTA dataset contains 2806 remote sensing images with nearly 190,000 annotated instances in 15 categories (plane, ship, storage tank, tennis court, basketball court, baseball diamond, ground track field, harbor, bridge, small vehicle, large vehicle, helicopter, roundabout, soccer field, swimming pool). The experimental procedure was divided into 3 stages: 1) network training and parameter tuning based on the public DOTA training and validation sets; 2) inference with the trained network on the unannotated test set; 3) submitting the inference results to the DOTA official website to obtain the algorithm evaluation.
Algorithm parameter settings: the weights α1, α2, α3, α4, α5 corresponding to the five loss parts, namely the classification loss, the confidence loss, the localization loss, the geometric regularization term and the feature regularization term, are set to 0.07, 0.0375, 1.92, 0.03, respectively; the weights (β1, β2, β3, β4) of the losses from the small scale to the large scale are set to 4, 1, 0.25 and 0.06, respectively. The number np of points in a point set is set to 9; the training batch size is set to 48, the number of epochs to 200, and the initial learning rate to 0.01, which after linear decay reaches 0.002 in the final epoch. The input image resolution for both training and testing is 1024×1024. The IOU threshold for non-maximum suppression is set to 0.45, and the confidence thresholds for training and inference are set to 0.01 and 0.25, respectively.
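For reference, the settings listed above can be collected into a plain configuration dictionary; the key names are illustrative, and only four of the five α values are recoverable from the text above.

```python
# Collected training settings from the paragraph above (key names are illustrative).
CONFIG = {
    "loss_weights_alpha": [0.07, 0.0375, 1.92, 0.03],  # four values listed for the five loss terms
    "scale_weights_beta": [4, 1, 0.25, 0.06],          # small -> large scale
    "num_points_np": 9,
    "batch_size": 48,
    "epochs": 200,
    "lr_initial": 0.01,
    "lr_final": 0.002,                                  # after linear decay
    "input_size": (1024, 1024),
    "nms_iou_threshold": 0.45,
    "conf_threshold_train": 0.01,
    "conf_threshold_infer": 0.25,
}
```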
TABLE 1 Comprehensive performance of high-performance remote sensing image detection algorithms on the DOTA test set
Experimental analysis: Table 1 compares the overall mAP accuracy and inference speed (FPS) of the present application (OFAN) with those of high-performance remote sensing image detection algorithms on the DOTA test set. The inference speed of existing high-performance algorithms does not exceed 20 frames per second (FPS), while the detection speed of the present method on a GTX3090 graphics card reaches 62.5 FPS (16 ms). In particular, with a batch size of 32, the inference speed of the application reaches 177 FPS (5.6 ms), which better meets the speed requirement of remote sensing image detection. For level-20 high-resolution remote sensing imagery (0.27 m spatial resolution), the algorithm can monitor an area of about 13.5 square kilometers in near real time every second (177 × 0.27×1024/1000 × 0.27×1024/1000). While greatly improving speed, the mAP of the application reaches 73.7, close to the detection precision of the current most advanced aerial image detectors.
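The per-second coverage figure can be checked with the following short calculation.

```python
# Quick arithmetic check of the coverage claim above: at 0.27 m resolution a
# 1024x1024 tile covers (0.27*1024/1000)^2 km^2; at 177 FPS that is ~13.5 km^2/s.
tile_km = 0.27 * 1024 / 1000        # tile side length in km (~0.276 km)
area_per_tile = tile_km ** 2        # ~0.0764 km^2 per image
print(177 * area_per_tile)          # ~13.5 km^2 processed per second
```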
Example 2
A high-precision remote sensing image detection system based on representative point representation, the system comprising:
the acquisition module is used for acquiring a remote sensing image to be detected;
the processing module is used for processing the input remote sensing image to be detected;
the processing module comprises a backbone network based on a single feature aggregation module, an upper feature fusion network, a lower feature fusion network, a feature transformation network, a representative point loss function and a convolution module; the backbone network is used for dividing the remote sensing image into multi-scale characteristics of a plurality of scales; the above-feature fusion network is used for upwardly fusing deep features with low resolution with shallow features with high resolution to obtain primary fusion features; the feature lower fusion network gradually fuses the deep high-resolution primary fusion features with the shallow low-resolution primary fusion features downwards to obtain high-level fusion features; the feature transformation network is used for reducing the dimension of the features and outputting lightweight features; the convolution module is used for carrying out convolution operation on the image; the single feature aggregation depth network is used for carrying out feature encoding and decoding on the original image and outputting feature mapping with the channel dimension of na× (nc+1+2 x np); the representative point loss function includes a classification loss term, a location loss term, a confidence loss term, a geometric regularization term, and a feature regularization term.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present application, and such modifications and variations should also be regarded as being within the scope of the application.

Claims (8)

1. The high-precision remote sensing image detection method based on representative point representation is characterized by comprising the following steps of:
acquiring a remote sensing image to be detected;
inputting the acquired remote sensing image into a pre-trained single feature aggregation depth network, and outputting feature mapping;
the single feature aggregation depth network comprises a backbone network based on a single feature aggregation module, a feature up-fusion network, a feature down-fusion network, a feature transformation network, a representative point loss function and a convolution module;
the backbone network is used for decomposing the remote sensing image into multi-scale features at several scales; the feature up-fusion network is used for fusing the low-resolution deep features upward with the high-resolution shallow features to obtain primary fusion features; the feature down-fusion network gradually fuses the high-resolution deep primary fusion features downward with the low-resolution shallow primary fusion features to obtain advanced fusion features; the feature transformation network is used for reducing the dimension of the features and outputting lightweight features; the convolution module is used for carrying out convolution operations on the images; the single feature aggregation depth network is used for encoding and decoding features of the original image and outputting a feature map with channel dimension na×(nc+1+2×np); the representative point loss function comprises a classification loss term, a localization loss term, a confidence loss term, a geometric regularization term and a feature regularization term;
the single feature aggregation depth network is trained through a representative point loss function, and the formula of the representative point loss function is as follows:
L_RP = Σ_j β_j · ( α_1·L_cls^j + α_2·L_conf^j + α_3·L_loc^j + α_4·L_geo^j + α_5·L_feat^j ), with L_cls = CE(P_cls, T_cls), L_conf = CE(P_conf, T_conf) and L_loc = ConvexGIOU(P_loc, T_loc);
wherein L_cls denotes the classification loss, L_conf the confidence loss, L_loc the localization loss, L_geo the geometric regularization term and L_feat the feature regularization term; j denotes the corresponding scale; α denotes the weights of the different loss types and β the weights of the different scales; P denotes predicted values and T denotes ground-truth values; CE denotes the classical cross-entropy loss and ConvexGIOU the polygon generalized intersection-over-union operator; L_geo and L_feat constrain the distribution of the set of representative points representing an object from the geometric and feature perspectives, respectively;
the feature up-fusion network first uses convolutions with 1×1 kernels to process the low-resolution input features with one channel number and the high-resolution input features with another channel number to obtain two features with a unified channel dimension, then upsamples the low-resolution features to further unify the resolution, and fuses the two obtained features using a single feature aggregation module;
the deep high-resolution primary fusion features are gradually fused downward with the shallow low-resolution primary fusion features to obtain advanced fusion features.
2. The high-precision remote sensing image detection method based on representative point representation according to claim 1, wherein the single feature aggregation depth network performs feature extraction and multi-scale feature fusion based on a single feature aggregation module.
3. The method for detecting a high-precision remote sensing image based on representative point representation according to claim 1, wherein the single feature aggregation module comprises:
performing convolution processing on the input features with channel number c0 to obtain 4 groups of output features, each with channel number c;
splicing the obtained 4 groups of output features with channel number c along the channel dimension into a group of features with channel number 4c, and then performing aggregation with a convolution operation to obtain a group of features with channel number 2c.
4. The method for detecting the high-precision remote sensing image based on the representative point representation according to claim 1, wherein the convolution module comprises a first convolution and a second convolution; the first convolution is a convolution with kernel size 3×3 and stride 1 cascaded with an activation function with parameter 0.1, and the second convolution is a convolution with kernel size 1×1, realizing fast dimensionality reduction of high-dimensional input features.
5. The method for detecting the high-precision remote sensing image based on representative point representation according to claim 1, wherein the backbone network performs layer-by-layer abstraction and processing on the input remote sensing image based on a single feature aggregation module and a plurality of independent convolution and maximum pooling operations, and outputs feature mapping with different scale resolutions.
6. The method for detecting the high-precision remote sensing image based on representative point representation according to claim 1, wherein the feature down-fusion network realizes downsampling of the high-resolution features through a convolution operation with a stride of 2.
7. The method for detecting the high-precision remote sensing image based on representative point representation according to claim 1, wherein the feature transformation network uses a lightweight 1×1 convolution to reduce the channel dimension of the features and obtain dimension-reduced features, then applies multiple groups of pooling operations to the dimension-reduced features to encode diversified features, and finally aggregates the multi-branch features to output lightweight features.
8. A high-precision remote sensing image detection system based on representative point representation, the system comprising:
the acquisition module is used for acquiring a remote sensing image to be detected;
the processing module is used for processing the input remote sensing image to be detected;
the processing module comprises a backbone network based on a single feature aggregation module, an upper feature fusion network, a lower feature fusion network, a feature transformation network, a representative point loss function and a convolution module; the backbone network is used for dividing the remote sensing image into multi-scale characteristics of a plurality of scales; the above-feature fusion network is used for upwardly fusing deep features with low resolution with shallow features with high resolution to obtain primary fusion features; the feature lower fusion network gradually fuses the deep high-resolution primary fusion features with the shallow low-resolution primary fusion features downwards to obtain high-level fusion features; the feature transformation network is used for reducing the dimension of the features and outputting lightweight features; the convolution module is used for carrying out convolution operation on the image; the single feature aggregation depth network is used for carrying out feature encoding and decoding on the original image and outputting feature mapping with the channel dimension of na× (nc+1+2 x np); the representative point loss function comprises a classification loss term, a positioning loss term, a confidence loss term, a geometric regularization term and a characteristic regularization term;
the single feature aggregation depth network is trained through a representative point loss function, and the formula of the representative point loss function is as follows:
L_RP = Σ_j β_j · ( α_1·L_cls^j + α_2·L_conf^j + α_3·L_loc^j + α_4·L_geo^j + α_5·L_feat^j ), with L_cls = CE(P_cls, T_cls), L_conf = CE(P_conf, T_conf) and L_loc = ConvexGIOU(P_loc, T_loc);
wherein L_cls denotes the classification loss, L_conf the confidence loss, L_loc the localization loss, L_geo the geometric regularization term and L_feat the feature regularization term; j denotes the corresponding scale; α denotes the weights of the different loss types and β the weights of the different scales; P denotes predicted values and T denotes ground-truth values; CE denotes the classical cross-entropy loss and ConvexGIOU the polygon generalized intersection-over-union operator; L_geo and L_feat constrain the distribution of the set of representative points representing an object from the geometric and feature perspectives, respectively;
the feature up-fusion network first uses convolutions with 1×1 kernels to process the low-resolution input features with one channel number and the high-resolution input features with another channel number to obtain two features with a unified channel dimension, then upsamples the low-resolution features to further unify the resolution, and fuses the two obtained features using a single feature aggregation module; the deep high-resolution primary fusion features are gradually fused downward with the shallow low-resolution primary fusion features to obtain advanced fusion features.
CN202310241950.2A 2023-03-14 2023-03-14 High-precision remote sensing image detection method and system based on representative point representation Active CN116229272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310241950.2A CN116229272B (en) 2023-03-14 2023-03-14 High-precision remote sensing image detection method and system based on representative point representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310241950.2A CN116229272B (en) 2023-03-14 2023-03-14 High-precision remote sensing image detection method and system based on representative point representation

Publications (2)

Publication Number Publication Date
CN116229272A CN116229272A (en) 2023-06-06
CN116229272B true CN116229272B (en) 2023-10-31

Family

ID=86569399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310241950.2A Active CN116229272B (en) 2023-03-14 2023-03-14 High-precision remote sensing image detection method and system based on representative point representation

Country Status (1)

Country Link
CN (1) CN116229272B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274763B (en) * 2023-11-21 2024-04-05 珠江水利委员会珠江水利科学研究院 Remote sensing image space-spectrum fusion method, system, equipment and medium based on balance point analysis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245655A (en) * 2019-05-10 2019-09-17 天津大学 A kind of single phase object detecting method based on lightweight image pyramid network
CN111079683A (en) * 2019-12-24 2020-04-28 天津大学 Remote sensing image cloud and snow detection method based on convolutional neural network
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method
WO2020143323A1 (en) * 2019-01-08 2020-07-16 平安科技(深圳)有限公司 Remote sensing image segmentation method and device, and storage medium and server
CN114022793A (en) * 2021-10-28 2022-02-08 天津大学 Optical remote sensing image change detection method based on twin network
CN114648684A (en) * 2022-03-24 2022-06-21 南京邮电大学 Lightweight double-branch convolutional neural network for image target detection and detection method thereof
CN114821357A (en) * 2022-04-24 2022-07-29 中国人民解放军空军工程大学 Optical remote sensing target detection method based on transformer

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143323A1 (en) * 2019-01-08 2020-07-16 平安科技(深圳)有限公司 Remote sensing image segmentation method and device, and storage medium and server
CN110245655A (en) * 2019-05-10 2019-09-17 天津大学 A kind of single phase object detecting method based on lightweight image pyramid network
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method
CN111079683A (en) * 2019-12-24 2020-04-28 天津大学 Remote sensing image cloud and snow detection method based on convolutional neural network
CN114022793A (en) * 2021-10-28 2022-02-08 天津大学 Optical remote sensing image change detection method based on twin network
CN114648684A (en) * 2022-03-24 2022-06-21 南京邮电大学 Lightweight double-branch convolutional neural network for image target detection and detection method thereof
CN114821357A (en) * 2022-04-24 2022-07-29 中国人民解放军空军工程大学 Optical remote sensing target detection method based on transformer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Effective Features of Remote Sensing Image Classification Using Interactive Adaptive Thresholding Method; T. Balaji et al.; arXiv; 1-5 *
Research on Obstacle Detection Methods for Intelligent Vehicles Based on Laser Point Cloud and Vision Fusion; Zhang Yu; China Masters' Theses Full-text Database (Electronic Journal); Vol. 2023, No. 02; full text *

Also Published As

Publication number Publication date
CN116229272A (en) 2023-06-06

Similar Documents

Publication Publication Date Title
Song et al. Mstdsnet-cd: Multiscale swin transformer and deeply supervised network for change detection of the fast-growing urban regions
CN109446925A (en) A kind of electric device maintenance algorithm based on convolutional neural networks
CN109784283A (en) Based on the Remote Sensing Target extracting method under scene Recognition task
CN111353487A (en) Equipment information extraction method for transformer substation
CN116229272B (en) High-precision remote sensing image detection method and system based on representative point representation
CN116229295A (en) Remote sensing image target detection method based on fusion convolution attention mechanism
CN113569672A (en) Lightweight target detection and fault identification method, device and system
CN111046756A (en) Convolutional neural network detection method for high-resolution remote sensing image target scale features
CN112395953A (en) Road surface foreign matter detection system
CN116740344A (en) Knowledge distillation-based lightweight remote sensing image semantic segmentation method and device
Wang et al. Fault detection for power line based on convolution neural network
CN117132910A (en) Vehicle detection method and device for unmanned aerial vehicle and storage medium
Zhao et al. A target detection algorithm for remote sensing images based on a combination of feature fusion and improved anchor
Zhang et al. RoI Fusion Strategy With Self-Attention Mechanism for Object Detection in Remote Sensing Images
Cao et al. Small Object Detection Algorithm for Railway Scene
Sato et al. Semantic Segmentation of Outcrop Images using Deep Learning Networks Toward Realization of Carbon Capture and Storage
CN114897858A (en) Rapid insulator defect detection method and system based on deep learning
Han et al. Instance Segmentation of Transmission Line Images Based on an Improved D-SOLO Network
Que et al. Low altitude, slow speed and small size object detection improvement in noise conditions based on mixed training
Luo et al. SOLOv2-cable: A Power Cable Segmentation Algorithm in Complex Scenarios
Qu et al. Research on UAV Image Detection Method in Urban Low-altitude Complex Background
Zhou et al. Insulator detection for high-resolution satellite images based on deep learning
CN116524348B (en) Aviation image detection method and system based on angle period representation
CN117557775B (en) Substation power equipment detection method and system based on infrared and visible light fusion
Zhu et al. Rgb-d saliency detection based on cross-modal and multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant