CN111860398A - Remote sensing image target detection method and system and terminal equipment - Google Patents

Remote sensing image target detection method and system and terminal equipment

Info

Publication number: CN111860398A
Application number: CN202010737230.1A
Authority: CN (China)
Prior art keywords: feature map, attention, residual block, scale, context
Legal status: Granted
Other languages: Chinese (zh)
Other versions: CN111860398B
Inventors: 刘京, 田亮, 郭蔚, 杨烁今, 陈栋, 周丙寅
Current Assignee: Hebei Normal University
Original Assignee: Hebei Normal University
Application filed by Hebei Normal University; priority to CN202010737230.1A
Publication of CN111860398A; application granted; publication of CN111860398B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention belongs to the technical field of image processing and discloses a method, a system and a terminal device for detecting targets in remote sensing images. The method comprises the following steps: acquiring a remote sensing image to be detected; inputting the remote sensing image to be detected into a trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales; and performing target detection according to the plurality of output feature maps of different scales to obtain a detection result. By performing feature extraction through the parallel perceptual attention network model, the invention extracts not only the multi-scale, context and global features of the target, but also the correlation features among non-local targets and direction-sensitive target features.

Description

Remote sensing image target detection method and system and terminal equipment
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method, a system and a terminal device for detecting a remote sensing image target.
Background
Target detection is an important research topic in the field of image processing; it has high practical application value and has attracted wide attention from researchers at home and abroad. With the development of deep learning, applying deep learning to target detection in remote sensing images is becoming increasingly common.
Currently, target detection models based on deep learning fall into two main categories. The first category, represented by RCNN and Fast-RCNN, is based on region proposals: these models predict the bounding boxes and classes of targets in two coarse-to-fine steps, achieving high accuracy but low detection speed. The second category is regression-based, represented by YOLO and SSD (Single Shot Detector): these models directly predict the bounding box and class of a target without the coarse-to-fine process, achieving high detection speed but only moderate accuracy. The prior art therefore cannot provide both high target detection speed and high target detection accuracy.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, a system, and a terminal device for detecting targets in remote sensing images, so as to solve the problem that the prior art cannot achieve both high target detection speed and high target detection accuracy.
A first aspect of the embodiments of the invention provides a remote sensing image target detection method, comprising the following steps:
acquiring a remote sensing image to be detected;
inputting the remote sensing image to be detected into a trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales;
and performing target detection according to the plurality of output feature maps of different scales to obtain a detection result.
A second aspect of an embodiment of the present invention provides a remote sensing image target detection system, including:
the acquisition module is used for acquiring a remote sensing image to be detected;
the feature extraction module is used for inputting the remote sensing image to be detected into the trained parallel perception attention network model to obtain a plurality of output feature maps with different scales;
and the target detection module is used for performing target detection according to the plurality of output feature maps of different scales to obtain a detection result.
A third aspect of the embodiments of the present invention provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for detecting a target in a remote sensing image according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by one or more processors, implements the steps of the method for object detection in remote sensing images according to the first aspect.
Compared with the prior art, the embodiments of the invention have the following beneficial effects: according to the embodiments of the invention, the remote sensing image to be detected is first acquired, then input into the trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales, and finally target detection is performed according to the plurality of output feature maps of different scales to obtain the detection result. Feature extraction through the parallel perceptual attention network model captures not only the multi-scale, context and global features of the target but also the correlation features among non-local targets and direction-sensitive target features.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a method for detecting a target in a remote sensing image according to an embodiment of the present invention;
FIG. 2 is a diagram of a parallel perceptual attention network model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first multi-scale attention submodule provided in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a first context attention sub-module provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a first channel attention sub-module provided in accordance with an embodiment of the present invention;
FIG. 6 is a schematic heat map of a first scale feature map provided by an embodiment of the present invention;
FIG. 7 is a schematic heat map of a first context feature map provided by an embodiment of the present invention;
FIG. 8 is a schematic heat map of a first channel feature map provided by an embodiment of the present invention;
FIG. 9 is a schematic flow chart illustrating an implementation of a method for detecting a target in a remote sensing image according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of experimental test results provided by an embodiment of the present invention;
FIG. 11 is a schematic block diagram of a remote sensing image target detection system provided by an embodiment of the invention;
fig. 12 is a schematic block diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart of an implementation of a method for detecting targets in remote sensing images according to an embodiment of the present invention; for convenience of description, only the parts related to the embodiment are shown. The method of the embodiment may be executed by a terminal device. As shown in fig. 1, the method may include the following steps:
s101: and acquiring a remote sensing image to be detected.
In the embodiment of the invention, the remote sensing image to be detected may be acquired by any existing method.
S102: inputting the remote sensing image to be detected into the trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales.
In the embodiment of the invention, a parallel perception attention network model is firstly constructed, and then the constructed parallel perception attention network model is trained through a training set to obtain the trained parallel perception attention network model.
In one embodiment of the invention, the training of the parallel perceptual attention network model uses a class loss function and a regression loss function, wherein the regression loss function is a distance intersection over union (DIoU) loss function.
Specifically, the class loss function is:
[class loss equation given as an image in the original; not reproduced]
The distance intersection over union (DIoU) loss function is:
$L_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}$
in the distance cross-correlation loss function, bgtRespectively representing the center points of the anchor bounding box and the label bounding box, p representing the Euclidean distance for calculating the two center points, and c representing the diagonal distance of the minimum rectangle which can simultaneously cover the anchor bounding box and the label bounding box. The normalized distance between the anchor bounding box and the tag bounding box is thus modeled in DIoU. The loss function is beneficial to improving the detection accuracy of the small target while accelerating the convergence.
In the embodiment of the invention, the DIoU loss replaces the traditional regression loss, which speeds up training and enhances the detection accuracy of small targets.
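As an illustration, the following is a minimal NumPy sketch of the DIoU loss described above; the function name and the (x1, y1, x2, y2) box format are assumptions, not taken from the patent.

```python
import numpy as np

def diou_loss(box_a, box_g):
    """DIoU loss for one anchor box and one label box, each (x1, y1, x2, y2)."""
    # Intersection area of the two boxes
    ix1, iy1 = max(box_a[0], box_g[0]), max(box_a[1], box_g[1])
    ix2, iy2 = min(box_a[2], box_g[2]), min(box_a[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_a + area_g - inter + 1e-9)

    # Squared Euclidean distance rho^2 between the two box centers
    ca = np.array([(box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2])
    cg = np.array([(box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2])
    rho2 = float(np.sum((ca - cg) ** 2))

    # Squared diagonal c^2 of the smallest rectangle enclosing both boxes
    ex1, ey1 = min(box_a[0], box_g[0]), min(box_a[1], box_g[1])
    ex2, ey2 = max(box_a[2], box_g[2]), max(box_a[3], box_g[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

    return 1.0 - iou + rho2 / c2
```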
In one embodiment of the present invention, referring to fig. 2, the parallel perceptual attention network model takes a residual network as its backbone;
the parallel perceptual attention network model comprises a first residual block B1, a second residual block B2, a third residual block B3 and a fourth residual block B4, together with a first parallel perceptual attention module, a second parallel perceptual attention module, a third parallel perceptual attention module and a fourth parallel perceptual attention module; the first residual block B1, the second residual block B2, the third residual block B3 and the fourth residual block B4 all differ in size.
The first parallel perceptual attention module takes the first residual block B1 and the second residual block B2 as input and outputs a first fused feature map IB1; the second parallel perceptual attention module takes the second residual block B2 and the third residual block B3 as input and outputs a second fused feature map IB2; the third parallel perceptual attention module takes the third residual block B3 and the fourth residual block B4 as input and outputs a third fused feature map IB3; the fourth parallel perceptual attention module takes the fourth residual block B4 as input and outputs a fourth fused feature map IB4.
The fourth fused feature map IB4 passes through a deformable convolution to give the output feature map O4 of the fourth scale; the third fused feature map IB3, after deformable convolution, is added to the 2x-upsampled O4 to give the output feature map O3 of the third scale; the second fused feature map IB2, after deformable convolution, is added to the 2x-upsampled O3 to give the output feature map O2 of the second scale; and the first fused feature map IB1, after deformable convolution, is added to the 2x-upsampled O2 to give the output feature map O1 of the first scale.
The backbone of the parallel perceptual attention network model adopts the residual network ResNet-101.
A parallel perceptual attention module is introduced behind each residual block, each fused feature map is obtained through a fusion operation, position-sensitive features are then accurately extracted using deformable convolution, and four output feature maps of different scales are obtained with a multi-scale fusion strategy.
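A sketch of this top-down fusion in PyTorch-style Python is shown below; the per-level deformable convolutions are passed in as ready-made modules (for instance built around torchvision.ops.DeformConv2d plus an offset branch), and all names are illustrative assumptions.

```python
import torch.nn.functional as F

def build_output_maps(ib1, ib2, ib3, ib4, deform_conv):
    """Top-down fusion of the four fused feature maps IB1..IB4.

    Each fused map first passes through its deformable convolution; every
    finer level then adds the 2x-upsampled output of the coarser level.
    `deform_conv` maps a level name to its deformable-conv module."""
    o4 = deform_conv["l4"](ib4)
    o3 = deform_conv["l3"](ib3) + F.interpolate(o4, scale_factor=2, mode="nearest")
    o2 = deform_conv["l2"](ib2) + F.interpolate(o3, scale_factor=2, mode="nearest")
    o1 = deform_conv["l1"](ib1) + F.interpolate(o2, scale_factor=2, mode="nearest")
    return o1, o2, o3, o4
```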
Because target orientations in remote sensing images vary widely, conventional convolution easily picks up irrelevant information. To reduce the influence of irrelevant features on direction-sensitive targets, the embodiment of the invention applies a deformable convolution after obtaining the fused feature maps at each scale. This operation corrects each sampling position by predicting a pair of offsets in the x and y directions for it, thereby replacing the traditional regular sampling grid; it can sample objects of arbitrary shape and strengthens feature extraction for direction-sensitive targets.
Specifically, the deformable convolution takes a feature map of size H × W × C as input, where H is the height, W the width and C the number of channels of the feature map. A convolution produces an H × W × 2C map whose channel count is twice the original and which represents the offset of each pixel in the X and Y directions. The final feature map is obtained by adding each pixel's index in the input image to the offset obtained through convolution; the shifted positions must be kept inside the picture. Because the offsets are usually fractional, they cannot be used directly as coordinates, and forced rounding would introduce a large error; to avoid this, bilinear interpolation is normally used to obtain the final feature map.
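The bilinear sampling step can be sketched in a few lines of NumPy; the function below is illustrative (single channel, one sampling location) and clamps shifted positions inside the image, as required above.

```python
import numpy as np

def sample_bilinear(feat, y, x):
    """Bilinearly sample a single-channel feature map `feat` (H x W) at a
    fractional location (y, x), clamped inside the image."""
    h, w = feat.shape
    y = float(np.clip(y, 0.0, h - 1.0))
    x = float(np.clip(x, 0.0, w - 1.0))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot

# A predicted (dy, dx) offset shifts the regular sampling position (i, j):
# value = sample_bilinear(feat, i + dy, j + dx)
```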
In one embodiment of the invention, the first parallel perceptual attention module, the second parallel perceptual attention module and the third parallel perceptual attention module have the same structure;
referring to fig. 2, the first parallel perceptual attention module includes a first multi-scale attention submodule, a first context attention submodule, and a first channel attention submodule;
the first multi-scale attention submodule takes the first residual block B1 and the second residual block B2 as input and outputs a first scale feature map E;
the first context attention submodule takes the first residual block B1 as input and outputs a first context feature map F;
the first channel attention submodule takes the first residual block B1 as input and outputs a first channel feature map G;
the first scale feature map E, the first context feature map F and the first channel feature map G are fused to obtain the first fused feature map IB1.
In the embodiment of the invention, the first parallel perceptual attention module, the second parallel perceptual attention module and the third parallel perceptual attention module have the same structure but different inputs and outputs. The second parallel perceptual attention module comprises a second multi-scale attention submodule, a second context attention submodule and a second channel attention submodule; the third parallel perceptual attention module comprises a third multi-scale attention submodule, a third context attention submodule and a third channel attention submodule.
Specifically, in the second parallel perceptual attention module, the second residual block B2 takes the place of the first residual block B1 in the first parallel perceptual attention module, and the third residual block B3 takes the place of the second residual block B2. Similarly, in the third parallel perceptual attention module, the third residual block B3 takes the place of the first residual block B1, and the fourth residual block B4 takes the place of the second residual block B2.
In one embodiment of the invention, referring to fig. 3, in the first multi-scale attention submodule, the first residual block B1 is convolved to obtain a first intermediate-scale feature map A, and the second residual block B2 is convolved to obtain a second intermediate-scale feature map B. The second intermediate-scale feature map B is matrix-transformed and multiplied by the first intermediate-scale feature map A to obtain a third intermediate-scale feature map, which is normalized to obtain a first multi-scale attention weight map M. The first multi-scale attention weight map M is multiplied by the second intermediate-scale feature map B to obtain a fourth intermediate-scale feature map, which is upsampled and added to the first residual block B1 to obtain the first scale feature map E;
referring to fig. 4, in the first context attention submodule, the first residual block B1 is convolved to obtain a first intermediate context feature map K and a second intermediate context feature map D. The second intermediate context feature map D is matrix-transformed and multiplied by the first intermediate context feature map K to obtain a third intermediate context feature map, which is normalized to obtain a first context attention weight map P. The first context attention weight map P is multiplied by the first residual block B1 to obtain a fourth intermediate context feature map, which is matrix-transformed and added to the first residual block B1 to obtain the first context feature map F;
referring to fig. 5, in the first channel attention submodule, the first residual block B1 is matrix-transformed and multiplied by the first residual block B1 to obtain a first intermediate channel feature map, which is normalized to obtain a first channel attention weight map Q. The first channel attention weight map Q is multiplied by the first residual block B1 to obtain a second intermediate channel feature map, which is matrix-transformed and added to the first residual block B1 to obtain the first channel feature map G.
In the embodiment of the present invention, the specific working processes of the first multi-scale attention submodule, the first context attention submodule and the first channel attention submodule of the first parallel perceptual attention module are given below. Since the first, second and third parallel perceptual attention modules share the same structure and differ only in their inputs and outputs, the corresponding processes of the second and third modules are not described again here.
Specifically, in a deep convolutional neural network, feature maps of different scales contain different degrees of structural and semantic information: high-level feature maps are rich in semantic information, while low-level feature maps are rich in structural information. Both kinds of information are very important for detecting targets in remote sensing images, especially small targets. To make full use of them, the embodiment of the invention provides a multi-scale attention module to enhance the feature expression of small targets.
The embodiment of the invention provides the specific working process of the first multi-scale attention submodule. The first intermediate-scale feature map A and the second intermediate-scale feature map B are obtained from the first residual block B1 and the second residual block B2 by separate 1 × 1 convolutions, with H and W denoting the height and width of the first residual block B1 and C denoting its number of channels. The matrix transformation may be a matrix transposition, and the normalization may be a Softmax normalization. Because the second intermediate-scale feature map B lies in a deeper network layer, the first intermediate-scale feature map A is richer in structural information; the first multi-scale attention weight map M therefore carries a prior from the structural information of the first intermediate-scale feature map A onto the second intermediate-scale feature map B, so that the first scale feature map E obtained through M contains both rich structural information and deeper semantic information, which benefits the detection of small-scale targets.
In one embodiment of the present invention, the first multi-scale attention weight map M is calculated as:
$M_{ji} = \frac{\exp(A_i \cdot B_j)}{\sum_{i=1}^{N} \exp(A_i \cdot B_j)}$
where i denotes the i-th row, j denotes the j-th column, N is the height of the first residual block B1, A is the first intermediate-scale feature map and B is the second intermediate-scale feature map;
the calculation formula of the first scale feature map E is as follows:
$E_j = \alpha \sum_{i=1}^{N} (M_{ji} B_i) + (B_1)_j$
where B1 is the first residual block and α is a learnable first weight coefficient.
Optionally, j may take any positive integer value from 1 to the width of the first residual block B1.
Optionally, the height and width of the first residual block B1 are the same.
Mji is a normalized weight coefficient in the first multi-scale attention weight map M that measures the influence of the i-th position on the j-th position at each scale, and α is a learnable first weight coefficient used to balance the corrected feature map against the initial feature map. Referring to fig. 6, which shows a heat map of part of the first scale feature map E, more small-aircraft regions are activated.
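The three attention submodules share one computational pattern, sketched below in NumPy with all feature maps flattened to two dimensions; the shapes, the flattening itself, and the omission of the 1 × 1 convolutions and the upsampling are simplifying assumptions.

```python
import numpy as np

def softmax_attention(a, b, value, coeff, residual):
    """Shared attention pattern of the submodules: weights
    M_ji = softmax_i(a_i . b_j), output O_j = coeff * sum_i M_ji * value_i
    + residual_j, with rows of each array indexing positions (or channels)."""
    logits = a @ b.T                          # logits[i, j] = a_i . b_j
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)      # softmax over i, column by column
    return coeff * (w.T @ value) + residual

# First multi-scale attention submodule: A and B come from 1x1 convolutions of
# residual blocks B1 and B2; rows are spatial positions, columns are channels.
n, c = 64, 256                                # hypothetical flattened sizes
A = np.random.randn(n, c)                     # first intermediate-scale feature map
B = np.random.randn(n, c)                     # second intermediate-scale feature map
B1 = np.random.randn(n, c)                    # first residual block, flattened
E = softmax_attention(A, B, B, coeff=0.1, residual=B1)   # first scale feature map
```

In the real model the coefficient (α here) is learned, and the attended map is upsampled back to B1's resolution before the final addition.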
The embodiment of the invention also provides the specific working process of the first context attention submodule. Context information effectively distinguishes foreground from background and benefits remote sensing image target detection against complex backgrounds; the first context attention submodule embeds context information into the attention mechanism so as to fully extract the association between foreground and background and further strengthen the feature expression capability of the network. Its main structure is shown in fig. 4.
In the first context attention submodule, the first residual block B1 is passed through two 7 × 7 convolutions to obtain the first intermediate context feature map K and the second intermediate context feature map D; in the second context attention submodule, the second residual block B2 is passed through two 5 × 5 convolutions to obtain two intermediate context feature maps; in the third context attention submodule, the third residual block B3 is passed through two 3 × 3 convolutions to obtain two intermediate context feature maps; and in the fourth context attention submodule, the fourth residual block B4 is passed through two 1 × 1 convolutions to obtain two intermediate context feature maps.
The first context attention weight map encodes, at each scale, the contribution of the context information of non-locally related targets to the classification and regression of the target. The first context feature map enhances the expression of the target and of the associated information around it.
In one embodiment of the present invention, the first contextual attention weight map P is calculated as:
$P_{ji} = \frac{\exp(K_i \cdot D_j)}{\sum_{i=1}^{N} \exp(K_i \cdot D_j)}$
wherein K is a first intermediate context feature map, and D is a second intermediate context feature map;
the calculation formula of the first context feature map F is:
$F_j = \beta \sum_{i=1}^{N} (P_{ji} (B_1)_i) + (B_1)_j$
where β is a learnable second weight coefficient.
Pji is the weighted influence coefficient of the i-th position on the j-th position in the context-information weight map; β is a learnable second weight coefficient used to balance the corrected feature map against the initial feature map. Referring to fig. 7, which shows a heat map of part of the first context feature map F, more local information around the object is activated.
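Continuing the sketch above, the first context attention submodule reuses the same pattern, with K and D as the attention inputs and the first residual block itself on the value path (names remain illustrative):

```python
# K and D come from the two 7x7 convolutions of residual block B1 (see above);
# beta (coeff) weighs the corrected map against the initial one.
K = np.random.randn(n, c)
D = np.random.randn(n, c)
F_ctx = softmax_attention(K, D, B1, coeff=0.1, residual=B1)  # first context feature map
```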
The embodiment of the invention also provides the specific working process of the first channel attention submodule. Each channel of a convolutional neural network feature map carries global information about different categories and spatial positions; some of this information helps target detection and some hinders it. To strengthen the positive responses and weaken the negative ones, the embodiment of the invention provides a channel attention submodule that models the interrelation among channels and the non-local associations within the feature map. The specific process is shown in fig. 5.
In one embodiment of the present invention, the first channel attention weight map Q is calculated as:
$Q_{ji} = \frac{\exp((B_1)_i \cdot (B_1)_j)}{\sum_{i=1}^{C} \exp((B_1)_i \cdot (B_1)_j)}$
where C is the number of channels of the first residual block B1;
the calculation formula of the first channel characteristic diagram G is as follows:
$G_j = \gamma \sum_{i=1}^{C} (Q_{ji} (B_1)_i) + (B_1)_j$
where γ is a learnable third weight coefficient.
Qji measures the influence of the i-th channel on the j-th channel, and γ is a learnable third weight coefficient used to balance the corrected feature map against the initial feature map. Referring to fig. 8, which shows a heat map of part of the first channel feature map G, more global information associated with the target is activated.
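The first channel attention submodule applies the same pattern across channels rather than positions, again continuing the sketch above:

```python
# Transpose so that rows index channels: B1_ch is (c, n), and the Gram-style
# product B1_ch @ B1_ch.T compares channels; gamma (coeff) weighs the result.
B1_ch = B1.T
G = softmax_attention(B1_ch, B1_ch, B1_ch, coeff=0.1, residual=B1_ch)
G = G.T                                       # back to the (n, c) layout
```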
In one embodiment of the invention, the fourth parallel perceptual attention module comprises a fourth context attention submodule and a fourth channel attention submodule;
the fourth context attention submodule takes the fourth residual block B4 as input and outputs a fourth context feature map;
the fourth channel attention submodule takes the fourth residual block B4 as input and outputs a fourth channel feature map;
the fourth context feature map and the fourth channel feature map are fused to obtain the fourth fused feature map IB4.
Unlike the first three parallel perceptual attention modules, the fourth parallel perceptual attention module contains only a context attention submodule and a channel attention submodule; these work in the same way as the first context attention submodule and the first channel attention submodule described above, so their details are not repeated here.
Optionally, before S102, the method may further include:
preprocessing a remote sensing image to be detected to obtain a preprocessed remote sensing image to be detected;
accordingly, S102 may include:
and inputting the preprocessed remote sensing image to be detected into the trained parallel perception attention network model to obtain a plurality of output characteristic graphs with different scales.
S103: performing target detection according to the plurality of output feature maps of different scales to obtain a detection result.
In the embodiment of the invention, any existing method can be used to perform target detection on the plurality of output feature maps of different scales to obtain a detection result.
Alternatively, referring to fig. 9, after feature extraction through the trained parallel perceptual attention network model, target detection may proceed through a region proposal network, alignment and pooling operations, followed by non-maximum suppression to output the classification and localization results, thereby obtaining the detection result.
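The surrounding detection pipeline is not specified in detail here; as a hedged illustration, the final non-maximum suppression step could look like the following, using torchvision's nms operator (thresholds and names are assumptions):

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, iou_thresh=0.7, score_thresh=0.05):
    """Drop low-confidence boxes, then apply non-maximum suppression.
    boxes: (N, 4) tensor in (x1, y1, x2, y2) format; scores: (N,) tensor."""
    keep = scores > score_thresh
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)     # indices of surviving boxes
    return boxes[kept], scores[kept]
```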
In the embodiment of the present invention, the design detail parameters of the parallel perceptual attention network model are shown in table 1.
TABLE 1. Design detail parameters of the parallel perceptual attention network model
[table image not reproduced]
The target detection effect of the embodiment of the invention is verified through experiments.
The hardware and software environments used for the experiments were as follows:
CPU: Intel Core i7 6700, 3.30 GHz; GPU: P2000, 5 GB; memory: 16 GB; operating system: Ubuntu 16.04; development environment: TensorFlow; programming language: Python 3.5; IDE: PyCharm.
Experimental data set:
The experiment uses two public remote sensing image data sets, RSOD and UCAS-AOD; 80% of the car and airplane categories are randomly selected as the training set and the remaining 20% as the test set.
The network model adopts a 101-layer residual network as its backbone, with parameters initialized from weights pre-trained on ImageNet. Input pictures are uniformly resized to 800 × 800 pixels, and the model is trained for 30000 iterations with stochastic gradient descent, starting from a learning rate of 0.001 that is reduced to 0.0001 after 15000 iterations. For the anchor bounding boxes, four scales (32 × 32, 64 × 64, 128 × 128 and 256 × 256) and the aspect ratios 1:1, 2:1 and 1:2 are used; this choice reduces computation while maintaining good accuracy. The IoU threshold is set to 0.7.
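For example, the quoted anchor configuration yields twelve anchor shapes; a small sketch follows, interpreting each ratio as height/width at constant area (a common but here assumed convention):

```python
import itertools
import numpy as np

def make_anchor_shapes(scales=(32, 64, 128, 256), ratios=(1.0, 2.0, 0.5)):
    """Return (w, h) pairs for every scale/aspect-ratio combination,
    keeping the anchor area fixed at scale**2."""
    shapes = []
    for s, r in itertools.product(scales, ratios):
        w = s / np.sqrt(r)
        h = s * np.sqrt(r)
        shapes.append((w, h))
    return np.array(shapes)                   # 4 scales x 3 ratios = 12 anchors
```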
TABLE 2. Comparison of average accuracy and recall with other methods
[table image not reproduced]
The experimental results are as follows:
the evaluation indexes of the experiment adopt average accuracy and recall rate. Fig. 10 shows a comparison between the detection results of the target detection method according to the embodiment of the present invention and the detection results of the current mainstream deep learning method, where the first three columns show the detection results of a target (aircraft) in a complex background under a small scale and under an occlusion condition, and the last column shows the detection results of an automobile in each scene, where the first row is an original picture, the second row and the third row are the detection results of a regression-based target detection model YOLO and an SSD, respectively, and it can be seen from the frame that the detection accuracy is not high, and there are still many missed detection conditions in a complex scene. The fourth line and the fifth line are target detection models FPN and Faster-RCNN based on region recommendation, and the detection accuracy is higher than that of YOLO and SSD according to results.
Table 2 compares the accuracy and recall of the car and airplane detection results of the method provided in the embodiment of the present invention with other methods. Compared with the other deep learning methods, the method improves the average accuracy and recall of car and airplane detection by 7% on average, which is about 1% higher than the best existing detection method.
Table 3 compares the detection speed of the method provided by the embodiment of the present invention with other methods. As table 3 shows, using this network model as the backbone for target detection reaches about 8.8 FPS, a threefold improvement over the previous model; the detection speed is also higher than that of mainstream region-proposal-based network models.
TABLE 3. Comparison of detection speed with other methods
[table image not reproduced]
The experiment also includes an ablation study to verify the effect of each submodule on the detection results. From the ablation data in table 4, the average accuracy improves by 0.9% when the model uses only the channel attention submodule and the context attention submodule, by 2.1% with the multi-scale attention submodule and the channel attention submodule, and by 2.3% with the context attention submodule and the multi-scale attention submodule, which indicates that multi-scale and context information features help target detection more. With all submodules, the average accuracy improves by 3.7%, showing that every submodule is effective for target detection.
TABLE 4. Effect of each module on the detection results
[table image not reproduced]
The embodiment of the invention provides a parallel perceptual attention network model based on an attention mechanism to improve the accuracy and speed of remote sensing image target detection. The network model comprises parallel multi-scale, context and channel attention submodules. First, the outputs of the three parallel submodules at multiple scales are fused to obtain rich multi-scale features, context features and non-local association features. Then, deformable convolution replaces traditional convolution on the fused feature maps to better extract direction-sensitive object features. Finally, the distance intersection over union loss replaces the traditional bounding-box loss, accelerating model convergence and yielding more accurate target localization. The experimental results show that using this network model as the backbone for target detection effectively improves both detection accuracy and detection speed, and that it also detects targets well in complex scenes.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 11 is a schematic block diagram of a remote sensing image target detection system according to an embodiment of the present invention, and for convenience of description, only the parts related to the embodiment of the present invention are shown.
In the embodiment of the present invention, the remote sensing image target detection system 110 may include an acquisition module 1101, a feature extraction module 1102, and a target detection module 1103.
The acquisition module 1101 is used for acquiring a remote sensing image to be detected;
the feature extraction module 1102 is used for inputting the remote sensing image to be detected into the trained parallel perceptual attention network model to obtain a plurality of output feature maps with different scales;
and the target detection module 1103 is configured to perform target detection according to a plurality of output feature maps with different scales to obtain a detection result.
Optionally, in the feature extraction module 1102, the parallel perceptual attention network model takes a residual network as its backbone;
the parallel perception attention network model comprises a first residual block, a second residual block, a third residual block, a fourth residual block, a first parallel perception attention module, a second parallel perception attention module, a third parallel perception attention module and a fourth parallel perception attention module; the sizes of the first residual block, the second residual block, the third residual block and the fourth residual block are all different;
the first parallel perceptual attention module takes the first residual block and the second residual block as input and outputs a first fused feature map IB1; the second parallel perceptual attention module takes the second residual block and the third residual block as input and outputs a second fused feature map; the third parallel perceptual attention module takes the third residual block and the fourth residual block as input and outputs a third fused feature map; the fourth parallel perceptual attention module takes the fourth residual block as input and outputs a fourth fused feature map;
obtaining an output characteristic diagram of a fourth scale by the fourth fusion characteristic diagram through deformable convolution; after the third fusion characteristic diagram is subjected to deformable convolution, adding the third fusion characteristic diagram and the output characteristic diagram of the fourth scale subjected to 2 times of upsampling to obtain an output characteristic diagram of the third scale; after the second fusion characteristic diagram is subjected to deformable convolution, adding the second fusion characteristic diagram and the output characteristic diagram of the third scale subjected to 2 times of upsampling to obtain an output characteristic diagram of the second scale; the first fused feature map IB1 is subjected to deformable convolution and then added with the output feature map of the second scale after being subjected to 2 times of upsampling to obtain the output feature map of the first scale.
Optionally, the first parallel attention sensing module, the second parallel attention sensing module and the third parallel attention sensing module have the same structure;
the first parallel perceptual attention module includes a first multi-scale attention submodule, a first contextual attention submodule, and a first channel attention submodule;
the first multi-scale attention submodule takes the first residual block and the second residual block as input and outputs a first scale feature map;
the first context attention submodule takes the first residual block as input and outputs a first context feature map;
the first channel attention submodule takes the first residual block as input and outputs a first channel characteristic diagram;
and fusing the first scale feature map, the first context feature map and the first channel feature map to obtain the first fused feature map IB1.
Optionally, in the first multi-scale attention submodule, convolving the first residual block to obtain a first intermediate-scale feature map, convolving the second residual block to obtain a second intermediate-scale feature map, performing matrix transformation on the second intermediate-scale feature map, multiplying the second intermediate-scale feature map by the first intermediate-scale feature map to obtain a third intermediate-scale feature map, normalizing the third intermediate-scale feature map to obtain a first multi-scale attention weight map, multiplying the first multi-scale attention weight map by the second intermediate-scale feature map to obtain a fourth intermediate-scale feature map, upsampling the fourth intermediate-scale feature map, and adding the upsampled fourth intermediate-scale feature map to the first residual block to obtain the first scale feature map;
in a first context attention submodule, performing convolution on a first residual block to respectively obtain a first intermediate context feature map and a second intermediate context feature map, performing matrix transformation on the second intermediate context feature map, then performing multiplication operation on the second intermediate context feature map and the first intermediate context feature map to obtain a third intermediate context feature map, normalizing the third intermediate context feature map to obtain a first context attention weight map, performing multiplication operation on the first context attention weight map and a first residual block to obtain a fourth intermediate context feature map, performing matrix transformation on the fourth intermediate context feature map, and then performing addition operation on the fourth intermediate context feature map and the first residual block to obtain a first context feature map;
in the first channel attention submodule, a first residual block is subjected to matrix transformation and then multiplied by the first residual block to obtain a first intermediate channel characteristic diagram, the first intermediate channel characteristic diagram is normalized to obtain a first channel attention weight diagram, the first channel attention weight diagram is multiplied by the first residual block to obtain a second intermediate channel characteristic diagram, and the second intermediate channel characteristic diagram is subjected to matrix transformation and then added to the first residual block to obtain a first channel characteristic diagram.
Optionally, the calculation formula of the first multi-scale attention weight map M is:
$M_{ji} = \frac{\exp(A_i \cdot B_j)}{\sum_{i=1}^{N} \exp(A_i \cdot B_j)}$
wherein i represents the ith row, j represents the jth column, N is the height of the first residual block, A is a first intermediate-scale feature map, and B is a second intermediate-scale feature map;
the calculation formula of the first scale feature map E is as follows:
$E_j = \alpha \sum_{i=1}^{N} (M_{ji} B_i) + (B_1)_j$
where B1 is the first residual block and α is a learnable first weight coefficient;
the calculation formula of the first contextual attention weight map P is:
$P_{ji} = \frac{\exp(K_i \cdot D_j)}{\sum_{i=1}^{N} \exp(K_i \cdot D_j)}$
wherein K is a first intermediate context feature map, and D is a second intermediate context feature map;
the calculation formula of the first context feature map F is:
$F_j = \beta \sum_{i=1}^{N} (P_{ji} (B_1)_i) + (B_1)_j$
wherein β is a learnable second weight coefficient;
the first channel attention weight map Q is calculated as:
$Q_{ji} = \frac{\exp((B_1)_i \cdot (B_1)_j)}{\sum_{i=1}^{C} \exp((B_1)_i \cdot (B_1)_j)}$
wherein, C is the channel number of the first residual block;
the calculation formula of the first channel characteristic diagram G is as follows:
$G_j = \gamma \sum_{i=1}^{C} (Q_{ji} (B_1)_i) + (B_1)_j$
where γ is a learnable third weight coefficient.
Optionally, the fourth parallel perceptual attention module comprises a fourth contextual attention submodule and a fourth channel attention submodule;
the fourth context attention submodule takes the fourth residual block as input and outputs a fourth context feature map;
the fourth channel attention submodule takes the fourth residual block as input and outputs a fourth channel characteristic diagram;
and fusing the fourth context feature map and the fourth channel feature map to obtain a fourth fused feature map.
Optionally, in the training of the parallel perceptual attention network model, a class loss function and a regression loss function are used, wherein the regression loss function is a distance intersection over union (DIoU) loss function.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is merely used as an example, and in practical applications, the foregoing function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the remote sensing image target detection system is divided into different functional units or modules to perform all or part of the above-described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 12 is a schematic block diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 12, the terminal device 120 of this embodiment includes: one or more processors 1201, a memory 1202, and a computer program 1203 stored in the memory 1202 and executable on the processors 1201. The processor 1201 implements the steps in the embodiments of the remote sensing image target detection method described above, for example, steps S101 to S103 shown in fig. 1, when executing the computer program 1203. Alternatively, the processor 1201 realizes the functions of the modules/units in the embodiment of the remote sensing image target detection system, for example, the functions of the modules 1101 to 1103 shown in fig. 11, when executing the computer program 1203.
Illustratively, the computer program 1203 may be partitioned into one or more modules/units that are stored in the memory 1202 and executed by the processor 1201 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 1203 in the terminal device 120. For example, the computer program 1203 may be divided into an acquisition module, a feature extraction module, and an object detection module, and each module specifically functions as follows:
the acquisition module is used for acquiring a remote sensing image to be detected;
the feature extraction module is used for inputting the remote sensing image to be detected into the trained parallel perception attention network model to obtain a plurality of output feature maps with different scales;
and the target detection module is used for performing target detection according to the plurality of output feature maps of different scales to obtain a detection result.
Other modules or units can refer to the description of the embodiment shown in fig. 11, and are not described again here.
The terminal device 120 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device 120 includes, but is not limited to, a processor 1201 and a memory 1202. Those skilled in the art will appreciate that fig. 12 is only one example of a terminal device 120, and does not constitute a limitation to terminal device 120, and may include more or less components than those shown, or combine certain components, or different components, for example, terminal device 120 may also include an input device, an output device, a network access device, a bus, etc.
The Processor 1201 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 1202 may be an internal storage unit of the terminal device 120, such as a hard disk or a memory of the terminal device 120. The memory 1202 may also be an external storage device of the terminal device 120, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 120. Further, the memory 1202 may also include both an internal storage unit of the terminal device 120 and an external storage device. The memory 1202 is used for storing the computer program 1203 and other programs and data required by the terminal device 120. The memory 1202 may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed remote sensing image target detection system and method may be implemented in other ways. For example, the above-described embodiments of the remote sensing image object detection system are merely illustrative, and for example, the division of the modules or units is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media which may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A remote sensing image target detection method is characterized by comprising the following steps:
acquiring a remote sensing image to be detected;
inputting the remote sensing image to be detected into the trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales;
and performing target detection according to the output feature maps of different scales to obtain a detection result.
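Read as a whole, claim 1 describes a three-step inference pipeline. A minimal sketch in PyTorch, assuming illustrative names (`detect`, `backbone`, and `head` are not from the patent, and any standard detection head may stand in for the unspecified detector):

```python
import torch

def detect(image: torch.Tensor, backbone, head):
    """Three-step method of claim 1 on a single remote sensing image.

    image: the remote sensing image to be detected, e.g. a [1, 3, H, W] tensor.
    backbone: the trained parallel perceptual attention network model.
    head: any detection head consuming multi-scale feature maps (assumed).
    """
    feature_maps = backbone(image)  # output feature maps of different scales
    return head(feature_maps)       # detection result: boxes, classes, scores
```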
2. The remote sensing image target detection method of claim 1, wherein the parallel perceptual attention network model is based on a residual network;
the parallel perceptual attention network model comprises a first residual block, a second residual block, a third residual block, a fourth residual block, a first parallel perceptual attention module, a second parallel perceptual attention module, a third parallel perceptual attention module and a fourth parallel perceptual attention module; the first, second, third and fourth residual blocks all have different sizes;
the first parallel perceptual attention module takes the first residual block and the second residual block as input and outputs a first fused feature map; the second parallel perceptual attention module takes the second residual block and the third residual block as input and outputs a second fused feature map; the third parallel perceptual attention module takes the third residual block and the fourth residual block as input and outputs a third fused feature map; the fourth parallel perceptual attention module takes the fourth residual block as input and outputs a fourth fused feature map;
the fourth fused feature map is subjected to deformable convolution to obtain an output feature map of a fourth scale; the third fused feature map is subjected to deformable convolution and then added to the output feature map of the fourth scale after 2× upsampling to obtain an output feature map of a third scale; the second fused feature map is subjected to deformable convolution and then added to the output feature map of the third scale after 2× upsampling to obtain an output feature map of a second scale; and the first fused feature map is subjected to deformable convolution and then added to the output feature map of the second scale after 2× upsampling to obtain an output feature map of a first scale.
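A minimal sketch of this top-down fusion, assuming a uniform channel count across the four fused feature maps, adjacent scales differing by a factor of 2, and torchvision's `DeformConv2d` with a learned offset branch standing in for the deformable convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """3x3 deformable convolution with a learned offset branch."""
    def __init__(self, channels: int):
        super().__init__()
        # 2 offsets (dx, dy) per position of the 3x3 kernel.
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, 3, padding=1)
        self.dcn = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.dcn(x, self.offset(x))

def top_down_fusion(fused, dcn_blocks):
    """fused: [fused1, ..., fused4] from high to low resolution;
    dcn_blocks: one DeformBlock per scale."""
    out4 = dcn_blocks[3](fused[3])                       # fourth-scale output
    out3 = dcn_blocks[2](fused[2]) + F.interpolate(out4, scale_factor=2)
    out2 = dcn_blocks[1](fused[1]) + F.interpolate(out3, scale_factor=2)
    out1 = dcn_blocks[0](fused[0]) + F.interpolate(out2, scale_factor=2)
    return out1, out2, out3, out4
```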
3. The remote sensing image target detection method of claim 2, wherein the first parallel perceptual attention module, the second parallel perceptual attention module and the third parallel perceptual attention module have the same structure;
the first parallel perceptual attention module includes a first multi-scale attention submodule, a first context attention submodule, and a first channel attention submodule;
the first multi-scale attention submodule takes the first residual block and the second residual block as input and outputs a first scale feature map;
the first context attention submodule takes the first residual block as input and outputs a first context feature map;
the first channel attention submodule takes the first residual block as input and outputs a first channel feature map;
and fusing the first scale feature map, the first context feature map and the first channel feature map to obtain the first fused feature map.
4. The remote sensing image target detection method of claim 3, wherein, in the first multi-scale attention submodule, convolving the first residual block to obtain a first intermediate-scale feature map, convolving the second residual block to obtain a second intermediate-scale feature map, performing matrix transformation on the second intermediate-scale feature map and then multiplying it by the first intermediate-scale feature map to obtain a third intermediate-scale feature map, normalizing the third intermediate-scale feature map to obtain a first multi-scale attention weight map, multiplying the first multi-scale attention weight map by the second intermediate-scale feature map to obtain a fourth intermediate-scale feature map, and upsampling the fourth intermediate-scale feature map and then adding it to the first residual block to obtain the first scale feature map;
in the first context attention submodule, performing convolution on the first residual block to respectively obtain a first intermediate context feature map and a second intermediate context feature map, performing matrix transformation on the second intermediate context feature map, then performing multiplication operation on the second intermediate context feature map and the first intermediate context feature map to obtain a third intermediate context feature map, normalizing the third intermediate context feature map to obtain a first context attention weight map, performing multiplication operation on the first context attention weight map and the first residual block to obtain a fourth intermediate context feature map, performing matrix transformation on the fourth intermediate context feature map, and then performing addition operation on the fourth intermediate context feature map and the first residual block to obtain the first context feature map;
in the first channel attention submodule, performing matrix transformation on the first residual block and then multiplying the result by the first residual block to obtain a first intermediate channel feature map, normalizing the first intermediate channel feature map to obtain a first channel attention weight map, multiplying the first channel attention weight map by the first residual block to obtain a second intermediate channel feature map, and performing matrix transformation on the second intermediate channel feature map and then adding it to the first residual block to obtain the first channel feature map.
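The three submodules of claims 3 and 4 can be sketched in PyTorch as follows. Where the claims leave details open, this sketch makes assumptions: a stride-2 convolution aligns the first residual block with the second before the cross-scale multiplication, 1×1 convolutions produce the context branch's K and D, normalization is a softmax, and the three branch outputs are fused by elementwise summation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttention(nn.Module):
    """First multi-scale attention submodule (claim 4); the stride-2
    query convolution that aligns the two resolutions is an assumption."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.key = nn.Conv2d(channels, channels, 1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable first weight

    def forward(self, b1, b2):
        n, c, h, w = b2.shape
        a = self.query(b1).flatten(2)                      # A: [n, c, h*w]
        b = self.key(b2).flatten(2)                        # B: [n, c, h*w]
        m = torch.softmax(a.transpose(1, 2) @ b, dim=-1)   # weight map M
        fourth = (b @ m.transpose(1, 2)).view(n, c, h, w)  # M applied to B
        up = F.interpolate(fourth, scale_factor=2)         # 2x upsampling
        return self.alpha * up + b1                        # scale feature map E

class ContextAttention(nn.Module):
    """First context attention submodule (claim 4): position-wise
    self-attention over the first residual block."""
    def __init__(self, channels: int):
        super().__init__()
        self.k = nn.Conv2d(channels, channels, 1)
        self.d = nn.Conv2d(channels, channels, 1)
        self.beta = nn.Parameter(torch.zeros(1))   # learnable second weight

    def forward(self, x):
        n, c, h, w = x.shape
        k = self.k(x).flatten(2)                           # K: [n, c, h*w]
        d = self.d(x).flatten(2)                           # D: [n, c, h*w]
        p = torch.softmax(d.transpose(1, 2) @ k, dim=-1)   # weight map P
        out = (x.flatten(2) @ p.transpose(1, 2)).view(n, c, h, w)
        return self.beta * out + x                         # context feature map F

class ChannelAttention(nn.Module):
    """First channel attention submodule (claim 4): channel-wise
    self-attention over the first residual block."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable third weight

    def forward(self, x):
        n, c, h, w = x.shape
        flat = x.flatten(2)                                # [n, c, h*w]
        q = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)  # Q: [n, c, c]
        out = (q @ flat).view(n, c, h, w)
        return self.gamma * out + x                        # channel feature map G

class ParallelPerceptualAttention(nn.Module):
    """Claim 3: run the three submodules in parallel and fuse their
    outputs (elementwise sum is an assumed fusion)."""
    def __init__(self, channels: int):
        super().__init__()
        self.scale = MultiScaleAttention(channels)
        self.context = ContextAttention(channels)
        self.channel = ChannelAttention()

    def forward(self, b1, b2):
        return self.scale(b1, b2) + self.context(b1) + self.channel(b1)
```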
5. The remote sensing image target detection method of claim 4, wherein the calculation formula of the first multi-scale attention weight map M is as follows:
$$M_{ij}=\frac{\exp(A_i\cdot B_j)}{\sum_{i=1}^{N}\exp(A_i\cdot B_j)}$$
wherein i represents the ith row, j represents the jth column, N is the height of the first residual block, A is the first intermediate-scale feature map, and B is the second intermediate-scale feature map;
the calculation formula of the first scale feature map E is as follows:
$$E=\alpha\cdot\mathrm{Up}_{2}(BM)+B_1$$
wherein B1 is the first residual block, α is a learnable first weight coefficient, and Up2(·) denotes 2× upsampling;
the calculation formula of the first context attention weight map P is as follows:
$$P_{ij}=\frac{\exp(K_i\cdot D_j)}{\sum_{i=1}^{N}\exp(K_i\cdot D_j)}$$
wherein K is the first intermediate context feature map, and D is the second intermediate context feature map;
the calculation formula of the first context feature map F is as follows:
$$F=\beta\cdot(B_1P)+B_1$$
wherein β is a learnable second weight coefficient;
the calculation formula of the first channel attention weight map Q is as follows:
$$Q_{ij}=\frac{\exp(B_{1i}\cdot B_{1j})}{\sum_{i=1}^{C}\exp(B_{1i}\cdot B_{1j})}$$
wherein C is the number of channels of the first residual block;
the calculation formula of the first channel characteristic diagram G is as follows:
$$G=\gamma\cdot(QB_1)+B_1$$
wherein γ is a learnable third weight coefficient.
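As a quick consistency check of these formulas against the sketch given after claim 4 (the sizes are arbitrary; the only constraint assumed is that the second residual block has half the resolution of the first):

```python
b1 = torch.randn(2, 64, 32, 32)   # first residual block, C = 64
b2 = torch.randn(2, 64, 16, 16)   # second residual block, half resolution
ppa = ParallelPerceptualAttention(64)
fused1 = ppa(b1, b2)              # E + F + G
assert fused1.shape == b1.shape   # the fused feature map keeps B1's shape
```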
6. The remote sensing image target detection method of claim 2, wherein the fourth parallel perceptual attention module comprises a fourth context attention submodule and a fourth channel attention submodule;
the fourth context attention submodule takes the fourth residual block as input and outputs a fourth context feature map;
the fourth channel attention submodule takes the fourth residual block as input and outputs a fourth channel feature map;
and fusing the fourth context feature map and the fourth channel feature map to obtain the fourth fused feature map.
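Continuing the sketch above, the fourth module omits the multi-scale branch, since there is no lower-resolution residual block to pair with (elementwise-sum fusion again assumed):

```python
class ParallelPerceptualAttention4(nn.Module):
    """Claim 6: context and channel attention only, on the fourth residual block."""
    def __init__(self, channels: int):
        super().__init__()
        self.context = ContextAttention(channels)
        self.channel = ChannelAttention()

    def forward(self, b4):
        return self.context(b4) + self.channel(b4)
```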
7. The remote sensing image target detection method of any one of claims 1-6, wherein a classification loss function and a regression loss function are used in the training process of the parallel perceptual attention network model, and the regression loss function is a distance cross-correlation loss function.
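The claim does not spell the regression loss out. If the "distance cross-correlation loss" is read as a distance-IoU-style penalty on box centres (an assumption, not a statement of the patent's definition), a minimal sketch is:

```python
import torch

def distance_iou_loss(pred, target, eps: float = 1e-7):
    """Assumed DIoU-style regression loss; boxes are [N, 4] as (x1, y1, x2, y2)."""
    # IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared distance between box centres.
    rho2 = (((pred[:, :2] + pred[:, 2:]) - (target[:, :2] + target[:, 2:])) ** 2).sum(1) / 4
    # Squared diagonal of the smallest enclosing box.
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(1) + eps
    return (1 - iou + rho2 / c2).mean()
```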
8. A remote sensing image target detection system, comprising:
the acquisition module is used for acquiring a remote sensing image to be detected;
the feature extraction module is used for inputting the remote sensing image to be detected into the trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales;
and the target detection module is used for performing target detection according to the output feature maps of different scales to obtain a detection result.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the remote sensing image target detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by one or more processors, implements the steps of the remote sensing image target detection method according to any one of claims 1 to 7.
CN202010737230.1A 2020-07-28 2020-07-28 Remote sensing image target detection method and system and terminal equipment Active CN111860398B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010737230.1A CN111860398B (en) 2020-07-28 2020-07-28 Remote sensing image target detection method and system and terminal equipment

Publications (2)

Publication Number Publication Date
CN111860398A true CN111860398A (en) 2020-10-30
CN111860398B CN111860398B (en) 2022-05-10

Family

ID=72948358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010737230.1A Active CN111860398B (en) 2020-07-28 2020-07-28 Remote sensing image target detection method and system and terminal equipment

Country Status (1)

Country Link
CN (1) CN111860398B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN110414377A (en) * 2019-07-09 2019-11-05 武汉科技大学 A kind of remote sensing images scene classification method based on scale attention network
CN110378297A (en) * 2019-07-23 2019-10-25 河北师范大学 A kind of Remote Sensing Target detection method based on deep learning
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111079739A (en) * 2019-11-28 2020-04-28 长沙理工大学 Multi-scale attention feature detection method
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method
CN111160249A (en) * 2019-12-30 2020-05-15 西北工业大学深圳研究院 Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111274869A (en) * 2020-01-07 2020-06-12 中国地质大学(武汉) Method for classifying hyperspectral images based on parallel attention mechanism residual error network
CN111259758A (en) * 2020-01-13 2020-06-09 中国矿业大学 Two-stage remote sensing image target detection method for dense area

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270278A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Key point-based blue top house detection method
CN112487900A (en) * 2020-11-20 2021-03-12 中国人民解放军战略支援部队航天工程大学 SAR image ship target detection method based on feature fusion
CN112487900B (en) * 2020-11-20 2022-11-15 中国人民解放军战略支援部队航天工程大学 SAR image ship target detection method based on feature fusion
CN112949453A (en) * 2021-02-26 2021-06-11 南京恩博科技有限公司 Training method of smoke and fire detection model, smoke and fire detection method and smoke and fire detection equipment
CN112949453B (en) * 2021-02-26 2023-12-26 南京恩博科技有限公司 Training method of smoke and fire detection model, smoke and fire detection method and equipment
CN113129345A (en) * 2021-04-19 2021-07-16 重庆邮电大学 Target tracking method based on multi-feature map fusion and multi-scale expansion convolution
CN113159013A (en) * 2021-04-28 2021-07-23 平安科技(深圳)有限公司 Paragraph identification method and device based on machine learning, computer equipment and medium
CN113239825B (en) * 2021-05-19 2022-08-19 四川中烟工业有限责任公司 High-precision tobacco beetle detection method in complex scene
CN113239825A (en) * 2021-05-19 2021-08-10 四川中烟工业有限责任公司 High-precision tobacco beetle detection method in complex scene
CN113628245A (en) * 2021-07-12 2021-11-09 中国科学院自动化研究所 Multi-target tracking method, device, electronic equipment and storage medium
CN113628245B (en) * 2021-07-12 2023-10-31 中国科学院自动化研究所 Multi-target tracking method, device, electronic equipment and storage medium
CN114529808B (en) * 2022-04-21 2022-07-19 南京北控工程检测咨询有限公司 Pipeline detection panoramic shooting processing system and method
CN114529808A (en) * 2022-04-21 2022-05-24 南京北控工程检测咨询有限公司 Pipeline detection panoramic shooting processing method

Also Published As

Publication number Publication date
CN111860398B (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
US11734851B2 (en) Face key point detection method and apparatus, storage medium, and electronic device
CN109522942B (en) Image classification method and device, terminal equipment and storage medium
CN109086811B (en) Multi-label image classification method and device and electronic equipment
WO2020119527A1 (en) Human action recognition method and apparatus, and terminal device and storage medium
CN111402130B (en) Data processing method and data processing device
CN108399386A (en) Information extracting method in pie chart and device
CN111476719B (en) Image processing method, device, computer equipment and storage medium
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN113469088B (en) SAR image ship target detection method and system under passive interference scene
Wang et al. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection
CN115953665B (en) Target detection method, device, equipment and storage medium
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN109815931A (en) A kind of method, apparatus, equipment and the storage medium of video object identification
CN112634316A (en) Target tracking method, device, equipment and storage medium
CN115272691A (en) Training method, recognition method and equipment for steel bar binding state detection model
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN110633630B (en) Behavior identification method and device and terminal equipment
CN116309643A (en) Face shielding score determining method, electronic equipment and medium
CN111104965A (en) Vehicle target identification method and device
CN115601820A (en) Face fake image detection method, device, terminal and storage medium
CN115577768A (en) Semi-supervised model training method and device
CN114820755A (en) Depth map estimation method and system
CN114758145A (en) Image desensitization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant