CN112183414A - Weak supervision remote sensing target detection method based on mixed hole convolution - Google Patents

Weak supervision remote sensing target detection method based on mixed hole convolution

Info

Publication number
CN112183414A
Authority
CN
China
Prior art keywords
feature
detection
representing
features
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011068687.4A
Other languages
Chinese (zh)
Inventor
陈苏婷
邵东威
张闯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202011068687.4A priority Critical patent/CN112183414A/en
Publication of CN112183414A publication Critical patent/CN112183414A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a weak supervision remote sensing target detection method based on mixed hole convolution (hybrid dilated convolution). The method adopts several custom designs, including mixed hole convolution, a channel attention mechanism and multilayer pooling, to enhance multi-scale feature extraction and fusion and to improve robustness to objects of different sizes. In addition, asynchronous iterative alternating training between a strongly supervised detector and a weakly supervised detector is employed, so that training and detection require only image-level ground-truth labels and the two detectors cooperatively improve detection performance.

Description

Weak supervision remote sensing target detection method based on mixed hole convolution
Technical Field
The invention relates to the field of pattern recognition, in particular to a weak supervision remote sensing target detection method based on mixed hole convolution.
Background
With the development and combination of aviation technology and computer vision, high-altitude, high-resolution optical remote sensing images have become easier to acquire and are applied in many fields. As a fundamental feature extraction problem in remote sensing image analysis, target detection has a considerable research history in academia. Specifically, remote sensing image target detection comprises locating ground objects and classifying their categories. In recent years, research in this field has advanced rapidly, and many algorithms can simultaneously achieve high-precision positioning and recognition of ground objects. Most methods decompose the task into two stages, feature extraction and target identification, and according to the type of feature extracted, target detection methods for remote sensing images can be divided into methods based on traditional handcrafted features and methods based on deep learning.
Traditional target detection methods for remote sensing images can be roughly divided into three steps: first, a sliding window selects the regions to be detected; then features are extracted from each selected region; finally, a classifier such as a support vector machine judges the object category contained in each region. However, traditional methods face two major problems. On one hand, the sliding window scans the whole image without any pertinence, so the time complexity is high and a large number of redundant windows require feature extraction. On the other hand, because the information contained in remote sensing images is very complex, object types and sizes are diverse, and the edges between objects and backgrounds such as cities or forests are not obvious, traditional handcrafted feature extraction algorithms based on image processing and machine learning cannot capture the semantic information of objects, so the robustness of remote sensing target detection is poor.
Remote sensing target detection methods based on deep learning are essentially end-to-end models: a complete framework that simultaneously covers the recognition of objects in an image and the regression of object detection boxes. A region extractor first generates a number of regions that may contain objects of interest. A feature extractor then extracts features from these regions of interest. Finally, according to the extracted features, a classifier predicts the category of the object in each region of interest, and a position estimator predicts the object position more accurately. Deep learning based methods consider features at the global scale and use fully connected features to refine the position and size of candidate boxes, so they have a certain robustness to object size in an image, but they lack natural robustness to object scale changes. Existing work therefore typically fuses feature maps of multiple scales to address this problem, for example the Feature Pyramid Network (FPN). Such methods compensate for the loss of low-level visual features during the extraction of high-level semantic features and benefit the feature learning of the network. However, they generally predict on the multi-scale feature maps separately, which makes the network very complex and difficult to train.
In addition, another important problem in remote sensing target detection is the lack of annotated data sets. Advances in remote sensing technology have produced a large amount of high-resolution data in which each image contains many target objects to be detected, and manually annotating the detection boxes of these objects one by one requires a great deal of manpower and material resources.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems and defects in the prior art, the invention provides a weak supervision remote sensing target detection method based on mixed hole convolution. The invention designs a novel backbone network that greatly reduces information loss during feature extraction, and introduces a channel attention module and a multilayer pooling module to further strengthen and fuse the extracted features. Meanwhile, a weakly supervised learning mode is adopted, so that the target detection task can be trained without box-level supervision information while detection precision is cooperatively improved.
The technical scheme is as follows: to achieve the above purpose, the invention adopts the following technical scheme: a weak supervision remote sensing target detection method based on mixed hole convolution, comprising the following steps:
(1) acquiring a remote sensing image data set to be detected, and dividing the data set into a training set, a verification set and a test set according to a proportion;
(2) constructing a lossless residual network by using mixed hole convolution, and extracting multi-scale features, namely low-level visual features and high-level semantic features, from the target objects in the remote sensing image by using the lossless residual network, so that the receptive field covers the whole area, loss of edge information is avoided, and the robustness of the whole network to multi-scale targets in the remote sensing image is directly improved;
(3) sending the features extracted in the step (2) into a channel attention module, strengthening key feature information effective to a target detection task, and inhibiting invalid feature information;
(4) sending the features enhanced in the step (3) into a cascade multilayer pooling module for feature fusion to realize further fusion of low-level visual features and high-level semantic features, wherein the fused features are used as final output of a feature extraction network;
(5) sending the final features obtained in the step (4) into a cooperative detection module, wherein the module has two branches, a multi-instance learning branch and a detection frame regression branch: a weakly supervised detection network WSDDN serves as the multi-instance learning branch to generate pseudo label information, a strongly supervised detection network Fast R-CNN serves as the detection frame regression branch to realize more accurate target positioning, and the class probability and the detection frame of each target in the image are used as the detection result of the module;
(6) calculating the consistency error of the two branches according to the detection results of the step (5), updating the weight parameters of the two branches simultaneously through a gradient descent algorithm for collaborative training, testing the detection precision on the verification set, and continuously adjusting the network model until the precision meets expectations;
(7) and taking the trained network model as a detector, inputting the characteristics of the test set into the detector for detection, and obtaining a detection result, namely the probability and the detection frame of the target object in the remote sensing image.
Further, in the step (2), a lossless residual network is constructed by using mixed hole convolution and lossless multi-scale feature extraction is performed on the targets in the remote sensing image, wherein the method comprises the following steps:
(2.1) based on ResNet-101, inserting two 3x3 hole convolutions with expansion rates of 2 and 5 respectively after the standard 3x3 convolution in the original residual block to form a continuous hole convolution combination with expansion rates of 1, 2 and 5, thereby constructing a new residual block, namely the lossless residual block. In addition, dense connections are added in the lossless residual block, namely the output of each hole convolution layer is concatenated with the input features and then fed into the next hole convolution layer, so that the bottom-layer features beneficial to target positioning are shared and reused. ResNet-101 here refers to a residual network with a depth of 101 layers.
(2.2) the first three stages of ResNet-101 are retained, and then 23 and 3 lossless residual blocks are stacked in the 4th and 5th stages respectively to replace the 4th and 5th stages of the original network. This stacked structure improves the information utilization rate while keeping the size of the receptive field unchanged, effectively enhances the correlation between long-range information, and relieves the gridding effect;
(2.3) stages 4 and 5 keep the same number of input channels as stage 3, i.e. 256 convolution kernels, and remove the downsampling operation so that the resolution of the output feature map remains at 1/8 of the original image.
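As an illustration of steps (2.1) to (2.3), the following PyTorch sketch shows one possible form of the lossless residual block: three 3x3 convolutions with expansion (dilation) rates 1, 2 and 5, dense connections in which each layer receives the concatenation of the block input and all previous outputs, and a 1x1 projection back to 256 channels before the residual addition. The channel width, normalization layers and the final projection are illustrative assumptions rather than details fixed by the text.

```python
import torch
import torch.nn as nn


class LosslessResidualBlock(nn.Module):
    """Sketch of a 'lossless' residual block: a standard 3x3 convolution followed by
    3x3 dilated (hole) convolutions with rates 2 and 5, with dense connections, i.e.
    each layer receives the concatenation of the block input and all previous outputs."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for rate in (1, 2, 5):  # continuous dilation-rate combination 1, 2, 5
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, padding=rate, dilation=rate, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ))
            in_ch += channels  # dense connection widens the next layer's input
        # assumed 1x1 projection so the block output matches the input width
        self.project = nn.Conv2d(in_ch, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.project(torch.cat(feats, dim=1))  # residual addition


if __name__ == "__main__":
    block = LosslessResidualBlock(256)
    y = block(torch.randn(1, 256, 64, 64))
    print(y.shape)  # torch.Size([1, 256, 64, 64]); no down-sampling, resolution preserved
```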
Further, in step (3), the features extracted in step (2) are used as the input of the channel attention module to enhance the feature expression most relevant to the target location, and the specific process is as follows:
(3.1) for the features F_stage5 ∈ R^(H×W×C) extracted in the 5th stage in step (2.3), the module performs one convolution operation with C+1 convolution kernels to obtain C+1 feature maps f ∈ R^(H×W×(C+1)); H, W and C here represent the height, width and number of channels of the feature map, respectively;
(3.2) decomposing the features obtained in the step (3.1) along the channel dimension to respectively obtain C feature maps f_1 ∈ R^(H×W×C) and 1 single-channel feature map f_2 ∈ R^(H×W×1), and performing a Sigmoid activation operation on f_2 to obtain 1 channel attention matrix M ∈ R^(H×W×1), which automatically reflects the importance, namely the weight value, of each feature channel;
(3.3) multiplying the channel attention matrix M element by element with the feature map f_1, specifically multiplying each pixel point by the corresponding weight in the attention matrix, which further strengthens the features important to the target detection task and suppresses the unimportant features, finally obtaining the output features F_attention ∈ R^(H×W×C). The mathematical expression of the whole module is

F_attention = σ(f_2) ⊗ f_1

where ⊗ represents element-by-element multiplication and σ(*) represents the Sigmoid activation function.
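A minimal sketch of the attention module in step (3), assuming the C+1 feature maps are produced by a single 1x1 convolution (the kernel size is not specified above): the result is split along the channel dimension into f_1 and f_2, a Sigmoid turns f_2 into the attention matrix M, and M re-weights f_1 element by element.

```python
import torch
import torch.nn as nn


class AttentionModule(nn.Module):
    """Sketch of step (3): one convolution produces C+1 maps, split into f1 (C maps)
    and f2 (1 map); sigmoid(f2) forms the attention matrix M, which re-weights f1."""

    def __init__(self, channels: int, kernel_size: int = 1):  # kernel size is an assumption
        super().__init__()
        self.conv = nn.Conv2d(channels, channels + 1, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.conv(x)                   # C+1 feature maps
        f1, f2 = f[:, :-1], f[:, -1:]      # split along the channel dimension
        m = torch.sigmoid(f2)              # attention matrix M, one weight per pixel
        return m * f1                      # F_attention = sigma(f2) multiplied with f1


if __name__ == "__main__":
    att = AttentionModule(256)
    out = att(torch.randn(1, 256, 64, 64))
    print(out.shape)  # torch.Size([1, 256, 64, 64])
```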
Further, in the step (4), the output features of the step (3) are sent to a cascade multilayer pooling module to realize feature fusion of different layers, and the method is as follows:
(4.1) the module applies pooling layers with 6 different kernel sizes (1x1, 2x2, 4x4, 8x8, 10x2, 2x20) to the feature F_attention obtained in step (3.3), performing a multilevel pooling operation to obtain feature maps at 6 different spatial scales P_i = {P_1, P_2, P_3, P_4, P_5, P_6}, wherein the 5th and 6th kernels are average pooling layers in the vertical and horizontal directions respectively; this design can capture the long strip-shaped target features, such as bridges and ships, that are difficult to detect in remote sensing images. The expression of the step is as follows:

P_i = p_pool(F_attention), i ∈ {1, 2, ..., 6}

where P_i indicates the pooled features and p_pool(*) denotes the pooling operation with the i-th kernel, which is either an average pooling p_avg(*) or a maximum pooling p_max(*) operation.
(4.2) compressing the channel number of the feature maps extracted in the step (4.1) to 1/8 of the input feature F_attention by using a 1x1 convolution, for limiting the weight of the global features in the subsequent feature fusion stage, to obtain the intermediate features C_i = {C_1, C_2, C_3, C_4, C_5, C_6}. The expression of the step is as follows: C_i = f_conv(P_i), i ∈ {1, 2, ..., 6}, where C_i represents an intermediate feature, f_conv(*) denotes the convolution operation and i denotes the layer number of the module.
(4.3) splicing the intermediate features C_1 to C_6 obtained in the step (4.2) and the original input feature F_attention for the first time on the channel dimension to obtain the fused feature F_concat, integrating the coarse-grained and fine-grained feature information to make up for the loss of spatial information caused by deepening the network. The expression of the step is as follows:

F_concat = C_1 ⊕ C_2 ⊕ ... ⊕ C_6 ⊕ F_attention

where F_concat and F_attention respectively represent the fused feature and the attention-enhanced feature, and ⊕ represents the concatenation operation of the feature maps in the channel dimension.
(4.4) down-sampling the feature F_stage2 extracted in the 2nd stage in the step (2.3) to the size of the fused feature F_concat from step (4.3), splicing it with F_concat for the second time on the channel dimension, and performing three convolution operations on the spliced features to further promote the fusion of low-level high-resolution detail features and high-level semantic features, obtaining the final output feature F_out of the feature extraction module. The feature extraction module is the collective term for the lossless residual network, the channel attention module and the cascade multilayer pooling module. The expression of the step is as follows:

F_out = f_conv(f_conv(f_conv(f_down(F_stage2) ⊕ F_concat)))

where F_stage2 and F_out respectively represent the features extracted in the 2nd stage of the step (2) and the final output features of the step (4), and f_down(*) denotes the down-sampling operation.
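The cascade multilayer pooling module of step (4) might be sketched as below. Several details are assumptions made only for illustration: the six kernel sizes are treated as adaptive output sizes, the pooled maps are up-sampled back to the input resolution so that they can be concatenated, and maximum pooling is used for the four square kernels while average pooling is used for the two strip kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CascadeMultiPooling(nn.Module):
    """Sketch of steps (4.1)-(4.3): multi-level pooling at 6 scales (including
    vertical and horizontal strip pooling), 1x1 compression to C/8 channels, and
    channel-wise concatenation with the input feature F_attention."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.bins = [(1, 1), (2, 2), (4, 4), (8, 8), (10, 2), (2, 20)]
        squeezed = channels // 8
        self.compress = nn.ModuleList(
            nn.Conv2d(channels, squeezed, kernel_size=1) for _ in self.bins)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        branches = []
        for i, (bin_hw, conv) in enumerate(zip(self.bins, self.compress)):
            # assumption: max pooling for the square bins, average pooling for the strips
            if i < 4:
                p = F.adaptive_max_pool2d(x, bin_hw)
            else:
                p = F.adaptive_avg_pool2d(x, bin_hw)
            c = conv(p)                                  # C_i = f_conv(P_i)
            branches.append(F.interpolate(c, size=(h, w), mode="bilinear",
                                          align_corners=False))
        return torch.cat(branches + [x], dim=1)          # F_concat


if __name__ == "__main__":
    module = CascadeMultiPooling(256)
    fused = module(torch.randn(1, 256, 64, 64))
    print(fused.shape)  # 6 * 32 + 256 = 448 channels
```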
Further, constructing a two-stage collaborative detection module with a multi-instance learning branch and a detection frame regression branch to train and detect the remote sensing images in the training set; the specific process is as follows:
(5.1) for each training or test image, using a selective search algorithm (SSW) to generate 2000 candidate frames of the target to be detected, mapping each candidate frame onto the final feature F_out output in the step (4), and normalizing the feature map corresponding to each candidate frame with a spatial pyramid pooling (SPP) layer to obtain pooled features of fixed size;
(5.2) passing the pooled features obtained in the step (5.1) through two fully connected layers to convert them into the feature vectors of all candidate frames, and sending the feature vectors into two different branches: one branch outputs, according to the content of the candidate frame, the probability that the target object belongs to each category; the other branch outputs, according to the position of the candidate frame, the probability that the candidate frame contains each kind of target object. Each branch consists of a fully connected layer and a Softmax layer, and the output matrices of the two branches are multiplied element by element to obtain the category label of each candidate frame, whose calculation formula is:

P_jc = p_jc^cls · p_jc^loc

where P_jc represents the category label of each candidate frame, p_jc^cls represents the class probability that candidate frame j belongs to category c, and p_jc^loc represents the position probability that candidate frame j belongs to category c;
(5.3) adding the category labels of all the candidate frames to obtain the prediction probability of each category of target object, which is used as the image-level prediction label of the whole remote sensing image; the calculation formula of the prediction probability of each category of target object is:

ŷ_c = Σ_{j=1}^{J_W} P_jc

where ŷ_c represents the target category prediction result of the whole image and J_W represents the number of candidate frames. Then, the cross entropy loss between the predicted label and the real label is calculated to iteratively update the training process of the WSDDN, with the calculation formula:

L_WSDDN = -Σ_{c=1}^{C} log( y_c (Σ_{j=1}^{J_W} P_jc - 1/2) + 1/2 )

where y_c represents the real label of the target category; since a two-class cross entropy loss is used, y_c ∈ {-1, 1}; J_W represents the number of candidate frames and C represents the number of categories.
(5.4) when the loss in the step (5.3) exceeds a threshold (for example, the threshold is set to 0.5), the weakly supervised prediction results with high confidence in the WSDDN, namely the pseudo labels, are extracted and provided as real labels for computing the error of the strongly supervised prediction results obtained by Fast R-CNN, thereby realizing more accurate detection frame regression. Specifically, the final feature F_out obtained in the step (4.4) is likewise passed through a spatial pyramid pooling layer and two fully connected layers and then sent into two different branches (namely a classification branch and a regression branch) to obtain the predicted class probability p_ic and coordinate parameters t_ic, respectively, as the strongly supervised prediction results.
(5.5) normalizing the cooperative training process of the two detection networks under strong and weak supervision by using a joint loss function to obtain the final prediction result, wherein the specific process is as follows:
1) obtaining the prediction labels {(p_jc, t_jc)} and {(p_ic, t_ic)} of WSDDN and Fast R-CNN on the same remote sensing image;
2) calculating the class loss L_cls of the WSDDN for each candidate frame;
3) calculating the class loss L_cls_inter and the frame regression loss L_DIoU between WSDDN and Fast R-CNN for each candidate region, wherein the frame regression loss adopts the distance intersection-over-union (DIoU) loss;
4) weighting and summing the three parts of the loss to obtain the joint loss function L_SSD of the cooperative detection network, where J_W and J_S respectively represent the number of candidate frames extracted under weak and strong supervision; p_jc and p_ic respectively represent the weakly and strongly supervised prediction classes, and t_jc and t_ic respectively represent the weakly and strongly supervised predicted position coordinates; L_cls_inter represents the consistency of class prediction between the two detection networks under strong and weak supervision, and L_cls represents the class prediction loss inside the strongly supervised network; I_ij takes the value 1 when the IoU overlap of the target object detection frames extracted by the two networks is greater than 0.5, and 0 otherwise; β is a hyper-parameter between 0 and 1 used to balance the consistency of the predictions of the strongly and weakly supervised networks, and a larger β indicates that the strongly supervised network trusts more the target object positions predicted by the weakly supervised network. The last term in the loss function constrains the consistency of the detection frame positions between the two networks and prevents the detection frames predicted under strong and weak supervision from differing too much. In addition, the frame regression operation in the collaborative loss function adopts DIoU, and the calculation steps are as follows:
DIoU = IoU - ρ²(b, b_gt) / c²

where IoU is the area of the overlap of the two regions divided by the area of their union, c represents the diagonal length of the minimum enclosing region that covers both the anchor frame and the real detection frame, and ρ(b, b_gt) represents the distance between the center points of the anchor frame and the real detection frame. The calculation formula of the frame regression loss function based on the DIoU is as follows:

L_DIoU = 1 - IoU + ρ²(b, b_gt) / c²
in summary, the overall loss function of the cooperative detection module is as follows:
Ltotal=LWSDDN+LSSD
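To make the multi-instance learning branch of steps (5.2) and (5.3) concrete, the sketch below scores a set of candidate-frame feature vectors in the WSDDN style: a classification branch applies Softmax over categories, a detection branch applies Softmax over candidate frames, their element-wise product gives P_jc, summing over frames gives the image-level prediction, and the image-level loss above is evaluated against labels y_c in {-1, 1}. The feature dimension, the number of classes and the clamping added for numerical stability are illustrative assumptions.

```python
import torch
import torch.nn as nn


class WSDDNHead(nn.Module):
    """Sketch of the multi-instance learning branch (WSDDN) of step (5)."""

    def __init__(self, feat_dim: int = 4096, num_classes: int = 13):
        super().__init__()
        self.fc_cls = nn.Linear(feat_dim, num_classes)   # classification branch
        self.fc_det = nn.Linear(feat_dim, num_classes)   # detection (position) branch

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (J_W, feat_dim) pooled feature vectors of the candidate frames
        p_cls = torch.softmax(self.fc_cls(roi_feats), dim=1)  # softmax over categories
        p_det = torch.softmax(self.fc_det(roi_feats), dim=0)  # softmax over candidate frames
        return p_cls * p_det                                  # P_jc, shape (J_W, C)


def wsddn_loss(p_jc: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Image-level loss with labels y_c in {-1, +1}:
    L = -sum_c log( y_c * (sum_j P_jc - 1/2) + 1/2 )."""
    y_hat = p_jc.sum(dim=0).clamp(0.0, 1.0)            # image-level prediction per category
    inner = (y * (y_hat - 0.5) + 0.5).clamp_min(1e-6)  # clamp is an assumption for stability
    return -(inner.log()).sum()


if __name__ == "__main__":
    head = WSDDNHead(feat_dim=4096, num_classes=13)
    scores = head(torch.randn(2000, 4096))              # 2000 candidate frames per image
    labels = torch.tensor([1.0] * 3 + [-1.0] * 10)      # image-level labels for 13 classes
    print(scores.shape, wsddn_loss(scores, labels).item())
```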
has the advantages that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
aiming at the problem of insufficient marking quantity of remote sensing data, the invention designs an end-to-end remote sensing target detection network combining a weak supervision detector and a strong supervision detector, constructs a combined loss function, and performs collaborative training, parameter sharing and synchronous promotion on the two, thereby remarkably improving the performance of training only by using an image-level label;
aiming at the characteristic of huge target scale difference in the remote sensing image, the invention designs a novel backbone network by utilizing mixed hole convolution, thereby greatly reducing information loss in the characteristic extraction process and realizing the full coverage of receptive field; an attention module and a cascade multi-layer pooling module are connected to the rear end of the system, so that the sensitivity of the network to scale change is effectively inhibited, and the capability of feature learning is further improved.
Aiming at the defect of the detection branch of Fast R-CNN in the frame regression stage, the invention defines a multitask loss function based on DIoU, and can improve the accuracy and convergence speed of frame regression.
Drawings
FIG. 1 is a training flow diagram of the present invention;
FIG. 2 is a block diagram of a network used in the present invention;
FIG. 3 is a schematic diagram of the test results obtained from the training of the present invention;
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The invention relates to a cooperative learning-based weak supervision remote sensing image multi-target detection method; the algorithm framework is shown in FIG. 1, and the method comprises the following steps:
(1) acquiring a remote sensing image data set to be detected, and dividing the data set into a training set, a verification set and a test set according to a proportion;
the remote sensing image data used in this embodiment are TGRS-HRRSD and DIOR data sets. Wherein, the TGRS-HRRSD comprises 21761 high-altitude images from google earth and hundredth maps in total, comprising 55740 target object instances of 13 classes; the DIOR comprises 20 types of 23463 specially-picked high-altitude remote sensing images, and the data set comprises 192472 target instances.
In this embodiment, the PyTorch framework is adopted and the programming experiments are carried out in the Python language; PyTorch can be regarded as a powerful deep neural network framework with automatic differentiation. The data set is divided into a training set, a verification set and a test set, which are respectively used for training, verifying and testing the detection model, and the basic information is shown in Table 1:
TABLE 1
Data set Training set Verification set Test set
TGRS-HRRSD 5401 5417 10943
DIOR 5862 5863 11738
(2) Constructing a lossless residual network by using mixed hole convolution, and performing lossless multi-scale feature extraction on the targets in the remote sensing image, wherein the method comprises the following steps:
(2.1) based on ResNet-101, inserting two 3x3 hole convolutions with expansion rates of 2 and 5 respectively after the standard 3x3 convolution in the original residual block to form a continuous hole convolution combination with expansion rates of 1, 2 and 5, thereby constructing a new residual block, namely the lossless residual block. In addition, dense connections are added in the lossless residual block, namely the output of each hole convolution layer is concatenated with the input features and then fed into the next hole convolution layer, so that the bottom-layer features that strongly influence target positioning are shared and reused. ResNet-101 here refers to a residual network with a depth of 101 layers.
(2.2) the first three stages of ResNet-101 are retained, and then 23 and 3 lossless residual blocks are stacked in the 4th and 5th stages respectively to replace the 4th and 5th stages of the original network. This stacked structure improves the information utilization rate while keeping the size of the receptive field unchanged, effectively enhances the correlation between long-range information, and relieves the gridding effect.
(2.3) stages 4 and 5 keep the same number of input channels as stage 3, i.e. 256 convolution kernels, and remove the downsampling operation so that the resolution of the output feature map remains at 1/8 of the original image.
(3) Taking the features extracted in the step (2) as the input of the channel attention module to strengthen the feature expression most relevant to target positioning, wherein the method comprises the following steps:
(3.1) for the features F_stage5 ∈ R^(H×W×C) extracted in the step (2), the module performs one convolution operation with C+1 convolution kernels to obtain C+1 feature maps f ∈ R^(H×W×(C+1));
(3.2) decomposing the features obtained in the step (3.1) along the channel dimension to respectively obtain C feature maps f_1 ∈ R^(H×W×C) and 1 single-channel feature map f_2 ∈ R^(H×W×1), and performing a Sigmoid activation operation on f_2 to obtain 1 channel attention matrix M ∈ R^(H×W×1), which automatically reflects the importance, namely the weight value, of each feature channel;
(3.3) multiplying the channel attention matrix M element by element with the feature map f_1, using the obtained importance to promote the features important to the target detection task and suppress the unimportant ones, and finally obtaining the output features F_attention ∈ R^(H×W×C).
Further, the overall expression of step (3) is as follows:

F_attention = σ(f_2) ⊗ f_1

where ⊗ represents element-by-element multiplication and σ(*) represents the Sigmoid activation function.
(4) Sending the features enhanced in the step (3) into a cascade multi-layer pooling module to realize feature fusion of different layers, wherein the method comprises the following steps:
(4.1) the module applies pooling layers with 6 different kernel sizes (1x1, 2x2, 4x4, 8x8, 10x2, 2x20) to the feature F_attention obtained in step (3.3), performing a multilevel pooling operation to obtain feature maps at 6 different spatial scales P_i = {P_1, P_2, P_3, P_4, P_5, P_6}. The 5th and 6th kernels are average pooling layers in the vertical and horizontal directions respectively, a design that can capture the strip-shaped target features, such as bridges and ships, that are difficult to detect in remote sensing images.
(4.2) compressing the channel number of the feature maps extracted in the step (4.1) to 1/8 of the original input channels by using a 1x1 convolution, for limiting the weight of the global features in the subsequent feature fusion stage, to obtain the intermediate features C_i = {C_1, C_2, C_3, C_4, C_5, C_6}.
(4.3) splicing the intermediate features C_1 to C_6 obtained in the step (4.2) and the original input feature F_attention for the first time on the channel dimension to obtain the fused feature F_concat, fusing the low-level high-resolution detail features with the high-level semantic features of the network to make up for the loss of spatial information caused by deepening the network.
(4.4) down-sampling the feature F_stage2 extracted in the 2nd stage in the step (2.3) to the size of the fused feature F_concat from step (4.3), splicing it with F_concat for the second time on the channel dimension, and performing three convolution operations on the spliced features to further promote the fusion of low-level high-resolution detail features and high-level semantic features, obtaining the final output feature F_out of the feature extraction module. The feature extraction module is the collective term for the lossless residual network, the channel attention module and the cascade multilayer pooling module.
Further, the overall expression of step (4) is as follows:

P_i = p_pool(F_attention), i ∈ {1, 2, ..., 6}
C_i = f_conv(P_i), i ∈ {1, 2, ..., 6}
F_concat = C_1 ⊕ C_2 ⊕ ... ⊕ C_6 ⊕ F_attention
F_out = f_conv(f_conv(f_conv(f_down(F_stage2) ⊕ F_concat)))

where P_i, C_i, F_concat, F_stage2, F_attention and F_out respectively represent the pooled features, the intermediate features, the fused feature, the features extracted in the 2nd stage of the step (2), the attention-enhanced features and the final output features of the step (4); p_pool(*) denotes the pooling operation with the i-th kernel, which is either an average pooling p_avg(*) or a maximum pooling p_max(*) operation; f_conv(*) represents the convolution operation; f_down(*) denotes the down-sampling operation; ⊕ denotes the concatenation operation of the feature maps in the channel dimension; and i denotes the layer number of the module.
(5) Constructing a two-stage collaborative detection module with a multi-instance learning branch and a detection frame regression branch to train and detect the remote sensing images in the training set, wherein the method comprises the following steps:
(5.1) for each training or test image, using a selective search algorithm (SSW) to generate 2000 candidate frames of the target to be detected, mapping each candidate frame onto the final feature F_out output in the step (4), and then normalizing the feature map corresponding to each candidate frame by using a spatial pyramid pooling (SPP) layer to obtain the pooled features of fixed size.
(5.2) passing the pooled features obtained in the step (5.1) through two fully connected layers to convert them into the feature vectors of all candidate frames, and sending the feature vectors into two different branches: one branch outputs, according to the content of the candidate frame, the probability that the target object belongs to each category; the other branch outputs, according to the position of the candidate frame, the probability that the candidate frame contains each kind of target object. Each branch consists of a fully connected layer and a Softmax layer, and the output matrices of the two branches are multiplied element by element to obtain the category label of each candidate frame.
(5.3) adding the category labels of all the candidate frames to obtain the prediction probability of each category of target object, which is used as the image-level prediction label of the whole remote sensing image, and the cross entropy between this image-level prediction and the real label is used as the loss function of the WSDDN.
(5.4) when the loss in the step (5.3) exceeds a threshold (for example, the threshold is set to 0.5), extracting the weakly supervised prediction results with high confidence in the WSDDN, i.e. the pseudo labels, and providing them as real labels to Fast R-CNN for more accurate detection frame regression. Specifically, the final feature F_out obtained in the step (4.4) is likewise passed through a spatial pyramid pooling layer and two fully connected layers and then sent into two different branches (namely a classification branch and a regression branch) to obtain the predicted class probability p_ic and coordinate parameters t_ic, respectively, as the strongly supervised prediction results.
(5.5) normalizing the cooperative training process of the two detection networks under strong and weak supervision by using a joint loss function to obtain the final prediction result.
Further, the calculation formula of the category label of each candidate frame in step (5.2) is:

P_jc = p_jc^cls · p_jc^loc

where P_jc represents the category label of each candidate frame, p_jc^cls represents the class probability that candidate frame j belongs to category c, and p_jc^loc represents the position probability that candidate frame j belongs to category c.
Further, the calculation formula of the prediction probability of each category of target object in the step (5.3) is as follows:

ŷ_c = Σ_{j=1}^{J_W} P_jc

where ŷ_c represents the target category prediction result of the whole image and J_W represents the number of candidate frames.
Further, the loss function of the WSDDN is defined as:

L_WSDDN = -Σ_{c=1}^{C} log( y_c (Σ_{j=1}^{J_W} P_jc - 1/2) + 1/2 )

where y_c represents the real label of the target category; since a two-class cross entropy loss is used, y_c ∈ {-1, 1}; J_W represents the number of candidate frames and C represents the number of categories.
Further, in step (5.5), a joint loss function is used to normalize the cooperative training process of the two detection networks under strong and weak supervision, which is specifically as follows:
1) obtaining the prediction labels {(p_jc, t_jc)} and {(p_ic, t_ic)} of WSDDN and Fast R-CNN on the same remote sensing image;
2) calculating the class loss L_cls of the WSDDN for each candidate frame;
3) calculating the class loss L_cls_inter and the frame regression loss L_DIoU between WSDDN and Fast R-CNN for each candidate region, wherein the frame regression loss adopts the distance intersection-over-union (DIoU) loss;
4) weighting and summing the three parts of the loss to obtain the joint loss function L_SSD of the cooperative detection network, where J_W and J_S respectively represent the number of candidate frames extracted under weak and strong supervision; p_jc and p_ic respectively represent the weakly and strongly supervised prediction classes, and t_jc and t_ic respectively represent the weakly and strongly supervised predicted position coordinates; L_cls_inter represents the consistency of class prediction between the two detection networks under strong and weak supervision, and L_cls represents the class prediction loss inside the strongly supervised network; I_ij takes the value 1 when the IoU overlap of the target object detection frames extracted by the two networks is greater than 0.5, and 0 otherwise; β is a hyper-parameter between 0 and 1 used to balance the consistency of the predictions of the strongly and weakly supervised networks, and a larger β indicates that the strongly supervised network trusts more the target object positions predicted by the weakly supervised network. The last term in the loss function constrains the consistency of the detection frame positions between the two networks and prevents the detection frames predicted under strong and weak supervision from differing too much.
Further, the frame regression operation in the collaborative loss function adopts DIoU, and the calculation steps are as follows:

DIoU = IoU - ρ²(b, b_gt) / c²

where IoU is the area of the overlap of the two regions divided by the area of their union, c represents the diagonal length of the minimum enclosing region that covers both the anchor frame and the real detection frame, and ρ(b, b_gt) represents the distance between the center points of the anchor frame and the real detection frame.
Further, the above-mentioned frame regression loss function is calculated as:

L_DIoU = 1 - IoU + ρ²(b, b_gt) / c²
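Assuming the usual (x1, y1, x2, y2) box representation, the DIoU penalty and the frame regression loss above can be sketched as follows; the coordinate convention and the epsilon guard are assumptions, not details taken from the text.

```python
import torch


def diou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Sketch of L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2 for boxes in (x1, y1, x2, y2) form."""
    # intersection-over-union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # rho^2: squared distance between the two box centers
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2

    # c^2: squared diagonal of the smallest enclosing box
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    return (1.0 - iou + rho2 / c2).mean()


if __name__ == "__main__":
    p = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
    t = torch.tensor([[12.0, 14.0, 48.0, 56.0]])
    print(diou_loss(p, t).item())
```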
further, the overall loss function of the cooperative detection module is as follows:
Ltotal=LWSDDN+LSSD
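A schematic of how the two losses are combined and back-propagated through both branches at once, as described in step (6), is given below. The two linear heads and the stand-in loss terms are placeholders for the real WSDDN and Fast R-CNN branches and for L_WSDDN and L_SSD; only the overall pattern of summing the losses and updating both sets of parameters with gradient descent is intended.

```python
import torch
import torch.nn as nn

# Minimal stand-ins for the two detection branches; the real networks are the WSDDN
# branch and the Fast R-CNN branch built on the shared feature extraction module.
weak_head = nn.Linear(4096, 13)     # placeholder weakly supervised head
strong_head = nn.Linear(4096, 13)   # placeholder strongly supervised head
optimizer = torch.optim.SGD(
    list(weak_head.parameters()) + list(strong_head.parameters()),
    lr=1e-3, momentum=0.9)


def l_wsddn(weak_scores, image_labels):    # stands in for the WSDDN image-level loss
    return nn.functional.binary_cross_entropy_with_logits(
        weak_scores.sum(0, keepdim=True), image_labels)


def l_ssd(weak_scores, strong_scores):     # stands in for the cooperative loss L_SSD
    return nn.functional.mse_loss(strong_scores, weak_scores.detach())


for step in range(3):                          # toy loop over "images"
    roi_feats = torch.randn(100, 4096)         # pooled candidate-frame features
    image_labels = torch.rand(1, 13).round()   # image-level ground truth
    weak_scores = weak_head(roi_feats)
    strong_scores = strong_head(roi_feats)
    loss = l_wsddn(weak_scores, image_labels) + l_ssd(weak_scores, strong_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                           # both branches updated by gradient descent
    print(f"step {step}: L_total = {loss.item():.4f}")
```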
in this embodiment, two data sets of TGRS-HRRSD and DIOR are tested, and some test results are shown in fig. 3. From experimental results, the detection precision of the method is obviously superior to that of other current weak supervision detection models, and a more comprehensive and compact bounding box prediction result can be generated. Meanwhile, compared with a partial strong supervision detection model, the method has great competitiveness on detection of certain classes.
The foregoing is a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (5)

1. A weak supervision remote sensing target detection method based on mixed hole convolution is characterized by comprising the following steps:
(1) acquiring a remote sensing image data set to be detected, and dividing the data set into a training set, a verification set and a test set according to a proportion;
(2) constructing a lossless residual network by using mixed hole convolution, and extracting multi-scale features, namely low-level visual features and high-level semantic features, of the target objects in the remote sensing image by using the lossless residual network;
(3) sending the features extracted in the step (2) into a channel attention module, strengthening key feature information effective to a target detection task, and inhibiting invalid feature information;
(4) sending the features enhanced in the step (3) into a cascade multilayer pooling module for feature fusion to realize further fusion of low-level visual features and high-level semantic features, wherein the fused features are used as final output of a feature extraction network;
(5) sending the final features obtained in the step (4) into a cooperative detection module, wherein the module has two branches, a multi-instance learning branch and a detection frame regression branch: a weakly supervised detection network WSDDN serves as the multi-instance learning branch to generate pseudo label information, a strongly supervised detection network Fast R-CNN serves as the detection frame regression branch to realize more accurate target positioning, and the class probability and the detection frame of each target in the image are used as the detection result of the module;
(6) calculating the consistency error of the two branches according to the detection results of the step (5), updating the weight parameters of the two branches simultaneously through a gradient descent algorithm for collaborative training, testing the detection precision on the verification set, and continuously adjusting the network model until the precision meets expectations;
(7) and taking the trained network model as a detector, inputting the characteristics of the test set into the detector for detection, and obtaining a detection result, namely the probability and the detection frame of the target object in the remote sensing image.
2. The method for detecting the weakly supervised remote sensing target based on the mixed hole convolution as recited in claim 1, wherein in step (2), a lossless residual network is constructed by using the mixed hole convolution, and lossless multi-scale feature extraction is performed on the target in the remote sensing image, and the method comprises the following steps:
(2.1) inserting two 3x3 hole convolutions with expansion rates of 2 and 5 respectively after the standard 3x3 convolution in the original residual block, with ResNet-101 as the basic model, to form a continuous hole convolution combination with expansion rates of 1, 2 and 5, thereby constructing a new residual block, namely the lossless residual block; dense connections are added in the lossless residual block, namely the output of each hole convolution layer is concatenated with the input features and then fed into the next hole convolution layer, so as to share and reuse the bottom-layer features beneficial to target positioning;
(2.2) reserving the first three stages of ResNet-101, and then stacking 23 and 3 lossless residual blocks in the 4th and 5th stages respectively to replace the 4th and 5th stages in the original network;
(2.3) stages 4 and 5 keep the same number of input channels as stage 3, i.e. 256 convolution kernels, and remove the downsampling operation so that the resolution of the output feature map remains at 1/8 of the original image.
3. The method for detecting the weakly supervised remote sensing target based on the mixed hole convolution as recited in claim 2, wherein in the step (3), the features extracted in the step (2) are sent to a channel attention module to strengthen key feature information effective to a target detection task and inhibit invalid feature information, and the specific method is as follows:
(3.1) for the features F_stage5 ∈ R^(H×W×C) extracted in the 5th stage in step (2.3), the module performs one convolution operation with C+1 convolution kernels to obtain C+1 feature maps f ∈ R^(H×W×(C+1)); H, W and C here represent the height, width and number of channels of the feature map, respectively;
(3.2) decomposing the features obtained in the step (3.1) along the channel dimension to respectively obtain C feature maps f_1 ∈ R^(H×W×C) and 1 single-channel feature map f_2 ∈ R^(H×W×1), and performing a Sigmoid activation operation on f_2 to obtain 1 channel attention matrix M ∈ R^(H×W×1), which automatically reflects the importance, namely the weight value, of each feature channel;
(3.3) multiplying the channel attention matrix M element by element with the feature map f_1, namely multiplying each pixel point by the corresponding weight in the attention matrix, to finally obtain the output features F_attention ∈ R^(H×W×C); the mathematical expression of the whole module is

F_attention = σ(f_2) ⊗ f_1

where ⊗ represents element-by-element multiplication and σ(*) represents the Sigmoid activation function.
4. The method for detecting the weakly supervised remote sensing target based on the mixed hole convolution is characterized in that in the step (4), the output features of the step (3) are sent to a cascade multilayer pooling module to realize feature fusion of different layers, and the method comprises the following steps:
(4.1) the module applies pooling layers with 6 different kernel sizes 1x1, 2x2, 4x4, 8x8, 10x2 and 2x20 to the feature F_attention obtained in step (3.3), performing a multilevel pooling operation to obtain feature maps at 6 different spatial scales P_i = {P_1, P_2, P_3, P_4, P_5, P_6}, wherein the 5th and 6th kernels are average pooling layers in the vertical and horizontal directions respectively, and the expression of this step is:

P_i = p_pool(F_attention), i ∈ {1, 2, ..., 6}

where P_i indicates the pooled features and p_pool(*) denotes the pooling operation with the i-th kernel, which is either an average pooling p_avg(*) or a maximum pooling p_max(*) operation;
(4.2) compressing the channel number of the feature maps extracted in the step (4.1) to 1/8 of the input feature F_attention by using a 1x1 convolution, for limiting the weight of the global features in the subsequent feature fusion stage, to obtain the intermediate features C_i = {C_1, C_2, C_3, C_4, C_5, C_6}; the expression of this step is: C_i = f_conv(P_i), i ∈ {1, 2, ..., 6}, where C_i represents an intermediate feature, f_conv(*) represents the convolution operation, and i represents the layer number of the module;
(4.3) splicing the intermediate features C_1 to C_6 obtained in the step (4.2) and the original input feature F_attention for the first time on the channel dimension to obtain the fused feature F_concat; the expression of this step is:

F_concat = C_1 ⊕ C_2 ⊕ ... ⊕ C_6 ⊕ F_attention

where F_concat and F_attention respectively represent the fused feature and the attention-enhanced feature, and ⊕ represents the concatenation operation of the feature maps in the channel dimension;
(4.4) down-sampling the feature F_stage2 extracted in the 2nd stage in the step (2.3) to the size of the fused feature F_concat from step (4.3), splicing it with F_concat for the second time on the channel dimension, and performing three convolution operations on the spliced features to obtain the final output feature F_out of the feature extraction module; the expression of this step is:

F_out = f_conv(f_conv(f_conv(f_down(F_stage2) ⊕ F_concat)))

where F_stage2 and F_out respectively represent the features extracted in the 2nd stage of the step (2) and the final output features of the step (4), and f_down(*) denotes the down-sampling operation.
5. The method for detecting the weakly supervised remote sensing target based on the mixed hole convolution is characterized in that in step (5), a two-stage collaborative detection module with a multi-instance learning branch and a detection frame regression branch is constructed to train and detect the remote sensing images in the training set; the specific process is as follows:
(5.1) for each training or test image, using a selective search algorithm (SSW) to generate 2000 candidate frames of the target to be detected, mapping each candidate frame onto the final feature F_out output in the step (4), and then normalizing the feature map corresponding to each candidate frame by using a spatial pyramid pooling layer to obtain pooled features of fixed size;
(5.2) passing the pooled features obtained in the step (5.1) through two fully connected layers to convert them into the feature vectors of all candidate frames, and sending the feature vectors into two different branches: one branch outputs, according to the content of the candidate frame, the probability that the target object belongs to each category; the other branch outputs, according to the position of the candidate frame, the probability that the candidate frame contains each kind of target object; each branch consists of a fully connected layer and a Softmax layer, and the output matrices of the two branches are multiplied element by element to obtain the category label of each candidate frame, whose calculation formula is:

P_jc = p_jc^cls · p_jc^loc

where P_jc represents the category label of each candidate frame, p_jc^cls represents the class probability that candidate frame j belongs to category c, and p_jc^loc represents the position probability that candidate frame j belongs to category c;
(5.3) adding the category labels of all the candidate frames to obtain the prediction probability of each category of target object, which is used as the image-level prediction label of the whole remote sensing image; the calculation formula of the prediction probability of each category of target object is:

ŷ_c = Σ_{j=1}^{J_W} P_jc

where ŷ_c represents the target category prediction result of the whole image and J_W represents the number of candidate frames; then, the cross entropy loss between the predicted label and the real label is calculated to iteratively update the training process of the WSDDN, with the calculation formula:

L_WSDDN = -Σ_{c=1}^{C} log( y_c (Σ_{j=1}^{J_W} P_jc - 1/2) + 1/2 )

where y_c represents the real label of the target category; since a two-class cross entropy loss is used, y_c ∈ {-1, 1}; J_W represents the number of candidate frames and C represents the number of categories;
(5.4) when the loss in the step (5.3) exceeds a threshold, extracting the weakly supervised prediction results with high confidence in the WSDDN, namely the pseudo labels, as real labels for computing the error of the strongly supervised prediction results obtained by Fast R-CNN; specifically, the final feature F_out obtained in the step (4.4) is likewise passed through a spatial pyramid pooling layer and two fully connected layers and then sent into two different branches, namely a classification branch and a regression branch, to obtain the predicted class probability p_ic and coordinate parameters t_ic, respectively, as the strongly supervised prediction results;
(5.5) normalizing the cooperative training process of the two detection networks under strong and weak supervision by using a joint loss function to obtain the final prediction result, wherein the specific process is as follows:
1) obtaining the prediction labels {(p_jc, t_jc)} and {(p_ic, t_ic)} of WSDDN and Fast R-CNN on the same remote sensing image;
2) calculating the class loss L_cls of the WSDDN for each candidate frame;
3) calculating the class loss L_cls_inter and the frame regression loss L_DIoU between WSDDN and Fast R-CNN for each candidate region, wherein the frame regression loss adopts the distance intersection-over-union DIoU loss;
4) weighting and summing the three parts of the loss to obtain the joint loss function L_SSD of the cooperative detection network, where J_W and J_S respectively represent the number of candidate frames extracted under weak and strong supervision; p_jc and p_ic respectively represent the weakly and strongly supervised prediction classes, and t_jc and t_ic respectively represent the weakly and strongly supervised predicted position coordinates; L_cls_inter represents the consistency of class prediction between the two detection networks under strong and weak supervision, and L_cls represents the class prediction loss inside the strongly supervised network; I_ij takes the value 1 when the IoU overlap of the target object detection frames extracted by the two networks is greater than 0.5, and 0 otherwise; β is a hyper-parameter between 0 and 1 used to balance the consistency of the predictions of the strongly and weakly supervised networks; the last term in the loss function constrains the consistency of the detection frame positions between the two networks; in addition, the frame regression operation in the collaborative loss function adopts DIoU, calculated as:

DIoU = IoU - ρ²(b, b_gt) / c²

where IoU is the area of the overlap of the two regions divided by the area of their union, c represents the diagonal length of the minimum enclosing region that covers both the anchor frame and the real detection frame, and ρ(b, b_gt) represents the distance between the center points of the anchor frame and the real detection frame; the calculation formula of the frame regression loss function based on the DIoU is:

L_DIoU = 1 - IoU + ρ²(b, b_gt) / c²

In summary, the overall loss function of the cooperative detection module is as follows:

L_total = L_WSDDN + L_SSD
CN202011068687.4A 2020-09-29 2020-09-29 Weak supervision remote sensing target detection method based on mixed hole convolution Pending CN112183414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011068687.4A CN112183414A (en) 2020-09-29 2020-09-29 Weak supervision remote sensing target detection method based on mixed hole convolution

Publications (1)

Publication Number Publication Date
CN112183414A true CN112183414A (en) 2021-01-05

Family

ID=73948296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011068687.4A Pending CN112183414A (en) 2020-09-29 2020-09-29 Weak supervision remote sensing target detection method based on mixed hole convolution

Country Status (1)

Country Link
CN (1) CN112183414A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084210A (en) * 2019-04-30 2019-08-02 电子科技大学 The multiple dimensioned Ship Detection of SAR image based on attention pyramid network
CN111104898A (en) * 2019-12-18 2020-05-05 武汉大学 Image scene classification method and device based on target semantics and attention mechanism
CN111191566A (en) * 2019-12-26 2020-05-22 西北工业大学 Optical remote sensing image multi-target detection method based on pixel classification
CN111444939A (en) * 2020-02-19 2020-07-24 山东大学 Small-scale equipment component detection method based on weak supervision cooperative learning in open scene of power field
CN111353531A (en) * 2020-02-25 2020-06-30 西安电子科技大学 Hyperspectral image classification method based on singular value decomposition and spatial spectral domain attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SUTING CHEN et al.: "FCC-Net: A Full-Coverage Collaborative Network for Weakly Supervised Remote Sensing Object Detection" *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733744B (en) * 2021-01-14 2022-05-24 北京航空航天大学 Camouflage object detection model based on edge cooperative supervision and multi-level constraint
CN112733744A (en) * 2021-01-14 2021-04-30 北京航空航天大学 Camouflage object detection model based on edge cooperative supervision and multi-level constraint
CN112966684A (en) * 2021-03-15 2021-06-15 北湾科技(武汉)有限公司 Cooperative learning character recognition method under attention mechanism
CN113159057A (en) * 2021-04-01 2021-07-23 湖北工业大学 Image semantic segmentation method and computer equipment
CN112926692A (en) * 2021-04-09 2021-06-08 四川翼飞视科技有限公司 Target detection device and method based on non-uniform mixed convolution and storage medium
CN112926692B (en) * 2021-04-09 2023-05-09 四川翼飞视科技有限公司 Target detection device, method and storage medium based on non-uniform mixed convolution
CN113095235A (en) * 2021-04-15 2021-07-09 国家电网有限公司 Image target detection method, system and device based on weak supervision discrimination mechanism
CN113095235B (en) * 2021-04-15 2023-10-27 国家电网有限公司 Image target detection method, system and device based on weak supervision and discrimination mechanism
CN113255759A (en) * 2021-05-20 2021-08-13 广州广电运通金融电子股份有限公司 Attention mechanism-based in-target feature detection system, method and storage medium
CN113255759B (en) * 2021-05-20 2023-08-22 广州广电运通金融电子股份有限公司 In-target feature detection system, method and storage medium based on attention mechanism
WO2022241803A1 (en) * 2021-05-20 2022-11-24 广州广电运通金融电子股份有限公司 Attention mechanism-based system and method for detecting feature in target, and storage medium
CN113326845A (en) * 2021-06-30 2021-08-31 上海云从汇临人工智能科技有限公司 Target detection method, system and storage medium based on self-attention mechanism
CN113569750B (en) * 2021-07-29 2023-07-07 上海动亦科技有限公司 Road target detection and identification method based on spatial feature aggregation
CN113569750A (en) * 2021-07-29 2021-10-29 上海动亦科技有限公司 Road target detection and identification method based on spatial feature aggregation
CN113723254A (en) * 2021-08-23 2021-11-30 三明学院 Method, device, equipment and storage medium for identifying moso bamboo forest distribution
CN113920313A (en) * 2021-09-29 2022-01-11 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113963322A (en) * 2021-10-29 2022-01-21 北京百度网讯科技有限公司 Detection model training method and device and electronic equipment
CN113963322B (en) * 2021-10-29 2023-08-25 北京百度网讯科技有限公司 Detection model training method and device and electronic equipment
CN114511452B (en) * 2021-12-06 2024-03-19 中南大学 Remote sensing image retrieval method integrating multi-scale cavity convolution and triplet attention
CN114511452A (en) * 2021-12-06 2022-05-17 中南大学 Remote sensing image retrieval method integrating multi-scale cavity convolution and triple attention
CN114359739A (en) * 2022-03-18 2022-04-15 深圳市海清视讯科技有限公司 Target identification method and device
CN114359739B (en) * 2022-03-18 2022-06-28 深圳市海清视讯科技有限公司 Target identification method and device
CN115035409A (en) * 2022-06-20 2022-09-09 北京航空航天大学 Weak supervision remote sensing image target detection algorithm based on similarity comparison learning
CN115035409B (en) * 2022-06-20 2024-05-28 北京航空航天大学 Weak supervision remote sensing image target detection algorithm based on similarity comparison learning
CN116206201A (en) * 2023-02-21 2023-06-02 北京理工大学 Monitoring target detection and identification method, device, equipment and storage medium
CN116012719B (en) * 2023-03-27 2023-06-09 中国电子科技集团公司第五十四研究所 Weak supervision rotating target detection method based on multi-instance learning
CN116012719A (en) * 2023-03-27 2023-04-25 中国电子科技集团公司第五十四研究所 Weak supervision rotating target detection method based on multi-instance learning
CN116895030A (en) * 2023-09-11 2023-10-17 西华大学 Insulator detection method based on target detection algorithm and attention mechanism
CN116895030B (en) * 2023-09-11 2023-11-17 西华大学 Insulator detection method based on target detection algorithm and attention mechanism

Similar Documents

Publication Publication Date Title
CN112183414A (en) Weak supervision remote sensing target detection method based on mixed hole convolution
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN110929607B (en) Remote sensing identification method and system for urban building construction progress
Deng et al. Vision based pixel-level bridge structural damage detection using a link ASPP network
CN112541904B (en) Unsupervised remote sensing image change detection method, storage medium and computing device
CN113033520B (en) Tree nematode disease wood identification method and system based on deep learning
CN111563473A (en) Remote sensing ship identification method based on dense feature fusion and pixel level attention
CN111079739B (en) Multi-scale attention feature detection method
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
Tian et al. Multiscale building extraction with refined attention pyramid networks
Huang et al. A lightweight network for building extraction from remote sensing images
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
Narazaki et al. Automated vision-based bridge component extraction using multiscale convolutional neural networks
CN113011398A (en) Target change detection method and device for multi-temporal remote sensing image
CN111815576B (en) Method, device, equipment and storage medium for detecting corrosion condition of metal part
CN111368634B (en) Human head detection method, system and storage medium based on neural network
Wu et al. TAL: Topography-aware multi-resolution fusion learning for enhanced building footprint extraction
Zuo et al. A remote sensing image semantic segmentation method by combining deformable convolution with conditional random fields
Jiang et al. Arbitrary-shaped building boundary-aware detection with pixel aggregation network
CN115719475A (en) Three-stage trackside equipment fault automatic detection method based on deep learning
He et al. Building extraction based on U-net and conditional random fields
CN113887455B (en) Face mask detection system and method based on improved FCOS
CN113657196B (en) SAR image target detection method, SAR image target detection device, electronic equipment and storage medium
CN114882490A (en) Unlimited scene license plate detection and classification method based on point-guided positioning
Yuan et al. Graph neural network based multi-feature fusion for building change detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210105