CN116630801A

CN116630801A - Remote sensing image weak supervision target detection method based on pseudo-instance soft label

Info

Publication number: CN116630801A
Application number: CN202310568911.3A
Authority: CN
Inventors: 钱晓亮; 林晨阳; 霍豫; 王慰; 曾黎; 程塨; 姚西文; 王芳; 刘玉翠; 岳伟超; 任航丽; 刘向龙; 吴青娥; 张秋闻
Original assignee: Zhengzhou University of Light Industry
Current assignee: Zhengzhou University of Light Industry
Priority date: 2023-03-13
Filing date: 2023-05-19
Publication date: 2023-08-22

Abstract

The invention provides a remote sensing image weakly supervised target detection method based on pseudo-instance soft labels, which comprises the following steps: constructing a weakly supervised deep detection network and adding instance classification optimization branches to it; generating a series of target candidate boxes for each training image using a selective search algorithm, calculating in turn the target candidate box class scores and the image class prediction score, and training the weakly supervised deep detection network; calculating a dual-context projection score for each target candidate box; obtaining the quality scores of the target candidate boxes and using them to mine pseudo-true value instances for each instance classification optimization branch; assigning soft labels to all pseudo-true value instances and using them to train the instance classification optimization branches; and constructing a target detection model from the trained weakly supervised deep detection network and the instance classification optimization branches, and using the model to obtain the category and position of each target of interest. The method can effectively mine high-quality target candidate boxes and effectively improve the detection precision of weakly supervised target detection in high-resolution remote sensing images.

Description

Remote sensing image weak supervision target detection method based on pseudo-instance soft label

Technical Field

The invention relates to the technical field of deep learning and target detection, in particular to a remote sensing image weak supervision target detection method based on a pseudo-instance soft label.

Background

High-resolution remote sensing image target detection is one of the most important tasks in remote sensing image processing and a key technology for analyzing and mining remote sensing data; it aims to identify and locate high-value ground targets in high-resolution remote sensing images. In military applications, it can be used to monitor specific enemy areas, evaluate enemy combat capability at sea, and monitor the deployment of important enemy ports and of ships in sea areas. In civil applications, it can be used for regular inspection of land planning, for detecting buildings, houses, traffic and the like in disaster areas, and for analyzing disaster conditions and drawing up disaster relief plans. Remote sensing image target detection therefore has extremely important application value in military, civil and other fields.

Fully supervised target detection methods for high-resolution remote sensing images require the class and position labels of every target instance in an image to supervise model training. However, high-resolution remote sensing images usually contain dense target instances, and labeling the class and position of each one is time-consuming and laborious. Weakly supervised target detection methods need only image-level class labels to supervise model training and require no manually labeled instance positions, which greatly reduces the manual labeling cost; such methods have therefore gradually attracted wide attention.

With the development of deep learning, weakly supervised target detection for high-resolution remote sensing images has advanced greatly; however, many weakly supervised target detection methods still have two problems. First, many methods use only the target candidate box class score to mine pseudo-true value instances, but the reliability of this score is inadequate. Categories in remote sensing images are more easily confused than categories in natural images, because remote sensing images are captured from a vertical (overhead) view and contain large-scale cluttered backgrounds. Furthermore, instances with high target candidate box class scores tend to cover only the salient region of a target rather than the target as a whole. Second, in many methods the neighborhood instances of a pseudo-true value instance are given the same supervision label as the pseudo-true value instance itself, which to some extent leads to misclassification.

Disclosure of Invention

Aiming at the technical problems of poor reliability and insufficient precision in existing weakly supervised target detection methods, the invention provides a remote sensing image weakly supervised target detection method based on pseudo-instance soft labels. The designed dual-context projection score can effectively evaluate the quality of a target candidate box; the proposed target candidate box quality score, which fuses the traditional target candidate box class score with the dual-context projection score, can effectively mine high-quality target candidate boxes; and the designed pseudo-instance soft labels can effectively improve the detection precision of weakly supervised target detection in high-resolution remote sensing images.

In order to achieve the above purpose, the technical scheme of the invention is realized as follows: a remote sensing image weak supervision target detection method based on a pseudo-instance soft label comprises the following steps:

Step one: constructing a weakly supervised deep detection network using a deep learning method, and adding at least two instance classification optimization branches to it; taking high-resolution remote sensing images labeled with image classes as training images, generating a series of target candidate boxes for each training image using a selective search algorithm, calculating the class scores of the target candidate boxes, calculating the class prediction score of each training image from the class scores of all of its target candidate boxes, and training the weakly supervised deep detection network using the labeled image classes of the training images;

Step two: calculating a dual-context projection score for each target candidate box from the internal context projection score and the external context projection score of the target candidate box;

Step three: combining the target candidate box class score and the dual-context projection score to obtain a target candidate box quality score, and using the quality score to mine high-quality pseudo-true value instances for each instance classification optimization branch;

Step four: assigning soft labels to all pseudo-true value instances of each instance classification optimization branch, and training the instance classification optimization branches using the soft labels;

Step five: constructing a target detection model from the trained weakly supervised deep detection network and the instance classification optimization branches, and performing target detection on the high-resolution remote sensing image to be detected with the target detection model to obtain the category and position of each target of interest.

Preferably, the target candidate box class score is calculated as follows: the training image and its target candidate boxes are fed into the backbone network of the weakly supervised deep detection network to obtain the feature map of the image; a region-of-interest pooling operation yields the feature map of each target candidate box, and two fully connected layers yield its feature vector; the feature vectors of the target candidate boxes are fed into a class branch and a detection branch to obtain class scores and detection scores, which are multiplied element by element to obtain the target candidate box class score matrix $x \in \mathbb{R}^{C \times N}$, where $C$ represents the number of target categories;

the class prediction score $\phi_c$ of each training image on class $c$ is:

$$\phi_c = \sum_{n=1}^{N} x_{cn}$$

where $x_{cn} \in x$ represents the target candidate box class score of target candidate box $r_n$ on class $c$; $N$ represents the number of target candidate boxes, and $r_n$ is the $n$-th target candidate box, $n = 1, 2, \ldots, N$.

Preferably, the training of the weakly supervised deep detection network is supervised using the image class labels of the training images, and the loss function $L_w$ used is:

$$L_w = -\sum_{c=1}^{C}\left[y_c \log \phi_c + (1 - y_c)\log(1 - \phi_c)\right]$$

where $y_c = 1$ or $0$ is the image class label of the training image, indicating whether the training image contains a target belonging to class $c$;

the class branch comprises a fully connected layer and a class-direction Softmax operation, and the detection branch comprises a fully connected layer and an instance-direction Softmax operation.

Preferably, $K$ instance classification optimization branches are added after the two fully connected layers of the weakly supervised deep detection network, each comprising a fully connected layer and a class-direction Softmax operation; the $k$-th instance classification optimization branch outputs a target candidate box class score matrix $x^k \in \mathbb{R}^{(C+1) \times N}$, where $k \in \{1, 2, \ldots, K\}$, $x_{cn}^k$ represents the class score of target candidate box $r_n$ on class $c$ in the $k$-th branch, and class $(C+1)$ represents the background class; the class score matrix $x^k$ is used to mine the pseudo-instance label matrix $y^{k+1} \in \mathbb{R}^{(C+1) \times N}$ of the $(k+1)$-th branch, whose element $y_{cn}^{k+1}$ indicates whether target candidate box $r_n$ belongs to class $c$: $y_{cn}^{k+1} = 1$ means it does and $y_{cn}^{k+1} = 0$ means it does not; the pseudo-instance label matrix $y^1$ of the first branch is mined using the target candidate box class score matrix $x$; and the $k$-th instance classification optimization branch is trained using the loss function $L_r^k$.

Preferably, the dual-context projection score in step two is calculated as follows:

the initial dual-context projection score of target candidate box $r_n$ on class $c$ is: $IDCPS_{cn} = ICPS_{cn} - ECPS_{cn}$;

the initial dual-context projection score $IDCPS_{cn}$ is normalized to obtain the dual-context projection score $DCPS_{cn} \in [0, 1]$:

$$DCPS_{cn} = \frac{IDCPS_{cn} - \mathrm{Min}\{IDCPS\}}{\mathrm{Max}\{IDCPS\} - \mathrm{Min}\{IDCPS\}}$$

where $\mathrm{Max}\{\cdot\}$ and $\mathrm{Min}\{\cdot\}$ represent taking the maximum and the minimum of the initial dual-context projection scores, respectively; $ICPS_{cn}$ represents the internal context projection score of target candidate box $r_n$ on class $c$, and $ECPS_{cn}$ represents the external context projection score of target candidate box $r_n$ on class $c$.

Preferably, the internal context projection score $ICPS_{cn}$ and the external context projection score $ECPS_{cn}$ are calculated as follows:

a semantic segmentation map of the input training image is generated using a weakly supervised semantic segmentation algorithm; target candidate box $r_n$ is resized to $(1-\alpha) \times 100\%$ and $(1+\alpha) \times 100\%$ of its original size to obtain its inner box $ir_n$ and outer box $er_n$, respectively; $r_n$, $ir_n$ and $er_n$ are each projected onto the semantic segmentation map of the training image to obtain the instance-level segmentation maps $S_n$, $IS_n$ and $ES_n$, where $W$ and $H$ represent the width and height of $r_n$, $\lceil(1-\alpha)W\rceil$ and $\lceil(1-\alpha)H\rceil$ represent the width and height of $ir_n$, $\lceil(1+\alpha)W\rceil$ and $\lceil(1+\alpha)H\rceil$ represent the width and height of $er_n$, and $\lceil\cdot\rceil$ represents the rounding-up operation;

the internal context projection score of $r_n$ is calculated using the instance-level segmentation maps $S_n$ and $IS_n$, and the external context projection score of $r_n$ is calculated using the instance-level segmentation maps $S_n$ and $ES_n$.

Preferably, the internal context projection score $ICPS_{cn}$ is calculated as follows:

the segmentation map $ICS_n$ of the internal context area of target candidate box $r_n$ is: $ICS_n = S_n - PIS_n$;

where $PIS_n$ represents the map obtained by padding $IS_n$ with zeros on all sides, so that $PIS_n$ and $S_n$ have the same size;

the projections of the $c$-th channel $ICS_n^c$ of the segmentation map $ICS_n$ along the horizontal and vertical directions are $HICS_{cn}$ and $VICS_{cn}$, and:

$$HICS_{cn} = \mathrm{Max}_h(ICS_n^c), \qquad VICS_{cn} = \mathrm{Max}_v(ICS_n^c)$$

where $\mathrm{Max}_h(\cdot)$ and $\mathrm{Max}_v(\cdot)$ represent taking the maximum in the horizontal direction and the vertical direction, respectively;

the internal context projection score of target candidate box $r_n$ on class $c$ is:

$$ICPS_{cn} = \mathrm{Avg}(HICS_{cn}) + \mathrm{Avg}(VICS_{cn})$$

where $\mathrm{Avg}(\cdot)$ represents the averaging operation;

the external context projection score $ECPS_{cn}$ is calculated as follows:

the segmentation map $ECS_n$ of the external context area of target candidate box $r_n$ is obtained by the following formula:

$$ECS_n = ES_n - PS_n$$

where $PS_n$ represents the map obtained by padding $S_n$ with zeros on all sides, so that $PS_n$ and $ES_n$ have the same size;

the projections of the $c$-th channel $ECS_n^c$ of the segmentation map $ECS_n$ along the horizontal and vertical directions are denoted $HECS_{cn}$ and $VECS_{cn}$, respectively, and:

$$HECS_{cn} = \mathrm{Max}_h(ECS_n^c), \qquad VECS_{cn} = \mathrm{Max}_v(ECS_n^c)$$

the external context projection score of target candidate box $r_n$ on class $c$ is:

$$ECPS_{cn} = \mathrm{Avg}(HECS_{cn}) + \mathrm{Avg}(VECS_{cn})$$

Preferably, the target candidate box quality score is calculated as follows: the target candidate box class score and the dual-context projection score are combined, with weights set by a modulation factor $\lambda$, to obtain the target candidate box quality score $PQS_{cn}^k$;

where, when $k > 1$, $x_{cn}^{k-1}$ represents the class score of target candidate box $r_n$ on class $c$ in the $(k-1)$-th instance classification optimization branch, and when $k = 1$, $x_{cn}^{0} = x_{cn}$; $DCPS_{cn}$ represents the dual-context projection score of $r_n$ on class $c$; $PQS_{cn}^k$ represents the quality score of $r_n$ on class $c$ in the $k$-th instance classification optimization branch; and $\lambda$ is the modulation factor that sets the relative weights of the class score $x_{cn}^{k-1}$ and the dual-context projection score $DCPS_{cn}$;

the pseudo-true value instances are mined as follows: the mining algorithm proposed in MIST [Z. Ren et al., "Instance-aware, context-focused, and memory-efficient weakly supervised object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 10598-10607] uses the target candidate box quality scores $PQS_{cn}^k$ to mine the set of pseudo-true value instances $PG_c^k = \{pg_{c1}^k, \ldots, pg_{cM}^k\}$ in the $k$-th instance classification optimization branch, where $pg_{cm}^k$ represents the $m$-th pseudo-true value instance in the set $PG_c^k$, and $M$ represents the number of pseudo-true value instances belonging to class $c$ in the $k$-th instance classification optimization branch.

Preferably, the soft label allocation strategy assigns each target candidate box $r_n$ in the $k$-th instance classification optimization branch a soft label $y_{cn}^k$ on class $c$, which is determined by the following quantities:

$\sigma(\cdot,\cdot)$ represents the intersection-over-union of two rectangular boxes; the nearest instance over all classes is the pseudo-true value instance nearest to target candidate box $r_n$ in the pseudo-true value instance set of all classes; and the neighborhood set is the set of target candidate boxes $r_n$ that satisfy the following two conditions: 1) $r_n$ lies within the neighborhood of a pseudo-true value instance of class $c$, as measured by $\sigma$; 2) that pseudo-true value instance is the one nearest to $r_n$ in the class-$c$ pseudo-true value instance set.

Preferably, the instance classification optimization branches are trained with the soft labels as follows:

a soft label is assigned to each pseudo-true value instance according to the soft label allocation strategy; the training loss function of the $k$-th instance classification optimization branch is then computed from the soft labels together with the following quantities: $w_n^k$, the loss weight of target candidate box $r_n$; the highest target candidate box class score on class $c$ in the $(k-1)$-th instance classification optimization branch; and $x_{cn}^k$, the target candidate box class score of $r_n$ on class $c$.

Compared with the prior art, the invention has the following beneficial effects: the dual-context projection score, designed using the result of weakly supervised semantic segmentation, can effectively evaluate the quality of a target candidate box; the proposed target candidate box quality score, which fuses the traditional target candidate box class score with the dual-context projection score, can effectively mine high-quality target candidate boxes; and the pseudo-instance soft labels, designed using the spatial distance between each target candidate box and its nearest pseudo-true value instance, can effectively improve the classification accuracy of weakly supervised target detection in high-resolution remote sensing images.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of the present invention.

Fig. 2 is a subjective comparison chart of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.

As shown in fig. 1, a remote sensing image weak supervision target detection method based on a pseudo-instance soft label includes the steps:

The training and test samples used in the implementation of the invention are high-resolution remote sensing images with only category labels. The images come from the datasets NWPU VHR-10.v2 [G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7405-7415, Sept. 2016] and DIOR [K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, "Object detection in optical remote sensing images: A survey and a new benchmark," ISPRS J. Photogramm. Remote Sens., vol. 159, pp. 296-307, Jan. 2020]. The NWPU VHR-10.v2 dataset contains 10 classes of targets; each image is 400 × 400 pixels, and the training, validation and test sets contain 679, 200 and 293 images, respectively. The DIOR dataset contains 20 classes of targets; each image is 800 × 800 pixels, and the training, validation and test sets contain 5862, 5863 and 11738 images, respectively. In the practice of the invention, the training and validation sets are used to train the weakly supervised target detection model, and the test set is used to test it.

Step one: A weakly supervised deep detection network [H. Bilen and A. Vedaldi, "Weakly supervised deep detection networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2846-2854] is constructed. As shown in FIG. 1, a selective search algorithm [J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders, "Selective search for object recognition," Int. J. Comput. Vis., vol. 104, no. 2, pp. 154-171, Apr. 2013] is first used to generate a series of target candidate boxes $R = \{r_1, \ldots, r_n, \ldots, r_N\}$ for each high-resolution remote sensing image, where $N$ represents the number of target candidate boxes and $r_n$ is the $n$-th target candidate box, $n = 1, 2, \ldots, N$. The selective search algorithm requires no training; running it once on an image yields that image's initial target candidate boxes.

The high-resolution remote sensing image and its target candidate boxes are fed into the backbone network of the weakly supervised deep detection network to obtain the feature map of the image; a region-of-interest pooling operation yields the feature map of each target candidate box, and two fully connected layers finally yield its feature vector. A feature vector obtained in this way is more robust and can represent the features of the target candidate box. The feature vectors of the target candidate boxes are fed into a class branch (a fully connected layer and a class-direction Softmax operation) and a detection branch (a fully connected layer and an instance-direction Softmax operation) to obtain class scores and detection scores, which are multiplied element by element to obtain the target candidate box class scores (Proposal Class Score, PCS), expressed as $x \in \mathbb{R}^{C \times N}$, where $C = 20$ represents the number of target categories. The class prediction score $\phi_c$ of each training image on class $c$ is calculated as:

$$\phi_c = \sum_{n=1}^{N} x_{cn} \qquad (1)$$

where $x_{cn} \in x$ represents the target candidate box class score PCS of target candidate box $r_n$ on class $c$.

Finally, the training of the weakly supervised deep detection network is supervised using the image class labels of the training images. The loss function $L_w$ used is defined as follows:

$$L_w = -\sum_{c=1}^{C}\left[y_c \log \phi_c + (1 - y_c)\log(1 - \phi_c)\right] \qquad (2)$$

where $y_c = 1$ or $0$ is the image class label of the input training image, indicating whether the input training image contains a target belonging to class $c$. The loss function $L_w$ measures well the similarity between the data distribution predicted by the model and the real data distribution, which makes training more efficient and effective and yields more accurate image-level classification.
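The scoring and image-level supervision described above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: it assumes the image score is the sum of the box scores over all candidate boxes and that the loss is a per-class binary cross-entropy, as in the cited weakly supervised deep detection network. The function names and the `eps` clamp are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def proposal_class_scores(cls_logits, det_logits):
    """Build the C x N PCS matrix x from two C x N logit matrices.

    cls_logits / det_logits are hypothetical raw outputs of the class
    branch and the detection branch (nested lists, C rows, N columns).
    """
    C, N = len(cls_logits), len(cls_logits[0])
    # Class branch: softmax across the C classes for each box (column).
    cls = [[0.0] * N for _ in range(C)]
    for n in range(N):
        col = softmax([cls_logits[c][n] for c in range(C)])
        for c in range(C):
            cls[c][n] = col[c]
    # Detection branch: softmax across the N boxes for each class (row).
    det = [softmax(row) for row in det_logits]
    # Element-wise product gives the target candidate box class scores.
    return [[cls[c][n] * det[c][n] for n in range(N)] for c in range(C)]

def image_scores(x):
    """phi_c: sum of the box scores of class c over all boxes (assumed form)."""
    return [sum(row) for row in x]

def image_level_loss(phi, y, eps=1e-6):
    """Binary cross-entropy between image scores phi and labels y (assumed form)."""
    loss = 0.0
    for p, t in zip(phi, y):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss
```

Because the detection branch normalizes over boxes, each image score stays in [0, 1], which is what allows it to be compared directly with a 0/1 image label.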

As shown in FIG. 1, on the basis of the weakly supervised deep detection network, the invention adds $K = 3$ instance classification optimization (Instance Classifier Refinement, ICR) branches [P. Tang, X. Wang, X. Bai, and W. Liu, "Multiple instance detection network with online instance classifier refinement," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 2843-2851] after the two fully connected layers of the network. Each ICR branch includes a fully connected layer and a class-direction Softmax operation. The $k$-th ICR branch outputs a PCS matrix, denoted $x^k \in \mathbb{R}^{(C+1) \times N}$, where $k \in \{1, 2, 3\}$, $x_{cn}^k$ represents the target candidate box class score PCS of target candidate box $r_n$ on class $c$ in the $k$-th instance classification optimization branch, and class 21 represents the background class. The PCS matrix $x^k$ is used to mine the pseudo-instance label matrix $y^{k+1} \in \mathbb{R}^{(C+1) \times N}$ of the $(k+1)$-th ICR branch, whose element $y_{cn}^{k+1}$ indicates whether target candidate box $r_n$ belongs to class $c$: $y_{cn}^{k+1} = 1$ (it does) or $y_{cn}^{k+1} = 0$ (it does not). The pseudo-instance label matrix $y^1$ of the first ICR branch is mined using the target candidate box class scores $x$. Finally, the loss function $L_r^k$ is used to train the $k$-th ICR branch:

$$L_r^k = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C+1} w_n^k \, y_{cn}^k \log x_{cn}^k \qquad (3)$$

where $w_n^k$ represents the loss weight of target candidate box $r_n$, taken as the highest PCS on class $c$ in the $(k-1)$-th ICR branch.

Adding instance classification optimization branches improves the performance of the model. Experimental results show that model precision is best when 3 branches are added. The selected loss function $L_r^k$ measures well the similarity between the data distribution predicted by the model and the real data distribution, which makes training more efficient and effective and yields more accurate instance-level classification.
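As a rough illustration of how one ICR branch could be trained, the sketch below assumes the refinement loss is the weighted cross-entropy used in the cited online instance classifier refinement work: each box contributes the negative log of its score on its labeled class, scaled by a per-box loss weight and averaged over boxes. The function name, argument layout and `eps` guard are hypothetical.

```python
import math

def icr_loss(x_k, y_k, w_k, eps=1e-6):
    """Weighted cross-entropy for one refinement branch (assumed form).

    x_k: (C+1) x N class scores of the branch (nested lists).
    y_k: (C+1) x N pseudo-instance labels, hard 0/1 or soft values.
    w_k: length-N list of per-box loss weights.
    """
    n_boxes = len(w_k)
    total = 0.0
    for c in range(len(x_k)):
        for n in range(n_boxes):
            if y_k[c][n] > 0:
                # Guard against log(0) for boxes scored at exactly zero.
                total -= w_k[n] * y_k[c][n] * math.log(max(x_k[c][n], eps))
    return total / n_boxes
```

Because the labels enter only as multiplicative factors, the same function accepts soft labels without modification.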

Step two: a Dual context projection score (Dual-context Projection Score, DCPS) is calculated for each target candidate box based on the internal context projection score and the external context projection score for the target candidate box.

The semantic segmentation map of the input training image is generated using a weakly supervised semantic segmentation algorithm [J. Ahn, S. Cho, and S. Kwak, "Weakly supervised learning of instance segmentation with inter-pixel relations," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2209-2218]. This algorithm performs well, so the generated semantic segmentation map is accurate, which in turn improves the accuracy of the subsequent dual-context projection score calculation.

Target candidate box $r_n$ is resized to $(1-0.2) \times 100\%$ and $(1+0.2) \times 100\%$ of its original size to obtain its inner box $ir_n$ and outer box $er_n$, respectively. Then $r_n$, $ir_n$ and $er_n$ are projected onto the semantic segmentation map of the input training image to obtain their instance-level segmentation maps, denoted $S_n$, $IS_n$ and $ES_n$, respectively, where $W$ and $H$ represent the width and height of $r_n$, $\lceil(1-0.2)W\rceil$ and $\lceil(1-0.2)H\rceil$ represent the width and height of $ir_n$, $\lceil(1+0.2)W\rceil$ and $\lceil(1+0.2)H\rceil$ represent the width and height of $er_n$, and $\lceil\cdot\rceil$ represents the rounding-up operation.
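The resizing step can be illustrated as follows. The text specifies only the resulting widths and heights, so this sketch assumes that the inner and outer boxes share the center of the original box; the `(x1, y1, x2, y2)` box layout and the function names are likewise assumptions made for illustration.

```python
import math

def scaled_box(box, scale):
    """Return a box of size ceil(scale*W) x ceil(scale*H) around the same center.

    box is (x1, y1, x2, y2); keeping the center fixed is an assumption.
    """
    x1, y1, x2, y2 = box
    new_w = math.ceil(scale * (x2 - x1))
    new_h = math.ceil(scale * (y2 - y1))
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return (cx - new_w / 2.0, cy - new_h / 2.0,
            cx + new_w / 2.0, cy + new_h / 2.0)

def inner_outer_boxes(box, alpha=0.2):
    """Inner box at (1 - alpha) and outer box at (1 + alpha) of the original size."""
    return scaled_box(box, 1.0 - alpha), scaled_box(box, 1.0 + alpha)
```

For a 33 × 17 box and alpha = 0.2, this gives a 27 × 14 inner box and a 40 × 21 outer box. The sketch does not clip the outer box to the image border, which a real pipeline would need to handle.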

The instance-level segmentation maps $S_n$ and $IS_n$ are used to calculate the internal context projection score (Internal Context Projection Score, ICPS) of target candidate box $r_n$, and the instance-level segmentation maps $S_n$ and $ES_n$ are used to calculate its external context projection score (External Context Projection Score, ECPS).

The ICPS is calculated as follows. The segmentation map $ICS_n$ of the internal context area of target candidate box $r_n$ is obtained by the following formula:

$$ICS_n = S_n - PIS_n \qquad (4)$$

where $PIS_n$ represents the map obtained by padding $IS_n$ with zeros on all sides, so that $PIS_n$ and $S_n$ have the same size. The projections of the $c$-th channel $ICS_n^c$ of the segmentation map $ICS_n$ along the horizontal and vertical directions are denoted $HICS_{cn}$ and $VICS_{cn}$, and are calculated as follows:

$$HICS_{cn} = \mathrm{Max}_h(ICS_n^c), \qquad VICS_{cn} = \mathrm{Max}_v(ICS_n^c) \qquad (5)$$

where $\mathrm{Max}_h(\cdot)$ and $\mathrm{Max}_v(\cdot)$ represent taking the maximum in the horizontal direction and the vertical direction, respectively.

Finally, the ICPS of target candidate box $r_n$ on class $c$ is denoted $ICPS_{cn}$ and is calculated as follows:

$$ICPS_{cn} = \mathrm{Avg}(HICS_{cn}) + \mathrm{Avg}(VICS_{cn}) \qquad (6)$$

where $\mathrm{Avg}(\cdot)$ represents the averaging operation.
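The ICPS computation can be sketched for a single class channel as below. Two points are assumptions rather than statements from the text: the smaller map is zero-padded symmetrically (that is, the inner box is centered in the candidate box), and the "horizontal" projection is read as the maximum of each row while the "vertical" projection is the maximum of each column. The function names are hypothetical.

```python
def center_pad(small, height, width):
    """Zero-pad a 2-D list to height x width, keeping it centered (assumption)."""
    sh, sw = len(small), len(small[0])
    top, left = (height - sh) // 2, (width - sw) // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(sh):
        for j in range(sw):
            out[top + i][left + j] = small[i][j]
    return out

def context_projection_score(big, small):
    """Avg(row-wise max) + Avg(column-wise max) of big - padded(small).

    big and small are single-class channels of two instance-level
    segmentation maps, given as 2-D lists of floats.
    """
    h, w = len(big), len(big[0])
    padded = center_pad(small, h, w)
    # Context ring: the part of `big` not covered by `small`.
    diff = [[big[i][j] - padded[i][j] for j in range(w)] for i in range(h)]
    row_max = [max(row) for row in diff]
    col_max = [max(diff[i][j] for i in range(h)) for j in range(w)]
    return sum(row_max) / h + sum(col_max) / w
```

Calling `context_projection_score(S_c, IS_c)` gives the internal score for one class; the same function applied to `(ES_c, S_c)` would give the external score, since the two calculations have the same structure. Under this reading the internal score is high when the target reaches into the ring between the inner box and the candidate box.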

The ECPS is calculated as follows. The segmentation map $ECS_n$ of the external context area of target candidate box $r_n$ is obtained by the following formula:

$$ECS_n = ES_n - PS_n \qquad (7)$$

where $PS_n$ represents the map obtained by padding $S_n$ with zeros on all sides, so that $PS_n$ and $ES_n$ have the same size. The projections of the $c$-th channel $ECS_n^c$ of the segmentation map $ECS_n$ along the horizontal and vertical directions are denoted $HECS_{cn}$ and $VECS_{cn}$, and are calculated as follows:

$$HECS_{cn} = \mathrm{Max}_h(ECS_n^c), \qquad VECS_{cn} = \mathrm{Max}_v(ECS_n^c) \qquad (8)$$

Finally, the ECPS of target candidate box $r_n$ on class $c$ is denoted $ECPS_{cn}$ and is calculated as follows:

$$ECPS_{cn} = \mathrm{Avg}(HECS_{cn}) + \mathrm{Avg}(VECS_{cn}) \qquad (9)$$

The DCPS is calculated as follows. The initial DCPS of target candidate box $r_n$ on class $c$ is denoted $IDCPS_{cn}$ and is calculated as follows:

$$IDCPS_{cn} = ICPS_{cn} - ECPS_{cn} \qquad (10)$$

Finally, the initial dual-context projection score $IDCPS_{cn}$ is normalized to obtain $DCPS_{cn} \in [0, 1]$:

$$DCPS_{cn} = \frac{IDCPS_{cn} - \mathrm{Min}\{IDCPS\}}{\mathrm{Max}\{IDCPS\} - \mathrm{Min}\{IDCPS\}} \qquad (11)$$

where $\mathrm{Max}\{\cdot\}$ and $\mathrm{Min}\{\cdot\}$ represent taking the maximum and the minimum of the initial dual-context projection scores, respectively.
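A small sketch of the subtraction and normalization steps is given below. The text does not say over which scores the maximum and minimum are taken; this example assumes they are taken over all candidate boxes of one class in one image. The function name and the handling of the degenerate all-equal case are also assumptions.

```python
def dual_context_scores(icps, ecps):
    """Min-max normalized ICPS - ECPS for one class over N boxes.

    icps, ecps: length-N lists of internal / external context scores.
    Normalizing per class over the boxes of one image is an assumption.
    """
    idcps = [i - e for i, e in zip(icps, ecps)]
    lo, hi = min(idcps), max(idcps)
    if hi == lo:
        # Degenerate case: all boxes score the same, so no ranking information.
        return [0.0] * len(idcps)
    return [(v - lo) / (hi - lo) for v in idcps]
```

For example, internal scores `[2.0, 1.0, 0.5]` and external scores `[0.0, 1.0, 1.5]` give initial scores `[2.0, 0.0, -1.5 + 0.5]`, so the first box (object inside, nothing outside) is ranked highest and the last box lowest.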

Step three: and combining the target candidate frame category score and the dual-context projection score to obtain a target candidate frame quality score, and optimizing ICR branches for each instance classification by utilizing the target candidate frame quality score to mine high-quality false-true value instances.

The target candidate box quality score (Proposal Quality Score, PQS) is calculated as follows:

the PQS is obtained by combining the target candidate box class score PCS and the dual-context projection score DCPS according to formula (12). In formula (12), when $k > 1$, $x_{cn}^{k-1}$ represents the PCS of target candidate box $r_n$ on class $c$ in the $(k-1)$-th ICR branch, and when $k = 1$, $x_{cn}^{0} = x_{cn}$; $DCPS_{cn}$ represents the DCPS of $r_n$ on class $c$; and $PQS_{cn}^k$ represents the PQS of $r_n$ on class $c$ in the $k$-th ICR branch. $\lambda$ is a modulation factor that sets the relative weights of the class score $x_{cn}^{k-1}$ and the dual-context projection score $DCPS_{cn}$, and is given by formula (13), in which $t$ and 200000 denote the current training iteration and the total number of training iterations, respectively, and the constant 100 controls the rate at which $\lambda$ increases.

Pseudo-true value instances are mined in each ICR branch using the target candidate box quality score PQS: the mining algorithm proposed in MIST [Z. Ren et al., "Instance-aware, context-focused, and memory-efficient weakly supervised object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 10598-10607] uses the quality scores $PQS_{cn}^k$ to mine the set of pseudo-true value instances $PG_c^k = \{pg_{c1}^k, \ldots, pg_{cM}^k\}$ in the $k$-th ICR branch, where $pg_{cm}^k$ represents the $m$-th pseudo-true value instance in the set $PG_c^k$, and $M$ represents the number of pseudo-true value instances belonging to class $c$ in the $k$-th ICR branch. Mining with the PQS and the MIST algorithm yields higher-quality pseudo-true value instances and thereby improves the precision of the weakly supervised target detection model.

Step four: soft labels are assigned to all pseudo-true value instances for each ICR branch and training of 3 ICR branches is completed.

The soft label allocation strategy for each pseudo-true value instance is given by formula (14), in which: $\sigma(\cdot,\cdot)$ represents the intersection-over-union of two rectangular boxes, namely the ratio of the intersection area of the two boxes to their union area; the neighborhood set is the set of target candidate boxes $r_n$ that satisfy the following two conditions: 1) $r_n$ lies within the neighborhood of a pseudo-true value instance of class $c$, as measured by $\sigma$; 2) that pseudo-true value instance is the one nearest to $r_n$ in the class-$c$ pseudo-true value instance set; the nearest instance over all classes is the pseudo-true value instance nearest to $r_n$ in the pseudo-true value instance set of all classes; and $y_{cn}^k$ represents the soft label of target candidate box $r_n$ on class $c$ in the $k$-th ICR branch.
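The intersection-over-union measure and the search for the nearest pseudo-true value instance can be sketched as follows. The text does not define "nearest"; this example assumes it means the pseudo-true value instance with the largest intersection-over-union with the candidate box. The box layout `(x1, y1, x2, y2)` and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection area divided by union area of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nearest_pseudo_instance(box, pseudo_boxes):
    """Index and IoU of the pseudo-true value instance closest to `box`.

    "Closest" is taken to mean highest IoU, which is an assumption.
    """
    best = max(range(len(pseudo_boxes)), key=lambda m: iou(box, pseudo_boxes[m]))
    return best, iou(box, pseudo_boxes[best])
```

The returned overlap value is the kind of quantity a soft label can be built from: a box that overlaps its nearest pseudo-true value instance only partially would receive a label below 1 instead of inheriting the full one-hot label.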

The ICR branch training method based on soft label supervision comprises the following steps:

the training loss function of the kth ICR branch defined in equation (3) is modified according to the soft label assigned to each pseudo-true value instance of equation (14) as follows:

In FIG. 1, pg denotes a pseudo-true value instance, nb denotes a neighborhood instance of pg, and bg denotes a background instance. A neighborhood instance nb is a target candidate box r_n that satisfies the following two conditions: 1) r_n and a pseudo-true value instance satisfy the overlap condition on σ for neighborhood instances, where σ(·,·) denotes the intersection over union of two rectangular boxes, namely the ratio of the area of their intersection to the area of their union; 2) that pseudo-true value instance is the one nearest to r_n within the pseudo-true value instance set. A background instance bg is a target candidate box r_n that satisfies the following two conditions: 1) r_n and a pseudo-true value instance satisfy the overlap condition on σ for background instances; 2) that pseudo-true value instance is the one nearest to r_n within the pseudo-true value instance set. The one-hot label of a pseudo-true value instance, the soft label of a neighborhood instance, and the soft label of a background instance are as follows:

The target candidate box feature vectors are fed into a fully connected layer, and a Softmax operation along the category direction then yields the PCS matrix of the kth instance classification optimization (ICR) branch.

The 3 PCS matrices of the 3 ICR branches are averaged to obtain an average PCS matrix. In this average matrix, if a target candidate box attains its maximum score on a non-background category c and that score is greater than 0.5, the category of the target candidate box is c and its position is the one produced by the selective search algorithm; target candidate boxes that fail either condition are discarded. All target candidate boxes are traversed in this way to obtain an initial prediction result. Because the initial prediction result contains many redundant target candidate boxes, non-maximum suppression [J. Hosang, R. Benenson, and B. Schiele, "Learning non-maximum suppression," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 4507-4515] is applied to eliminate them, yielding the final category and position of each target of interest.
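The inference procedure above can be sketched as follows. The sketch averages the per-branch PCS matrices, keeps each target candidate box whose top category is a non-background category with a score above 0.5, and then removes redundant boxes with a plain greedy non-maximum suppression. The function names, the (x1, y1, x2, y2) box format, the placement of the background category in the last column, the per-category greedy suppression, and the 0.3 suppression threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_pcs(pcs_list):
    """Element-wise mean of the branch score matrices, each shaped [num_boxes][C + 1]."""
    k = len(pcs_list)
    first = pcs_list[0]
    return [[sum(m[n][c] for m in pcs_list) / k for c in range(len(first[n]))]
            for n in range(len(first))]


def detect(boxes, pcs_list, score_thr=0.5, nms_thr=0.3):
    """Return kept detections as (score, category, box), best score first."""
    avg = average_pcs(pcs_list)
    candidates = []
    for box, row in zip(boxes, avg):
        c = max(range(len(row)), key=row.__getitem__)
        # Assumption: the last column holds the background category.
        if c != len(row) - 1 and row[c] > score_thr:
            candidates.append((row[c], c, box))
    candidates.sort(key=lambda d: d[0], reverse=True)
    kept = []
    for score, c, box in candidates:
        # Greedy NMS: drop a box that overlaps a better box of the same category.
        if all(kc != c or iou(box, kb) <= nms_thr for _, kc, kb in kept):
            kept.append((score, c, box))
    return kept
```

Averaging the branches before thresholding means a box is kept only when the branches agree on its category on balance, which is the behavior the paragraph describes.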

The overall training loss function L of the model consists of two parts, the training loss function L_w of the weakly supervised deep detection network and the training loss functions of the 3 ICR branches, as follows:

The weakly supervised deep detection network and the 3 ICR branches are trained simultaneously, which completes the training of the whole weakly supervised target detection model. The high-resolution remote sensing image to be detected is then fed into the trained weakly supervised target detection model to obtain the category and position of each target of interest in the image.

The invention was implemented on the following hardware: an E5-2650 V4 CPU (2.2 GHz, 12×2 cores), 512 GB of memory, and 8 NVIDIA RTX Titan graphics cards. The software platform was Ubuntu 16.04, Python 3.7, and PyTorch 1.7.

To better demonstrate the performance of the present invention, Tables 1 and 2 compare it with 5 popular weakly supervised target detection methods on the NWPU VHR-10.v2 and DIOR datasets. The 5 methods are: WSDDN [H. Bilen and A. Vedaldi, "Weakly supervised deep detection networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2846-2854], OICR [P. Tang, X. Wang, X. Bai, and W. Liu, "Multiple instance detection network with online instance classifier refinement," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3059-3067], PCL [P. Tang, X. Wang, S. Bai, W. Shen, X. Bai, W. Liu, and A. L. Yuille, "PCL: Proposal cluster learning for weakly supervised object detection," IEEE Trans. Pattern Anal. Mach. Intell., 42 (1) (2020) 176-191], MELM [F. Wan, P. Wei, J. Jiao, Z. Han, and Q. Ye, "Min-entropy latent model for weakly supervised object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1297-1306], and MIST [Z. Ren, Z. Yu, X. Yang, M.-Y. Liu, Y. J. Lee, A. G. Schwing, and J. Kautz, "Instance-aware, context-focused, and memory-efficient weakly supervised object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2020, pp. 10598-10607]. mAP and CorLoc denote the mean average precision and the localization accuracy (correct localization), respectively.

Table 1. Comparison of the present invention with 5 popular algorithms on the NWPU VHR-10.v2 dataset in terms of mAP and CorLoc

Method	mAP	CorLoc
WSDDN	35.1	35.2
OICR	34.5	40.0
PCL	39.4	45.1
MELM	42.3	49.9
MIST	51.5	70.3
The present invention	63.8	74.6

Table 2. Comparison of the present invention with 5 popular algorithms on the DIOR dataset in terms of mAP and CorLoc

Method	mAP	CorLoc
WSDDN	13.3	32.4
OICR	16.5	34.8
PCL	18.2	41.5
MELM	18.7	43.3
MIST	22.2	43.6
The present invention	28.6	53.2

In Tables 1 and 2, a larger mAP value indicates higher detection accuracy and a larger CorLoc value indicates higher localization accuracy. Relative to MIST, the strongest of the compared methods, the present invention improves mAP by 12.3 and CorLoc by 4.3 on NWPU VHR-10.v2, and improves mAP by 6.4 and CorLoc by 9.6 on DIOR. FIG. 2 is a subjective comparison of the method of the present invention with the other weakly supervised target detection methods on the NWPU VHR-10.v2 and DIOR datasets, with the first three columns selected from the NWPU VHR-10.v2 dataset and the last three columns selected from the DIOR dataset. As can be seen from FIG. 2, the method of the present invention identifies and locates ground object targets in remote sensing images more accurately than the other 5 weakly supervised target detection methods.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. A remote sensing image weak supervision target detection method based on a pseudo-instance soft label is characterized by comprising the following steps:

step one: constructing a weakly supervised deep detection network by using a deep learning method, and adding at least two instance classification optimization branches to the weakly supervised deep detection network; using high-resolution remote sensing images annotated only with image categories as training images, generating a series of target candidate boxes for each training image by using a selective search algorithm, calculating the category scores of the target candidate boxes, calculating the category prediction score of each training image from the category scores of all of its target candidate boxes, and training the weakly supervised deep detection network by using the annotated image categories of the training images;

2. The remote sensing image weak supervision target detection method based on a pseudo-instance soft label according to claim 1, wherein the target candidate box category score is calculated as follows: the training image and its corresponding target candidate boxes are fed into the backbone network of the weakly supervised deep detection network to obtain a feature map of the image; a region-of-interest pooling operation yields the feature map of each target candidate box, and two fully connected layers then yield the feature vector of each target candidate box; the feature vectors of the target candidate boxes are fed into a category branch and a detection branch to obtain category scores and detection scores, respectively, and the category scores and detection scores are multiplied element by element to obtain the category score matrix of the target candidate boxes, wherein C denotes the number of target categories;

3. The remote sensing image weak supervision target detection method based on a pseudo-instance soft label according to claim 2, wherein the training of the weakly supervised deep detection network is supervised by the image category labels of the training images, and the loss function L_w is:

wherein y_c = 1 or 0 is the image category label of the training image, indicating whether the training image contains a target belonging to category c;

4. The remote sensing image weak supervision target detection method based on a pseudo-instance soft label according to claim 2 or 3, wherein K instance classification optimization branches are added after the two fully connected layers of the weakly supervised deep detection network, each instance classification optimization branch comprising a fully connected layer and a Softmax operation along the category direction; the kth instance classification optimization branch outputs a target candidate box category score matrix x^k, wherein k ∈ {1, 2, ..., K}, each element of x^k denotes the category score of target candidate box r_n on category c in the kth instance classification optimization branch, and category (C+1) denotes the background category; the target candidate box category score matrix x^k is used to mine the pseudo-instance label matrix y^(k+1) of the (k+1)th instance classification optimization branch, wherein each element of the label matrix indicates whether target candidate box r_n belongs to category c; the target candidate box category score matrix x is used to mine the pseudo-instance label matrix y^1 of the first instance classification optimization branch; and a loss function is used to train the kth instance classification optimization branch.

5. The method for detecting the weak supervision target of the remote sensing image based on the pseudo-instance soft label according to claim 4, wherein the calculating method of the double-context projection score in the second step is as follows:

6. The remote sensing image weak supervision target detection method based on a pseudo-instance soft label according to claim 5, wherein the internal context projection score ICPS_cn and the external context projection score ECPS_cn are calculated as follows:

generating a semantic segmentation map of the input training image by using a weakly supervised semantic segmentation algorithm; resizing target candidate box r_n to (1-α)×100% and (1+α)×100% of its original size to obtain the inner box ir_n and the outer box er_n of target candidate box r_n, respectively, wherein α ∈ [0, 1] adjusts the sizes of the inner and outer boxes; projecting target candidate box r_n, inner box ir_n, and outer box er_n onto the semantic segmentation map of the training image to obtain the instance-level segmentation maps S_n, IS_n, and ES_n, respectively, wherein W and H denote the width and height of target candidate box r_n, the widths and heights of inner box ir_n and outer box er_n are derived from W and H by the corresponding scaling, and ⌈·⌉ denotes the rounding-up operation;

7. The remote sensing image weak supervision target detection method based on a pseudo-instance soft label according to claim 6, wherein the internal context projection score ICPS_cn is calculated as follows:

wherein PIS_n denotes the rectangular map obtained by padding the instance-level segmentation map IS_n with zeros around its periphery, so that PIS_n has the same size as the instance-level segmentation map S_n;

the projections of the cth channel of the segmentation map ICS_n in the horizontal and vertical directions are denoted HICS_cn and VICS_cn, respectively, and:

the internal context projection score of target candidate box r_n on category c is:

ICPS_cn = Avg(HICS_cn) + Avg(VICS_cn);

wherein Avg(·) denotes the averaging operation;

ECS_n = ES_n - PS_n;

wherein PS_n denotes the rectangular map obtained by padding the instance-level segmentation map S_n with zeros around its periphery, so that PS_n has the same size as the instance-level segmentation map ES_n;

the projections of the cth channel of the segmentation map ECS_n in the horizontal and vertical directions are denoted HECS_cn and VECS_cn, respectively, and:

ECPS_cn = Avg(HECS_cn) + Avg(VECS_cn).

8. The remote sensing image weak supervision target detection method based on a pseudo-instance soft label according to any one of claims 5-7, wherein the target candidate box quality score is calculated by combining the target candidate box category score with the dual-context projection score, giving the target candidate box quality score:

9. The remote sensing image weak supervision target detection method based on a pseudo-instance soft label according to claim 8, wherein the soft label allocation strategy for each pseudo-true value instance is: the soft label of target candidate box r_n on category c in the kth instance classification optimization branch is

wherein σ(·,·) denotes the intersection over union of two rectangular boxes; the formula further uses the pseudo-true value instance nearest to target candidate box r_n within the pseudo-true value instance set of all categories, and the set of target candidate boxes r_n that satisfy the following two conditions: 1) r_n lies within the neighborhood of a pseudo-true value instance, as determined by σ; 2) that pseudo-true value instance is the one nearest to r_n within its pseudo-true value instance set.

10. The method for detecting the weak supervision target of the remote sensing image based on the soft label of the pseudo instance according to claim 9, wherein the method for training the optimization branch of the instance classification by using the soft label is as follows: