CN116630801A - Remote sensing image weak supervision target detection method based on pseudo-instance soft label - Google Patents


Publication number
CN116630801A
Authority
CN
China
Prior art keywords
target candidate
instance
target
score
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310568911.3A
Other languages
Chinese (zh)
Inventor
钱晓亮
林晨阳
霍豫
王慰
曾黎
程塨
姚西文
王芳
刘玉翠
岳伟超
任航丽
刘向龙
吴青娥
张秋闻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels, comprising the following steps: constructing a weakly supervised deep detection network and adding instance classification optimization branches to it; generating a series of target candidate boxes for each training image with a selective search algorithm, calculating the target candidate box class scores and then the image class prediction scores, and training the weakly supervised deep detection network; calculating a dual-context projection score for each target candidate box; obtaining a quality score for each target candidate box and using it to mine pseudo-ground-truth instances for each instance classification optimization branch; assigning soft labels to all pseudo-ground-truth instances and training the instance classification optimization branches with them; and constructing a target detection model from the trained weakly supervised deep detection network and the instance classification optimization branches, and using the model to obtain the category and position of each target of interest. The method can effectively mine high-quality target candidate boxes and improve the accuracy of weakly supervised target detection in high-resolution remote sensing images.

Description

Remote sensing image weak supervision target detection method based on pseudo-instance soft label
Technical Field
The invention relates to the technical field of deep learning and target detection, in particular to a remote sensing image weak supervision target detection method based on a pseudo-instance soft label.
Background
Target detection in high-resolution remote sensing images is one of the most important tasks in remote sensing image processing and a key technology for analyzing and mining remote sensing data. Its aim is to identify and locate high-value ground objects in high-resolution remote sensing images. In military applications, it can be used to monitor specific enemy areas, evaluate enemy combat capability at sea, and monitor the deployment of important enemy ports and of ships in sea areas. In civil applications, it can be used to inspect land planning regularly, to detect buildings, houses, and traffic in disaster areas, to analyze disaster conditions, and to draw up disaster relief plans. Target detection in remote sensing images therefore has important application value in both military and civil fields.
Fully supervised target detection methods for high-resolution remote sensing images require the class and position labels of every target instance in an image to supervise model training. However, high-resolution remote sensing images usually contain dense target instances, and annotating a class and position label for each instance is time-consuming and labor-intensive. Weakly supervised target detection methods need only image-level class labels to supervise model training and require no manual annotation of instance positions. This greatly reduces the annotation cost, so these methods have attracted increasing attention.
With the development of deep learning, weakly supervised target detection in high-resolution remote sensing images has advanced considerably, but many methods still suffer from two problems. First, many methods mine pseudo-ground-truth instances using only the target candidate box class scores, whose reliability is insufficient. Categories in remote sensing images are more easily confused than categories in natural images, because remote sensing images are captured by vertical (overhead) imaging and contain large, cluttered backgrounds. Furthermore, candidate boxes with high class scores tend to cover only the salient region of a target rather than the whole target. Second, in many methods the neighboring instances of a pseudo-ground-truth instance are given the same supervision label as the pseudo-ground-truth instance itself, which is prone to causing misclassification.
Disclosure of Invention
To address the poor reliability and insufficient accuracy of existing weakly supervised target detection methods, the invention provides a weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels. The designed dual-context projection score can effectively evaluate the quality of a target candidate box; the proposed target candidate box quality score, which fuses the traditional target candidate box class score with the dual-context projection score, can effectively mine high-quality target candidate boxes; and the designed pseudo-instance soft labels can effectively improve the accuracy of weakly supervised target detection in high-resolution remote sensing images.
To achieve the above purpose, the technical scheme of the invention is realized as follows: a weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels, comprising the following steps:
step one: constructing a weakly supervised deep detection network using a deep learning method, and adding at least two instance classification optimization branches to it; taking high-resolution remote sensing images annotated with image class labels as training images, generating a series of target candidate boxes for each training image with a selective search algorithm, calculating the class score of each target candidate box, calculating the class prediction score of each training image from the class scores of all its target candidate boxes, and training the weakly supervised deep detection network under the supervision of the annotated image class of each training image;
step two: calculating a dual-context projection score for each target candidate box from its internal context projection score and its external context projection score;
step three: combining the target candidate box class score with the dual-context projection score to obtain a target candidate box quality score, and using the quality score to mine high-quality pseudo-ground-truth instances for each instance classification optimization branch;
step four: assigning soft labels to all pseudo-ground-truth instances of each instance classification optimization branch, and training the instance classification optimization branches with the soft labels;
step five: constructing a target detection model from the trained weakly supervised deep detection network and the instance classification optimization branches, and performing target detection on the high-resolution remote sensing image to be detected with the target detection model to obtain the category and position of each target of interest.
Preferably, the target candidate box class score is calculated as follows: the training image and its target candidate boxes are fed into the backbone network of the weakly supervised deep detection network to obtain a feature map of the image; a region-of-interest pooling operation yields the feature map of each target candidate box, and two fully connected layers then produce the feature vector of each target candidate box; the feature vectors are fed into a class branch and a detection branch to obtain class scores and detection scores, which are multiplied element by element to obtain the target candidate box class score matrix $x \in \mathbb{R}^{C \times N}$, where $C$ denotes the number of target categories;
the class prediction score $\phi_c$ of each training image on category $c$ is:
$$\phi_c = \sum_{n=1}^{N} x_{cn}$$
where $x_{cn} \in x$ denotes the class score of target candidate box $r_n$ on category $c$; $N$ denotes the number of target candidate boxes, and $r_n$ is the $n$-th target candidate box, $n = 1, 2, \ldots, N$.
Preferably, the training of the weakly supervised deep detection network is supervised by the image class labels of the training images, using the loss function $L_w$:
$$L_w = -\sum_{c=1}^{C} \left[ y_c \log \phi_c + (1 - y_c) \log (1 - \phi_c) \right]$$
where $y_c \in \{0, 1\}$ is the image class label of the training image, indicating whether the training image contains a target belonging to category $c$;
the class branch comprises a fully connected layer and a Softmax operation along the class direction, and the detection branch comprises a fully connected layer and a Softmax operation along the instance direction.
Preferably, $K$ instance classification optimization branches are added after the two fully connected layers of the weakly supervised deep detection network, each comprising a fully connected layer and a Softmax operation along the class direction; the $k$-th instance classification optimization branch outputs a target candidate box class score matrix $x^k \in \mathbb{R}^{(C+1) \times N}$, where $k \in \{1, 2, \ldots, K\}$, $x^k_{cn}$ denotes the class score of target candidate box $r_n$ on category $c$ in the $k$-th instance classification optimization branch, and category $(C+1)$ denotes the background class; the class score matrix $x^k$ is used to mine the pseudo-instance label matrix $y^{k+1} \in \mathbb{R}^{(C+1) \times N}$ of the $(k+1)$-th instance classification optimization branch, whose element $y^{k+1}_{cn}$ indicates whether target candidate box $r_n$ belongs to category $c$: $y^{k+1}_{cn} = 1$ means that it does, and $y^{k+1}_{cn} = 0$ means that it does not; the pseudo-instance label matrix $y^1$ of the first instance classification optimization branch is mined using the target candidate box class score matrix $x$; the $k$-th instance classification optimization branch is trained with the loss function $L_r^k$.
Preferably, the dual-context projection score in step two is calculated as follows:
the initial dual-context projection score of target candidate box $r_n$ on category $c$ is: $\mathrm{IDCPS}_{cn} = \mathrm{ICPS}_{cn} - \mathrm{ECPS}_{cn}$;
the initial dual-context projection score $\mathrm{IDCPS}_{cn}$ is normalized to obtain the dual-context projection score $\mathrm{DCPS}_{cn} \in [0, 1]$:
$$\mathrm{DCPS}_{cn} = \frac{\mathrm{IDCPS}_{cn} - \mathrm{Min}\{\mathrm{IDCPS}\}}{\mathrm{Max}\{\mathrm{IDCPS}\} - \mathrm{Min}\{\mathrm{IDCPS}\}}$$
where $\mathrm{Max}\{\cdot\}$ and $\mathrm{Min}\{\cdot\}$ denote taking the maximum and the minimum, respectively; $\mathrm{ICPS}_{cn}$ denotes the internal context projection score of target candidate box $r_n$ on category $c$, and $\mathrm{ECPS}_{cn}$ denotes the external context projection score of target candidate box $r_n$ on category $c$.
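The normalization step can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function name `dual_context_projection_score` is hypothetical, and the sketch assumes that the maximum and minimum are taken over all scores passed in, since the text does not say which set they range over.

```python
import numpy as np

def dual_context_projection_score(icps, ecps, eps=1e-12):
    """Min-max normalise IDCPS = ICPS - ECPS into [0, 1].

    ASSUMPTION: Max{.} and Min{.} range over every entry of the input arrays.
    """
    idcps = icps - ecps                       # initial dual-context projection score
    lo, hi = idcps.min(), idcps.max()
    return (idcps - lo) / max(hi - lo, eps)   # eps guards against a zero range

# One category, three candidate boxes (illustrative numbers only).
icps = np.array([[2.0, 1.0, 0.5]])
ecps = np.array([[0.0, 0.5, 1.5]])
dcps = dual_context_projection_score(icps, ecps)
```

A box whose internal context is strong and whose external context is weak ends up close to 1, while a box that misses much of the target ends up close to 0.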
Preferably, the internal context projection score $\mathrm{ICPS}_{cn}$ and the external context projection score $\mathrm{ECPS}_{cn}$ are calculated as follows:
a semantic segmentation map of the input training image is generated with a weakly supervised semantic segmentation algorithm; target candidate box $r_n$ is resized to $(1-\alpha) \times 100\%$ and $(1+\alpha) \times 100\%$ of its original size to obtain its internal box $ir_n$ and external box $er_n$, respectively; target candidate box $r_n$, internal box $ir_n$, and external box $er_n$ are each projected onto the semantic segmentation map of the training image to obtain the instance-level segmentation maps $S_n$, $IS_n$, and $ES_n$, where $W$ and $H$ denote the width and height of target candidate box $r_n$, $\lceil (1-\alpha)W \rceil$ and $\lceil (1-\alpha)H \rceil$ denote the width and height of internal box $ir_n$, $\lceil (1+\alpha)W \rceil$ and $\lceil (1+\alpha)H \rceil$ denote the width and height of external box $er_n$, and $\lceil \cdot \rceil$ denotes rounding up;
the internal context projection score of target candidate box $r_n$ is calculated from the instance-level segmentation maps $S_n$ and $IS_n$, and its external context projection score is calculated from the instance-level segmentation maps $S_n$ and $ES_n$.
Preferably, the internal context projection score $\mathrm{ICPS}_{cn}$ is calculated as follows:
the segmentation map $ICS_n$ of the internal context area of target candidate box $r_n$ is: $ICS_n = S_n - PIS_n$;
where $PIS_n$ denotes the rectangular map obtained by padding $IS_n$ with zeros on all sides, so that $PIS_n$ has the same size as $S_n$;
the projections of the $c$-th channel $ICS_n^c$ of segmentation map $ICS_n$ along the horizontal and vertical directions are denoted $\mathrm{HICS}_{cn}$ and $\mathrm{VICS}_{cn}$, respectively, and:
$$\mathrm{HICS}_{cn} = \mathrm{Max}_h(ICS_n^c), \qquad \mathrm{VICS}_{cn} = \mathrm{Max}_v(ICS_n^c)$$
where $\mathrm{Max}_h(\cdot)$ and $\mathrm{Max}_v(\cdot)$ denote taking the maximum along the horizontal direction and the vertical direction, respectively;
the internal context projection score of target candidate box $r_n$ on category $c$ is:
$\mathrm{ICPS}_{cn} = \mathrm{Avg}(\mathrm{HICS}_{cn}) + \mathrm{Avg}(\mathrm{VICS}_{cn})$;
where $\mathrm{Avg}(\cdot)$ denotes the averaging operation;
the external context projection score $\mathrm{ECPS}_{cn}$ is calculated as follows:
the segmentation map $ECS_n$ of the external context area of target candidate box $r_n$ is obtained by the following formula:
$ECS_n = ES_n - PS_n$
where $PS_n$ denotes the rectangular map obtained by padding $S_n$ with zeros on all sides, so that $PS_n$ has the same size as $ES_n$;
the projections of the $c$-th channel $ECS_n^c$ of segmentation map $ECS_n$ along the horizontal and vertical directions are denoted $\mathrm{HECS}_{cn}$ and $\mathrm{VECS}_{cn}$, respectively, and:
$$\mathrm{HECS}_{cn} = \mathrm{Max}_h(ECS_n^c), \qquad \mathrm{VECS}_{cn} = \mathrm{Max}_v(ECS_n^c)$$
the external context projection score of target candidate box $r_n$ on category $c$ is:
$\mathrm{ECPS}_{cn} = \mathrm{Avg}(\mathrm{HECS}_{cn}) + \mathrm{Avg}(\mathrm{VECS}_{cn})$.
Preferably, the target candidate box quality score of target candidate box $r_n$ on category $c$ in the $k$-th instance classification optimization branch is obtained by combining the target candidate box class score with the dual-context projection score $\mathrm{DCPS}_{cn}$ of target candidate box $r_n$ on category $c$, where, when $k > 1$, the class score is $x^{k-1}_{cn}$, the class score of target candidate box $r_n$ on category $c$ in the $(k-1)$-th instance classification optimization branch, and when $k = 1$, the class score is $x_{cn}$ from the weakly supervised deep detection network; $\lambda$ is a modulation factor that sets the relative weights of the target candidate box class score and the dual-context projection score $\mathrm{DCPS}_{cn}$;
the pseudo-ground-truth instances are mined as follows: the mining algorithm proposed in MIST [Z. Ren et al., "Instance-aware, context-focused, and memory-efficient weakly supervised object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 10598-10607] uses the target candidate box quality scores to mine the set of pseudo-ground-truth instances in the $k$-th instance classification optimization branch, where $M$ denotes the number of pseudo-ground-truth instances belonging to category $c$ in the $k$-th instance classification optimization branch.
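The fusion formula itself did not survive in this text, so the following NumPy sketch only illustrates the idea under a stated assumption: that the quality score is a convex combination of the class score and the DCPS weighted by λ. The function name `quality_score` and the value `lam=0.5` are hypothetical, and the final `argmax` is a simple stand-in for, not a reproduction of, the MIST mining algorithm.

```python
import numpy as np

def quality_score(prev_scores, dcps, lam=0.5):
    """Fuse class scores with DCPS.

    ASSUMPTION: convex combination; the source only states that lambda
    sets the weights of the two terms.
    """
    return lam * prev_scores + (1.0 - lam) * dcps

# One category, three candidate boxes (illustrative numbers only).
prev = np.array([[0.9, 0.6, 0.1]])   # class scores from the previous branch
dcps = np.array([[0.2, 1.0, 0.0]])   # dual-context projection scores
pqs = quality_score(prev, dcps, lam=0.5)
best = int(pqs[0].argmax())          # top-scoring box on this category
```

In this toy case the box with the highest class score (index 0) covers the target poorly, so the fused score prefers index 1, which is the behaviour the quality score is meant to encourage.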
Preferably, the soft label allocation strategy for each pseudo-ground-truth instance is as follows: the soft label of target candidate box $r_n$ on category $c$ in the $k$-th instance classification optimization branch is defined using $\sigma(\cdot, \cdot)$, the intersection over union (IoU) of two rectangular boxes, and the pseudo-ground-truth instance that, among the pseudo-ground-truth instance sets of all categories, is nearest to target candidate box $r_n$. The definition also uses the set of target candidate boxes $r_n$ that satisfy the following two conditions: 1) target candidate box $r_n$ lies within the neighborhood of the pseudo-ground-truth instance; 2) that pseudo-ground-truth instance is the one nearest to target candidate box $r_n$ in the pseudo-ground-truth instance set.
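The two ingredients named above, the IoU function σ(·,·) and the lookup of the nearest pseudo-ground-truth instance, can be sketched in plain Python. The soft-label formula is not recoverable from this text, so the sketch makes three assumptions that should not be read as the patented rule: "nearest" is measured by IoU, the neighborhood is an IoU threshold (`thresh=0.5` is a hypothetical value), and the soft label equals the IoU itself.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nearest_pseudo_gt(box, pseudo_gts):
    """Index and IoU of the pseudo-ground-truth box that overlaps `box` most."""
    overlaps = [iou(box, g) for g in pseudo_gts]
    idx = max(range(len(overlaps)), key=overlaps.__getitem__)
    return idx, overlaps[idx]

def soft_label(box, pseudo_gts, thresh=0.5):
    """ASSUMPTION: label = IoU with the nearest pseudo GT inside its neighbourhood."""
    idx, overlap = nearest_pseudo_gt(box, pseudo_gts)
    return (idx, overlap) if overlap >= thresh else (None, 0.0)

pseudo_gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
near = soft_label((0, 0, 10, 8), pseudo_gts)     # overlaps the first pseudo GT
far = soft_label((50, 50, 60, 60), pseudo_gts)   # overlaps nothing
```

Under these assumptions a neighboring box receives a label below 1 that shrinks as it drifts away from the pseudo-ground-truth instance, instead of inheriting the hard label 1.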
Preferably, the instance classification optimization branches are trained with the soft labels as follows:
soft labels are assigned to each pseudo-ground-truth instance according to the soft label allocation strategy; the training loss function of the $k$-th instance classification optimization branch is defined in terms of the loss weight $w_n^k$ of target candidate box $r_n$, the highest target candidate box class score on category $c$ in the $(k-1)$-th instance classification optimization branch, and the class score of target candidate box $r_n$ on category $c$.
Compared with the prior art, the invention has the following beneficial effects: the dual-context projection score, designed from the results of weakly supervised semantic segmentation, can effectively evaluate the quality of a target candidate box; the proposed target candidate box quality score, which fuses the traditional target candidate box class score with the dual-context projection score, can effectively mine high-quality target candidate boxes; and the pseudo-instance soft labels, designed from the spatial distance between a pseudo-ground-truth instance and its nearest-neighbor target candidate boxes, can effectively improve the classification accuracy of weakly supervised target detection in high-resolution remote sensing images.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a subjective comparison chart of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a remote sensing image weak supervision target detection method based on a pseudo-instance soft label includes the steps:
step one: constructing a weakly supervised deep detection network using a deep learning method, and adding at least two instance classification optimization branches to it; taking high-resolution remote sensing images annotated with image class labels as training images, generating a series of target candidate boxes for each training image with a selective search algorithm, calculating the class score of each target candidate box, calculating the class prediction score of each training image from the class scores of all its target candidate boxes, and training the weakly supervised deep detection network under the supervision of the annotated image class of each training image;
The training and test samples used in the implementation of the invention are high-resolution remote sensing images with category labels only. The images come from the NWPU VHR-10.v2 dataset [G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7405-7415, Sept. 2016] and the DIOR dataset [K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, "Object detection in optical remote sensing images: A survey and a new benchmark," ISPRS J. Photogramm. Remote Sens., vol. 159, pp. 296-307, Jan. 2020]. The NWPU VHR-10.v2 dataset contains 10 target categories; each image is 400 × 400 pixels, and the training, validation, and test sets contain 679, 200, and 293 images, respectively. The DIOR dataset contains 20 target categories; each image is 800 × 800 pixels, and the training, validation, and test sets contain 5862, 5863, and 11738 images, respectively. In the practice of the invention, the training and validation sets are used to train the weakly supervised target detection model, and the test set is used to test it.
The weakly supervised deep detection network [H. Bilen and A. Vedaldi, "Weakly supervised deep detection networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2846-2854] is shown in FIG. 1. First, a selective search algorithm [J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders, "Selective search for object recognition," Int. J. Comput. Vis., vol. 104, no. 2, pp. 154-171, Apr. 2013] is used to generate a series of target candidate boxes $R = \{r_1, \ldots, r_n, \ldots, r_N\}$ for each high-resolution remote sensing image, where $N$ denotes the number of target candidate boxes and $r_n$ is the $n$-th target candidate box, $n = 1, 2, \ldots, N$. The selective search algorithm requires no training; running it on an image directly yields the initial target candidate boxes of that image.
The high-resolution remote sensing image and its target candidate boxes are fed into the backbone network of the weakly supervised deep detection network to obtain a feature map of the image; a region-of-interest pooling operation yields the feature map of each target candidate box, and two fully connected layers finally produce the feature vector of each target candidate box. Feature vectors obtained in this way are more robust and represent the features of the target candidate boxes. The feature vectors are fed into a class branch (a fully connected layer followed by a Softmax operation along the class direction) and a detection branch (a fully connected layer followed by a Softmax operation along the instance direction) to obtain class scores and detection scores, which are multiplied element by element to obtain the target candidate box class scores (Proposal Class Score, PCS), denoted $x \in \mathbb{R}^{C \times N}$, where $C = 20$ denotes the number of target categories. The class prediction score $\phi_c$ of each training image on category $c$ is calculated as:
$$\phi_c = \sum_{n=1}^{N} x_{cn} \qquad (1)$$
where $x_{cn} \in x$ denotes the PCS of target candidate box $r_n$ on category $c$.
Finally, the training of the weakly supervised deep detection network is supervised by the image class labels of the training images, using the loss function $L_w$ defined as:
$$L_w = -\sum_{c=1}^{C} \left[ y_c \log \phi_c + (1 - y_c) \log (1 - \phi_c) \right] \qquad (2)$$
where $y_c \in \{0, 1\}$ is the image class label of the input training image, indicating whether the input training image contains a target belonging to category $c$. The loss function $L_w$ measures well the similarity between the predicted distribution output by the model and the true distribution, which makes training more efficient and yields more accurate image-level classification.
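The two-branch scoring and the image-level loss can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the logits stand in for the outputs of the two fully connected layers, the function names are hypothetical, and the loss is assumed to be the binary cross-entropy that matches the 0/1 labels $y_c$.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def wsddn_scores(cls_logits, det_logits):
    """Both inputs have shape (C, N): C categories, N candidate boxes."""
    cls = softmax(cls_logits, axis=0)   # Softmax along the class direction
    det = softmax(det_logits, axis=1)   # Softmax along the instance direction
    x = cls * det                       # element-wise product: PCS matrix
    phi = x.sum(axis=1)                 # image-level score per category
    return x, phi

def image_level_loss(phi, y, eps=1e-6):
    """ASSUMPTION: binary cross-entropy over the C image-level labels."""
    phi = np.clip(phi, eps, 1.0 - eps)
    return float(-(y * np.log(phi) + (1 - y) * np.log(1 - phi)).sum())

rng = np.random.default_rng(0)
C, N = 20, 50                            # random logits, for illustration only
x, phi = wsddn_scores(rng.normal(size=(C, N)), rng.normal(size=(C, N)))
y = np.zeros(C)
y[3] = 1.0                               # the image contains category 3 only
loss = image_level_loss(phi, y)
```

Because the detection branch sums to 1 over the boxes of each category and the class branch is at most 1, every image-level score stays strictly between 0 and 1, which is what the logarithms in the loss require.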
As shown in FIG. 1, on the basis of the weakly supervised deep detection network, the invention adds $K = 3$ instance classification optimization (Instance Classifier Refinement, ICR) branches [P. Tang, X. Wang, X. Bai, and W. Liu, "Multiple instance detection network with online instance classifier refinement," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 2843-2851] after the two fully connected layers of the network. Each ICR branch comprises a fully connected layer and a Softmax operation along the class direction. The $k$-th ICR branch outputs a PCS matrix, denoted $x^k \in \mathbb{R}^{(C+1) \times N}$, where $k \in \{1, 2, 3\}$, $x^k_{cn}$ denotes the PCS of target candidate box $r_n$ on category $c$ in the $k$-th ICR branch, and category 21 denotes the background class. The PCS matrix $x^k$ is used to mine the pseudo-instance label matrix $y^{k+1} \in \mathbb{R}^{(C+1) \times N}$ of the $(k+1)$-th ICR branch, whose element $y^{k+1}_{cn}$ indicates whether target candidate box $r_n$ belongs to category $c$: $y^{k+1}_{cn} = 1$ (belongs) or $y^{k+1}_{cn} = 0$ (does not belong). The pseudo-instance label matrix $y^1$ of the first ICR branch is mined using the target candidate box class scores $x$. Finally, the $k$-th ICR branch is trained with the loss function $L_r^k$:
$$L_r^k = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C+1} w_n^k \, y^k_{cn} \log x^k_{cn} \qquad (3)$$
where $w_n^k$ denotes the loss weight of target candidate box $r_n$, given by the highest PCS on category $c$ in the $(k-1)$-th ICR branch.
Adding instance classification optimization branches improves the performance of the model; experimental results show that model accuracy is highest when 3 branches are added. The loss function $L_r^k$ measures well the similarity between the predicted distribution output by the model and the true distribution, which makes training more efficient and yields more accurate instance-level classification.
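A weighted cross-entropy of the kind used to train refinement branches in the cited ICR work can be sketched as follows. The exact loss of this document is not fully legible, so the sketch assumes the standard form: a per-box cross-entropy against 0/1 pseudo labels, scaled by a per-box weight and averaged over the boxes. The function name `icr_loss` and all numbers are illustrative.

```python
import numpy as np

def icr_loss(x_k, y_k, w_k, eps=1e-6):
    """x_k, y_k: (C+1, N) scores and 0/1 pseudo labels; w_k: (N,) loss weights.

    ASSUMPTION: OICR-style weighted cross-entropy averaged over the N boxes.
    """
    n_boxes = x_k.shape[1]
    log_x = np.log(np.clip(x_k, eps, 1.0))
    per_box = -(y_k * log_x).sum(axis=0)      # cross-entropy of each box
    return float((w_k * per_box).sum() / n_boxes)

# Two categories plus background (last row), two candidate boxes.
x_k = np.array([[0.7, 0.2],
                [0.2, 0.1],
                [0.1, 0.7]])
y_k = np.array([[1.0, 0.0],
                [0.0, 0.0],
                [0.0, 1.0]])                  # box 0 -> category 1, box 1 -> background
w_k = np.array([0.9, 0.9])                    # e.g. score of the mined top box
loss_k = icr_loss(x_k, y_k, w_k)
```

The weight lets pseudo labels mined from a confident top-scoring box contribute more to the gradient than labels mined from an uncertain one.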
Step two: a Dual context projection score (Dual-context Projection Score, DCPS) is calculated for each target candidate box based on the internal context projection score and the external context projection score for the target candidate box.
The semantic segmentation map of the input training image was generated using a weakly supervised semantic segmentation algorithm [ J.Ahn, S.Cho, and S.Kwak, "Weakly supervised learning of instance segmentation with inter-pixel references," in Proc.IEEE/CVF Conf Comput. Vis. Pattern Recognit.,2019, pp.2209-2218 ]. The weak supervision semantic segmentation algorithm has high detection performance, so that the generated semantic segmentation graph is more accurate, and the accuracy of the subsequent calculation of the double-projection context score can be improved.
The target candidate box $r_n$ is resized to (1-0.2)×100% and (1+0.2)×100% of its original size to obtain its inner box $ir_n$ and its outer box $er_n$, respectively. The target candidate box $r_n$, the inner box $ir_n$ and the outer box $er_n$ are then projected onto the semantic segmentation map of the input training image to obtain their instance-level segmentation maps, labeled $S_n$, $IS_n$ and $ES_n$, respectively. Here $W$ and $H$ denote the width and height of the target candidate box $r_n$, $\lceil (1-0.2)W \rceil$ and $\lceil (1-0.2)H \rceil$ denote the width and height of the inner box $ir_n$, $\lceil (1+0.2)W \rceil$ and $\lceil (1+0.2)H \rceil$ denote the width and height of the outer box $er_n$, and $\lceil \cdot \rceil$ denotes the rounding-up operation.
The instance-level segmentation maps $S_n$ and $IS_n$ are used to calculate the internal context projection score (Internal Context Projection Score, ICPS) of the target candidate box $r_n$, and the instance-level segmentation maps $S_n$ and $ES_n$ are used to calculate its external context projection score (External Context Projection Score, ECPS).
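As an illustration of the resizing step, the following is a minimal sketch rather than the implementation of the invention. It assumes that a box is stored as (x1, y1, x2, y2), that the inner and outer boxes share the center of the target candidate box, and that no clipping to the image border is applied; the function names scale_box and inner_outer_boxes are hypothetical.

```python
import math

def scale_box(box, scale):
    """Scale a box (x1, y1, x2, y2) about its center (assumed convention).

    The new width and height are rounded up, as in the text.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # round() removes floating-point noise before the rounding-up operation
    new_w = math.ceil(round(scale * w, 6))
    new_h = math.ceil(round(scale * h, 6))
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return (cx - new_w / 2.0, cy - new_h / 2.0,
            cx + new_w / 2.0, cy + new_h / 2.0)

def inner_outer_boxes(box, alpha=0.2):
    """Return the inner box (scale 1 - alpha) and the outer box (scale 1 + alpha)."""
    return scale_box(box, 1.0 - alpha), scale_box(box, 1.0 + alpha)

inner, outer = inner_outer_boxes((10, 10, 20, 20))
print(inner, outer)
```

In practice the resulting boxes would also need to be clipped to the image before cropping the semantic segmentation map; that step is omitted here.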
The ICPS is calculated as follows. The segmentation map $ICS_n$ of the internal context region of the target candidate box $r_n$ is obtained by:
$ICS_n = S_n - PIS_n$ (4)
where $PIS_n$ denotes the rectangular map obtained by padding $IS_n$ with zeros on all sides, so that $PIS_n$ and $S_n$ have the same size. The projections of the c-th channel $ICS_{cn}$ of the segmentation map $ICS_n$ in the horizontal and vertical directions are labeled $HICS_{cn}$ and $VICS_{cn}$, respectively, and are calculated as:
$HICS_{cn} = \mathrm{Max}_h(ICS_{cn}), \quad VICS_{cn} = \mathrm{Max}_v(ICS_{cn})$ (5)
where $\mathrm{Max}_h(\cdot)$ and $\mathrm{Max}_v(\cdot)$ denote taking the maximum value in the horizontal direction and the vertical direction, respectively.
Finally, the ICPS of the target candidate box $r_n$ on category c, labeled $ICPS_{cn}$, is calculated as:
$ICPS_{cn} = \mathrm{Avg}(HICS_{cn}) + \mathrm{Avg}(VICS_{cn})$ (6)
where $\mathrm{Avg}(\cdot)$ denotes the averaging operation.
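The internal context projection score can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: each map is a 2-D list of numbers, the zero padding keeps the inner map centered inside the larger one, and the helper names pad_to and context_projection_score are hypothetical.

```python
def pad_to(inner, height, width):
    """Zero-pad a 2-D map to height x width, keeping it centered (assumption)."""
    ih, iw = len(inner), len(inner[0])
    top, left = (height - ih) // 2, (width - iw) // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(ih):
        for j in range(iw):
            out[top + i][left + j] = inner[i][j]
    return out

def context_projection_score(big, small):
    """Avg of per-row maxima plus avg of per-column maxima of (big - padded small)."""
    h, w = len(big), len(big[0])
    padded = pad_to(small, h, w)
    diff = [[big[i][j] - padded[i][j] for j in range(w)] for i in range(h)]
    # One maximum per row and one per column. Which of the two is called
    # "horizontal" does not change the result, since both averages are summed.
    row_max = [max(row) for row in diff]
    col_max = [max(diff[i][j] for i in range(h)) for j in range(w)]
    return sum(row_max) / h + sum(col_max) / w

# Class response covers the whole box: the context ring stays active.
s_full = [[1.0] * 4 for _ in range(4)]
# Class response only in the center: the context ring is empty.
s_center = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
inner_map = [[1.0, 1.0], [1.0, 1.0]]
print(context_projection_score(s_full, inner_map))
print(context_projection_score(s_center, inner_map))
```

Under the same assumptions, the same helper applies to the external context projection score by passing the outer-box map as the larger map and the map of the target candidate box as the smaller one.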
The ECPS is calculated as follows. The segmentation map $ECS_n$ of the external context region of the target candidate box $r_n$ is obtained by:
$ECS_n = ES_n - PS_n$ (7)
where $PS_n$ denotes the rectangular map obtained by padding $S_n$ with zeros on all sides, so that $PS_n$ and $ES_n$ have the same size. The projections of the c-th channel $ECS_{cn}$ of the segmentation map $ECS_n$ in the horizontal and vertical directions are labeled $HECS_{cn}$ and $VECS_{cn}$, respectively, and are calculated as:
$HECS_{cn} = \mathrm{Max}_h(ECS_{cn}), \quad VECS_{cn} = \mathrm{Max}_v(ECS_{cn})$ (8)
Finally, the ECPS of the target candidate box $r_n$ on category c, labeled $ECPS_{cn}$, is calculated as:
$ECPS_{cn} = \mathrm{Avg}(HECS_{cn}) + \mathrm{Avg}(VECS_{cn})$ (9)
The DCPS is calculated as follows. The initial DCPS of the target candidate box $r_n$ on category c, labeled $IDCPS_{cn}$, is calculated as:
$IDCPS_{cn} = ICPS_{cn} - ECPS_{cn}$ (10)
Finally, the initial dual-context projection score $IDCPS_{cn}$ is normalized to obtain $DCPS_{cn} \in [0, 1]$:
$DCPS_{cn} = \dfrac{IDCPS_{cn} - \mathrm{Min}\{IDCPS\}}{\mathrm{Max}\{IDCPS\} - \mathrm{Min}\{IDCPS\}}$ (11)
where $\mathrm{Max}\{\cdot\}$ and $\mathrm{Min}\{\cdot\}$ denote taking the maximum value and the minimum value, respectively.
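The subtraction and normalization can be sketched as follows. This is a minimal illustration, not the implementation of the invention. It assumes that the maximum and minimum are taken over the target candidate boxes of one image for a single category, and that all scores are set to 0 when every initial score is identical; neither choice is stated in the text, and the function name dual_context_scores is hypothetical.

```python
def dual_context_scores(icps, ecps):
    """Min-max normalise ICPS - ECPS over a list of candidate boxes.

    icps, ecps: per-box scores for one category (assumed layout).
    Returns values in [0, 1]; all zeros if every initial score is equal
    (an assumption made here to avoid division by zero).
    """
    idcps = [i - e for i, e in zip(icps, ecps)]
    lo, hi = min(idcps), max(idcps)
    if hi == lo:
        return [0.0 for _ in idcps]
    return [(v - lo) / (hi - lo) for v in idcps]

# Three boxes: strong inside / weak outside scores best after normalisation.
print(dual_context_scores([2.0, 1.0, 0.5], [0.0, 0.5, 1.5]))
```

A box whose internal context responds strongly to the category while its external context does not therefore receives a score near 1, which matches the intent of favoring boxes that tightly enclose a whole object.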
Step three: the target candidate box category score and the dual-context projection score are combined to obtain the target candidate box quality score, which is used to mine high-quality pseudo-ground-truth instances for each ICR branch.
The target candidate box quality score (Proposal Quality Score, PQS) is calculated as follows:
The PQS is obtained by combining the target candidate box category score (PCS) and the dual-context projection score (DCPS), and the calculation formula is as follows:
where, when k > 1, $x^{k-1}_{cn}$ denotes the PCS of the target candidate box $r_n$ on category c in the (k-1)-th ICR branch; when k = 1, $x^{0}_{cn} = x_{cn}$; $DCPS_{cn}$ denotes the DCPS of the target candidate box $r_n$ on category c; and $PQS^{k}_{cn}$ denotes the PQS of the target candidate box $r_n$ on category c in the k-th ICR branch. $\lambda$ is a modulation factor that sets the relative weights of the target candidate box category score $x^{k-1}_{cn}$ and the dual-context projection score $DCPS_{cn}$, and is given by:
where t and 200000 denote the current training iteration and the total number of training iterations, respectively, and the constant 100 controls the rate at which the modulation factor $\lambda$ increases.
The target candidate box quality score PQS is used to mine pseudo-ground-truth instances in each ICR branch: the mining algorithm proposed in MIST [Z. Ren et al., "Instance-aware, context-focused, and memory-efficient weakly supervised object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 10598-10607] is applied to the target candidate box quality scores $PQS^{k}_{cn}$ to mine the set of pseudo-ground-truth instances in the k-th ICR branch, where M denotes the number of pseudo-ground-truth instances belonging to category c in the k-th ICR branch. Driving the MIST mining algorithm with the quality score PQS yields higher-quality pseudo-ground-truth instances, which improves the accuracy of the weakly supervised target detection model.
Step four: soft labels are assigned to all pseudo-ground-truth instances of each ICR branch, and the training of the 3 ICR branches is completed.
Soft labels are assigned to the pseudo-ground-truth instances as follows:
The soft label allocation strategy for each pseudo-ground-truth instance is:
where $\sigma(\cdot,\cdot)$ denotes the intersection over union (IoU) of two rectangular boxes, that is, the ratio of the area of their intersection to the area of their union. The neighborhood set is the set of target candidate boxes $r_n$ that satisfy the following two conditions: 1) the target candidate box $r_n$ lies within the neighborhood of a pseudo-ground-truth instance; 2) that pseudo-ground-truth instance is the one closest to the target candidate box $r_n$ in the pseudo-ground-truth instance set. The strategy also uses the pseudo-ground-truth instance closest to the target candidate box $r_n$ among the pseudo-ground-truth instances of all categories, and it yields the soft label of the target candidate box $r_n$ on category c in the k-th ICR branch.
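The intersection over union $\sigma(\cdot,\cdot)$ defined above can be sketched as follows. This is a minimal illustration that assumes boxes are stored as (x1, y1, x2, y2) with x2 > x1 and y2 > y1; the function name iou is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The value is 1 for identical boxes and 0 for disjoint boxes, so a neighborhood test on a target candidate box reduces to comparing this value against a threshold.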
The ICR branches are trained under soft-label supervision as follows:
According to the soft label assigned to each pseudo-ground-truth instance by equation (14), the training loss function of the k-th ICR branch defined in equation (3) is modified as follows:
In FIG. 1, pg denotes a pseudo-ground-truth instance, nb denotes a neighborhood instance of pg, and bg denotes a background instance. A neighborhood instance nb is a target candidate box $r_n$ that satisfies the following two conditions: 1) the intersection over union $\sigma(\cdot,\cdot)$ between the target candidate box $r_n$ and a pseudo-ground-truth instance meets the neighborhood condition, where $\sigma(\cdot,\cdot)$ is the ratio of the intersection area of two rectangular boxes to their union area; 2) that pseudo-ground-truth instance is the one closest to the target candidate box $r_n$ in the pseudo-ground-truth instance set. A background instance bg is a target candidate box $r_n$ that satisfies the following two conditions: 1) the intersection over union between the target candidate box $r_n$ and a pseudo-ground-truth instance meets the background condition; 2) that pseudo-ground-truth instance is the one closest to the target candidate box $r_n$ in the pseudo-ground-truth instance set. The one-hot label of the pseudo-ground-truth instance, the soft label of the neighborhood instance and the soft label of the background instance are as follows:
The feature vectors of the target candidate boxes are input into a fully connected layer, and the PCS matrix of the k-th instance classification optimization branch is then obtained through a Softmax operation in the category direction.
The 3 PCS matrices of the 3 ICR branches are averaged to obtain an average PCS matrix. If, in this average matrix, a target candidate box attains its maximum score on a category c that is not the background category, and that score is greater than 0.5, the category of the target candidate box is c and its position is the position produced by the selective search algorithm; target candidate boxes that do not meet both conditions are discarded. All target candidate boxes are traversed in this way to obtain an initial prediction result. Because the initial prediction result contains many redundant target candidate boxes, a non-maximum suppression operation [J. Hosang, R. Benenson, and B. Schiele, "Learning non-maximum suppression," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 4507-4515] is performed to eliminate them and obtain the final category and position of each target of interest.
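The decision rule described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention. It assumes that each PCS matrix is a list of per-box rows with the background category in the last column, and it substitutes classic greedy per-category non-maximum suppression for the cited method as a simple stand-in. The IoU threshold of 0.3 and all function names are hypothetical.

```python
def average_scores(branch_scores):
    """Element-wise mean of K score matrices, each N x (C+1)."""
    k = len(branch_scores)
    n, c1 = len(branch_scores[0]), len(branch_scores[0][0])
    return [[sum(b[i][j] for b in branch_scores) / k for j in range(c1)]
            for i in range(n)]

def initial_predictions(avg, boxes, thresh=0.5):
    """Keep boxes whose top category is not background and scores above thresh."""
    background = len(avg[0]) - 1  # assumed: background is the last column
    preds = []
    for box, row in zip(boxes, avg):
        c = max(range(len(row)), key=row.__getitem__)
        if c != background and row[c] > thresh:
            preds.append((box, c, row[c]))
    return preds

def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(preds, iou_thresh=0.3):
    """Greedy per-category NMS (stand-in; threshold is a hypothetical value)."""
    kept = []
    for p in sorted(preds, key=lambda p: p[2], reverse=True):
        if all(p[1] != q[1] or iou(p[0], q[0]) <= iou_thresh for q in kept):
            kept.append(p)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
branch_a = [[0.9, 0.05, 0.05], [0.6, 0.1, 0.3], [0.1, 0.2, 0.7]]
branch_b = [[0.8, 0.1, 0.1], [0.7, 0.1, 0.2], [0.2, 0.1, 0.7]]
branch_c = [[1.0, 0.0, 0.0], [0.8, 0.1, 0.1], [0.0, 0.3, 0.7]]
avg = average_scores([branch_a, branch_b, branch_c])
detections = greedy_nms(initial_predictions(avg, boxes))
print(detections)
```

In this toy input the third box is discarded because its top category is the background, and the second box is suppressed because it overlaps a higher-scoring box of the same category.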
Step five: a target detection model is constructed from the trained weakly supervised deep detection network and the instance classification optimization branches, and the target detection model is used to perform target detection on the high-resolution remote sensing image to be detected to obtain the category and position of each target of interest.
The overall training loss function L of the model consists of two parts: the training loss function $L_w$ of the weakly supervised deep detection network and the training loss functions of the 3 ICR branches, as follows:
The weakly supervised deep detection network and the 3 ICR branches are trained simultaneously, which completes the training of the whole weakly supervised target detection model. The high-resolution remote sensing image to be detected is then fed into the trained weakly supervised target detection model to obtain the category and position of each target of interest in the image.
Hardware configuration used to implement the invention: E5-2650 v4 CPU (2.2 GHz, 12×2 cores), 512 GB of memory and 8 NVIDIA RTX Titan graphics cards. Software platform: Ubuntu 16.04, Python 3.7 and PyTorch 1.7.
To better demonstrate the performance of the invention, as shown in Tables 1 and 2, the invention is compared with 5 popular target detection methods on the NWPU VHR-10.v2 and DIOR datasets. The 5 methods are: WSDDN [H. Bilen and A. Vedaldi, "Weakly supervised deep detection networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2846-2854], OICR [P. Tang, X. Wang, X. Bai, and W. Liu, "Multiple instance detection network with online instance classifier refinement," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3059-3067], PCL [P. Tang, X. Wang, S. Bai, W. Shen, X. Bai, W. Liu, and A. L. Yuille, "PCL: Proposal cluster learning for weakly supervised object detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 1, pp. 176-191, 2020], MELM [F. Wan, P. Wei, J. Jiao, Z. Han, and Q. Ye, "Min-entropy latent model for weakly supervised object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1297-1306], and MIST [Z. Ren, Z. Yu, X. Yang, M.-Y. Liu, Y. J. Lee, A. G. Schwing, and J. Kautz, "Instance-aware, context-focused, and memory-efficient weakly supervised object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 10598-10607]. mAP and CorLoc denote the mean average precision and the correct localization rate, respectively.
Table 1. Comparison of the invention with 5 popular algorithms on the NWPU VHR-10.v2 dataset in terms of mAP and CorLoc (%)
Method          mAP     CorLoc
WSDDN           35.1    35.2
OICR            34.5    40.0
PCL             39.4    45.1
MELM            42.3    49.9
MIST            51.5    70.3
The invention   63.8    74.6
Table 2. Comparison of the invention with 5 popular algorithms on the DIOR dataset in terms of mAP and CorLoc (%)
Method          mAP     CorLoc
WSDDN           13.3    32.4
OICR            16.5    34.8
PCL             18.2    41.5
MELM            18.7    43.3
MIST            22.2    43.6
The invention   28.6    53.2
In Tables 1 and 2, a larger mAP value indicates higher detection accuracy, and a larger CorLoc value indicates higher localization accuracy. FIG. 2 is a subjective comparison of the method of the invention with other weakly supervised target detection methods on the NWPU VHR-10.v2 and DIOR datasets; the first three columns are selected from the NWPU VHR-10.v2 dataset and the last three columns from the DIOR dataset. As can be seen from FIG. 2, the method of the invention identifies and locates ground-object targets in remote sensing images more accurately than the other 5 weakly supervised target detection methods.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels, characterized by comprising the following steps:
step one: constructing a weakly supervised deep detection network using a deep learning method, and adding at least two instance classification optimization branches to the weakly supervised deep detection network; using high-resolution remote sensing images annotated only with image categories as training images, generating a series of target candidate boxes for each training image using a selective search algorithm, calculating the category scores of the target candidate boxes, calculating the category prediction score of each training image from the category scores of all its target candidate boxes, and training the weakly supervised deep detection network using the annotated image categories of each training image;
step two: calculating a dual-context projection score for each target candidate box from the internal context projection score and the external context projection score of the target candidate box;
step three: combining the target candidate box category score and the dual-context projection score to obtain a target candidate box quality score, and using the target candidate box quality score to mine high-quality pseudo-ground-truth instances for each instance classification optimization branch;
step four: assigning soft labels to all pseudo-ground-truth instances of each instance classification optimization branch, and training the instance classification optimization branches using the soft labels;
step five: constructing a target detection model from the trained weakly supervised deep detection network and the instance classification optimization branches, and performing target detection on the high-resolution remote sensing image to be detected using the target detection model to obtain the category and position of each target of interest.
2. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 1, characterized in that the target candidate box category score is calculated as follows: the training image and its target candidate boxes are fed into the backbone network of the weakly supervised deep detection network to obtain a feature map of the image; a region-of-interest pooling operation is performed to obtain the feature maps of the target candidate boxes, and the feature vectors of the target candidate boxes are obtained through two fully connected layers; the feature vectors of the target candidate boxes are fed into a category branch and a detection branch, respectively, to obtain category scores and detection scores, and the category scores and the detection scores are multiplied element by element to obtain the target candidate box category score matrix $x$, where C denotes the number of target categories;
the category prediction score $\phi_c$ of each training image on category c is:
$\phi_c = \sum_{n=1}^{N} x_{cn}$
where $x_{cn} \in x$ denotes the category score of the target candidate box $r_n$ on category c, N denotes the number of target candidate boxes, and $r_n$ is the n-th target candidate box, n = 1, 2, ..., N.
3. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 2, characterized in that the training of the weakly supervised deep detection network is supervised by the image category labels of the training images, using the loss function $L_w$:
$L_w = -\sum_{c=1}^{C} \left[ y_c \log \phi_c + (1 - y_c) \log (1 - \phi_c) \right]$
where $y_c$ = 1 or 0 is the image category label of the training image, indicating whether the training image contains a target belonging to category c;
the category branch comprises a fully connected layer and a Softmax operation in the category direction, and the detection branch comprises a fully connected layer and a Softmax operation in the instance direction.
4. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 2 or 3, characterized in that K instance classification optimization branches are added after the two fully connected layers of the weakly supervised deep detection network, each instance classification optimization branch comprising a fully connected layer and a Softmax operation in the category direction; the k-th instance classification optimization branch outputs a target candidate box category score matrix $x^k$, where $k \in \{1, 2, \ldots, K\}$, $x^{k}_{cn}$ denotes the category score of the target candidate box $r_n$ on category c in the k-th instance classification optimization branch, and category (C+1) denotes the background category; the target candidate box category score matrix $x^k$ is used to mine the pseudo-instance label matrix $y^{k+1}$ of the (k+1)-th instance classification optimization branch, where the element $y^{k+1}_{cn}$ indicates whether the target candidate box $r_n$ belongs to category c, namely $y^{k+1}_{cn} = 1$ means that it belongs and $y^{k+1}_{cn} = 0$ means that it does not; the pseudo-instance label matrix $y^1$ of the first instance classification optimization branch is mined using the target candidate box category score matrix $x$; and the k-th instance classification optimization branch is trained using the loss function.
5. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 4, characterized in that the dual-context projection score in step two is calculated as follows:
the initial dual-context projection score of the target candidate box $r_n$ on category c is: $IDCPS_{cn} = ICPS_{cn} - ECPS_{cn}$;
the initial dual-context projection score $IDCPS_{cn}$ is normalized to obtain the dual-context projection score $DCPS_{cn} \in [0, 1]$:
$DCPS_{cn} = \dfrac{IDCPS_{cn} - \mathrm{Min}\{IDCPS\}}{\mathrm{Max}\{IDCPS\} - \mathrm{Min}\{IDCPS\}}$
where $\mathrm{Max}\{\cdot\}$ and $\mathrm{Min}\{\cdot\}$ denote taking the maximum value and the minimum value, respectively; $ICPS_{cn}$ denotes the internal context projection score of the target candidate box $r_n$ on category c, and $ECPS_{cn}$ denotes the external context projection score of the target candidate box $r_n$ on category c.
6. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 5, characterized in that the internal context projection score $ICPS_{cn}$ and the external context projection score $ECPS_{cn}$ are calculated as follows:
a semantic segmentation map of the input training image is generated using a weakly supervised semantic segmentation algorithm; the target candidate box $r_n$ is resized to (1-α)×100% and (1+α)×100% of its original size to obtain its inner box $ir_n$ and its outer box $er_n$, respectively, where $\alpha \in [0, 1]$ is used to adjust the sizes of the inner box and the outer box; the target candidate box $r_n$, the inner box $ir_n$ and the outer box $er_n$ are respectively projected onto the semantic segmentation map of the training image to obtain the instance-level segmentation map $S_n$, the instance-level segmentation map $IS_n$ and the instance-level segmentation map $ES_n$, where $W$ and $H$ denote the width and height of the target candidate box $r_n$, $\lceil (1-\alpha)W \rceil$ and $\lceil (1-\alpha)H \rceil$ denote the width and height of the inner box $ir_n$, $\lceil (1+\alpha)W \rceil$ and $\lceil (1+\alpha)H \rceil$ denote the width and height of the outer box $er_n$, and $\lceil \cdot \rceil$ denotes the rounding-up operation;
the internal context projection score of the target candidate box $r_n$ is calculated using the instance-level segmentation maps $S_n$ and $IS_n$, and the external context projection score of the target candidate box $r_n$ is calculated using the instance-level segmentation maps $S_n$ and $ES_n$.
7. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 6, characterized in that the internal context projection score $ICPS_{cn}$ is calculated as follows:
the segmentation map $ICS_n$ of the internal context region of the target candidate box $r_n$ is: $ICS_n = S_n - PIS_n$;
where $PIS_n$ denotes the rectangular map obtained by padding the instance-level segmentation map $IS_n$ with zeros on all sides, and $PIS_n$ and the instance-level segmentation map $S_n$ have the same size;
the projections of the c-th channel $ICS_{cn}$ of the segmentation map $ICS_n$ in the horizontal and vertical directions are $HICS_{cn}$ and $VICS_{cn}$, respectively, and:
$HICS_{cn} = \mathrm{Max}_h(ICS_{cn}), \quad VICS_{cn} = \mathrm{Max}_v(ICS_{cn})$;
where $\mathrm{Max}_h(\cdot)$ and $\mathrm{Max}_v(\cdot)$ denote taking the maximum value in the horizontal direction and the vertical direction, respectively;
the internal context projection score of the target candidate box $r_n$ on category c is:
$ICPS_{cn} = \mathrm{Avg}(HICS_{cn}) + \mathrm{Avg}(VICS_{cn})$;
where $\mathrm{Avg}(\cdot)$ denotes the averaging operation;
the external context projection score $ECPS_{cn}$ is calculated as follows:
the segmentation map $ECS_n$ of the external context region of the target candidate box $r_n$ is obtained by:
$ECS_n = ES_n - PS_n$
where $PS_n$ denotes the rectangular map obtained by padding the instance-level segmentation map $S_n$ with zeros on all sides, and $PS_n$ and the instance-level segmentation map $ES_n$ have the same size;
the projections of the c-th channel $ECS_{cn}$ of the segmentation map $ECS_n$ in the horizontal and vertical directions are denoted $HECS_{cn}$ and $VECS_{cn}$, respectively, and:
$HECS_{cn} = \mathrm{Max}_h(ECS_{cn}), \quad VECS_{cn} = \mathrm{Max}_v(ECS_{cn})$;
the external context projection score of the target candidate box $r_n$ on category c is:
$ECPS_{cn} = \mathrm{Avg}(HECS_{cn}) + \mathrm{Avg}(VECS_{cn})$.
8. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to any one of claims 5-7, characterized in that the target candidate box quality score is calculated as follows: the target candidate box category score and the dual-context projection score are combined to obtain the target candidate box quality score:
where, when k > 1, $x^{k-1}_{cn}$ denotes the category score of the target candidate box $r_n$ on category c in the (k-1)-th instance classification optimization branch; when k = 1, $x^{0}_{cn} = x_{cn}$; $DCPS_{cn}$ denotes the dual-context projection score of the target candidate box $r_n$ on category c; $PQS^{k}_{cn}$ denotes the quality score of the target candidate box $r_n$ on category c in the k-th instance classification optimization branch; and $\lambda$ is a modulation factor that sets the relative weights of the target candidate box category score $x^{k-1}_{cn}$ and the dual-context projection score $DCPS_{cn}$;
the pseudo-ground-truth instances are mined as follows: the mining algorithm proposed in MIST [Z. Ren et al., "Instance-aware, context-focused, and memory-efficient weakly supervised object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 10598-10607] is applied to the target candidate box quality scores $PQS^{k}_{cn}$ to mine the set of pseudo-ground-truth instances in the k-th instance classification optimization branch, where M denotes the number of pseudo-ground-truth instances belonging to category c in the k-th instance classification optimization branch.
9. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 8, characterized in that the soft label allocation strategy for each pseudo-ground-truth instance is: the soft label of the target candidate box $r_n$ on category c in the k-th instance classification optimization branch is
where $\sigma(\cdot,\cdot)$ denotes the intersection over union of two rectangular boxes; the strategy uses the pseudo-ground-truth instance closest to the target candidate box $r_n$ among the pseudo-ground-truth instances of all categories; and the neighborhood set is the set of target candidate boxes $r_n$ that satisfy the following two conditions: 1) the target candidate box $r_n$ lies within the neighborhood of a pseudo-ground-truth instance; 2) that pseudo-ground-truth instance is the one closest to the target candidate box $r_n$ in the pseudo-ground-truth instance set.
10. The weakly supervised target detection method for remote sensing images based on pseudo-instance soft labels according to claim 9, characterized in that the instance classification optimization branches are trained using the soft labels as follows:
a soft label is assigned to each pseudo-ground-truth instance according to the soft label allocation strategy; the training loss function of the k-th instance classification optimization branch is:
where the terms of the loss function are the loss weight of the target candidate box $r_n$, the highest target candidate box category score on category c in the (k-1)-th instance classification optimization branch, and the category score of the target candidate box $r_n$ on category c.
CN202310568911.3A 2023-03-13 2023-05-19 Remote sensing image weak supervision target detection method based on pseudo-instance soft label Pending CN116630801A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310236687 2023-03-13
CN2023102366878 2023-03-13

Publications (1)

Publication Number Publication Date
CN116630801A true CN116630801A (en) 2023-08-22

Family

ID=87596677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310568911.3A Pending CN116630801A (en) 2023-03-13 2023-05-19 Remote sensing image weak supervision target detection method based on pseudo-instance soft label

Country Status (1)

Country Link
CN (1) CN116630801A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496130A (en) * 2023-11-22 2024-02-02 中国科学院空天信息创新研究院 Basic model weak supervision target detection method based on context awareness self-training
CN117496130B (en) * 2023-11-22 2024-07-02 中国科学院空天信息创新研究院 Basic model weak supervision target detection method based on context awareness self-training

Similar Documents

Publication Publication Date Title
Adarsh et al. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model
CN106886795B (en) Object identification method based on salient object in image
Dornaika et al. Building detection from orthophotos using a machine learning approach: An empirical study on image segmentation and descriptors
CN110287826B (en) Video target detection method based on attention mechanism
Izadi et al. Three-dimensional polygonal building model estimation from single satellite images
CN109740588B (en) X-ray picture contraband positioning method based on weak supervision and deep response redistribution
CN107665498B (en) Full convolution network aircraft detection method based on typical example mining
CN107633226B (en) Human body motion tracking feature processing method
Yin et al. Region search based on hybrid convolutional neural network in optical remote sensing images
KR102373753B1 (en) Method, and System for Vehicle Recognition Tracking Based on Deep Learning
CN111882586B (en) Multi-actor target tracking method oriented to theater environment
Yan et al. Hpnet: Deep primitive segmentation using hybrid representations
CN104715251B (en) A kind of well-marked target detection method based on histogram linear fit
Guo et al. Evaluation-oriented façade defects detection using rule-based deep learning method
CN112884742A (en) Multi-algorithm fusion-based multi-target real-time detection, identification and tracking method
CN115641327B (en) Building engineering quality supervision and early warning system based on big data
CN107862702A (en) A kind of conspicuousness detection method of combination boundary connected and local contrast
CN113688797A (en) Abnormal behavior identification method and system based on skeleton extraction
US20170053172A1 (en) Image processing apparatus, and image processing method
Alsanad et al. Real-time fuel truck detection algorithm based on deep convolutional neural network
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN110598711A (en) Target segmentation method combined with classification task
Zhu et al. Multi-scale region-based saliency detection using W 2 distance on N-dimensional normal distributions
CN115187884A (en) High-altitude parabolic identification method and device, electronic equipment and storage medium
Peng et al. Hers superpixels: Deep affinity learning for hierarchical entropy rate segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination