CN116778335A - Method and system for detecting collapsed building based on cross-domain teacher-student training

Info

Publication number: CN116778335A
Application number: CN202310812000.0A
Authority: CN
Inventors: 尹鹏宇; 潘洁; 谭骏翔; 王旻罡; 杨宏
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2023-07-04
Filing date: 2023-07-04
Publication date: 2023-09-19
Anticipated expiration: 2043-07-04
Also published as: CN116778335B

Abstract

The application provides a collapsed building detection method and system based on cross-domain teacher-student training. The method comprises the following steps: training a teacher network with aerial optical remote sensing data and its manual annotations as input, and obtaining pseudo labels; training an aerial-to-satellite style transfer network with aerial optical remote sensing data and satellite optical remote sensing data as input, to generate pseudo-satellite optical remote sensing data; training a student network with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as input; applying an exponential moving average (EMA) algorithm to transfer the parameters of the student network after each training round into the teacher network, thereby updating the teacher network parameters, and iterating the training, the trained teacher network being the final target detection model; and inputting aerial optical remote sensing data into the target detection model to detect collapsed buildings. The scheme provided by the application can effectively detect damaged buildings from aerial and satellite remote sensing image data.

Description

Method and system for detecting collapsed building based on cross-domain teacher-student training

Technical Field

The application belongs to the field of disaster response and collapsed building detection, and particularly relates to a collapsed building detection method and system based on cross-domain teacher-student training.

Background

Detection of damaged or collapsed buildings is critical to earthquake emergency response. Collapsed building detection algorithms fall into two general categories: those based on detecting changes between pre-disaster and post-disaster images, and those based on post-disaster images only. Because change detection requires extensive preprocessing, and because earthquakes strike suddenly, up-to-date pre-disaster aerial data is difficult to acquire. Thus, most algorithms and techniques build the detection model on post-disaster images.

In recent years, deep learning has been widely used in object detection, and many mature deep-learning-based detection models, such as Faster R-CNN and YOLO, have been proposed and applied to the detection of collapsed buildings. However, training deep learning models typically requires a large amount of labeled data. Because aerial remote sensing data is costly to collect and imagery of earthquake-collapsed buildings is scarce, datasets suitable for training such models are rare. Furthermore, the damaged area in a single scene is limited, so feature diversity is insufficient. Given these limitations, a collapsed building detection model trained under the traditional paradigm, in which data is first labeled and the model is then trained with supervision, has the following two problems:

1. The recognition accuracy of the model is low because training samples are insufficient;

2. Because aerial data and satellite data belong to different image domains, a model trained on aerial data degrades markedly on satellite data.

These problems limit the development of deep learning in seismic disaster response applications.

In the prior art in the field of deep learning, semi-supervised learning is often used to cope with the scarcity of labeled samples. Semi-supervised object detection techniques train the object detector with labeled, weakly labeled, or unlabeled data. At the same time, aerial data and satellite data can be regarded as data from different image domains, and domain adaptation algorithms can help a model improve its average performance across image domains. However, no existing technology combines semi-supervised learning and domain adaptation for the detection of earthquake-collapsed buildings.

Technical solution of the first prior art

The document "Earthquake-Induced Building Damage Mapping Based on Multi-Task Deep Learning Framework" proposes inputting labeled post-earthquake image data into a deep learning semantic segmentation model for training, and requiring the model to detect normal buildings in addition to collapsed buildings, which strengthens the feature learning of the network.

Shortcomings of the first prior art

1. This technique still follows the traditional supervised learning paradigm and has no mechanism for using unlabeled data.

2. It cannot solve the over-fitting problem that occurs when the number of labeled samples is small, nor the domain shift of the model between aerial data and satellite data.

Technical proposal of the second prior art

A study titled "Cross-Domain Adaptive Teacher for Object Detection" proposes a similar teacher-student framework for solving the domain adaptation problem in object detection. In this study, the source domain is labeled and the target domain is unlabeled, and there is a gap between the domains. The study proposes a self-training framework named "Adaptive Teacher" that attempts to address domain shift and improves the quality of pseudo labels in the target domain through adversarial learning and mutual learning. The model includes two separate modules: a target-specific teacher model and a cross-domain student model. The study also applies weak-strong augmentation and uses Faster R-CNN as the base network of the detector.

Disadvantages of the second prior art

1. To address the domain shift between source-domain data and target-domain data, the feature extraction module (feature encoder) is constrained only at the feature-map level, through a loss function and a discriminator; differences between image domains are not aligned or translated at the level of the original input images.

2. It has not been verified or applied on aerial and satellite remote sensing images, in particular for collapsed building detection.

3. It does not specifically address cross-domain object detection between aerial remote sensing images and satellite remote sensing images.

4. The data augmentation is not designed around the image characteristics of the two domains; conventional strong random augmentation easily makes training unstable and increases bias.

Disclosure of Invention

To solve the above technical problems, the application provides a collapsed building detection method based on cross-domain teacher-student training.

A first aspect of the application discloses a collapsed building detection method based on cross-domain teacher-student training, which comprises the following steps:

S1, constructing a dataset comprising aerial optical remote sensing data and satellite optical remote sensing data of collapsed buildings;

S2, training a teacher network with the aerial optical remote sensing data and its manual annotations as inputs, to obtain pseudo labels;

S3, training an aerial-to-satellite style transfer network with the aerial optical remote sensing data and the satellite optical remote sensing data as inputs, to generate pseudo-satellite optical remote sensing data; the aerial-to-satellite style transfer network is a generative adversarial network for unpaired image-to-image translation;

S4, training a student network with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as inputs;

S5, applying an EMA algorithm to transfer the parameters of the student network after the training round to the teacher network, thereby updating the parameters of the teacher network;

S6, repeating steps S2 to S5 to iteratively train the teacher network, the trained teacher network being the final target detection model;

S7, inputting aerial optical remote sensing data into the target detection model to detect collapsed buildings.
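The control flow of steps S2 to S6 can be sketched as follows. This is a minimal illustration rather than the application's implementation: the networks are represented by a toy parameter dictionary, and the four callables (`predict_pseudo_labels`, `to_pseudo_satellite`, `student_step`, `ema_update`) are hypothetical placeholders for the teacher inference, the style transfer, one student optimization step, and the EMA update. The teacher is assumed to have been pre-trained on labeled aerial data before the loop starts.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]          # toy stand-in for network weights
Batch = Tuple[Sequence, Sequence]  # (aerial images, manual annotations)


def train_cross_domain(
    teacher: Params,
    aerial_batches: List[Batch],
    predict_pseudo_labels: Callable[[Params, Sequence], Sequence],
    to_pseudo_satellite: Callable[[Sequence], Sequence],
    student_step: Callable[[Params, Sequence, Sequence], Params],
    ema_update: Callable[[Params, Params], Params],
    rounds: int,
) -> Params:
    """Iterate steps S2 to S5 and return the teacher (step S6)."""
    student = dict(teacher)  # the student starts from the teacher's parameters
    for _ in range(rounds):
        for images, labels in aerial_batches:
            # S2: the teacher labels the aerial images with pseudo labels.
            pseudo_labels = predict_pseudo_labels(teacher, images)
            # S3: aerial images are restyled as pseudo-satellite images.
            pseudo_sat = to_pseudo_satellite(images)
            # S4: the student sees both (pseudo-satellite, manual) and
            # (pseudo-satellite, pseudo-label) pairs.
            student = student_step(student, pseudo_sat, labels)
            student = student_step(student, pseudo_sat, pseudo_labels)
        # S5: after the round, student parameters flow back into the teacher.
        teacher = ema_update(teacher, student)
    return teacher
```

Passing the steps in as callables keeps the sketch independent of any particular detector or GAN library; only the ordering of the steps is taken from the method above.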

According to the method of the first aspect of the application, in step S2, the two-stage model Faster R-CNN is selected as the teacher network.

According to the method of the first aspect of the application, in step S2, the loss function $L = L_{cls}^{rpn} + L_{reg}^{rpn} + L_{cls}^{roi} + L_{reg}^{roi}$ is applied to train the teacher network;

wherein $L_{cls}^{rpn}$ indicates the RPN classification loss, $L_{reg}^{rpn}$ indicates the RPN regression loss, $L_{cls}^{roi}$ represents the ROI classification loss, and $L_{reg}^{roi}$ indicates the ROI regression loss.
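As an illustration of how the four terms (RPN classification, RPN regression, ROI classification, ROI regression) combine, the sketch below sums them from a dictionary of per-term values. The unweighted sum and the key names `rpn_cls`, `rpn_reg`, `roi_cls`, and `roi_reg` are assumptions made for this example; the application does not state term weights or identifiers.

```python
from typing import Dict

# Hypothetical key names for the four loss terms of a two-stage detector.
LOSS_TERMS = ("rpn_cls", "rpn_reg", "roi_cls", "roi_reg")


def detection_loss(losses: Dict[str, float]) -> float:
    """Return the unweighted sum of the four detector loss terms."""
    missing = [name for name in LOSS_TERMS if name not in losses]
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    return sum(losses[name] for name in LOSS_TERMS)
```

For example, `detection_loss({"rpn_cls": 0.5, "rpn_reg": 0.25, "roi_cls": 1.0, "roi_reg": 0.25})` evaluates to `2.0`.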

According to the method of the first aspect of the present application, in step S3, the aerial-to-satellite style transfer network learns the mappings $G: X_1 \to X_2$ and $F: X_2 \to X_1$;

$X_1$ represents the aerial optical remote sensing data, $X_2$ represents the satellite optical remote sensing data, and $G$ and $F$ represent mapping functions;

the cycle-consistency loss function is combined with the adversarial loss functions on $X_1$ and $X_2$ to obtain the full objective function for unpaired image-to-image translation used to train the aerial-to-satellite style transfer network.

According to the method of the first aspect of the present application, in step S3, a Cycle-GAN network is selected as the aerial-to-satellite style transfer network.

According to the method of the first aspect of the present application, in step S4, training the student network with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as inputs comprises:

combining the pseudo-satellite optical remote sensing data with the manual annotations and with the pseudo labels, respectively, to form two sets of training data pairs, and training the student network with the two sets of training data pairs as input.

According to the method of the first aspect of the present application, in step S4, the student network model is composed of an R-CNN network and a Cycle-GAN network; in the Cycle-GAN network, ResNet is used as the base network of the generators and discriminators.

A second aspect of the application discloses a collapsed building detection system based on cross-domain teacher-student training, which comprises:

a first processing module configured to construct a dataset comprising aerial optical remote sensing data and satellite optical remote sensing data of collapsed buildings;

a second processing module configured to train a teacher network with the aerial optical remote sensing data and its manual annotations as inputs, to obtain pseudo labels;

a third processing module configured to train an aerial-to-satellite style transfer network with the aerial optical remote sensing data and the satellite optical remote sensing data as inputs, to generate pseudo-satellite optical remote sensing data; the aerial-to-satellite style transfer network is a generative adversarial network for unpaired image-to-image translation;

a fourth processing module configured to train a student network with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as inputs;

a fifth processing module configured to apply an EMA algorithm to transfer the parameters of the student network after the training round to the teacher network, thereby updating the parameters of the teacher network;

a sixth processing module configured to repeat the operations of the second to fifth processing modules to iteratively train the teacher network, the trained teacher network being the final target detection model;

and a seventh processing module configured to input aerial optical remote sensing data into the target detection model to detect collapsed buildings.

A third aspect of the application discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps in a method for detecting a collapsed building based on cross-domain teacher-student training in any one of the first aspect of the disclosure when executing the computer program.

A fourth aspect of the application discloses a computer-readable storage medium. A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps in a method for detecting a collapsed building based on cross-domain teacher-student training of any one of the first aspects of the present disclosure.

In summary, the scheme provided by the application can make effective use of large amounts of unlabeled remote sensing image data, reduces the model's dependence on manual annotation, achieves good accuracy when detecting collapsed buildings in both aerial and satellite data, and improves the model's generalization and cross-domain transfer capability. By effectively detecting damaged buildings from aerial and satellite remote sensing image data, it can improve earthquake emergency response and help rescue workers quickly locate damaged buildings.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for detecting a collapsed building based on cross-domain teacher-student training according to an embodiment of the application;

FIG. 2 is a diagram of teacher network training according to an embodiment of the present application;

FIG. 3 is a schematic diagram of aerial-to-satellite style transfer network training according to an embodiment of the present application;

FIG. 4 is a diagram of student network training according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a teacher network parameter update according to an embodiment of the present application;

FIG. 6 is a graph of output results during training of the aerial-to-satellite style transfer network according to an embodiment of the present application;

FIG. 7 is a block diagram of a collapsed building detection system based on cross-domain teacher-student training according to an embodiment of the present application;

FIG. 8 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

A first aspect of the application discloses a collapsed building detection method based on cross-domain teacher-student training. FIG. 1 is a flowchart of a method for detecting a collapsed building based on cross-domain teacher-student training according to an embodiment of the present application. As shown in FIG. 1, the method includes:

In step S1, a dataset comprising aerial optical remote sensing data and satellite optical remote sensing data of the collapsed building is constructed.

Specifically, a dataset DB-ARSD was constructed, comprising 3000 satellite images and 1000 aerial images. These images are collected after natural disasters such as earthquakes, hurricanes, etc. In these images the bounding box of the damaged building is marked.
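One possible in-memory representation of such a dataset is sketched below. The record layout, the field names, the pixel-coordinate box convention `(x_min, y_min, x_max, y_max)`, and the file paths in the usage note are all assumptions for illustration; the application does not specify a storage format for DB-ARSD.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Sample:
    """One post-disaster image with its damaged-building boxes."""
    path: str
    domain: str                       # "aerial" or "satellite"
    boxes: List[Box] = field(default_factory=list)


def split_by_domain(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Separate the records into the aerial and satellite image domains."""
    aerial = [s for s in samples if s.domain == "aerial"]
    satellite = [s for s in samples if s.domain == "satellite"]
    return aerial, satellite
```

A call such as `split_by_domain([Sample("a.png", "aerial", [(0, 0, 10, 10)]), Sample("s.png", "satellite")])` returns one aerial record and one satellite record, which matches the two-domain split that the later training steps rely on.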

In step S2, as shown in FIG. 2, the teacher network $f(\theta)$ is trained with the aerial optical remote sensing data and its manual annotations as inputs, to obtain pseudo labels $\hat{y}_1$.

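In some embodiments, in step S2, the two-stage model Faster R-CNN is selected as the teacher network.

The loss function $L = L_{cls}^{rpn} + L_{reg}^{rpn} + L_{cls}^{roi} + L_{reg}^{roi}$, the sum of the RPN classification, RPN regression, ROI classification, and ROI regression losses, is applied to train the teacher network.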

In step S3, as shown in FIG. 3, the aerial-to-satellite style transfer network $g(\theta)$ is trained with the aerial optical remote sensing data and the satellite optical remote sensing data as inputs, to generate pseudo-satellite optical remote sensing data $\hat{x}_2$. The architecture of the aerial-to-satellite style transfer network is a generative adversarial network for unpaired image-to-image translation.

In some embodiments, in step S3, the aerial-to-satellite style transfer network learns the mappings $G: X_1 \to X_2$ and $F: X_2 \to X_1$, where $X_1$ represents the aerial optical remote sensing data and $X_2$ represents the satellite optical remote sensing data.

A Cycle-GAN network is selected as the aerial-to-satellite style transfer network.

Specifically, the cycle-consistency loss function encourages $F(G(x_1)) \approx x_1$ and $G(F(x_2)) \approx x_2$. This loss is combined with the adversarial loss functions on $X_1$ and $X_2$ to obtain the full objective function for unpaired image-to-image translation used to train the aerial-to-satellite style transfer network. For the mapping $G: X_1 \to X_2$ and its discriminator $D_Y$, the objective function is:

$$L_{GAN}(G, D_Y, X_1, X_2) = \mathbb{E}_{x_2 \sim p_{data}(x_2)}[\log D_Y(x_2)] + \mathbb{E}_{x_1 \sim p_{data}(x_1)}[\log(1 - D_Y(G(x_1)))]$$

where $G$ attempts to generate images $G(x_1)$ that look similar to images from domain $X_2$, and $D_Y$ attempts to distinguish the translated samples $G(x_1)$ from the real samples $x_2$.

For the mapping $F: X_2 \to X_1$ and its discriminator $D_X$, a similar loss function is used. The cycle-consistency loss reduces the space of possible mapping functions by enforcing forward and backward consistency:

$$L_{cyc}(G, F) = \mathbb{E}_{x_1 \sim p_{data}(x_1)}[\lVert F(G(x_1)) - x_1 \rVert_1] + \mathbb{E}_{x_2 \sim p_{data}(x_2)}[\lVert G(F(x_2)) - x_2 \rVert_1]$$

The full objective function is:

$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X_1, X_2) + L_{GAN}(F, D_X, X_2, X_1) + \lambda L_{cyc}(G, F)$$

where $\lambda$ weights the cycle-consistency term.
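To make the structure of this objective concrete, the sketch below evaluates it on toy data, following the standard Cycle-GAN formulation: two adversarial terms plus a weighted cycle-consistency term. Images are modeled as plain lists of floats, and the generators and discriminators are arbitrary Python callables, so this is an illustration of the arithmetic rather than a trainable network. The weight `lam` is left as a required argument because the application does not state its value.

```python
import math
from typing import Callable, List, Sequence

Image = Sequence[float]
Generator = Callable[[Image], Image]
Discriminator = Callable[[Image], float]  # returns a probability in (0, 1)


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def gan_loss(disc: Discriminator, gen: Generator,
             sources: List[Image], targets: List[Image]) -> float:
    """E[log D(target)] + E[log(1 - D(G(source)))]."""
    real = mean(math.log(disc(t)) for t in targets)
    fake = mean(math.log(1.0 - disc(gen(s))) for s in sources)
    return real + fake


def l1(a: Image, b: Image) -> float:
    return sum(abs(p - q) for p, q in zip(a, b))


def cycle_loss(G: Generator, F: Generator,
               xs1: List[Image], xs2: List[Image]) -> float:
    """E[|F(G(x1)) - x1|_1] + E[|G(F(x2)) - x2|_1]."""
    forward = mean(l1(F(G(x)), x) for x in xs1)
    backward = mean(l1(G(F(x)), x) for x in xs2)
    return forward + backward


def full_objective(G: Generator, F: Generator,
                   D_X: Discriminator, D_Y: Discriminator,
                   xs1: List[Image], xs2: List[Image], lam: float) -> float:
    """Adversarial terms for both directions plus the weighted cycle term."""
    return (gan_loss(D_Y, G, xs1, xs2)
            + gan_loss(D_X, F, xs2, xs1)
            + lam * cycle_loss(G, F, xs1, xs2))
```

When `F` exactly inverts `G`, the cycle term vanishes and only the two adversarial terms remain, which is the behavior the cycle-consistency constraint is meant to reward.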

During training of the aerial-to-satellite style transfer network, the generated style-transferred images gradually shift from the appearance of aerial optical remote sensing data to that of satellite optical remote sensing data as the number of training rounds increases, as shown in FIG. 6.

In step S4, as shown in FIG. 4, the student network $f(\varepsilon)$ is trained with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as inputs.

In some embodiments, in step S4, training the student network with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as inputs includes combining the pseudo-satellite optical remote sensing data with the manual annotations and with the pseudo labels, respectively, to form two sets of training data pairs, and training the student network with the two sets of training data pairs as input.

The student network model consists of an R-CNN network and a Cycle-GAN network; in the Cycle-GAN network, ResNet is used as the base network of the generators and discriminators.

Specifically, in student network training, the student network inherits its initial structure and parameters from the teacher network trained in step S2. With the parameters of $f(\theta)$ and $g(\theta)$ fixed, $x_1$ is input to $f(\theta)$ to obtain the pseudo labels $\hat{y}_1$, and $x_1$ is input to $g(\theta)$ to obtain the pseudo-satellite image $\hat{x}_2$. The manual annotation $y_1$ is then taken from the original training data pair to form two sets of training data pairs, $(\hat{x}_2, y_1)$ and $(\hat{x}_2, \hat{y}_1)$. Finally, the combined new training batch is input to $f(\varepsilon)$ for training.
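The batch construction described here can be sketched as follows. The function name and the two callables `teacher_predict` and `style_transfer` are hypothetical stand-ins for the frozen teacher network and the frozen style transfer network; the sketch only shows how each aerial image yields two training pairs for the student.

```python
from typing import Any, Callable, List, Sequence, Tuple

Pair = Tuple[Any, Any]  # (pseudo-satellite image, label set)


def build_student_batch(
    x1_batch: Sequence[Any],
    y1_batch: Sequence[Any],
    teacher_predict: Callable[[Any], Any],
    style_transfer: Callable[[Any], Any],
) -> List[Pair]:
    """Pair each pseudo-satellite image with manual and pseudo labels."""
    # Frozen teacher: aerial image -> pseudo labels.
    pseudo_labels = [teacher_predict(x) for x in x1_batch]
    # Frozen style transfer network: aerial image -> pseudo-satellite image.
    pseudo_sat = [style_transfer(x) for x in x1_batch]
    manual_pairs = list(zip(pseudo_sat, y1_batch))
    pseudo_pairs = list(zip(pseudo_sat, pseudo_labels))
    # The combined batch is what the student network trains on.
    return manual_pairs + pseudo_pairs
```

Because the style transfer is assumed to preserve object positions, the manual boxes drawn on the aerial image are reused unchanged for the pseudo-satellite image; this is implied by the pairing in the text but is not stated as a guarantee.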

The aerial-to-satellite style transfer network and the student network are trained synchronously, so that the data generated by the style transfer network in each training round can be used as training data for the student network, allowing the student network to learn more features and information.

In step S5, as shown in FIG. 5, an EMA algorithm is applied to transfer the parameters of the student network after the training round to the teacher network, thereby updating the parameters of the teacher network.

Specifically, $\theta \leftarrow m\theta + (1 - m)\varepsilon$, with $m \in [0, 1]$, where $m$ is typically 0.999 and is tuned according to the actual training situation to control the speed at which the student network passes parameters to the teacher network through the EMA.
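A minimal sketch of this update on a toy parameter dictionary is given below. The dictionary representation and the helper `half_life` are illustrative additions, not part of the application; in a real detector the same formula would be applied to every weight tensor.

```python
import math
from typing import Dict


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               m: float = 0.999) -> Dict[str, float]:
    """Return m * teacher + (1 - m) * student for every parameter."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    return {name: m * teacher[name] + (1.0 - m) * student[name]
            for name in teacher}


def half_life(m: float) -> float:
    """Number of updates after which old teacher weights count for half."""
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(m)
```

With `m = 0.999`, `half_life(0.999)` is roughly 693, so the teacher's earlier weights lose half of their influence only after several hundred updates. A smaller `m` lets the student's parameters reach the teacher faster, at the cost of a noisier teacher.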

The EMA algorithm is introduced to enable the network to keep useful information in the training data when the parameters are updated, so that the training effect of the model is finally improved.

During model training, an Adam optimizer is used for parameter optimization, with an initial learning rate of 0.001 and a weight decay coefficient of 0.0005. The batch size is set to 4 and the number of epochs to 50. To avoid overfitting, dropout and data augmentation methods such as rotation, flipping, and scaling are used.
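These hyperparameters can be collected in a small configuration object, as sketched below. The class and field names are hypothetical, and the step count in the usage note assumes that all 1000 aerial images are used for training, which the application does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as stated in the description."""
    optimizer: str = "adam"
    learning_rate: float = 1e-3   # initial learning rate
    weight_decay: float = 5e-4    # weight decay coefficient
    batch_size: int = 4
    epochs: int = 50


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every image once."""
    return math.ceil(num_images / batch_size)
```

Under the stated assumption, `steps_per_epoch(1000, TrainConfig().batch_size)` gives 250 steps per epoch, or 12,500 steps over 50 epochs.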

A second aspect of the application discloses a collapsed building detection system based on cross-domain teacher-student training. FIG. 7 is a block diagram of a collapsed building detection system based on cross-domain teacher-student training according to an embodiment of the present application. As shown in FIG. 7, the system 100 includes:

a first processing module 101 configured to construct a dataset comprising aerial optical remote sensing data and satellite optical remote sensing data of collapsed buildings;

a second processing module 102 configured to train a teacher network with the aerial optical remote sensing data and its manual annotations as inputs, to obtain pseudo labels;

a third processing module 103 configured to train an aerial-to-satellite style transfer network with the aerial optical remote sensing data and the satellite optical remote sensing data as inputs, to generate pseudo-satellite optical remote sensing data; the aerial-to-satellite style transfer network is a generative adversarial network for unpaired image-to-image translation;

a fourth processing module 104 configured to train a student network with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as inputs;

a fifth processing module 105 configured to apply an EMA algorithm to transfer the parameters of the student network after the training round to the teacher network, thereby updating the parameters of the teacher network;

a sixth processing module 106 configured to repeat the operations of the second to fifth processing modules to iteratively train the teacher network, the trained teacher network being the final target detection model;

a seventh processing module 107 configured to input aerial optical remote sensing data into the target detection model to detect collapsed buildings.

According to the system of the second aspect of the present application, the first processing module 101 is specifically configured to construct a dataset DB-ARSD, comprising 3000 satellite images and 1000 aerial images. These images are collected after natural disasters such as earthquakes, hurricanes, etc. In these images the bounding box of the damaged building is marked.

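According to the system of the second aspect of the present application, the second processing module 102 is specifically configured to select the two-stage model Faster R-CNN as the teacher network.

The loss function $L = L_{cls}^{rpn} + L_{reg}^{rpn} + L_{cls}^{roi} + L_{reg}^{roi}$, the sum of the RPN classification, RPN regression, ROI classification, and ROI regression losses, is applied to train the teacher network.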

According to the system of the second aspect of the present application, the third processing module 103 is specifically configured so that the aerial-to-satellite style transfer network learns the mappings $G: X_1 \to X_2$ and $F: X_2 \to X_1$.

Specifically, the cycle-consistency loss function encourages $F(G(x_1)) \approx x_1$ and $G(F(x_2)) \approx x_2$. This loss is combined with the adversarial loss functions on $X_1$ and $X_2$ to obtain the full objective function for unpaired image-to-image translation used to train the aerial-to-satellite style transfer network. For the mapping $G: X_1 \to X_2$ and its discriminator $D_Y$, the objective function is:

$$L_{GAN}(G, D_Y, X_1, X_2) = \mathbb{E}_{x_2 \sim p_{data}(x_2)}[\log D_Y(x_2)] + \mathbb{E}_{x_1 \sim p_{data}(x_1)}[\log(1 - D_Y(G(x_1)))]$$

The full objective function is:

$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X_1, X_2) + L_{GAN}(F, D_X, X_2, X_1) + \lambda L_{cyc}(G, F)$$

During training of the aerial-to-satellite style transfer network, the generated style-transferred images gradually shift from the appearance of aerial optical remote sensing data to that of satellite optical remote sensing data as the number of training rounds increases.

According to the system of the second aspect of the present application, the fourth processing module 104 is specifically configured to train the student network with the pseudo-satellite optical remote sensing data, the manual annotations, and the pseudo labels as inputs.

Specifically, in student network training, the student network inherits its initial structure and parameters from the teacher network trained in the second processing module 102. With the parameters of $f(\theta)$ and $g(\theta)$ fixed, $x_1$ is input to $f(\theta)$ to obtain the pseudo labels $\hat{y}_1$, and $x_1$ is input to $g(\theta)$ to obtain the pseudo-satellite image $\hat{x}_2$. The manual annotation $y_1$ is then taken from the original training data pair to form two sets of training data pairs, $(\hat{x}_2, y_1)$ and $(\hat{x}_2, \hat{y}_1)$. Finally, the combined new training batch is input to $f(\varepsilon)$ for training.

According to the system of the second aspect of the present application, the fifth processing module 105 is specifically configured to apply the update $\theta \leftarrow m\theta + (1 - m)\varepsilon$, with $m \in [0, 1]$, where $m$ is typically 0.999 and is tuned according to the actual training situation to control the speed at which the student network passes parameters to the teacher network through the EMA.

A third aspect of the application discloses an electronic device. The electronic equipment comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps in the collapse building detection method based on cross-domain teacher-student training in any one of the first aspect of the application when executing the computer program.

Fig. 8 is a block diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 8, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the electronic device is used for conducting wired or wireless communication with an external terminal, and the wireless communication can be achieved through WIFI, an operator network, near Field Communication (NFC) or other technologies. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the electronic equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of a portion related to the technical solution of the present disclosure, and does not constitute a limitation of the electronic device to which the technical solution of the present disclosure is applied, and a specific electronic device may include more or less components than those shown in the drawings, or may combine some components, or have different component arrangements.

A fourth aspect of the application discloses a computer-readable storage medium. The computer readable storage medium stores a computer program which, when executed by a processor, implements the steps in a method for detecting a collapsed building based on cross-domain teacher-student training according to any one of the first aspect of the present disclosure.

Note that the technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be regarded as the scope of the description. The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. The method for detecting the collapse building based on the cross-domain teacher-student training is characterized by comprising the following steps of:

2. The method for detecting a collapsed building based on cross-domain teacher-student training according to claim 1, wherein in step S2, the two-stage model Faster R-CNN is selected as the teacher network.

3. The method for detecting a collapsed building based on cross-domain teacher-student training according to claim 2, wherein in step S2, the loss function $L = L_{cls}^{rpn} + L_{reg}^{rpn} + L_{cls}^{roi} + L_{reg}^{roi}$ is applied to train the teacher network;

wherein $L_{cls}^{rpn}$ indicates the RPN classification loss, $L_{reg}^{rpn}$ indicates the RPN regression loss, $L_{cls}^{roi}$ represents the ROI classification loss, and $L_{reg}^{roi}$ indicates the ROI regression loss.

4. The method for detecting a collapsed building based on cross-domain teacher-student training according to claim 1, wherein in step S3, the aerial-to-satellite style transfer network learns the mappings $G: X_1 \to X_2$ and $F: X_2 \to X_1$;

the cycle-consistency loss function is combined with the adversarial loss functions on $X_1$ and $X_2$ to obtain the full objective function for unpaired image-to-image translation used to train the aerial-to-satellite style transfer network.

5. The method for detecting a collapsed building based on cross-domain teacher-student training according to claim 4, wherein in step S3, a Cycle-GAN network is selected as the aerial-to-satellite style transfer network.

6. The method for detecting a collapse building based on cross-domain teacher-student training according to claim 1, wherein in the step S4, the method for training a student network using the pseudolite optical remote sensing data, the manual labeling and the pseudotag as inputs comprises:

7. The method for detecting a collapsed building based on cross-domain teacher-student training according to claim 6, wherein in step S4, the student network model is composed of an R-CNN network and a Cycle-GAN network; in the Cycle-GAN network, ResNet is used as the base network of the generators and discriminators.

8. A collapsed building detection system for cross-domain teacher-to-student training, the system comprising:

9. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method for detecting a collapsed building based on cross-domain teacher-student training according to any one of claims 1 to 7.

10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method for detecting a collapsed building based on cross-domain teacher-student training according to any one of claims 1 to 7.