CN117611896B

CN117611896B - Multi-mode remote sensing data classification basic model self-adaption method

Info

Publication number: CN117611896B
Application number: CN202311580794.9A
Authority: CN
Inventors: 何欣; 赵雅琴; 陈雨时; 吴龙文
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2023-11-24
Filing date: 2023-11-24
Publication date: 2024-05-07
Anticipated expiration: 2043-11-24
Also published as: CN117611896A

Abstract

A multi-mode remote sensing data classification basic model self-adaption method belongs to the technical field of image processing. The method solves the problem that the classification of the multi-mode remote sensing data is difficult to realize by directly using the pre-trained basic model in the prior art. The key points are as follows: acquiring multi-mode remote sensing data; preprocessing multi-mode remote sensing data; establishing a mapping layer and selecting a basic model; constructing a cross-space interaction module to realize interaction of the general features and the space features in the multi-mode data, generating a learnable space coding vector set along the space dimension of the multi-mode remote sensing data, adding the learnable space coding vector set into the feature coding vector set, and outputting the space features; constructing a cross-channel interaction module, adding the characteristic interaction between the general characteristic and the channel dimension in the multi-mode data in a multi-head attention mechanism MSA of an encoder module of a basic model, and outputting a spectrum characteristic; and inputting the spatial features and the spectral features into the full-connection layer to obtain a classification result. The method and the device improve the classification accuracy of the multi-mode remote sensing data.

Description

Multi-mode remote sensing data classification basic model self-adaption method

Technical Field

The invention relates to a multi-mode remote sensing data classification method, in particular to a basic model self-adaption method for multi-mode remote sensing data classification, and belongs to the technical field of image processing.

Background

Multimodal in the field of remote sensing generally refers to the imaging results of a scene and a target under different sensors (multispectral, hyperspectral, synthetic aperture radar, lidar, etc.). The multi-mode remote sensing data can be reasonably utilized to provide more comprehensive description information for ground objects from spectrum, time and space, so that the interpretation capability of remote sensing is improved, and the requirements of practical applications such as military reconnaissance and intelligent agriculture are met.

In order to promote the deep and wide application of the multi-mode remote sensing data in the application, an effective information processing means is needed. Classification is one of important links of remote sensing data processing technology, and is a hot spot problem of current research. In recent years, deep learning has become a mainstream of multi-modal remote sensing data classification due to its strong feature extraction capability. Among the many deep learning methods, the transform-based multi-modal remote sensing data method has attracted a great deal of attention. The self-attention mechanism in the transducer enables it to focus on different modalities simultaneously and capture long-range dependencies, but the classification accuracy of these methods still needs to be improved.

With the development of the deep learning method, a neural network, namely a basic model, which is trained by unsupervised learning on the basis of a large amount of original data emerges. On the downstream task, better classification performance can be obtained by only a small amount of marking data for the basic model. However, the data set used for training the basic model and the multi-modal data have large differences, so that it is difficult to directly use the pre-trained basic model to realize the classification of the multi-modal remote sensing data.

Disclosure of Invention

In order to solve the problem that the pre-trained basic model is difficult to directly use to realize the classification of the multi-mode remote sensing data in the prior art, the invention explores the potential and the effectiveness of the basic model in the multi-mode remote sensing data classification task and further provides a multi-mode remote sensing data classification basic model self-adaption method, and the classification effect can be improved without fine adjustment of basic model parameters. The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.

The technical scheme of the invention is as follows:

a multi-mode remote sensing data classification basic model self-adaption method comprises the following steps:

s1, acquiring multi-mode remote sensing data;

s2, preprocessing multi-mode remote sensing data, taking a pixel with a label in each mode as a center, selecting a rectangular window with the size of 27 multiplied by 27 as a sample of each single-mode data, amplifying all samples to 224 multiplied by 224 by a bilinear interpolation method for the sample of each mode data, and matching the spatial input dimension of a basic model;

S3, establishing a mapping layer, respectively sending samples of each mode data into the mapping layer to perform dimension reduction matching on channel dimensions, and fusing different single-mode data samples subjected to dimension reduction to obtain fused multi-mode data input samples;

S4, selecting a basic model based on a transducer, inputting the multi-mode remote sensing data into the basic model, and extracting general characteristics of the multi-mode data;

S5, constructing a cross-space interaction module to realize interaction of the general features and the spatial features in the multi-mode data, generating a learnable spatial coding vector set S ⁰ along the spatial dimension of the multi-mode remote sensing data, adding the learnable spatial coding vector set S ⁰ into a feature coding vector set Z ⁰, and outputting the spatial features;

S6, constructing a cross-channel interaction module, and adding the characteristic interaction between the general characteristic and the channel dimension in the multi-mode data in a multi-head attention mechanism MSA of an encoder module of a basic model to output a spectrum characteristic;

s7, inputting the output spatial features and the spectral features into the full-connection layer to obtain a classification result.

Further: the specific implementation process of the step S5 is as follows:

S51, selecting hyperspectral data in multimode remote sensing data, selecting a hyperspectral data sample x _HSI, and extracting spatial characteristics of the hyperspectral data sample x _HSI by a convolutional neural network; CNN consists of a1×1 convolutional layer, a batch normalization layer, a ReLU activation function, and a3×3 convolutional layer, and its output spatial feature S is formulated as:

S＝CNN(x_HSI) (9)

S52, dividing the spatial feature S into N non-overlapping feature blocks to form a spatial feature coding set S ⁰, and splicing the S ⁰ and a feature coding vector set Z ⁰, wherein the method is expressed as follows:

further: the specific implementation process of the step S6 is as follows:

S61, taking the average spectrum characteristic of each hyperspectral data sample x _HSI as an input;

s62, inputting the spectral features into CNNs of a 1X 1 convolution layer, a batch normalization layer, a ReLU activation function and an MLP layer to output a spectral feature coding set D';

S63.D' is used to generate D _Q、D_K and D _V that match the three matrices of query matrix Q, key value matrix K, value matrix V, and add them to Q _i,K_i and V _i in the multi-headed self-attention mechanism MSA in the base model;

Wherein, D _Q and D _K are represented by the intermediate vector of D 'for extracting the spectral feature of the feature represented by the center pixel, and D _V is the same as D' and represents the global spectral feature;

s64, updating the calculation mode in the MSA:

Where D _Q represents the spectral vector added to Q _i, D _K represents the spectral vector added to K _i, and D _V represents the spectral vector added to V _i.

The beneficial effects of the invention are as follows:

Compared with the prior art, the basic model self-adaptive framework for multi-mode remote sensing data classification provided by the invention fuses the general features, the spatial features and the relations among the spectral features extracted by all multi-mode basic models only by adjusting the cross-space interaction module and the cross-channel interaction module, simultaneously freezes the parameters of the basic models, retains strong general knowledge extracted by the basic models, finally reasonably utilizes the pre-trained parameters of the basic models, and fuses the unique features of the multi-mode remote sensing data such as space, spectrum and the like to improve the classification precision of the multi-mode remote sensing data.

Drawings

FIG. 1 is a schematic diagram of a multi-modal remote sensing data classification method based on basic model fine tuning as described in example 1;

FIG. 2 is a schematic diagram of a cross-space interaction module in a basic model adaptive method for multi-modal remote sensing data classification according to embodiment 1;

fig. 3 is a schematic diagram of a cross-channel interaction module in a basic model adaptive method for multi-modal remote sensing data classification according to embodiment 1.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention is described below by means of specific embodiments shown in the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention. It should be further understood that the terms "first," "second," "third," and the like in this specification are used merely for distinguishing between various components, elements, steps, etc. in the specification and not for indicating a logical or sequential relationship between the various components, elements, steps, etc., unless otherwise indicated.

Embodiment 1, which is described with reference to fig. 1-3, provides a method for adapting a base model of multi-modal remote sensing data classification, comprising the following steps:

s1, acquiring multi-mode remote sensing data;

S4, selecting a basic model based on a transducer, for example DINOv, inputting the multi-mode remote sensing data into the basic model to extract the general characteristics of the multi-mode data, such as the characteristics of edges, angles and the like; the specific implementation process of the steps is as follows:

S41, dividing the fused multi-mode input sample into N sample blocks in a non-overlapping mode to form an input sequence of X= [ X ₁,x₂,…,x_N ];

S42, each sample block is sent into a linear projection layer to perform linear transformation, N feature coding vectors Z _n are output, and in order to obtain global information of an input sequence, a learnable feature coding vector Z _cls is added to Z _n to form a feature coding vector set Z ⁰;

S43, adding the feature coding vector set and the position coding vectors, and inputting the feature coding vector set and the position coding vectors into L encoders; in the encoder, the core module is a multi-head (MS) attention mechanism, defining a query weight matrix W ^Q, a key weight matrix W ^K, and a value weight matrix W ^V; x obtains three matrixes of a query matrix Q, a key matrix K and a value matrix V according to three different weight matrixes, and a calculation formula is shown in a formula (1):

Q＝XW^Q,K＝XW^K,V＝XW^V (1)

S44, inquiring the matrix Q and the value matrix V, and calculating the similarity between each element in the X through matrix multiplication QK ^T to obtain a similarity matrix;

The similarity matrix is processed through a Softmax activation function and is calculated as follows:

Wherein z _i represents input elements, C represents the number of the input elements, and the similarity matrix is processed through a Softmax function to obtain a normalized result, wherein a larger value represents a higher similarity between the elements;

s45, performing matrix multiplication on the similarity matrix and V to obtain an output result of the self-attention mechanism, wherein the calculation formula is as follows:

Where d _k represents the dimension of input X.

In order to enrich the extracted features of the self-attention mechanism and avoid sinking into certain local features, a transducer introduces a multi-head attention mechanism on the basis of a single-head self-attention mechanism. The multi-headed self-attention mechanism is an attention mechanism that parallelizes the scaled dot product attention mechanism, the number of parallelism being called the "head count", so that the model focuses on multiple subspace information simultaneously. The process of the multi-head self-attention mechanism MSA is expressed as:

Z_i＝Attention(Q_i,K_i,V_i),i＝1...h (5)

MS(Q,K,V)＝Concat(Z₁,Z₂,...,Z_h)W^o (6)

wherein i represents the number of the header, Representing the output projection matrix, Z _i representing the output matrix for each head,/>Key matrix representing the ith head,/>Representing the value matrix of the i-th header.

In each encoder, the post-MSA connection layer normalizes (Layer Normalization, LN) the layers and multi-layer perceptrons (Multilayer perceptron, MLP), and a residual connection is added after each module to prevent gradient extinction, the process being expressed as follows:

H^l-1＝MSA(LN(Z^l-1))+Z^l-1,l＝0,...,L-1 (7)

Z^l＝MLP(LN(H^l-1))+H^l-1,l＝0,...,L-1 (8)。

S5, constructing a cross-space interaction module to realize interaction of the general features and the spatial features in the multi-mode data, generating a learnable spatial coding vector set S ⁰ along the spatial dimension of the multi-mode remote sensing data, adding the learnable spatial coding vector set S ⁰ into the feature coding vector set Z ⁰, and outputting the spatial features. The specific implementation process of the steps is as follows:

S51, selecting hyperspectral data in the multimode remote sensing data, selecting a hyperspectral data sample x _HSI, and extracting spatial characteristics of the hyperspectral data sample x _HSI by a convolutional neural network (Convolutional Neural Network, CNN). CNN consists of a1×1 convolutional layer, a batch normalization layer, a ReLU activation function, and a 3×3 convolutional layer, and its output spatial feature S is formulated as:

S＝CNN(x_HSI) (9)

S52, dividing the spatial feature S into N non-overlapping feature blocks to form a spatial feature coding set S ⁰, and splicing the S ⁰ with the Z ⁰ on the basis of a formula (7) and a formula (8), wherein the method is expressed as follows:

s6, constructing a cross-channel interaction module, and adding the characteristic interaction between the general characteristic and the channel dimension in the multi-mode data in a multi-head attention mechanism MSA of an encoder module of the basic model to output the spectral characteristic. The specific implementation process of the steps is as follows:

s64, updating the calculation mode in the MSA:

Wherein, D _Q represents the spectral vector added to Q _i, D _K represents the spectral vector added to K _i, and D _V represents the spectral vector added to V _i.

Example 2:

an electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the base model adaptation method of multimodal remote sensing data classification of embodiment 1 when executing the computer program.

The computer device of the present invention may be a device including a processor and a memory, such as a single chip microcomputer including a central processing unit. And the processor is used for realizing the steps of the basic model self-adaption method for classifying the multi-mode remote sensing data when executing the computer program stored in the memory.

The Processor may be a central processing unit (Central Processing Unit, CPU), other general purpose Processor, digital signal Processor (DIGITAL SIGNAL Processor, DSP), application SPECIFIC INTEGRATED Circuit (ASIC), off-the-shelf Programmable gate array (Field-Programmable GATE ARRAY, FPGA) or other Programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, smart memory card (SMART MEDIA CARD, SMC), secure Digital (SD) card, flash memory card (FLASH CARD), at least one disk storage device, flash memory device, or other volatile solid-state storage device.

Example 3:

A computer readable storage medium having stored thereon a computer program which when executed by a processor implements a base model adaptation method of multimodal remote sensing data classification as described in embodiment 1.

The computer readable storage medium of the present invention may be any form of storage medium readable by a processor of a computer device, including but not limited to non-volatile memory, ferroelectric memory, etc., having a computer program stored thereon, which when read and executed by the processor of the computer device, implements the steps of a multi-modal remote sensing data classification based model adaptation method described above.

The computer program comprises computer program code which may be in source code form, object code form, executable file or in some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the computer readable medium contains content that can be appropriately scaled according to the requirements of jurisdictions in which such content is subject to legislation and patent practice, such as in certain jurisdictions in which such content is subject to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.

The foregoing embodiments have further described the objects, technical solutions and advantageous effects of the present application in detail, and it should be understood that the foregoing embodiments are merely examples of the present application and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the present application should be included in the scope of the present application.

Claims

1. The basic model self-adaption method for multi-mode remote sensing data classification is characterized by comprising the following steps of:

s1, acquiring multi-mode remote sensing data;

s7, inputting the output spatial features and the spectral features into the full-connection layer to obtain a classification result;

the specific implementation process of the step S5 is as follows:

S＝CNN(x_HSI) (9)

the specific implementation process of the step S6 is as follows:

s64, updating the calculation mode in the MSA:

2. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of a base model adaptation method of multi-modal remote sensing data classification as claimed in claim 1 when executing the computer program.

3. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements a method of base model adaptation for multi-modal remote sensing data classification as claimed in claim 1.