CN116012679B - Self-supervision remote sensing representation learning method based on multi-level cross-modal interaction - Google Patents
- Publication number
- CN116012679B (application CN202211635290.8A)
- Authority
- CN
- China
- Prior art keywords
- remote sensing
- mode
- vector
- sensing image
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The application relates to the technical field of remote sensing image processing, in particular to a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction. The method comprises the following steps: S100, acquiring a multi-modal remote sensing image sample pair set A; S200, traversing A and performing blocking and random masking on $a_{n,m}$ to obtain the H mask blocks and D non-mask blocks corresponding to $a_{n,m}$; S300, performing joint training on M neural network models respectively corresponding to the M modalities, each neural network model comprising a data coding model and a decoder, the joint training process comprising: inputting the target embedded vector sequence B corresponding to $a_{n,m}$ into the data coding model corresponding to the m-th modality; the loss employed by the joint training is $L = L_1 + L_2$, where $L_1$ is the first-level loss and $L_2$ is the second-level loss. The invention improves the feature representation capability of deep learning models on remote sensing images of different modalities.
Description
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction.
Background
The ways of acquiring information in the big-data era are diverse, and multi-modal data has become the main form of data resources in recent years. Data of different modalities have different characteristics: for example, the scattering points of a SAR remote sensing image carry both amplitude and frequency information, while an optical remote sensing image has higher resolution than a SAR remote sensing image and contains more detail information. Existing remote sensing representation learning methods are only suited to feature representation of remote sensing images of a single modality; when remote sensing images of other modalities are input to the deep learning model, its feature representation capability on those images is poor. How to improve the feature representation capability of deep learning models on remote sensing images of different modalities is therefore a problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction, which improves the feature representation capability of deep learning models on remote sensing images of different modalities.
According to the invention, a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction is provided, which comprises the following steps:
s100, acquiring a multi-mode remote sensing image sample pair set A= { a 1 ,a 2 ,…,a N },a n For the nth multi-mode remote sensing image sample pair, the value range of N is 1 to N, and N is the number of the multi-mode remote sensing image sample pairs included by A; a, a n =(a n,1 ,a n,2 ,…,a n,M ),a n,m Is a as n The method comprises the steps that in the M-th mode remote sensing image, the value range of M is 1 to M, M is the number of modes included in each multi-mode remote sensing image sample pair in A, and the multi-modes comprise at least two of optics, SAR, hyperspectrum and near infrared; the remote sensing images of M modes included in each multi-mode remote sensing image sample pair are the remote sensing images of the same scene, and the sequence of the remote sensing images corresponding to different modes in each multi-mode remote sensing image sample pair is the same.
S200, traversing A, and performing blocking and random masking on $a_{n,m}$ to obtain the H mask blocks and D non-mask blocks corresponding to $a_{n,m}$.
S300, performing joint training on the M neural network models respectively corresponding to the M modalities, wherein each neural network model comprises a data coding model and a decoder, and A is the multi-modal remote sensing image sample pair set required for one joint training; the joint training process comprises: inputting the target embedded vector sequence B corresponding to $a_{n,m}$ into the data coding model corresponding to the m-th modality, $B = (f_{n,0}^{m}, f_{n,1}^{m}, f_{n,2}^{m}, \ldots, f_{n,H+D}^{m})$, where $f_{n,0}^{m}$ is the to-be-learned global embedded vector corresponding to $a_{n,m}$ and $f_{n,i}^{m}$ is the local embedded vector of the i-th block of $a_{n,m}$, with i ranging from 1 to H+D; when the i-th block is a mask block, $f_{n,i}^{m}$ is a to-be-learned local embedded vector containing the position information of the i-th block in $a_{n,m}$; when the i-th block is a non-mask block, $f_{n,i}^{m}$ is a local embedded vector containing both the pixel value information of the i-th block and the position information of the i-th block in $a_{n,m}$.
The loss employed by the joint training is $L = L_1 + L_2$, where $L_1$ is the first-level loss and $L_2$ is the second-level loss. Let $v_n^m$ be the global feature representation vector obtained by inputting B into the data coding model corresponding to the m-th modality; let $\hat{v}_n^m$ be the global feature representation vector obtained by inputting into that data coding model the embedded vector, not subjected to random masking, corresponding to the reconstructed image of the m-th modality, the reconstructed image of the m-th modality being the image obtained by inputting the output of the data coding model into the decoder corresponding to the m-th modality; let $\bar{v}_n^m = \frac{1}{M-1} \sum_{j=1, j \neq m}^{M} v_n^j$ be the average global feature representation vector of $a_n$ over the modalities other than the m-th, where $v_n^j$ is the global feature representation vector obtained by inputting the target embedded vector sequence of the j-th modality of $a_n$ into the data coding model corresponding to the j-th modality; and let $v_q^m$ be the global feature representation vector of the m-th modality corresponding to the q-th multi-modal remote sensing image sample pair in A other than $a_n$, with q ranging from 1 to N−1. $L_1$ is negatively correlated with the similarities $sim(v_n^m, \hat{v}_n^m)$ and $sim(v_n^m, \bar{v}_n^m)$ and positively correlated with the similarities $sim(v_n^m, v_q^m)$, where sim() is a similarity measure and $\tau$ is a preset temperature. $L_2$ is positively correlated with $KL(P(v_n^m) \| P(v_n^g))$, the KL divergence between $P(v_n^m)$, the feature probability distribution of $v_n^m$, and $P(v_n^g)$, the feature probability distribution of $v_n^g$, where $v_n^g$ is the global feature representation vector obtained by inputting the target embedded vector sequence of the g-th modality of $a_n$ into the data coding model corresponding to the g-th modality, g = 1, 2, …, M, g ≠ m.
Compared with the prior art, the method provided by the invention has significant beneficial effects; by virtue of the above technical scheme it achieves considerable technical progress and practicality, has wide industrial utilization value, and offers at least the following benefits:
the invention discloses a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction, which utilizes a multi-modal remote sensing image sample pair set to perform joint training on M neural network models, wherein each multi-modal remote sensing image sample pair comprises M remote sensing images of different modes in the same scene, a data coding model of each neural network model corresponds to the input of the remote sensing image of one mode. Based on the specific multi-mode remote sensing image sample pair set and the specific loss adopted in the joint training process, the data coding model in the neural network model can learn the information of the remote sensing images from different modes, the learning capacity of the data coding model in the neural network model on the information among the modes is improved, and the characteristic representation capacity of the remote sensing images of different modes is further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction provided by an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
According to the invention, a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction is provided, as shown in fig. 1, comprising the following steps:
s100, acquiring a multi-mode remote sensing image sample pair set A= { a 1 ,a 2 ,…,a N },a n For the nth multi-mode remote sensing image sample pair, the value range of N is 1 to N, and N is the number of the multi-mode remote sensing image sample pairs included by A; a, a n =(a n,1 ,a n,2 ,…,a n,M ),a n,m Is a as n The method comprises the steps that in the M-th mode remote sensing image, the value range of M is 1 to M, M is the number of modes included in each multi-mode remote sensing image sample pair in A, and the multi-modes comprise at least two of optics, SAR, hyperspectrum and near infrared; each multi-mode remote sensing image sample pair comprises M mode remote sensing images which are identicalThe remote sensing images of a scene have the same sequence of the remote sensing images corresponding to different modes in each multi-mode remote sensing image sample pair.
According to the invention, each multi-modal remote sensing image sample pair includes the same number of remote sensing images, the modalities of the different remote sensing images within a sample pair are different, and the order of the remote sensing images corresponding to the different modalities is the same in every multi-modal remote sensing image sample pair. As one embodiment, each multi-modal remote sensing image sample pair comprises an optical remote sensing image and a SAR remote sensing image: M = 2, $a_n = (a_{n,1}, a_{n,2})$, with $a_{n,1}$ the optical remote sensing image and $a_{n,2}$ the SAR remote sensing image. As another embodiment, each multi-modal remote sensing image sample pair comprises an optical remote sensing image, a SAR remote sensing image and a hyperspectral remote sensing image: M = 3, $a_n = (a_{n,1}, a_{n,2}, a_{n,3})$, with $a_{n,1}$ the optical remote sensing image, $a_{n,2}$ the SAR remote sensing image and $a_{n,3}$ the hyperspectral remote sensing image. A minimal container for such sample pairs is sketched below.
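For illustration only, the following is a minimal sketch of how a multi-modal sample pair and the set A could be represented in Python; the class and field names are assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple
import torch

@dataclass
class SamplePair:
    """One multi-modal sample pair a_n: M co-registered remote sensing
    images of the same scene, kept in a fixed modality order."""
    images: Tuple[torch.Tensor, ...]   # images[m-1] = a_{n,m}, shape (C_m, M0, N0)
    modalities: Tuple[str, ...]        # e.g. ("optical", "SAR") for M = 2

# The sample pair set A = {a_1, ..., a_N}:
A: List[SamplePair] = []
```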
S200, traversing A, and performing blocking and random masking on $a_{n,m}$ to obtain the H mask blocks and D non-mask blocks corresponding to $a_{n,m}$.
According to the invention, the set of mask blocks corresponding to $a_{n,m}$ is YP and the set of non-mask blocks corresponding to $a_{n,m}$ is NYP; $YP = (yp_1, yp_2, \ldots, yp_H)$, where $yp_h$ is the h-th mask block corresponding to $a_{n,m}$, h ranges from 1 to H, and H is the number of mask blocks corresponding to $a_{n,m}$; $NYP = (nyp_1, nyp_2, \ldots, nyp_D)$, where $nyp_d$ is the d-th non-mask block corresponding to $a_{n,m}$, d ranges from 1 to D, and D is the number of non-mask blocks corresponding to $a_{n,m}$.
Optionally, performing blocking and random masking on $a_{n,m}$ comprises:
s210, will a n,m Equally divided into Z x Z blocks, each block having a size (M 0 /Z)×(N 0 /Z)×C m Z is a preset positive integer, M 0 Is a as n,m Length of N 0 Is a as n,m Is of width C m Is a as n,m A corresponding number of image channels.
S220, randomly masking the Z×Z blocks according to a preset ratio k to obtain H = round(k × Z × Z) mask blocks and D = Z × Z − H non-mask blocks, where round() denotes rounding.
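As an illustration of S210–S220, the following is a minimal sketch assuming a (C, M0, N0) tensor layout; it is not the patented implementation:

```python
import torch

def block_and_mask(image: torch.Tensor, Z: int, k: float):
    """Split an image of shape (C, M0, N0) into Z*Z equal blocks (S210)
    and randomly mask a fraction k of them (S220)."""
    C, M0, N0 = image.shape
    ph, pw = M0 // Z, N0 // Z                    # block height and width
    # (C, M0, N0) -> (Z*Z, ph*pw*C): one flattened row per block
    blocks = (image.reshape(C, Z, ph, Z, pw)
                   .permute(1, 3, 2, 4, 0)
                   .reshape(Z * Z, ph * pw * C))
    H = round(k * Z * Z)                         # number of mask blocks
    perm = torch.randperm(Z * Z)
    mask_idx, keep_idx = perm[:H], perm[H:]      # H mask blocks, D = Z*Z - H non-mask
    return blocks, mask_idx, keep_idx
```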
Those skilled in the art will appreciate that any method of blocking and random masking in the prior art falls within the scope of the present invention.
S300, performing joint training on the M neural network models respectively corresponding to the M modalities, wherein each neural network model comprises a data coding model and a decoder, and A is the multi-modal remote sensing image sample pair set required for one joint training; the joint training process comprises: inputting the target embedded vector sequence B corresponding to $a_{n,m}$ into the data coding model corresponding to the m-th modality, $B = (f_{n,0}^{m}, f_{n,1}^{m}, f_{n,2}^{m}, \ldots, f_{n,H+D}^{m})$, where $f_{n,0}^{m}$ is the to-be-learned global embedded vector corresponding to $a_{n,m}$ and $f_{n,i}^{m}$ is the local embedded vector of the i-th block of $a_{n,m}$, with i ranging from 1 to H+D; when the i-th block is a mask block, $f_{n,i}^{m}$ is a to-be-learned local embedded vector containing the position information of the i-th block in $a_{n,m}$; when the i-th block is a non-mask block, $f_{n,i}^{m}$ is a local embedded vector containing both the pixel value information of the i-th block and the position information of the i-th block in $a_{n,m}$.
Optionally, the data coding model is a Transformer data coding model and the decoder is a linear layer; as an embodiment, the decoder is configured to predict the original pixel values corresponding to the mask blocks according to the output of the data coding model. Those skilled in the art will appreciate that any configuration of data coding models and decoders in the prior art falls within the scope of the present invention.
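A minimal sketch of one such pairing follows; the layer sizes, class names, and use of PyTorch are illustrative assumptions, not the patent's specification:

```python
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Transformer data coding model for one modality (illustrative sizes)."""
    def __init__(self, dim: int = 256, depth: int = 6, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, B):                  # B: (batch, 1 + H + D, dim)
        out = self.encoder(B)
        return out[:, 0], out[:, 1:]       # global feature vector, block features

class LinearDecoder(nn.Module):
    """Linear-layer decoder predicting the original pixel values of blocks."""
    def __init__(self, dim: int = 256, block_pixels: int = 25):
        super().__init__()
        self.proj = nn.Linear(dim, block_pixels)

    def forward(self, block_features):
        return self.proj(block_features)   # (batch, H + D, block_pixels)
```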
According to the invention, when the i-th block is a non-mask block, the method for obtaining $f_{n,i}^{m}$ comprises the following steps:
S310, stretching the i-th block into a corresponding one-dimensional pixel vector according to the pixel value information corresponding to the i-th block.
According to the invention, the size of the i-th block is $(M_0/Z) \times (N_0/Z) \times C_m$, so the one-dimensional pixel vector obtained by stretching the i-th block has dimension $(M_0/Z) \times (N_0/Z) \times C_m$. For example, if the size of the i-th block is 5×5×1, the one-dimensional pixel vector has dimension 25: its 1st element is the pixel value of the pixel at coordinate (1, 1) of the i-th block, its 2nd element is the pixel value at coordinate (1, 2), and so on; the 24th element is the pixel value at coordinate (5, 4) and the 25th element is the pixel value at coordinate (5, 5).
S320, inputting the one-dimensional pixel vector into a linear mapping layer to obtain the pixel embedded vector corresponding to the i-th block.
The linear mapping layer transforms the one-dimensional pixel vector into a pixel embedded vector of a preset dimension. Those skilled in the art will appreciate that any linear mapping layer in the prior art falls within the scope of the present invention.
S330, encoding the position information corresponding to the i-th block into a corresponding one-dimensional position vector whose dimension is the same as that of the one-dimensional pixel vector, and inputting the one-dimensional position vector into a linear mapping layer to obtain the position embedded vector corresponding to the i-th block.
According to the invention, the dimension of the one-dimensional position vector is $(M_0/Z) \times (N_0/Z) \times C_m$; the position embedded vector obtained by passing the one-dimensional position vector through the linear mapping layer has the same dimension as the pixel embedded vector obtained by passing the one-dimensional pixel vector through the linear mapping layer, namely the preset dimension.
S340, adding the position embedded vector corresponding to the i-th block and the pixel embedded vector to obtain $f_{n,i}^{m}$.
According to the invention, the position embedded vector corresponding to the i-th block has the same dimension as the pixel embedded vector, so the two can be added; the $f_{n,i}^{m}$ obtained by the addition contains both the pixel value information of the i-th block and the position information of the i-th block in $a_{n,m}$.
According to the present invention, when the i-th block is a mask block, the i-th block is replaced with a to-be-learned mask vector whose dimension is set equal to the dimension of the pixel embedded vector; the position embedded vector corresponding to the i-th block is then added to the mask vector to obtain $f_{n,i}^{m}$. This $f_{n,i}^{m}$ is thus a to-be-learned local embedded vector containing the position information of the i-th block in $a_{n,m}$.
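Steps S310–S340 and the mask-vector substitution can be sketched as follows; the one-hot position codes, parameter names, and PyTorch usage are assumptions, not the patent's specification:

```python
import torch
import torch.nn as nn

class BlockEmbedder(nn.Module):
    """Builds the target embedded vector sequence B from the blocks of one image."""
    def __init__(self, block_pixels: int, dim: int, num_blocks: int):
        super().__init__()
        self.pixel_proj = nn.Linear(block_pixels, dim)   # S320: linear mapping layer
        self.pos_proj = nn.Linear(block_pixels, dim)     # S330: maps position vectors
        self.mask_vec = nn.Parameter(torch.zeros(dim))   # to-be-learned mask vector
        self.global_vec = nn.Parameter(torch.zeros(dim)) # f_{n,0}^m, to be learned
        # one-dimensional position vectors, same dimension as the pixel vectors
        self.register_buffer("pos_codes", torch.eye(num_blocks, block_pixels))

    def forward(self, blocks, mask_idx):
        pix = self.pixel_proj(blocks)                    # S310 + S320: pixel embedded vectors
        pos = self.pos_proj(self.pos_codes)              # S330: position embedded vectors
        emb = pix + pos                                  # S340: same dimension, so addable
        emb[mask_idx] = self.mask_vec + pos[mask_idx]    # mask blocks: mask vector + position
        return torch.cat([self.global_vec[None, :], emb], dim=0)  # B = (f0, f1, ..., f_{H+D})
```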
According to the invention, the loss employed by the joint training is $L = L_1 + L_2$, where $L_1$ is the first-level loss and $L_2$ is the second-level loss. Let $v_n^m$ be the global feature representation vector obtained by inputting B into the data coding model corresponding to the m-th modality; let $\hat{v}_n^m$ be the global feature representation vector obtained by inputting into that data coding model the embedded vector, not subjected to random masking, corresponding to the reconstructed image of the m-th modality, the reconstructed image of the m-th modality being the image obtained by inputting the output of the data coding model into the decoder corresponding to the m-th modality; let $\bar{v}_n^m = \frac{1}{M-1} \sum_{j=1, j \neq m}^{M} v_n^j$ be the average global feature representation vector of $a_n$ over the modalities other than the m-th, where $v_n^j$ is the global feature representation vector obtained by inputting the target embedded vector sequence of the j-th modality of $a_n$ into the data coding model corresponding to the j-th modality; and let $v_q^m$ be the global feature representation vector of the m-th modality corresponding to the q-th multi-modal remote sensing image sample pair in A other than $a_n$, with q ranging from 1 to N−1. $L_1$ is negatively correlated with the similarities $sim(v_n^m, \hat{v}_n^m)$ and $sim(v_n^m, \bar{v}_n^m)$ and positively correlated with the similarities $sim(v_n^m, v_q^m)$, where sim() is a similarity measure and $\tau$ is a preset temperature. $L_2$ is positively correlated with $KL(P(v_n^m) \| P(v_n^g))$, the KL divergence between $P(v_n^m)$, the feature probability distribution of $v_n^m$, and $P(v_n^g)$, the feature probability distribution of $v_n^g$, where $v_n^g$ is the global feature representation vector obtained by inputting the target embedded vector sequence of the g-th modality of $a_n$ into the data coding model corresponding to the g-th modality, g = 1, 2, …, M, g ≠ m.
Optionally, sim() may be taken as a standard vector similarity (for example, cosine similarity). Those skilled in the art will appreciate that any similarity calculation method in the prior art falls within the scope of the present invention.
After obtaining $v_n^m$ and $v_n^g$, the procedure for obtaining their feature probability distributions and the KL divergence is prior art and is not repeated here. Those skilled in the art will appreciate that any method of obtaining a feature probability distribution and any method of obtaining a KL divergence in the prior art fall within the scope of the present invention.
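One common prior-art construction (an assumption here, not mandated by the patent) is a softmax over the feature dimensions followed by the standard KL divergence:

```python
import torch
import torch.nn.functional as F

def feature_kl(v_m: torch.Tensor, v_g: torch.Tensor, temp: float = 1.0):
    """KL(P(v_m) || P(v_g)) with P(.) a softmax feature probability
    distribution (one common choice; assumed, not claimed by the patent)."""
    p_m = F.softmax(v_m / temp, dim=-1)          # feature probability distribution of v_m
    log_p_g = F.log_softmax(v_g / temp, dim=-1)  # log distribution of v_g
    # F.kl_div(input, target) computes KL(target || input), input = log-probs
    return F.kl_div(log_p_g, p_m, reduction="sum")
```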
Optionally, the first-level loss $L_1$ satisfies a relationship given in the original as a formula (not reproduced here) over $sim(v_n^m, \hat{v}_n^m)/\tau$, $sim(v_n^m, \bar{v}_n^m)/\tau$ and $sim(v_n^m, v_q^m)/\tau$, in which $\{v_q^m \mid q = 1, \ldots, N-1\}$ is the set of global feature representation vectors of the m-th modality corresponding to the multi-modal remote sensing image sample pairs in A other than $a_n$.
Optionally, the second-level loss $L_2$ satisfies a relationship given in the original as a formula (not reproduced here); consistent with the definition above, it is positively correlated with the KL divergences $KL(P(v_n^m) \| P(v_n^g))$ for g = 1, 2, …, M, g ≠ m.
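Since the formula images for $L_1$ and $L_2$ are not reproduced in this text, the sketch below shows one InfoNCE-style reading consistent with the stated correlations (positives: the reconstruction-based vector $\hat{v}_n^m$ and the cross-modal average $\bar{v}_n^m$; negatives: the $v_q^m$ of the other sample pairs). The exact form is an assumption, not the claimed formula:

```python
import torch
import torch.nn.functional as F

def first_level_loss(v, v_hat, v_bar, negatives, tau: float = 0.07):
    """One possible L1: higher similarity to the positives lowers the loss,
    higher similarity to the negatives raises it (sim = cosine, assumed).
    v, v_hat, v_bar: (dim,); negatives: (N-1, dim)."""
    pos = torch.stack([F.cosine_similarity(v, v_hat, dim=0),
                       F.cosine_similarity(v, v_bar, dim=0)]) / tau
    neg = F.cosine_similarity(v.unsqueeze(0), negatives, dim=1) / tau
    return -torch.logsumexp(pos, 0) + torch.logsumexp(torch.cat([pos, neg]), 0)

def second_level_loss(v, others, temp: float = 1.0):
    """One possible L2: sum of KL divergences between the feature probability
    distribution of v and those of the other modalities' vectors v_g."""
    p = F.softmax(v / temp, dim=-1)
    return sum(F.kl_div(F.log_softmax(vg / temp, dim=-1), p, reduction="sum")
               for vg in others)

# L = L1 + L2 for one sample pair a_n and one modality m.
```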
according to the invention, the characteristic representation of the remote sensing images of different modes can be performed by using the data coding model of each neural network model after the training is finished.
The method does not require labeling the remote sensing images in the multi-modal remote sensing image sample pairs; it is a self-supervised learning method, and it avoids the time, labor, and high annotation cost of manually labeling remote sensing images in the prior art.
The invention discloses a self-supervision remote sensing representation learning method based on multi-level cross-modal interaction, which uses a set of multi-modal remote sensing image sample pairs to jointly train M neural network models. Each multi-modal remote sensing image sample pair comprises remote sensing images of M different modalities of the same scene, and the data coding model of each neural network model takes the remote sensing image of one modality as input. Based on this specific multi-modal remote sensing image sample pair set and the specific loss adopted in the joint training process, the data coding model of each neural network model can learn information from remote sensing images of different modalities, which improves its ability to learn inter-modality information and, in turn, its capability to represent features of remote sensing images of different modalities.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (8)
1. A self-supervision remote sensing representation learning method based on multi-level cross-modal interaction is characterized by comprising the following steps:
S100, acquiring a multi-modal remote sensing image sample pair set $A = \{a_1, a_2, \ldots, a_N\}$, where $a_n$ is the n-th multi-modal remote sensing image sample pair, n ranges from 1 to N, and N is the number of multi-modal remote sensing image sample pairs in A; $a_n = (a_{n,1}, a_{n,2}, \ldots, a_{n,M})$, where $a_{n,m}$ is the remote sensing image of the m-th modality in $a_n$, m ranges from 1 to M, and M is the number of modalities in each multi-modal remote sensing image sample pair in A; the modalities include at least two of optical, SAR, hyperspectral and near-infrared; the M remote sensing images of each multi-modal remote sensing image sample pair are remote sensing images of the same scene, and the order of the remote sensing images corresponding to the different modalities is the same in every multi-modal remote sensing image sample pair;
S200, traversing A, and performing blocking and random masking on $a_{n,m}$ to obtain the H mask blocks and D non-mask blocks corresponding to $a_{n,m}$;
S300, performing joint training on the M neural network models respectively corresponding to the M modalities, wherein each neural network model comprises a data coding model and a decoder, and A is the multi-modal remote sensing image sample pair set required for one joint training; the joint training process comprises: inputting the target embedded vector sequence B corresponding to $a_{n,m}$ into the data coding model corresponding to the m-th modality, $B = (f_{n,0}^{m}, f_{n,1}^{m}, f_{n,2}^{m}, \ldots, f_{n,H+D}^{m})$, where $f_{n,0}^{m}$ is the to-be-learned global embedded vector corresponding to $a_{n,m}$ and $f_{n,i}^{m}$ is the local embedded vector of the i-th block of $a_{n,m}$, with i ranging from 1 to H+D; when the i-th block is a mask block, $f_{n,i}^{m}$ is a to-be-learned local embedded vector containing the position information of the i-th block in $a_{n,m}$; when the i-th block is a non-mask block, $f_{n,i}^{m}$ is a local embedded vector containing both the pixel value information of the i-th block and the position information of the i-th block in $a_{n,m}$;
the loss employed by the joint training is $L = L_1 + L_2$, where $L_1$ is the first-level loss and $L_2$ is the second-level loss; let $v_n^m$ be the global feature representation vector obtained by inputting B into the data coding model corresponding to the m-th modality; let $\hat{v}_n^m$ be the global feature representation vector obtained by inputting into that data coding model the embedded vector, not subjected to random masking, corresponding to the reconstructed image of the m-th modality, the reconstructed image of the m-th modality being the image obtained by inputting the output of the data coding model into the decoder corresponding to the m-th modality; let $\bar{v}_n^m = \frac{1}{M-1} \sum_{j=1, j \neq m}^{M} v_n^j$ be the average global feature representation vector of $a_n$ over the modalities other than the m-th, where $v_n^j$ is the global feature representation vector obtained by inputting the target embedded vector sequence of the j-th modality of $a_n$ into the data coding model corresponding to the j-th modality; and let $v_q^m$ be the global feature representation vector of the m-th modality corresponding to the q-th multi-modal remote sensing image sample pair in A other than $a_n$, with q ranging from 1 to N−1; $L_1$ is negatively correlated with the similarities $sim(v_n^m, \hat{v}_n^m)$ and $sim(v_n^m, \bar{v}_n^m)$ and positively correlated with the similarities $sim(v_n^m, v_q^m)$, where sim() is a similarity measure and $\tau$ is a preset temperature; $L_2$ is positively correlated with $KL(P(v_n^m) \| P(v_n^g))$, the KL divergence between $P(v_n^m)$, the feature probability distribution of $v_n^m$, and $P(v_n^g)$, the feature probability distribution of $v_n^g$, where $v_n^g$ is the global feature representation vector obtained by inputting the target embedded vector sequence of the g-th modality of $a_n$ into the data coding model corresponding to the g-th modality, g = 1, 2, …, M, g ≠ m.
2. The self-supervision remote sensing representation learning method based on multi-level cross-modal interaction according to claim 1, wherein in S300, the first-level loss $L_1$ satisfies a relationship given in the original as a formula (not reproduced here) over the set of global feature representation vectors of the m-th modality corresponding to the multi-modal remote sensing image sample pairs in A other than $a_n$.
4. The self-supervision remote sensing representation learning method based on multi-level cross-modal interaction according to claim 1, wherein in S200, performing blocking and random masking on $a_{n,m}$ comprises:
S210, dividing $a_{n,m}$ equally into Z×Z blocks, each of size $(M_0/Z) \times (N_0/Z) \times C_m$, where Z is a preset positive integer, $M_0$ is the length of $a_{n,m}$, $N_0$ is the width of $a_{n,m}$, and $C_m$ is the number of image channels of $a_{n,m}$;
S220, randomly masking the Z×Z blocks according to a preset ratio k to obtain H = round(k × Z × Z) mask blocks and D = Z × Z − H non-mask blocks, where round() denotes rounding.
5. The self-supervision remote sensing representation learning method based on multi-level cross-modal interaction according to claim 1, wherein in S300, when the i-th block is a non-mask block, the method for obtaining $f_{n,i}^{m}$ comprises the following steps:
S310, stretching the i-th block into a corresponding one-dimensional pixel vector according to the pixel value information corresponding to the i-th block;
S320, inputting the one-dimensional pixel vector into a linear mapping layer to obtain the pixel embedded vector corresponding to the i-th block;
S330, encoding the position information corresponding to the i-th block into a corresponding one-dimensional position vector whose dimension is the same as that of the one-dimensional pixel vector, and inputting the one-dimensional position vector into a linear mapping layer to obtain the position embedded vector corresponding to the i-th block;
S340, adding the position embedded vector corresponding to the i-th block and the pixel embedded vector to obtain $f_{n,i}^{m}$.
6. The self-supervision remote sensing representation learning method based on multi-level cross-modal interaction according to claim 1, wherein the data coding model is a Transformer data coding model.
7. The method of claim 1, wherein the decoder is a linear layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211635290.8A CN116012679B (en) | 2022-12-19 | 2022-12-19 | Self-supervision remote sensing representation learning method based on multi-level cross-modal interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211635290.8A CN116012679B (en) | 2022-12-19 | 2022-12-19 | Self-supervision remote sensing representation learning method based on multi-level cross-modal interaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116012679A CN116012679A (en) | 2023-04-25 |
CN116012679B true CN116012679B (en) | 2023-06-16 |
Family
ID=86036542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211635290.8A Active CN116012679B (en) | 2022-12-19 | 2022-12-19 | Self-supervision remote sensing representation learning method based on multi-level cross-modal interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116012679B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107346328A (en) * | 2017-05-25 | 2017-11-14 | 北京大学 | A kind of cross-module state association learning method based on more granularity hierarchical networks |
CN115223057A (en) * | 2022-08-02 | 2022-10-21 | 大连理工大学 | Target detection unified model for multimodal remote sensing image joint learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113240056B (en) * | 2021-07-12 | 2022-05-17 | 北京百度网讯科技有限公司 | Multi-mode data joint learning model training method and device |
- 2022-12-19: Application CN202211635290.8A filed in China; granted as CN116012679B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107346328A (en) * | 2017-05-25 | 2017-11-14 | 北京大学 | A kind of cross-module state association learning method based on more granularity hierarchical networks |
CN115223057A (en) * | 2022-08-02 | 2022-10-21 | 大连理工大学 | Target detection unified model for multimodal remote sensing image joint learning |
Non-Patent Citations (1)
Title |
---|
Review of cross-modal retrieval models and feature extraction based on representation learning; Li Zhiyi, Huang Zifeng, Xu Xiaomian; Journal of the China Society for Scientific and Technical Information, (04): 86-99. *
Also Published As
Publication number | Publication date |
---|---|
CN116012679A (en) | 2023-04-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108399406B (en) | Method and system for detecting weakly supervised salient object based on deep learning | |
CN112634296B (en) | RGB-D image semantic segmentation method and terminal for gate mechanism guided edge information distillation | |
CN113313164B (en) | Digital pathological image classification method and system based on super-pixel segmentation and graph convolution | |
CN112801047B (en) | Defect detection method and device, electronic equipment and readable storage medium | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
CN111325766A (en) | Three-dimensional edge detection method and device, storage medium and computer equipment | |
CN113988147A (en) | Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device | |
CN117974693B (en) | Image segmentation method, device, computer equipment and storage medium | |
CN115578589A (en) | Unsupervised echocardiography section identification method | |
CN115131558A (en) | Semantic segmentation method under less-sample environment | |
CN117693754A (en) | Training masked automatic encoders for image restoration | |
Zhang et al. | A joint convolution auto-encoder network for infrared and visible image fusion | |
Aliouat et al. | EVBS-CAT: enhanced video background subtraction with a controlled adaptive threshold for constrained wireless video surveillance | |
CN116776014B (en) | Multi-source track data representation method and device | |
CN116012679B (en) | Self-supervision remote sensing representation learning method based on multi-level cross-modal interaction | |
CN117011650A (en) | Method and related device for determining image encoder | |
Li et al. | Automated Tire visual inspection based on low rank matrix recovery | |
CN107273793A (en) | A kind of feature extracting method for recognition of face | |
CN115148303B (en) | Microorganism-drug association prediction method based on normalized graph neural network | |
CN115861713A (en) | Carotid plaque ultrasonic image processing method based on multitask learning | |
Duan et al. | A study on the generalized normalization transformation activation function in deep learning based image compression | |
Kebir et al. | End-to-end deep auto-encoder for segmenting a moving object with limited training data | |
Das et al. | Image splicing detection using feature based machine learning methods and deep learning mechanisms | |
Ye et al. | GFSCompNet: remote sensing image compression network based on global feature-assisted segmentation | |
CN117853739B (en) | Remote sensing image feature extraction model pre-training method and device based on feature transformation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |