CN117315433B - Remote sensing multi-mode multi-space functional mapping method based on distribution consistency constraint

Info

Publication number: CN117315433B
Application number: CN202311620152.7A
Authority: CN
Inventors: 付琨; 孙显; 刁文辉; 肖思宁; 申志平; 王佩瑾; 王萌雨
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2023-11-30
Filing date: 2023-11-30
Publication date: 2024-02-13
Anticipated expiration: 2043-11-30
Also published as: CN117315433A

Abstract

The invention provides a remote sensing multi-mode multi-space functional mapping method based on a distribution consistency constraint. It addresses two shortcomings of existing methods: they cannot fully mine the intrinsic features of each modality, and the semantic gap between modalities degrades feature fusion. First, modality intrinsic features of an optical image and an SAR image are extracted in Euclidean space and complex space, respectively. The intrinsic features from the two spaces are then mapped into the same low-dimensional feature space through a functional constraint, so that corresponding cross-modal features remain similar and the influence of the semantic gap is reduced; meanwhile, a reconstruction loss constraint preserves the uniqueness of each modality. Second, a consistency constraint on the feature similarity of the two mapped modalities allows their useful information to be fused more fully. In this way, the intrinsic features within each modality can be mined on the basis of the imaging mechanisms of the different sensors, and dimensionality reduction and cross-domain fusion of the features of different modalities can be achieved through functional mapping.

Description

Remote sensing multi-mode multi-space functional mapping method based on distribution consistency constraint

Technical Field

The invention relates to the technical field of remote sensing, in particular to a remote sensing multi-mode multi-space functional mapping method and device based on distribution consistency constraint.

Background

In recent years, with the rapid development of remote sensing technology, the volume of remote sensing data has grown exponentially, providing diverse data for earth observation. Different sensors capture different ground-object characteristics. For example, an optical image depicts detailed information such as the texture and color of a ground target realistically and intuitively, whereas an SAR image reflects the distinct scattering characteristics of different ground objects. However, because the sensors image differently, each modality has its own inherent defects: optical images are easily affected by cloud and fog occlusion and by weather, while SAR images suffer from foreshortening and speckle noise. Therefore, reducing the semantic gap and making fuller use of the information provided by multi-modal data for fine interpretation is of great research significance.

The simplest fusion methods directly sum the features or concatenate them along the channel dimension. The PSCNN and MRSDC methods fuse the information of the two modalities by channel-wise concatenation, while FuseNet and v_FuseNet fuse the features of the two modalities by direct addition. These fusion schemes ignore the semantic gap between modalities, which leads to poor segmentation results. To further enhance feature fusion, some researchers have proposed attention-based networks. MBFNet proposes a bilinear attention module with global average pooling and global max pooling for feature fusion; AFNet and CroFuseNet both incorporate a channel attention mechanism, which also improves the fusion of optical and SAR features. In addition, the CMGFNet and CEGFNet methods introduce gating mechanisms to suppress redundant information in the different modalities and thereby facilitate feature fusion. In recent years, because of the Transformer's superior ability to capture long-range dependencies, Transformer structures have also been introduced into multi-modal fusion. Methods such as CMFNet and CEN incorporate the Transformer to extract better modal features for fusion. Although these methods improve segmentation accuracy to some extent, the following problems remain: 1) a generic backbone is adopted, so the unique, discriminative features of the different modalities cannot be fully mined; 2) the modal feature mismatch caused by the semantic gap is not alleviated, so feature fusion is insufficient.

Disclosure of Invention

The invention provides a remote sensing multi-mode multi-space functional mapping method and device based on a distribution consistency constraint, which address the problems that existing methods cannot fully mine the intrinsic features of each modality and that the semantic gap between modalities impairs feature fusion.

According to a first aspect of the present invention, there is provided a remote sensing multi-mode multi-space functional mapping method based on a distribution consistency constraint, comprising: step S1, acquiring an optical image and an SAR image of the same remote sensing area; step S2, extracting a first modality intrinsic feature of the optical image in Euclidean space, and extracting a second modality intrinsic feature of the SAR image in complex space, wherein the second modality intrinsic feature is formed after optimization guided by the first modality intrinsic feature; step S3, reducing the first modality intrinsic feature and the second modality intrinsic feature to the same low-dimensional feature space by functional mapping, and applying a consistency constraint to the similarity of the first modality intrinsic feature and the second modality intrinsic feature in the low-dimensional feature space; and step S4, fusing the first modality intrinsic feature and the second modality intrinsic feature subjected to the consistency constraint to obtain a fused feature.

According to an embodiment of the invention, in step S1, the optical image and the SAR image are imaged by different sensors.

According to an embodiment of the present invention, in step S2, extracting the first modality intrinsic feature of the optical image in Euclidean space includes: converting the features under spatial coordinates into graph-structured data, and using a graph network to model the invariant topological structure among targets, thereby mining the geometric, texture or color features unique to the optical image as the first modality intrinsic feature.

According to an embodiment of the present invention, in step S2, extracting the second modality intrinsic feature of the SAR image in complex space includes: combining a cross-attention mechanism, introducing the first modality intrinsic feature as guidance, improving feature quality by using complementary information, and suppressing noise caused by inherent defects, to form the second modality intrinsic feature.

According to an embodiment of the present invention, step S3 specifically includes: step S31, performing dimension-reduction encoding on the first modality intrinsic feature and the second modality intrinsic feature, respectively, by using a variational encoder to obtain low-dimensional features of the two modalities; step S32, during the dimension-reduction encoding, using a reconstruction loss to retain more of the high-dimensional feature information and ensure the quality of the low-dimensional features of the two modalities; and step S33, constraining the similarity graphs of the low-dimensional features of the two modalities to reduce the influence of the semantic gap between modalities.

According to a second aspect of the present invention, there is provided a remote sensing multi-mode multi-space functional mapping device based on a distribution consistency constraint, comprising: a multi-mode image acquisition module for acquiring an optical image and an SAR image of the same remote sensing area; a feature extraction module for extracting a first modality intrinsic feature of the optical image in Euclidean space and extracting a second modality intrinsic feature of the SAR image in complex space, wherein the second modality intrinsic feature is formed after optimization guided by the first modality intrinsic feature; a functional mapping module for reducing the first modality intrinsic feature and the second modality intrinsic feature to the same low-dimensional feature space by functional mapping, and applying a consistency constraint to the similarity of the first modality intrinsic feature and the second modality intrinsic feature in the low-dimensional feature space; and a feature fusion module for fusing the first modality intrinsic feature and the second modality intrinsic feature subjected to the consistency constraint to obtain a fused feature.

According to an embodiment of the present invention, the functional mapping module includes: a dimension-reduction encoding unit for performing dimension-reduction encoding on the first modality intrinsic feature and the second modality intrinsic feature, respectively, by using a variational encoder to obtain low-dimensional features of the two modalities; a reconstruction loss constraint unit for using a reconstruction loss during the dimension-reduction encoding to retain more of the high-dimensional feature information and ensure the quality of the low-dimensional features of the two modalities; and a consistency constraint unit for constraining the similarity graphs of the low-dimensional features of the two modalities to reduce the influence of the semantic gap between modalities.

Compared with the prior art, the remote sensing multi-mode multi-space functional mapping method and device based on the distribution consistency constraint have the following beneficial effects:

The invention uses functional mapping to perform intra-domain dimensionality reduction on the heterogeneous features extracted from different spaces, and introduces a similarity-based consistency constraint to reduce the semantic gap between modalities. This promotes a consistent representation of the features of different modalities and better facilitates the fusion of information between them.

Drawings

The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates a schematic diagram of a remote sensing multi-modal multi-spatial functional mapping method based on distribution consistency constraints according to an embodiment of the present invention;

FIG. 2 schematically illustrates a flow chart of a remote sensing multi-modal multi-spatial functional mapping method based on a distribution consistency constraint according to an embodiment of the present invention;

FIG. 3 schematically illustrates a schematic diagram of a functional mapping process according to an embodiment of the present invention;

FIG. 4 schematically shows a flow chart of a functional mapping process according to an embodiment of the present invention;

FIG. 5 schematically illustrates a block diagram of a remote sensing multi-modal multi-space functional mapping apparatus based on distribution consistency constraints in accordance with an embodiment of the present invention;

FIG. 6 schematically shows a block diagram of the functional mapping module according to FIG. 5.

Detailed Description

The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Fig. 1 schematically illustrates a schematic diagram of a remote sensing multi-modal multi-spatial functional mapping method based on a distribution consistency constraint according to an embodiment of the present invention. Fig. 2 schematically illustrates a flow chart of a remote sensing multi-modal multi-space functional mapping method based on a distribution consistency constraint according to an embodiment of the invention.

As shown in fig. 1 and fig. 2, the remote sensing multi-mode multi-space functional mapping method based on the distribution consistency constraint according to this embodiment may include steps S1 to S4.

Step S1, acquiring an optical image and an SAR image of the same remote sensing area.

In this embodiment, the optical image and the SAR (Synthetic Aperture Radar) image are imaged by different sensors. As described above, different sensors capture different ground-object characteristics: the optical image depicts detailed information such as the texture and color of a ground target realistically and intuitively, whereas the SAR image reflects the distinct scattering characteristics of different ground objects. However, because the sensors image differently, each modality has its own inherent defects: optical images are easily affected by cloud and fog occlusion and by weather, while SAR images suffer from foreshortening and speckle noise.

Step S2, extracting a first modality intrinsic feature of the optical image in Euclidean space, and extracting a second modality intrinsic feature of the SAR image in complex space, wherein the second modality intrinsic feature is formed after optimization guided by the first modality intrinsic feature.

After the optical image and the SAR image are obtained, features of the optical image are first extracted in Euclidean space. In this embodiment, extracting the first modality intrinsic feature of the optical image in Euclidean space includes: converting the features under spatial coordinates into graph-structured data, and using a graph network to model the invariant topological structure among targets, thereby mining the geometric, texture or color features unique to the optical image as the first modality intrinsic feature.
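As an illustration of the graph conversion described above, the following is a minimal sketch and not the patented network. It assumes a 4-neighbour grid adjacency and a single symmetrically normalized graph-convolution step; the function names `grid_to_graph` and `graph_conv`, the adjacency choice, and all array sizes are hypothetical and do not come from the specification.

```python
import numpy as np

def grid_to_graph(feat):
    """Turn an (H, W, C) feature map into node features and a 4-neighbour adjacency.

    Assumption: each spatial position is one node, linked to its grid neighbours.
    """
    h, w, c = feat.shape
    nodes = feat.reshape(h * w, c)
    adj = np.zeros((h * w, h * w))
    for i in range(h):
        for j in range(w):
            k = i * w + j
            if i + 1 < h:  # edge to the node below
                adj[k, k + w] = adj[k + w, k] = 1.0
            if j + 1 < w:  # edge to the node on the right
                adj[k, k + 1] = adj[k + 1, k] = 1.0
    return nodes, adj

def graph_conv(nodes, adj, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])       # add self loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ nodes @ weight, 0.0)

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 5, 8))            # toy optical feature map
nodes, adj = grid_to_graph(feat)
out = graph_conv(nodes, adj, rng.normal(size=(8, 16)))
print(nodes.shape, adj.shape, out.shape)
```

The adjacency matrix encodes only which positions are neighbours, so the same topology is reused however the node features change, which is the property the graph network exploits.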

Second, features of the SAR image are extracted in complex space from level-1 SAR data. The second modality intrinsic feature of the SAR image is obtained after extraction in complex space and guided correction by the features of the optical image. In this embodiment, extracting the second modality intrinsic feature of the SAR image in complex space includes: combining a cross-attention mechanism, introducing the first modality intrinsic feature as guidance, improving feature quality by using complementary information, and suppressing noise caused by inherent defects, to form the second modality intrinsic feature.

That is, in the feature extraction stage of the SAR image, the features of the optical image are introduced as guidance through a cross-attention mechanism, and the optical features are used to optimize the features of the SAR image, reducing the influence of inherent defects such as foreshortening.
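The optical-guided correction can be sketched with standard scaled dot-product cross attention. This is only an assumed reading of the step: queries come from the SAR features, keys and values come from the optical features, and the complex SAR features are represented by stacking their real and imaginary parts. The function name `optical_guided_attention`, the projection matrices, and the residual connection are illustrative choices, not details given in the specification.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def optical_guided_attention(sar_feat, opt_feat, w_q, w_k, w_v):
    """Refine SAR features with optical features via cross attention.

    sar_feat: (N, C) complex array; opt_feat: (M, D) real array.
    Assumption: complex features are handled as stacked real/imaginary parts.
    """
    sar_real = np.concatenate([sar_feat.real, sar_feat.imag], axis=-1)  # (N, 2C)
    q = sar_real @ w_q                        # queries from the SAR branch
    k = opt_feat @ w_k                        # keys from the optical branch
    v = opt_feat @ w_v                        # values from the optical branch
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, M)
    refined = sar_real + attn @ v             # residual update of the SAR features
    return refined, attn

rng = np.random.default_rng(0)
sar_feat = rng.normal(size=(10, 4)) + 1j * rng.normal(size=(10, 4))
opt_feat = rng.normal(size=(12, 6))
w_q = rng.normal(size=(8, 5))
w_k = rng.normal(size=(6, 5))
w_v = rng.normal(size=(6, 8))
refined, attn = optical_guided_attention(sar_feat, opt_feat, w_q, w_k, w_v)
print(refined.shape, attn.shape)
```

Each row of `attn` is a weighting over optical positions, so every SAR position receives a convex combination of optical values as complementary information.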

Step S3, reducing the first modality intrinsic feature and the second modality intrinsic feature to the same low-dimensional feature space by functional mapping, and applying a consistency constraint to the similarity of the first modality intrinsic feature and the second modality intrinsic feature in the low-dimensional feature space.

Functional mapping is used to reduce the features of the different modalities (namely the first modality intrinsic feature and the second modality intrinsic feature) into the same low-dimensional feature space, so that the difference between modalities is reduced while losing as little of the original modal features as possible. It should be noted that the dimension-reduction process is constrained by a reconstruction loss, which ensures that the unique information in the original high-dimensional features is encoded into the low-dimensional features and that the low-dimensional features remain discriminative.

Meanwhile, a consistency constraint on the similarity graphs of the two modalities' features ensures a consistent representation of the modal features. The reconstruction loss constraint ensures that the low-dimensional features retain as much of the useful information in the original high-dimensional features as possible, and the similarity consistency constraint reduces the influence of the semantic gap.
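The two constraints can be written compactly under assumed notation. The specification gives no explicit formulas, so the following is only one plausible formalization: $F_o$ and $F_s$ denote the optical and SAR intrinsic features, $E_m$ and $D_m$ a per-modality encoder and decoder, $S_m$ a cosine-similarity graph over the low-dimensional features, and $\lambda_{\mathrm{rec}}$, $\lambda_{\mathrm{con}}$ weighting coefficients. All of these symbols, and the choice of squared Frobenius norms, are illustrative assumptions.

```latex
\begin{aligned}
Z_m &= E_m(F_m), \qquad m \in \{o, s\},\\
\mathcal{L}_{\mathrm{rec}} &= \sum_{m \in \{o, s\}} \bigl\| F_m - D_m(Z_m) \bigr\|_F^2,\\
S_m &= \hat{Z}_m \hat{Z}_m^{\top}, \qquad \hat{Z}_m = \text{row-normalized } Z_m,\\
\mathcal{L}_{\mathrm{con}} &= \bigl\| S_o - S_s \bigr\|_F^2,\\
\mathcal{L} &= \lambda_{\mathrm{rec}}\,\mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{con}}\,\mathcal{L}_{\mathrm{con}}.
\end{aligned}
```

In this reading, $\mathcal{L}_{\mathrm{rec}}$ acts on each modality separately and preserves modality-specific information, while $\mathcal{L}_{\mathrm{con}}$ couples the two modalities only through their pairwise similarity structure.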

Step S4, fusing the first modality intrinsic feature and the second modality intrinsic feature subjected to the consistency constraint to obtain a fused feature.

For example, a fusion module performs adaptive feature fusion to obtain the fused feature, which is then used for subsequent tasks. The fused feature combines the complementary information of the two modalities, compensates for the interference of the noise inherent in a single modality, breaks through the bottleneck of single-modality processing, and further improves the accuracy of semantic segmentation.
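The specification does not describe the internals of the fusion module. One common way to realize adaptive fusion is a learned element-wise gate, sketched below purely as an assumption; the function name `adaptive_fusion`, the gate parameters `w_gate` and `b_gate`, and the feature sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(z_opt, z_sar, w_gate, b_gate):
    """Gated fusion: a per-element weight decides how much each modality contributes.

    z_opt, z_sar: (N, d) low-dimensional features in the shared space.
    w_gate: (2d, d) and b_gate: (d,) are illustrative gate parameters.
    """
    gate = sigmoid(np.concatenate([z_opt, z_sar], axis=-1) @ w_gate + b_gate)
    fused = gate * z_opt + (1.0 - gate) * z_sar  # convex combination per element
    return fused, gate

rng = np.random.default_rng(0)
z_opt = rng.normal(size=(5, 8))
z_sar = rng.normal(size=(5, 8))
w_gate = rng.normal(scale=0.1, size=(16, 8))
b_gate = np.zeros(8)
fused, gate = adaptive_fusion(z_opt, z_sar, w_gate, b_gate)
print(fused.shape)
```

Because the gate lies strictly between 0 and 1, every fused element stays between the corresponding optical and SAR values, so neither modality can be amplified beyond its own range.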

According to the embodiment of the invention, the modality intrinsic features of the optical image and the SAR image are first extracted in Euclidean space and complex space, respectively. The intrinsic features from the two spaces are then mapped into the same low-dimensional feature space through a functional constraint, and during dimension reduction a reconstruction loss constraint ensures that the low-dimensional features retain as much of the original information of the high-dimensional features as possible; that is, the feature dimension is reduced while as much useful information as possible is preserved. Second, a consistency constraint on the feature similarity of the two mapped modalities reduces the influence of the semantic gap, so that the useful information of the two modalities is fused more fully. In this way, the intrinsic features within each modality can be mined on the basis of the imaging mechanisms of the different sensors, and dimensionality reduction and cross-domain fusion of the features of different modalities can be achieved through functional mapping.

Therefore, the invention adopts different backbones to extract the unique features of the different modalities in their respective feature spaces, then uses functional mapping to achieve intra-domain dimensionality reduction, and combines it with a similarity consistency constraint to perform high-quality cross-domain fusion.

In particular, for functional mapping, FIG. 3 schematically shows a schematic diagram of a functional mapping process according to an embodiment of the present invention. Fig. 4 schematically shows a flow chart of a functional mapping process according to an embodiment of the invention.

As shown in FIG. 3 and FIG. 4, in this embodiment, step S3 may specifically include steps S31 to S33.

Step S31, performing dimension-reduction encoding on the first modality intrinsic feature and the second modality intrinsic feature, respectively, by using a variational encoder to obtain low-dimensional features of the two modalities.

Step S32, during the dimension-reduction encoding, using a reconstruction loss to retain more of the high-dimensional feature information and ensure the quality of the low-dimensional features of the two modalities.

Specifically, the reconstruction loss is used to retain more of the information and detail of the original high-dimensional features, ensuring that the discriminative features and the unique details of the original features are encoded.

Step S33, constraining the similarity graphs of the low-dimensional features of the two modalities to reduce the influence of the semantic gap between modalities.

According to the embodiment of the invention, the features learned for the different modalities are dimension-reduction encoded by variational encoders, and the similarity graphs of the low-dimensional features of the two modalities are constrained. The high-dimensional original features are then restored from the low-dimensional features of the two modalities, and the reconstruction loss ensures the quality of the low-dimensional features.
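Steps S31 to S33 can be sketched numerically as follows. This is a forward-pass illustration under stated assumptions, not the patented implementation: the encoders and decoders are single linear layers, the similarity graph is a cosine-similarity matrix, both losses are mean squared errors, and the KL regularizer usually paired with a variational encoder is omitted for brevity. All names (`init_vae`, `encode`, `similarity_graph`, and so on) and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_vae(d_in, d_low):
    """Illustrative linear variational encoder/decoder parameters."""
    return {
        "w_mu": rng.normal(scale=0.1, size=(d_in, d_low)),
        "w_logvar": rng.normal(scale=0.1, size=(d_in, d_low)),
        "w_dec": rng.normal(scale=0.1, size=(d_low, d_in)),
    }

def encode(params, x):
    """S31: dimension-reduction encoding with the reparameterization trick."""
    mu = x @ params["w_mu"]
    logvar = x @ params["w_logvar"]
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return z, mu, logvar

def reconstruction_loss(params, x, z):
    """S32: penalize the error when restoring high-dimensional features from z."""
    return np.mean((x - z @ params["w_dec"]) ** 2)

def similarity_graph(z):
    """Pairwise cosine similarity between the N low-dimensional features."""
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    return zn @ zn.T

def consistency_loss(z_a, z_b):
    """S33: make the two modalities' similarity graphs agree."""
    return np.mean((similarity_graph(z_a) - similarity_graph(z_b)) ** 2)

# Toy features: 6 corresponding positions, different widths per modality.
x_opt = rng.normal(size=(6, 32))
x_sar = rng.normal(size=(6, 48))
p_opt, p_sar = init_vae(32, 8), init_vae(48, 8)
z_opt, _, _ = encode(p_opt, x_opt)
z_sar, _, _ = encode(p_sar, x_sar)
loss = (reconstruction_loss(p_opt, x_opt, z_opt)
        + reconstruction_loss(p_sar, x_sar, z_sar)
        + consistency_loss(z_opt, z_sar))
print(z_opt.shape, z_sar.shape, float(loss))
```

The similarity graphs are both N by N regardless of the original feature widths, which is why they can be compared directly even though the two modalities start in different spaces.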

From the above description, it can be seen that the above embodiments of the present invention achieve at least the following technical effects: compared with existing multi-modal methods, the invention uses functional mapping to perform intra-domain dimensionality reduction on the heterogeneous features extracted from different spaces, and introduces a similarity-based consistency constraint to reduce the semantic gap between modalities, which promotes a consistent representation of the features of different modalities and better facilitates the fusion of information between them.

The invention also provides a remote sensing multi-mode multi-space functional mapping device based on distribution consistency constraint, and the device is described in detail below with reference to fig. 5.

FIG. 5 schematically illustrates a block diagram of a remote sensing multi-modal multi-space functional mapping apparatus based on a distribution consistency constraint according to an embodiment of the present invention.

As shown in fig. 5, the remote sensing multi-mode multi-space functional mapping apparatus 500 based on the distribution consistency constraint of this embodiment includes a multi-mode image acquisition module 510, a feature extraction module 520, a functional mapping module 530, and a feature fusion module 540.

The multi-mode image acquisition module 510 is configured to acquire an optical image and an SAR image of the same remote sensing area. In an embodiment, the multi-mode image acquisition module 510 may be used to perform step S1 described above, which is not repeated here.

The feature extraction module 520 is configured to extract a first modality intrinsic feature of the optical image in Euclidean space and extract a second modality intrinsic feature of the SAR image in complex space, where the second modality intrinsic feature is formed after optimization guided by the first modality intrinsic feature. In an embodiment, the feature extraction module 520 may be used to perform step S2 described above, which is not repeated here.

The functional mapping module 530 is configured to reduce the first modality intrinsic feature and the second modality intrinsic feature to the same low-dimensional feature space by functional mapping, and to apply a consistency constraint to the similarity of the first modality intrinsic feature and the second modality intrinsic feature in the low-dimensional feature space. In an embodiment, the functional mapping module 530 may be used to perform step S3 described above, which is not repeated here.

The feature fusion module 540 is configured to fuse the first modality intrinsic feature and the second modality intrinsic feature subjected to the consistency constraint to obtain a fused feature. In an embodiment, the feature fusion module 540 may be used to perform step S4 described above, which is not repeated here.

As shown in fig. 6, in this embodiment, the functional mapping module 530 of the above embodiment may further include a dimension reduction encoding unit 530A, a reconstruction loss constraint unit 530B, and a consistency constraint unit 530C.

The dimension-reduction encoding unit 530A is configured to perform dimension-reduction encoding on the first modality intrinsic feature and the second modality intrinsic feature, respectively, by using a variational encoder to obtain low-dimensional features of the two modalities. In an embodiment, the dimension-reduction encoding unit 530A may be used to perform step S31 described above, which is not repeated here.

The reconstruction loss constraint unit 530B is configured to use a reconstruction loss during the dimension-reduction encoding to retain more of the high-dimensional feature information and ensure the quality of the low-dimensional features of the two modalities. In an embodiment, the reconstruction loss constraint unit 530B may be used to perform step S32 described above, which is not repeated here.

The consistency constraint unit 530C is configured to constrain the similarity graphs of the low-dimensional features of the two modalities to reduce the influence of the semantic gap between modalities. In an embodiment, the consistency constraint unit 530C may be used to perform step S33 described above, which is not repeated here.

According to an embodiment of the present invention, any of the multi-modal image acquisition module 510, the feature extraction module 520, the functional mapping module 530, the feature fusion module 540, the dimension reduction encoding unit 530A, the reconstruction loss constraint unit 530B, and the consistency constraint unit 530C may be combined in one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the invention, at least one of the multimodal image acquisition module 510, the feature extraction module 520, the functional mapping module 530, the feature fusion module 540, the dimension reduction encoding unit 530A, the reconstruction loss constraint unit 530B, and the consistency constraint unit 530C may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of or a suitable combination of three of software, hardware, and firmware implementations. Alternatively, at least one of the multimodal image acquisition module 510, the feature extraction module 520, the functional mapping module 530, the feature fusion module 540, the dimension reduction encoding unit 530A, the reconstruction loss constraint unit 530B, and the consistency constraint unit 530C may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.

Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart. The techniques of the present invention may be implemented in hardware and/or software (including firmware, microcode, etc.). Furthermore, the techniques of the present invention may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise. Furthermore, the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely illustrative of specific embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A remote sensing multi-mode multi-space functional mapping method based on a distribution consistency constraint, characterized by comprising the following steps:

step S1, acquiring an optical image and an SAR image of the same remote sensing area;

step S2, extracting a first modality intrinsic feature of the optical image in Euclidean space, and extracting a second modality intrinsic feature of the SAR image in complex space, wherein the second modality intrinsic feature is formed after optimization guided by the first modality intrinsic feature;

step S3, reducing the first modality intrinsic feature and the second modality intrinsic feature to the same low-dimensional feature space by functional mapping, and applying a consistency constraint to the similarity of the first modality intrinsic feature and the second modality intrinsic feature in the low-dimensional feature space;

step S4, fusing the first modality intrinsic feature and the second modality intrinsic feature subjected to the consistency constraint to obtain a fused feature;

wherein in step S1, the optical image and the SAR image are imaged by different sensors;

step S3 specifically includes:

step S31, performing dimension-reduction encoding on the first modality intrinsic feature and the second modality intrinsic feature, respectively, by using a variational encoder to obtain low-dimensional features of the two modalities;

step S32, during the dimension-reduction encoding, using a reconstruction loss to retain more of the high-dimensional feature information and ensure the quality of the low-dimensional features of the two modalities;

step S33, constraining the similarity graphs of the low-dimensional features of the two modalities to reduce the influence of the semantic gap between modalities.

2. The method according to claim 1, wherein in step S2, extracting the first modality intrinsic feature of the optical image in Euclidean space comprises:

converting the features under spatial coordinates into graph-structured data, and using a graph network to model the invariant topological structure among targets, thereby mining the geometric, texture or color features unique to the optical image as the first modality intrinsic feature.

3. The method according to claim 1, wherein in step S2, extracting the second modality intrinsic feature of the SAR image in complex space comprises:

combining a cross-attention mechanism, introducing the first modality intrinsic feature as guidance, improving feature quality by using complementary information, and suppressing noise caused by inherent defects, to form the second modality intrinsic feature.

4. A remote sensing multi-mode multi-space functional mapping device based on a distribution consistency constraint, characterized by comprising:

a multi-mode image acquisition module for acquiring an optical image and an SAR image of the same remote sensing area, wherein the optical image and the SAR image are imaged by different sensors;

a feature extraction module for extracting a first modality intrinsic feature of the optical image in Euclidean space and extracting a second modality intrinsic feature of the SAR image in complex space, wherein the second modality intrinsic feature is formed after optimization guided by the first modality intrinsic feature;

a functional mapping module for reducing the first modality intrinsic feature and the second modality intrinsic feature to the same low-dimensional feature space by functional mapping, and applying a consistency constraint to the similarity of the first modality intrinsic feature and the second modality intrinsic feature in the low-dimensional feature space;

a feature fusion module for fusing the first modality intrinsic feature and the second modality intrinsic feature subjected to the consistency constraint to obtain a fused feature;

wherein the functional mapping module includes:

a dimension-reduction encoding unit for performing dimension-reduction encoding on the first modality intrinsic feature and the second modality intrinsic feature, respectively, by using a variational encoder to obtain low-dimensional features of the two modalities;

a reconstruction loss constraint unit for using a reconstruction loss during the dimension-reduction encoding to retain more of the high-dimensional feature information and ensure the quality of the low-dimensional features of the two modalities;

and a consistency constraint unit for constraining the similarity graphs of the low-dimensional features of the two modalities to reduce the influence of the semantic gap between modalities.