CN115311463A - Category-guided multi-scale decoupling marine remote sensing image text retrieval method and system - Google Patents
- Publication number: CN115311463A (application CN202211223823.1A)
- Authority: CN (China)
- Prior art keywords: image, text, features, remote sensing, category
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/40 — Extraction of image or video features
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06V10/30 — Noise filtering
- G06V10/74 — Image or video pattern matching; proximity measures in feature spaces
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The invention belongs to the technical field of remote sensing image processing and discloses a category-guided multi-scale decoupling method and system for marine remote sensing image-text retrieval. Image features at different scales are extracted from a marine remote sensing image, and text features are extracted from remote-sensing-related text. A bidirectional multi-scale decoupling module then decouples the multi-scale image features, extracting the latent features specific to each scale while suppressing redundant features from the other scales, yielding the decoupled features. A category label guidance module guides the decoupled image features and the text features, and the final category-related image and text features are computed by multiplication. Finally, the similarity and a semantically guided triplet loss are calculated. The invention realizes multi-scale decoupling, introduces effective category information into the decoupling, establishes a scale-and-semantics dual-decoupling marine multi-modal information fusion method, and solves the problems of redundant noise in the multi-scale dimension and the difficulty of fusing multi-dimensional decoupled representations.
Description
Technical Field
The invention belongs to the technical field of remote sensing image processing, and in particular relates to a category-guided multi-scale decoupling method and system for text retrieval of marine remote sensing images.
Background
Marine remote sensing image-text retrieval is an important way to address missing and inaccurately described text data in remote sensing datasets. It applies a cross-modal retrieval algorithm to analyze large numbers of satellite remote sensing images and automatically retrieve text data that accurately describes them. Traditional methods mainly struggle to extract effective image features: the targets in a marine remote sensing image are spatially dispersed and few in number, so the information of the effective targets is diluted when global information is fused, which hampers subsequent data mining. State-of-the-art marine remote sensing image-text retrieval methods therefore introduce multi-scale feature extraction and attention mechanisms; Yuan et al. proposed a novel fine-grained multi-modal feature matching network whose advantage is that it obtains image features at different scales and extracts the key features, thereby retrieving more accurate text information.
However, existing methods have the following problems. First, a large amount of redundant noise is generated during multi-scale feature interaction. Multi-scale features often cover repeated regions; when they are fused by addition or concatenation, these repeated regions accumulate, so the multi-scale content is poorly utilized. The redundant-feature filtering used by existing methods is simple and cannot remove the large amount of noise, and this residual noise degrades subsequent data fusion and mining. For example, existing methods filter redundant features with a gating scheme, which not only fails to remove much of the noise but may also filter out effective information. Second, existing methods usually perform knowledge decoupling only on the multi-scale features of the image, ignoring the disambiguating role of image and text semantic information in image-text retrieval. For marine remote sensing image-text retrieval, considering only feature decoupling along the scale dimension wastes the value of the rich semantic information, and the lack of this valuable information increases the time and difficulty of extracting effective key features. The low-order semantic information of an image is the expression of shallow features (such as color, geometry and texture), while the semantic information of a text can be understood as information related to category assignment. Introducing image-text semantic information can express the texture, geometry and color of the image content as well as the text description and text category, and this semantic information allows the back end of the network to make correct predictions of category attribution.
Therefore, aiming at these problems, the invention proposes a category-guided bidirectional multi-scale decoupling network that realizes multi-scale decoupling and introduces effective category information (image-text semantic information) into the decoupling. A scale-and-semantics dual-decoupling marine multi-modal information fusion framework is established, solving the problems of redundant noise in the multi-scale dimension and the difficulty of fusing multi-dimensional decoupled representations.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a category-guided multi-scale decoupling marine remote sensing image text retrieval method and system: decoupled features at different scales are obtained through bidirectional multi-scale decoupling, and category labels guide the category features of the images and texts as well as the decoupling, thereby solving the problems of redundant noise in the multi-scale dimension and the difficulty of fusing multi-dimensional decoupled feature information.
In order to solve the technical problems, the invention adopts the following technical scheme:
First, the invention provides a category-guided multi-scale decoupled marine remote sensing image text retrieval method, comprising the following steps:
S0, acquiring a marine remote sensing image and remote-sensing-related text;
S1, extracting image features of the marine remote sensing image: a convolutional neural network first embeds the image features, and the resulting base features are sampled by dilated (atrous) convolutions with different sampling rates to obtain image features at different scales;
S2, extracting the text features T of the remote-sensing-related text;
S3, bidirectional multi-scale decoupling: the multi-scale image features obtained in step S1 are decoupled, the latent features specific to each scale are extracted while redundant features from the other scales are suppressed, and the decoupled image features F are obtained;
S4, category label guidance: class features of the image and the text are first generated; the generated class features then guide the decoupled image features F and the text features T, and the final category-related image features F̂ and text features T̂ are computed by multiplication;
S5, calculating the similarity and the semantically guided triplet loss:
First, the category-related image features F̂ and text features T̂ output in step S4 are category-matched to judge whether the image and the text belong to the same category; the category attribute is input into the downstream task as external knowledge, and dynamic weights are selected for the heterogeneous information in image-text matching. The semantically guided triplet loss is then calculated, steps S1-S5 are iterated, and back-propagation training is performed;
S6, inputting a marine remote sensing image to be retrieved and outputting the remote-sensing-related text data, or inputting remote-sensing-related text data to be retrieved and outputting the marine remote sensing image.
Further, step S3 comprises two sub-steps:
S31, for the image features f_m extracted at each scale by the image feature extraction module, constructing an attention map A_m based on an attention mechanism at the current scale to extract latent features, and generating a suppression mask M_m;
S32, for the attention maps A_m and suppression masks M_m extracted at the different feature scales, A_m promotes the significant information at its own scale while M_m suppresses the salient features of the other scales; the image features are obtained after the redundant information is filtered, realizing scale decoupling. The attention maps are applied to the generation of the decoupled features F_s2l and F_l2s in a progressive-suppression manner, where F_s2l is the decoupled feature in the small-to-large scale direction and F_l2s is the decoupled feature in the large-to-small scale direction; finally, the decoupled features F_s2l and F_l2s of all feature scales are concatenated (concat operation) to form the final decoupled image features F.
Further, the decoupled features are computed as

F_s2l(m) = f_m ⊙ A_m ⊙ ∏_{k<m} M_k,   F_l2s(m) = f_m ⊙ A_m ⊙ ∏_{k>m} M_k,

where m indexes the different scales (three scales: large, medium and small), ⊙ is element-wise multiplication, and the attention maps A_m and suppression masks M_m derive the decoupled features F_s2l and F_l2s through this operational cascade.
Further, step S4 is specifically as follows:
S41, obtaining category semantic labels from the marine remote sensing images acquired in step S0, and obtaining the remote sensing image class features U by training a remote sensing image classifier;
S42, obtaining category semantic labels from the remote-sensing-related texts acquired in step S0, and obtaining the remote-sensing-related text class features V by training a remote-sensing-related text classifier;
S43, multiplying the decoupled image features F obtained in step S3 by the remote sensing image class features U, and multiplying the text features T obtained in step S2 by the remote-sensing-related text class features V; the purpose is to attention-enhance the decoupled image features F and the text features T with the class features U and V of their corresponding modalities, obtaining the final category-related image features F̂ and category-related text features T̂.
Further, step S31 is specifically as follows: the channel information of a feature is first aggregated by average-pooling and max-pooling operations to generate two feature descriptors, and the attention map is then generated from the feature descriptors by a standard convolutional layer and a sigmoid function.
Further, in step S5, the class features are first converted into the semantic categories C_i and C_t of the image and the text by softmax; a parameter λ, derived from C_i and C_t, is then defined to adjust the loss.
With λ held constant, the category-based triplet loss is designed as

L = Σ_{t⁻} λ [α − S(i, t⁺) + S(i, t⁻)]₊ + Σ_{i⁻} λ [α − S(t, i⁺) + S(t, i⁻)]₊,

where [x]₊ = max(x, 0), α is the margin distance, S(i, t⁺) is the similarity between the sample image and the positive-sample text, S(i, t⁻) the similarity between the sample image and a negative-sample text, S(t, i⁺) the similarity between the sample text and the positive-sample image, and S(t, i⁻) the similarity between the sample text and a negative-sample image. The first summation matches the image features against all text features, including the positive-sample text features and the negative-sample text features; the second summation matches the text features against all image features, including the positive-sample image features and the negative-sample image features. The triplet loss constructed from the two summations maximizes the similarity to positive samples and minimizes the similarity to negative samples.
The invention also provides a category-guided multi-scale decoupling marine remote sensing image text retrieval system for implementing the above method, comprising an input module, an image feature extraction module, a text feature extraction module, a bidirectional multi-scale decoupling module, a category label guidance module, a semantically guided triplet loss module and an output module.
The image feature extraction module comprises a deep residual network and an atrous spatial pyramid pooling module, and is used for extracting the multi-scale image features f_m.
The text feature extraction module extracts text features to obtain the text features T of the remote-sensing-related text.
The bidirectional multi-scale decoupling module decouples the multi-scale image features f_m output by the image feature extraction module to obtain the decoupled features F.
The category label guidance module comprises a remote sensing image classifier and a remote-sensing-related text classifier, which obtain the remote sensing image class features U and the remote-sensing-related text class features V respectively; the category semantic labels U and V, class features annotated by pretrained models, guide the images and texts as prior knowledge to construct class features and realize feature decoupling in the semantic dimension. The decoupled image features F and the text features T are attention-enhanced with the class features U and V of their corresponding modalities to obtain the category-related image and text features.
The semantically guided triplet loss module calculates the semantically guided triplet loss; it performs category matching on the class features, judges whether the image and the text belong to the same category, inputs the category attribute into the downstream task as external knowledge, and selects dynamic weights for the heterogeneous information in image-text matching.
The input module inputs the marine remote sensing image or remote-sensing-related text data to be retrieved, and the output module outputs the corresponding remote-sensing-related text data or marine remote sensing image.
Compared with the prior art, the invention has the following advantages:
(1) The noise-redundancy problem is solved. The invention effectively filters the large amount of redundant noise generated during multi-scale feature interaction. A bidirectional multi-scale decoupling module is constructed that adaptively extracts the latent features of each scale in both directions and suppresses the redundant features of the other scales, so that the effective features of each scale are extracted, the redundant features of each scale are suppressed, and a large amount of redundant noise is filtered out.
(2) The introduction of category information (semantic information) improves the robustness of the features. The invention unifies semantic decoupling across the two dimensions. A category label guidance module is constructed in which category semantic labels supervise the images and texts as prior knowledge, constructing better class features and realizing feature decoupling in the semantic dimension. The category semantic features emphasize effective features, and the semantically decoupled knowledge is mapped into the visual multi-scale sample space through concatenation. The category attribute serves as a bridge between the two modalities, providing external knowledge to the model while aligning multi-modal knowledge, helping the model quickly extract effective features and mine the effective objects in the remote sensing image. Meanwhile, aligning and fusing the multi-scale image features with the effective information (text semantic features) and the image semantic features also yields expressions of category information, pixel attribution and scale characteristics, and the semantic information expressed by the image-text pair allows the back end of the network to predict category attribution correctly.
(3) Prior knowledge solves the problems of difficult effective-feature extraction and low retrieval accuracy. The invention constructs a semantically guided triplet loss module that performs category matching on the class features, judges whether an image and a text belong to the same category, inputs the category attribute into the downstream task as external knowledge, and selects dynamic weights for the heterogeneous information in image-text matching. For example, high-accuracy remote sensing image and text classification models are trained as prior knowledge and added to the loss function: if the image and the text share the same category, their similarity is increased, which greatly shortens model convergence time, since an image and a text of the same category are indeed more likely to match than not. The retrieval accuracy of the model is thus greatly increased.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a flow chart of the method of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1
With reference to figs. 1 and 2, the category-guided bidirectional multi-scale decoupling marine remote sensing image text retrieval method first preprocesses the data, including the marine remote sensing images; from the preprocessed data it then extracts the text features T through the text feature extraction module on the one hand, and the decoupled image features F through bidirectional multi-scale decoupling on the other. The decoupled image features F and text features T are input into the category label guidance module, where the category semantic labels U and V supervise the images and texts as prior knowledge, constructing class features and realizing feature decoupling in the semantic dimension. Finally, the semantically guided triplet loss is calculated from the similarity between the image and the text, whether the image and text match is judged, and back-propagation is performed.
The method specifically comprises the following steps:
S0, acquiring a marine remote sensing image and remote-sensing-related text.
S1, extracting image features of the marine remote sensing image: a convolutional neural network first embeds the image features, and the resulting base features are sampled by dilated (atrous) convolutions with different sampling rates to obtain image features at different scales. This step yields the representation of the image.
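The dilated sampling in step S1 can be sketched as follows, assuming single-channel features and a hand-written valid-padding convolution purely for illustration (a real implementation would run a deep CNN backbone and a multi-channel ASPP head):

```python
import numpy as np

def dilated_conv2d(feat, kernel, rate):
    """Single-channel 'atrous' convolution: a 3x3 kernel samples the feature
    map with gaps of size `rate`, enlarging the receptive field without
    adding parameters (valid padding for simplicity)."""
    k = kernel.shape[0]
    span = rate * (k - 1) + 1            # effective kernel footprint
    H, W = feat.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feat[i:i + span:rate, j:j + span:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

base = np.random.rand(32, 32)            # stand-in for the CNN base features
kernel = np.ones((3, 3)) / 9.0
# three sampling rates -> three scales of image features, as in step S1
scales = [dilated_conv2d(base, kernel, r) for r in (1, 2, 3)]
```

Larger rates shrink the valid output and widen the spatial context each output pixel sees, which is what produces the "different scales" of features.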
S2, extracting the text features T of the remote-sensing-related text. In a specific application, a word-vector (sentence) embedding model and the Skip-thought text processing model can be chosen for text feature extraction. This step yields the representation of the text.
S3, bidirectional multi-scale decoupling: the multi-scale image features obtained in step S1 are decoupled, the latent features specific to each scale are extracted while redundant features from the other scales are suppressed, and the decoupled image features F are obtained. This comprises the following two sub-steps:
S31, for the image features f_m extracted at each scale by the image feature extraction module, constructing an attention map A_m based on an attention mechanism at the current scale to extract latent features, and generating a suppression mask M_m.
Specifically, the channel information of a feature is first aggregated by average-pooling and max-pooling operations to generate two feature descriptors, and the attention map A_m is then generated from the feature descriptors by a standard convolutional layer and a sigmoid function.
M_m is a binary mask that sets the most significant values of A_m to 0 and the others to 1; the suppression mask mitigates the coverage effect of A_m on the other scales, making the differing information at the different scales stand out.
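Step S31 can be sketched as below; the learned standard convolutional layer is replaced here by a fixed equal-weight fusion of the two pooled descriptors, so this is an illustrative stand-in rather than the patent's trained layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    """feat: (C, H, W). Aggregate channel information with average- and
    max-pooling to get two descriptors, then map them to a single-channel
    attention map A_m (fixed fusion stands in for the learned conv)."""
    avg_desc = feat.mean(axis=0)          # (H, W) average-pooled descriptor
    max_desc = feat.max(axis=0)           # (H, W) max-pooled descriptor
    return sigmoid(avg_desc + max_desc)

def suppression_mask(att):
    """Binary mask M_m: 0 at the most salient positions of A_m, 1 elsewhere,
    so the salient regions of this scale can be suppressed at other scales."""
    mask = np.ones_like(att)
    mask[att >= att.max()] = 0.0
    return mask

feat = np.random.rand(4, 8, 8)            # stand-in multi-channel feature
att = spatial_attention(feat)
mask = suppression_mask(att)
```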
S32, for the attention maps A_m and suppression masks M_m extracted at the different feature scales, A_m promotes the significant information at its own scale while M_m suppresses the salient features of the other scales; the image features are obtained after the redundant information is filtered, realizing scale decoupling. The attention maps are applied to the generation of the decoupled features F_s2l and F_l2s in a progressive-suppression manner; finally, the decoupled features F_s2l and F_l2s of all feature scales are concatenated (concat operation) to form the final decoupled image features F:

F_s2l(m) = f_m ⊙ A_m ⊙ ∏_{k<m} M_k,   F_l2s(m) = f_m ⊙ A_m ⊙ ∏_{k>m} M_k,   F = concat_m(F_s2l(m), F_l2s(m)),

where m indexes the different scales (three scales: large, medium and small), ⊙ is element-wise multiplication, and the attention maps A_m and suppression masks M_m derive, through this operational cascade, the decoupled feature F_s2l in the small-to-large scale direction and the decoupled feature F_l2s in the large-to-small scale direction.
In particular, since the attention map represents the significant regions of a feature, the suppression mask leverages the attention-map representation to suppress that scale's saliency information at the other scales; the suppression mask mitigates the attention map's coverage effect on the other scales, highlighting the differing information.
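One plausible reading of the progressive-suppression scheme in S32 (the exact formula appears only as a figure in the original, so the running-mask form below is an assumption) is:

```python
import numpy as np

def decouple(feats, atts, masks):
    """One direction of the bidirectional decoupling: each scale's feature
    is promoted by its own attention map and progressively multiplied by
    the suppression masks of the scales already visited."""
    out, running = [], np.ones_like(masks[0])
    for f, a, m in zip(feats, atts, masks):
        out.append(f * a * running)   # promote own scale, suppress earlier ones
        running = running * m         # accumulate suppression progressively
    return out

rng = np.random.default_rng(0)
feats = [rng.random((8, 8)) for _ in range(3)]       # small, medium, large scale
atts  = [rng.random((8, 8)) for _ in range(3)]       # attention maps A_m
masks = [(a < a.max()).astype(float) for a in atts]  # suppression masks M_m

F_s2l = decouple(feats, atts, masks)                           # small -> large
F_l2s = decouple(feats[::-1], atts[::-1], masks[::-1])[::-1]   # large -> small
F = np.concatenate(F_s2l + F_l2s, axis=0)  # concat both directions into F
```

Running the same routine over the reversed scale order gives the second direction, and concatenating both directions yields the final decoupled feature F.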
Step S4, category label guidance: class features of the image and the text are first generated, and the generated class features then guide the decoupled image features F and the text features T; the category-related image and text features F̂ and T̂ are computed by multiplication. The sub-steps are as follows:
S41, obtaining category semantic labels from the marine remote sensing images acquired in step S0, and obtaining the remote sensing image class features U by training a remote sensing image classifier;
S42, obtaining category semantic labels from the remote-sensing-related texts acquired in step S0, and obtaining the remote-sensing-related text class features V by training a remote-sensing-related text classifier.
The two classifiers are pretrained models whose prediction accuracy exceeds 80%; the rich semantic knowledge in the pretrained models can be transferred to the subsequent training process, so the pretrained models can be regarded as prior-knowledge supervision of the model.
S43, multiplying the decoupled image features F obtained in step S3 by the remote sensing image class features U to guide the retrieval network to detect important and reliable category-related information, and multiplying the text features T obtained in step S2 by the remote-sensing-related text class features V; the purpose is to attention-enhance the decoupled image features F and the text features T with the class features U and V of their corresponding modalities, obtaining the final category-related image features F̂ and text features T̂:

F̂ = F ⊙ U,   T̂ = T ⊙ V.

By fully exploiting multiplication, significant enhancement of the relevant features is achieved during feature combination.
F̂ and T̂ not only capture identifiable multi-scale semantic information but also highlight reliable category-related knowledge, improving the accuracy of the retrieval network and guiding it to probe important and reliable category-related information. Guiding the image and text features with the classification prior knowledge of image and text, the knowledge of the pretrained semantic features is semantically decoupled, and the decoupled semantic information is combined with the original retrieval network to explore meaningful and reliable category-related data; thus, while category supervision is applied to the semantic information, the semantic and scale information are fused and aligned across the different modalities through the prior-knowledge guidance module.
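The multiplicative category guidance of S43, together with a similarity of the kind compared in S5, reduces to a few lines; the random vectors below are stand-ins for learned features:

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.random(256)   # decoupled image feature from step S3 (stand-in)
T = rng.random(256)   # text feature from step S2 (stand-in)
U = rng.random(256)   # image class feature from the pretrained classifier (stand-in)
V = rng.random(256)   # text class feature from the pretrained classifier (stand-in)

F_hat = F * U         # category-related image feature: element-wise product
T_hat = T * V         # category-related text feature

def cosine_sim(a, b):
    """Similarity used when matching category-related image/text features."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s = cosine_sim(F_hat, T_hat)
```

The element-wise product acts as a per-dimension attention gate: dimensions the class feature weights highly are amplified, the rest are damped.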
S5, calculating the similarity and the semantically guided triplet loss:
First, the category-related image and text features F̂ and T̂ output in step S4 are category-matched to judge whether the image and the text belong to the same category, improving the retrieval probability of same-category cross-modal data; the category attribute is input into the downstream task as external knowledge, and dynamic weights are selected for the heterogeneous information in image-text matching. The semantically guided triplet loss is then calculated, steps S1-S5 are iterated, and back-propagation training is performed.
Specifically, the class features are first converted into the semantic categories C_i and C_t of the image and the text by softmax, and a parameter λ, derived from C_i and C_t, is defined to adjust the loss.
With λ held constant, the category-based triplet loss is designed as

L = Σ_{t⁻} λ [α − S(i, t⁺) + S(i, t⁻)]₊ + Σ_{i⁻} λ [α − S(t, i⁺) + S(t, i⁻)]₊,

where [x]₊ = max(x, 0). The purpose of the triplet loss is to minimize the semantic-space distance between a sample and its positive sample while increasing the distance between the sample and the corresponding negative samples. Here α is the margin distance; S(i, t⁺) is the similarity between the sample image and the positive-sample text, S(i, t⁻) between the sample image and a negative-sample text, S(t, i⁺) between the sample text and the positive-sample image, and S(t, i⁻) between the sample text and a negative-sample image. The first summation matches the image features against all text features (including the positive-sample text features and the negative-sample text features); the second summation matches the text features against all image features (including the positive-sample image features and the negative-sample image features). The triplet loss constructed from the two summations maximizes the similarity to positive samples and minimizes the similarity to negative samples.
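A minimal sketch of the category-based triplet loss; the `semantic_weight` form of λ is hypothetical, since the patent gives λ's formula only as a figure:

```python
def semantic_weight(cat_img, cat_txt, boost=2.0):
    """Hypothetical form of the semantic-guide parameter lambda: weight the
    loss more strongly when the image and the text are predicted to lie in
    the same category (the exact definition is an assumption)."""
    return boost if cat_img == cat_txt else 1.0

def triplet_loss(s_pos_it, s_negs_it, s_pos_ti, s_negs_ti,
                 lam=1.0, margin=0.2):
    """Bidirectional hinge triplet loss: for every negative, the positive
    similarity must exceed the negative similarity by at least the margin."""
    loss = 0.0
    for s_neg in s_negs_it:   # image matched against all negative texts
        loss += lam * max(0.0, margin - s_pos_it + s_neg)
    for s_neg in s_negs_ti:   # text matched against all negative images
        loss += lam * max(0.0, margin - s_pos_ti + s_neg)
    return loss
```

For example, with a well-separated pair (positive similarity 0.9, negative 0.1) the hinge is inactive and the loss is zero; with similarities 0.5 vs 0.6 both directions incur a penalty.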
S6, inputting the marine remote sensing image to be retrieved and outputting the remote-sensing-related text data (or inputting remote-sensing-related text data to be retrieved and outputting the marine remote sensing image).
Example 2
The category-guided bidirectional multi-scale decoupling marine remote sensing image text retrieval system comprises an input module, an image feature extraction module, a text feature extraction module, a bidirectional multi-scale decoupling module, a category label guide module, a semantic guide triple loss module and an output module.
The image feature extraction module comprises a convolution neural network and a void space convolution pooling module and is used for extracting multi-scale image features,
The text feature extraction module is used for extracting text features by utilizing a word vector embedding (sentence embedding) model and a Skip-through text processing model to obtain the text features of the remote sensing related textsT;
The bidirectional multi-scale decoupling module is used for extracting the multi-scale image features output by the image feature extraction moduleDecoupling is carried out to obtain decoupling characteristicsF;
The class label guiding module comprises a remote sensing image classifier and a remote sensing related text classifier which are respectively used for obtaining class characteristics of the remote sensing imageUAnd remote sensing related text category featuresV(ii) a Utilizing category semantic tagsU&VGuiding images and texts as priori knowledge to construct class features and realize feature decoupling on semantic dimensions; wherein U is&V, class characteristics marked by a pre-training model; decoupling features of imagesFText features with related textTClass characteristics U of respective corresponding modalities&V for attention enhancement, can also beThe enhancement information is combined with the original retrieval network, so that the fusion of semantics and scale characteristics is realized, meaningful and reliable category-related data are explored, and category-related image and text characteristics are obtained;
The semantic-guided triplet loss module is used for calculating the semantic-guided triplet loss: it performs category matching on the category features, judges whether the image and the text belong to the same category, inputs the category attribute into the downstream task as external knowledge, and performs dynamic weight selection on the heterogeneous information of heterogeneous image-text matching.
the input module is used for inputting a marine remote sensing image or remote sensing related text data to be retrieved, and the output module is used for outputting the remote sensing related text data or the marine remote sensing image.
The function implementation and data processing of each module are in part the same as in Embodiment 1 and are not repeated here.
It should be noted that the method of the present invention realizes cross-modal retrieval between the two modalities of image and text: one type of data is used as a query to retrieve the other. When the input is a marine remote sensing image, the output retrieval result is the corresponding text data; when the input is marine remote sensing related text data, the output retrieval result is the corresponding marine remote sensing image.
In summary, the present invention uses category information as prior knowledge to guide a more accurate representation of cross-modal information. Specifically, compared with existing methods, a bidirectional multi-scale decoupling module is constructed to adaptively extract potential features and suppress trivial features at other scales, thereby generating discriminative cues and solving the noise-redundancy problem of cascaded scale decoupling. In addition, a category label guiding module and a semantic-guided triplet loss module are constructed. The category label guiding module supervises images and texts with category semantic labels as prior knowledge, constructing better category features and realizing feature decoupling in the semantic dimension; the decoupled semantic information is then combined with the original retrieval network, so that semantic and scale features are fused and meaningful, reliable category-related data are explored. The semantic-guided triplet loss module performs category matching on the category features, judges whether the image and the text belong to the same category, inputs the category attribute into the downstream task as external knowledge, and performs dynamic weight selection on the heterogeneous information of heterogeneous image-text matching, improving both the retrieval probability of same-category cross-modal data and the convergence speed of the model. Finally, by category matching on the generated category features, a category-based triplet loss is designed to further improve the retrieval probability of same-category cross-modal data.
It is understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.
Claims (7)
1. A category-guided multi-scale decoupling marine remote sensing image text retrieval method, characterized by comprising the following steps:
S0, obtaining a marine remote sensing image and related remote sensing text;
s1, extracting image characteristics of the ocean remote sensing image: firstly, a convolution neural network is used for embedding the characteristics of an image, the obtained basic characteristics of the image are sampled by cavity convolution with different sampling rates, and the image characteristics with different scales are obtained;
S2, extracting the text features T of the related remote sensing text;
S3, bidirectional multi-scale decoupling: decoupling the image features at different scales obtained in step S1, extracting the corresponding potential features at each scale while suppressing trivial features at the other scales, obtaining the decoupled image features F;
S4, category label guidance: first generating category features of the image and the text, then using the generated category features to guide the decoupled image features F and the text features T, and computing the final category-related image features and text features by multiplication;
S5, calculating the similarity and the semantic-guided triplet loss:
firstly, performing category matching on the category-related image features and text features output in step S4, judging whether the image and the text belong to the same category, inputting the category attribute into the downstream task as external knowledge, and performing dynamic weight selection on the heterogeneous information of heterogeneous image-text matching; then calculating the semantic-guided triplet loss, iterating steps S1-S5, and training by back propagation;
S6, inputting the marine remote sensing image to be retrieved and outputting the related remote sensing text data; or inputting the related remote sensing text data to be retrieved and outputting the marine remote sensing image.
2. The category-guided multi-scale decoupling marine remote sensing image text retrieval method according to claim 1, characterized in that step S3 is divided into two steps:
S31, for the image features at each scale output by the image feature extraction module, constructing an attention map at the current scale based on an attention mechanism, extracting the potential features, and generating a suppression mask;
S32, for the attention maps and suppression masks extracted at the different feature scales, the attention map is used to promote the salient information at the corresponding scale, and the suppression mask is used to suppress the salient features of the other scales, obtaining image features from which redundant information has been filtered so as to achieve scale decoupling; the attention maps are applied to the generation of the decoupled features in a progressive suppression manner, where one group of decoupled features follows the small-to-large scale direction and the other group follows the large-to-small scale direction; finally, the decoupled features at the various feature scales are combined by a concat operation into the final decoupled image features F.
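The progressive-suppression scheme of steps S31-S32 can be sketched roughly as follows, assuming all scale feature maps share one spatial resolution (as after atrous convolution) and that the suppression mask is simply the complement of the attention map; both assumptions are illustrative and not taken from the patent.

```python
import numpy as np

def decouple(features, attn):
    """Bidirectional progressive suppression.

    features: list of (H, W) feature maps, ordered small to large scale
    attn:     list of (H, W) attention maps in [0, 1], same order
    Returns the concatenated decoupled features F.
    """
    def one_direction(feats, maps):
        out, keep = [], np.ones_like(maps[0])
        for x, a in zip(feats, maps):
            out.append(x * a * keep)     # promote this scale, minus earlier ones
            keep = keep * (1.0 - a)      # suppression mask applied to later scales
        return out

    fwd = one_direction(features, attn)                      # small -> large
    bwd = one_direction(features[::-1], attn[::-1])[::-1]    # large -> small
    return np.concatenate(fwd + bwd, axis=0)                 # concat into F
```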
3. The category-guided multi-scale decoupling marine remote sensing image text retrieval method according to claim 2, characterized in that, in step S32, the decoupled features are calculated as follows:
4. The category-guided multi-scale decoupling marine remote sensing image text retrieval method according to claim 1, characterized in that step S4 is specifically as follows:
S41, obtaining category semantic labels for the marine remote sensing images obtained in step S0, and obtaining the remote sensing image category features U through the training of a remote sensing image classifier;
S42, obtaining category semantic labels for the related remote sensing texts obtained in step S0, and obtaining the related-text category features V through the training of a related-text classifier;
S43, multiplying the decoupled image features F obtained in step S3 by the remote sensing image category features U, and multiplying the text features T obtained in step S2 by the related-text category features V; the purpose is to attention-enhance the decoupled image features F and the text features T with the category features U and V of their respective modalities, obtaining the final category-related image features and text features.
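The attention enhancement of step S43 is an element-wise multiplication; a minimal sketch with made-up feature and category vectors:

```python
import numpy as np

F = np.array([0.4, 1.2, 0.7, 0.3])    # decoupled image features (made up)
T = np.array([0.9, 0.2, 1.1, 0.5])    # text features (made up)
U = np.array([0.9, 0.1, 0.8, 0.2])    # image category features, e.g. classifier scores
V = np.array([0.8, 0.2, 0.9, 0.1])    # text category features

F_cat = F * U    # category-related image features
T_cat = T * V    # category-related text features
```

Dimensions where the category score is high are preserved; category-irrelevant dimensions are damped.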
5. The category-guided multi-scale decoupling marine remote sensing image text retrieval method according to claim 2, characterized in that step S31 specifically comprises: firstly aggregating the channel information of a feature through average pooling and maximum pooling operations to generate two feature descriptors, and then passing the feature descriptors through a standard convolution layer and a sigmoid function to generate the attention map;
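A rough sketch of the attention-map recipe in this claim, where a hypothetical 1x1 convolution (a plain weighted sum of the two descriptors, weights chosen arbitrarily) stands in for the standard convolution layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x, w=(0.5, 0.5), b=0.0):
    """x: (C, H, W) features.  Average and max pooling over the channel axis
    give two (H, W) descriptors; a hypothetical 1x1 convolution (weights w,
    bias b) stands in for the standard conv layer, followed by sigmoid."""
    avg_desc = x.mean(axis=0)     # channel-average descriptor
    max_desc = x.max(axis=0)      # channel-max descriptor
    return sigmoid(w[0] * avg_desc + w[1] * max_desc + b)

attn = spatial_attention(np.random.rand(8, 16, 16))
```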
6. The category-guided multi-scale decoupling marine remote sensing image text retrieval method according to claim 1, characterized in that, in step S5, the category features are first converted through softmax into the semantic categories of the image and the text; then a parameter is defined to adjust the loss, expressed as:
On this basis, the category-based triplet loss is designed as follows:
wherein a margin distance is used, and the four similarity terms denote, respectively, the similarity of the sample image with the positive sample text, the similarity of the sample image with the negative sample text, the similarity of the sample text with the positive sample image, and the similarity of the sample text with the negative sample image; the first summation matches an image feature with all text features, including the text features of the positive samples and the text features of the negative samples, while the second summation matches a text feature with all image features, including the image features of the positive samples and the image features of the negative samples; the triplet loss function constructed from the two summations aims to maximize the similarity between positive pairs while minimizing the similarity between negative pairs.
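The patent's formula itself is rendered as an image in the original and is not reproduced here; a standard bidirectional triplet loss consistent with the prose description (margin $\alpha$, similarity $S$, positive/negative samples marked $+$/$-$) would read:

```latex
\mathcal{L} =
  \sum_{T^{-}} \big[\, \alpha - S(I, T^{+}) + S(I, T^{-}) \,\big]_{+}
+ \sum_{I^{-}} \big[\, \alpha - S(T, I^{+}) + S(T, I^{-}) \,\big]_{+},
\qquad [x]_{+} = \max(x, 0)
```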
7. A category-guided multi-scale decoupling marine remote sensing image text retrieval system, characterized in that it implements the category-guided multi-scale decoupling marine remote sensing image text retrieval method of any one of claims 1-6, and comprises an input module, an image feature extraction module, a text feature extraction module, a bidirectional multi-scale decoupling module, a category label guiding module, a semantic-guided triplet loss module and an output module;
the image feature extraction module comprises a convolution neural network and a void space convolution pooling module and is used for extracting multi-scale image features,
the text feature extraction module extracts text features, obtaining the text features T of the related remote sensing text;
the bidirectional multi-scale decoupling module is used for decoupling the multi-scale image features output by the image feature extraction module, obtaining the decoupled features F;
the category label guiding module comprises a remote sensing image classifier and a related-text classifier, used respectively to obtain the remote sensing image category features U and the related-text category features V; the category semantic labels U and V guide the image and the text as prior knowledge, constructing category features and realizing feature decoupling in the semantic dimension, where U and V are category features labeled through a pre-trained model; the decoupled image features F and the text features T are attention-enhanced by the category features U and V of their respective modalities, obtaining the category-related image and text features;
the semantic-guided triplet loss module is used for calculating the semantic-guided triplet loss: it performs category matching on the category features, judges whether the image and the text belong to the same category, inputs the category attribute into the downstream task as external knowledge, and performs dynamic weight selection on the heterogeneous information of heterogeneous image-text matching;
the input module is used for inputting a marine remote sensing image or remote sensing related text data to be retrieved, and the output module is used for outputting the remote sensing related text data or the marine remote sensing image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211223823.1A CN115311463B (en) | 2022-10-09 | 2022-10-09 | Category-guided multi-scale decoupling marine remote sensing image text retrieval method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115311463A true CN115311463A (en) | 2022-11-08 |
CN115311463B CN115311463B (en) | 2023-02-03 |
Family
ID=83866005
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211223823.1A Active CN115311463B (en) | 2022-10-09 | 2022-10-09 | Category-guided multi-scale decoupling marine remote sensing image text retrieval method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115311463B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116127123A (en) * | 2023-04-17 | 2023-05-16 | 中国海洋大学 | Semantic instance relation-based progressive ocean remote sensing image-text retrieval method |
CN116186317A (en) * | 2023-04-23 | 2023-05-30 | 中国海洋大学 | Cross-modal cross-guidance-based image-text retrieval method and system |
CN117556062A (en) * | 2024-01-05 | 2024-02-13 | 武汉理工大学三亚科教创新园 | Ocean remote sensing image audio retrieval network training method and application method |
CN117573916A (en) * | 2024-01-17 | 2024-02-20 | 武汉理工大学三亚科教创新园 | Retrieval method, device and storage medium for image text of marine unmanned aerial vehicle |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017103035A1 (en) * | 2015-12-18 | 2017-06-22 | Ventana Medical Systems, Inc. | Systems and methods of unmixing images with varying acquisition properties |
US10713794B1 (en) * | 2017-03-16 | 2020-07-14 | Facebook, Inc. | Method and system for using machine-learning for object instance segmentation |
CN111798460A (en) * | 2020-06-17 | 2020-10-20 | 南京信息工程大学 | Satellite image segmentation method |
CN113487629A (en) * | 2021-07-07 | 2021-10-08 | 电子科技大学 | Image attribute editing method based on structured scene and text description |
WO2022160771A1 (en) * | 2021-01-26 | 2022-08-04 | 武汉大学 | Method for classifying hyperspectral images on basis of adaptive multi-scale feature extraction model |
Non-Patent Citations (3)
Title |
---|
QINMIN CHENG等: "A Semantic-Preserving Deep Hashing Model for multi-label remote sensing image retrieval", 《REMOTE SENSING》 * |
YUANSHENG HUA等: "Recurrently exploring class-wise attention in a hybrid convolutional and bidirectional LSTM network for multi-label aerial image classification", 《ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING》 * |
SHAN SHOUPING: "Multi-label classification of remote sensing images based on deep learning and label semantic association", 《China Master's Theses Full-text Database (Electronic Journal)》 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116127123A (en) * | 2023-04-17 | 2023-05-16 | 中国海洋大学 | Semantic instance relation-based progressive ocean remote sensing image-text retrieval method |
CN116127123B (en) * | 2023-04-17 | 2023-07-07 | 中国海洋大学 | Semantic instance relation-based progressive ocean remote sensing image-text retrieval method |
CN116186317A (en) * | 2023-04-23 | 2023-05-30 | 中国海洋大学 | Cross-modal cross-guidance-based image-text retrieval method and system |
CN116186317B (en) * | 2023-04-23 | 2023-06-30 | 中国海洋大学 | Cross-modal cross-guidance-based image-text retrieval method and system |
CN117556062A (en) * | 2024-01-05 | 2024-02-13 | 武汉理工大学三亚科教创新园 | Ocean remote sensing image audio retrieval network training method and application method |
CN117556062B (en) * | 2024-01-05 | 2024-04-16 | 武汉理工大学三亚科教创新园 | Ocean remote sensing image audio retrieval network training method and application method |
CN117573916A (en) * | 2024-01-17 | 2024-02-20 | 武汉理工大学三亚科教创新园 | Retrieval method, device and storage medium for image text of marine unmanned aerial vehicle |
CN117573916B (en) * | 2024-01-17 | 2024-04-26 | 武汉理工大学三亚科教创新园 | Retrieval method, device and storage medium for image text of marine unmanned aerial vehicle |
Also Published As
Publication number | Publication date |
---|---|
CN115311463B (en) | 2023-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115311463B (en) | Category-guided multi-scale decoupling marine remote sensing image text retrieval method and system | |
JP7335907B2 (en) | Character structuring extraction method and device, electronic device, storage medium, and computer program | |
CN112966684A (en) | Cooperative learning character recognition method under attention mechanism | |
CN111914107B (en) | Instance retrieval method based on multi-channel attention area expansion | |
CN111815602A (en) | Building PDF drawing wall recognition device and method based on deep learning and morphology | |
TW202207077A (en) | Text area positioning method and device | |
CN111932577B (en) | Text detection method, electronic device and computer readable medium | |
CN114419642A (en) | Method, device and system for extracting key value pair information in document image | |
CN115658934A (en) | Image-text cross-modal retrieval method based on multi-class attention mechanism | |
CN114372475A (en) | Network public opinion emotion analysis method and system based on RoBERTA model | |
CN110245292B (en) | Natural language relation extraction method based on neural network noise filtering characteristics | |
CN114782722A (en) | Image-text similarity determining method and device and electronic equipment | |
CN112348001B (en) | Training method, recognition method, device, equipment and medium for expression recognition model | |
Vu et al. | Revising FUNSD dataset for key-value detection in document images | |
CN111159411B (en) | Knowledge graph fused text position analysis method, system and storage medium | |
CN116579348A (en) | False news detection method and system based on uncertain semantic fusion | |
US20230154077A1 (en) | Training method for character generation model, character generation method, apparatus and storage medium | |
CN112800259B (en) | Image generation method and system based on edge closure and commonality detection | |
CN111652164B (en) | Isolated word sign language recognition method and system based on global-local feature enhancement | |
CN114637846A (en) | Video data processing method, video data processing device, computer equipment and storage medium | |
CN114820885A (en) | Image editing method and model training method, device, equipment and medium thereof | |
Priya et al. | Developing an offline and real-time Indian sign language recognition system with machine learning and deep learning | |
CN113313108A (en) | Saliency target detection method based on super-large receptive field characteristic optimization | |
Li et al. | ViT2CMH: Vision Transformer Cross-Modal Hashing for Fine-Grained Vision-Text Retrieval. | |
CN112347196B (en) | Entity relation extraction method and device based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||