CN111639240B - Cross-modal Hash retrieval method and system based on attention awareness mechanism

Cross-modal Hash retrieval method and system based on attention awareness mechanism

Info

Publication number
CN111639240B
Authority
CN
China
Prior art keywords
modal
cross
hash
data
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010408302.8A
Other languages
Chinese (zh)
Other versions
CN111639240A (en)
Inventor
罗昕 (Luo Xin)
姚洪磊 (Yao Honglei)
许信顺 (Xu Xinshun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN202010408302.8A
Publication of CN111639240A
Application granted
Publication of CN111639240B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a cross-modal hash retrieval method and system based on an attention-aware mechanism, comprising the following steps: performing feature extraction and attention feature extraction on a training set in a cross-modal data set to obtain cross-modal features weighted by the attention features; inputting the cross-modal features of cross-modal data pairs into a hash learning model, and optimizing the hash learning model according to the output cross-modal hash codes with minimization of a loss function as the objective; and screening out, according to the hash code of the data to be retrieved obtained by the optimized hash learning model, the modal data satisfying the retrieval requirement from the hash codes of modal data whose modality differs from that of the data to be retrieved. The attention mechanism is applied to the cross-modal hash retrieval task, and a novel attention method, the attention-aware mechanism, is proposed; noise and redundancy in the original data are suppressed while the key attention regions are enhanced, improving the generation quality of the hash codes.

Description

Cross-modal Hash retrieval method and system based on attention awareness mechanism
Technical Field
The invention relates to the technical field of cross-modal hash retrieval, and in particular to a cross-modal hash retrieval method and system based on an attention-aware mechanism.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the explosive growth of network multimedia data, it is often necessary to retrieve texts or videos related to an existing image, or to retrieve images or videos based on a text, that is, to use data of one modality to search for similar samples of another modality; meanwhile, efficient storage and fast querying of such data have also become difficult problems.
Cross-modal retrieval technology aims to retrieve, from existing data, data of a different modality that matches it, such as searching a database for a set of pictures that fit a given text description. The prior art can be divided into deep models and non-deep models according to whether deep learning techniques are employed. A traditional deep cross-modal hash retrieval model generally proceeds in three steps: first, features of the different modalities are extracted with deep networks; then, from the extracted features, a hash function is learned with a fully-connected network under the supervision of a cross-entropy loss and a sample similarity matrix; finally, samples are converted into hash codes by the hash function and stored in a database.
At present, many cross-modal hash retrieval methods have been proposed, but the inventors found that the prior art has at least the following problems. For a retrieval task, real data often contains noise and redundancy; during feature extraction, the most useful visual information needs to be extracted while background information is ignored, because background information interferes with retrieval. In actual data, however, the valuable category information covers only a small part, and most regions are background; most current cross-modal retrieval methods neglect this problem and learn features directly from the original data, so that invalid or redundant information misleads the features and low-quality hash codes are generated. In addition, in order to improve retrieval performance, many deep cross-modal hash retrieval models with better results introduce network models with more parameters, such as GANs (generative adversarial networks), which greatly increases training and retrieval time.
Disclosure of Invention
In order to solve the above problems, the invention provides a cross-modal hash retrieval method and system based on an attention-aware mechanism, in which the attention mechanism is applied to the cross-modal hash retrieval task and a novel attention method, the attention-aware mechanism, is proposed.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a cross-modal hash retrieval method based on an attention-aware mechanism, including:
performing feature extraction and attention feature extraction on a training set in the cross-modal data set to obtain cross-modal features weighted by the attention features;
inputting the cross-modal features of the cross-modal data pairs in the training set into a hash learning model, and optimizing the hash learning model according to the output cross-modal hash codes with minimization of a loss function as the objective;
and screening out, according to the hash code of the data to be retrieved obtained by the optimized hash learning model, the modal data satisfying the retrieval requirement from the hash codes of modal data in the cross-modal data set whose modality differs from that of the data to be retrieved.
In a second aspect, the present invention provides a cross-modal hash retrieval system based on an attention-aware mechanism, including:
the feature extraction module is used for performing feature extraction and attention feature extraction on a training set in the cross-modal data set to obtain cross-modal features weighted by the attention features;
the hash learning module is used for inputting the cross-modal features of the cross-modal data pairs in the training set into the hash learning model and optimizing the hash learning model according to the output cross-modal hash codes with minimization of a loss function as the objective;
and the retrieval module is used for screening out, according to the hash code of the data to be retrieved obtained by the optimized hash learning model, the modal data satisfying the retrieval requirement from the hash codes of modal data in the cross-modal data set whose modality differs from that of the data to be retrieved.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor; when the computer instructions are executed by the processor, the method of the first aspect is performed.
In a fourth aspect, the present invention provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
the cross-modal data set comprises multiple modal data, and the multiple modal data can simultaneously perform feature learning and hash code learning, so that the hash code generation efficiency is improved.
The invention provides a novel attention method, the attention-aware mechanism, which applies the attention mechanism to the cross-modal hash retrieval task and weights the two different modalities. It can highlight the key parts of the cross-modal data, such as the region of a picture where an object is located or a certain word in the text input, and can suppress the influence of redundant or invalid parts on the retrieval effect, such as the picture background or interfering words in the text. This effectively improves the quality of hash-code generation, and the method is applicable to cross-modal retrieval tasks in various multi-modal data scenarios.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIGS. 1(a)-(b) are picture modality data;
FIG. 1(c) shows the ten most frequent annotation words of the texts in the public data set MIRFlickr-25K;
FIG. 1(d) is the text annotation data of FIG. 1(a);
fig. 2 is a flowchart of a cross-modal hash retrieval method based on an attention-aware mechanism according to embodiment 1 of the present invention;
fig. 3 is a flowchart of image attention feature extraction provided in embodiment 1 of the present invention;
fig. 4 is a flowchart of text attention feature extraction provided in embodiment 1 of the present invention;
fig. 5 is a structural diagram of a cross-modal hash retrieval system based on an attention-aware mechanism according to embodiment 1 of the present invention.
Detailed Description of the Embodiments:
the invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Embodiment 1
At present, various cross-modal hash retrieval methods have been proposed, but because real data contains noise and redundancy, current retrieval methods that learn features directly from the original data allow invalid or redundant information to mislead the features, so that low-quality hash codes are generated. Taking the two modalities of pictures and texts as an example, as shown in FIGS. 1(a)-1(b): for the picture of FIG. 1(a), it is necessary to highlight the regions where the bees and flowers are located and ignore the background behind them, because the background interferes with retrieval; likewise, for the picture of FIG. 1(b), whose labels, i.e. the supervision information, are "animal", "flower" and "plant life", the most useful visual information might be the butterfly hovering over the flower. However, these valuable categories of information cover only a small portion of the entire image, while most of the area in the image is background.
As shown in FIG. 1(c), which lists the ten most frequent annotation words of the texts in the public data set MIRFlickr-25K, half of the words ("explore", "canon", "bw", "nikon", and "2007") are invalid words that have no direct relationship to the image content; FIG. 1(d) shows the text labels of FIG. 1(a), of which only the word "bees" is relevant to the retrieval task.
Therefore, if the noise and redundancy in the original data are not suppressed, low-quality hash codes are easily generated and the retrieval result is affected.
The attention mechanism has been widely applied in recent years in fields such as natural language processing, object detection, image recognition and speech recognition, but it is rarely used in the cross-modal retrieval direction. The traditional attention mechanism, as used for image recognition, can automatically find the parts of a picture that need attention: a mask of the same size as the picture representation (the picture representation can be the original picture, a feature map, etc.) is generated through learning, and for attended regions the corresponding mask positions have higher activation values. Attention models can generally be divided into spatial attention models and channel attention models according to the region they act on. The spatial attention model generates attention values for different positions in the feature map; mapped back to the original picture, this means that different positions in the picture influence the task to different degrees. The channel attention mechanism generates attention values for different channels of the feature map and is more abstract.
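For illustration, a minimal PyTorch sketch of such a spatial attention mask follows; the 1x1-convolution mask network and the tensor shapes are assumptions chosen for the example, not the architecture of this embodiment:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Minimal spatial attention: learn a (B, 1, H, W) mask over a
    (B, C, H, W) feature map and weight the features with it."""
    def __init__(self, in_channels: int):
        super().__init__()
        # a 1x1 convolution collapses the channels into a single mask plane
        self.mask_conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # mask values lie in (0, 1); attended positions get higher activations
        mask = torch.sigmoid(self.mask_conv(feat))   # (B, 1, H, W)
        return feat * mask                           # broadcast over channels

feat = torch.randn(2, 512, 7, 7)
weighted = SpatialAttention(512)(feat)               # same shape as feat
```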
This embodiment incorporates a spatial attention mechanism, applies the attention mechanism to the cross-modal hash retrieval task, and proposes, on the basis of the traditional attention mechanism, a new attention method, namely an attention-aware mechanism, which is used to weight the two different modalities.
That is, in the cross-modal hash retrieval method based on the attention-aware mechanism of this embodiment, noise and redundancy in the raw data are suppressed and the key attention regions are enhanced in order to extract the attention matrix; this improves the quality of the generated hash codes, and the method can be used for cross-modal information retrieval in various multi-modal data scenarios. As shown in FIG. 2, the method specifically includes the following steps:
S1: performing feature extraction and attention feature extraction on the training set in the cross-modal data set to obtain cross-modal features weighted by the attention features;
S2: inputting the cross-modal features of the cross-modal data pairs in the training set into a hash learning model, and optimizing the hash learning model according to the output cross-modal hash codes with minimization of a loss function as the objective;
S3: screening out, according to the hash code of the data to be retrieved obtained by the optimized hash learning model, the modal data satisfying the retrieval requirement from the hash codes of modal data in the cross-modal data set whose modality differs from that of the data to be retrieved.
In step S1, the cross-modal data set includes data of multiple modalities. This embodiment takes image modality data and text modality data as an example; it is understood that the modality types may be extended to others, such as video and voice.
The cross-modal data set is divided into a training set and a test set, and two parallel convolutional neural networks are adopted to perform feature extraction and attention feature extraction on the image and text cross-modal data of the training set simultaneously. Specifically: an initial attention matrix is acquired, the convolutional neural network is trained with minimization of a loss function, and an improved attention matrix is output; a dot-product operation is then performed between the attention matrix and the feature matrix output by the convolutional neural network to obtain the cross-modal features weighted by the attention features.
The image feature extraction and image attention feature extraction performed on the images in the training set specifically include:
S1-1: in the image feature extraction process, the convolutional neural network CNN-F is used as the basic network structure, and the image feature matrix is output at the fifth convolutional layer Conv5;
S1-2: the image attention feature extraction process includes: (1) introducing an attention layer between the fifth convolutional layer and the fully-connected layers, thereby modifying the residual network ResNet-50: as shown in FIG. 3, the fully-connected layers are replaced by a new convolutional layer Conv6 and a max-pooling layer (Max pooling), where the Conv6 layer is introduced to ensure that the size of the final attention map is consistent with the size of the image feature matrix output by the Conv5 layer in the image feature extraction process. The initial attention matrix O is extracted using the modified ResNet-50 network, and the network is pre-trained using a cross-entropy function as the loss function.
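A hedged sketch of this modified network in PyTorch is given below; the 1x1 kernel for Conv6, the use of torchvision's ResNet-50, and the 6x6 target grid are assumptions chosen so that the attention map matches a Conv5-sized feature matrix:

```python
import torch
import torch.nn as nn
import torchvision

class AttentionBackbone(nn.Module):
    """ResNet-50 with the fully-connected head replaced by a new
    convolutional layer (Conv6) plus max pooling, as in S1-2 (1)."""
    def __init__(self, num_classes: int, grid: int = 6):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # keep everything up to and including the last residual stage
        self.body = nn.Sequential(*list(resnet.children())[:-2])
        self.conv6 = nn.Conv2d(2048, num_classes, kernel_size=1)  # per-class maps
        self.pool = nn.AdaptiveMaxPool2d(grid)   # match the Conv5 map size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # output O: (B, num_classes, grid, grid) class scores per region;
        # the network is pre-trained under a cross-entropy loss and O serves
        # as the initial attention matrix
        return self.pool(self.conv6(self.body(x)))
```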
(2) The initial attention matrix is then refined:

O'_ir = sigmoid(max_k(O_irk)),

where O'_ir is the attention weight corresponding to the r-th region of picture I_i, and O_irk is the value of the k-th class (N_c classes in total) at the same position in the pre-training network output O.

The final attention matrix is obtained from O'_i by thresholding against a threshold mu_i (the exact piecewise form is given in the original equation image: regions whose weight exceeds mu_i are enhanced and the remaining regions are suppressed). The threshold mu_i can be calculated as follows: the attention values of the different regions of the picture are sorted in ascending order; assuming that about p% (0 < p < 100) of a picture is background and the remaining part (about (100-p)%) is the key region, mu_i takes the activation value at the corresponding position of the sorted O'_i (approximately the (N_r * p/100)-th value), where N_r = n x n denotes the number of regions.

(3) The final attention matrix is extended along the channel dimension to obtain a new weight matrix, and a point-multiplication operation is then performed with the image feature matrix output by the Conv5 layer to obtain the image features weighted by the image attention features.
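The per-region weighting and thresholding of steps (2)-(3) can be sketched as follows; since the exact piecewise rule is only given in the original equation image, the enhance-above-threshold / keep-below-threshold rule used here is an assumption:

```python
import torch

def image_attention(O: torch.Tensor, p: float = 50.0) -> torch.Tensor:
    """O: (Nr, Nc) pre-training network output for one picture,
    with Nr = n*n regions and Nc classes. Returns (Nr,) region weights."""
    # O'_ir = sigmoid(max over the Nc classes at each region)
    o_prime = torch.sigmoid(O.max(dim=1).values)
    # mu_i: activation value at the p% position of the ascending sort
    sorted_vals, _ = torch.sort(o_prime)
    idx = min(int(len(sorted_vals) * p / 100.0), len(sorted_vals) - 1)
    mu = sorted_vals[idx]
    # assumed rule: enhance key regions (>= mu_i), keep the rest as-is
    return torch.where(o_prime >= mu, 1.0 + o_prime, o_prime)

def weight_conv5(conv5_feat: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    """conv5_feat: (C, n, n) image feature matrix; attn: (n*n,) weights."""
    C, n, _ = conv5_feat.shape
    mask = attn.reshape(1, n, n).expand(C, n, n)   # extend over the channels
    return conv5_feat * mask                       # point-wise product
```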
The text feature extraction and text attention feature extraction performed on the texts in the training set specifically include:
S1-3: in the text feature extraction process, two fully-connected layers are adopted to obtain the text features;
S1-4: the text attention feature extraction process includes: (1) introducing an attention layer before the first fully-connected layer Fc1; a neural network without hidden layers, i.e. a two-layer nonlinear classification network, is adopted to obtain the mapping relation W between each label appearing in the input text representation and its corresponding class, as shown in FIG. 4. W serves as the initial attention matrix, and the training of the classification network is guided by a least-squares error loss.
(2) The initial attention matrix is then refined. W is normalized with the SoftMax function (original equation image), and the contribution of text y_i to the different classes is assumed to obey a distribution F_i(.):

F_i(l_j) = W'_ij,

where l_j is the label information corresponding to the j-th sample.

The information entropy corresponding to each label is then solved (original equation image), and W''_i = -E_i, so that labels whose class distribution has high entropy, i.e. less informative labels, receive lower weights.

The final attention matrix is then solved (the original equation images give its piecewise form), where v is a computable threshold obtained as follows: the attention matrix W''_i is sorted in ascending order, and v is set to the value at the position given by the original expression (an index determined by N_t), where N_t denotes the number of different labels in the text label set.

(3) The original text features are multiplied by the text attention map to obtain the text features weighted by the text attention features; here the original text features are represented by BoW, but other forms such as Word2Vec may also be used.
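A corresponding sketch for the text branch follows; the softmax and entropy steps follow the description above, while the thresholding rule mirrors the assumed image-side rule (the patent gives the exact form only in equation images), and the BoW dimension is hypothetical:

```python
import torch

def text_attention(W_i: torch.Tensor, q: float = 50.0) -> torch.Tensor:
    """W_i: (Nt,) initial attention scores linking one text's Nt labels
    to their classes. Returns (Nt,) label weights."""
    F_i = torch.softmax(W_i, dim=0)              # W'_ij, a distribution F_i
    E_i = -F_i * torch.log(F_i + 1e-12)          # per-label entropy term
    W2_i = -E_i                                  # W''_i = -E_i
    # v: value at the q% position of the ascending sort (assumed quantile)
    sorted_vals, _ = torch.sort(W2_i)
    idx = min(int(len(sorted_vals) * q / 100.0), len(sorted_vals) - 1)
    v = sorted_vals[idx]
    # assumed enhance/keep rule around the threshold v
    return torch.where(W2_i >= v, 1.0 + torch.sigmoid(W2_i), torch.sigmoid(W2_i))

bow = torch.rand(1386)                  # hypothetical BoW text vector
attn = text_attention(torch.randn(1386))
weighted_text = bow * attn              # text features weighted by attention
```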
In step S2, the image features and the text features are input into the hash learning network model, the binarized hash codes are obtained with the sign function, and a global objective function is constructed with minimization of the loss function as the objective:

[global objective function equation image: the sum of a negative log-likelihood term over the similarity matrix S, a quantization loss term, and a semantic-preserving loss term]

where n is the number of samples in the sample set; B_x is the binary hash code corresponding to the picture modality and B_y the binary hash code corresponding to the text modality, with B set as B_x = B_y = sign(gamma(F + G)); W_x, W_y are the initial attention matrices corresponding to the picture modality data and text modality data; F_*i = f_x(x_i; theta_x), where theta_x is the image network parameter and F is the output of the image network; G_*j = f_y(y_j; theta_y), where theta_y is the text network parameter and G is the output of the text network; a pairwise term is defined from F and G (original equation image); gamma and eta are both hyper-parameters. The similarity matrix S is defined as follows: for two different samples i, j, if the labels of the two samples share at least one class, S_ij is set to 1, and otherwise to 0.
In this embodiment, the first term of the global objective function is a negative log-likelihood loss function and the second term is a quantization loss function. Since the similarity relation between samples is obtained through the label information L, in order to make fuller use of the sample supervision information, this embodiment proposes the third loss term, namely a semantic-preserving loss function.
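Since the global objective appears only as an equation image, the following is a hedged, DCMH-style sketch of the three named terms; the pairwise term theta, the hypothetical label-projection matrix P used for the semantic-preserving term, and the exact weightings are assumptions:

```python
import torch
import torch.nn.functional as nnf

def global_objective(F, G, B, S, L, P, gamma=1.0, eta=1.0):
    """F, G: (n, c) real-valued network outputs; B: (n, c) binary codes;
    S: (n, n) 0/1 similarity matrix; L: (n, num_classes) labels;
    P: (c, num_classes) hypothetical projection for the semantic term."""
    theta = 0.5 * (F @ G.t())                 # assumed pairwise term
    # negative log-likelihood of S under a sigmoid likelihood
    nll = -(S * theta - nnf.softplus(theta)).sum()
    # quantization loss: real outputs should stay close to the binary codes
    quant = ((B - F) ** 2).sum() + ((B - G) ** 2).sum()
    # semantic-preserving loss (assumed form): codes should predict labels L
    sem = ((B @ P - L) ** 2).sum()
    return nll + gamma * quant + eta * sem
```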
In step S2, the hash learning model is optimized with minimization of the loss function as the objective. The variables to be optimized are B, F, G, W_x and W_y. In this embodiment, an iterative (alternating) optimization scheme is adopted to minimize the loss function: only one variable is optimized at a time while the other variables are kept unchanged. The specific optimization strategy is as follows:

S2-1: fix the variables B, G, W_x, W_y and update the variable F: for a sample point x_i, F_*i is optimized with the stochastic gradient descent method; the gradient of the loss with respect to F_*i is computed (original equation image), the gradient with respect to theta_x is then obtained by the chain rule (original equation images), and the parameter theta_x of the image network is updated via back-propagation.

S2-2: fix the variables B, F, G, W_y and update the variable W_x: the variable is updated with the stochastic gradient descent method (gradient given in the original equation image).

S2-3: fix the variables B, F, W_x, W_y and update the variable G: similarly to the update of F, for a sample point y_j the gradient with respect to G_*j is first computed (original equation image), the gradient with respect to theta_y is obtained by the chain rule (original equation image), and the parameter theta_y is updated.

S2-4: fix the variables B, F, G, W_x and update the variable W_y, again by stochastic gradient descent (gradient given in the original equation image).

S2-5: fix the variables F, G, W_x, W_y and update the variable B in closed form:

B = sign(V), where V = gamma(F + G),

consistent with the setting B_x = B_y = sign(gamma(F + G)) above.
In step S3, after the hash learning model has been optimized, all samples in the cross-modal data set are passed through the optimized model to obtain their corresponding hash codes.
When a retrieval task is performed, the query data is input into the model to obtain its hash code; among the hash codes of the modal data in the cross-modal data set whose modality differs from that of the query, the N hash codes with the smallest Hamming distance are retrieved, and the cross-modal data satisfying the retrieval requirement is screened out.
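A minimal sketch of this retrieval step, assuming +/-1 hash codes so that the Hamming distance reduces to an inner product:

```python
import torch

def retrieve(query_code: torch.Tensor, db_codes: torch.Tensor, N: int) -> torch.Tensor:
    """query_code: (c,) in {-1, +1}; db_codes: (m, c) codes of the other
    modality; returns the indices of the N closest database items."""
    c = query_code.numel()
    # for +/-1 codes: Hamming(q, b) = (c - <q, b>) / 2
    dists = (c - db_codes @ query_code) / 2
    return torch.topk(dists, k=N, largest=False).indices

# e.g. a text query against the image database:
# idx = retrieve(text_query_code, image_db_codes, N=10)
```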
Embodiment 2
As shown in fig. 5, the present embodiment provides a cross-modal hash retrieval system based on an attention-aware mechanism, including:
the feature extraction module is used for performing feature extraction and attention feature extraction on a training set in the cross-modal data set to obtain cross-modal features weighted by the attention features;
the hash learning module is used for inputting the cross-modal features of the cross-modal data pairs in the training set into the hash learning model and optimizing the hash learning model according to the output cross-modal hash codes with minimization of a loss function as the objective;
and the retrieval module is used for screening out, according to the hash code of the data to be retrieved obtained by the optimized hash learning model, the modal data satisfying the retrieval requirement from the hash codes of modal data in the cross-modal data set whose modality differs from that of the data to be retrieved.
It should be noted that the above modules correspond to steps S1 to S3 in Embodiment 1, and the modules share the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of Embodiment 1. It should also be noted that the modules described above, as part of a system, may be implemented in a computer system such as a set of computer-executable instructions.
In this embodiment, the feature extraction module receives pictures and texts and performs feature learning and hash-code learning on the image data and the text data simultaneously. The image feature extraction network contains an image attention feature extraction module, and the text feature extraction network contains a text attention feature extraction module. Finally, the attention-weighted features are input into the hash learning module to guide the generation of hash codes, which improves the quality of hash-code generation; the system is applicable to cross-modal retrieval tasks in various multi-modal data scenarios.
In further embodiments, there is also provided:
an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor; when the computer instructions are executed by the processor, the method of Embodiment 1 is performed. For brevity, details are not repeated here.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in Embodiment 1 may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this does not limit the scope of protection of the present invention; those skilled in the art should understand that various modifications and variations made without inventive effort on the basis of the technical solution of the present invention remain within its scope.

Claims (9)

1. A cross-modal hash retrieval method based on an attention-aware mechanism is characterized by comprising the following steps:
performing feature extraction and attention feature extraction on a training set in the cross-modal data set to obtain cross-modal features weighted by the attention features;
inputting the cross-modal features of the cross-modal data pairs in the training set into a hash learning model, and optimizing the hash learning model according to the output cross-modal hash codes with minimization of a loss function as the objective; and constructing a global objective function according to the output cross-modal hash codes with minimization of the loss function as the objective;
the global objective function is:

[global objective function equation image: a negative log-likelihood term, a quantization loss term, and a semantic-preserving loss term]

where n is the number of samples in the sample set; B_x, B_y are the hash codes corresponding to the x-modality data and the y-modality data in a cross-modal data pair; theta_x, theta_y are the network parameters of the networks corresponding to the x-modality data and the y-modality data; W_x, W_y are the initial attention matrices corresponding to the x-modality data and the y-modality data; S_ij is the similarity matrix; gamma and eta are both hyper-parameters; F, G are the outputs of the networks corresponding to the x-modality data and the y-modality data; and L is the label information;
and screening out, according to the hash code of the data to be retrieved obtained by the optimized hash learning model, the modal data satisfying the retrieval requirement from the hash codes of modal data in the cross-modal data set whose modality differs from that of the data to be retrieved.
2. The attention-aware mechanism-based cross-modal hash retrieval method of claim 1, wherein the cross-modal data set comprises data of multiple modalities, the training set comprises multiple cross-modal data pairs, and two parallel convolutional neural networks are adopted to perform feature extraction and attention feature extraction on the cross-modal data pairs simultaneously.
3. The cross-modal hash retrieval method based on attention awareness mechanism as claimed in claim 1, wherein said attention feature extraction comprises:
acquiring an initial attention feature matrix, training the convolutional neural network with minimization of a loss function, and outputting an improved attention feature matrix;
and performing a dot-product operation between the attention feature matrix and the feature matrix output by the convolutional neural network to obtain the cross-modal features weighted by the attention features.
4. The attention-aware mechanism-based cross-modal hash retrieval method of claim 1, wherein the global objective function comprises a negative log-likelihood loss function, a quantization loss function, and a semantic preserving loss function.
5. The cross-modal hash retrieval method based on the attention-aware mechanism as claimed in claim 1, wherein the hash learning model is optimized with an iterative optimization method, and the optimized variables comprise the hash codes of the cross-modal data pairs, the outputs of the networks corresponding to the cross-modal data pairs, and the initial attention matrices.
6. The cross-modal hash retrieval method based on the attention-aware mechanism as claimed in claim 1, wherein, among the hash codes of modal data in the cross-modal data set whose modality differs from that of the data to be retrieved, the Hamming distances to the hash code of the data to be retrieved are compared, the N hash codes with the smallest Hamming distance are retrieved, and the cross-modal data satisfying the retrieval requirement are screened out.
7. A cross-modal hash retrieval system based on an attention-aware mechanism, comprising:
the feature extraction module is used for performing feature extraction and attention feature extraction on a training set in the cross-modal data set to obtain cross-modal features weighted by the attention features;
the hash learning module is used for inputting the cross-modal features of the cross-modal data pairs in the training set into the hash learning model and optimizing the hash learning model according to the output cross-modal hash codes with minimization of a loss function as the objective; and constructing a global objective function according to the output cross-modal hash codes with minimization of the loss function as the objective;
the global objective function is:

[global objective function equation image: a negative log-likelihood term, a quantization loss term, and a semantic-preserving loss term]

where n is the number of samples in the sample set; B_x, B_y are the hash codes corresponding to the x-modality data and the y-modality data in a cross-modal data pair; theta_x, theta_y are the network parameters of the networks corresponding to the x-modality data and the y-modality data; W_x, W_y are the initial attention matrices corresponding to the x-modality data and the y-modality data; S_ij is the similarity matrix; gamma and eta are both hyper-parameters; F, G are the outputs of the networks corresponding to the x-modality data and the y-modality data; and L is the label information;
and the retrieval module is used for screening out, according to the hash code of the data to be retrieved obtained by the optimized hash learning model, the modal data satisfying the retrieval requirement from the hash codes of modal data in the cross-modal data set whose modality differs from that of the data to be retrieved.
8. An electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein, when the computer instructions are executed by the processor, the method of any one of claims 1-6 is performed.
9. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 6.
CN202010408302.8A 2020-05-14 2020-05-14 Cross-modal Hash retrieval method and system based on attention awareness mechanism Active CN111639240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408302.8A CN111639240B (en) 2020-05-14 2020-05-14 Cross-modal Hash retrieval method and system based on attention awareness mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010408302.8A CN111639240B (en) 2020-05-14 2020-05-14 Cross-modal Hash retrieval method and system based on attention awareness mechanism

Publications (2)

Publication Number Publication Date
CN111639240A (en) 2020-09-08
CN111639240B (en) 2021-04-09

Family

ID=72331952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010408302.8A Active CN111639240B (en) 2020-05-14 2020-05-14 Cross-modal Hash retrieval method and system based on attention awareness mechanism

Country Status (1)

Country Link
CN (1) CN111639240B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199375B (en) * 2020-09-30 2024-03-01 三维通信股份有限公司 Cross-modal data processing method and device, storage medium and electronic device
CN112364198B (en) * 2020-11-17 2023-06-30 深圳大学 Cross-modal hash retrieval method, terminal equipment and storage medium
CN112329439B (en) * 2020-11-18 2021-11-19 北京工商大学 Food safety event detection method and system based on graph convolution neural network model
CN112287159B (en) * 2020-12-18 2021-04-09 北京世纪好未来教育科技有限公司 Retrieval method, electronic device and computer readable medium
CN112598067A (en) * 2020-12-25 2021-04-02 中国联合网络通信集团有限公司 Emotion classification method and device for event, electronic equipment and storage medium
CN112817914A (en) * 2021-01-21 2021-05-18 深圳大学 Attention-based deep cross-modal Hash retrieval method and device and related equipment
CN112734625B (en) * 2021-01-29 2022-06-07 成都视海芯图微电子有限公司 Hardware acceleration system and method based on 3D scene design
CN112862727B (en) * 2021-03-16 2023-06-23 上海壁仞智能科技有限公司 Cross-modal image conversion method and device
CN113095415B (en) * 2021-04-15 2022-06-14 齐鲁工业大学 Cross-modal hashing method and system based on multi-modal attention mechanism
CN113032614A (en) * 2021-04-28 2021-06-25 泰康保险集团股份有限公司 Cross-modal information retrieval method and device
CN113220919B (en) * 2021-05-17 2022-04-22 河海大学 Dam defect image text cross-modal retrieval method and model
CN113343014A (en) * 2021-05-25 2021-09-03 武汉理工大学 Cross-modal image audio retrieval method based on deep heterogeneous correlation learning
CN113239237B (en) * 2021-07-13 2021-11-30 北京邮电大学 Cross-media big data searching method and device
CN116776157B (en) * 2023-08-17 2023-12-12 鹏城实验室 Model learning method supporting modal increase and device thereof
CN117194740B (en) * 2023-11-08 2024-01-30 武汉大学 Geographic information retrieval intention updating method and system based on guided iterative feedback

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885764A (en) * 2017-09-21 2018-04-06 银江股份有限公司 Based on the quick Hash vehicle retrieval method of multitask deep learning
CN108170755A (en) * 2017-12-22 2018-06-15 西安电子科技大学 Cross-module state Hash search method based on triple depth network

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346440B (en) * 2014-10-10 2017-06-23 浙江大学 A kind of across media hash indexing methods based on neutral net
CN107562812B (en) * 2017-08-11 2021-01-15 北京大学 Cross-modal similarity learning method based on specific modal semantic space modeling
CA3022998A1 (en) * 2017-11-02 2019-05-02 Royal Bank Of Canada Method and device for generative adversarial network training
US10248664B1 (en) * 2018-07-02 2019-04-02 Inception Institute Of Artificial Intelligence Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
US11556581B2 (en) * 2018-09-04 2023-01-17 Inception Institute of Artificial Intelligence, Ltd. Sketch-based image retrieval techniques using generative domain migration hashing
CN109992686A (en) * 2019-02-24 2019-07-09 复旦大学 Based on multi-angle from the image-text retrieval system and method for attention mechanism
CN109960732B (en) * 2019-03-29 2023-04-18 广东石油化工学院 Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN110222140B (en) * 2019-04-22 2021-07-13 中国科学院信息工程研究所 Cross-modal retrieval method based on counterstudy and asymmetric hash
CN110472642B (en) * 2019-08-19 2022-02-01 齐鲁工业大学 Fine-grained image description method and system based on multi-level attention
CN111125457A (en) * 2019-12-13 2020-05-08 山东浪潮人工智能研究院有限公司 Deep cross-modal Hash retrieval method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885764A (en) * 2017-09-21 2018-04-06 银江股份有限公司 Based on the quick Hash vehicle retrieval method of multitask deep learning
CN108170755A (en) * 2017-12-22 2018-06-15 西安电子科技大学 Cross-module state Hash search method based on triple depth network

Also Published As

Publication number Publication date
CN111639240A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN111639240B (en) Cross-modal Hash retrieval method and system based on attention awareness mechanism
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN108960073B (en) Cross-modal image mode identification method for biomedical literature
US11928602B2 (en) Systems and methods to enable continual, memory-bounded learning in artificial intelligence and deep learning continuously operating applications across networked compute edges
Sharma et al. Era of deep neural networks: A review
CN110188346B (en) Intelligent research and judgment method for network security law case based on information extraction
CN109783666B (en) Image scene graph generation method based on iterative refinement
WO2017052791A1 (en) Semantic multisensory embeddings for video search by text
US11288324B2 (en) Chart question answering
CN108170848B (en) Chinese mobile intelligent customer service-oriented conversation scene classification method
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN111027576B (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN114743020A (en) Food identification method combining tag semantic embedding and attention fusion
CN110097096B (en) Text classification method based on TF-IDF matrix and capsule network
CN111461175B (en) Label recommendation model construction method and device of self-attention and cooperative attention mechanism
CN111858984A (en) Image matching method based on attention mechanism Hash retrieval
CN114359631A (en) Target classification and positioning method based on coding-decoding weak supervision network model
KR20200071865A (en) Image object detection system and method based on reduced dimensional
Qian Exploration of machine algorithms based on deep learning model and feature extraction
CN111930972B (en) Cross-modal retrieval method and system for multimedia data by using label level information
CN106503066B (en) Processing search result method and apparatus based on artificial intelligence
US20240037335A1 (en) Methods, systems, and media for bi-modal generation of natural languages and neural architectures
CN116958677A (en) Internet short video classification method based on multi-mode big data
Bibi et al. Deep features optimization based on a transfer learning, genetic algorithm, and extreme learning machine for robust content-based image retrieval
CN116663523A (en) Semantic text similarity calculation method for multi-angle enhanced network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant