CN112199531A - Cross-modal retrieval method and device based on Hash algorithm and neighborhood map - Google Patents
- Publication number
- CN112199531A (application CN202011224930.7A)
- Authority
- CN
- China
- Prior art keywords
- modal
- matrix
- cross
- semantic consistency
- neighborhood
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Abstract
The invention discloses a cross-modal retrieval method and device based on a hashing algorithm and a neighborhood graph. The retrieval method comprises: obtaining multi-modal original samples, and minimizing the residual between the samples before and after feature transformation to obtain a minimized residual value; learning the latent correlations among the multi-modal original samples by collective matrix factorization, and computing the inter-modal semantic consistency of the samples from these latent correlations; computing the intra-modal semantic consistency of the samples by manifold learning on a neighborhood graph; and combining the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting to obtain an objective function. Because the objective function for cross-modal retrieval is computed by jointly considering the global characteristics of the modalities and their local characteristics, the embodiments of the invention improve the comprehensiveness and accuracy of cross-modal retrieval.
Description
Technical Field
The invention relates to the field of information retrieval, and in particular to a cross-modal retrieval method and device based on a hashing algorithm and a neighborhood graph.
Background
The rapid development of information technology has led to explosive growth in multi-modal data, i.e., multi-source heterogeneous data such as images, audio, text and video. Because the semantic representations of different modalities are heterogeneous, efficient multi-modal retrieval is one of the key problems in multi-modal fusion. In the prior art, multi-modal retrieval is mostly implemented with hashing: data from the different modalities are mapped into a common latent space, and a hash function quantizes the feature vectors into hash codes, which aligns the modalities in that space. However, the applicant found in research that existing cross-modal retrieval methods take into account neither the similarity between samples within the same modality nor the similarity between modalities, which results in poor retrieval performance.
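As a rough illustration of the hashing pipeline described above (map features into a shared space, quantize them with a hash function, compare the resulting codes), the following NumPy sketch assumes a sign-of-linear-projection hash function and ranks items by Hamming distance. The projection matrices, the function names and the choice of a sign hash are all hypothetical; the text does not specify them.

```python
import numpy as np


def hash_codes(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Quantize n x d features into n x r codes in {-1, +1}.

    Assumed hash function: sign(features @ projection). np.sign maps exact
    zeros to 0, so those entries are forced to +1 to keep every bit valid.
    """
    codes = np.sign(features @ projection)
    codes[codes == 0] = 1
    return codes.astype(np.int8)


def hamming_distance(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Hamming distances between one r-bit code and each row of db_codes.

    For +/-1 codes, <q, b> = r - 2 * d_H, hence d_H = (r - <q, b>) / 2.
    """
    r = query_code.shape[0]
    inner = db_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (r - inner) // 2


def rank_by_hamming(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Indices of database items sorted by increasing Hamming distance."""
    return np.argsort(hamming_distance(query_code, db_codes), kind="stable")


# Hypothetical cross-modal query: one image code ranked against five text codes.
rng = np.random.default_rng(0)
image_feat = rng.normal(size=(1, 32))
text_feat = rng.normal(size=(5, 20))
P_img = rng.normal(size=(32, 16))  # assumed image-side projection to 16 bits
P_txt = rng.normal(size=(20, 16))  # assumed text-side projection to 16 bits
order = rank_by_hamming(hash_codes(image_feat, P_img)[0], hash_codes(text_feat, P_txt))
```

With untrained random projections the ranking carries no meaning; in a working system the mapping into the shared code space would be learned from paired training data, which is what the method described here addresses.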
Disclosure of Invention
The invention provides a cross-modal retrieval method and device based on a hashing algorithm and a neighborhood graph, aiming to solve the technical problem that conventional cross-modal retrieval methods perform poorly because they ignore both the similarity between samples within the same modality and the similarity between modalities.
A first embodiment of the present invention provides a cross-modal retrieval method based on a hashing algorithm and a neighborhood graph, comprising:
obtaining multi-modal original samples, and minimizing the residual between the samples before and after feature transformation to obtain a minimized residual value;
learning the latent correlations among the multi-modal original samples by collective matrix factorization, and computing the inter-modal semantic consistency of the samples from these latent correlations;
computing the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph; and
combining the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting to obtain an objective function.
Further, obtaining the multi-modal original samples and minimizing the residual between the samples before and after feature transformation to obtain the minimized residual value specifically comprises:
assigning a hash code to each sample of the multi-modal samples to obtain the hash code set of the training set, and obtaining the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
Further, learning the latent correlations among the multi-modal original samples by collective matrix factorization and computing the inter-modal semantic consistency from these latent correlations specifically comprises:
performing feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and computing the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
Further, computing the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph specifically comprises:
constructing a neighborhood graph over the data of each modality to represent the local relations among samples, and computing the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
Further, the regularization term includes the regression coefficient matrix, the sample noise matrix, the image basis feature matrix and the text basis feature matrix.
A second embodiment of the present invention provides a cross-modal retrieval device based on a hashing algorithm and a neighborhood graph, comprising:
a minimization processing module, configured to obtain multi-modal original samples and minimize the residual between the samples before and after feature transformation to obtain a minimized residual value;
a first calculation module, configured to learn the latent correlations among the multi-modal original samples by collective matrix factorization and compute the inter-modal semantic consistency of the samples from these latent correlations;
a second calculation module, configured to compute the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph; and
a third calculation module, configured to combine the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting to obtain an objective function.
Further, the minimization processing module is specifically configured to:
assign a hash code to each sample of the multi-modal samples to obtain the hash code set of the training set, and obtain the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
Further, the first calculation module is specifically configured to:
perform feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and compute the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
Further, the second calculation module is specifically configured to:
construct a neighborhood graph over the data of each modality to represent the local relations among samples, and compute the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
Further, the regularization term includes the regression coefficient matrix, the sample noise matrix, the image basis feature matrix and the text basis feature matrix.
By combining the minimized residual of the original samples before and after transformation with the inter-modal and intra-modal semantic consistency, the embodiments of the invention take into account both the global characteristics of the modalities and their local characteristics when computing the objective function for cross-modal retrieval, thereby improving the comprehensiveness and accuracy of cross-modal retrieval.
Drawings
Fig. 1 is a schematic flowchart of a cross-modal retrieval method based on a hashing algorithm and a neighborhood graph according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a cross-modal retrieval device based on a hashing algorithm and a neighborhood graph according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
In the description of the present application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present application, "a plurality" means two or more unless otherwise specified.
In the description of the present application, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "linked" and "connected" are to be construed broadly: for example, a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present application according to the specific circumstances.
Referring to Fig. 1, a first embodiment of the present invention provides a cross-modal retrieval method based on a hashing algorithm and a neighborhood graph, comprising:
S1, obtaining multi-modal original samples, and minimizing the residual between the samples before and after feature transformation to obtain a minimized residual value;
S2, learning the latent correlations among the multi-modal original samples by collective matrix factorization, and computing the inter-modal semantic consistency of the samples from these latent correlations;
S3, computing the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph;
S4, combining the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting to obtain an objective function.
For cross-modal retrieval, the embodiment minimizes the residual of the original samples before and after transformation and jointly considers the inter-modal and intra-modal semantic consistency, i.e., the factors that affect the transformation of the original data, when computing the objective function, thereby improving the comprehensiveness and accuracy of cross-modal retrieval.
Specifically, the embodiment captures the global characteristics of the multiple modalities by minimizing the residual of the original samples before and after transformation, the local characteristics between modalities through inter-modal consistency, and the local characteristics within each modality through intra-modal consistency. This allows the overall characteristics of the multiple modalities to be extracted efficiently and can therefore improve the accuracy and reliability of cross-modal retrieval.
As a specific implementation of the embodiment of the present invention, obtaining the multi-modal original samples and minimizing the residual between the samples before and after feature transformation to obtain the minimized residual value specifically comprises:
assigning a hash code to each sample of the multi-modal samples to obtain the hash code set of the training set, and obtaining the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
Specifically, given a semantic label matrix L, let b denote the hash code of each sample and B the hash code set of the training set. According to the principle of error minimization, the residual of the original samples before and after transformation is minimized as follows:
where W is the regression coefficient matrix obtained by hash learning, and L can be understood as a unified latent semantic space shared by the modalities. Assuming a linear relation among the original samples, the hash code b used for retrieval is learned by linear regression, and the data of the different modalities are further factorized to obtain the unified latent semantic space.
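The equation for this residual is not reproduced in this text. A minimal sketch, assuming one common form of a label-regression residual, ‖L − WB‖_F², with L the c × n label matrix, B the r × n code matrix with entries in {−1, +1} and W the c × r regression coefficient matrix, plus a small ridge penalty γ‖W‖_F² (the ridge term, the symbol γ and the matrix orientations are assumptions). For fixed B, setting the gradient to zero gives the closed form W = LBᵀ(BBᵀ + γI)⁻¹.

```python
import numpy as np


def label_residual(L: np.ndarray, W: np.ndarray, B: np.ndarray) -> float:
    """Squared Frobenius residual ||L - W B||_F^2 (assumed form of the first term)."""
    return float(np.linalg.norm(L - W @ B, "fro") ** 2)


def solve_regression(L: np.ndarray, B: np.ndarray, gamma: float = 1e-3) -> np.ndarray:
    """Ridge solution of min_W ||L - W B||_F^2 + gamma * ||W||_F^2.

    Zero gradient gives W (B B^T + gamma I) = L B^T. Because B B^T + gamma I
    is symmetric, this is solved as W^T = solve(B B^T + gamma I, B L^T).
    """
    r = B.shape[0]
    A = B @ B.T + gamma * np.eye(r)
    return np.linalg.solve(A, B @ L.T).T


rng = np.random.default_rng(1)
r, c, n = 8, 4, 200
B = np.where(rng.normal(size=(r, n)) >= 0, 1.0, -1.0)  # hypothetical +/-1 codes
W_true = rng.normal(size=(c, r))
L = W_true @ B  # synthetic labels that are exactly linear in the codes
W = solve_regression(L, B, gamma=1e-8)
```

The synthetic labels are real-valued only so that exact recovery of W can be checked; real semantic label matrices are typically 0/1 class indicators.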
As a specific implementation of the embodiment of the present invention, learning the latent correlations among the multi-modal original samples by collective matrix factorization and computing the inter-modal semantic consistency from these latent correlations specifically comprises:
performing feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and computing the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
Specifically, the inter-modal semantic consistency is expressed through the correlation between modalities. This correlation is obtained by learning the latent correlations among the multi-modal samples through collective matrix factorization; it is shared by the modalities and is obtained in the course of hash function learning and feature extraction. The inter-modal semantic consistency is expressed as follows:
where X and Y denote the original samples of the image modality and the text modality, respectively, U_X is the image basis feature matrix, U_Y is the text basis feature matrix, and B is the hash code set.
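The inter-modal equation is also missing here. Since the text describes collective matrix factorization with basis matrices U_X and U_Y and a shared hash code set B, the sketch below assumes the usual pair of factorization terms λ_X‖X − U_X B‖_F² + λ_Y‖Y − U_Y B‖_F² (the weights λ_X and λ_Y are hypothetical). It also shows why the factorization couples the modalities: with U_X and U_Y fixed and B relaxed to a real matrix V, the minimizer is one latent matrix shared by both modalities, which is then binarized with a sign function. The relaxation and sign step are assumptions, not steps stated in the text.

```python
import numpy as np


def intermodal_loss(X, Y, U_x, U_y, V, lam_x=1.0, lam_y=1.0) -> float:
    """lam_x * ||X - U_x V||_F^2 + lam_y * ||Y - U_y V||_F^2 (assumed inter-modal terms)."""
    return float(
        lam_x * np.linalg.norm(X - U_x @ V, "fro") ** 2
        + lam_y * np.linalg.norm(Y - U_y @ V, "fro") ** 2
    )


def shared_latent(X, Y, U_x, U_y, lam_x=1.0, lam_y=1.0) -> np.ndarray:
    """Closed-form real-valued minimizer V of intermodal_loss.

    Zero gradient gives (lam_x U_x^T U_x + lam_y U_y^T U_y) V
    = lam_x U_x^T X + lam_y U_y^T Y, an r x r system solved for all columns.
    """
    A = lam_x * U_x.T @ U_x + lam_y * U_y.T @ U_y
    rhs = lam_x * U_x.T @ X + lam_y * U_y.T @ Y
    return np.linalg.solve(A, rhs)


rng = np.random.default_rng(2)
d_x, d_y, r, n = 30, 12, 6, 50
V_true = rng.normal(size=(r, n))
U_x = rng.normal(size=(d_x, r))  # hypothetical image basis feature matrix
U_y = rng.normal(size=(d_y, r))  # hypothetical text basis feature matrix
X, Y = U_x @ V_true, U_y @ V_true  # noise-free paired image/text samples
V = shared_latent(X, Y, U_x, U_y)
B = np.where(V >= 0, 1, -1)  # binarize the shared latent representation
```

Because each column of V must explain the image and text sample of the same pair at once, paired samples end up with a single code, which is what lets an image query retrieve text and vice versa.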
As a specific implementation of the embodiment of the present invention, computing the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph specifically comprises:
constructing a neighborhood graph over the data of each modality to represent the local relations among samples, and computing the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
It should be noted that intra-modal semantic consistency rests on the assumption that the data are approximately sampled from the same underlying space. The method constructs a neighborhood graph S over the data of each modality to represent the local relations among the original samples, and obtains the intra-modal semantic consistency of the multi-modal original samples through calculation and optimization from the neighborhood graph, the image basis feature matrix and the text basis feature matrix. The intra-modal semantic consistency is expressed as follows:
where the neighborhood graph S takes the form of a similarity matrix within the X modality and a similarity matrix within the Y modality, respectively.
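The intra-modal formula is likewise not reproduced. A standard way to use a neighborhood graph in manifold learning, assumed here, is a k-nearest-neighbor similarity matrix S with graph Laplacian L_S = D − S (D is the diagonal degree matrix); the penalty tr(B L_S Bᵀ) equals ½ Σ_ij S_ij ‖b_i − b_j‖², so it is small when neighboring samples receive similar codes. The binary kNN weights, the value of k and applying the penalty directly to the codes B are assumptions; the text does not show how the graph is weighted or exactly how U_X and U_Y enter this term.

```python
import numpy as np


def knn_graph(F: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 kNN adjacency for the rows of F (n samples x d features).

    S[i, j] = 1 if j is among the k nearest neighbours of i, or vice versa.
    """
    sq = np.sum(F ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * F @ F.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    n = F.shape[0]
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(S, S.T)  # symmetrize


def graph_smoothness(B: np.ndarray, S: np.ndarray) -> float:
    """tr(B L_S B^T) with L_S = D - S; B is r x n with one code per column."""
    L_S = np.diag(S.sum(axis=1)) - S
    return float(np.trace(B @ L_S @ B.T))


# Two well-separated pairs on a line: each point's nearest neighbour is its partner.
F = np.array([[0.0], [1.0], [10.0], [11.0]])
S = knn_graph(F, k=1)
B_good = np.array([[1, 1, -1, -1]])  # neighbours share the same bit
B_bad = np.array([[1, -1, 1, -1]])   # neighbours disagree
```

In this toy case the penalty is 0 for B_good and 8 for B_bad, i.e., codes that respect the neighborhood structure are preferred. One graph would be built per modality (from image features and from text features) and each contributes one such term.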
As a specific implementation of the embodiment of the present invention, the regularization term includes the regression coefficient matrix, the sample noise matrix, the image basis feature matrix and the text basis feature matrix.
In the embodiment of the invention, the objective function is established by combining the minimized residual of the original samples before and after transformation, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term:
The first term of the objective function is the minimized residual obtained by learning from the semantic labels, which helps to obtain a highly discriminative model; the second and third terms express inter-modal consistency; the fourth and fifth terms express intra-modal consistency; and the sixth term is a constraint, namely the regularization term that prevents overfitting, which includes the regression coefficient matrix W, the image basis feature matrix U_X, the text basis feature matrix U_Y and the sample noise matrix E.
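The six-term structure just described can be sketched as a function that evaluates each term separately. Because the objective itself is not reproduced in this text, every concrete choice below is an assumption: term 1 is taken as ‖L − WB‖_F², terms 2 and 3 as ‖X − U_X B‖_F² and ‖Y − U_Y B‖_F², terms 4 and 5 as graph-Laplacian penalties tr(B L_S Bᵀ) for the image and text neighborhood graphs, and term 6 as a squared-Frobenius penalty. The weights alpha, beta and gamma are hypothetical, and since the text does not show where the sample noise matrix E enters the data terms, E appears here only in the penalty.

```python
import numpy as np


def laplacian(S: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian D - S of a symmetric similarity matrix S."""
    return np.diag(S.sum(axis=1)) - S


def fro2(M: np.ndarray) -> float:
    """Squared Frobenius norm."""
    return float(np.linalg.norm(M, "fro") ** 2)


def objective(L, X, Y, B, W, U_x, U_y, E, S_x, S_y, alpha=1.0, beta=1.0, gamma=1e-2):
    """Evaluate an assumed six-term objective; returns (total, per-term dict).

    Assumed orientation, columns are samples: L is c x n, X is d_x x n,
    Y is d_y x n, B is r x n, W is c x r, U_x is d_x x r, U_y is d_y x r.
    """
    terms = {
        "label_residual": fro2(L - W @ B),                            # term 1
        "inter_x": alpha * fro2(X - U_x @ B),                         # term 2
        "inter_y": alpha * fro2(Y - U_y @ B),                         # term 3
        "intra_x": beta * float(np.trace(B @ laplacian(S_x) @ B.T)),  # term 4
        "intra_y": beta * float(np.trace(B @ laplacian(S_y) @ B.T)),  # term 5
        # term 6: penalty on W, U_x, U_y and the sample noise matrix E
        "regularizer": gamma * (fro2(W) + fro2(U_x) + fro2(U_y) + fro2(E)),
    }
    return sum(terms.values()), terms


rng = np.random.default_rng(3)
n, r, c, d_x, d_y = 6, 3, 2, 5, 4
B = np.where(rng.normal(size=(r, n)) >= 0, 1.0, -1.0)
W = rng.normal(size=(c, r))
U_x, U_y = rng.normal(size=(d_x, r)), rng.normal(size=(d_y, r))
L, X, Y = W @ B, U_x @ B, U_y @ B  # data the model fits exactly
S_empty = np.zeros((n, n))         # no neighbourhood edges
E = np.zeros((d_x, n))             # hypothetical sample noise matrix
total, terms = objective(L, X, Y, B, W, U_x, U_y, E, S_empty, S_empty)
```

Returning the individual terms alongside the total makes it easy to see which part of the objective dominates when tuning the (hypothetical) weights.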
Referring to Fig. 2, a second embodiment of the present invention provides a cross-modal retrieval device based on a hashing algorithm and a neighborhood graph, comprising:
a minimization processing module 10, configured to obtain multi-modal original samples and minimize the residual between the samples before and after feature transformation to obtain a minimized residual value;
a first calculation module 20, configured to learn the latent correlations among the multi-modal original samples by collective matrix factorization and compute the inter-modal semantic consistency of the samples from these latent correlations;
a second calculation module 30, configured to compute the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph; and
a third calculation module 40, configured to combine the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting to compute an objective function.
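The module structure of Fig. 2 can be mirrored in code as four pluggable callables, which makes the data flow from modules 10, 20 and 30 into module 40 explicit. This is only an architectural sketch: the class name, field names and the dictionary-based batch are hypothetical, and the toy callables in the usage lines merely stand in for the computations described in the first embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Batch = Dict[str, float]  # hypothetical container for precomputed quantities


@dataclass
class CrossModalRetrievalDevice:
    """Hypothetical mirror of Fig. 2: modules 10, 20 and 30 each compute one
    part of the objective; module 40 combines them with a regularizer."""

    minimization_module: Callable[[Batch], float]        # module 10: residual term
    first_calculation_module: Callable[[Batch], float]   # module 20: inter-modal consistency
    second_calculation_module: Callable[[Batch], float]  # module 30: intra-modal consistency
    regularizer: Callable[[Batch], float]                # penalty used by module 40

    def third_calculation_module(self, batch: Batch) -> float:
        """Module 40: add the residual, both consistency terms and the regularizer."""
        return (
            self.minimization_module(batch)
            + self.first_calculation_module(batch)
            + self.second_calculation_module(batch)
            + self.regularizer(batch)
        )


device = CrossModalRetrievalDevice(
    minimization_module=lambda b: b["residual"],
    first_calculation_module=lambda b: b["inter"],
    second_calculation_module=lambda b: b["intra"],
    regularizer=lambda b: 0.1 * b["norms"],
)
value = device.third_calculation_module({"residual": 1.0, "inter": 2.0, "intra": 3.0, "norms": 10.0})
```

Keeping each module as an injected callable lets the term definitions change (for example, a different graph weighting in module 30) without touching the combining logic in module 40.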
As a specific implementation of the embodiment of the present invention, the minimization processing module 10 is specifically configured to:
assign a hash code to each sample of the multi-modal samples to obtain the hash code set of the training set, and obtain the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
As a specific implementation of the embodiment of the present invention, the first calculation module 20 is specifically configured to:
perform feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and compute the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
As a specific implementation of the embodiment of the present invention, the second calculation module 30 is specifically configured to:
construct a neighborhood graph over the data of each modality to represent the local relations among samples, and compute the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
As a specific implementation of the embodiment of the present invention, the regularization term includes the regression coefficient matrix, the sample noise matrix, the image basis feature matrix and the text basis feature matrix.
The foregoing is a preferred embodiment of the present invention. It should be noted that a person skilled in the art can make various improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also fall within the protection scope of the invention.
Claims (10)
1. A cross-modal retrieval method based on a hashing algorithm and a neighborhood graph, characterized by comprising the following steps:
obtaining multi-modal original samples, and minimizing the residual between the samples before and after feature transformation to obtain a minimized residual value;
learning the latent correlations among the multi-modal original samples by collective matrix factorization, and computing the inter-modal semantic consistency of the samples from these latent correlations;
computing the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph; and
computing an objective function from the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency in combination with a regularization term for avoiding overfitting.
2. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein obtaining the multi-modal original samples and minimizing the residual before and after feature transformation to obtain the minimized residual value specifically comprises:
assigning a hash code to each sample of the multi-modal samples to obtain the hash code set of a training set, and obtaining the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
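Claim 2 states only that the residual is minimized from the hash code set and a preset semantic label matrix under an error-minimization principle. A minimal sketch, assuming a linear regression from codes to labels with a ridge penalty, is shown below. The closed form, the names `B`, `Y`, `W` and `lam`, and the $\{-1,+1\}$ code convention are all assumptions, and the sample noise matrix of claim 5 is omitted for brevity.

```python
import numpy as np


def fit_label_regression(B, Y, lam=1e-2):
    """Ridge-regress the label matrix Y (c x n) onto the hash codes B (k x n).

    Minimises ||Y - W B||_F^2 + lam * ||W||_F^2 over W (c x k) and returns
    W together with the remaining squared residual ||Y - W B||_F^2.
    (Hypothetical helper; the patent does not specify this form.)
    """
    k = B.shape[0]
    gram = B @ B.T + lam * np.eye(k)
    # Closed form W = Y B^T (B B^T + lam I)^{-1}, computed via a linear solve
    # instead of an explicit inverse for numerical stability.
    W = np.linalg.solve(gram, B @ Y.T).T
    residual = float(np.linalg.norm(Y - W @ B) ** 2)
    return W, residual
```

With fixed codes the residual has this closed form; in a full training loop the codes themselves would also be updated, which the sketch does not attempt.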
3. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein learning the latent associations among the multi-modal original samples by collaborative matrix factorization and calculating the inter-modal semantic consistency of the multi-modal original samples from the latent associations specifically comprises:
performing feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and calculating the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
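Claim 3 does not spell out the factorization itself. A common form of collaborative matrix factorization, assumed here for illustration, approximates each modality with its own basis matrix and a latent matrix shared by both modalities, $X_1 \approx U_1 V$ and $X_2 \approx U_2 V$, and solves it by alternating ridge least squares. Treating `U1` and `U2` as the image and text basis feature matrices, as well as the function name, the small ridge weight `lam` on all three factors, and the iteration count, are assumptions.

```python
import numpy as np


def collaborative_mf(X1, X2, k, lam=1e-2, iters=50, seed=0):
    """Factorise X1 ~ U1 @ V and X2 ~ U2 @ V with one shared latent matrix V.

    X1: image features (d1 x n), X2: text features (d2 x n), columns paired.
    Returns U1 (d1 x k), U2 (d2 x k), V (k x n) and the objective per iteration.
    Requires iters >= 1. (Hypothetical helper, not the patent's exact method.)
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((k, X1.shape[1]))
    ridge = lam * np.eye(k)
    history = []
    for _ in range(iters):
        # Basis step (V fixed): U_m = X_m V^T (V V^T + lam I)^{-1} per modality.
        gram_v = V @ V.T + ridge
        U1 = np.linalg.solve(gram_v, V @ X1.T).T
        U2 = np.linalg.solve(gram_v, V @ X2.T).T
        # Latent step (U1, U2 fixed): both modalities jointly determine V.
        V = np.linalg.solve(U1.T @ U1 + U2.T @ U2 + ridge, U1.T @ X1 + U2.T @ X2)
        loss = (np.linalg.norm(X1 - U1 @ V) ** 2
                + np.linalg.norm(X2 - U2 @ V) ** 2
                + lam * (np.linalg.norm(U1) ** 2 + np.linalg.norm(U2) ** 2
                         + np.linalg.norm(V) ** 2))
        history.append(float(loss))
    return U1, U2, V, history
```

Because each step minimizes the objective exactly over one block of variables, the recorded loss does not increase. The shared `V` is what ties the modalities together: a paired image column and text column are described by the same latent vector, which is one way of reading "inter-modal semantic consistency".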
4. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 3, wherein calculating the intra-modal semantic consistency of the multi-modal original samples by manifold learning on the neighborhood graph specifically comprises:
constructing a neighborhood graph over the data within each modality to represent the local relationships among samples, and calculating the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
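A minimal sketch of the intra-modal term, assuming a symmetric k-nearest-neighbour graph with heat-kernel weights built on each modality's features and a graph-Laplacian smoothness penalty on a latent representation `V`. The neighbourhood size, the bandwidth heuristic, and the choice of applying the penalty to `V` rather than directly to the basis feature matrices are all assumptions; the function names are hypothetical.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric k-nearest-neighbour affinity S (n x n) over the columns of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    np.fill_diagonal(dist2, np.inf)              # a sample is not its own neighbour
    idx = np.argsort(dist2, axis=1)[:, :k]       # k nearest neighbours per sample
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    # Heat-kernel weights; bandwidth set by a simple mean-distance heuristic.
    sigma2 = max(float(np.mean(dist2[rows, cols])), 1e-12)
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-dist2[rows, cols] / (2.0 * sigma2))
    return np.maximum(S, S.T)                    # i ~ j if either lists the other


def intra_modal_term(V, S):
    """tr(V L V^T) with L = D - S, i.e. 0.5 * sum_ij S_ij * ||v_i - v_j||^2."""
    L = np.diag(S.sum(axis=1)) - S
    return float(np.trace(V @ L @ V.T))
```

Under this assumption the penalty is small only when samples that are neighbours in the original feature space receive similar latent vectors. With two modalities, the image graph and the text graph would each contribute one such term.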
5. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein the regularization term comprises a regression coefficient matrix, a sample noise matrix, the image basis feature matrix and the text basis feature matrix.
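If each matrix listed in claim 5 is penalized by its squared Frobenius norm (an assumption: the claim names the matrices but not the norm or any weights), the regularization term reduces to a single weighted sum. The function name and the weight `gamma` are hypothetical.

```python
import numpy as np


def regularizer(W, E, U1, U2, gamma=1.0):
    """gamma * (||W||_F^2 + ||E||_F^2 + ||U1||_F^2 + ||U2||_F^2).

    W: regression coefficients, E: sample noise, U1/U2: image/text basis
    feature matrices. Squared Frobenius norms are an illustrative choice.
    """
    return gamma * sum(float(np.linalg.norm(M) ** 2) for M in (W, E, U1, U2))
```

Penalizing large entries in these matrices shrinks them towards zero, which is the usual mechanism by which such a term discourages overfitting.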
6. A cross-modal retrieval device based on a hash algorithm and a neighborhood graph, characterized by comprising:
a minimization processing module, configured to obtain multi-modal original samples and to minimize the residual between the multi-modal original samples before and after feature transformation to obtain a minimized residual value;
a first calculation module, configured to learn latent associations among the multi-modal original samples by collaborative matrix factorization and to calculate the inter-modal semantic consistency of the multi-modal original samples from the latent associations;
a second calculation module, configured to calculate the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph; and
a third calculation module, configured to combine the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting to obtain an objective function.
7. The cross-modal retrieval device based on a hash algorithm and a neighborhood graph according to claim 6, wherein the minimization processing module is specifically configured to:
assign a hash code to each sample of the multi-modal samples to obtain the hash code set of a training set, and obtain the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
8. The cross-modal retrieval device based on a hash algorithm and a neighborhood graph according to claim 6, wherein the first calculation module is specifically configured to:
perform feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and calculate the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
9. The cross-modal retrieval device based on a hash algorithm and a neighborhood graph according to claim 8, wherein the second calculation module is specifically configured to:
construct a neighborhood graph over the data within each modality to represent the local relationships among samples, and calculate the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
10. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein the regularization term comprises a regression coefficient matrix, a sample noise matrix, the image basis feature matrix and the text basis feature matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011224930.7A CN112199531B (en) | 2020-11-05 | 2020-11-05 | Cross-modal retrieval method and device based on hash algorithm and neighborhood graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011224930.7A CN112199531B (en) | 2020-11-05 | 2020-11-05 | Cross-modal retrieval method and device based on hash algorithm and neighborhood graph |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112199531A | 2021-01-08 |
CN112199531B | 2024-05-17 |
Family ID: 74033344
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011224930.7A Active CN112199531B (en) | 2020-11-05 | 2020-11-05 | Cross-modal retrieval method and device based on hash algorithm and neighborhood graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112199531B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110059198A (en) * | 2019-04-08 | 2019-07-26 | 浙江大学 | A kind of discrete Hash search method across modal data kept based on similitude |
CN111078952A (en) * | 2019-11-20 | 2020-04-28 | 重庆邮电大学 | Cross-modal variable-length Hash retrieval method based on hierarchical structure |
CN111382555A (en) * | 2020-03-19 | 2020-07-07 | 网易(杭州)网络有限公司 | Data processing method, medium, device and computing equipment |
CN111461157A (en) * | 2019-01-22 | 2020-07-28 | 大连理工大学 | Self-learning-based cross-modal Hash retrieval method |
CN111460077A (en) * | 2019-01-22 | 2020-07-28 | 大连理工大学 | Cross-modal Hash retrieval method based on class semantic guidance |
CN111753116A (en) * | 2019-05-20 | 2020-10-09 | 北京京东尚科信息技术有限公司 | Image retrieval method, device, equipment and readable storage medium |
Non-Patent Citations (1)
Title |
---|
LU JIN等: "Semantic Neighbor Graph Hashing for Multimodal Retrieval", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 27, no. 3, pages 1405 - 1417, XP011674891, DOI: 10.1109/TIP.2017.2776745 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112836068A (en) * | 2021-03-24 | 2021-05-25 | 南京大学 | Unsupervised cross-modal Hash retrieval method based on noisy label learning |
CN112836068B (en) * | 2021-03-24 | 2023-09-26 | 南京大学 | Unsupervised cross-modal hash retrieval method based on noisy tag learning |
Also Published As
Publication number | Publication date |
---|---|
CN112199531B (en) | 2024-05-17 |
Similar Documents
Publication | Title |
---|---|
WO2022068196A1 (en) | Cross-modal data processing method and device, storage medium, and electronic device |
WO2022155994A1 (en) | Attention-based deep cross-modal hash retrieval method and apparatus, and related device |
WO2023065617A1 (en) | Cross-modal retrieval system and method based on pre-training model and recall and ranking |
CN116431847B (en) | Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure |
CN112199532A (en) | Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism |
CN110866129A (en) | Cross-media retrieval method based on cross-media uniform characterization model |
CN115775349A (en) | False news detection method and device based on multi-mode fusion |
CN113822427A (en) | Model training method, image matching device and storage medium |
CN115658934A (en) | Image-text cross-modal retrieval method based on multi-class attention mechanism |
CN117093687A (en) | Question answering method and device, electronic equipment and storage medium |
CN116933051A (en) | Multi-mode emotion recognition method and system for modal missing scene |
CN112182273B (en) | Cross-modal retrieval method and system based on semantic constraint matrix decomposition hash |
CN112199531A (en) | Cross-modal retrieval method and device based on Hash algorithm and neighborhood map |
Perdana et al. | Instance-based deep transfer learning on cross-domain image captioning |
CN116956128A (en) | Hypergraph-based multi-mode multi-label classification method and system |
CN116737877A (en) | Cross-modal retrieval method and device based on attention network countermeasure hash |
CN116756363A (en) | Strong-correlation non-supervision cross-modal retrieval method guided by information quantity |
CN116958852A (en) | Video and text matching method and device, electronic equipment and storage medium |
CN114330239A (en) | Text processing method and device, storage medium and electronic equipment |
CN113641790A (en) | Cross-modal retrieval model based on distinguishing representation depth hash |
Yang et al. | Automatic metadata information extraction from scientific literature using deep neural networks |
CN115909317B (en) | Learning method and system for three-dimensional model-text joint expression |
CN117830601B (en) | Three-dimensional visual positioning method, device, equipment and medium based on weak supervision |
CN116825210B (en) | Hash retrieval method, system, equipment and medium based on multi-source biological data |
CN118113815B (en) | Content searching method, related device and medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |