CN112199531A - Cross-modal retrieval method and device based on Hash algorithm and neighborhood map - Google Patents


Info

Publication number
CN112199531A
Authority
CN
China
Prior art keywords
modal
matrix
cross
semantic consistency
neighborhood
Legal status
Granted
Application number
CN202011224930.7A
Other languages
Chinese (zh)
Other versions
CN112199531B (en)
Inventor
杜翠凤
蒋仕宝
孙广波
朱春荣
Current Assignee
Guangzhou Jiesai Communication Planning And Design Institute Co ltd
GCI Science and Technology Co Ltd
Original Assignee
Guangzhou Jiesai Communication Planning And Design Institute Co ltd
GCI Science and Technology Co Ltd
Application filed by Guangzhou Jiesai Communication Planning And Design Institute Co ltd and GCI Science and Technology Co Ltd
Priority to CN202011224930.7A
Publication of CN112199531A
Application granted
Publication of CN112199531B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a cross-modal retrieval method and device based on a hash algorithm and a neighborhood graph. The retrieval method comprises: obtaining multi-modal original samples and minimizing the residual values before and after the samples undergo feature transformation, to obtain minimized residual values; learning latent correlations among the multi-modal original samples by collaborative matrix factorization, and calculating the inter-modal semantic consistency of the samples from those correlations; calculating the intra-modal semantic consistency of the samples by manifold learning on a neighborhood graph; and combining the minimized residual values, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting, to obtain an objective function. Because the objective function for cross-modal retrieval is calculated by jointly considering the global characteristics of the multiple modalities and the local characteristics between modalities, embodiments of the invention improve the comprehensiveness and accuracy of cross-modal retrieval.

Description

Cross-modal retrieval method and device based on Hash algorithm and neighborhood map
Technical Field
The invention relates to the technical field of retrieval, and in particular to a cross-modal retrieval method and a cross-modal retrieval device based on a hash algorithm and a neighborhood graph.
Background
The rapid development of information technology has caused explosive growth in multi-modal data, including multi-source heterogeneous data such as images, audio, text and video. Because the semantic representations of different modalities are heterogeneous, efficient multi-modal retrieval is one of the key problems in current multi-modal fusion. In the prior art, multi-modal retrieval is mostly implemented with hash algorithms: a hash algorithm maps multi-modal data into a unified latent space, and the modalities are aligned through hash codes obtained by quantizing feature vectors with a hash function. However, the applicant has found in research that existing cross-modal retrieval methods consider neither the similarity between samples within the same modality nor the similarity between modalities, which leads to poor cross-modal retrieval performance.
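The hashing idea described above (quantize feature vectors with a hash function so that all modalities share one binary code space, then compare codes) can be illustrated with a minimal sketch. It assumes a linear projection followed by sign quantization and Hamming-distance ranking; the patent does not specify its hash function, so `hash_codes`, `hamming_rank` and the projection matrices are hypothetical. Samples are stored as rows in this block.

```python
import numpy as np

def hash_codes(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Quantize n x d features into n x r codes in {-1, +1} via sign(features @ projection).

    Hypothetical linear hash function, used for illustration only.
    """
    codes = np.sign(features @ projection)
    codes[codes == 0] = 1  # sign(0) is 0; map it to +1 so every bit is binary
    return codes.astype(np.int8)

def hamming_rank(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Return indices of database codes (rows) sorted by Hamming distance to query, nearest first."""
    r = query.shape[0]
    inner = database.astype(np.int64) @ query.astype(np.int64)
    dist = (r - inner) // 2  # for +/-1 codes: Hamming distance = (r - <q, b>) / 2
    return np.argsort(dist, kind="stable")  # stable: ties keep database order
```

For cross-modal retrieval, an image query and a text database would each be projected with their own (assumed) matrix into the same r-bit space and then ranked by Hamming distance, which reduces to integer inner products on ±1 codes.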
Disclosure of Invention
The invention provides a cross-modal retrieval method and device based on a hash algorithm and a neighborhood graph, and aims to solve the technical problem that conventional cross-modal retrieval methods perform poorly because they consider neither the similarity between samples within the same modality nor the similarity between modalities.
The first embodiment of the present invention provides a cross-modal retrieval method based on a hash algorithm and a neighborhood graph, including:
obtaining a multi-modal original sample, and performing minimization processing on residual values obtained before and after the multi-modal original sample is subjected to feature transformation to obtain a minimized residual value;
learning latent correlations among the multi-modal original samples by collaborative matrix factorization, and calculating the inter-modal semantic consistency of the multi-modal original samples from the latent correlations;
calculating the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph;
and combining the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting, to obtain an objective function.
Further, obtaining the multi-modal original samples and minimizing the residual values before and after the samples undergo feature transformation to obtain the minimized residual value specifically includes:
setting a hash code for each sample in the multi-modal samples to obtain a hash code set of a training set, and obtaining the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
Further, learning the latent correlations among the multi-modal original samples by collaborative matrix factorization and calculating the inter-modal semantic consistency from the latent correlations specifically includes:
performing feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and calculating the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
Further, calculating the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph specifically includes:
constructing a neighborhood graph over the data within the same modality to represent the local relationships between samples, and calculating the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graph, the image basis feature matrix and the text basis feature matrix.
Further, the regularization term includes a regression coefficient matrix, a sample noise matrix, the image basis feature matrix, and the text basis feature matrix.
A second embodiment of the present invention provides a cross-modal retrieval apparatus based on a hash algorithm and a neighborhood graph, including:
a minimization processing module, configured to obtain multi-modal original samples and minimize the residual values before and after the samples undergo feature transformation, to obtain a minimized residual value;
a first calculation module, configured to learn latent correlations among the multi-modal original samples by collaborative matrix factorization and to calculate the inter-modal semantic consistency of the samples from the latent correlations;
a second calculation module, configured to calculate the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph;
and a third calculation module, configured to combine the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting, to obtain an objective function.
Further, the minimization processing module is specifically configured to:
set a hash code for each sample in the multi-modal samples to obtain a hash code set of a training set, and obtain the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
Further, the first calculation module is specifically configured to:
perform feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and calculate the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
Further, the second calculation module is specifically configured to:
construct a neighborhood graph over the data within the same modality to represent the local relationships between samples, and calculate the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graph, the image basis feature matrix and the text basis feature matrix.
Further, the regularization term includes a regression coefficient matrix, a sample noise matrix, the image basis feature matrix, and the text basis feature matrix.
Embodiments of the invention combine the minimized residual values before and after transformation of the original samples, the inter-modal semantic consistency and the intra-modal semantic consistency, thereby taking into account both the global characteristics of the multiple modalities and the local characteristics between modalities when calculating the objective function for cross-modal retrieval, which improves the comprehensiveness and accuracy of cross-modal retrieval.
Drawings
Fig. 1 is a schematic flowchart of a cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a cross-modal retrieval apparatus based on a hash algorithm and a neighborhood graph according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless otherwise specified.
In the description of the present application, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: for example, as a fixed connection, a detachable connection, or an integral connection; as a mechanical or an electrical connection; and as a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present application according to the specific circumstances.
Referring to Fig. 1, a first embodiment of the present invention provides a cross-modal retrieval method based on a hash algorithm and a neighborhood graph, including:
S1, obtaining multi-modal original samples, and minimizing the residual values before and after the samples undergo feature transformation, to obtain a minimized residual value;
S2, learning latent correlations among the multi-modal original samples by collaborative matrix factorization, and calculating the inter-modal semantic consistency of the samples from the latent correlations;
S3, calculating the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph;
S4, combining the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting, to obtain an objective function.
When cross-modal retrieval is performed, the objective function is calculated by jointly considering the minimized residual values before and after transformation of the original samples, the inter-modal semantic consistency, the intra-modal semantic consistency, and the other factors that influence the transformation of the original data, which improves the comprehensiveness and accuracy of cross-modal retrieval.
Specifically, the embodiment reflects the global characteristics of the multiple modalities by minimizing the residual values before and after transformation of the original samples, reflects the local characteristics between modalities through inter-modal consistency, and reflects the local characteristics within each modality through intra-modal consistency. This allows the global characteristics of the multiple modalities to be extracted efficiently and can therefore improve the accuracy and reliability of cross-modal retrieval.
As a specific implementation of this embodiment, obtaining the multi-modal original samples and minimizing the residual values before and after their feature transformation to obtain the minimized residual value is specifically:
a hash code is set for each sample in the multi-modal samples to obtain the hash code set of the training set, and the minimized residual value of the multi-modal original samples is obtained from the hash code set and a preset semantic label matrix according to the principle of error minimization.
Specifically, given a semantic label matrix L, denote the hash code of each sample by b and the hash code set of the training set by B. According to the principle of error minimization, the residual between the original samples before and after transformation is minimized as:
Figure BDA0002763344240000051
where W is the regression coefficient matrix obtained by hash learning, and L can be understood as a unified latent semantic space shared by the modalities. Assuming a linear relationship between the original samples, the hash code b used for retrieval is learned by linear regression, and the data of the different modalities are further decomposed to obtain the unified latent semantic space.
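The exact residual formula appears only as an image in the source. As a hedged sketch, assume it takes the common least-squares form ||L - W B||_F^2 plus a ridge penalty lam * ||W||_F^2, where L (c x n) holds the semantic labels, B (r x n) the ±1 hash codes with samples as columns, and W (c x r) the regression coefficients. Under that assumption W has the closed form W = L B^T (B B^T + lam I)^(-1). The function names and `lam` are hypothetical.

```python
import numpy as np

def fit_regression(labels: np.ndarray, codes: np.ndarray, lam: float) -> np.ndarray:
    """Ridge solution W = L B^T (B B^T + lam I)^{-1}.

    Assumed residual form, not necessarily the patent's exact formula.
    labels: c x n, codes: r x n (samples as columns), returns W: c x r.
    """
    r = codes.shape[0]
    gram = codes @ codes.T + lam * np.eye(r)  # r x r, symmetric positive definite for lam > 0
    # Solve gram @ W^T = B @ L^T rather than forming an explicit inverse.
    return np.linalg.solve(gram, codes @ labels.T).T

def label_residual(labels: np.ndarray, codes: np.ndarray, W: np.ndarray) -> float:
    """Squared Frobenius norm ||L - W B||_F^2."""
    diff = labels - W @ codes
    return float(np.sum(diff * diff))
```

When the labels are exactly linear in the codes and `lam` is tiny, the recovered W matches the generating matrix and the residual is essentially zero; a larger `lam` trades residual for smaller coefficients.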
As a specific implementation of this embodiment, learning the latent correlations among the multi-modal original samples by collaborative matrix factorization and calculating the inter-modal semantic consistency from the latent correlations specifically includes:
performing feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and calculating the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
Specifically, inter-modal semantic consistency is expressed through the correlation between modalities. This correlation is obtained by learning the latent correlations among the multi-modal samples through collaborative matrix factorization, a process that combines hash function learning with feature extraction. The inter-modal semantic consistency is expressed as:
Figure BDA0002763344240000061
where X denotes the multi-modal original samples, U_X is the image basis feature matrix, U_Y is the text basis feature matrix, and B is the hash code set.
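Collaborative (collective) matrix factorization is commonly written as X ≈ U_X B and Y ≈ U_Y B with one code matrix B shared by both modalities. The patent gives its second and third terms only as an image, so this sketch assumes they are the squared-Frobenius reconstruction errors, with samples as columns; `update_basis`, `inter_modal_consistency` and the ridge weight `mu` are illustrative assumptions.

```python
import numpy as np

def update_basis(data: np.ndarray, codes: np.ndarray, mu: float) -> np.ndarray:
    """Least-squares basis U = data B^T (B B^T + mu I)^{-1} for data ~ U @ B.

    data: d x n, codes: r x n (shared across modalities), returns U: d x r.
    """
    r = codes.shape[0]
    gram = codes @ codes.T + mu * np.eye(r)
    return np.linalg.solve(gram, codes @ data.T).T  # solve instead of an explicit inverse

def inter_modal_consistency(X, Y, U_X, U_Y, B) -> float:
    """Assumed form ||X - U_X B||_F^2 + ||Y - U_Y B||_F^2 with a shared code matrix B."""
    rx = X - U_X @ B
    ry = Y - U_Y @ B
    return float(np.sum(rx * rx) + np.sum(ry * ry))
```

Because B is shared, both modalities are pushed to explain their features with the same codes, which is what aligns an image and its paired text in the common Hamming space.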
As a specific implementation of this embodiment, calculating the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph specifically includes:
constructing a neighborhood graph over the data within the same modality to represent the local relationships between samples, and calculating the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graph, the image basis feature matrix and the text basis feature matrix.
It should be noted that intra-modal semantic consistency rests on the assumption that the data are approximately drawn from the same underlying space. The method constructs a neighborhood graph S over data of the same modality to represent the local relationships among the original samples, and obtains the intra-modal semantic consistency of the multi-modal original samples through calculation and optimization based on the neighborhood graph, the image basis feature matrix and the text basis feature matrix. The intra-modal semantic consistency is expressed as:
Figure BDA0002763344240000062
where
Figure BDA0002763344240000063
and
Figure BDA0002763344240000064
represent the similarity matrix within the X modality and the similarity matrix within the Y modality, respectively.
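Manifold-style intra-modal consistency is typically implemented with a k-nearest-neighbour graph S and its Laplacian L_S = D - S, penalizing tr(B L_S B^T) = 1/2 * sum_ij S_ij ||b_i - b_j||^2 so that neighbouring samples receive similar codes. The patent shows its fourth and fifth terms only as images, so the kNN construction, the heat-kernel weights, `k`, `sigma` and the exact penalty below are assumptions.

```python
import numpy as np

def knn_graph(data: np.ndarray, k: int, sigma: float = 1.0) -> np.ndarray:
    """Symmetric kNN similarity matrix with heat-kernel weights (samples as columns of data)."""
    n = data.shape[1]
    sq = np.sum(data ** 2, axis=0)
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * data.T @ data, 0.0)
    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist2[i])
        neighbors = [j for j in order if j != i][:k]  # k nearest, excluding the sample itself
        S[i, neighbors] = np.exp(-dist2[i, neighbors] / (2.0 * sigma ** 2))
    return np.maximum(S, S.T)  # keep an edge if either endpoint lists the other

def graph_smoothness(codes: np.ndarray, S: np.ndarray) -> float:
    """tr(B L B^T) with L = D - S; small when graph neighbours have similar codes."""
    lap = np.diag(S.sum(axis=1)) - S
    return float(np.trace(codes @ lap @ codes.T))
```

One such graph would be built per modality (one from image features, one from text features), and each contributes its own smoothness term.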
As a specific implementation manner of the embodiment of the present invention, the regularization term includes a regression coefficient matrix, a sample noise matrix, an image basic feature matrix, and a text basic feature matrix.
In this embodiment, the objective function is established by combining the minimized residual values before and after transformation of the original samples, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term:
Figure BDA0002763344240000071
The first term of the objective function is the minimized residual obtained by semantic label learning, which helps yield a highly discriminative model; the second and third terms express inter-modal consistency; the fourth and fifth terms express intra-modal consistency; and the sixth term is a constraint, namely a regularization term that prevents overfitting, comprising the regression coefficient matrix W, the image basis feature matrix U_X, the text basis feature matrix U_Y and the sample noise matrix E.
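The objective function itself is reproduced only as an image. To show how its six parts could fit together, the sketch below assumes the residual ||L - W B||_F^2, inter-modal terms ||X - U_X B||_F^2 and ||Y - U_Y B||_F^2, intra-modal graph-Laplacian terms tr(B L_X B^T) and tr(B L_Y B^T), and a Frobenius-norm regularizer on W, U_X, U_Y and E. The weights `alpha`, `beta`, `gamma` are hypothetical, and E enters only through its norm because its other role is not stated here. The label matrix is named `labels` to avoid a clash with the Laplacian.

```python
import numpy as np

def laplacian(S: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian D - S."""
    return np.diag(S.sum(axis=1)) - S

def fro2(M: np.ndarray) -> float:
    """Squared Frobenius norm."""
    return float(np.sum(M * M))

def objective(labels, X, Y, B, W, U_X, U_Y, E, S_X, S_Y,
              alpha=1.0, beta=1.0, gamma=1e-3) -> float:
    """Assumed six-term objective; samples are columns of labels, X, Y and B."""
    residual = fro2(labels - W @ B)                           # term 1: label regression
    inter = fro2(X - U_X @ B) + fro2(Y - U_Y @ B)             # terms 2-3: shared-code factorization
    intra = float(np.trace(B @ laplacian(S_X) @ B.T)
                  + np.trace(B @ laplacian(S_Y) @ B.T))       # terms 4-5: graph smoothness
    reg = fro2(W) + fro2(U_X) + fro2(U_Y) + fro2(E)           # term 6: regularization
    return residual + alpha * inter + beta * intra + gamma * reg
```

Such an objective would normally be minimized by alternating updates of W, U_X, U_Y and B; this excerpt does not describe the optimization procedure.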
Referring to Fig. 2, a second embodiment of the present invention provides a cross-modal retrieval apparatus based on a hash algorithm and a neighborhood graph, including:
the minimization processing module 10, configured to obtain multi-modal original samples and minimize the residual values before and after the samples undergo feature transformation, to obtain a minimized residual value;
the first calculation module 20, configured to learn latent correlations among the multi-modal original samples by collaborative matrix factorization and to calculate the inter-modal semantic consistency of the samples from the latent correlations;
the second calculation module 30, configured to calculate the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph;
and the third calculation module 40, configured to combine the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting, to obtain an objective function.
As a specific implementation manner of the embodiment of the present invention, the minimization processing module 10 is specifically configured to:
set a hash code for each sample in the multi-modal samples to obtain the hash code set of the training set, and obtain the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
Specifically, given a semantic label matrix L, denote the hash code of each sample by b and the hash code set of the training set by B. According to the principle of error minimization, the residual between the original samples before and after transformation is minimized as:
Figure BDA0002763344240000081
where W is the regression coefficient matrix obtained by hash learning, and L can be understood as a unified latent semantic space shared by the modalities. Assuming a linear relationship between the original samples, the hash code b used for retrieval is learned by linear regression, and the data of the different modalities are further decomposed to obtain the unified latent semantic space.
As a specific implementation manner of the embodiment of the present invention, the first calculating module 20 is specifically configured to:
perform feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and calculate the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
Specifically, inter-modal semantic consistency is expressed through the correlation between modalities. This correlation is obtained by learning the latent correlations among the multi-modal samples through collaborative matrix factorization, a process that combines hash function learning with feature extraction. The inter-modal semantic consistency is expressed as:
Figure BDA0002763344240000091
where X denotes the multi-modal original samples, U_X is the image basis feature matrix, U_Y is the text basis feature matrix, and B is the hash code set.
As a specific implementation manner of the embodiment of the present invention, the second calculating module 30 is specifically configured to:
construct a neighborhood graph over the data within the same modality to represent the local relationships between samples, and calculate the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graph, the image basis feature matrix and the text basis feature matrix.
It should be noted that intra-modal semantic consistency rests on the assumption that the data are approximately drawn from the same underlying space. The method constructs a neighborhood graph S over data of the same modality to represent the local relationships among the original samples, and obtains the intra-modal semantic consistency of the multi-modal original samples through calculation and optimization based on the neighborhood graph, the image basis feature matrix and the text basis feature matrix. The intra-modal semantic consistency is expressed as:
Figure BDA0002763344240000092
where
Figure BDA0002763344240000093
and
Figure BDA0002763344240000094
represent the similarity matrix within the X modality and the similarity matrix within the Y modality, respectively.
As a specific implementation manner of the embodiment of the present invention, the regularization term includes a regression coefficient matrix, a sample noise matrix, an image basic feature matrix, and a text basic feature matrix.
In this embodiment, the objective function is established by combining the minimized residual values before and after transformation of the original samples, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term:
Figure BDA0002763344240000101
The first term of the objective function is the minimized residual obtained by semantic label learning, which helps yield a highly discriminative model; the second and third terms express inter-modal consistency; the fourth and fifth terms express intra-modal consistency; and the sixth term is a constraint, namely a regularization term that prevents overfitting, comprising the regression coefficient matrix W, the image basis feature matrix U_X, the text basis feature matrix U_Y and the sample noise matrix E.
The foregoing describes preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the invention, and such improvements and modifications also fall within the protection scope of the invention.

Claims (10)

1. A cross-modal retrieval method based on a hash algorithm and a neighborhood graph is characterized by comprising the following steps:
obtaining a multi-modal original sample, and performing minimization processing on residual values obtained before and after the multi-modal original sample is subjected to feature transformation to obtain a minimized residual value;
learning latent correlations among the multi-modal original samples by collaborative matrix factorization, and calculating the inter-modal semantic consistency of the multi-modal original samples from the latent correlations;
calculating the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph;
and combining the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that prevents overfitting, to obtain an objective function.
2. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein obtaining the multi-modal original samples and minimizing the residual values before and after feature transformation to obtain the minimized residual value specifically comprises:
setting a hash code for each sample among the multi-modal samples to obtain a hash code set of the training set, and obtaining the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
3. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein learning the latent associations among the multi-modal original samples by collaborative matrix factorization and calculating the inter-modal semantic consistency from the latent associations specifically comprises:
performing feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix; and calculating the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
4. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 3, wherein calculating the intra-modal semantic consistency of the multi-modal original samples by manifold learning on the neighborhood graph specifically comprises:
constructing a neighborhood graph over the data within each modality to represent the local relationships among samples, and calculating the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
5. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein the regularization terms comprise a regression coefficient matrix, a sample noise matrix, the image basis feature matrix and the text basis feature matrix.
6. A cross-modal retrieval device based on a hash algorithm and a neighborhood graph, characterized by comprising:
a minimization processing module, configured to obtain multi-modal original samples and to minimize the residual values between the multi-modal original samples before and after feature transformation to obtain a minimized residual value;
a first calculation module, configured to learn latent associations among the multi-modal original samples by collaborative matrix factorization and to calculate the inter-modal semantic consistency of the multi-modal original samples from the latent associations;
a second calculation module, configured to calculate the intra-modal semantic consistency of the multi-modal original samples by manifold learning on a neighborhood graph; and
a third calculation module, configured to combine the minimized residual value, the inter-modal semantic consistency and the intra-modal semantic consistency with a regularization term that avoids overfitting to obtain an objective function.
7. The cross-modal retrieval device based on a hash algorithm and a neighborhood graph according to claim 6, wherein the minimization processing module is specifically configured to:
set a hash code for each sample among the multi-modal samples to obtain a hash code set of the training set, and obtain the minimized residual value of the multi-modal original samples from the hash code set and a preset semantic label matrix according to the principle of error minimization.
8. The cross-modal retrieval device based on a hash algorithm and a neighborhood graph according to claim 6, wherein the first calculation module is specifically configured to:
perform feature extraction on the multi-modal original samples to obtain an image basis feature matrix and a text basis feature matrix, and calculate the inter-modal semantic consistency from the image basis feature matrix and the text basis feature matrix.
9. The cross-modal retrieval device based on a hash algorithm and a neighborhood graph according to claim 8, wherein the second calculation module is specifically configured to:
construct a neighborhood graph over the data within each modality to represent the local relationships among samples, and calculate the intra-modal semantic consistency of the multi-modal original samples from the neighborhood graphs, the image basis feature matrix and the text basis feature matrix.
10. The cross-modal retrieval method based on a hash algorithm and a neighborhood graph according to claim 1, wherein the regularization terms comprise a regression coefficient matrix, a sample noise matrix, the image basis feature matrix and the text basis feature matrix.
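The claims combine four quantities into one objective function: a minimized residual, inter-modal consistency, intra-modal consistency and regularization. The Python/NumPy sketch below evaluates one assumed form of that objective. The assumed objective is the sum of:

- a label-regression residual ‖Y − PB − E‖²;
- α times shared-code factorization errors ‖X_m − U_m B‖² for the image and text modalities;
- β times graph-smoothness terms tr(B L_m Bᵀ);
- γ times squared Frobenius norms of P, E, U_1 and U_2.

With the hash codes B held fixed, the sketch also shows closed-form ridge updates for the basis feature matrices, the regression coefficients and the noise matrix. The names `ridge_map`, `laplacian` and `objective`, the weights, the placement of E and the toy chain graph are all hypothetical. The claims do not specify the optimization procedure, so this is not the patented method.

```python
import numpy as np


def ridge_map(T, B, a, g):
    """Closed-form argmin over M of a*||T - M B||_F^2 + g*||M||_F^2.

    Setting the gradient to zero gives M (a B B^T + g I) = a T B^T.
    """
    k = B.shape[0]
    A = a * B @ B.T + g * np.eye(k)  # symmetric positive definite for g > 0
    return np.linalg.solve(A, a * B @ T.T).T


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric affinity matrix S."""
    return np.diag(S.sum(axis=1)) - S


def objective(Y, B, P, E, Xs, Us, Ls, alpha, beta, gamma):
    """Evaluate the assumed four-part objective for the given matrices."""
    residual = np.linalg.norm(Y - P @ B - E, "fro") ** 2          # label residual
    inter = sum(np.linalg.norm(X - U @ B, "fro") ** 2             # shared-code factorization
                for X, U in zip(Xs, Us))
    intra = sum(np.trace(B @ L @ B.T) for L in Ls)                # neighborhood smoothness
    reg = sum(np.linalg.norm(M, "fro") ** 2 for M in (P, E, *Us))  # overfitting control
    return float(residual + alpha * inter + beta * intra + gamma * reg)


# Hypothetical toy data: 12 samples, 8-bit codes, 3 labels, two modalities.
rng = np.random.default_rng(0)
n, k, c = 12, 8, 3
Xs = [rng.normal(size=(6, n)), rng.normal(size=(5, n))]  # image / text features
Y = rng.integers(0, 2, size=(c, n)).astype(float)        # multi-hot labels
B = np.where(rng.normal(size=(k, n)) >= 0, 1.0, -1.0)    # codes in {-1, +1}
S = np.eye(n, k=1) + np.eye(n, k=-1)                     # toy chain neighborhood graph
Ls = [laplacian(S), laplacian(S)]
alpha, beta, gamma = 1.0, 0.1, 0.1

E = np.zeros((c, n))
Us = [ridge_map(X, B, alpha, gamma) for X in Xs]  # basis feature matrices U_1, U_2
P = ridge_map(Y - E, B, 1.0, gamma)               # regression coefficients for fixed E
E = (Y - P @ B) / (1.0 + gamma)                   # closed-form noise update for fixed P
print(objective(Y, B, P, E, Xs, Us, Ls, alpha, beta, gamma))
```

Holding B fixed turns each continuous update into an ordinary ridge regression. A complete solver would alternate these updates with a discrete step for B (for example, a sign-based update), which this sketch leaves out.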

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011224930.7A CN112199531B (en) 2020-11-05 2020-11-05 Cross-modal retrieval method and device based on hash algorithm and neighborhood graph

Publications (2)

Publication Number Publication Date
CN112199531A 2021-01-08
CN112199531B 2024-05-17

Family

ID=74033344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011224930.7A Active CN112199531B (en) 2020-11-05 2020-11-05 Cross-modal retrieval method and device based on hash algorithm and neighborhood graph

Country Status (1)

Country Link
CN (1) CN112199531B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059198A (en) * 2019-04-08 2019-07-26 浙江大学 A kind of discrete Hash search method across modal data kept based on similitude
CN111078952A (en) * 2019-11-20 2020-04-28 重庆邮电大学 Cross-modal variable-length Hash retrieval method based on hierarchical structure
CN111382555A (en) * 2020-03-19 2020-07-07 网易(杭州)网络有限公司 Data processing method, medium, device and computing equipment
CN111461157A (en) * 2019-01-22 2020-07-28 大连理工大学 Self-learning-based cross-modal Hash retrieval method
CN111460077A (en) * 2019-01-22 2020-07-28 大连理工大学 Cross-modal Hash retrieval method based on class semantic guidance
CN111753116A (en) * 2019-05-20 2020-10-09 北京京东尚科信息技术有限公司 Image retrieval method, device, equipment and readable storage medium

Non-Patent Citations (1)

Title
LU JIN et al.: "Semantic Neighbor Graph Hashing for Multimodal Retrieval", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 27, no. 3, pages 1405-1417, XP011674891, DOI: 10.1109/TIP.2017.2776745 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112836068A (en) * 2021-03-24 2021-05-25 南京大学 Unsupervised cross-modal Hash retrieval method based on noisy label learning
CN112836068B (en) * 2021-03-24 2023-09-26 南京大学 Unsupervised cross-modal hash retrieval method based on noisy tag learning

Also Published As

Publication number Publication date
CN112199531B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
WO2022155994A1 (en) Attention-based deep cross-modal hash retrieval method and apparatus, and related device
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN116431847B (en) Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure
CN112199532A (en) Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism
CN110866129A (en) Cross-media retrieval method based on cross-media uniform characterization model
CN115775349A (en) False news detection method and device based on multi-mode fusion
CN113822427A (en) Model training method, image matching device and storage medium
CN115658934A (en) Image-text cross-modal retrieval method based on multi-class attention mechanism
CN117093687A (en) Question answering method and device, electronic equipment and storage medium
CN116933051A (en) Multi-mode emotion recognition method and system for modal missing scene
CN112182273B (en) Cross-modal retrieval method and system based on semantic constraint matrix decomposition hash
CN112199531A (en) Cross-modal retrieval method and device based on Hash algorithm and neighborhood map
Perdana et al. Instance-based deep transfer learning on cross-domain image captioning
CN116956128A (en) Hypergraph-based multi-mode multi-label classification method and system
CN116737877A (en) Cross-modal retrieval method and device based on attention network countermeasure hash
CN116756363A (en) Strong-correlation non-supervision cross-modal retrieval method guided by information quantity
CN116958852A (en) Video and text matching method and device, electronic equipment and storage medium
CN114330239A (en) Text processing method and device, storage medium and electronic equipment
CN113641790A (en) Cross-modal retrieval model based on distinguishing representation depth hash
Yang et al. Automatic metadata information extraction from scientific literature using deep neural networks
CN115909317B (en) Learning method and system for three-dimensional model-text joint expression
CN117830601B (en) Three-dimensional visual positioning method, device, equipment and medium based on weak supervision
CN116825210B (en) Hash retrieval method, system, equipment and medium based on multi-source biological data
CN118113815B (en) Content searching method, related device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant