CN113270199A - Medical cross-modal multi-scale fusion class guidance hash method and system thereof

Medical cross-modal multi-scale fusion class guidance hash method and system thereof

Info

Publication number
CN113270199A
Authority
CN
China
Prior art keywords
hash
network
class
text
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110483387.0A
Other languages
Chinese (zh)
Other versions
CN113270199B (en)
Inventor
欧卫华
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Education University
Original Assignee
Guizhou Education University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Education University filed Critical Guizhou Education University
Priority to CN202110483387.0A priority Critical patent/CN113270199B/en
Publication of CN113270199A publication Critical patent/CN113270199A/en
Application granted granted Critical
Publication of CN113270199B publication Critical patent/CN113270199B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Radiology & Medical Imaging (AREA)
  • Pathology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Library & Information Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a medical cross-modal multi-scale fusion class guidance hash method and system. Extensive experiments on the medical data set MIMIC-CXR show that this approach outperforms existing baselines on the cross-modal retrieval task.

Description

Medical cross-modal multi-scale fusion class guidance hash method and system thereof
Technical Field
The invention belongs to the field of cross-modal retrieval, and particularly relates to a medical cross-modal multi-scale fusion class guidance hash method and system.
Background
With the rapid development of medical technology, large amounts of medical data such as radiology reports, CT images, PET images and X-ray images are generated. Although these data differ in form, they carry similar semantics. Recently, many single-modality methods have been proposed to understand such data separately, for example medical image segmentation, medical image classification and content-based medical image retrieval. Although much work has been done on clinical imaging, other modalities of medical data, such as radiology reports, have been largely overlooked. To enable physicians to obtain comprehensive information about a query, retrieve semantically similar clinical profiles across modalities, and make diagnoses informed by previous medical recommendations, medical cross-modal retrieval has been proposed, i.e., using an instance of one modality (e.g., an X-ray image) to retrieve semantically similar instances of another modality (e.g., a radiology report).
Hashing is widely applied to cross-modal retrieval because of its high retrieval speed and low storage cost. Existing cross-modal hashing methods are generally divided into three categories: unsupervised, semi-supervised and supervised methods. Although some labels may be corrupted or inaccurate, label information is generally useful for learning more discriminative features. Supervised cross-modal hashing methods can therefore usually achieve better retrieval performance.
With the remarkable progress of deep learning, deep neural networks have shown strong potential in cross-modal retrieval. For example, Jiang et al. propose deep cross-modal hashing (DCMH), an end-to-end framework that learns deep features and hash functions simultaneously. Deep visual-semantic hashing (DVSH) uses a convolutional neural network (CNN) and long short-term memory (LSTM) to learn the hash code of each modality. Li et al. propose self-supervised adversarial hashing (SSAH), which designs a self-supervised semantic network and incorporates adversarial learning to explore the semantic relationships between different modalities. Compared with hand-crafted-feature cross-modal retrieval methods, deep cross-modal retrieval greatly improves performance.
However, the cross-modal retrieval methods described above all rely on a semantic similarity matrix to supervise the generation of hash codes. Specifically, two data points are defined as similar if their labels share at least one common category, and as dissimilar otherwise. This definition clearly discards rich semantic information and cannot preserve semantic structure well. Meanwhile, these methods embed data of different modalities that share the same semantics into a single unified hash code, so erroneous codes are inevitably produced by inherent modality differences and noise.
Against this background, a medical cross-modal multi-scale fusion class-guided hashing (MCMFCH) method and system are provided.
Disclosure of Invention
(I) Technical problem to be solved
The invention aims to provide a medical cross-modal multi-scale fusion class guidance hash method and a corresponding system. A joint network is further used to guide the learning of the image and text hash codes, so that the semantics of the different modalities are associated with each other, which helps to improve cross-modal semantic correlation.
(II) Technical scheme
In order to achieve the above purpose, the invention adopts the following technical scheme:
a medical cross-modal multi-scale fusion class guidance hash method comprises the following specific steps:
s1, inputting category semantics, and establishing a category hash network for learning hash codes of various categories;
s2, inputting data of different modes, establishing an image network and a text network so as to obtain the characteristics and hash codes of the modes, and combining the image text to generate a combined hash code;
s3, representing labels by class hash codes as supervision information to train hash codes of images, texts and joint networks;
s4, federated network to guide the learning of hash codes for images and text.
Further, the model (objective function) of the class hash network in S1 is:
[class-hash objective, given as an equation image in the original]
s.t.  p_i = sgn(H^(c)) = sgn(f_c(c_i; θ_c))
where α is a hyper-parameter, 1 is a vector whose elements are all 1, sgn(·) is the sign function, and p_i is the hash code learned for category c_i. Finally the class hash codes P are obtained [expression given as an equation image in the original].
Further, in S2, an image hash network and a text hash network are established to obtain the features and hash codes of each modality, and a joint hash network generates the joint hash code; this specifically comprises the following steps:
S2.1, image hash network: to obtain high-resolution, high-semantic medical image features, a deep convolutional network (VGG) is combined with a feature pyramid network (FPN) to obtain multi-scale image features; this combination is called the VFPN multi-scale network. The network fuses high-resolution, weak-semantic features with low-resolution, strong-semantic features to obtain the high-resolution, strong-semantic feature f_x(x; θ_x). In addition, three fully connected layers are added as the hash function to convert the feature f_x(x; θ_x) into the code H^(x) = f_x(x; θ_x) ∈ {-1, 1}^k, where the first two fully connected layers are the same as the last two layers of the VGG and the third fully connected layer has k hidden units with the tanh(·) function as its activation. Finally the hash code of the image modality is obtained through B_x = sgn(H^(x)) ∈ {-1, 1}^k, where k is the length of the hash code;
S2.2, text hash network: a text multi-scale fusion model based on self-supervised adversarial hashing for cross-modal retrieval (SSAH) is adopted. First, five average pooling layers of sizes 1×1, 1×2, 1×3, 1×6 and 1×10 extract features at multiple scales from the text data, and a 1×1 convolutional layer then fuses these features; the multi-scale text semantic feature f_y(y; θ_y) is obtained through resizing and concatenation. The fused feature is fed into a three-layer feed-forward neural network serving as the hash function to convert the feature f_y(y; θ_y) into the code H^(y) = f_y(y; θ_y) ∈ {-1, 1}^k. Finally the hash code of the text modality is obtained through B_y = sgn(H^(y)) ∈ {-1, 1}^k;
S2.3, joint hash network: this network takes as input the concatenation f_u(u; θ_u) of the multi-scale image feature f_x(x; θ_x) produced by the VFPN multi-scale network in the image branch and the multi-scale fusion feature f_y(y; θ_y) from the text branch. The concatenated feature f_u(u; θ_u) is fed into a three-layer feed-forward neural network serving as the hash function to convert the feature into the code H^(u) = f_u(u; θ_u) ∈ {-1, 1}^k. Finally the hash code of the joint network is obtained through B_u = sgn(H^(u)) ∈ {-1, 1}^k;
further, in the step S3, the step of monitoring the learning of the modal hash codes according to the class hash codes includes:
s3.1, cross-modal similarity and rich semantic structure information are kept through the Hamming distance,
Figure BDA0003050017500000043
and belong to class ciShould be smaller than not belonging to class ciThe hamming distance between the hash codes is modeled as:
Figure BDA0003050017500000041
Figure BDA0003050017500000042
wherein x represents x, y, u image, text and union modality; mu epsilon [0,1]Is a predefined margin, k is the hash code length; eiAs data pointsiIndex set of the class to which it belongs, i.e. label vector liIndex of middle element 1; qi={1,…,c}-EiAs data pointsiIndex set of categories not belonging to, i.e. label vector liAn index of the middle element "0";
Figure BDA0003050017500000051
is that
Figure BDA0003050017500000052
And peThe hamming distance of;
Figure BDA0003050017500000053
is that
Figure BDA0003050017500000054
Should be equal to the average of the similar class hash codes of
Figure BDA0003050017500000055
Similarly; furthermore, if
Figure BDA0003050017500000056
Class hash code corresponding to the same
Figure BDA0003050017500000057
Ratio { p }q|q∈QiThe class hash codes in the data are more similar, then
Figure BDA0003050017500000058
The semantic similarity and the semantic structure information are well kept at the same time;
S3.2、
Figure BDA0003050017500000059
the loss of each mode can be supervised and generated by a class hash code P, wherein the loss of each mode is as follows:
Figure BDA00030500175000000510
Figure BDA00030500175000000511
Figure BDA00030500175000000512
wherein λ is a hyperparameter; x, y, u images, text, and union modality;
Figure BDA00030500175000000513
is that
Figure BDA00030500175000000514
Average value of similar category hash codes of (1); p is a radical ofqIs that
Figure BDA00030500175000000515
Is different from the similar class hash code.
Further, in S4, the joint network is used to guide the learning of the image and text hash codes; the specific model is:
[joint-guidance objective, given as an equation image in the original]
where the three quantities appearing in the objective are the hash codes of the joint network, the image and the text, respectively.
A retrieval model based on the medical cross-modal multi-scale fusion class guidance hash method is generated by the above method; the retrieval model is:
[overall objective, given as equation images in the original]
where γ and η are hyper-parameters; x, y and u denote the image, text and joint modalities; the objective involves, for each data point i, the average of the class hash codes of its similar categories, the class hash codes p_q of its dissimilar categories, and the hash codes of the joint network, the image and the text.
A retrieval system based on the medical cross-modal multi-scale fusion class guidance hash method comprises:
the input module I, used for inputting category semantics;
the feature processing module I, used for establishing a class hash network to learn the hash code of each category;
the input module II, used for inputting data of different modalities;
the feature processing module II, used for establishing an image network and a text network to obtain the features and hash codes of each modality, and for combining the image and text features to generate a joint hash code;
the learning training module, used for training the hash codes of the image, text and joint networks with the class hash codes representing the labels as supervision information, and for using the joint network to guide the learning of the image and text hash codes and to perform retrieval;
and the output module, used for outputting the retrieval result.
(III) Advantageous effects
Compared with the prior art, the method obtains a modality-specific representation of each modality through multi-scale fusion and guides the learning of each modality's hash code with the class hash codes. Experiments on two data sets show that the method achieves better retrieval performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of an algorithm architecture proposed by the method of the present invention;
FIG. 3 shows the first 10 search results on the MIMIC-CXR dataset for CCA, DCMH and the method of the present invention;
fig. 4 is a schematic structural diagram of a cross-modal retrieval system according to an embodiment of the present invention.
Detailed Description
As shown in FIG. 1, the invention provides a medical cross-modal multi-scale fusion class guidance hashing method, and a corresponding system is designed according to the method.
The medical cross-modal multi-scale fusion class guidance hash method comprises the following specific steps:
S1, inputting category semantics and establishing a class hash network to learn a hash code for each category;
S2, inputting data of different modalities, establishing an image network and a text network to obtain the features and hash codes of each modality, and combining the image and text features to generate a joint hash code;
S3, using the class hash codes to represent the labels as supervision information for training the hash codes of the image, text and joint networks;
S4, using the joint network to guide the learning of the image and text hash codes.
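Under the assumption of a modern deep-learning framework, these four steps can be summarized in the following minimal PyTorch-style training sketch. All module and loss names (the four networks, class_guided_loss, joint_guidance_loss) are illustrative placeholders, not the patent's actual code.

```python
# Minimal training sketch of steps S1-S4 (illustrative names, not the patent's code).
import torch

def train_step(class_net, img_net, txt_net, joint_net,
               images, texts, labels, class_vectors,
               class_guided_loss, joint_guidance_loss):
    # S1: the class hash network learns one hash code per category.
    P = torch.sign(class_net(class_vectors))              # (c, k) class hash codes

    # S2: modality networks produce features and (relaxed) hash codes.
    f_x, H_x = img_net(images)                            # image features + codes
    f_y, H_y = txt_net(texts)                             # text features + codes
    H_u = joint_net(torch.cat([f_x, f_y], dim=1))         # joint hash codes

    # S3: the class hash codes supervise every modality.
    loss = (class_guided_loss(H_x, P, labels)
            + class_guided_loss(H_y, P, labels)
            + class_guided_loss(H_u, P, labels))

    # S4: the joint network guides image and text hash learning.
    loss = loss + joint_guidance_loss(H_u, H_x, H_y)
    return loss
```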
The class hash network is used to generate the hash code of each class, so that the learned class hash codes can represent the labels. The model of the class hash network in S1, i.e., its objective function, is as follows:
[class-hash objective, given as an equation image in the original]
s.t.  p_i = sgn(H^(c)) = sgn(f_c(c_i; θ_c))
where α is a hyper-parameter, 1 is a vector whose elements are all 1, sgn(·) is the sign function, and p_i is the hash code learned for category c_i. Finally the class hash codes P are obtained [expression given as an equation image in the original].
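Since the objective itself appears only as an equation image, the following is a minimal sketch of one plausible class hash network: a small MLP maps a one-hot category vector c_i to a k-bit code, and an assumed bit-balance regularizer α·‖H^(c)·1‖² (suggested by the all-ones vector in the definition) plus a code-separation term stand in for the unseen objective. Architecture, hidden size and the exact loss terms are assumptions, not the patent's formulation.

```python
import torch
import torch.nn as nn

class ClassHashNet(nn.Module):
    """Maps a c-dimensional one-hot category vector to a k-bit hash code (sketch)."""
    def __init__(self, num_classes: int, k: int, hidden: int = 512):
        super().__init__()
        self.f_c = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, k), nn.Tanh(),     # relaxed codes H^(c) in (-1, 1)
        )

    def forward(self, c):
        H_c = self.f_c(c)                        # relaxed code, used for gradients
        P = torch.sign(H_c)                      # p_i = sgn(f_c(c_i; theta_c))
        return H_c, P

def class_hash_loss(H_c, alpha: float = 0.05):
    # Assumed objective: push codes of different categories apart and keep bits balanced.
    k = H_c.shape[1]
    sim = H_c @ H_c.t() / k                      # pairwise code similarity
    off_diag = sim - torch.diag(torch.diag(sim))
    balance = (H_c.sum(dim=0) ** 2).sum()        # assumed alpha * ||H^(c) 1||^2 term
    return off_diag.pow(2).sum() + alpha * balance
```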
In S2, the image hash network, the text hash network, and the joint hash network learn features and hash codes of different modalities, and the specific implementation process is as follows:
S2.1, image hash network: first, a deep convolutional network (VGG) is combined with a feature pyramid network (FPN) to obtain multi-scale image features; this combination is called the VFPN multi-scale network. The network fuses high-resolution, weak-semantic features with low-resolution, strong-semantic features to obtain the high-resolution, strong-semantic feature f_x(x; θ_x). Furthermore, three fully connected layers are added: the first two are the same as the last two layers of the VGG, and the third has k hidden units with the tanh(·) function as its activation. These three layers serve as the hash function that converts the feature f_x(x; θ_x) into the code H^(x) = f_x(x; θ_x) ∈ {-1, 1}^k. Then, the hash code of the image modality is obtained through B_x = sgn(H^(x)) ∈ {-1, 1}^k, where k is the length of the hash code.
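A minimal PyTorch-style sketch of the VFPN image branch under stated assumptions: a VGG-16 backbone with FPN-style lateral connections (the stage cut points, FPN width and 4096-unit hidden layers are assumptions), followed by the three fully connected layers with a tanh output described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16  # torchvision >= 0.13 for the weights= argument

class VFPNImageHashNet(nn.Module):
    """Sketch of the VFPN image branch: VGG-16 features with FPN-style top-down fusion,
    then a three-layer fully connected hash head with tanh (sizes are assumptions)."""
    def __init__(self, k: int, fpn_dim: int = 256):
        super().__init__()
        backbone = vgg16(weights=None).features
        # Split VGG-16 into stages feeding the pyramid (cut points are assumed).
        self.stage3 = backbone[:17]     # up to pool3, 256 channels
        self.stage4 = backbone[17:24]   # up to pool4, 512 channels
        self.stage5 = backbone[24:]     # up to pool5, 512 channels
        self.lat3 = nn.Conv2d(256, fpn_dim, 1)
        self.lat4 = nn.Conv2d(512, fpn_dim, 1)
        self.lat5 = nn.Conv2d(512, fpn_dim, 1)
        # Three fully connected layers as the hash function (4096-4096-k, VGG-style).
        self.hash_head = nn.Sequential(
            nn.LazyLinear(4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, k), nn.Tanh(),
        )

    def forward(self, x):
        c3 = self.stage3(x)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)
        # Top-down pathway: fuse low-resolution/strong-semantic features with
        # high-resolution/weak-semantic ones.
        p5 = self.lat5(c5)
        p4 = self.lat4(c4) + F.interpolate(p5, size=c4.shape[-2:], mode="nearest")
        p3 = self.lat3(c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
        f_x = F.adaptive_avg_pool2d(p3, 1).flatten(1)   # fused feature f_x(x; theta_x)
        H_x = self.hash_head(f_x)                        # relaxed code in (-1, 1)^k
        B_x = torch.sign(H_x)                            # B_x = sgn(H_x)
        return f_x, H_x, B_x
```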
S2.2, the text hash network adopts a text multi-scale fusion model based on self-supervised adversarial hashing for cross-modal retrieval (SSAH). The multi-scale fusion model uses five average pooling layers of sizes 1×1, 1×2, 1×3, 1×6 and 1×10 to extract features at multiple scales from the text data and a 1×1 convolutional layer to fuse them. The multi-scale text semantic feature f_y(y; θ_y) is then obtained through resizing and concatenation. The fused feature is fed into a three-layer feed-forward neural network serving as the hash function that converts the feature f_y(y; θ_y) into the code H^(y) = f_y(y; θ_y) ∈ {-1, 1}^k. Then, the hash code of the text modality is obtained through B_y = sgn(H^(y)) ∈ {-1, 1}^k.
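A minimal sketch of the text branch under stated assumptions: the bag-of-words vector is pooled at the five listed scales, resized back, fused by a 1×1 convolution, and passed through a three-layer feed-forward hash head (the hidden size and the exact resize/concatenation order are assumptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextHashNet(nn.Module):
    """Sketch of the multi-scale text hash branch (pooling sizes follow the description;
    feature/hidden dimensions are assumptions)."""
    def __init__(self, vocab_dim: int, k: int, hidden: int = 4096):
        super().__init__()
        self.scales = (1, 2, 3, 6, 10)                       # 1x1, 1x2, 1x3, 1x6, 1x10 pooling
        self.fuse = nn.Conv2d(len(self.scales), 1, kernel_size=1)  # 1x1 conv fuses the pooled maps
        self.hash_head = nn.Sequential(                      # three-layer feed-forward hash function
            nn.Linear(vocab_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, k), nn.Tanh(),
        )

    def forward(self, y):
        # y: (N, vocab_dim) bag-of-words vectors, viewed as 1 x vocab_dim maps.
        y_map = y.unsqueeze(1).unsqueeze(2)                  # (N, 1, 1, vocab_dim)
        pooled = []
        for s in self.scales:
            p = F.adaptive_avg_pool2d(y_map, (1, s))         # pool to 1 x s
            p = F.interpolate(p, size=(1, y.shape[1]), mode="nearest")  # resize back
            pooled.append(p)
        multi = torch.cat(pooled, dim=1)                     # (N, 5, 1, vocab_dim)
        f_y = self.fuse(multi).flatten(1)                    # fused feature f_y(y; theta_y)
        H_y = self.hash_head(f_y)                            # relaxed code in (-1, 1)^k
        B_y = torch.sign(H_y)                                # B_y = sgn(H_y)
        return f_y, H_y, B_y
```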
S2.3, joint hash network: this network takes as input the concatenation f_u(u; θ_u) = concat(f_x(x; θ_x), f_y(y; θ_y)) of the multi-scale image feature produced by the VFPN multi-scale network and the multi-scale fusion feature of the text branch. The concatenated feature f_u(u; θ_u) is fed into a three-layer feed-forward neural network serving as the hash function that converts the feature into the code H^(u) = f_u(u; θ_u) ∈ {-1, 1}^k. Then, the hash code of the joint network is obtained through B_u = sgn(H^(u)) ∈ {-1, 1}^k.
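A corresponding sketch of the joint branch, assuming the same three-layer feed-forward hash head as the other modalities (hidden sizes assumed):

```python
import torch
import torch.nn as nn

class JointHashNet(nn.Module):
    """Sketch of the joint branch: concatenated image/text multi-scale features
    pass through a three-layer feed-forward hash function (hidden sizes assumed)."""
    def __init__(self, img_dim: int, txt_dim: int, k: int, hidden: int = 4096):
        super().__init__()
        self.hash_head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, k), nn.Tanh(),
        )

    def forward(self, f_x, f_y):
        f_u = torch.cat([f_x, f_y], dim=1)    # f_u = concat(f_x, f_y)
        H_u = self.hash_head(f_u)             # relaxed code H^(u) in (-1, 1)^k
        B_u = torch.sign(H_u)                 # B_u = sgn(H_u)
        return f_u, H_u, B_u
```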
In S3, supervising the learning of the modality hash codes with the class hash codes is implemented through the following steps:
S3.1, cross-modal similarity and rich semantic structure information are preserved through the Hamming distance: the Hamming distance between the hash code of data point i in modality * and the class hash codes of the categories it belongs to (the codes p_e with e ∈ E_i) should be smaller than its Hamming distance to the class hash codes of the categories it does not belong to (the codes p_q with q ∈ Q_i). This is modeled as:
[margin constraint and corresponding loss, given as equation images in the original]
where * denotes the image, text and joint modalities x, y and u; μ ∈ [0, 1] is a predefined margin and k is the hash code length; E_i is the index set of the categories to which data point i belongs, i.e., the indices of the elements equal to 1 in its label vector l_i; Q_i = {1, …, c} - E_i is the index set of the categories to which data point i does not belong, i.e., the indices of the elements equal to 0 in l_i. The hash code of data point i should also be close to the average of the class hash codes of its own categories; furthermore, if it is more similar to the corresponding class hash codes {p_e | e ∈ E_i} than to {p_q | q ∈ Q_i}, then it preserves both semantic similarity and semantic structure information well;
S3.2, the loss of each modality is supervised and generated by the class hash codes P, where the loss of each modality is:
[per-modality losses, given as equation images in the original]
where λ is a hyper-parameter; x, y and u denote the image, text and joint modalities; the loss involves, for each data point i, the average of the class hash codes of its similar categories and the class hash codes p_q of its dissimilar categories.
The category network thus guides each modality to generate its hash code; the hash objective function is:
[hash objective function, given as equation images in the original]
where λ is a hyper-parameter; x, y and u denote the image, text and joint modalities; the objective involves, for each data point i, the average of the class hash codes of its similar categories and the class hash codes p_q of its dissimilar categories. Cross-modal similarity and rich semantic structure information are well preserved by the hash codes learned with the model of this embodiment.
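Because the loss is shown only as equation images, the following sketch implements one plausible reading of the stated constraint: the Hamming distance from a sample's code to the mean of its own class codes should be at least a margin μk smaller than its distance to every non-class code, plus a pull toward the class-code mean weighted by λ. The exact form in the patent may differ.

```python
import torch

def hamming(a, b):
    # Hamming distance between (relaxed) codes in [-1, 1]^k: d = (k - <a, b>) / 2.
    return 0.5 * (a.shape[-1] - (a * b).sum(dim=-1))

def class_guided_loss(H, P, labels, mu=0.3, lam=0.3):
    """H: (N, k) relaxed codes of one modality; P: (c, k) class hash codes;
    labels: (N, c) multi-hot labels. One plausible reading of the class-guided loss."""
    N, k = H.shape
    loss = H.new_zeros(())
    for i in range(N):
        E = labels[i].nonzero(as_tuple=True)[0]           # categories the point belongs to
        Q = (labels[i] == 0).nonzero(as_tuple=True)[0]    # categories it does not belong to
        if len(E) == 0 or len(Q) == 0:
            continue
        p_bar = P[E].float().mean(dim=0)                  # mean of the similar class codes
        d_pos = hamming(H[i], p_bar)
        d_neg = hamming(H[i].expand(len(Q), k), P[Q].float())
        # Margin constraint: d_pos + mu*k should not exceed d_neg for any dissimilar class.
        loss = loss + torch.clamp(d_pos + mu * k - d_neg, min=0).mean()
        # Pull the code toward the mean of its own class codes (assumed lambda term).
        loss = loss + lam * ((H[i] - p_bar) ** 2).sum()
    return loss / N
```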
In S4, a joint hash network is used to guide the hash code generation and learning of the image and the text, so as to improve the correlation between modalities, that is:
[joint-guidance objective, given as an equation image in the original]
where the three quantities appearing in the objective are the hash codes of the joint network, the image and the text, respectively.
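Since this objective is also given only as an image, the sketch below assumes a simple consistency form in which the binarized joint code acts as a shared target that the image and text codes are pulled toward; the patent's actual formulation may differ.

```python
import torch

def joint_guidance_loss(H_u, H_x, H_y):
    # Assumed form: the joint code (binarized, treated as a constant target) guides
    # the image and text codes toward a common representation.
    B_u = torch.sign(H_u).detach()
    return ((H_x - B_u) ** 2).mean() + ((H_y - B_u) ** 2).mean()
```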
Combining the above functions gives the retrieval model based on the medical cross-modal multi-scale fusion class guidance hash method:
[overall objective, given as equation images in the original]
where γ and η are hyper-parameters; x, y and u denote the image, text and joint modalities; the objective involves, for each data point i, the average of the class hash codes of its similar categories, the class hash codes p_q of its dissimilar categories, and the hash codes of the joint network, the image and the text.
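Once the binary codes are learned, cross-modal retrieval reduces to Hamming-distance ranking between a query code from one modality and the database codes of the other; a minimal sketch (variable names illustrative):

```python
import torch

def retrieve(query_code, db_codes, top_k=10):
    """Rank database codes of the other modality by Hamming distance to the query.
    query_code: (k,) in {-1, 1}; db_codes: (M, k) in {-1, 1}."""
    k = query_code.shape[0]
    dist = 0.5 * (k - db_codes @ query_code)    # Hamming distance via inner product
    return torch.argsort(dist)[:top_k]          # indices of the top-k nearest items

# Example: use an image query code to retrieve radiology reports (illustrative tensors).
# top10 = retrieve(B_x_query, B_y_database, top_k=10)
```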
In order to verify the superiority of the method in cross-modal retrieval, the public medical data set MIMIC-CXR is selected for the experiments; mAP is adopted for cross-modal retrieval evaluation, and the Top-10 retrieval results are also displayed. In the experiments, the method of this embodiment is trained 5 times and the average is taken as the final result; the parameters are set as α = 0.05, β = 0.01, λ = 0.3, γ = 0.3, η = 0.3, and μ = 0.3.
Table 1: mAP values on the MIMIC-CXR dataset [table given as an image in the original]
(1) Analysis of results of mAP values on two public data sets
The method of this embodiment is compared with 7 existing cross-modal retrieval methods, namely CCA, CMSSH, SCM, STMH, CMFH, SePH and DCMH. All methods are evaluated on the two data sets. As shown in the table above, the mAP value of the proposed method is higher than that of the compared methods, which demonstrates the feasibility of replacing the semantic similarity matrix with class hashing and shows that the joint semantics helps to improve semantic relevance.
(2) Comparative analysis of Top-10 search results
As shown in fig. 3, the CCA and DCMH methods produce multiple failure cases. In comparison, although our method also has occasional failures in both the image-to-text and text-to-image retrieval tasks, the relevant results are ranked earlier, and the retrieved results are intuitively semantically related to the query.
As shown in fig. 4, a retrieval system based on medical cross-modal multi-scale fusion class guidance hash method includes:
the input module I 1 is used for inputting category semantics;
the feature processing module I 2 is used for establishing a class hash network to learn the hash code of each category;
the input module II 3 is used for inputting data of different modalities;
the feature processing module II 4 is used for establishing an image network and a text network to obtain the features and hash codes of each modality, and for combining the image and text features to generate a joint hash code;
the learning training module 5 is used for training the hash codes of the image, text and joint networks with the class hash codes representing the labels as supervision information, and for using the joint network to guide the learning of the image and text hash codes and to perform retrieval;
and the output module 6 is used for outputting the retrieval result.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any modification and replacement based on the technical solution and inventive concept provided by the present invention should be covered within the scope of the present invention.

Claims (7)

1. The medical cross-modal multi-scale fusion class guidance hash method is characterized by comprising the following specific steps:
S1, inputting category semantics and establishing a class hash network to learn a hash code for each category;
S2, inputting data of different modalities, establishing an image network and a text network to obtain the features and hash codes of each modality, and combining the image and text features to generate a joint hash code;
S3, using the class hash codes to represent the labels as supervision information for training the hash codes of the image, text and joint networks;
S4, using the joint network to guide the learning of the image and text hash codes.
2. The medical cross-modal multi-scale fusion class guidance hash method according to claim 1, wherein the model of the class hash network in S1 is:
[class-hash objective, given as an equation image in the original]
s.t.  p_i = sgn(H^(c)) = sgn(f_c(c_i; θ_c))
where α is a hyper-parameter, 1 is a vector whose elements are all 1, sgn(·) is the sign function, and p_i is the hash code learned for category c_i; finally the class hash codes P are obtained [expression given as an equation image in the original].
3. The medical cross-modal multi-scale fusion class guidance hash method according to claim 1, wherein in S2, an image hash network and a text hash network are established to obtain the features and hash codes of each modality, and a joint hash network generates the joint hash code, specifically through the following steps:
S2.1, image hash network: to obtain high-resolution, high-semantic medical image features, a deep convolutional network (VGG) is combined with a feature pyramid network (FPN) to obtain multi-scale image features; this combination is called the VFPN multi-scale network. The network fuses high-resolution, weak-semantic features with low-resolution, strong-semantic features to obtain the high-resolution, strong-semantic feature f_x(x; θ_x). In addition, three fully connected layers are added as the hash function to convert the feature f_x(x; θ_x) into the code H^(x) = f_x(x; θ_x) ∈ {-1, 1}^k, where the first two fully connected layers are the same as the last two layers of the VGG and the third fully connected layer has k hidden units with the tanh(·) function as its activation. Finally the hash code of the image modality is obtained through B_x = sgn(H^(x)) ∈ {-1, 1}^k, where k is the length of the hash code;
S2.2, text hash network: a text multi-scale fusion model based on self-supervised adversarial hashing for cross-modal retrieval (SSAH) is adopted. First, five average pooling layers of sizes 1×1, 1×2, 1×3, 1×6 and 1×10 extract features at multiple scales from the text data, and a 1×1 convolutional layer then fuses these features; the multi-scale text semantic feature f_y(y; θ_y) is obtained through resizing and concatenation. The fused feature is fed into a three-layer feed-forward neural network serving as the hash function to convert the feature f_y(y; θ_y) into the code H^(y) = f_y(y; θ_y) ∈ {-1, 1}^k. Finally the hash code of the text modality is obtained through B_y = sgn(H^(y)) ∈ {-1, 1}^k;
S2.3, joint hash network: this network takes as input the concatenation f_u(u; θ_u) of the multi-scale image feature f_x(x; θ_x) produced by the VFPN multi-scale network in the image branch and the multi-scale fusion feature f_y(y; θ_y) from the text branch. The concatenated feature f_u(u; θ_u) is fed into a three-layer feed-forward neural network serving as the hash function to convert the feature into the code H^(u) = f_u(u; θ_u) ∈ {-1, 1}^k. Finally the hash code of the joint network is obtained through B_u = sgn(H^(u)) ∈ {-1, 1}^k.
4. The medical cross-modal multi-scale fusion class guidance hash method according to claim 1, wherein in S3, supervising the learning of the modality hash codes with the class hash codes comprises the following steps:
S3.1, cross-modal similarity and rich semantic structure information are preserved through the Hamming distance: the Hamming distance between the hash code of data point i in modality * and the class hash codes of the categories it belongs to (the codes p_e with e ∈ E_i) should be smaller than its Hamming distance to the class hash codes of the categories it does not belong to (the codes p_q with q ∈ Q_i). This is modeled as:
[margin constraint and corresponding loss, given as equation images in the original]
where * denotes the image, text and joint modalities x, y and u; μ ∈ [0, 1] is a predefined margin and k is the hash code length; E_i is the index set of the categories to which data point i belongs, i.e., the indices of the elements equal to 1 in its label vector l_i; Q_i = {1, …, c} - E_i is the index set of the categories to which data point i does not belong, i.e., the indices of the elements equal to 0 in l_i. The hash code of data point i should also be close to the average of the class hash codes of its own categories; furthermore, if it is more similar to the corresponding class hash codes {p_e | e ∈ E_i} than to {p_q | q ∈ Q_i}, then it preserves both semantic similarity and semantic structure information well;
S3.2, the loss of each modality is supervised and generated by the class hash codes P, where the loss of each modality is:
[per-modality losses, given as equation images in the original]
where λ is a hyper-parameter; x, y and u denote the image, text and joint modalities; the loss involves, for each data point i, the average of the class hash codes of its similar categories and the class hash codes p_q of its dissimilar categories.
5. The medical cross-modal multi-scale fusion class guidance hash method according to claim 1, wherein in S4, the joint network is used to guide the learning of the image and text hash codes, with the specific model:
[joint-guidance objective, given as an equation image in the original]
where the three quantities appearing in the objective are the hash codes of the joint network, the image and the text, respectively.
6. The retrieval model based on the medical cross-modal multi-scale fusion class guidance hash method, characterized in that the retrieval model is generated by the medical cross-modal multi-scale fusion class guidance hash method of claim 1, and the retrieval model is:
[overall objective, given as equation images in the original]
where γ and η are hyper-parameters; x, y and u denote the image, text and joint modalities; the objective involves, for each data point i, the average of the class hash codes of its similar categories, the class hash codes p_q of its dissimilar categories, and the hash codes of the joint network, the image and the text.
7. The retrieval system based on the medical cross-modal multi-scale fusion class guidance hash method is characterized by comprising:
the input module I (1), used for inputting category semantics;
the feature processing module I (2), used for establishing a class hash network to learn the hash code of each category;
the input module II (3), used for inputting data of different modalities;
the feature processing module II (4), used for establishing an image network and a text network to obtain the features and hash codes of each modality, and for combining the image and text features to generate a joint hash code;
the learning training module (5), used for training the hash codes of the image, text and joint networks with the class hash codes representing the labels as supervision information, and for using the joint network to guide the learning of the image and text hash codes and to perform retrieval;
and the output module (6), used for outputting the retrieval result.
CN202110483387.0A 2021-04-30 2021-04-30 Medical cross-mode multi-scale fusion class guide hash method and system thereof Active CN113270199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110483387.0A CN113270199B (en) 2021-04-30 2021-04-30 Medical cross-mode multi-scale fusion class guide hash method and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110483387.0A CN113270199B (en) 2021-04-30 2021-04-30 Medical cross-mode multi-scale fusion class guide hash method and system thereof

Publications (2)

Publication Number Publication Date
CN113270199A true CN113270199A (en) 2021-08-17
CN113270199B CN113270199B (en) 2024-04-26

Family

ID=77229860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110483387.0A Active CN113270199B (en) 2021-04-30 2021-04-30 Medical cross-mode multi-scale fusion class guide hash method and system thereof

Country Status (1)

Country Link
CN (1) CN113270199B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704537A (en) * 2021-10-28 2021-11-26 南京码极客科技有限公司 Fine-grained cross-media retrieval method based on multi-scale feature union
CN117112829A (en) * 2023-10-24 2023-11-24 吉林大学 Medical data cross-modal retrieval method and device and related equipment
WO2024087218A1 (en) * 2022-10-28 2024-05-02 深圳先进技术研究院 Cross-modal medical image generation method and apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017092183A1 (en) * 2015-12-03 2017-06-08 中山大学 Image retrieval method based on variable-length deep hash learning
CN109299216A (en) * 2018-10-29 2019-02-01 山东师范大学 A kind of cross-module state Hash search method and system merging supervision message
CN110110122A (en) * 2018-06-22 2019-08-09 北京交通大学 Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval
CN110309331A (en) * 2019-07-04 2019-10-08 哈尔滨工业大学(深圳) A kind of cross-module state depth Hash search method based on self-supervisory
CN110765281A (en) * 2019-11-04 2020-02-07 山东浪潮人工智能研究院有限公司 Multi-semantic depth supervision cross-modal Hash retrieval method
CN111127385A (en) * 2019-06-06 2020-05-08 昆明理工大学 Medical information cross-modal Hash coding learning method based on generative countermeasure network
WO2020182019A1 (en) * 2019-03-08 2020-09-17 苏州大学 Image search method, apparatus, device, and computer-readable storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017092183A1 (en) * 2015-12-03 2017-06-08 中山大学 Image retrieval method based on variable-length deep hash learning
CN110110122A (en) * 2018-06-22 2019-08-09 北京交通大学 Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval
CN109299216A (en) * 2018-10-29 2019-02-01 山东师范大学 A kind of cross-module state Hash search method and system merging supervision message
WO2020182019A1 (en) * 2019-03-08 2020-09-17 苏州大学 Image search method, apparatus, device, and computer-readable storage medium
CN111127385A (en) * 2019-06-06 2020-05-08 昆明理工大学 Medical information cross-modal Hash coding learning method based on generative countermeasure network
CN110309331A (en) * 2019-07-04 2019-10-08 哈尔滨工业大学(深圳) A kind of cross-module state depth Hash search method based on self-supervisory
CN110765281A (en) * 2019-11-04 2020-02-07 山东浪潮人工智能研究院有限公司 Multi-semantic depth supervision cross-modal Hash retrieval method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WU L et al.: "Cycle-consistent deep generative hashing for cross-modal retrieval", IEEE TRANSACTIONS ON IMAGE PROCESSING *
刘昊鑫; 吴小俊; 庾骏: "Cross-modal retrieval algorithm combining hash features and classifier learning" (联合哈希特征和分类器学习的跨模态检索算法), 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), no. 02 *
欧卫华; 刘彬; 周永辉 et al.: "A survey of cross-modal retrieval research" (跨模态检索研究综述), 贵州师范大学学报(自然科学版) (Journal of Guizhou Normal University, Natural Science Edition) *
陈飞; 吕绍和; 李军; 王晓东; 窦勇: "Multi-label image retrieval with object extraction and hashing mechanism" (目标提取与哈希机制的多标签图像检索), 中国图象图形学报 (Journal of Image and Graphics), no. 02 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704537A (en) * 2021-10-28 2021-11-26 南京码极客科技有限公司 Fine-grained cross-media retrieval method based on multi-scale feature union
CN113704537B (en) * 2021-10-28 2022-02-15 南京码极客科技有限公司 Fine-grained cross-media retrieval method based on multi-scale feature union
WO2024087218A1 (en) * 2022-10-28 2024-05-02 深圳先进技术研究院 Cross-modal medical image generation method and apparatus
CN117112829A (en) * 2023-10-24 2023-11-24 吉林大学 Medical data cross-modal retrieval method and device and related equipment
CN117112829B (en) * 2023-10-24 2024-02-02 吉林大学 Medical data cross-modal retrieval method and device and related equipment

Also Published As

Publication number Publication date
CN113270199B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
Li et al. A survey of multi-view representation learning
CN110059217B (en) Image text cross-media retrieval method for two-stage network
Cao et al. Cross-modal hamming hashing
Arevalo et al. Gated multimodal units for information fusion
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
Arevalo et al. Gated multimodal networks
Zheng et al. A deep and autoregressive approach for topic modeling of multimodal data
CN113270199B (en) Medical cross-mode multi-scale fusion class guide hash method and system thereof
Shi et al. Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval
Sharma et al. A survey of methods, datasets and evaluation metrics for visual question answering
Yang et al. Image captioning by incorporating affective concepts learned from both visual and textual components
Kim et al. Gaining extra supervision via multi-task learning for multi-modal video question answering
Qiao et al. Word-character attention model for Chinese text classification
CN116204706A (en) Multi-mode content retrieval method and system for text content and image analysis
CN116561305A (en) False news detection method based on multiple modes and transformers
Chen et al. Leveraging unpaired out-of-domain data for image captioning
Zhang et al. Category supervised cross-modal hashing retrieval for chest x-ray and radiology reports
Yu et al. Multimodal multitask deep learning for X-ray image retrieval
CN117556067B (en) Data retrieval method, device, computer equipment and storage medium
Wu et al. Deep semantic hashing with dual attention for cross-modal retrieval
Bayoudh A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges
Wu et al. Visual Question Answering
CN112182273B (en) Cross-modal retrieval method and system based on semantic constraint matrix decomposition hash
Huang et al. Explore instance similarity: An instance correlation based hashing method for multi-label cross-model retrieval
Perdana et al. Instance-based deep transfer learning on cross-domain image captioning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant