CN113270199A - Medical cross-modal multi-scale fusion class guidance hash method and system thereof - Google Patents
- Publication number
- CN113270199A (application CN202110483387.0A)
- Authority
- CN
- China
- Prior art keywords
- hash
- network
- class
- text
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
Abstract
The invention discloses a medical cross-modal multi-scale fusion class-guided hashing method and system. Extensive experiments on the medical dataset MIMIC-CXR show that this approach outperforms existing baselines on the cross-modal retrieval task.
Description
Technical Field
The invention belongs to the field of cross-modal retrieval, and particularly relates to a medical cross-modal multi-scale fusion class guidance hash method and system.
Background
With the rapid development of medical technology, a large amount of medical data is generated, such as radiology reports, CT images, PET images, and X-ray images. Although these data differ in form, they carry similar semantics. Recently, many single-modality methods have been proposed to understand such data separately, such as medical image segmentation, medical image classification, and content-based medical image retrieval. Although much work has focused on clinical imaging, other forms of medical data, such as radiology reports, have been overlooked. To enable physicians to obtain comprehensive information about a query, retrieve semantically similar clinical profiles across modalities, and provide diagnostic results informed by previous medical recommendations, medical cross-modal retrieval is proposed, i.e., using an instance of one modality (e.g., an X-ray image) to retrieve an instance of another modality (e.g., a radiology report) with similar semantics.
Hashing is applied to cross-modal retrieval due to its high retrieval speed and low storage cost. Existing cross-modal hashing methods are generally divided into three categories: unsupervised, semi-supervised, and supervised methods. Although some labels may be corrupted or inaccurate, label information is generally useful for learning more discriminative features. Therefore, supervised cross-modal hashing methods usually achieve better retrieval performance.
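The speed and storage advantages come from representing every instance as a short binary code: storage is k bits per item, and similarity search reduces to XOR plus a bit count. A minimal NumPy sketch with hand-made toy codes (not the patent's learned ones):

```python
import numpy as np

def to_packed(codes):
    """Pack {-1, +1} hash codes into uint8 bitmaps for compact storage."""
    bits = (codes > 0).astype(np.uint8)      # map -1 -> 0, +1 -> 1
    return np.packbits(bits, axis=-1)

def hamming_distances(query, database):
    """Hamming distances between one packed query and a packed database,
    computed with XOR and a bit count."""
    xor = np.bitwise_xor(database, query)    # differing bits
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

# toy database of four 8-bit codes
db = np.array([[ 1,  1,  1,  1, -1, -1, -1, -1],
               [ 1, -1,  1, -1,  1, -1,  1, -1],
               [-1, -1, -1, -1,  1,  1,  1,  1],
               [ 1,  1, -1, -1,  1,  1, -1, -1]])
q = db[2].copy()                             # query identical to entry 2
d = hamming_distances(to_packed(q), to_packed(db))
best = int(np.argmin(d))                     # nearest neighbour in Hamming space
```

Packed codes occupy k/8 bytes per item, and the XOR/popcount distance is what makes large-scale cross-modal retrieval cheap compared with real-valued similarity search.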
With the remarkable progress of deep learning, deep neural networks have shown strong potential in cross-modal retrieval. For example, Jiang et al. propose deep cross-modal hashing (DCMH), an end-to-end framework that learns deep features and hash functions simultaneously. Deep visual-semantic hashing (DVSH) uses a convolutional neural network (CNN) and long short-term memory (LSTM) to learn the hash code of each modality. Li et al. propose self-supervised adversarial hashing (SSAH), which designs a self-supervised semantic network incorporating adversarial learning to explore the semantic relationships between modalities. Compared with hand-crafted-feature cross-modal retrieval methods, deep cross-modal retrieval performance is greatly improved.
However, the cross-modal retrieval methods described above all rely on a semantic similarity matrix to supervise hash code generation. Specifically, two data points are defined as similar if their labels share at least one common category, and dissimilar otherwise. This definition clearly discards rich semantic information and cannot preserve semantic structure well. Meanwhile, these methods embed different modal data sharing the same semantics into a uniform hash code, and erroneous codes are inevitably generated due to inherent modality differences and noise.
In view of this situation, a medical cross-modal multi-scale fusion class-guided hashing (MCMFCH) method and system are provided.
Disclosure of Invention
Technical problem to be solved
The invention aims to provide a medical cross-modal multi-scale fusion class-guided hashing method and system. In addition, the joint network is used to guide the learning of the image and text hash codes, so that the modality semantics are associated with one another, which helps improve the semantic correlation between modalities.
(II) technical scheme
In order to achieve the above purpose, the invention adopts the following technical scheme:
a medical cross-modal multi-scale fusion class guidance hash method comprises the following specific steps:
s1, inputting category semantics, and establishing a category hash network for learning hash codes of various categories;
s2, inputting data of different modalities, establishing an image network and a text network to obtain the features and hash codes of each modality, and combining the image and text features to generate a joint hash code;
s3, representing labels by class hash codes as supervision information to train hash codes of images, texts and joint networks;
s4, using the joint network to guide the learning of the hash codes of the images and texts.
Further, the model of the class hash network in S1 is:
s.t. p_i = sgn(H^(c)) = sgn(f_c(c_i; θ_c))
where α is a hyperparameter; 1 is a vector with all elements equal to 1; sgn(·) is the sign function; and p_i denotes the hash code learned for category c_i. The class hash codes are finally obtained.
Further, in S2, establishing an image hash network and a text hash network to obtain features and hash codes of each modality, and generating a joint hash code by the joint hash network specifically includes the following steps:
s2.1, image hash network: to obtain high-resolution, high-semantic medical image features, a deep convolutional network (VGG) is combined with a feature pyramid network (FPN) to extract multi-scale image features; this combination is called the VFPN multi-scale network. The network fuses high-resolution, weak-semantic features with low-resolution, strong-semantic features to obtain high-resolution, strong-semantic features f_x(x; θ_x). In addition, three fully connected layers are added as the hash function to convert the features f_x(x; θ_x) into the binary code H^(x) = f_x(x; θ_x) ∈ {-1, 1}^k, where the first two fully connected layers are the same as the last two layers of VGG and the third has k hidden units with the tanh(·) function as its activation. Finally, the hash code of the image modality is obtained through B_x = sgn(H_x) ∈ {-1, 1}^k, where k is the length of the hash code;
s2.2, text hash network: a text-network multi-scale fusion model based on self-supervised adversarial hashing for cross-modal retrieval (SSAH) is adopted. First, five average pooling layers of sizes 1×1, 1×2, 1×3, 1×6, and 1×10 extract features from the text data at multiple scales, and one 1×1 convolutional layer then fuses these features. Next, the multi-scale text semantic features f_y(y; θ_y) are obtained through resizing and concatenation. The fused features are fed into a three-layer feedforward neural network serving as the hash function, which converts the features f_y(y; θ_y) into the binary code H^(y) = f_y(y; θ_y) ∈ {-1, 1}^k. Finally, the hash code of the text modality is obtained through B_y = sgn(H_y) ∈ {-1, 1}^k;
s2.3, joint hash network: this network takes as input the concatenation f_u(u; θ_u) of the multi-scale image features f_x(x; θ_x) generated by the VFPN multi-scale network and the multi-scale fusion features f_y(y; θ_y) of the text. The concatenated features f_u(u; θ_u) are fed into a three-layer feedforward neural network serving as the hash function, which converts them into the binary code H^(u) = f_u(u; θ_u) ∈ {-1, 1}^k. Finally, the hash code of the joint network is obtained through B_u = sgn(H_u) ∈ {-1, 1}^k;
Further, in step S3, supervising the learning of each modality's hash codes with the class hash codes comprises the following steps:
s3.1, cross-modal similarity and rich semantic structure information are preserved through the Hamming distance: the Hamming distance between the hash code H_*i and the class hash codes of the classes to which data point i belongs should be smaller than its Hamming distance to the class hash codes of the classes it does not belong to, which is modeled as:

where * ∈ {x, y, u} denotes the image, text, and joint modalities; μ ∈ [0, 1] is a predefined margin and k is the hash code length; E_i is the index set of the classes to which data point i belongs, i.e., the indices of the elements equal to "1" in the label vector l_i; Q_i = {1, ..., c} − E_i is the index set of the classes to which data point i does not belong, i.e., the indices of the elements equal to "0" in l_i; and dist(H_*i, p_e) is the Hamming distance between H_*i and p_e. H_*i should be similar to the average of the class hash codes of its own classes; moreover, if the class hash codes {p_e | e ∈ E_i} corresponding to H_*i are more similar to it than {p_q | q ∈ Q_i}, then H_*i preserves both semantic similarity and semantic structure information;
S3.2, the hash code H_* of each modality can be generated under the supervision of the class hash codes P, with the loss of each modality given by:

where λ is a hyperparameter; * ∈ {x, y, u} denotes the image, text, and joint modalities; the average is taken over the class hash codes similar to H_*i; and p_q is a class hash code dissimilar to H_*i.
Further, in S4, a joint network is used to guide the learning of hash codes of images and texts, and the specific model is as follows:
A retrieval model based on a medical cross-modal multi-scale fusion class guidance hash method is generated by adopting the medical cross-modal multi-scale fusion class guidance hash method, and the retrieval model is as follows:
where γ and η are hyperparameters; * ∈ {x, y, u} denotes the image, text, and joint modalities; the average is taken over the class hash codes similar to H_*i; p_q is a class hash code dissimilar to H_*i; and B_u, B_x, and B_y are the hash codes of the joint network, the image, and the text, respectively.
A retrieval system based on the medical cross-modal multi-scale fusion class-guided hashing method comprises:
the first input module, used for inputting category semantics;
the first feature processing module, used for establishing the class hash network to learn the hash codes of the categories;
the second input module, used for inputting data of different modalities;
the second feature processing module, used for establishing the image network and the text network to obtain the features and hash codes of each modality, and combining the image and text features to generate the joint hash code;
the learning and training module, used for training the hash codes of the image, text, and joint networks with the class hash codes representing the labels as supervision information, while the joint network guides the learning of the image and text hash codes and performs retrieval;
and the output module, used for outputting the retrieval result.
(III) advantageous effects
Compared with the prior art, the method obtains a modality-specific representation of each modality through multi-scale fusion and uses class hashing to guide the learning of each modality's hash codes. Experiments on two datasets show that the method achieves better retrieval performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of an algorithm architecture proposed by the method of the present invention;
FIG. 3 shows the top-10 retrieval results of CCA, DCMH, and the method of the present invention on the MIMIC-CXR dataset;
fig. 4 is a schematic structural diagram of a cross-modal retrieval system according to an embodiment of the present invention.
Detailed Description
As shown in FIG. 1, the invention provides a medical cross-modal multi-scale fusion class guidance hashing method, and a corresponding system is designed according to the method.
The medical cross-modal multi-scale fusion category guidance hashing method comprises the following specific steps:
s1, inputting category semantics, and establishing a category hash network for learning hash codes of various categories;
s2, inputting data of different modalities, establishing an image network and a text network to obtain the features and hash codes of each modality, and combining the image and text features to generate a joint hash code;
s3, representing labels by class hash codes as supervision information to train hash codes of images, texts and joint networks;
s4, using the joint network to guide the learning of the hash codes of the images and texts.
The class hash network is used to generate the hash codes of the classes, so that the learned class hash codes can represent the labels, and the model of the class hash network in S1, that is, the objective function, is as follows:
s.t. p_i = sgn(H^(c)) = sgn(f_c(c_i; θ_c))
where α is a hyperparameter; 1 is a vector with all elements equal to 1; sgn(·) is the sign function; and p_i denotes the hash code learned for category c_i. The class hash codes are finally obtained.
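The constraint p_i = sgn(f_c(c_i; θ_c)) can be sketched by passing one-hot category vectors through a small randomly initialized network. The two-layer architecture and layer sizes below are illustrative assumptions; the patent does not fix f_c beyond the sgn constraint, and a trained network would replace these random weights:

```python
import numpy as np

rng = np.random.default_rng(42)
num_classes, k = 5, 16                 # 5 categories, 16-bit class hash codes

# hypothetical two-layer class hash network f_c(.; theta_c)
C = np.eye(num_classes)                # one-hot category inputs c_i
W1 = rng.standard_normal((num_classes, 32))
W2 = rng.standard_normal((32, k))

H_c = np.tanh(C @ W1) @ W2             # continuous outputs H^(c)
P = np.sign(H_c)                       # p_i = sgn(f_c(c_i; theta_c)) in {-1, +1}^k
```

Each row of P is then the binary "class hash code" that stands in for a label during supervision.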
In S2, the image hash network, the text hash network, and the joint hash network learn features and hash codes of different modalities, and the specific implementation process is as follows:
s2.1, image hash network: first, a deep convolutional network (VGG) is combined with a feature pyramid network (FPN) to extract multi-scale image features; this combination is called the VFPN multi-scale network. The network fuses high-resolution, weak-semantic features with low-resolution, strong-semantic features to obtain high-resolution, strong-semantic features f_x(x; θ_x). Furthermore, three fully connected layers are added: the first two are the same as the last two layers of VGG, and the third has k hidden units with the tanh(·) function as its activation. These three layers serve as the hash function, converting the features f_x(x; θ_x) into the binary code H^(x) = f_x(x; θ_x) ∈ {-1, 1}^k. Then, the hash code of the image modality is obtained through B_x = sgn(H_x) ∈ {-1, 1}^k, where k is the length of the hash code.
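The fusion step of the VFPN network can be illustrated with a NumPy stand-in: a 1×1 lateral projection of the high-resolution, weak-semantic map plus an upsampled low-resolution, strong-semantic map, as in a standard FPN top-down pass. The shapes, random weights, and nearest-neighbour upsampling are illustrative assumptions, not the patent's trained network:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(fine, coarse, proj):
    """FPN-style top-down step: project the fine (high-resolution,
    weak-semantic) map to the common channel width, then add the
    upsampled coarse (low-resolution, strong-semantic) map."""
    lateral = np.einsum('dc,chw->dhw', proj, fine)   # 1x1 conv as channel mixing
    return lateral + upsample2x(coarse)

rng = np.random.default_rng(0)
fine = rng.standard_normal((64, 28, 28))     # high resolution, weak semantics
coarse = rng.standard_normal((256, 14, 14))  # low resolution, strong semantics
proj = rng.standard_normal((256, 64))        # hypothetical 1x1 lateral projection
merged = fpn_merge(fine, coarse, proj)       # high resolution AND strong semantics
```

The merged map keeps the fine level's spatial resolution while inheriting the coarse level's semantics, which is exactly the property the text claims for f_x(x; θ_x).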
S2.2, text hash network: a text-network multi-scale fusion model based on self-supervised adversarial hashing for cross-modal retrieval (SSAH) is adopted. The multi-scale fusion model uses five average pooling layers of sizes 1×1, 1×2, 1×3, 1×6, and 1×10 to extract features from the text data at multiple scales, and one 1×1 convolutional layer to fuse them. The multi-scale text semantic features f_y(y; θ_y) are then obtained through resizing and concatenation. The fused features are fed into a three-layer feedforward neural network serving as the hash function, which converts the features f_y(y; θ_y) into the binary code H^(y) = f_y(y; θ_y) ∈ {-1, 1}^k. Then, the hash code of the text modality is obtained through B_y = sgn(H_y) ∈ {-1, 1}^k.
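The multi-scale text pooling can be sketched as follows. A plain average stands in for the learned 1×1 fusion convolution, and a crude nearest-neighbour resize stands in for the resizing step; both are assumptions for illustration:

```python
import numpy as np

def avg_pool_1d(v, size):
    """Average pooling with window/stride `size` over a 1-D text vector
    (the vector is truncated to a multiple of `size`)."""
    n = len(v) // size
    return v[:n * size].reshape(n, size).mean(axis=1)

def multiscale_text_features(v, sizes=(1, 2, 3, 6, 10), out_len=30):
    """Pool at the five scales from the text, resize each result back to a
    common length (nearest-neighbour), and average-fuse the results."""
    feats = []
    for s in sizes:
        p = avg_pool_1d(v, s)
        idx = np.arange(out_len) * len(p) // out_len   # crude resize indices
        feats.append(p[idx])
    return np.mean(feats, axis=0)

v = np.arange(60, dtype=float)    # toy bag-of-words text vector
f = multiscale_text_features(v)   # fused multi-scale text feature
```

Small windows preserve word-level detail while large windows summarize broader context; fusing them gives the multi-scale semantics the text attributes to f_y(y; θ_y).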
S2.3, joint hash network: this network takes as input the concatenation f_u(u; θ_u) = concat(f_x(x; θ_x), f_y(y; θ_y)) of the multi-scale image features generated by the VFPN multi-scale network and the multi-scale fusion features of the text. The concatenated features f_u(u; θ_u) are fed into a three-layer feedforward neural network serving as the hash function, which converts them into the binary code H^(u) = f_u(u; θ_u) ∈ {-1, 1}^k. Then, the hash code of the joint network is obtained through B_u = sgn(H_u) ∈ {-1, 1}^k.
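The joint branch reduces to concatenation followed by a three-layer hash head. A hedged NumPy sketch with random, untrained weights (feature dimensions and hidden sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 16                                     # hash code length

def hash_head(f, W1, W2, W3):
    """Three-layer feedforward hash function ending in tanh, then sgn."""
    h = np.tanh(np.tanh(f @ W1) @ W2)
    H = np.tanh(h @ W3)                    # continuous code H in (-1, 1)^k
    return np.sign(H)                      # binary code B in {-1, +1}^k

f_img = rng.standard_normal(128)           # f_x(x; theta_x), image features
f_txt = rng.standard_normal(64)            # f_y(y; theta_y), text features
f_joint = np.concatenate([f_img, f_txt])   # f_u = concat(f_x, f_y)

W1 = rng.standard_normal((192, 64))
W2 = rng.standard_normal((64, 64))
W3 = rng.standard_normal((64, k))
B_u = hash_head(f_joint, W1, W2, W3)       # joint hash code B_u = sgn(H_u)
```

Because the joint code is computed from both modalities' features, it carries shared semantics that the later guidance losses can push the single-modality codes toward.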
In S3, supervising the learning of each modality's hash codes with the class hash codes comprises the following steps:
s3.1, cross-modal similarity and rich semantic structure information are preserved through the Hamming distance: the Hamming distance between the hash code H_*i and the class hash codes of the classes to which data point i belongs should be smaller than its Hamming distance to the class hash codes of the classes it does not belong to, which is modeled as:

where * ∈ {x, y, u} denotes the image, text, and joint modalities; μ ∈ [0, 1] is a predefined margin and k is the hash code length; E_i is the index set of the classes to which data point i belongs, i.e., the indices of the elements equal to "1" in the label vector l_i; Q_i = {1, ..., c} − E_i is the index set of the classes to which data point i does not belong, i.e., the indices of the elements equal to "0" in l_i; and dist(H_*i, p_e) is the Hamming distance between H_*i and p_e. H_*i should be similar to the average of the class hash codes of its own classes; moreover, if the class hash codes {p_e | e ∈ E_i} corresponding to H_*i are more similar to it than {p_q | q ∈ Q_i}, then H_*i preserves both semantic similarity and semantic structure information;
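For {-1, +1} codes the Hamming distance has the closed form dist(h, p) = (k − ⟨h, p⟩)/2, and the margin condition of S3.1 can then be checked directly. Toy hand-made codes, not learned ones:

```python
import numpy as np

def hamming(h, p):
    """Hamming distance between two {-1, +1} codes of length k:
    dist = (k - <h, p>) / 2."""
    k = len(h)
    return (k - float(h @ p)) / 2.0

k = 8
h = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
p_same = h.copy()                    # class hash code of h's own class
p_other = -h                         # a non-class hash code

d_in = hamming(h, p_same)            # identical codes -> distance 0
d_out = hamming(h, p_other)          # all bits differ -> distance k

mu = 0.3                             # predefined margin in [0, 1]
margin_ok = d_in + mu * k <= d_out   # the class-guided margin condition
```

The inner-product form avoids bit-by-bit comparison and is the identity the margin constraint is usually optimized through.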
S3.2, the hash code H_* of each modality can be generated under the supervision of the class hash codes P, with the loss of each modality given by:

where λ is a hyperparameter; * ∈ {x, y, u} denotes the image, text, and joint modalities; the average is taken over the class hash codes similar to H_*i; and p_q is a class hash code dissimilar to H_*i.
The category network is used to guide each modality to generate its hash code; the hash objective function is as follows:

where λ is a hyperparameter; * ∈ {x, y, u} denotes the image, text, and joint modalities; the average is taken over the class hash codes similar to H_*i; and p_q is a class hash code dissimilar to H_*i. The hash codes learned by the model of this embodiment preserve cross-modal similarity and rich semantic structure information well.
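A hedged hinge-loss reading of this objective (the exact formula is not reproduced in the text, so the margin form described in S3.1 is assumed: for each sample, the Hamming distance to the mean of its own class hash codes should undercut the distance to every non-class hash code by at least μ·k):

```python
import numpy as np

def class_guided_loss(H, P, labels, mu=0.3):
    """Assumed hinge form of the class-guided loss: penalize any non-class
    hash code that is not at least mu * k further away than the mean of
    the sample's own class hash codes."""
    n, k = H.shape
    loss = 0.0
    for i in range(n):
        own = labels[i].astype(bool)
        p_bar = P[own].mean(axis=0)                  # mean of similar class codes
        d_in = (k - H[i] @ p_bar) / 2.0
        for q in np.flatnonzero(~own):               # indices Q_i of absent labels
            d_out = (k - H[i] @ P[q]) / 2.0
            loss += max(0.0, d_in + mu * k - d_out)  # hinge on the margin
    return loss / n

k = 8
P = np.array([[1]*8, [-1]*8, [1]*4 + [-1]*4], dtype=float)  # 3 class hash codes
H = np.array([[1]*8], dtype=float)                          # one sample's code
labels = np.array([[1, 0, 0]])                              # belongs to class 0
loss = class_guided_loss(H, P, labels)
```

Here the sample's code coincides with its class code, so every margin is satisfied and the loss is zero; flipping the sample's bits would make the hinge terms positive.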
In S4, the hash code generation and learning of the image and the text are guided by using a joint hash network, so as to improve the correlation of modalities, that is:
Combining the above functions yields the retrieval model based on the medical cross-modal multi-scale fusion class-guided hashing method:
where γ and η are hyperparameters; * ∈ {x, y, u} denotes the image, text, and joint modalities; the average is taken over the class hash codes similar to H_*i; p_q is a class hash code dissimilar to H_*i; and B_u, B_x, and B_y are the hash codes of the joint network, the image, and the text, respectively.
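A hedged sketch of the joint-network guidance term: the exact model is not reproduced in the text, so an L2 consistency term weighted by γ and η, pulling the continuous image and text codes toward the joint code, is assumed here:

```python
import numpy as np

def guidance_loss(H_u, H_x, H_y, gamma=0.3, eta=0.3):
    """Assumed L2 consistency form of joint-network guidance: penalize
    disagreement of the image code H_x and text code H_y with the joint
    code H_u, weighted by gamma and eta."""
    return gamma * np.mean((H_x - H_u) ** 2) + eta * np.mean((H_y - H_u) ** 2)

H_u = np.array([0.9, -0.8, 0.7, -0.6])   # joint network's continuous code
H_x = H_u.copy()                          # image code already agrees
H_y = -H_u                                # text code disagrees everywhere
loss = guidance_loss(H_u, H_x, H_y)
```

Minimizing such a term drags both single-modality codes toward the shared joint code, which is the stated purpose of the guidance step: correlating the modality semantics.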
To verify the superiority of the method in cross-modal retrieval, the public medical dataset MIMIC-CXR is selected for experiments; mAP is adopted for cross-modal retrieval evaluation, and the Top-10 retrieval results are also displayed. In the experiments, the method of this embodiment is trained 5 times and the average is taken as the final result, with parameters set as follows: α = 0.05, β = 0.01, λ = 0.3, γ = 0.3, η = 0.3, and μ = 0.3.
Table 1: mAP values on MIMIC-CXR datasets
(1) Analysis of results of mAP values on two public data sets
The method of this embodiment is compared with 7 existing cross-modal retrieval methods: CCA, CMSSH, SCM, STMH, CMFH, SePH, and DCMH. All methods are compared on the two datasets. As shown in the table above, the mAP values of the proposed method are higher than those of the compared methods, which demonstrates the feasibility of substituting class hashing for the semantic similarity matrix and shows that the joint semantics help improve semantic relevance.
(2) Comparative analysis of Top-10 search results
As shown in fig. 3, the CCA and DCMH methods exhibit multiple failure cases. By comparison, even when the method of this embodiment fails on the image-to-text and text-to-image retrieval tasks, the relevant results are ranked earlier, and the retrieval results are intuitively semantically related to the query.
As shown in fig. 4, a retrieval system based on the medical cross-modal multi-scale fusion class-guided hashing method includes:
the first input module 1, used for inputting category semantics;
the first feature processing module 2, used for establishing the class hash network to learn the hash codes of the categories;
the second input module 3, used for inputting data of different modalities;
the second feature processing module 4, used for establishing the image network and the text network to obtain the features and hash codes of each modality, and combining the image and text features to generate the joint hash code;
the learning and training module 5, used for training the hash codes of the image, text, and joint networks with the class hash codes representing the labels as supervision information, while the joint network guides the learning of the image and text hash codes and performs retrieval;
and the output module 6, used for outputting the retrieval result.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any modification and replacement based on the technical solution and inventive concept provided by the present invention should be covered within the scope of the present invention.
Claims (7)
1. The medical cross-modal multi-scale fusion class guidance hash method is characterized by comprising the following steps: the method comprises the following specific steps:
s1, inputting category semantics, and establishing a category hash network for learning hash codes of various categories;
s2, inputting data of different modalities, establishing an image network and a text network to obtain the features and hash codes of each modality, and combining the image and text features to generate a joint hash code;
s3, representing labels by class hash codes as supervision information to train hash codes of images, texts and joint networks;
s4, using the joint network to guide the learning of the hash codes of the images and texts.
2. The medical cross-modal multi-scale fusion class-guided hashing method according to claim 1, wherein: the model of the class hash network in S1 is:
s.t. p_i = sgn(H^(c)) = sgn(f_c(c_i; θ_c))
3. The medical cross-modal multi-scale fusion class-guided hashing method according to claim 1, wherein: in S2, an image hash network and a text hash network are established to obtain features and hash codes of each modality, and a joint hash network generates a joint hash code, which is specifically implemented by the following steps:
s2.1, image hash network: to obtain high-resolution, high-semantic medical image features, a deep convolutional network (VGG) is combined with a feature pyramid network (FPN) to extract multi-scale image features; this combination is called the VFPN multi-scale network. The network fuses high-resolution, weak-semantic features with low-resolution, strong-semantic features to obtain high-resolution, strong-semantic features f_x(x; θ_x). In addition, three fully connected layers are added as the hash function to convert the features f_x(x; θ_x) into the binary code H^(x) = f_x(x; θ_x) ∈ {-1, 1}^k, where the first two fully connected layers are the same as the last two layers of VGG and the third has k hidden units with the tanh(·) function as its activation. Finally, the hash code of the image modality is obtained through B_x = sgn(H_x) ∈ {-1, 1}^k, where k is the length of the hash code;
s2.2, adopting a text-network multi-scale fusion model based on self-supervised adversarial hashing for cross-modal retrieval (SSAH); first, five average pooling layers of sizes 1×1, 1×2, 1×3, 1×6, and 1×10 extract features from the text data at multiple scales, and one 1×1 convolutional layer then fuses these features; next, the multi-scale text semantic features f_y(y; θ_y) are obtained through resizing and concatenation; the fused features are fed into a three-layer feedforward neural network serving as the hash function, which converts the features f_y(y; θ_y) into the binary code H^(y) = f_y(y; θ_y) ∈ {-1, 1}^k; finally, the hash code of the text modality is obtained through B_y = sgn(H_y) ∈ {-1, 1}^k;
s2.3, joint hash network: this network takes as input the concatenation f_u(u; θ_u) of the multi-scale image features f_x(x; θ_x) generated by the VFPN multi-scale network and the multi-scale fusion features f_y(y; θ_y) of the text; the concatenated features f_u(u; θ_u) are fed into a three-layer feedforward neural network serving as the hash function, which converts them into the binary code H^(u) = f_u(u; θ_u) ∈ {-1, 1}^k; finally, the hash code of the joint network is obtained through B_u = sgn(H_u) ∈ {-1, 1}^k.
4. The medical cross-modal multi-scale fusion class-guided hashing method according to claim 1, wherein: in S3, the step of monitoring the learning of the modal hash codes according to the class hash codes includes:
s3.1, cross-modal similarity and rich semantic structure information are preserved through the Hamming distance: the Hamming distance between the hash code H_*i and the class hash codes of the classes to which data point i belongs should be smaller than its Hamming distance to the class hash codes of the classes it does not belong to, which is modeled as:

where * ∈ {x, y, u} denotes the image, text, and joint modalities; μ ∈ [0, 1] is a predefined margin and k is the hash code length; E_i is the index set of the classes to which data point i belongs, i.e., the indices of the elements equal to "1" in the label vector l_i; Q_i = {1, ..., c} − E_i is the index set of the classes to which data point i does not belong, i.e., the indices of the elements equal to "0" in l_i; and dist(H_*i, p_e) is the Hamming distance between H_*i and p_e. H_*i should be similar to the average of the class hash codes of its own classes; moreover, if the class hash codes {p_e | e ∈ E_i} corresponding to H_*i are more similar to it than {p_q | q ∈ Q_i}, then H_*i preserves both semantic similarity and semantic structure information;
S3.2, the hash code learning of each modality can be supervised by the class hash codes P; the loss for each modality is:
5. The medical cross-modal multi-scale fusion class-guided hashing method according to claim 1, wherein in S4 the joint network is used to guide the learning of the image and text hash codes, the specific model being:
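The specific model of S4 is likewise not reproduced in this text; one common way such guidance is expressed, sketched below as an assumption rather than the claimed formula, is a squared-error term pulling each modality's continuous hash output toward the joint network's code:

```python
import numpy as np

def guidance_loss(H_x, H_y, B_u):
    """Hedged sketch of S4: penalize the squared distance between each
    modality's continuous hash output and the joint network's code B_u,
    so the joint network guides the image and text hash codes."""
    return float(np.sum((H_x - B_u) ** 2) + np.sum((H_y - B_u) ** 2))

B_u = np.array([1.0, -1.0, 1.0, -1.0])  # joint-network hash code (toy, k = 4)
H_x = np.array([0.5, -1.0, 1.0, -1.0])  # image hash output, one bit off by 0.5
H_y = B_u.copy()                        # text hash output already matching B_u
print(guidance_loss(H_x, H_y, B_u))     # 0.25
```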
6. A retrieval model based on the medical cross-modal multi-scale fusion class-guided hashing method, characterized in that the retrieval model is generated by the medical cross-modal multi-scale fusion class-guided hashing method of claim 1 and is as follows:
7. A retrieval system based on the medical cross-modal multi-scale fusion class-guided hashing method, characterized by comprising:
the first input module (1), used for inputting the category semantics;
the first feature processing module (2), used for building the class hash network to learn the hash code of each class;
the second input module (3), used for inputting the data of the different modalities;
the second feature processing module (4), used for building the image network and the text network to obtain the features and hash codes of each modality, and combining the image and text features to generate the joint hash code;
the learning and training module (5), used for representing the labels as class hash codes serving as supervision information to train the hash codes of the image, text and joint networks, with the joint network guiding the learning of the image and text hash codes, and for performing retrieval;
and the output module (6), used for outputting the retrieval result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110483387.0A CN113270199B (en) | 2021-04-30 | 2021-04-30 | Medical cross-mode multi-scale fusion class guide hash method and system thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113270199A true CN113270199A (en) | 2021-08-17 |
CN113270199B CN113270199B (en) | 2024-04-26 |
Family
ID=77229860
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113704537A (en) * | 2021-10-28 | 2021-11-26 | 南京码极客科技有限公司 | Fine-grained cross-media retrieval method based on multi-scale feature union |
CN117112829A (en) * | 2023-10-24 | 2023-11-24 | 吉林大学 | Medical data cross-modal retrieval method and device and related equipment |
WO2024087218A1 (en) * | 2022-10-28 | 2024-05-02 | 深圳先进技术研究院 | Cross-modal medical image generation method and apparatus |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017092183A1 (en) * | 2015-12-03 | 2017-06-08 | 中山大学 | Image retrieval method based on variable-length deep hash learning |
CN109299216A (en) * | 2018-10-29 | 2019-02-01 | 山东师范大学 | A kind of cross-module state Hash search method and system merging supervision message |
CN110110122A (en) * | 2018-06-22 | 2019-08-09 | 北京交通大学 | Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval |
CN110309331A (en) * | 2019-07-04 | 2019-10-08 | 哈尔滨工业大学(深圳) | A kind of cross-module state depth Hash search method based on self-supervisory |
CN110765281A (en) * | 2019-11-04 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | Multi-semantic depth supervision cross-modal Hash retrieval method |
CN111127385A (en) * | 2019-06-06 | 2020-05-08 | 昆明理工大学 | Medical information cross-modal Hash coding learning method based on generative countermeasure network |
WO2020182019A1 (en) * | 2019-03-08 | 2020-09-17 | 苏州大学 | Image search method, apparatus, device, and computer-readable storage medium |
Non-Patent Citations (4)
Title |
---|
Wu L et al.: "Cycle-consistent deep generative hashing for cross-modal retrieval", IEEE Transactions on Image Processing * |
Liu Haoxin; Wu Xiaojun; Yu Jun: "Cross-modal retrieval algorithm with joint hash feature and classifier learning", Pattern Recognition and Artificial Intelligence, no. 02 * |
Ou Weihua; Liu Bin; Zhou Yonghui et al.: "A survey of cross-modal retrieval", Journal of Guizhou Normal University (Natural Sciences) * |
Chen Fei; Lü Shaohe; Li Jun; Wang Xiaodong; Dou Yong: "Multi-label image retrieval with object extraction and hashing mechanism", Journal of Image and Graphics, no. 02 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A survey of multi-view representation learning | |
CN110059217B (en) | Image text cross-media retrieval method for two-stage network | |
Cao et al. | Cross-modal hamming hashing | |
Arevalo et al. | Gated multimodal units for information fusion | |
CN108984724B (en) | Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation | |
Arevalo et al. | Gated multimodal networks | |
Zheng et al. | A deep and autoregressive approach for topic modeling of multimodal data | |
CN113270199B (en) | Medical cross-mode multi-scale fusion class guide hash method and system thereof | |
Shi et al. | Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval | |
Sharma et al. | A survey of methods, datasets and evaluation metrics for visual question answering | |
Yang et al. | Image captioning by incorporating affective concepts learned from both visual and textual components | |
Kim et al. | Gaining extra supervision via multi-task learning for multi-modal video question answering | |
Qiao et al. | Word-character attention model for Chinese text classification | |
CN116204706A (en) | Multi-mode content retrieval method and system for text content and image analysis | |
CN116561305A (en) | False news detection method based on multiple modes and transformers | |
Chen et al. | Leveraging unpaired out-of-domain data for image captioning | |
Zhang et al. | Category supervised cross-modal hashing retrieval for chest x-ray and radiology reports | |
Yu et al. | Multimodal multitask deep learning for X-ray image retrieval | |
CN117556067B (en) | Data retrieval method, device, computer equipment and storage medium | |
Wu et al. | Deep semantic hashing with dual attention for cross-modal retrieval | |
Bayoudh | A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges | |
Wu et al. | Visual Question Answering | |
CN112182273B (en) | Cross-modal retrieval method and system based on semantic constraint matrix decomposition hash | |
Huang et al. | Explore instance similarity: An instance correlation based hashing method for multi-label cross-model retrieval | |
Perdana et al. | Instance-based deep transfer learning on cross-domain image captioning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||