CN116881336A - Efficient multi-mode contrast depth hash retrieval method for medical big data

Info

Publication number: CN116881336A
Application number: CN202310922846.XA
Authority: CN (China)
Prior art keywords: hash, medical, momentum, features, modal
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 曹玉东, 孙浩轩
Current Assignee: Liaoning University of Technology
Original Assignee: Liaoning University of Technology
Application filed by Liaoning University of Technology


Classifications

    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2255 Indexing structures; Hash tables
    • G06F16/2272 Indexing structures; Management thereof
    • G06F16/248 Querying; Presentation of query results
    • G06F18/23 Pattern recognition; Clustering techniques
    • G06F18/256 Fusion techniques of classification results relating to different input data, e.g. multimodal recognition
    • G06N3/045 Neural networks; Combinations of networks
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities
    • G16H50/70 ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients


Abstract

An efficient multi-mode contrast depth hash retrieval method for medical big data relates to the technical field of artificial intelligence. The "search by image" function lets researchers retrieve all similar imaging cases from the full database, unaffected by prior expert diagnostic conclusions, which makes it possible to further enrich and correct the database; the cross-modal "search by text" function is better suited to primary doctors and researchers. The method occupies little storage space and enables fast cross-modal search. It exploits the latent correlation between medical reports and their corresponding x-ray images to build an efficient multi-modal medical data retrieval model, reducing storage space, improving the efficiency of medical big data retrieval, and helping doctors with study, research, and clinical diagnosis.

Description

Efficient multi-mode contrast depth hash retrieval method for medical big data
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an efficient multi-mode contrast depth hash retrieval method for medical big data. It covers large-scale image processing, artificial neural networks, and multi-modal data retrieval, and mainly targets the problems of high medical image labeling cost, low retrieval efficiency, and long retrieval time. The invention improves retrieval efficiency by using contrastive learning to mine high-level semantic information in medical data, and reduces search time and storage space by compressing high-dimensional features into compact binary hash features.
Background
Advances in medical imaging technology have drastically changed healthcare practice and improved patient diagnosis. However, the growing volume of stored image data in recent years places a tremendous burden on radiologists and affects the quality and speed of clinical decisions. Deep cross-modal hash retrieval offers a promising solution for the automatic analysis of medical images. Current hash-based medical image retrieval techniques include the following. Single-modal hash medical image retrieval can be divided into supervised and unsupervised approaches: supervised single-modal hashing uses label information to supervise the training of the hash model, achieving higher retrieval performance, but manual labeling is costly and impractical for large-scale data; unsupervised single-modal hashing trains the hash model on the inherent properties of the medical images, requiring no manual annotation, but retrieves poorly. Moreover, traditional single-modal hashing trains only on image data, a severe limitation in the current era in which large amounts of heterogeneous multimedia data coexist. Multi-modal hash medical image retrieval uses the medical report as a supervision signal to guide the model toward a multi-modal representation, enabling cross-modal retrieval, but its retrieval performance remains poor and cannot meet big-data retrieval requirements.
Existing single-modal hash medical image retrieval uses a neural network to extract medical image features, maps them into Hamming space, and computes similarities to train the model's discrimination ability; its structure is simple, it does not apply to multi-modal data, and it cannot meet current medical needs. Existing multi-modal hash medical data retrieval addresses these problems, but it simply migrates traditional multi-modal methods, fails to capture high-level semantic information in medical images, and retrieves inefficiently. The invention aims to establish a multi-modal medical data hash retrieval model based on contrastive learning, improving cross-modal search performance while reducing search time and storage space.
Disclosure of Invention
Aiming at these problems, the invention provides an efficient multi-mode contrast depth hash retrieval method for medical big data. The "search by image" function lets researchers retrieve all similar imaging cases from the full database, unaffected by prior expert diagnostic conclusions, which makes it possible to further enrich and correct the database; the cross-modal "search by text" function is better suited to primary doctors and researchers.
The technical scheme adopted by the invention is as follows:
a high-efficiency multi-modal contrast depth hash retrieval method for medical big data, comprising the steps of:
s1, acquiring a multi-mode data set for model training, wherein the multi-mode data set comprises medical x-ray images and corresponding radiology reports.
S2, extracting original features and momentum features of medical multi-mode data respectively by using an original feature coding model and a momentum coding model, and converting the original features and the momentum features into original hash features and momentum hash features through a hash layer.
S3, further performing clustering operation on the momentum hash features to convert the momentum hash features into clustering hash features, and then performing contrast learning training with the original hash features to explore intra-class and inter-class distinctiveness of the multi-mode hash features.
S4, taking the clustering center as a pseudo tag, and guiding the hash generation network to filter out a large amount of noise in the data.
S5, comparing and learning the original characteristic representations of the image mode and the text mode, and further mining the similarity among classes of the multi-mode data.
S6, in order to adapt to multi-modal retrieval tasks, cross-modal similarity learning is needed.
S7, using the contrastive training framework established in S1-S6 in the training of the multi-modal contrastive hash model, to guide the model learning process and help the model achieve more efficient multi-modal medical data retrieval.
Based on the above scheme, each step can be implemented as follows.
Further, in step S1, the training dataset comprises a number of medical x-ray images I_train and their corresponding medical reports T_train.
Further, in step S2, original features and momentum features of the medical multi-modal data are extracted using the original feature encoding model and the momentum encoding model, respectively, specifically comprising the following sub-steps:
S21, acquire two image hash feature encoders ImgNet_o and ImgNet_m, with network parameters θ_o and θ_m respectively; each image hash feature encoder consists of a vision transformer encoder E_v and a hash layer H_i.
S22, acquire two text hash feature encoders TexNet_o and TexNet_m, each with its own network parameters; each text hash feature encoder consists of a transformer encoder E_T and a hash layer H_t.
S23, for each medical x-ray image I_train in the training dataset, first pass it through ImgNet_o and ImgNet_m to generate the original image features f_I^o and the momentum image features f_I^m respectively, and then pass these through the hash layer H_i to generate the original hash features h_I^o and the momentum hash features h_I^m respectively.
S24, for each medical report T_train in the training dataset, first pass it through TexNet_o and TexNet_m to generate the original text features f_T^o and the momentum text features f_T^m respectively, and then pass these through the hash layer H_t to generate the original hash features h_T^o and the momentum hash features h_T^m respectively.
Further, in step S3, the momentum hash features are clustered and converted into clustered hash features, and then trained with contrastive learning against the original hash features, so as to explore the intra-class and inter-class distinctiveness of the multi-modal hash features; this specifically comprises the following sub-steps:
S31, cluster the image momentum hash features h_I^m with the K-means clustering algorithm to obtain the cluster centers, and store the cluster centers as a dynamic queue ready for contrastive learning with the text hash features h_T^o.
S32, cluster the text momentum hash features h_T^m with the K-means clustering algorithm to obtain the cluster centers, and store the cluster centers as a dynamic queue ready for contrastive learning with the image hash features h_I^o.
S33, use the contrast loss L_h to train the model, so as to widen the intra-class and inter-class distinctions between the data of different diseases in the medical data:

L_h = ℓ(q^I; k_+^T, k_-^T) + λ · ℓ(q^T; k_+^I, k_-^I), where ℓ(q; k_+, k_-) = −log [ exp(⟨q, k_+⟩/τ) / ( exp(⟨q, k_+⟩/τ) + Σ exp(⟨q, k_-⟩/τ) ) ]

wherein ⟨·,·⟩ denotes matrix multiplication, τ is a temperature hyper-parameter, λ is a balance hyper-parameter, and the similarity between different hash points is measured by the dot product. Here the first query q^I comes from the image modality, and its key values k_+^T and k_-^T come from the text modality; conversely, the second query q^T comes from the text modality, and its key values k_+^I and k_-^I come from the image modality. In both cases the key values come from the momentum queues of the respective modalities.
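The description of L_h above (dot-product similarity, temperature τ, positives from the other modality, negatives from a momentum queue) matches a standard InfoNCE-style objective. A minimal single-query NumPy sketch of one direction of such a loss follows; the variable names q, k_pos, and neg_queue are illustrative, not the patent's notation.

```python
import numpy as np

def info_nce(q, k_pos, neg_queue, tau=0.07):
    """One direction of an InfoNCE-style contrastive loss.

    q        : query hash feature from one modality, shape (K,)
    k_pos    : its positive key from the other modality, shape (K,)
    neg_queue: momentum-queue negatives (e.g. cluster centers), shape (M, K)
    Similarity is the dot product scaled by temperature tau.
    """
    logits = np.concatenate(([q @ k_pos], neg_queue @ q)) / tau
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])                         # positive sits at index 0

q = np.array([1.0, 1.0, -1.0, 1.0])
k_near = np.array([1.0, 1.0, -1.0, 1.0])         # matching key -> low loss
k_far = np.array([-1.0, -1.0, 1.0, -1.0])        # opposite key -> high loss
queue = np.array([[-1.0, 1.0, 1.0, -1.0],
                  [1.0, -1.0, 1.0, 1.0]])
```

The full L_h would sum this term for the image query q^I and λ times the analogous term for the text query q^T.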
Further, in step S5, in order to further explore the inter-class similarity of the multi-modal data, the original feature representations of the image modality and the text modality undergo contrastive learning:
wherein sim(f_x, f_y) = f_x^T f_y / (||f_x|| · ||f_y||) is the cosine similarity, and the indicator term takes the value 1 when k = x; f_x and f_y denote the original features of the image modality and the text modality, respectively.
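With sim(·,·) as cosine similarity, the feature-level contrast in S5 can be sketched as a symmetric batch loss over matched image/text feature pairs. This NumPy sketch is a simplified illustration (one common instantiation of such a loss), not the patent's exact L_f:

```python
import numpy as np

def cosine_sim(fx, fy):
    """sim(f_x, f_y) = f_x^T f_y / (||f_x|| * ||f_y||)."""
    return fx @ fy / (np.linalg.norm(fx) * np.linalg.norm(fy))

def feature_contrast_loss(img_feats, txt_feats, tau=0.1):
    """Pull each image feature toward its paired text feature and away
    from the other texts in the batch (softmax over cosine similarities)."""
    n = len(img_feats)
    loss = 0.0
    for i in range(n):
        sims = np.array([cosine_sim(img_feats[i], txt_feats[j])
                         for j in range(n)]) / tau
        p = np.exp(sims - sims.max())
        p /= p.sum()
        loss += -np.log(p[i])                    # matched pair is the positive
    return loss / n
```

Aligned pairs give a near-zero loss; mismatched pairings are penalized, which is what drives the inter-class separation described above.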
Further, in step S6, in order to adapt to the multi-modal retrieval task, cross-modal similarity learning is also required; the cross-modal similarity loss is defined as follows:
wherein S_pair denotes the semantic similarity between x_i and y_j, with S_pair(b_x, b_y) = b_x^T b_y / (||b_x|| · ||b_y||). In the multi-label setting, the two instances x_i and y_j are each annotated with multiple labels; thus, if x_i and y_j share at least one label, S_ij is defined as 1, and otherwise as 0. v denotes the distance margin and N denotes the number of training samples.
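Under the S_pair definition above, one standard way to realize "pull label-sharing pairs together, push other pairs beyond the margin v" is a hinge-style pair loss. The sketch below is a hedged NumPy illustration of that idea, not the patent's exact formula (which is carried in an image not reproduced in this text):

```python
import numpy as np

def pair_similarity(bx, by):
    """S_pair(b_x, b_y) = b_x^T b_y / (||b_x|| * ||b_y||)."""
    return bx @ by / (np.linalg.norm(bx) * np.linalg.norm(by))

def cross_modal_pair_loss(bx, by, labels_x, labels_y, v=0.5):
    """S_ij = 1 if the two instances share at least one label, else 0.
    Similar pairs maximize cosine similarity; dissimilar pairs are hinged
    so their similarity stays below the margin v."""
    s_ij = 1.0 if set(labels_x) & set(labels_y) else 0.0
    sim = pair_similarity(bx, by)
    return (1.0 - sim) if s_ij == 1.0 else max(0.0, sim - v)

bx = np.array([1.0, 0.0])
by = np.array([1.0, 0.0])
loss_shared = cross_modal_pair_loss(bx, by, ["edema"], ["edema"])
loss_clash = cross_modal_pair_loss(bx, by, ["edema"], ["effusion"])
```

Summing such terms over the N training pairs gives a batch-level cross-modal similarity objective.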
Further, in step S7, the contrastive training framework established by S1 to S6 is used in the training of the multi-modal contrastive hash model to guide the model learning process and help the model realize more efficient multi-modal medical data retrieval; this specifically comprises the following steps:
S71, set the total loss function of the medical multi-modal contrastive hash model (where α and β are balance hyper-parameters):
L_c = β(L_h + L_d) + (1 − β)L_f
L = αL_s + (1 − α)L_c
s72, training a medical multi-mode comparison hash model through a loss function L by using an SGD optimization method and a back propagation algorithm until the loss function converges.
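The optimization in S72 pairs plain SGD updates for the original encoders with slow exponential-moving-average updates for the momentum encoders, as is typical for momentum-encoder training. A minimal sketch follows; the momentum coefficient m = 0.999 is an assumed typical value, not taken from the patent.

```python
import numpy as np

def sgd_step(theta, grad, lr=0.01):
    """One stochastic-gradient-descent update on the original encoder."""
    return theta - lr * grad

def momentum_update(theta_m, theta_o, m=0.999):
    """theta_m <- m * theta_m + (1 - m) * theta_o: the momentum encoder
    drifts slowly toward the original encoder, keeping queue keys stable."""
    return m * theta_m + (1 - m) * theta_o

theta_o = np.zeros(4)
theta_m = np.zeros(4)
grad = np.array([1.0, -1.0, 0.5, 0.0])
theta_o = sgd_step(theta_o, grad, lr=0.1)        # original encoder moves
theta_m = momentum_update(theta_m, theta_o)      # momentum encoder follows slowly
```

Because only the original encoders receive gradients, the momentum queue stays consistent across many mini-batches.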
On the other hand, the invention provides a multi-mode contrast depth hash retrieval method for medical big data: using the method described in the above scheme to obtain the trained models ImgNet_o and TexNet_o, efficient multi-modal retrieval can be achieved.
The advantages are that:
compared with the existing multi-mode medical hash model, the multi-mode comparison hash model for medical data retrieval has the following beneficial effects:
firstly, the dual-contrast structure designed by the invention can effectively mine medical semantic information in multi-mode data, thereby improving the accuracy of the model.
Then, the invention adds distillation training, removes useless information in the data, and improves generalization of the model.
Finally, the proposed multi-modal retrieval model is applicable to different multi-modal medical databases, achieves a remarkable performance improvement on a mainstream large-scale multi-modal medical dataset, and is convenient to popularize.
Drawings
FIG. 1 is a training flow diagram of a multimodal comparative hash model of the present invention.
Fig. 2 is an overall structure diagram of the multi-modal comparative hash model of the present invention.
FIG. 3 is a schematic diagram of a multi-modal search flow.
Fig. 4 is a partial enlarged view of the first portion of fig. 2.
Fig. 5 is a partial enlarged view of the second portion of fig. 2.
Fig. 6 is a partial enlarged view of the third portion of fig. 2.
Detailed Description
FIG. 1 is a training flowchart of the multimodal comparative hash model of the present invention, and detailed descriptions of specific training steps of the model are described below, where the model training specifically includes the following steps:
s1, acquiring a multi-mode data set for model training, wherein the multi-mode data set comprises medical x-ray images and corresponding radiology reports.
In the present embodiment, the training dataset in step S1 comprises a number of medical x-ray images I_train and their corresponding medical reports T_train.
The aim of training the medical multi-modal contrastive hash model is to realize efficient mutual retrieval between x-ray images and medical reports, so that search results are as comprehensive and accurate as possible. Note that the method is applicable to x-ray images of all parts of the human body and their corresponding diagnostic reports.
S2, extracting original features and momentum features of medical multi-mode data respectively by using an original feature coding model and a momentum coding model, and converting the original features and the momentum features into original hash features and momentum hash features through a hash layer.
In this embodiment, step S2 extracts the original features and the momentum features of the medical multi-modal data using the original feature encoding model and the momentum encoding model respectively, and then converts them into the original hash features and the momentum hash features through the hash layer; this specifically comprises the following sub-steps:
S21, acquire two image hash feature encoders ImgNet_o and ImgNet_m, with network parameters θ_o and θ_m respectively; each image hash feature encoder consists of a vision transformer encoder E_v and a hash layer H_i.
S22, acquire two text hash feature encoders TexNet_o and TexNet_m, each with its own network parameters; each text hash feature encoder consists of a transformer encoder E_T and a hash layer H_t.
S23, for each medical x-ray image I_train in the training dataset, first pass it through ImgNet_o and ImgNet_m to generate the original image features f_I^o and the momentum image features f_I^m respectively, and then pass these through the hash layer H_i to generate the original hash features h_I^o and the momentum hash features h_I^m respectively.
S24, for each medical report T_train in the training dataset, first pass it through TexNet_o and TexNet_m to generate the original text features f_T^o and the momentum text features f_T^m respectively, and then pass these through the hash layer H_t to generate the original hash features h_T^o and the momentum hash features h_T^m respectively.
S3, further performing clustering operation on the momentum hash features to convert the momentum hash features into clustering hash features, and then performing contrast learning training with the original hash features to explore intra-class and inter-class distinctiveness of the multi-mode hash features.
In this embodiment, step S3 performs a clustering operation to convert the momentum hash features into clustered hash features, and then trains them with contrastive learning against the original hash features so as to explore the intra-class and inter-class distinctiveness of the multi-modal hash features; this specifically comprises the following sub-steps:
S31, cluster the image momentum hash features h_I^m with the K-means clustering algorithm to obtain the cluster centers, and store the cluster centers as a dynamic queue ready for contrastive learning with the text hash features h_T^o.
S32, cluster the text momentum hash features h_T^m with the K-means clustering algorithm to obtain the cluster centers, and store the cluster centers as a dynamic queue ready for contrastive learning with the image hash features h_I^o.
S33, use the contrast loss L_h to train the model, so as to widen the intra-class and inter-class distinctions between the data of different diseases in the medical data:

L_h = ℓ(q^I; k_+^T, k_-^T) + λ · ℓ(q^T; k_+^I, k_-^I), where ℓ(q; k_+, k_-) = −log [ exp(⟨q, k_+⟩/τ) / ( exp(⟨q, k_+⟩/τ) + Σ exp(⟨q, k_-⟩/τ) ) ]

wherein ⟨·,·⟩ denotes matrix multiplication, τ is a temperature hyper-parameter, λ is a balance hyper-parameter, and the similarity between different hash points is measured by the dot product. Here the first query q^I comes from the image modality, and its key values k_+^T and k_-^T come from the text modality; conversely, the second query q^T comes from the text modality, and its key values k_+^I and k_-^I come from the image modality. In both cases the key values come from the momentum queues of the respective modalities.
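The K-means step in S31-S32 reduces each modality's momentum hash features to a small set of cluster centers that fill the dynamic queue. A dependency-free sketch of plain Lloyd iterations follows; k, the iteration count, and the toy data are illustrative.

```python
import numpy as np

def kmeans_centers(x, k, iters=10, seed=0):
    """Lloyd's K-means on momentum hash features x (N, dim); returns the
    k cluster centers used to populate the contrastive momentum queue."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):              # keep old center if cluster empties
                centers[j] = x[assign == j].mean(axis=0)
    return centers

# toy 1-D features forming two obvious groups -> centers land at the group means
x = np.array([[0.0], [0.2], [10.0], [10.2]])
queue = kmeans_centers(x, k=2)
```

Replacing individual momentum features by their cluster center is what turns the queue into a cluster-level (rather than instance-level) set of contrastive keys.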
S5, the original feature representations of the image modality and the text modality undergo contrastive learning, further exploring the inter-class similarity of the multi-modal data:
wherein sim(f_x, f_y) = f_x^T f_y / (||f_x|| · ||f_y||) is the cosine similarity, and the indicator term takes the value 1 when k = x; f_x and f_y denote the original features of the image modality and the text modality, respectively.
S6, in order to adapt to the multi-modal retrieval task, cross-modal similarity learning is also required:
wherein S_pair denotes the semantic similarity between x_i and y_j, with S_pair(b_x, b_y) = b_x^T b_y / (||b_x|| · ||b_y||). In the multi-label setting, the two instances x_i and y_j are each annotated with multiple labels; thus, if x_i and y_j share at least one label, S_ij is defined as 1, and otherwise as 0. v denotes the distance margin and N denotes the number of training samples.
S7, the contrastive training framework established in steps S1-S6 guides the model learning process and helps the model achieve efficient multi-modal medical data retrieval.
In this embodiment, step S7 uses the contrastive training framework established by S1 to S6 to guide the model learning process and help the model realize more efficient multi-modal medical data retrieval; this specifically comprises the following sub-steps:
S71, set the total loss function of the medical multi-modal contrastive hash model (where α and β are balance hyper-parameters):
L_c = β(L_h + L_d) + (1 − β)L_f
L = αL_s + (1 − α)L_c
S72, train the medical multi-modal contrastive hash model with the loss function L, using the SGD (stochastic gradient descent) optimization method and the back-propagation algorithm, until the loss function converges.
Finally, when a specific cross-modal retrieval task is executed, the trained models ImgNet_o and TexNet_o obtained with the method described above can achieve efficient multi-modal retrieval.
The multi-modal contrastive hash model for efficient retrieval of medical data is applied below to a specific dataset to show the technical effects achieved.
This example is implemented as described in S1 to S7 above; the specific steps are not repeated, and only the effect of the method on the example data is shown. The method is evaluated on a multi-modal dataset with real labels:
CheXpert 5x200 dataset: this multi-class classification dataset has 64740 frontal images for the CheXpert competition tasks (cardiac enlargement, edema, pleural effusion, etc.), including 64540 training images and 200 test images with their corresponding reports.
The overall structure of the model is shown in fig. 2: blue marks the momentum encoding module, used to generate the momentum features; light yellow marks the original feature encoding module, used to generate the image and text features; light purple marks the cluster-level momentum queue, which participates in contrastive training; and dark yellow marks the hash library, which stores the generated hash codes for retrieval.
The specific multi-modal medical data retrieval flow is shown in fig. 3. Given query data from one modality (text or image), it is first converted into a hash code by the hash model; the hash code is then compared for similarity against the stored image/text hash codes in the hash library; the one or more most similar hash codes are found; and finally the corresponding images/texts are located and output as the query result.
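The in-memory search over the hash library reduces to Hamming distance between binary codes: for ±1 codes of length K, Hamming distance = (K − dot product) / 2. A minimal NumPy sketch follows (a production system would pack the bits and use XOR/popcount instead):

```python
import numpy as np

def hamming_search(query_code, db_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query
    in Hamming distance. Codes are vectors over {-1, +1}."""
    k_bits = db_codes.shape[1]
    dists = (k_bits - db_codes @ query_code) // 2
    return np.argsort(dists, kind="stable")[:top_k]

db = np.array([
    [ 1,  1,  1,  1],   # Hamming distance 4 from the query
    [-1, -1, -1, -1],   # distance 0 (identical)
    [-1, -1,  1, -1],   # distance 1
])
query = np.array([-1, -1, -1, -1])
ranking = hamming_search(query, db, top_k=3)
```

The returned indices are then mapped back to the stored images/texts, which is the final lookup step of the flow above.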
To assess its effectiveness, the model proposed by the invention is compared with three state-of-the-art models: DSVE, VSE++, and ConVIRT (note that all three are multi-modal medical retrieval models that do not involve hashing; no existing medical retrieval model currently uses hashing for multi-modal data search).
The retrieval precision for this example is shown in the table below. Top-K precision, an important metric for multi-modal retrieval, is used to measure retrieval performance; the larger the value, the better.
Method    Prec@5  Prec@10  Prec@100
DSVE      40.60   32.77    24.74
VSE++     44.28   36.81    26.89
ConVIRT   66.98   63.06    49.03
Ours      70.84   68.22    60.69
As shown in the table, "Ours" denotes the model proposed by the invention, compared in this example with the three state-of-the-art methods (DSVE, VSE++ and ConVIRT). The method of the invention delivers a significant performance improvement; most importantly, it stores short, effective hash codes in the database, occupies little storage space, ensures that search can run quickly in the computer's memory, and greatly increases search speed.
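Prec@K in the table is computed per query as the fraction of the top-K returned items whose class matches the query's, then averaged over queries. A minimal sketch with illustrative labels (not the CheXpert data):

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the first k retrieved items labeled like the query."""
    top = retrieved_labels[:k]
    return sum(1 for lab in top if lab == query_label) / k

# e.g. one retrieval run for a query of class "edema"
ranked = ["edema", "edema", "effusion", "edema", "cardiomegaly"]
p5 = precision_at_k(ranked, "edema", 5)
p2 = precision_at_k(ranked, "edema", 2)
```

Larger K values (Prec@100 in the table) probe how deep into the ranking the relevant cases remain.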
Through the above technical scheme, the multi-modal contrast depth hash retrieval method for medical big data is built on deep learning; the solution fully exploits contrastive learning and multi-modal hashing, and can significantly improve both the retrieval performance and the retrieval speed on medical multi-modal data.
The method occupies little storage space, enables fast cross-modal search, and supports two query modes: search by image and search by text. It works as follows. The original feature encoding model and the momentum encoding model are built from a transformer (text feature encoder) and a vision transformer (image feature encoder); these extract the original features and the momentum features of the text and image modalities, which are then converted into hash features and momentum hash features. The momentum hash features are clustered to obtain the cluster centers of each class; the momentum hash features are replaced by the cluster centers, which are stored in a momentum queue. The original hash features are then trained by contrastive learning against the cluster centers in the momentum queue, yielding intra-class and inter-class distinctiveness of the multi-modal hash features. To better use the semantic information of the original data, the original features of the image and text modalities also undergo contrastive learning, further increasing the inter-class distinction of the multi-modal data. In addition, the momentum encoding model participates only in training, where it receives the momentum updates; at test time only the original feature encoding model is used to realize fast multi-modal retrieval.
The fast retrieval method provided by the invention fully exploits the inherent discriminability of contrastive learning to mine large-scale multi-modal medical data, and uses the latent correlation between a medical report and its corresponding x-ray image to build an efficient multi-modal medical data retrieval model. It reduces storage space and improves the efficiency of medical big data retrieval, letting doctors study, research, and diagnose better: given one image/text as the input query, it finds the most relevant texts/images, helps doctors search similar cases, and shortens clinical diagnosis time.

Claims (7)

1. The high-efficiency multi-mode contrast depth hash retrieval method for medical big data is characterized by comprising the following steps of:
S1, acquiring a multi-modal dataset for model training, wherein the multi-modal dataset comprises medical x-ray images and the corresponding radiology reports;
S2, extracting the original features and momentum features of the medical multi-modal data with the original feature encoding model and the momentum encoding model respectively, and converting them into original hash features and momentum hash features through a hash layer;
S3, performing a clustering operation on the momentum hash features to convert them into clustered hash features, then performing contrastive-learning optimization between the clustered hash features and the original hash features to mine the intra-class and inter-class distinctiveness of the multi-modal hash features;
S4, taking the cluster centers as pseudo labels to guide the hash generation network to filter out the large amount of noise in the data;
S5, simultaneously performing contrastive learning on the original feature representations of the image modality and the text modality to further mine the inter-class similarity of the multi-modal data;
S6, performing cross-modal similarity learning to adapt the model to multi-modal retrieval tasks;
S7, using the contrastive training framework established in S1-S6 to guide the learning process of the multi-modal contrastive hash model and help the model achieve more efficient multi-modal medical data retrieval.
2. The efficient multi-modal contrast depth hash method for medical big data according to claim 1, characterized by comprising the steps of:
in step S1, the training dataset comprises a number of medical x-ray images I_train and the corresponding medical reports T_train.
3. The efficient multi-modal contrast depth hash method for medical big data according to claim 1, characterized by comprising the steps of:
in step S2, original features and momentum features of the medical multi-modal data are extracted using the original feature encoding model and the momentum encoding model, respectively, specifically comprising the following sub-steps:
S21, acquiring two image hash feature encoders ImgNet_o and ImgNet_m, whose network parameters are θ_o and θ_m respectively; each image hash feature encoder comprises a Vision Transformer encoder E_v and a hash layer H_i;
S22, acquiring two text hash feature encoders TexNet_o and TexNet_m, whose network parameters are φ_o and φ_m respectively; each text hash feature encoder comprises a Transformer encoder E_T and a hash layer H_t;
S23, for each medical x-ray image I_train in the training dataset, it is first passed through the encoders ImgNet_o and ImgNet_m to generate the original image features f_o^i and the momentum image features f_m^i respectively, which are then passed through the hash layer H_i to generate the original hash features b_o^i and the momentum hash features b_m^i respectively;
S24, for each medical report T_train in the training dataset, it is encoded with TexNet_o and TexNet_m to generate the original text features f_o^t and the momentum text features f_m^t respectively, which are then passed through the hash layer H_t to generate the original hash features b_o^t and the momentum hash features b_m^t respectively;
The mathematical expressions are as follows:
b_o^i = H_i(E_v(I_train; θ_o)), b_m^i = H_i(E_v(I_train; θ_m))
b_o^t = H_t(E_T(T_train; φ_o)), b_m^t = H_t(E_T(T_train; φ_m))
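The feature-to-hash pipeline of S23/S24 can be sketched in a few lines, assuming a linear hash layer with a tanh relaxation during training and sign quantization at retrieval time (a common deep-hashing design; the 768-dimensional features and 32-bit code length below are illustrative, not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_layer(features, W, b):
    # Linear projection to k bits; tanh keeps relaxed codes in (-1, 1) for training.
    return np.tanh(features @ W + b)

def binarize(h):
    # At retrieval time the relaxed codes are quantized to {-1, +1}.
    return np.sign(h)

features = rng.normal(size=(4, 768))      # stand-in for E_v / E_T outputs
W = rng.normal(size=(768, 32)) * 0.02     # hash-layer weights (toy init)
b = np.zeros(32)

relaxed = hash_layer(features, W, b)      # relaxed codes b_o used during training
codes = binarize(relaxed)                 # binary codes stored in the database
```

The tanh relaxation keeps the hash objective differentiable; only the quantized codes need to be stored, which is where the small storage footprint comes from.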
4. the efficient multi-modal contrast depth hash method for medical big data according to claim 1, characterized by comprising the steps of:
in step S3, the momentum hash feature is subjected to clustering operation, converted into a clustering hash feature, and then subjected to contrast learning training with the original hash feature, so as to mine intra-class and inter-class distinctiveness of the multi-modal hash feature, and the method specifically comprises the following sub-steps:
S31, clustering the image momentum hash features b_m^i with the K-means algorithm to obtain the cluster centers c^i, and storing the cluster centers c^i in a dynamic queue, ready for contrastive learning with the original text hash features b_o^t;
S32, clustering the text momentum hash features b_m^t with the K-means algorithm to obtain the cluster centers c^t, and storing the cluster centers c^t in a dynamic queue, ready for contrastive learning with the original image hash features b_o^i;
S33, adopting the contrastive loss L_h to train the model so as to enlarge the intra-class and inter-class distinctions between different disease data in the medical data:
L_h = -λ·log( exp(⟨b_o^i, c_+^t⟩/τ) / Σ_j exp(⟨b_o^i, c_j^t⟩/τ) ) - (1-λ)·log( exp(⟨b_o^t, c_+^i⟩/τ) / Σ_j exp(⟨b_o^t, c_j^i⟩/τ) )
wherein ⟨·,·⟩ denotes the dot product, τ is a temperature hyper-parameter and λ is a balance hyper-parameter; the similarity between different hash codes is measured by the dot product. Here the first query b_o^i comes from the image modality while its key values c_+^t and c_j^t come from the text modality; conversely, the second query b_o^t comes from the text modality while its key values c_+^i and c_j^i come from the image modality; the key values are all taken from the momentum queues of the respective modalities.
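Steps S31-S33 can be sketched end to end: cluster the momentum hash features with K-means, keep only the cluster centers as the momentum queue, then apply an InfoNCE-style contrastive loss between each original hash code and the centers of the opposite modality. This is a hedged NumPy illustration — the helper names, queue size, and positive-center assignment are assumptions; the loss follows the standard InfoNCE form implied by the "wherein" clause of claim 4:

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans_centers(x, k, iters=20, seed=0):
    # Plain K-means; returns only the cluster centers (the momentum queue).
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return centers

def info_nce(q, keys, pos_idx, tau=0.07):
    # -log softmax probability of the positive key; dot product as similarity.
    logits = q @ keys.T / tau
    logits = logits - logits.max(1, keepdims=True)   # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    return -log_p[np.arange(len(q)), pos_idx].mean()

# Toy momentum hash features for both modalities (S31/S32).
b_m_img = np.tanh(rng.normal(size=(256, 32)))
b_m_txt = np.tanh(rng.normal(size=(256, 32)))
queue_img = kmeans_centers(b_m_img, k=8)   # centers replace raw momentum codes
queue_txt = kmeans_centers(b_m_txt, k=8)

# Toy original hash codes and assumed positive-center indices (S33).
b_o_img = np.tanh(rng.normal(size=(4, 32)))
b_o_txt = np.tanh(rng.normal(size=(4, 32)))
pos = np.array([0, 1, 2, 3])

lam = 0.5
L_h = lam * info_nce(b_o_img, queue_txt, pos) + (1 - lam) * info_nce(b_o_txt, queue_img, pos)
```

Storing only k cluster centers instead of all momentum codes keeps the queue small while still covering every class, which is the stated motivation for replacing momentum hash features with their centers.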
5. The efficient multi-modal contrast depth hash method for medical big data according to claim 1, characterized by comprising the steps of:
in step S5, in order to further mine the inter-class similarity of the multi-modal data, contrastive learning is performed on the original feature representations of the image modality and the text modality; the feature-level contrastive loss is expressed as:
L_f = -Σ_x log( exp(sim(f_x, f_y)/τ) / Σ_k 1_{[k≠x]}·exp(sim(f_x, f_k)/τ) )
wherein sim(f_x, f_y) = f_x^T·f_y / (||f_x||·||f_y||), and the indicator 1_{[k≠x]} takes the value 1 when k ≠ x; f_x and f_y represent the original features of the image modality and the text modality, respectively.
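A small NumPy sketch of this feature-level loss, written in the standard NT-Xent form with cosine similarity (batch size, feature dimension, and temperature are illustrative; in the patent f_x and f_y would be the Vision Transformer and Transformer original features):

```python
import numpy as np

rng = np.random.default_rng(2)

def feature_contrast_loss(f_img, f_txt, tau=0.1):
    # Each image feature is pulled toward its paired text feature and pushed
    # away from the other text features in the batch (cosine similarity).
    fi = f_img / np.linalg.norm(f_img, axis=1, keepdims=True)
    ft = f_txt / np.linalg.norm(f_txt, axis=1, keepdims=True)
    sims = fi @ ft.T / tau                        # sim(f_x, f_k) / tau
    log_p = sims - np.log(np.exp(sims).sum(1, keepdims=True))
    return -np.diag(log_p).mean()                 # positives on the diagonal

f_img = rng.normal(size=(4, 768))
f_txt = f_img + 0.1 * rng.normal(size=(4, 768))  # toy paired text features
L_f = feature_contrast_loss(f_img, f_txt)
```

Working on the raw features rather than the hash codes preserves the original semantic information, which is the stated purpose of this second contrastive objective.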
6. The efficient multi-modal contrast depth hash method for medical big data according to claim 1, characterized by comprising the steps of:
in step S6, in order to adapt to the multi-modal retrieval task, cross-modal similarity learning is also required; the cross-modal similarity loss is:
L_s = (1/N²)·Σ_i Σ_j [ S_ij·(1 - S_pair(b_i^x, b_j^y)) + (1 - S_ij)·max(0, S_pair(b_i^x, b_j^y) - v) ]
wherein S_ij represents the semantic similarity of x_i and y_j, and S_pair(b^x, b^y) = b^{xT}·b^y / (||b^x||·||b^y||); in the multi-label setting, the two instances (x_i and y_j) are annotated with multiple labels; thus, if x_i and y_j share at least one label, S_ij = 1 is defined, otherwise S_ij = 0; v denotes the distance margin and N denotes the number of training samples.
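A minimal NumPy sketch of a margin-based cross-modal similarity loss matching the description of claim 6 — the exact aggregation over pairs is an assumption; S is the binary label-sharing matrix and v the distance margin:

```python
import numpy as np

rng = np.random.default_rng(3)

def cross_modal_loss(bx, by, S, v=0.3):
    # Pull together codes of label-sharing pairs (S_ij = 1); push apart
    # non-sharing pairs (S_ij = 0) until their cosine similarity drops below v.
    bxn = bx / np.linalg.norm(bx, axis=1, keepdims=True)
    byn = by / np.linalg.norm(by, axis=1, keepdims=True)
    cos = bxn @ byn.T                              # S_pair(b_i^x, b_j^y)
    pos = S * (1.0 - cos)
    neg = (1.0 - S) * np.maximum(0.0, cos - v)
    return (pos + neg).mean()

bx = np.tanh(rng.normal(size=(4, 32)))             # image hash codes (toy)
by = np.tanh(rng.normal(size=(4, 32)))             # text hash codes (toy)
S = np.eye(4)                                      # i-th image pairs with i-th text
L_s = cross_modal_loss(bx, by, S)
```

The margin v stops the loss from pushing already well-separated negative pairs further apart, focusing the gradient on hard negatives.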
7. The efficient multi-modal contrast depth hash method for medical big data according to claim 1, characterized by comprising the steps of:
in step S7, the comparative training framework established by S1 to S6 is used in the training of the multimodal comparative hash model to guide the model learning process, and specifically includes the following steps:
S71, setting the total loss function of the medical multi-modal contrastive hash model (where α and β are balance hyper-parameters):
L c =β(L h +L d )+(1-β)L f
L=αL s +(1-α)L c
S72, training the medical multi-modal contrastive hash model with the loss function L using the SGD optimizer and the back-propagation algorithm.
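The two-level weighting of S71 can be checked with a few lines of arithmetic (pure Python; the component loss values below are placeholders, and L_d is the pseudo-label loss of step S4):

```python
def total_loss(L_h, L_d, L_f, L_s, alpha=0.5, beta=0.5):
    # L_c combines the hash-level losses; L then trades L_c off against L_s.
    L_c = beta * (L_h + L_d) + (1.0 - beta) * L_f
    return alpha * L_s + (1.0 - alpha) * L_c

# With all component losses equal to 1: L_c = 0.5*2 + 0.5*1 = 1.5, L = 0.5 + 0.75 = 1.25
L = total_loss(1.0, 1.0, 1.0, 1.0)
```

In training, this scalar L is the quantity minimized by SGD and back-propagation as described in S72.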
CN202310922846.XA 2023-07-26 2023-07-26 Efficient multi-mode contrast depth hash retrieval method for medical big data Pending CN116881336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310922846.XA CN116881336A (en) 2023-07-26 2023-07-26 Efficient multi-mode contrast depth hash retrieval method for medical big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310922846.XA CN116881336A (en) 2023-07-26 2023-07-26 Efficient multi-mode contrast depth hash retrieval method for medical big data

Publications (1)

Publication Number Publication Date
CN116881336A true CN116881336A (en) 2023-10-13

Family

ID=88258511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310922846.XA Pending CN116881336A (en) 2023-07-26 2023-07-26 Efficient multi-mode contrast depth hash retrieval method for medical big data

Country Status (1)

Country Link
CN (1) CN116881336A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112829A (en) * 2023-10-24 2023-11-24 吉林大学 Medical data cross-modal retrieval method and device and related equipment
CN117112829B (en) * 2023-10-24 2024-02-02 吉林大学 Medical data cross-modal retrieval method and device and related equipment
CN118035424A (en) * 2024-04-11 2024-05-14 四川大学 Code searching method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Gong et al. Natural language inference over interaction space
Karpathy et al. Deep visual-semantic alignments for generating image descriptions
CN111949759A (en) Method and system for retrieving medical record text similarity and computer equipment
Shi et al. Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval
CN116881336A (en) Efficient multi-mode contrast depth hash retrieval method for medical big data
CN110364234A (en) Electronic health record intelligent storage analyzing search system and method
CN109994216A (en) A kind of ICD intelligent diagnostics coding method based on machine learning
Allaouzi et al. Automatic caption generation for medical images
CN114239585A (en) Biomedical nested named entity recognition method
Peng et al. A self-attention based deep learning method for lesion attribute detection from CT reports
Pendyala et al. Automated medical diagnosis from clinical data
CN114220516A (en) Brain CT medical report generation method based on hierarchical recurrent neural network decoding
CN116595195A (en) Knowledge graph construction method, device and medium
CN112183104A (en) Code recommendation method, system and corresponding equipment and storage medium
CN110889505A (en) Cross-media comprehensive reasoning method and system for matching image-text sequences
Zou et al. Cross-modal cloze task: A new task to brain-to-word decoding
CN117454217A (en) Deep ensemble learning-based depression emotion recognition method, device and system
Zhang et al. Multi-head self-attention gated-dilated convolutional neural network for word sense disambiguation
Li et al. Stacking-BERT model for Chinese medical procedure entity normalization
Liang et al. Disease prediction based on multi-type data fusion from Chinese electronic health record
Marerngsit et al. A two-stage text-to-emotion depressive disorder screening assistance based on contents from online community
Cárdenas-López et al. Semantic fusion of facial expressions and textual opinions from different datasets for learning-centered emotion recognition
CN116227594A (en) Construction method of high-credibility knowledge graph of medical industry facing multi-source data
Akalya devi et al. Multimodal emotion recognition framework using a decision-level fusion and feature-level fusion approach
Bizzoni et al. Deep learning of binary and gradient judgements for semantic paraphrase

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination