CN111127385A - Medical information cross-modal Hash coding learning method based on generative countermeasure network - Google Patents
- Publication number
- CN111127385A CN111127385A CN201910490562.1A CN201910490562A CN111127385A CN 111127385 A CN111127385 A CN 111127385A CN 201910490562 A CN201910490562 A CN 201910490562A CN 111127385 A CN111127385 A CN 111127385A
- Authority
- CN
- China
- Prior art keywords
- text
- image
- feature
- hash
- discriminator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30061—Lung
- G06T2207/30064—Lung nodule
Abstract
The invention relates to a medical information cross-modal hash code learning method based on a generative adversarial network, and belongs to the technical field of medical information processing and information retrieval. The invention adopts a generative adversarial network to learn hash codes for chest CT images and text, and constrains the learned hash codes through a semantic similarity matrix. Accurate hash codes are finally learned, realizing semantic association between the two modalities. On the basis of single-slice fine-grained lung nodule features, the invention extracts more complete feature information of the three-dimensional lung nodule, and the hash code generation model obtained through supervised training achieves better accuracy in cross-modal retrieval.
Description
Technical Field
The invention relates to a medical information cross-modal hash code learning method based on a generative adversarial network, and belongs to the technical field of medical information processing and information retrieval.
Background
Computer-aided diagnosis through deep learning to solve problems in the medical field has attracted increasing attention from researchers and physicians, and lung cancer is currently one of the most widely studied diseases. Early lung cancer is detected by radiologists screening for nodules in chest CT images, and the findings are stored as text that serves as the diagnostic basis for clinicians. Early approaches diagnosed the degree of malignancy of a lung nodule mainly by setting a threshold, observing the change in nodule volume over time, and finally evaluating the nodule growth rate with a standard formula. At present, researchers have carried out multi-modal and cross-modal retrieval research on the two most basic modalities of data commonly used in the medical field, medical images and text; the main methods are as follows. CCA is used to learn correlation matching between the image modality and the text modality, and cross-modal retrieval performance is improved by combining semantic matching. KCCA has been applied in a cross-modal correlation learning framework that uses hyperlink information to improve the performance of the correlation learning model. The deep canonical correlation analysis model DCCA learns maximally correlated nonlinear mappings of two sets of media data through deep neural networks, so that correlated feature representations of different media exhibit strong consistency in an isomorphic space. A three-view kernel CCA method introduces a third, high-level semantic view so that text and images with the same semantics aggregate well in the isomorphic space.
Text-based and content-based medical image retrieval are both single-modality retrieval: data can be retrieved only by relying on the semantic information, or even annotation information, of a single modality, and the latent semantic information shared across modalities cannot be fully exploited.
To address these problems, the invention provides a medical information cross-modal hash coding learning method based on a generative adversarial network.
Disclosure of Invention
The invention provides a medical information cross-modal hash code learning method based on a generative adversarial network, and carries out cross-modal hash retrieval research on lung nodule images and the text descriptions of their corresponding pathological information. The invention adopts a generative adversarial network to learn hash codes for chest CT images and text, and constrains the learned hash codes through a semantic similarity matrix. Accurate hash codes are finally learned, realizing semantic association between the two modalities.
The technical scheme of the invention is as follows: a medical information cross-modal hash code learning method based on a generative adversarial network comprises the following specific steps:
Step1, extracting features of the chest CT image-text data; firstly, the chest CT image is preprocessed and ROI image blocks are cut out, and then image features and text features are extracted from the ROI image blocks and the text data through the CMSFF model and the bag-of-words model respectively;
Step2, constructing the constraint condition of the discriminator; the discriminator submodule for learning the hash code receives two inputs simultaneously, namely the image feature vector and the text feature vector from the previous submodule, where the image feature and the text feature serve as the real data and the generated data respectively, and the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
Step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continually judges whether an input sample is real data or data produced by the generator, and feeds the judgment back to the generator, prompting the generator to continually adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
Step4, learning the hash codes; firstly, a sign operation is applied to the extracted continuous sample features to obtain a set of discrete values, yielding the feature matrices corresponding to the image and the text; then a similarity matrix is used as a constraint, so that the hash codes of different modalities of the same object are as close as possible and the hash codes of different objects are as different as possible;
Step5, training and optimizing the network parameters; during hash code learning, the feature generators and the discriminator are optimized iteratively in turn; the generator parameters θ_{p,t}, the discriminator parameters θ_D and the binary code B are optimized during model training; the hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained for the cross-modal retrieval system;
Step6, retrieving the corresponding chest CT image-text information according to the pathological text or the ROI image information, realizing cross-modal retrieval.
Further, the specific steps of Step1 are as follows:
Step1.1, firstly, data preprocessing is carried out; for the image data set, to avoid the pixel loss caused by directly compressing from 512×512 to 224×224, the original CT image is cut instead; ROI image blocks of size R ∈ {16×16, 32×32, 64×64, 128×128} are cut out according to the diameter of the lung nodule on each slice; for the text data set, radiologists generally use fixed terms to describe the pathological information of lung nodules, with the terms corresponding to different pathological levels; since word order need not be considered for such terms, no text preprocessing is required and text features are extracted directly with the bag-of-words model;
Step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 adjacent slices respectively, and the feature information extracted from different slices of the same nodule is fused to compensate for the incomplete feature expression of a single slice, finally improving the feature expression capability for the local lung nodule; the input of the model is three consecutive ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
Step1.3, extracting features of the lung nodule pathological information text; each term y_j in the pathological description of the lung nodule is represented by the bag-of-words model as a vector f_j; the bag-of-words vector is input into a multilayer perceptron network formed by two fully connected layers fc1 and fc2, where fc1 has 4096 units and the width of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
Further, the specific steps of Step2 are as follows:
Step2.1, the similarity matrix supervises the generation of hash codes in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated on the basis of the pathological information of the lung nodules in the chest CT; when constructing the similarity matrix, an input triple is taken as one sample; since each sample corresponds to pathological information in 9 categories, counting the labels of the 9 categories for each sample yields 32 items of category label information;
Step2.2, the one-hot label information of each sample is formed into a 0/1 vector L_i; if the k-th position of L_i is 0, the sample does not carry that label information, and otherwise it does, where the length m of L_i is 32. If the number of samples is n, a label matrix L_All of size n×m is constructed over all samples; the similarity matrix S is then obtained as S = (L_All × L_All^T) > 0, and the size of S is n×n;
Step2.3, the obtained similarity matrix S is used to constrain the discriminator and supervise the accuracy of the hash codes obtained in subsequent steps.
Further, the specific steps of Step3 are as follows:
Step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator respectively; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the feature information of the image is used as the real training data F_p and the feature vector of the text as the generated feature F_g of the generator;
Step3.2, the F_p and F_g obtained in Step3.1 serve as the inputs of the discriminator; the discriminator judges whether an input sample is real data and feeds the discrimination result back to the generator; according to the result, the generator adjusts its own parameters by minimizing the loss function so as to learn the probability distribution of the real data; the loss function is: L_D = −(log(1 − sigmoid(F_g)) + log sigmoid(F_p))
Step3.3, in particular, a discrimination round threshold g is set in the discriminator; when the generated features are still not optimal after g discrimination rounds, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed on the generator output again.
Further, the specific steps of Step4 are as follows:
Step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped into a common space, so that the hash codes of different modalities of the same object are as similar as possible and the hash codes of different objects are as different as possible. It is therefore common practice to apply a sign operation to the extracted continuous sample features to obtain a set of discrete values;
a Sigmoid operation is then performed on this set of discrete values to obtain the binary hash code, denoted H;
Step4.2, F_p denotes the feature vector of the extracted ROI image block, F_t the feature vector of the learned text, and F_g the generated feature vector of the generator, where F_g = F_t; the cosine similarity ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖); let H_p and H_t be the hash codes generated from the image feature F_p and the text feature F_t respectively, which are likewise used to compute the similarity ψ_ij;
Step4.3, constructing the loss function of the generative model by the cross entropy loss function as follows:
where S is the similarity matrix; α, λ and δ are hyper-parameters in the model training process; B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back-propagation is performed through the loss function, the network weights are updated, and a new hash code H is obtained via Step4.1.
Further, in Step 5:
In the neural network an alternating optimization strategy is employed: two of the parameter groups are fixed at a time while the remaining one is optimized by stochastic gradient descent. For example, while updating θ_{p,t}, the parameters θ_D and B are fixed and can therefore be treated as constants, and θ_{p,t} is updated by back-propagation of the loss function in Step4. The hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained for use in a cross-modal retrieval system.
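The alternating strategy can be sketched schematically as below; the three update callbacks stand in for the actual gradient steps on θ_{p,t}, θ_D and B, which the patent does not spell out, so this is only an illustration of the update schedule:

```python
def alternating_train(steps, update_theta_pt, update_theta_d, update_b):
    """Alternating optimization: in each iteration two of the three
    parameter groups (theta_p_t, theta_D, B) are held fixed as constants
    while the remaining one is updated by a (stochastic) gradient step."""
    for _ in range(steps):
        update_theta_pt()  # theta_D and B fixed
        update_theta_d()   # theta_p_t and B fixed
        update_b()         # theta_p_t and theta_D fixed
```

Each callback would internally run back-propagation of the Step4 loss with the other two parameter groups treated as constants.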
The invention has the beneficial effects that:
1. The invention preprocesses the chest CT image and extracts image features with a multi-level second-order fusion feature extraction method. Because the position and size of a lung nodule in a chest CT image follow no fixed rule, the data preprocessing stage cuts out blocks of the corresponding size at the annotated position of the lung nodule, so that the high-level semantic information of the lung nodule is extracted more accurately and the influence of other structures in the lung on lung nodule feature extraction is reduced. A multi-level second-order fusion feature extraction method is then adopted to extract more complete feature information of the three-dimensional lung nodule on the basis of single-slice fine-grained lung nodule features.
2. Semantic association between chest CT images and text is achieved. The extracted lung nodule image feature information and the feature information of the corresponding text are mapped into a Hamming space, and the obtained hash codes are constrained by a similarity matrix constructed from the class labels of the data samples. Experiments show that the hash code generation model obtained through this supervised training achieves better accuracy in cross-modal retrieval.
In summary, the invention provides a medical information cross-modal hash coding learning method based on a generative adversarial network. In view of the huge data volume of chest CT images, a deep-hashing-based method is adopted to learn hash codes for the different modalities, and semantic association between the two modalities is realized in Hamming space. The feasibility of the method is verified through experiments, and a hash code database is constructed from the hash codes learned in the experiments. Finally, cross-modal retrieval tests between lung nodules and their texts are carried out with the trained hash code learning model, and the retrieval results demonstrate that the cross-modal retrieval method adopted herein for chest CT image-text is feasible.
Drawings
FIG. 1 is a diagram of the generative-adversarial-network hash code learning model of the present invention;
FIG. 2 is a cross-modality chest CT image-text retrieval process according to the present invention;
FIG. 3 is a sample cut of different lung nodules;
FIG. 4 is a diagram of ROI image query and search results in accordance with the present invention;
FIG. 5 is a diagram of the pathological text query and search results of the present invention.
Detailed Description
Example 1: as shown in fig. 1 to 4, a medical information cross-modal hash coding learning method based on a generative adversarial network comprises the following specific steps:
Step1, extracting features of the chest CT image-text data; firstly, the chest CT image is preprocessed and ROI image blocks are cut out, and then image features and text features are extracted from the ROI image blocks and the text data through the CMSFF model and the bag-of-words model respectively;
further, the specific steps of Step1 are as follows:
Step1.1, firstly, data preprocessing is carried out; for the image data set, to avoid the pixel loss caused by directly compressing from 512×512 to 224×224, the original CT image is cut instead; ROI image blocks of size R ∈ {16×16, 32×32, 64×64, 128×128} are cut out according to the diameter of the lung nodule on each slice, as shown in fig. 3; the left side of fig. 3 shows ROI image blocks cut out of the original CT image, and the right side shows blocks of different sizes expanded to 224×224; for the text data set, radiologists generally use fixed terms to describe the pathological information of lung nodules, with the terms corresponding to different pathological levels; since word order need not be considered for such terms, no text preprocessing is required and text features are extracted directly with the bag-of-words model;
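As a concrete illustration of this cropping rule, the sketch below (a minimal NumPy version; the centre-plus-diameter annotation format and the function names are assumptions, not taken from the patent) chooses the smallest candidate ROI size from R that still covers the nodule and cuts the block from the 512×512 slice instead of compressing the whole image:

```python
import numpy as np

ROI_SIZES = (16, 32, 64, 128)  # candidate block sizes R from the patent

def crop_roi(ct_slice, center, diameter):
    """Cut an ROI block around an annotated nodule instead of compressing
    the full 512x512 slice to 224x224 (which would lose pixel detail).
    Picks the smallest candidate size that still covers the nodule."""
    size = next((s for s in ROI_SIZES if s >= diameter), ROI_SIZES[-1])
    half = size // 2
    r, c = center
    # clamp the centre so the block stays inside the slice
    r = min(max(r, half), ct_slice.shape[0] - half)
    c = min(max(c, half), ct_slice.shape[1] - half)
    return ct_slice[r - half:r + half, c - half:c + half]
```

The cropped block would then be expanded to 224×224 before entering the feature network, matching the figure described above.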
Step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 adjacent slices respectively, and the feature information extracted from different slices of the same nodule is fused to compensate for the incomplete feature expression of a single slice, finally improving the feature expression capability for the local lung nodule; the input of the model is three consecutive ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
Step1.3, extracting features of the lung nodule pathological information text; each term y_j in the pathological description of the lung nodule is represented by the bag-of-words model as a vector f_j; the bag-of-words vector is input into a multilayer perceptron network formed by two fully connected layers fc1 and fc2, where fc1 has 4096 units and the width of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
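Since the vocabulary of radiologist terms is fixed, the bag-of-words step and the two-layer perceptron can be sketched as below (plain NumPy with random placeholder weights; the example vocabulary is illustrative, and the real fc1/fc2 would of course be trained):

```python
import numpy as np

def bag_of_words(terms, vocab):
    """Count vector over the fixed vocabulary of pathological terms;
    word order is ignored, so no further text preprocessing is needed."""
    v = np.zeros(len(vocab))
    for t in terms:
        if t in vocab:
            v[vocab.index(t)] += 1
    return v

class TextFeatureGenerator:
    """Two fully connected layers: fc1 with 4096 units, fc2 with h units,
    where h is the length of the hash code to be generated."""
    def __init__(self, vocab_size, h, hidden=4096, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.01, (vocab_size, hidden))
        self.W2 = rng.normal(0.0, 0.01, (hidden, h))

    def __call__(self, bow_vector):
        # ReLU nonlinearity between fc1 and fc2
        return np.maximum(bow_vector @ self.W1, 0.0) @ self.W2
```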
Step2, constructing the constraint condition of the discriminator; the discriminator submodule for learning the hash code receives two inputs simultaneously, namely the image feature vector and the text feature vector from the previous submodule, where the image feature and the text feature serve as the real data and the generated data respectively, and the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
further, the specific steps of Step2 are as follows:
Step2.1, the similarity matrix supervises the generation of hash codes in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated on the basis of the pathological information of the lung nodules in the chest CT; when constructing the similarity matrix, an input triple is taken as one sample; since each sample corresponds to pathological information in 9 categories, counting the labels of the 9 categories for each sample yields 32 items of category label information;
Step2.2, the one-hot label information of each sample is formed into a 0/1 vector L_i; if the k-th position of L_i is 0, the sample does not carry that label information, and otherwise it does, where the length m of L_i is 32. If the number of samples is n, a label matrix L_All of size n×m is constructed over all samples; the similarity matrix S is then obtained as S = (L_All × L_All^T) > 0, and the size of S is n×n;
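The construction of S follows the formula directly and can be written in a few lines of NumPy (the toy label matrix in the usage stands in for the real 32-bit pathology labels):

```python
import numpy as np

def similarity_matrix(label_matrix):
    """label_matrix: (n, m) 0/1 matrix L_All with one multi-hot row L_i
    per sample (m = 32 category-label bits in the patent). Two samples
    are similar iff they share at least one label:
    S = (L_All @ L_All.T) > 0, an n x n boolean matrix."""
    L = np.asarray(label_matrix)
    return (L @ L.T) > 0
```

For example, samples with label rows [1,0,1] and [1,1,0] share label 0 and come out similar, while [1,0,1] and [0,1,0] share none.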
Step2.3, the obtained similarity matrix S is used to constrain the discriminator and supervise the accuracy of the hash codes obtained in subsequent steps.
Step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continually judges whether an input sample is real data or data produced by the generator, and feeds the judgment back to the generator, prompting the generator to continually adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
further, the specific steps of Step3 are as follows:
Step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator respectively; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the feature information of the image is used as the real training data F_p and the feature vector of the text as the generated feature F_g of the generator;
Step3.2, the F_p and F_g obtained in Step3.1 serve as the inputs of the discriminator; the discriminator judges whether an input sample is real data and feeds the discrimination result back to the generator; according to the result, the generator adjusts its own parameters by minimizing the loss function so as to learn the probability distribution of the real data; the loss function is: L_D = −(log(1 − sigmoid(F_g)) + log sigmoid(F_p))
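Numerically, L_D can be sketched as below, treating sigmoid(F_p) and sigmoid(F_g) as the discriminator's scalar scores for the real (image) and generated (text) features; this is a simplification of the full discriminator network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(f_p, f_g):
    """L_D = -(log(1 - sigmoid(F_g)) + log(sigmoid(F_p))):
    low when the real features F_p score near 1 and the generated
    features F_g score near 0, i.e. when the discriminator separates
    the two inputs well."""
    return float(-(np.log(1.0 - sigmoid(f_g)) + np.log(sigmoid(f_p))).mean())
```

The loss grows as the generated text features fool the discriminator, which is exactly the signal fed back to the generator.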
Step3.3, in particular, a discrimination round threshold g is set in the discriminator; when the generated features are still not optimal after g discrimination rounds, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed on the generator output again.
Step4, learning the hash codes; firstly, a sign operation is applied to the extracted continuous sample features to obtain a set of discrete values, yielding the feature matrices corresponding to the image and the text; then a similarity matrix is used as a constraint, so that the hash codes of different modalities of the same object are as close as possible and the hash codes of different objects are as different as possible;
further, the specific steps of Step4 are as follows:
Step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped into a common space, so that the hash codes of different modalities of the same object are as similar as possible and the hash codes of different objects are as different as possible. It is therefore common practice to apply a sign operation to the extracted continuous sample features to obtain a set of discrete values;
a Sigmoid operation is then performed on this set of discrete values to obtain the binary hash code, denoted H;
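Step4.1 can be sketched as follows; the patent's exact discretization (a sign operation plus a Sigmoid-based binarization) is paraphrased here, so treat this as one illustrative reading rather than the definitive pipeline:

```python
import numpy as np

def binarize(features):
    """Discretize continuous features: the sign operation gives the
    discrete code B in {-1, +1}; thresholding the sigmoid at 0.5
    (equivalent to features >= 0) gives the binary hash code H in {0, 1}."""
    b = np.sign(features)
    h = (1.0 / (1.0 + np.exp(-features)) >= 0.5).astype(int)
    return b, h
```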
Step4.2, F_p denotes the feature vector of the extracted ROI image block, F_t the feature vector of the learned text, and F_g the generated feature vector of the generator, where F_g = F_t; the cosine similarity ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖); let H_p and H_t be the hash codes generated from the image feature F_p and the text feature F_t respectively, which are likewise used to compute the similarity ψ_ij;
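The cosine similarity here is the standard one and can be computed as:

```python
import numpy as np

def cosine_similarity(f_t, f_p):
    """psi_ij between a text feature vector (sample i) and an image
    feature vector (sample j): 1 means identical direction, 0 orthogonal."""
    return float(f_t @ f_p / (np.linalg.norm(f_t) * np.linalg.norm(f_p)))
```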
Step4.3, constructing the loss function of the generative model by the cross entropy loss function as follows:
where S is the similarity matrix; α, λ and δ are hyper-parameters in the model training process; B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back-propagation is performed through the loss function, the network weights are updated, and a new hash code H is obtained via Step4.1.
In the experimental phase, to reduce the amount of back-propagation computation, α is set to 1; the selections for λ and δ are shown in Table 1, where the hash code length is 64 bits in each case.
TABLE 1 lambda, delta parameter selection comparison experiment table
Step5, training and optimizing network parameters; in the learning process of the Hash codes, respectively carrying out iterative optimization on the feature generator and the discriminator; optimizing generator parameters θ in a modelp,tAnd then, the judgment is madeParameter thetanAnd a binary coding parameter B in the model training process; learning hash codes of different modes through a GANHL model, storing the hash codes in a Hash coding database, and simultaneously obtaining a trained model for a cross-mode retrieval system;
Step6, retrieving the corresponding chest CT image-text information according to the pathological text or the ROI image information, realizing cross-modal retrieval.
Specifically, the hash codes of the different modalities are learned through Step5 and stored in the hash code database, and the trained model is obtained for the cross-modal retrieval system. When any group of ROI image blocks is input, the corresponding hash code is obtained and the best results are then retrieved from the hash code database; the retrieval process is shown in fig. 2.
The specific steps of Step6 are as follows:
For the image modality, a group of ROI image blocks x = (s1, s2, s3) is input into the retrieval system; the model parameters are invoked to perform feature extraction and the subsequent operations on the images, finally yielding the hash code expression of the image, as given by the formula C_x = h^(x)(f^(x)(x; θ_p, θ_D)).
Query data for a given image is handled through the GANHL retrieval model, and approximate nearest-neighbor search is carried out through Hamming ranking and a hash lookup strategy. In the invention, the hash lookup strategy with radius r returns the 2r most similar retrieval results from the database of the other modality. The retrieval results are shown in fig. 4.
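A sketch of radius-r hash lookup followed by Hamming ranking (the plain dict here stands in for the hash code database; the keys and codes are illustrative):

```python
def hamming_distance(a, b):
    """Number of differing bits between two equal-length 0/1 codes."""
    return sum(x != y for x, y in zip(a, b))

def hash_lookup(query_code, database, radius):
    """Keep items within the Hamming radius of the query, then rank them
    by Hamming distance (Hamming ranking): an approximate NN search."""
    hits = sorted(
        (hamming_distance(query_code, code), key)
        for key, code in database.items()
        if hamming_distance(query_code, code) <= radius
    )
    return [key for _, key in hits]
```

In practice the codes would be 64-bit strings and the radius would range over (0, 8) as in the experiments above.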
Accuracy and recall obtained through the hash lookup are compared, as shown in Table 2. The hash lookup range is (0, 8) and the hash code length is 64 bits. The retrieval results show the retrieval accuracy and recall under different search radii.
TABLE 2 comparison of P, R, F values for different methods
Embodiment 2: as shown in figs. 1 to 5, a medical information cross-modal hash coding learning method based on a generative adversarial network is the same as that in Embodiment 1, except that:
In Step6, hash codes of the different modalities are learned through Step5 and stored in a hash code database, and the trained model is obtained for the cross-modal retrieval system. Given the text data of any group of lung nodules as input, the corresponding hash code is obtained through cross-modal retrieval, and the best-matching results are then retrieved from the hash code database; the retrieval process is shown in fig. 2.
The specific steps of Step6 are as follows:
For the text modality, let the input be y; its hash code is learned through the GANHL model, as given by the formula Cy = h(y)(f(y)(y; θt, θD)).
Query data for a given text is handled by the GANHL retrieval model, and approximate nearest neighbor search is performed through Hamming ranking and a hash lookup strategy. In the invention, a hash lookup strategy with radius r is used to return the 2r most similar retrieval results from the database of the other modality. The retrieval results are shown in fig. 5.
The precision and recall obtained by the hash lookup are compared in table 3. The hash lookup range is (0, 8) and the hash code length is 64 bits. The results show the retrieval precision and recall under different search radii.
TABLE 3 comparison of P, R, F values for different methods
While the present invention has been described in detail with reference to the embodiments shown in the drawings, it is not limited to those embodiments, and various changes may be made within the knowledge of those skilled in the art without departing from the spirit of the invention.
Claims (6)
1. A medical information cross-modal hash coding learning method based on a generative adversarial network, characterized by comprising the following specific steps:
Step1, extracting features from the chest CT image-text data; firstly, the chest CT image is preprocessed and ROI image blocks are cut out; then image features and text features are extracted from the ROI image blocks and the chest CT image-text data through a CMSFF model and a bag-of-words model, respectively;
Step2, constructing the constraint condition of the discriminator; the discriminator submodule for learning hash codes receives two inputs simultaneously, namely the image feature vector and the text feature vector from the previous submodule, where the image features and text features serve as real data and generated data, respectively; the discriminator is constrained by a similarity matrix, which supervises the accuracy of the hash codes obtained in subsequent steps;
Step3, adversarial learning between the generator and the discriminator; in the generative adversarial network, the discriminator continually judges whether an input sample is real data or data generated by the generator, and feeds the judgment result back to the generator, prompting the generator to continually adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
Step4, learning the hash codes; firstly, the extracted continuous sample features are converted into a group of discrete values through a sign operation, yielding the feature matrices corresponding to the image and the text; then a similarity matrix is applied as a constraint so that the hash codes of different modalities of the same object are as close as possible and the hash codes of different objects are as different as possible;
Step5, training and optimizing the network parameters; in the hash code learning process, the feature generator and the discriminator are iteratively optimized in turn; the generator parameters θp,t, the discriminator parameter θD and the binary coding parameter B are optimized during model training; hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained for the cross-modal retrieval system;
Step6, retrieving the corresponding chest CT image-text information according to the pathological text or the ROI image information, thereby realizing cross-modal retrieval.
2. The medical information cross-modal hash coding learning method based on the generative adversarial network according to claim 1, characterized in that the specific steps of Step1 are as follows:
Step1.1, data preprocessing is performed first; for the image data set, in order to avoid the pixel loss caused by directly compressing the 512 × 512 slices to 224 × 224, the original CT image is cropped instead; ROI image blocks of size R = {16 × 16, 32 × 32, 64 × 64, 128 × 128} are cut out according to the diameter of the lung nodule on each slice; for the text data set, radiologists generally use a fixed vocabulary to describe the pathological information of lung nodules, with terms corresponding to different pathological levels; since the order of the terms need not be considered, no text preprocessing is required, and the bag-of-words model is used directly to extract text features;
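As a hedged illustration of the size selection in Step1.1, the sketch below picks the smallest crop size in R that still contains the nodule; the exact selection rule is not specified in the text, so this threshold rule and the name `roi_size` are assumptions:

```python
# Illustrative only: choose the smallest crop size in R = {16, 32, 64, 128}
# that fully contains the nodule diameter (rule assumed for this sketch).
R_SIZES = [16, 32, 64, 128]

def roi_size(diameter_px: int) -> int:
    for s in R_SIZES:
        if diameter_px <= s:
            return s
    return R_SIZES[-1]  # cap at the largest supported block

print(roi_size(20), roi_size(70))  # 32 128
```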
Step1.2, extracting features from the chest CT image; image features are extracted with CMSFF, a multi-level second-order feature fusion model based on convolutional neural networks; features are extracted from the ROI image blocks of the 3 levels separately, and the feature information extracted from the different levels of the same nodule is fused, compensating for the incomplete feature expression of a single slice and ultimately improving the feature expression of the local lung nodule; the model's input is three consecutive, different ROI image blocks of the same lung nodule, and its output is the feature vector of the lung nodule;
Step1.3, extracting features from the lung nodule pathological information text; the pathological description yj of a lung nodule is represented by the bag-of-words model as a vector fj; this bag-of-words vector is input into a multi-layer perceptron formed by two fully-connected layers fc1 and fc2, where fc1 has 4096 nodes and the number of nodes of fc2 equals the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
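A minimal sketch of the bag-of-words representation in Step1.3; the four-term vocabulary and helper name `bag_of_words` are illustrative assumptions (the real vocabulary is the radiologists' fixed pathology terms), and the resulting vector fj would feed the fc1 (4096-node) / fc2 (h-node) perceptron:

```python
# Hypothetical vocabulary; the patent uses the radiologists' fixed terms.
VOCAB = ["solid", "spiculated", "calcified", "lobulated"]

def bag_of_words(description: str) -> list:
    """Order-free term counts — word order is ignored, as noted in Step1.1."""
    words = description.lower().split()
    return [words.count(term) for term in VOCAB]

print(bag_of_words("solid spiculated margin solid"))  # [2, 1, 0, 0]
```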
3. The medical information cross-modal hash coding learning method based on the generative adversarial network according to claim 1, characterized in that the specific steps of Step2 are as follows:
Step2.1, the similarity matrix supervises the generation of hash codes in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and texts, the similarity matrix is constructed directly from the category labels annotated from the pathological information of the chest CT lung nodules; when constructing the similarity matrix, an input triplet is taken as one sample; since each sample carries pathological information in 9 categories, the labels of the 9 categories of each sample are counted, yielding 32 items of category label information;
Step2.2, the label information of each sample is one-hot encoded into a 0/1 vector Li; if the k-th position of Li is 0, the sample does not carry label k, and otherwise it does, where the length m of Li is 32; if the number of samples is n, a label matrix LAll of size n × m is constructed from all samples; the similarity matrix S can then be obtained as S = (LAll × LAllᵀ > 0), where S has size n × n;
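The construction of S in Step2.2 can be sketched directly; the code below is a pure-Python stand-in for the matrix product, with a small 3-sample, 4-label example in place of the patent's m = 32:

```python
def similarity_matrix(label_rows):
    """S[i][j] = 1 iff samples i and j share at least one label,
    i.e. S = (LAll x LAll^T > 0)."""
    n = len(label_rows)
    return [[1 if sum(a * b for a, b in zip(label_rows[i], label_rows[j])) > 0
             else 0 for j in range(n)] for i in range(n)]

L_all = [[1, 0, 1, 0],   # sample 0
         [0, 1, 1, 0],   # sample 1 (shares label 2 with sample 0)
         [0, 1, 0, 0]]   # sample 2 (shares label 1 with sample 1)
print(similarity_matrix(L_all))  # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```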
Step2.3, the obtained similarity matrix S is used to constrain the discriminator and to supervise the accuracy of the hash codes obtained in subsequent steps.
4. The medical information cross-modal hash coding learning method based on the generative adversarial network according to claim 1, characterized in that the specific steps of Step3 are as follows:
Step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator separately; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the image feature information serves as the real training data Fp of the generator and the text feature vector serves as the generated feature Fg of the generator;
Step3.2, the Fp and Fg obtained in Step3.1 serve as the inputs of the discriminator; the discriminator judges whether an input sample is real data and feeds the judgment result back to the generator; based on this judgment, the generator adjusts its parameters by minimizing a loss function so as to learn the probability distribution of the real data; the loss function is: LD = -(log(1 - sigmoid(Fg)) + log(sigmoid(Fp)))
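The discriminator loss LD above can be evaluated element-wise; a sketch on scalar discriminator scores (the input values are illustrative, not taken from the patent):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_loss(f_p: float, f_g: float) -> float:
    """L_D = -(log(1 - sigmoid(F_g)) + log(sigmoid(F_p)))."""
    return -(math.log(1.0 - sigmoid(f_g)) + math.log(sigmoid(f_p)))

# A confident discriminator (high score on real F_p, low on generated F_g)
# yields a small loss; an undecided one (both scores 0) yields a larger loss.
print(discriminator_loss(f_p=2.0, f_g=-2.0))
```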
Step3.3, in particular, a discrimination threshold g is set in the discriminator; when the generated features are still not optimal after g rounds of discrimination, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed on the generator's output again.
5. The medical information cross-modal hash coding learning method based on the generative adversarial network according to claim 1, characterized in that the specific steps of Step4 are as follows:
Step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped into a common space, making the hash codes of different modalities of the same object as similar as possible and the hash codes of different objects as different as possible; the common practice is therefore to apply a sign operation to the extracted continuous sample features, converting them into a group of discrete values and yielding the binary hash code denoted H;
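The sign binarization of Step4.1 can be sketched as follows; mapping to {0, 1} bits here is an assumption for readability ({-1, +1} codes are equally common):

```python
def binarize(features):
    """H = sign(F): threshold continuous features into binary code bits."""
    return [1 if x >= 0 else 0 for x in features]

print(binarize([0.7, -1.2, 0.05, -0.3]))  # [1, 0, 1, 0]
```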
Step4.2, Fp denotes the feature vectors of the extracted ROI image blocks, Ft denotes the feature vectors of the learned text, and Fg denotes the generated feature vector of the generator, where Fg = Ft; the cosine similarity Ψij between the text feature of the i-th sample and the image feature of the j-th sample is expressed as Ψij = (Fti · Fpj) / (||Fti|| ||Fpj||); letting Hp and Ht be the hash codes generated from the image features Fp and the text features Ft respectively, the similarity Φij is computed in the same way;
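The cosine similarity Ψij of Step4.2 in a minimal form (the vectors are illustrative; the same routine would give Φij on the hash codes Hp and Ht):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v: (u . v) / (||u|| ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

F_t_i = [1.0, 2.0, 2.0]  # text feature of sample i (illustrative)
F_p_j = [2.0, 4.0, 4.0]  # image feature of sample j, same direction
print(cosine_similarity(F_t_i, F_p_j))  # 1.0
```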
Step4.3, the loss function of the generative model is constructed from the cross-entropy loss as follows:
wherein S is the similarity matrix; α, λ and δ are hyper-parameters of the model training process; Bp and Bt are the hash codes; and Hp and Ht are the binary codes obtained through the sign operation; back propagation is performed through this loss function to update the network weights, and a new hash code H is obtained through Step4.1.
6. The medical information cross-modal hash coding learning method based on the generative adversarial network according to claim 1, characterized in that, in Step5:
in the neural network, an alternating optimization strategy is adopted: at each step, two of the three parameter groups are fixed and the remaining one is optimized by stochastic gradient descent.
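The alternating strategy can be shown schematically; the update below is a placeholder (a unit increment standing in for one SGD or discrete-coding update), not the patent's actual gradient steps:

```python
def alternating_train(params, n_rounds=2):
    """Fix two of the three parameter groups (theta_pt, theta_D, B)
    and update the remaining one, cycling through them in turn."""
    schedule = []
    for _ in range(n_rounds):
        for name in ("theta_pt", "theta_D", "B"):
            params[name] += 1          # placeholder for one optimization step
            schedule.append(name)
    return schedule

print(alternating_train({"theta_pt": 0, "theta_D": 0, "B": 0}))
```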
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910490562.1A CN111127385B (en) | 2019-06-06 | 2019-06-06 | Medical information cross-modal Hash coding learning method based on generative countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111127385A true CN111127385A (en) | 2020-05-08 |
CN111127385B CN111127385B (en) | 2023-01-13 |
Family
ID=70496015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910490562.1A Active CN111127385B (en) | 2019-06-06 | 2019-06-06 | Medical information cross-modal Hash coding learning method based on generative countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111127385B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111598155A (en) * | 2020-05-13 | 2020-08-28 | 北京工业大学 | Fine-grained image weak supervision target positioning method based on deep learning |
CN111651561A (en) * | 2020-06-05 | 2020-09-11 | 拾音智能科技有限公司 | High-quality difficult sample generation method |
CN112085714A (en) * | 2020-08-31 | 2020-12-15 | 广州视源电子科技股份有限公司 | Pulmonary nodule detection method, model training method, device, equipment and medium |
CN112115317A (en) * | 2020-08-20 | 2020-12-22 | 鹏城实验室 | Targeted attack method for deep hash retrieval and terminal device |
CN112380216A (en) * | 2020-11-17 | 2021-02-19 | 北京融七牛信息技术有限公司 | Automatic feature generation method based on intersection |
CN113204522A (en) * | 2021-07-05 | 2021-08-03 | 中国海洋大学 | Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network |
CN113270199A (en) * | 2021-04-30 | 2021-08-17 | 贵州师范大学 | Medical cross-modal multi-scale fusion class guidance hash method and system thereof |
CN113658683A (en) * | 2021-08-05 | 2021-11-16 | 重庆金山医疗技术研究院有限公司 | Disease diagnosis system and data recommendation method |
CN113836901A (en) * | 2021-09-14 | 2021-12-24 | 灵犀量子(北京)医疗科技有限公司 | Chinese and English medicine synonym data cleaning method and system |
CN114972929A (en) * | 2022-07-29 | 2022-08-30 | 中国医学科学院医学信息研究所 | Pre-training method and device for medical multi-modal model |
CN116431847A (en) * | 2023-06-14 | 2023-07-14 | 北京邮电大学 | Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure |
CN113270199B (en) * | 2021-04-30 | 2024-04-26 | 贵州师范大学 | Medical cross-mode multi-scale fusion class guide hash method and system thereof |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030078899A1 (en) * | 2001-08-13 | 2003-04-24 | Xerox Corporation | Fuzzy text categorizer |
WO2013073622A1 (en) * | 2011-11-18 | 2013-05-23 | 日本電気株式会社 | Local feature amount extraction device, local feature amount extraction method, and program |
CN108596265A (en) * | 2018-05-02 | 2018-09-28 | 中山大学 | Model is generated based on text description information and the video for generating confrontation network |
CN109299216A (en) * | 2018-10-29 | 2019-02-01 | 山东师范大学 | A kind of cross-module state Hash search method and system merging supervision message |
CN109299342A (en) * | 2018-11-30 | 2019-02-01 | 武汉大学 | A kind of cross-module state search method based on circulation production confrontation network |
Non-Patent Citations (11)
Title |
---|
ANUBHAV KUMAR: "An efficient text extraction algorithm in complex images", 《INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING》 * |
LIU XING: "Multi-modal public opinion analysis model fusing local semantic information", 《Journal of Information Security Research》 *
ZHANG LU: "Research progress of cross-modal retrieval methods based on adversarial learning", 《Modern Computer》 *
ZHU ZHILAN et al.: "Supervised discriminative hashing for cross-modal retrieval", 《Computer Applications and Software》 *
LI WEITAO et al.: "Semi-supervised intelligent feedback cognition method for grading similar green plums", 《Journal of Electronic Measurement and Instrumentation》 *
LI WEI et al.: "Lung nodule classification based on CNN multi-level second-order feature fusion", 《Journal of Frontiers of Computer Science and Technology》 *
YANG HAILONG et al.: "Image retrieval based on multi-region center-weighted convolutional features", 《Software Guide》 *
WEN PEIZHI et al.: "Improved automatic image segmentation method based on convolutional neural networks", 《Application Research of Computers》 *
YUAN SHAOFENG et al.: "Detection of intima and media-adventitia borders in IVUS images with conditional generative adversarial networks", 《Chinese Journal of Biomedical Engineering》 *
ZHAO XIAOLE: "Research on cross-modal hash retrieval technology for chest CT image-text", 《China Master's Theses Full-text Database, Information Science and Technology》 *
MA CHUNGUANG et al.: "A survey of image enhancement with generative adversarial networks", 《Netinfo Security》 *
Also Published As
Publication number | Publication date |
---|---|
CN111127385B (en) | 2023-01-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||