CN111127385A - Medical information cross-modal hash coding learning method based on generative adversarial network - Google Patents

Medical information cross-modal hash coding learning method based on generative adversarial network

Info

Publication number
CN111127385A
CN111127385A
Authority
CN
China
Prior art keywords
text
image
feature
hash
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910490562.1A
Other languages
Chinese (zh)
Other versions
CN111127385B (en)
Inventor
黄青松
贺周雨
赵晓乐
刘利军
冯旭鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201910490562.1A priority Critical patent/CN111127385B/en
Publication of CN111127385A publication Critical patent/CN111127385A/en
Application granted granted Critical
Publication of CN111127385B publication Critical patent/CN111127385B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • G06T2207/30064Lung nodule

Abstract

The invention relates to a medical information cross-modal hash code learning method based on a generative adversarial network, and belongs to the technical field of medical information processing and information retrieval. The invention adopts a generative adversarial network to learn hash codes for chest CT images and text, and constrains the learned hash codes through a semantic similarity matrix, finally learning accurate hash codes and establishing semantic association between the two modalities. On the basis of single-slice fine-grained lung nodule features, the invention extracts more complete feature information of the three-dimensional lung nodule, and the hash code generation model obtained through supervised training achieves better accuracy in cross-modal retrieval.

Description

Medical information cross-modal hash coding learning method based on generative adversarial network
Technical Field
The invention relates to a medical information cross-modal hash code learning method based on a generative adversarial network, and belongs to the technical field of medical information processing and information retrieval.
Background
Research on computer-aided diagnosis through deep learning to solve problems in the medical field has attracted more and more researchers and doctors, and lung cancer is currently one of the most widely studied diseases. Early lung cancer is detected by radiologists by screening for nodules in chest CT images, and the detection result is stored as text that serves as the diagnostic basis for clinicians. Early approaches diagnosed the malignancy of a lung nodule mainly by setting a threshold, observing the change in nodule volume over time, and finally evaluating the nodule growth rate with a standard formula. At present, researchers have carried out multi-modal and cross-modal retrieval research on the two most basic modalities of data commonly used in the medical field, medical images and text; the main methods are as follows. CCA is used to learn correlation matching between the image modality and the text modality, and cross-modal retrieval performance is improved by combining semantic matching. KCCA has been applied in a cross-modal correlation learning framework that uses hyperlink information to improve the performance of the correlation learning model. A deep canonical correlation analysis model, DCCA, learns nonlinear mappings of two sets of media data based on maximal correlation through deep neural networks, so that correlated feature representations of different media have strong consistency in a common space. A three-view kernel CCA method introduces third high-level semantic view information so that text and images with the same semantics aggregate well in a common space.
Text-based and content-based medical image retrieval are both single-modality retrieval: they can only retrieve modal data by relying on the semantic information, or even annotation information, of a single modality, and cannot fully exploit the latent semantic information shared between different modalities.
The invention provides a medical information cross-modal hash coding learning method based on a generative adversarial network to solve these problems.
Disclosure of Invention
The invention provides a medical information cross-modal hash code learning method based on a generative adversarial network, carrying out cross-modal hash retrieval research on lung nodule images and the text descriptions of the corresponding pathological information. The invention adopts a generative adversarial network to learn hash codes for chest CT images and text, and constrains the learned hash codes through a semantic similarity matrix, finally learning accurate hash codes and establishing semantic association between the two modalities.
The technical scheme of the invention is as follows: a medical information cross-modal hash code learning method based on a generative adversarial network comprises the following specific steps:
Step1, extracting features from the chest CT image-text data; first, the chest CT image is preprocessed and ROI image blocks are cut out, and then image features and text features are extracted from the ROI image blocks and the chest CT image-text data through a CMSFF model and a bag-of-words model;
Step2, constructing the constraint condition of the discriminator; the hash-code-learning discriminator submodule receives two inputs at the same time, namely the image feature vector and the text feature vector from the previous submodule, where the image features and the text features serve as the real data and the generated data respectively, and the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
Step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continuously judges whether an input sample is real data or generated data produced by the generator, and feeds the judgment result back to the generator, prompting the generator to continuously adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
Step4, learning hash codes; first, a sign operation is applied to the extracted continuous sample features to obtain a set of discrete values, yielding the feature matrices corresponding to the image and the text; then a similarity matrix constrains the codes so that hash codes of different modalities of the same object are as close as possible and hash codes of different objects are as different as possible;
Step5, training and optimizing the network parameters; in the hash code learning process, the feature generators and the discriminator are iteratively optimized in turn; the generator parameters θ_{p,t}, the discriminator parameters θ_D and the binary coding parameter B are optimized during model training; hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained for the cross-modal retrieval system;
Step6, retrieving the corresponding chest CT image-text information according to the pathological text or ROI image information, realizing cross-modal retrieval.
Further, the specific steps of Step1 are as follows:
Step1.1, data preprocessing; for the image data set, to avoid the pixel loss caused by directly compressing images from 512 × 512 to 224 × 224, the original CT image is cropped instead; ROI image blocks of size R ∈ {16 × 16, 32 × 32, 64 × 64, 128 × 128} are cut out according to the diameter of the lung nodule on each slice; for the text data set, radiologists generally use fixed words to describe the pathological information of lung nodules, and these words correspond to different pathological levels; because the order of the words does not matter, no text preprocessing is needed, and text features are extracted directly with the bag-of-words model;
Step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 slices separately, and the feature information extracted from different slices of the same nodule is fused to make up for the incomplete expression of feature information in a single slice and ultimately improve the feature expression capability for local lung nodules; the input of the model is three consecutive ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
Step1.3, extracting features of the lung nodule pathological information text; the pathological description words y_j of a lung nodule are represented by the bag-of-words model as a vector f_j; the bag-of-words vector is input into a multi-layer perceptron network formed by two fully connected layers fc1 and fc2, where fc1 has 4096 units and the number of units of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
Further, the specific steps of Step2 are as follows:
Step2.1, the similarity matrix supervises hash code generation in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated based on the pathological information of lung nodules in the chest; when constructing the similarity matrix, an input triple is taken as one sample; because each sample corresponds to pathological information of 9 categories, the labels of the 9 categories of each sample are counted, giving 32 items of category label information;
Step2.2, the one-hot label information of each sample forms a 0/1 vector L_i; if the k-th position of L_i is 0, L_i does not carry that label information, and otherwise it does, where the length m of L_i is 32; if the number of samples is n, the label matrix LAll_{n×m} of the samples is constructed, and the similarity matrix S is then obtained by S = (LAll × LAll^T > 0), where the size of S is n × n;
Step2.3, the discriminator is constrained with the obtained similarity matrix S to supervise the accuracy of the hash codes obtained in subsequent steps.
Further, the specific steps of Step3 are as follows:
Step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator separately; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the image feature information serves as the real training data F_p and the text feature vector serves as the generated feature F_g of the generator;
Step3.2, F_p and F_g obtained in Step3.1 are the inputs of the discriminator, which judges whether an input sample is real data and then feeds the discrimination result back to the generator; according to the judgment result, the generator adjusts its own parameters by minimizing the loss function so as to learn the probability distribution of the real data; the loss function is: L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p));
Step3.3, in particular, a discrimination threshold g is set in the discriminator; when the generated features are still not optimal after g discriminator passes, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed on the output of the generator again.
Further, the specific steps of Step4 are as follows:
Step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped into a common space so that hash codes of different modalities of the same object are as similar as possible and hash codes of different objects are as different as possible. It is therefore common practice to apply a sign operation to the extracted continuous sample features to obtain a set of discrete values, for example the elementwise sign function sign(x) = +1 for x > 0 and −1 otherwise; a Sigmoid operation is then performed on this set of discrete values to obtain the binary hash code, denoted H;
Step4.2, F_p denotes the feature vectors of the extracted ROI image blocks, F_t denotes the feature vectors of the learned text, and F_g denotes the generated feature vector of the generator, where F_g = F_t; the cosine similarity Ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as

Ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖)

Let H_p and H_t be the hash codes generated from the image features F_p and the text features F_t respectively; the similarity Φ_ij between them is computed in the same way.
Step4.3, the loss function of the generative model is constructed from the cross-entropy loss function (the full expression is rendered as an image in the original and is not reproduced here); in it, S is the similarity matrix, α, λ and δ are hyperparameters of the model training process, B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back-propagation is performed through this loss function, the network weights are updated, and a new hash code H is obtained through Step4.1.
Further, in Step 5:
In the neural network, an alternating optimization strategy is adopted, i.e. two of the parameters are fixed each time while the remaining one is optimized by stochastic gradient descent. For example, while updating θ_{p,t}, the parameters θ_D and B are fixed and can therefore be treated as constants, and the parameters are updated by back-propagation from the loss function in Step4. Hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained and can be used for the cross-modal retrieval system.
The invention has the beneficial effects that:
1. The invention preprocesses the chest CT image and extracts image features with a multi-level second-order fusion feature extraction method. Because the position and size of a lung nodule in a chest CT image follow no fixed rule, the nodule is cropped to a corresponding size according to its annotated position during data preprocessing, so that the high-level semantic information of the lung nodule is extracted more accurately and the influence of other organs in the lung on lung nodule feature extraction is reduced. A multi-level second-order fusion feature extraction method is then adopted to extract more complete feature information of the three-dimensional lung nodule on the basis of single-slice fine-grained lung nodule features.
2. Semantic association between chest CT image-text is achieved. The extracted lung nodule image feature information and the feature information of the corresponding text are mapped into a Hamming space, and the obtained hash codes are constrained by a similarity matrix constructed from the class labels of the data samples. Experiments show that the hash code generation model obtained through supervised training achieves better accuracy in cross-modal retrieval.
In summary, the invention provides a medical information cross-modal hash coding learning method based on a generative adversarial network. To cope with the huge data volume of chest CT images, a deep-hashing-based method is adopted to learn hash codes of the different modalities, and semantic association between the two modalities is realized in Hamming space. The feasibility of the method is verified through experiments, and a hash code database is constructed from the hash codes learned in the experiments. Finally, a cross-modal retrieval test between lung nodules and their text is carried out with the trained hash code learning model, and the retrieval results show that the cross-modal retrieval method adopted herein for chest CT image-text is feasible.
Drawings
FIG. 1 is a diagram of the hash code learning model based on a generative adversarial network according to the present invention;
FIG. 2 is a cross-modality chest CT image-text retrieval process according to the present invention;
FIG. 3 is a sample cut of different lung nodules;
FIG. 4 is a diagram of ROI image query and search results in accordance with the present invention;
FIG. 5 is a diagram of the pathological text query and search results of the present invention.
Detailed Description
Example 1: as shown in figs. 1 to 4, a medical information cross-modal hash coding learning method based on a generative adversarial network comprises the following specific steps:
Step1, extracting features from the chest CT image-text data; first, the chest CT image is preprocessed and ROI image blocks are cut out, and then image features and text features are extracted from the ROI image blocks and the chest CT image-text data through a CMSFF model and a bag-of-words model;
further, the specific steps of Step1 are as follows:
Step1.1, data preprocessing; for the image data set, to avoid the pixel loss caused by directly compressing images from 512 × 512 to 224 × 224, the original CT image is cropped instead; ROI image blocks of size R ∈ {16 × 16, 32 × 32, 64 × 64, 128 × 128} are cut out according to the diameter of the lung nodule on each slice, as shown in fig. 3; the left side of fig. 3 shows ROI image blocks cut out of the original CT image, and the right side shows the blocks of different sizes expanded to 224 × 224; for the text data set, radiologists usually use fixed words to describe the pathological information of lung nodules, and these words correspond to different pathological levels; because the order of the words does not need to be considered, no text preprocessing is needed, and text features are extracted directly with the bag-of-words model;
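The cropping rule of Step1.1 can be sketched as follows. The function names, the choice of the smallest candidate size that covers the nodule diameter, and the clamping of the crop window to the slice boundary are all illustrative assumptions; the patent only specifies the candidate sizes R.

```python
import numpy as np

ROI_SIZES = (16, 32, 64, 128)  # candidate square ROI sizes R from the patent

def roi_size_for(diameter_px):
    """Pick the smallest candidate ROI size covering the nodule diameter (assumption)."""
    for s in ROI_SIZES:
        if diameter_px <= s:
            return s
    return ROI_SIZES[-1]

def crop_roi(ct_slice, center, diameter_px):
    """Crop a square ROI block around the annotated nodule center on a 512x512 slice."""
    s = roi_size_for(diameter_px)
    half = s // 2
    # Clamp the window so the crop stays inside the slice (assumption).
    r0 = min(max(center[0] - half, 0), ct_slice.shape[0] - s)
    c0 = min(max(center[1] - half, 0), ct_slice.shape[1] - s)
    return ct_slice[r0:r0 + s, c0:c0 + s]
```

In this sketch, the resulting block would then be resized to 224 × 224 for the CMSFF input, as the figure description above indicates.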
Step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 slices separately, and the feature information extracted from different slices of the same nodule is fused to make up for the incomplete expression of feature information in a single slice and ultimately improve the feature expression capability for local lung nodules; the input of the model is three consecutive ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
Step1.3, extracting features of the lung nodule pathological information text; the pathological description words y_j of a lung nodule are represented by the bag-of-words model as a vector f_j; the bag-of-words vector is input into a multi-layer perceptron network formed by two fully connected layers fc1 and fc2, where fc1 has 4096 units and the number of units of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
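The text feature generator of Step1.3 can be sketched as below. The vocabulary terms, the ReLU activation, the random weight initialization and h = 64 are illustrative assumptions; the patent fixes only the bag-of-words input, the 4096-unit fc1 and the h-unit fc2.

```python
import numpy as np

VOCAB = ["spiculation", "lobulation", "calcification", "subtlety", "margin"]  # illustrative terms

def bag_of_words(tokens, vocab=VOCAB):
    """Count occurrences of each fixed vocabulary word in a pathology description."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab.index(t)] += 1
    return v

def text_feature(bow, W1, W2):
    """Two fully connected layers: fc1 (4096 units, ReLU assumed) then fc2 (h units)."""
    h1 = np.maximum(W1 @ bow, 0.0)  # fc1
    return W2 @ h1                   # fc2: h-dimensional text feature

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4096, len(VOCAB))) * 0.01
W2 = rng.normal(size=(64, 4096)) * 0.01  # h = 64, the hash code length used in the experiments
```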
Step2, constructing the constraint condition of the discriminator; the hash-code-learning discriminator submodule receives two inputs at the same time, namely the image feature vector and the text feature vector from the previous submodule, where the image features and the text features serve as the real data and the generated data respectively, and the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
further, the specific steps of Step2 are as follows:
Step2.1, the similarity matrix supervises hash code generation in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated based on the pathological information of lung nodules in the chest; when constructing the similarity matrix, an input triple is taken as one sample; because each sample corresponds to pathological information of 9 categories, the labels of the 9 categories of each sample are counted, giving 32 items of category label information;
Step2.2, the one-hot label information of each sample forms a 0/1 vector L_i; if the k-th position of L_i is 0, L_i does not carry that label information, and otherwise it does, where the length m of L_i is 32; if the number of samples is n, the label matrix LAll_{n×m} of the samples is constructed, and the similarity matrix S is then obtained by S = (LAll × LAll^T > 0), where the size of S is n × n;
Step2.3, the discriminator is constrained with the obtained similarity matrix S to supervise the accuracy of the hash codes obtained in subsequent steps.
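The construction S = (LAll × LAll^T > 0) in Step2.2 can be sketched directly. The toy label matrix below uses m = 3 labels instead of the patent's m = 32, purely for illustration:

```python
import numpy as np

def similarity_matrix(labels):
    """Build the n x n similarity matrix S from the n x m 0/1 label matrix.

    S[i, j] is True when samples i and j share at least one category label,
    i.e. S = (LAll @ LAll.T > 0).
    """
    labels = np.asarray(labels)
    return labels @ labels.T > 0

L = np.array([[1, 0, 1],   # sample 0 carries labels 0 and 2
              [0, 1, 0],   # sample 1 carries label 1
              [1, 1, 0]])  # sample 2 carries labels 0 and 1
S = similarity_matrix(L)
```

Samples 0 and 1 share no label, so S[0, 1] is False; samples 0 and 2 share label 0, so S[0, 2] is True.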
Step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continuously judges whether an input sample is real data or generated data produced by the generator, and feeds the judgment result back to the generator, prompting the generator to continuously adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
further, the specific steps of Step3 are as follows:
Step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator separately; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the image feature information serves as the real training data F_p and the text feature vector serves as the generated feature F_g of the generator;
Step3.2, F_p and F_g obtained in Step3.1 are the inputs of the discriminator, which judges whether an input sample is real data and then feeds the discrimination result back to the generator; according to the judgment result, the generator adjusts its own parameters by minimizing the loss function so as to learn the probability distribution of the real data; the loss function is: L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p));
Step3.3, in particular, a discrimination threshold g is set in the discriminator; when the generated features are still not optimal after g discriminator passes, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed on the output of the generator again.
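The loss of Step3.2 can be sketched as follows; averaging over the feature components is an assumption, since the patent gives only the scalar form of L_D.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(f_p, f_g):
    """L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p)), averaged elementwise.

    f_p: image (real) features; f_g: generated text features.
    The loss is small when the discriminator scores real features high
    and generated features low.
    """
    return -np.mean(np.log(1.0 - sigmoid(f_g)) + np.log(sigmoid(f_p)))
```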
Step4, learning hash codes; first, a sign operation is applied to the extracted continuous sample features to obtain a set of discrete values, yielding the feature matrices corresponding to the image and the text; then a similarity matrix constrains the codes so that hash codes of different modalities of the same object are as close as possible and hash codes of different objects are as different as possible;
further, the specific steps of Step4 are as follows:
Step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped into a common space so that hash codes of different modalities of the same object are as similar as possible and hash codes of different objects are as different as possible. It is therefore common practice to apply a sign operation to the extracted continuous sample features to obtain a set of discrete values, for example the elementwise sign function sign(x) = +1 for x > 0 and −1 otherwise; a Sigmoid operation is then performed on this set of discrete values to obtain the binary hash code, denoted H;
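A minimal sketch of the binarization in Step4.1. The translated text's composition of sign and Sigmoid is ambiguous, so the mapping of the discrete sign values to {0, 1} below, and the tie-break at zero, are assumptions for illustration.

```python
import numpy as np

def binarize(features):
    """Map continuous features to a 0/1 hash code.

    Step 1: elementwise sign() gives discrete values in {-1, 0, +1}.
    Step 2: squash {-1, +1} to {0, 1} (assumption; the patent mentions
    a Sigmoid at this point).
    """
    discrete = np.sign(features)
    discrete[discrete == 0] = 1  # break ties toward +1 (assumption)
    return ((discrete + 1) // 2).astype(int)
```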
Step4.2, F_p denotes the feature vectors of the extracted ROI image blocks, F_t denotes the feature vectors of the learned text, and F_g denotes the generated feature vector of the generator, where F_g = F_t; the cosine similarity Ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as

Ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖)

Let H_p and H_t be the hash codes generated from the image features F_p and the text features F_t respectively; the similarity Φ_ij between them is computed in the same way.
Step4.3, the loss function of the generative model is constructed from the cross-entropy loss function (the full expression is rendered as an image in the original and is not reproduced here); in it, S is the similarity matrix, α, λ and δ are hyperparameters of the model training process, B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back-propagation is performed through this loss function, the network weights are updated, and a new hash code H is obtained through Step4.1.
In the experimental phase, to reduce the amount of back-propagation computation, α is set to 1, and the selections of λ and δ are compared in table 1, where the hash code length is 64 bits in each case.
TABLE 1 lambda, delta parameter selection comparison experiment table
(The table content is rendered as an image in the original and is not reproduced here.)
Step5, training and optimizing the network parameters; in the hash code learning process, the feature generators and the discriminator are iteratively optimized in turn; the generator parameters θ_{p,t}, the discriminator parameters θ_D and the binary coding parameter B are optimized during model training; hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained for the cross-modal retrieval system;
Step6, retrieving the corresponding chest CT image-text information according to the pathological text or ROI image information, realizing cross-modal retrieval.
Specifically, hash codes of the different modalities are learned through Step5 and stored in the hash code database, and the trained model is obtained for the cross-modal retrieval system. Any group of ROI image blocks can be input, its corresponding hash code is obtained through the model, and the best matches are then retrieved from the hash code database; the retrieval process is shown in fig. 2.
The specific steps of Step6 are as follows:
for image modalities, a set of ROI image blocks x ═ s1,s2,s3) Inputting the image into a retrieval system, performing feature extraction on the image by calling model parameters, and the like to finally obtain Hash code expression of the image, such as formula Cx=h(x)(f(x)(x;θp,θD) Shown in (c).
Query data for a given image is handled through the GANHL retrieval model, and approximate nearest neighbor search is carried out through Hamming ranking and a hash lookup strategy. In the invention, the hash lookup strategy with radius r returns the 2r most similar retrieval results from the database of the other modality. The retrieval results are shown in fig. 4.
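The Hamming ranking and radius-r hash lookup described above can be sketched as follows; the function names and the simple linear scan over the database are illustrative, not the patent's implementation.

```python
import numpy as np

def hamming_distance(a, b):
    """Number of differing bits between two 0/1 hash codes."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def hash_lookup(query_code, database, r):
    """Return database indices within Hamming radius r, nearest first."""
    hits = [(hamming_distance(query_code, code), i)
            for i, code in enumerate(database)]
    return [i for d, i in sorted(hits) if d <= r]
```

For example, with a 4-bit toy database, a query code within distance 2 of the first two entries would return those two indices, ranked by distance.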
The accuracy and recall obtained through hash lookup are compared in table 2. The hash lookup range is (0, 8), and the hash code length is 64 bits. The retrieval results show the retrieval accuracy and recall under different search radii.
TABLE 2 comparison of P, R, F values for different methods
(The table content is rendered as an image in the original and is not reproduced here.)
Example 2: as shown in figs. 1 to 5, a medical information cross-modal hash coding learning method based on a generative adversarial network is the same as that in embodiment 1, except that:
and Step6, learning out hash codes of different modes through Step5, storing the hash codes in a hash code database, and obtaining a trained model for the cross-mode retrieval system. Inputting text data of any group of lung nodules, obtaining corresponding hash codes through cross-modal retrieval, and further retrieving optimal results in a hash code database, wherein the retrieval process is shown in fig. 2.
The specific steps of Step6 are as follows:
for the text mode, let its input be y, learn its hash code through the GANHL model, as formula Cy=h(y)(f(y)(y;θt,θD) Shown in (c).
Query data for a given text is handled through the GANHL retrieval model, and approximate nearest neighbor search is carried out through Hamming ranking and a hash lookup strategy. In the invention, the hash lookup strategy with radius r returns the 2r most similar retrieval results from the database of the other modality. The retrieval results are shown in fig. 5.
The accuracy and recall obtained through hash lookup are compared in table 3. The hash lookup range is (0, 8), and the hash code length is 64 bits. The retrieval results show the retrieval accuracy and recall under different search radii.
TABLE 3 comparison of P, R, F values for different methods
(Table 3 is reproduced as an image in the original publication.)
While the present invention has been described in detail with reference to the embodiments shown in the drawings, it is not limited to these embodiments; various changes can be made by those of ordinary skill in the art without departing from the spirit of the invention.

Claims (6)

1. A medical information cross-modal hash coding learning method based on a generative adversarial network, characterized by comprising the following specific steps:
step1, extracting features of the chest CT image-text data; firstly, the chest CT image is preprocessed and ROI image blocks are cut out; then image features are extracted from the ROI image blocks through a CMSFF model, and text features are extracted from the chest CT text data through a bag-of-words model;
step2, constructing the constraint condition of the discriminator; the discriminator submodule for learning hash codes receives two inputs simultaneously, namely the image feature vector and the text feature vector from the previous submodule, where the image features and the text features serve as real data and generated data respectively; the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continuously judges whether an input sample is real data or data generated by the generator, and feeds the judgment result back to the generator, prompting the generator to continuously adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
step4, learning hash codes; firstly, the extracted continuous sample features are converted into a group of discrete values through a sign operation, giving the feature matrices corresponding to the image and the text; then a similarity matrix constrains the codes so that hash codes of different modalities of the same object are as close as possible, and hash codes of different objects are as different as possible;
step5, training and optimizing the network parameters; in the hash code learning process, the feature generators and the discriminator are iteratively optimized in turn; the generator parameters θ_p and θ_t, the discriminator parameter θ_D and the binary coding parameter B are optimized during model training; hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and a trained model is obtained for the cross-modal retrieval system;
step6, retrieving the corresponding chest CT image-text information according to the pathological text or ROI image information, realizing cross-modal retrieval.
2. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step1 are as follows:
step1.1, data preprocessing; for the image data set, to avoid the pixel loss caused by directly compressing from 512 × 512 to 224 × 224, the original CT image is cropped instead: ROI image blocks of size R ∈ {16 × 16, 32 × 32, 64 × 64, 128 × 128} are cut out according to the diameters of the lung nodules on the slices; for the text data set, radiologists generally use fixed terms to describe the pathological information of lung nodules, the terms corresponding to different pathological levels; because the order of the terms need not be considered, no text preprocessing is required, and the bag-of-words model is used directly to extract text features;
step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 slice levels respectively, and the feature information extracted from different levels of the same nodule is fused to compensate for the incomplete expression of the feature information in a single slice, ultimately improving the feature expression capability for the local lung nodule; the input of the model is three consecutive different ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
step1.3, extracting features of the lung nodule pathological text; the pathological description of a lung nodule is denoted y_j, and the bag-of-words model represents it as a vector f_j; the bag-of-words vector is input into a multilayer perceptron network consisting of two fully connected layers fc1 and fc2, where fc1 has 4096 units and the number of units of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
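The text branch of Step1.3 (bag-of-words vector fed into a two-layer perceptron fc1/fc2) can be sketched as follows; the vocabulary and all weights are hypothetical placeholders for illustration, not the trained network:

```python
import numpy as np

VOCAB = ["spiculation", "lobulation", "calcification", "solid", "margin"]  # illustrative terms

def bag_of_words(report, vocab=VOCAB):
    # Count occurrences of each fixed pathological term in the report.
    words = report.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=np.float32)

def text_feature_generator(x, w1, b1, w2, b2):
    # fc1 (4096 units, ReLU) followed by fc2 projecting to hash length h.
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

h = 64  # hash code length
rng = np.random.default_rng(0)
w1 = 0.01 * rng.standard_normal((len(VOCAB), 4096)); b1 = np.zeros(4096)
w2 = 0.01 * rng.standard_normal((4096, h));          b2 = np.zeros(h)

v = bag_of_words("solid nodule with spiculation and lobulation")
print(text_feature_generator(v, w1, b1, w2, b2).shape)  # (64,)
```

Because the radiological terms are fixed and order-independent, the raw count vector is sufficient input, matching the claim that no further text preprocessing is needed.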
3. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step2 are as follows:
step2.1, the similarity matrix supervises the generation of hash codes in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated on the basis of the pathological information of the chest lung nodules; when constructing the similarity matrix, an input triplet is taken as a sample; because each sample corresponds to pathological information of 9 categories, the labels of the 9 categories of each sample are counted to obtain 32 items of category label information;
step2.2, the one-hot annotation information of each sample forms a 0-1 vector L_i; if the k-th position of L_i is 0, L_i does not carry the k-th label information, otherwise it does, where the length m of L_i is 32; if the number of samples is n, a label matrix LAll of size n × m is constructed from all samples; the similarity matrix S can then be obtained by S = (LAll × LAll^T) > 0, where S is of size n × n;
step2.3, the obtained similarity matrix S constrains the discriminator and supervises the accuracy of the hash codes obtained in subsequent steps.
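The construction S = (LAll × LAll^T) > 0 of Step2.2 can be checked with a small sketch; the 3-sample label matrix below is made up for illustration:

```python
import numpy as np

def similarity_matrix(label_matrix):
    # S[i, j] is True when samples i and j share at least one label,
    # i.e. the dot product of their 0-1 label vectors is positive.
    return (label_matrix @ label_matrix.T) > 0

# toy label matrix LAll (n = 3 samples, m = 3 labels)
L_all = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 0]])
print(similarity_matrix(L_all).astype(int))
# samples 0 and 1 share no label, so S[0, 1] = 0
```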
4. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step3 are as follows:
step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator respectively; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the feature information of the image is used as the real training data F_p and the feature vector of the text is used as the generated feature F_g of the generator;
step3.2, the F_p and F_g obtained in Step3.1 are taken as inputs of the discriminator; the discriminator judges whether an input sample is real data and feeds the judgment result back to the generator; according to the judgment result, the generator adjusts its parameters by minimizing the loss function so as to learn the probability distribution of the real data, where the loss function is: L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p));
step3.3, in particular, a discrimination threshold g is set in the discriminator; when the generated features are still not optimal after g passes of the discriminator, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed again on the output of the generator.
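The discriminator loss L_D of Step3.2 can be written out as follows for scalar discriminator scores; this is a sketch, whereas the real model applies it to the feature vectors produced by the networks:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(f_g, f_p):
    # L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p)):
    # low when generated features F_g score near 0 and real features F_p score near 1.
    return float(np.mean(-(np.log(1.0 - sigmoid(f_g)) + np.log(sigmoid(f_p)))))

good = discriminator_loss(np.array([-5.0]), np.array([5.0]))  # discriminator separates the modalities
bad = discriminator_loss(np.array([5.0]), np.array([-5.0]))   # discriminator is fooled
print(good < bad)  # True
```

Minimizing this loss for the discriminator while the text generator tries to raise sigmoid(F_g) is the adversarial game of Step3.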
5. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step4 are as follows:
step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped to a common space so that hash codes of different modalities of the same object are as similar as possible and hash codes of different objects are as different as possible; the common practice is therefore to apply a sign operation to the extracted continuous sample features to obtain a group of discrete values, for example,
sign(x) = 1 for x ≥ 0, and sign(x) = -1 for x < 0;
a Sigmoid operation is then performed on the group of discrete values to obtain the binary hash code, denoted H;
step4.2, F_p denotes the feature vectors of the extracted ROI image blocks, F_t denotes the feature vectors of the learned text, and F_g denotes the generated feature vector of the generator, where F_g = F_t; the cosine similarity Ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as:
Ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖);
let H_p and H_t be the hash codes generated from the image features F_p and the text features F_t respectively; their similarity Φ_ij is solved in the same way;
Step4.3, constructing the loss function of the generative model by the cross entropy loss function as follows:
(The loss function is given as an image in the original publication.)
wherein S is the similarity matrix, α, λ and δ are hyper-parameters in the model training process, B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back propagation is performed through this loss function to update the network weights, and a new hash code H is obtained through Step4.1.
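The sign operation of Step4.1, and the requirement that codes of the same object agree across modalities, can be illustrated as follows; the feature values are invented for the example:

```python
import numpy as np

def binarize(features):
    # sign operation: map continuous features to discrete codes in {-1, +1}
    return np.where(features >= 0, 1, -1)

f_image = np.array([0.7, -1.2, 0.1, -0.3])  # image-modality features of one object
f_text = np.array([0.9, -0.8, 0.2, -0.5])   # text-modality features of the same object
h_image, h_text = binarize(f_image), binarize(f_text)
print(h_image.tolist(), (h_image == h_text).all())  # codes of the same object agree
```

The similarity-matrix constraint in the loss pushes trained features toward exactly this agreement, so that the discrete codes of matching image-text pairs coincide bit for bit.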
6. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein in Step5:
in the neural network, an alternating optimization strategy is adopted: two of the three parameter groups are fixed at a time, and the remaining one is optimized by stochastic gradient descent.
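The alternating optimization of this claim (fix some parameters, take a gradient step on the rest, and cycle) can be illustrated on a toy objective; the function and step size below are invented for the sketch:

```python
def alternating_minimize(steps=100, lr=0.1):
    # toy analogue: minimize f(x, y) = (x - y)^2 + (y - 3)^2
    # by alternately fixing one variable and stepping on the other
    x, y = 0.0, 0.0
    for _ in range(steps):
        x -= lr * 2.0 * (x - y)                      # update x with y fixed
        y -= lr * (2.0 * (y - x) + 2.0 * (y - 3.0))  # update y with x fixed
    return x, y

x, y = alternating_minimize()
print(round(x, 2), round(y, 2))  # both approach the minimizer (3, 3)
```

In the GANHL model the same pattern cycles over θ_p and θ_t, then θ_D, then the binary coding B.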
CN201910490562.1A 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network Active CN111127385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910490562.1A CN111127385B (en) 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910490562.1A CN111127385B (en) 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network

Publications (2)

Publication Number Publication Date
CN111127385A true CN111127385A (en) 2020-05-08
CN111127385B CN111127385B (en) 2023-01-13

Family

ID=70496015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910490562.1A Active CN111127385B (en) 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network

Country Status (1)

Country Link
CN (1) CN111127385B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111651561A (en) * 2020-06-05 2020-09-11 拾音智能科技有限公司 High-quality difficult sample generation method
CN112085714A (en) * 2020-08-31 2020-12-15 广州视源电子科技股份有限公司 Pulmonary nodule detection method, model training method, device, equipment and medium
CN112115317A (en) * 2020-08-20 2020-12-22 鹏城实验室 Targeted attack method for deep hash retrieval and terminal device
CN112380216A (en) * 2020-11-17 2021-02-19 北京融七牛信息技术有限公司 Automatic feature generation method based on intersection
CN113204522A (en) * 2021-07-05 2021-08-03 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN113270199A (en) * 2021-04-30 2021-08-17 贵州师范大学 Medical cross-modal multi-scale fusion class guidance hash method and system thereof
CN113658683A (en) * 2021-08-05 2021-11-16 重庆金山医疗技术研究院有限公司 Disease diagnosis system and data recommendation method
CN113836901A (en) * 2021-09-14 2021-12-24 灵犀量子(北京)医疗科技有限公司 Chinese and English medicine synonym data cleaning method and system
CN114972929A (en) * 2022-07-29 2022-08-30 中国医学科学院医学信息研究所 Pre-training method and device for medical multi-modal model
CN116431847A (en) * 2023-06-14 2023-07-14 北京邮电大学 Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure
CN113270199B (en) * 2021-04-30 2024-04-26 贵州师范大学 Medical cross-mode multi-scale fusion class guide hash method and system thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078899A1 (en) * 2001-08-13 2003-04-24 Xerox Corporation Fuzzy text categorizer
WO2013073622A1 (en) * 2011-11-18 2013-05-23 日本電気株式会社 Local feature amount extraction device, local feature amount extraction method, and program
CN108596265A (en) * 2018-05-02 2018-09-28 中山大学 Model is generated based on text description information and the video for generating confrontation network
CN109299216A (en) * 2018-10-29 2019-02-01 山东师范大学 A kind of cross-module state Hash search method and system merging supervision message
CN109299342A (en) * 2018-11-30 2019-02-01 武汉大学 A kind of cross-module state search method based on circulation production confrontation network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078899A1 (en) * 2001-08-13 2003-04-24 Xerox Corporation Fuzzy text categorizer
WO2013073622A1 (en) * 2011-11-18 2013-05-23 日本電気株式会社 Local feature amount extraction device, local feature amount extraction method, and program
CN108596265A (en) * 2018-05-02 2018-09-28 中山大学 Model is generated based on text description information and the video for generating confrontation network
CN109299216A (en) * 2018-10-29 2019-02-01 山东师范大学 A kind of cross-module state Hash search method and system merging supervision message
CN109299342A (en) * 2018-11-30 2019-02-01 武汉大学 A kind of cross-module state search method based on circulation production confrontation network

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
ANUBHAV KUMAR: "An efficient text extraction algorithm in complex images", 《INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING》 *
刘星: "融合局部语义信息的多模态舆情分析模型", 《信息安全研究》 *
张璐: "基于对抗学习的跨模态检索方法研究进展", 《现代计算机》 *
朱治兰 等: "有监督鉴别哈希跨模态检索", 《计算机应用与软件》 *
李帷韬 等: "相似青梅品级半监督智能反馈认知方法研究", 《电子测量与仪器学报》 *
李维 等: "基于CNN多层面二阶特征融合的肺结节分类", 《计算机科学与探索》 *
杨海龙 等: "基于多区域中心加权卷积特征的图像检索", 《软件导刊》 *
温佩芝 等: "基于卷积神经网络改进的图像自动分割方法", 《计算机应用研究》 *
袁绍锋 等: "有条件生成对抗网络的IVUS图像内膜与中-外膜边界检测:", 《中国生物医学工程学报》 *
赵晓乐: "面向胸部CT图像—文本的跨模态哈希检索技术研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *
马春光 等: "生成式对抗网络图像增强研究综述", 《信息网络安全》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111651561A (en) * 2020-06-05 2020-09-11 拾音智能科技有限公司 High-quality difficult sample generation method
CN112115317A (en) * 2020-08-20 2020-12-22 鹏城实验室 Targeted attack method for deep hash retrieval and terminal device
CN112085714B (en) * 2020-08-31 2023-12-15 广州视源电子科技股份有限公司 Pulmonary nodule detection method, model training method, device, equipment and medium
CN112085714A (en) * 2020-08-31 2020-12-15 广州视源电子科技股份有限公司 Pulmonary nodule detection method, model training method, device, equipment and medium
CN112380216A (en) * 2020-11-17 2021-02-19 北京融七牛信息技术有限公司 Automatic feature generation method based on intersection
CN113270199A (en) * 2021-04-30 2021-08-17 贵州师范大学 Medical cross-modal multi-scale fusion class guidance hash method and system thereof
CN113270199B (en) * 2021-04-30 2024-04-26 贵州师范大学 Medical cross-mode multi-scale fusion class guide hash method and system thereof
CN113204522A (en) * 2021-07-05 2021-08-03 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN113204522B (en) * 2021-07-05 2021-09-24 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN113658683A (en) * 2021-08-05 2021-11-16 重庆金山医疗技术研究院有限公司 Disease diagnosis system and data recommendation method
CN113836901B (en) * 2021-09-14 2023-11-14 灵犀量子(北京)医疗科技有限公司 Method and system for cleaning Chinese and English medical synonym data
CN113836901A (en) * 2021-09-14 2021-12-24 灵犀量子(北京)医疗科技有限公司 Chinese and English medicine synonym data cleaning method and system
CN114972929A (en) * 2022-07-29 2022-08-30 中国医学科学院医学信息研究所 Pre-training method and device for medical multi-modal model
CN116431847A (en) * 2023-06-14 2023-07-14 北京邮电大学 Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure
CN116431847B (en) * 2023-06-14 2023-11-14 北京邮电大学 Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure

Also Published As

Publication number Publication date
CN111127385B (en) 2023-01-13

Similar Documents

Publication Publication Date Title
CN111127385B (en) Medical information cross-modal Hash coding learning method based on generative countermeasure network
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
EP3665703B1 (en) Computer-aided diagnostics using deep neural networks
CN109918528A (en) A kind of compact Hash code learning method based on semanteme protection
CN113343125B (en) Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN112949740B (en) Small sample image classification method based on multilevel measurement
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
Buvana et al. Content-based image retrieval based on hybrid feature extraction and feature selection technique pigeon inspired based optimization
CN111242948A (en) Image processing method, image processing device, model training method, model training device, image processing equipment and storage medium
CN110046660A (en) A kind of product quantization method based on semi-supervised learning
CN111524140B (en) Medical image semantic segmentation method based on CNN and random forest method
Al Zorgani et al. Comparative study of image classification using machine learning algorithms
CN116469561A (en) Breast cancer survival prediction method based on deep learning
Lin et al. A fusion-based convolutional fuzzy neural network for lung cancer classification
CN117371511A (en) Training method, device, equipment and storage medium for image classification model
CN115797795B (en) Remote sensing image question-answer type retrieval system and method based on reinforcement learning
CN111144453A (en) Method and equipment for constructing multi-model fusion calculation model and method and equipment for identifying website data
CN116012903A (en) Automatic labeling method and system for facial expressions
CN114168780A (en) Multimodal data processing method, electronic device, and storage medium
Goundar Improved deep learning model based on integrated convolutional neural networks and transfer learning for shoeprint image classification
Sohail et al. Selection of optimal texture descriptors for retrieving ultrasound medical images
CN117171413B (en) Data processing system and method for digital collection management
Wang et al. Image Classification Based on Improved Unsupervised Clustering Algorithm
Darsana et al. DICOM Image Retrieval Based on Neural Network Classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant