CN111127385A - Medical information cross-modal hash coding learning method based on generative adversarial network - Google Patents

Medical information cross-modal hash coding learning method based on generative adversarial network

Info

Publication number
CN111127385A
CN111127385A
Authority
CN
China
Prior art keywords
text
image
feature
hash
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910490562.1A
Other languages
Chinese (zh)
Other versions
CN111127385B (en)
Inventor
黄青松
贺周雨
赵晓乐
刘利军
冯旭鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201910490562.1A priority Critical patent/CN111127385B/en
Publication of CN111127385A publication Critical patent/CN111127385A/en
Application granted granted Critical
Publication of CN111127385B publication Critical patent/CN111127385B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • G06T2207/30064Lung nodule

Abstract

The invention relates to a medical information cross-modal hash code learning method based on a generative adversarial network, and belongs to the technical field of medical information processing and information retrieval. The invention adopts a generative adversarial network to learn hash codes for chest CT images and text, and constrains the learned hash codes through a semantic similarity matrix, finally learning accurate hash codes and establishing semantic association between the two modalities. On the basis of single-slice fine-grained lung nodule features, the invention extracts more complete feature information of the three-dimensional lung nodule, and the hash code generation model obtained through supervised training achieves better accuracy in cross-modal retrieval.

Description

Medical information cross-modal hash coding learning method based on generative adversarial network
Technical Field
The invention relates to a medical information cross-modal hash code learning method based on a generative adversarial network, and belongs to the technical field of medical information processing and information retrieval.
Background
Research on computer-aided diagnosis through deep learning to solve problems in the medical field has attracted more and more researchers and doctors, and lung cancer is currently one of the most widely studied diseases. Early lung cancer is detected by radiologists by screening for nodules in chest CT images, and the detection result is stored as text that serves as the diagnostic basis for clinicians. Early approaches diagnosed the malignancy of a lung nodule mainly by setting a threshold, observing the change in nodule volume over time, and finally evaluating the nodule growth rate with a standard formula. At present, researchers have carried out multi-modal and cross-modal retrieval research on the two most basic modalities of data commonly used in the medical field, medical images and text; the main methods are as follows. CCA is used to learn correlation matching between the image modality and the text modality, and cross-modal retrieval performance is improved by combining semantic matching. KCCA has been applied in a cross-modal correlation learning framework that uses hyperlink information to improve the performance of the correlation learning model. A deep canonical correlation analysis model, DCCA, learns nonlinear mappings of two sets of media data based on maximal correlation through deep neural networks, so that correlated feature representations of different media have strong consistency in a common space. A three-view kernel CCA method introduces third high-level semantic view information so that text and images with the same semantics aggregate well in a common space.
Text-based and content-based medical image retrieval are both single-modality retrieval: they can only retrieve modal data by relying on the semantic information, or even annotation information, of a single modality, and cannot fully exploit the latent semantic information shared between different modalities.
The invention provides a medical information cross-modal hash coding learning method based on a generative adversarial network to solve these problems.
Disclosure of Invention
The invention provides a medical information cross-modal hash code learning method based on a generative adversarial network, carrying out cross-modal hash retrieval research on lung nodule images and the text descriptions of the corresponding pathological information. The invention adopts a generative adversarial network to learn hash codes for chest CT images and text, and constrains the learned hash codes through a semantic similarity matrix, finally learning accurate hash codes and establishing semantic association between the two modalities.
The technical scheme of the invention is as follows: a medical information cross-modal hash code learning method based on a generative adversarial network comprises the following specific steps:
Step1, extracting features from the chest CT image-text data; first, the chest CT image is preprocessed and ROI image blocks are cut out, and then image features and text features are extracted from the ROI image blocks and the chest CT image-text data through a CMSFF model and a bag-of-words model;
Step2, constructing the constraint condition of the discriminator; the hash-code-learning discriminator submodule receives two inputs at the same time, namely the image feature vector and the text feature vector from the previous submodule, where the image features and the text features serve as the real data and the generated data respectively, and the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
Step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continuously judges whether an input sample is real data or generated data produced by the generator, and feeds the judgment result back to the generator, prompting the generator to continuously adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
Step4, learning hash codes; first, a sign operation is applied to the extracted continuous sample features to obtain a set of discrete values, yielding the feature matrices corresponding to the image and the text; then a similarity matrix constrains the codes so that hash codes of different modalities of the same object are as close as possible and hash codes of different objects are as different as possible;
Step5, training and optimizing the network parameters; in the hash code learning process, the feature generators and the discriminator are iteratively optimized in turn; the generator parameters θ_{p,t}, the discriminator parameters θ_D and the binary coding parameter B are optimized during model training; hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained for the cross-modal retrieval system;
Step6, retrieving the corresponding chest CT image-text information according to the pathological text or ROI image information, realizing cross-modal retrieval.
Further, the specific steps of Step1 are as follows:
Step1.1, data preprocessing; for the image data set, to avoid the pixel loss caused by directly compressing images from 512 × 512 to 224 × 224, the original CT image is cropped instead; ROI image blocks of size R ∈ {16 × 16, 32 × 32, 64 × 64, 128 × 128} are cut out according to the diameter of the lung nodule on each slice; for the text data set, radiologists generally use fixed words to describe the pathological information of lung nodules, and these words correspond to different pathological levels; because the order of the words does not matter, no text preprocessing is needed, and text features are extracted directly with the bag-of-words model;
Step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 slices separately, and the feature information extracted from different slices of the same nodule is fused to make up for the incomplete expression of feature information in a single slice and ultimately improve the feature expression capability for local lung nodules; the input of the model is three consecutive ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
Step1.3, extracting features of the lung nodule pathological information text; the pathological description words y_j of a lung nodule are represented by the bag-of-words model as a vector f_j; the bag-of-words vector is input into a multi-layer perceptron network formed by two fully connected layers fc1 and fc2, where fc1 has 4096 units and the number of units of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
Further, the specific steps of Step2 are as follows:
Step2.1, the similarity matrix supervises hash code generation in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated based on the pathological information of lung nodules in the chest; when constructing the similarity matrix, an input triple is taken as one sample; because each sample corresponds to pathological information of 9 categories, the labels of the 9 categories of each sample are counted, giving 32 items of category label information;
Step2.2, the one-hot label information of each sample forms a 0/1 vector L_i; if the k-th position of L_i is 0, L_i does not carry that label information, and otherwise it does, where the length m of L_i is 32; if the number of samples is n, the label matrix LAll_{n×m} of the samples is constructed, and the similarity matrix S is then obtained by S = (LAll × LAll^T > 0), where the size of S is n × n;
Step2.3, the discriminator is constrained with the obtained similarity matrix S to supervise the accuracy of the hash codes obtained in subsequent steps.
Further, the specific steps of Step3 are as follows:
Step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator separately; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the image feature information serves as the real training data F_p and the text feature vector serves as the generated feature F_g of the generator;
Step3.2, F_p and F_g obtained in Step3.1 are the inputs of the discriminator, which judges whether an input sample is real data and then feeds the discrimination result back to the generator; according to the judgment result, the generator adjusts its own parameters by minimizing the loss function so as to learn the probability distribution of the real data; the loss function is: L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p));
Step3.3, in particular, a discrimination threshold g is set in the discriminator; when the generated features are still not optimal after g discriminator passes, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed on the output of the generator again.
Further, the specific steps of Step4 are as follows:
Step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped into a common space so that hash codes of different modalities of the same object are as similar as possible and hash codes of different objects are as different as possible. It is therefore common practice to apply a sign operation to the extracted continuous sample features to obtain a set of discrete values, for example the elementwise sign function sign(x) = +1 for x > 0 and −1 otherwise; a Sigmoid operation is then performed on this set of discrete values to obtain the binary hash code, denoted H;
Step4.2, F_p denotes the feature vectors of the extracted ROI image blocks, F_t denotes the feature vectors of the learned text, and F_g denotes the generated feature vector of the generator, where F_g = F_t; the cosine similarity Ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as

Ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖)

Let H_p and H_t be the hash codes generated from the image features F_p and the text features F_t respectively; the similarity Φ_ij between them is computed in the same way.
Step4.3, the loss function of the generative model is constructed from the cross-entropy loss function (the full expression is rendered as an image in the original and is not reproduced here); in it, S is the similarity matrix, α, λ and δ are hyperparameters of the model training process, B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back-propagation is performed through this loss function, the network weights are updated, and a new hash code H is obtained through Step4.1.
Further, in Step 5:
In the neural network, an alternating optimization strategy is adopted, i.e. two of the parameters are fixed each time while the remaining one is optimized by stochastic gradient descent. For example, while updating θ_{p,t}, the parameters θ_D and B are fixed and can therefore be treated as constants, and the parameters are updated by back-propagation from the loss function in Step4. Hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained and can be used for the cross-modal retrieval system.
The invention has the beneficial effects that:
1. The invention preprocesses the chest CT image and extracts image features with a multi-level second-order fusion feature extraction method. Because the position and size of a lung nodule in a chest CT image follow no fixed rule, the nodule is cropped to a corresponding size according to its annotated position during data preprocessing, so that the high-level semantic information of the lung nodule is extracted more accurately and the influence of other organs in the lung on lung nodule feature extraction is reduced. A multi-level second-order fusion feature extraction method is then adopted to extract more complete feature information of the three-dimensional lung nodule on the basis of single-slice fine-grained lung nodule features.
2. Semantic association between chest CT image-text is achieved. The extracted lung nodule image feature information and the feature information of the corresponding text are mapped into a Hamming space, and the obtained hash codes are constrained by a similarity matrix constructed from the class labels of the data samples. Experiments show that the hash code generation model obtained through supervised training achieves better accuracy in cross-modal retrieval.
In summary, the invention provides a medical information cross-modal hash coding learning method based on a generative adversarial network. To cope with the huge data volume of chest CT images, a deep-hashing-based method is adopted to learn hash codes of the different modalities, and semantic association between the two modalities is realized in Hamming space. The feasibility of the method is verified through experiments, and a hash code database is constructed from the hash codes learned in the experiments. Finally, a cross-modal retrieval test between lung nodules and their text is carried out with the trained hash code learning model, and the retrieval results show that the cross-modal retrieval method adopted herein for chest CT image-text is feasible.
Drawings
FIG. 1 is a diagram of the hash code learning model based on a generative adversarial network according to the present invention;
FIG. 2 is a cross-modality chest CT image-text retrieval process according to the present invention;
FIG. 3 is a sample cut of different lung nodules;
FIG. 4 is a diagram of ROI image query and search results in accordance with the present invention;
FIG. 5 is a diagram of the pathological text query and search results of the present invention.
Detailed Description
Example 1: as shown in figs. 1 to 4, a medical information cross-modal hash coding learning method based on a generative adversarial network comprises the following specific steps:
Step1, extracting features from the chest CT image-text data; first, the chest CT image is preprocessed and ROI image blocks are cut out, and then image features and text features are extracted from the ROI image blocks and the chest CT image-text data through a CMSFF model and a bag-of-words model;
further, the specific steps of Step1 are as follows:
Step1.1, data preprocessing; for the image data set, to avoid the pixel loss caused by directly compressing images from 512 × 512 to 224 × 224, the original CT image is cropped instead; ROI image blocks of size R ∈ {16 × 16, 32 × 32, 64 × 64, 128 × 128} are cut out according to the diameter of the lung nodule on each slice, as shown in fig. 3; the left side of fig. 3 shows ROI image blocks cut out of the original CT image, and the right side shows the blocks of different sizes expanded to 224 × 224; for the text data set, radiologists usually use fixed words to describe the pathological information of lung nodules, and these words correspond to different pathological levels; because the order of the words does not need to be considered, no text preprocessing is needed, and text features are extracted directly with the bag-of-words model;
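The cropping rule of Step1.1 can be sketched as follows. The function names, the choice of the smallest candidate size that covers the nodule diameter, and the clamping of the crop window to the slice boundary are all illustrative assumptions; the patent only specifies the candidate sizes R.

```python
import numpy as np

ROI_SIZES = (16, 32, 64, 128)  # candidate square ROI sizes R from the patent

def roi_size_for(diameter_px):
    """Pick the smallest candidate ROI size covering the nodule diameter (assumption)."""
    for s in ROI_SIZES:
        if diameter_px <= s:
            return s
    return ROI_SIZES[-1]

def crop_roi(ct_slice, center, diameter_px):
    """Crop a square ROI block around the annotated nodule center on a 512x512 slice."""
    s = roi_size_for(diameter_px)
    half = s // 2
    # Clamp the window so the crop stays inside the slice (assumption).
    r0 = min(max(center[0] - half, 0), ct_slice.shape[0] - s)
    c0 = min(max(center[1] - half, 0), ct_slice.shape[1] - s)
    return ct_slice[r0:r0 + s, c0:c0 + s]
```

In this sketch, the resulting block would then be resized to 224 × 224 for the CMSFF input, as the figure description above indicates.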
Step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 slices separately, and the feature information extracted from different slices of the same nodule is fused to make up for the incomplete expression of feature information in a single slice and ultimately improve the feature expression capability for local lung nodules; the input of the model is three consecutive ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
Step1.3, extracting features of the lung nodule pathological information text; the pathological description words y_j of a lung nodule are represented by the bag-of-words model as a vector f_j; the bag-of-words vector is input into a multi-layer perceptron network formed by two fully connected layers fc1 and fc2, where fc1 has 4096 units and the number of units of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
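The text feature generator of Step1.3 can be sketched as below. The vocabulary terms, the ReLU activation, the random weight initialization and h = 64 are illustrative assumptions; the patent fixes only the bag-of-words input, the 4096-unit fc1 and the h-unit fc2.

```python
import numpy as np

VOCAB = ["spiculation", "lobulation", "calcification", "subtlety", "margin"]  # illustrative terms

def bag_of_words(tokens, vocab=VOCAB):
    """Count occurrences of each fixed vocabulary word in a pathology description."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab.index(t)] += 1
    return v

def text_feature(bow, W1, W2):
    """Two fully connected layers: fc1 (4096 units, ReLU assumed) then fc2 (h units)."""
    h1 = np.maximum(W1 @ bow, 0.0)  # fc1
    return W2 @ h1                   # fc2: h-dimensional text feature

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4096, len(VOCAB))) * 0.01
W2 = rng.normal(size=(64, 4096)) * 0.01  # h = 64, the hash code length used in the experiments
```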
Step2, constructing the constraint condition of the discriminator; the hash-code-learning discriminator submodule receives two inputs at the same time, namely the image feature vector and the text feature vector from the previous submodule, where the image features and the text features serve as the real data and the generated data respectively, and the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
further, the specific steps of Step2 are as follows:
Step2.1, the similarity matrix supervises hash code generation in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated based on the pathological information of lung nodules in the chest; when constructing the similarity matrix, an input triple is taken as one sample; because each sample corresponds to pathological information of 9 categories, the labels of the 9 categories of each sample are counted, giving 32 items of category label information;
Step2.2, the one-hot label information of each sample forms a 0/1 vector L_i; if the k-th position of L_i is 0, L_i does not carry that label information, and otherwise it does, where the length m of L_i is 32; if the number of samples is n, the label matrix LAll_{n×m} of the samples is constructed, and the similarity matrix S is then obtained by S = (LAll × LAll^T > 0), where the size of S is n × n;
Step2.3, the discriminator is constrained with the obtained similarity matrix S to supervise the accuracy of the hash codes obtained in subsequent steps.
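The construction S = (LAll × LAll^T > 0) in Step2.2 can be sketched directly. The toy label matrix below uses m = 3 labels instead of the patent's m = 32, purely for illustration:

```python
import numpy as np

def similarity_matrix(labels):
    """Build the n x n similarity matrix S from the n x m 0/1 label matrix.

    S[i, j] is True when samples i and j share at least one category label,
    i.e. S = (LAll @ LAll.T > 0).
    """
    labels = np.asarray(labels)
    return labels @ labels.T > 0

L = np.array([[1, 0, 1],   # sample 0 carries labels 0 and 2
              [0, 1, 0],   # sample 1 carries label 1
              [1, 1, 0]])  # sample 2 carries labels 0 and 1
S = similarity_matrix(L)
```

Samples 0 and 1 share no label, so S[0, 1] is False; samples 0 and 2 share label 0, so S[0, 2] is True.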
Step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continuously judges whether an input sample is real data or generated data produced by the generator, and feeds the judgment result back to the generator, prompting the generator to continuously adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
further, the specific steps of Step3 are as follows:
Step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator separately; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the image feature information serves as the real training data F_p and the text feature vector serves as the generated feature F_g of the generator;
Step3.2, F_p and F_g obtained in Step3.1 are the inputs of the discriminator, which judges whether an input sample is real data and then feeds the discrimination result back to the generator; according to the judgment result, the generator adjusts its own parameters by minimizing the loss function so as to learn the probability distribution of the real data; the loss function is: L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p));
Step3.3, in particular, a discrimination threshold g is set in the discriminator; when the generated features are still not optimal after g discriminator passes, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed on the output of the generator again.
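The loss of Step3.2 can be sketched as follows; averaging over the feature components is an assumption, since the patent gives only the scalar form of L_D.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(f_p, f_g):
    """L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p)), averaged elementwise.

    f_p: image (real) features; f_g: generated text features.
    The loss is small when the discriminator scores real features high
    and generated features low.
    """
    return -np.mean(np.log(1.0 - sigmoid(f_g)) + np.log(sigmoid(f_p)))
```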
Step4, learning hash codes; first, a sign operation is applied to the extracted continuous sample features to obtain a set of discrete values, yielding the feature matrices corresponding to the image and the text; then a similarity matrix constrains the codes so that hash codes of different modalities of the same object are as close as possible and hash codes of different objects are as different as possible;
further, the specific steps of Step4 are as follows:
Step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped into a common space so that hash codes of different modalities of the same object are as similar as possible and hash codes of different objects are as different as possible. It is therefore common practice to apply a sign operation to the extracted continuous sample features to obtain a set of discrete values, for example the elementwise sign function sign(x) = +1 for x > 0 and −1 otherwise; a Sigmoid operation is then performed on this set of discrete values to obtain the binary hash code, denoted H;
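A minimal sketch of the binarization in Step4.1. The translated text's composition of sign and Sigmoid is ambiguous, so the mapping of the discrete sign values to {0, 1} below, and the tie-break at zero, are assumptions for illustration.

```python
import numpy as np

def binarize(features):
    """Map continuous features to a 0/1 hash code.

    Step 1: elementwise sign() gives discrete values in {-1, 0, +1}.
    Step 2: squash {-1, +1} to {0, 1} (assumption; the patent mentions
    a Sigmoid at this point).
    """
    discrete = np.sign(features)
    discrete[discrete == 0] = 1  # break ties toward +1 (assumption)
    return ((discrete + 1) // 2).astype(int)
```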
Step4.2, F_p denotes the feature vectors of the extracted ROI image blocks, F_t denotes the feature vectors of the learned text, and F_g denotes the generated feature vector of the generator, where F_g = F_t; the cosine similarity Ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as

Ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖)

Let H_p and H_t be the hash codes generated from the image features F_p and the text features F_t respectively; the similarity Φ_ij between them is computed in the same way.
Step4.3, the loss function of the generative model is constructed from the cross-entropy loss function (the full expression is rendered as an image in the original and is not reproduced here); in it, S is the similarity matrix, α, λ and δ are hyperparameters of the model training process, B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back-propagation is performed through this loss function, the network weights are updated, and a new hash code H is obtained through Step4.1.
In the experimental phase, to reduce the amount of back-propagation computation, α is set to 1, and the selections of λ and δ are compared in table 1, where the hash code length is 64 bits in each case.
TABLE 1 lambda, delta parameter selection comparison experiment table
(The table content is rendered as an image in the original and is not reproduced here.)
Step5, training and optimizing the network parameters; in the hash code learning process, the feature generators and the discriminator are iteratively optimized in turn; the generator parameters θ_{p,t}, the discriminator parameters θ_D and the binary coding parameter B are optimized during model training; hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and the trained model is obtained for the cross-modal retrieval system;
Step6, retrieving the corresponding chest CT image-text information according to the pathological text or ROI image information, realizing cross-modal retrieval.
Specifically, hash codes of the different modalities are learned through Step5 and stored in the hash code database, and the trained model is obtained for the cross-modal retrieval system. Any group of ROI image blocks can be input, its corresponding hash code is obtained through the model, and the best matches are then retrieved from the hash code database; the retrieval process is shown in fig. 2.
The specific steps of Step6 are as follows:
for image modalities, a set of ROI image blocks x ═ s1,s2,s3) Inputting the image into a retrieval system, performing feature extraction on the image by calling model parameters, and the like to finally obtain Hash code expression of the image, such as formula Cx=h(x)(f(x)(x;θp,θD) Shown in (c).
Query data for a given image is handled through the GANHL retrieval model, and approximate nearest neighbor search is carried out through Hamming ranking and a hash lookup strategy. In the invention, the hash lookup strategy with radius r returns the 2r most similar retrieval results from the database of the other modality. The retrieval results are shown in fig. 4.
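The Hamming ranking and radius-r hash lookup described above can be sketched as follows; the function names and the simple linear scan over the database are illustrative, not the patent's implementation.

```python
import numpy as np

def hamming_distance(a, b):
    """Number of differing bits between two 0/1 hash codes."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def hash_lookup(query_code, database, r):
    """Return database indices within Hamming radius r, nearest first."""
    hits = [(hamming_distance(query_code, code), i)
            for i, code in enumerate(database)]
    return [i for d, i in sorted(hits) if d <= r]
```

For example, with a 4-bit toy database, a query code within distance 2 of the first two entries would return those two indices, ranked by distance.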
The accuracy and recall obtained through hash lookup are compared in table 2. The hash lookup range is (0, 8), and the hash code length is 64 bits. The retrieval results show the retrieval accuracy and recall under different search radii.
TABLE 2 comparison of P, R, F values for different methods
(The table content is rendered as an image in the original and is not reproduced here.)
Example 2: as shown in figs. 1 to 5, a medical information cross-modal hash coding learning method based on a generative adversarial network is the same as that in embodiment 1, except that:
and Step6, learning out hash codes of different modes through Step5, storing the hash codes in a hash code database, and obtaining a trained model for the cross-mode retrieval system. Inputting text data of any group of lung nodules, obtaining corresponding hash codes through cross-modal retrieval, and further retrieving optimal results in a hash code database, wherein the retrieval process is shown in fig. 2.
The specific steps of Step6 are as follows:
for the text mode, let its input be y, learn its hash code through the GANHL model, as formula Cy=h(y)(f(y)(y;θt,θD) Shown in (c).
Query data for a given text is handled through the GANHL retrieval model, and approximate nearest neighbor search is carried out through Hamming ranking and a hash lookup strategy. In the invention, the hash lookup strategy with radius r returns the 2r most similar retrieval results from the database of the other modality. The retrieval results are shown in fig. 5.
The accuracy and recall obtained through hash lookup are compared in table 3. The hash lookup range is (0, 8), and the hash code length is 64 bits. The retrieval results show the retrieval accuracy and recall under different search radii.
TABLE 3 comparison of P, R, F values for different methods
(Table 3 is reproduced as an image in the original publication.)
While the present invention has been described in detail with reference to the embodiments shown in the drawings, it is not limited to these embodiments; various changes can be made by those of ordinary skill in the art without departing from the spirit of the invention.

Claims (6)

1. A medical information cross-modal hash coding learning method based on a generative adversarial network, characterized by comprising the following specific steps:
step1, extracting features of the chest CT image-text data; firstly, the chest CT image is preprocessed and ROI image blocks are cut out; then image features are extracted from the ROI image blocks through a CMSFF model, and text features are extracted from the chest CT text data through a bag-of-words model;
step2, constructing the constraint condition of the discriminator; the discriminator submodule for learning hash codes receives two inputs simultaneously, namely the image feature vector and the text feature vector from the previous submodule, where the image features and the text features serve as real data and generated data respectively; the discriminator is constrained by a similarity matrix to supervise the accuracy of the hash codes obtained in subsequent steps;
step3, adversarial learning with the discriminator; in the generative adversarial network, the discriminator continuously judges whether an input sample is real data or data generated by the generator, and feeds the judgment result back to the generator, prompting the generator to continuously adjust its parameters and learn the probability distribution of the real data; the network parameters are adjusted through this adversarial learning;
step4, learning hash codes; firstly, the extracted continuous sample features are converted into a group of discrete values through a sign operation, giving the feature matrices corresponding to the image and the text; then a similarity matrix constrains the codes so that hash codes of different modalities of the same object are as close as possible, and hash codes of different objects are as different as possible;
step5, training and optimizing the network parameters; in the hash code learning process, the feature generators and the discriminator are iteratively optimized in turn; the generator parameters θ_p and θ_t, the discriminator parameter θ_D and the binary coding parameter B are optimized during model training; hash codes of the different modalities are learned through the GANHL model and stored in a hash code database, and a trained model is obtained for the cross-modal retrieval system;
step6, retrieving the corresponding chest CT image-text information according to the pathological text or ROI image information, realizing cross-modal retrieval.
2. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step1 are as follows:
step1.1, data preprocessing; for the image data set, to avoid the pixel loss caused by directly compressing from 512 × 512 to 224 × 224, the original CT image is cropped instead: ROI image blocks of size R ∈ {16 × 16, 32 × 32, 64 × 64, 128 × 128} are cut out according to the diameters of the lung nodules on the slices; for the text data set, radiologists generally use fixed terms to describe the pathological information of lung nodules, the terms corresponding to different pathological levels; because the order of the terms need not be considered, no text preprocessing is required, and the bag-of-words model is used directly to extract text features;
step1.2, extracting features of the chest CT image; image features are extracted with CMSFF, a convolutional-neural-network-based multi-level second-order feature fusion model; features are extracted from the ROI image blocks of 3 slice levels respectively, and the feature information extracted from different levels of the same nodule is fused to compensate for the incomplete expression of the feature information in a single slice, ultimately improving the feature expression capability for the local lung nodule; the input of the model is three consecutive different ROI image blocks of the same lung nodule, and the output is the feature vector of the lung nodule;
step1.3, extracting features of the lung nodule pathological text; the pathological description of a lung nodule is denoted y_j, and the bag-of-words model represents it as a vector f_j; the bag-of-words vector is input into a multilayer perceptron network consisting of two fully connected layers fc1 and fc2, where fc1 has 4096 units and the number of units of fc2 is the length h of the hash code to be generated; this text feature extraction network serves as the text feature generator and outputs the feature vector of the text.
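The text branch of Step1.3 (bag-of-words vector fed into a two-layer perceptron fc1/fc2) can be sketched as follows; the vocabulary and all weights are hypothetical placeholders for illustration, not the trained network:

```python
import numpy as np

VOCAB = ["spiculation", "lobulation", "calcification", "solid", "margin"]  # illustrative terms

def bag_of_words(report, vocab=VOCAB):
    # Count occurrences of each fixed pathological term in the report.
    words = report.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=np.float32)

def text_feature_generator(x, w1, b1, w2, b2):
    # fc1 (4096 units, ReLU) followed by fc2 projecting to hash length h.
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

h = 64  # hash code length
rng = np.random.default_rng(0)
w1 = 0.01 * rng.standard_normal((len(VOCAB), 4096)); b1 = np.zeros(4096)
w2 = 0.01 * rng.standard_normal((4096, h));          b2 = np.zeros(h)

v = bag_of_words("solid nodule with spiculation and lobulation")
print(text_feature_generator(v, w1, b1, w2, b2).shape)  # (64,)
```

Because the radiological terms are fixed and order-independent, the raw count vector is sufficient input, matching the claim that no further text preprocessing is needed.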
3. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step2 are as follows:
step2.1, the similarity matrix supervises the generation of hash codes in cross-modal hash retrieval; for cross-modal retrieval between lung nodules and text, the similarity matrix is constructed directly from the category labels annotated on the basis of the pathological information of the chest lung nodules; when constructing the similarity matrix, an input triplet is taken as a sample; because each sample corresponds to pathological information of 9 categories, the labels of the 9 categories of each sample are counted to obtain 32 items of category label information;
step2.2, the one-hot annotation information of each sample forms a 0-1 vector L_i; if the k-th position of L_i is 0, L_i does not carry the k-th label information, otherwise it does, where the length m of L_i is 32; if the number of samples is n, a label matrix LAll of size n × m is constructed from all samples; the similarity matrix S can then be obtained by S = (LAll × LAll^T) > 0, where S is of size n × n;
step2.3, the obtained similarity matrix S constrains the discriminator and supervises the accuracy of the hash codes obtained in subsequent steps.
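The construction S = (LAll × LAll^T) > 0 of Step2.2 can be checked with a small sketch; the 3-sample label matrix below is made up for illustration:

```python
import numpy as np

def similarity_matrix(label_matrix):
    # S[i, j] is True when samples i and j share at least one label,
    # i.e. the dot product of their 0-1 label vectors is positive.
    return (label_matrix @ label_matrix.T) > 0

# toy label matrix LAll (n = 3 samples, m = 3 labels)
L_all = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 0]])
print(similarity_matrix(L_all).astype(int))
# samples 0 and 1 share no label, so S[0, 1] = 0
```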
4. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step3 are as follows:
step3.1, the feature expression of the image and the feature expression of the text are input into the discriminator respectively; the features extracted from the lung nodule image are more expressive than those extracted from the pathological text, so the feature information of the image is used as the real training data F_p and the feature vector of the text is used as the generated feature F_g of the generator;
step3.2, the F_p and F_g obtained in Step3.1 are taken as inputs of the discriminator; the discriminator judges whether an input sample is real data and feeds the judgment result back to the generator; according to the judgment result, the generator adjusts its parameters by minimizing the loss function so as to learn the probability distribution of the real data, where the loss function is: L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p));
step3.3, in particular, a discrimination threshold g is set in the discriminator; when the generated features are still not optimal after g passes of the discriminator, the generator re-extracts the feature vectors of the ROI image blocks, the real sample data is updated, and discrimination training is performed again on the output of the generator.
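The discriminator loss L_D of Step3.2 can be written out as follows for scalar discriminator scores; this is a sketch, whereas the real model applies it to the feature vectors produced by the networks:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(f_g, f_p):
    # L_D = -(log(1 - sigmoid(F_g)) + log sigmoid(F_p)):
    # low when generated features F_g score near 0 and real features F_p score near 1.
    return float(np.mean(-(np.log(1.0 - sigmoid(f_g)) + np.log(sigmoid(f_p)))))

good = discriminator_loss(np.array([-5.0]), np.array([5.0]))  # discriminator separates the modalities
bad = discriminator_loss(np.array([5.0]), np.array([-5.0]))   # discriminator is fooled
print(good < bad)  # True
```

Minimizing this loss for the discriminator while the text generator tries to raise sigmoid(F_g) is the adversarial game of Step3.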
5. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein the specific steps of Step4 are as follows:
step4.1, different modalities of the same object are semantically related; in cross-modal hashing, data of different modalities generally need to be mapped to a common space so that hash codes of different modalities of the same object are as similar as possible and hash codes of different objects are as different as possible; the common practice is therefore to apply a sign operation to the extracted continuous sample features to obtain a group of discrete values, for example,
sign(x) = 1 for x ≥ 0, and sign(x) = -1 for x < 0;
a Sigmoid operation is then performed on the group of discrete values to obtain the binary hash code, denoted H;
step4.2, F_p denotes the feature vectors of the extracted ROI image blocks, F_t denotes the feature vectors of the learned text, and F_g denotes the generated feature vector of the generator, where F_g = F_t; the cosine similarity Ψ_ij between the text feature of the i-th sample and the image feature of the j-th sample is then expressed as:
Ψ_ij = (F_t^i · F_p^j) / (‖F_t^i‖ ‖F_p^j‖);
let H_p and H_t be the hash codes generated from the image features F_p and the text features F_t respectively; their similarity Φ_ij is solved in the same way;
Step4.3, constructing the loss function of the generative model by the cross entropy loss function as follows:
(The loss function is given as an image in the original publication.)
wherein S is the similarity matrix, α, λ and δ are hyper-parameters in the model training process, B_p and B_t are the hash codes, and H_p and H_t are the binary codes obtained through the sign operation; back propagation is performed through this loss function to update the network weights, and a new hash code H is obtained through Step4.1.
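The sign operation of Step4.1, and the requirement that codes of the same object agree across modalities, can be illustrated as follows; the feature values are invented for the example:

```python
import numpy as np

def binarize(features):
    # sign operation: map continuous features to discrete codes in {-1, +1}
    return np.where(features >= 0, 1, -1)

f_image = np.array([0.7, -1.2, 0.1, -0.3])  # image-modality features of one object
f_text = np.array([0.9, -0.8, 0.2, -0.5])   # text-modality features of the same object
h_image, h_text = binarize(f_image), binarize(f_text)
print(h_image.tolist(), (h_image == h_text).all())  # codes of the same object agree
```

The similarity-matrix constraint in the loss pushes trained features toward exactly this agreement, so that the discrete codes of matching image-text pairs coincide bit for bit.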
6. The medical information cross-modal hash coding learning method based on a generative adversarial network according to claim 1, wherein in Step5:
in the neural network, an alternating optimization strategy is adopted: two of the three parameter groups are fixed at a time, and the remaining one is optimized by stochastic gradient descent.
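The alternating optimization of this claim (fix some parameters, take a gradient step on the rest, and cycle) can be illustrated on a toy objective; the function and step size below are invented for the sketch:

```python
def alternating_minimize(steps=100, lr=0.1):
    # toy analogue: minimize f(x, y) = (x - y)^2 + (y - 3)^2
    # by alternately fixing one variable and stepping on the other
    x, y = 0.0, 0.0
    for _ in range(steps):
        x -= lr * 2.0 * (x - y)                      # update x with y fixed
        y -= lr * (2.0 * (y - x) + 2.0 * (y - 3.0))  # update y with x fixed
    return x, y

x, y = alternating_minimize()
print(round(x, 2), round(y, 2))  # both approach the minimizer (3, 3)
```

In the GANHL model the same pattern cycles over θ_p and θ_t, then θ_D, then the binary coding B.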
CN201910490562.1A 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network Active CN111127385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910490562.1A CN111127385B (en) 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910490562.1A CN111127385B (en) 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network

Publications (2)

Publication Number Publication Date
CN111127385A true CN111127385A (en) 2020-05-08
CN111127385B CN111127385B (en) 2023-01-13

Family

ID=70496015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910490562.1A Active CN111127385B (en) 2019-06-06 2019-06-06 Medical information cross-modal Hash coding learning method based on generative countermeasure network

Country Status (1)

Country Link
CN (1) CN111127385B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111651561A (en) * 2020-06-05 2020-09-11 拾音智能科技有限公司 High-quality difficult sample generation method
CN112085714A (en) * 2020-08-31 2020-12-15 广州视源电子科技股份有限公司 Pulmonary nodule detection method, model training method, device, equipment and medium
CN112115317A (en) * 2020-08-20 2020-12-22 鹏城实验室 Targeted attack method for deep hash retrieval and terminal device
CN112380216A (en) * 2020-11-17 2021-02-19 北京融七牛信息技术有限公司 Automatic feature generation method based on intersection
CN113204522A (en) * 2021-07-05 2021-08-03 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN113270199A (en) * 2021-04-30 2021-08-17 贵州师范大学 Medical cross-modal multi-scale fusion class guidance hash method and system thereof
CN113658683A (en) * 2021-08-05 2021-11-16 重庆金山医疗技术研究院有限公司 Disease diagnosis system and data recommendation method
CN113836901A (en) * 2021-09-14 2021-12-24 灵犀量子(北京)医疗科技有限公司 Chinese and English medicine synonym data cleaning method and system
CN114972929A (en) * 2022-07-29 2022-08-30 中国医学科学院医学信息研究所 Pre-training method and device for medical multi-modal model
CN116431847A (en) * 2023-06-14 2023-07-14 北京邮电大学 Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure
CN113270199B (en) * 2021-04-30 2024-04-26 贵州师范大学 Medical cross-mode multi-scale fusion class guide hash method and system thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078899A1 (en) * 2001-08-13 2003-04-24 Xerox Corporation Fuzzy text categorizer
WO2013073622A1 (en) * 2011-11-18 2013-05-23 日本電気株式会社 Local feature amount extraction device, local feature amount extraction method, and program
CN108596265A (en) * 2018-05-02 2018-09-28 中山大学 Model is generated based on text description information and the video for generating confrontation network
CN109299216A (en) * 2018-10-29 2019-02-01 山东师范大学 A kind of cross-module state Hash search method and system merging supervision message
CN109299342A (en) * 2018-11-30 2019-02-01 武汉大学 A kind of cross-module state search method based on circulation production confrontation network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078899A1 (en) * 2001-08-13 2003-04-24 Xerox Corporation Fuzzy text categorizer
WO2013073622A1 (en) * 2011-11-18 2013-05-23 日本電気株式会社 Local feature amount extraction device, local feature amount extraction method, and program
CN108596265A (en) * 2018-05-02 2018-09-28 中山大学 Model is generated based on text description information and the video for generating confrontation network
CN109299216A (en) * 2018-10-29 2019-02-01 山东师范大学 A kind of cross-module state Hash search method and system merging supervision message
CN109299342A (en) * 2018-11-30 2019-02-01 武汉大学 A kind of cross-module state search method based on circulation production confrontation network

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
ANUBHAV KUMAR: "An efficient text extraction algorithm in complex images", 《INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING》 *
刘星: "融合局部语义信息的多模态舆情分析模型", 《信息安全研究》 *
张璐: "基于对抗学习的跨模态检索方法研究进展", 《现代计算机》 *
朱治兰 等: "有监督鉴别哈希跨模态检索", 《计算机应用与软件》 *
李帷韬 等: "相似青梅品级半监督智能反馈认知方法研究", 《电子测量与仪器学报》 *
李维 等: "基于CNN多层面二阶特征融合的肺结节分类", 《计算机科学与探索》 *
杨海龙 等: "基于多区域中心加权卷积特征的图像检索", 《软件导刊》 *
温佩芝 等: "基于卷积神经网络改进的图像自动分割方法", 《计算机应用研究》 *
袁绍锋 等: "有条件生成对抗网络的IVUS图像内膜与中-外膜边界检测:", 《中国生物医学工程学报》 *
赵晓乐: "面向胸部CT图像—文本的跨模态哈希检索技术研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *
马春光 等: "生成式对抗网络图像增强研究综述", 《信息网络安全》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111651561A (en) * 2020-06-05 2020-09-11 拾音智能科技有限公司 High-quality difficult sample generation method
CN112115317A (en) * 2020-08-20 2020-12-22 鹏城实验室 Targeted attack method for deep hash retrieval and terminal device
CN112085714B (en) * 2020-08-31 2023-12-15 广州视源电子科技股份有限公司 Pulmonary nodule detection method, model training method, device, equipment and medium
CN112085714A (en) * 2020-08-31 2020-12-15 广州视源电子科技股份有限公司 Pulmonary nodule detection method, model training method, device, equipment and medium
CN112380216A (en) * 2020-11-17 2021-02-19 北京融七牛信息技术有限公司 Automatic feature generation method based on intersection
CN113270199A (en) * 2021-04-30 2021-08-17 贵州师范大学 Medical cross-modal multi-scale fusion class guidance hash method and system thereof
CN113270199B (en) * 2021-04-30 2024-04-26 贵州师范大学 Medical cross-mode multi-scale fusion class guide hash method and system thereof
CN113204522A (en) * 2021-07-05 2021-08-03 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN113204522B (en) * 2021-07-05 2021-09-24 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN113658683A (en) * 2021-08-05 2021-11-16 重庆金山医疗技术研究院有限公司 Disease diagnosis system and data recommendation method
CN113836901B (en) * 2021-09-14 2023-11-14 灵犀量子(北京)医疗科技有限公司 Method and system for cleaning Chinese and English medical synonym data
CN113836901A (en) * 2021-09-14 2021-12-24 灵犀量子(北京)医疗科技有限公司 Chinese and English medicine synonym data cleaning method and system
CN114972929A (en) * 2022-07-29 2022-08-30 中国医学科学院医学信息研究所 Pre-training method and device for medical multi-modal model
CN116431847A (en) * 2023-06-14 2023-07-14 北京邮电大学 Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure
CN116431847B (en) * 2023-06-14 2023-11-14 北京邮电大学 Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure

Also Published As

Publication number Publication date
CN111127385B (en) 2023-01-13

Similar Documents

Publication Publication Date Title
CN111127385B (en) Medical information cross-modal Hash coding learning method based on generative countermeasure network
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
EP3665703B1 (en) Computer-aided diagnostics using deep neural networks
CN109918528A (en) A kind of compact Hash code learning method based on semanteme protection
CN113343125B (en) Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN112949740B (en) Small sample image classification method based on multilevel measurement
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
Buvana et al. Content-based image retrieval based on hybrid feature extraction and feature selection technique pigeon inspired based optimization
CN111242948A (en) Image processing method, image processing device, model training method, model training device, image processing equipment and storage medium
CN110046660A (en) A kind of product quantization method based on semi-supervised learning
CN111524140B (en) Medical image semantic segmentation method based on CNN and random forest method
Al Zorgani et al. Comparative study of image classification using machine learning algorithms
CN116469561A (en) Breast cancer survival prediction method based on deep learning
Lin et al. A fusion-based convolutional fuzzy neural network for lung cancer classification
CN117371511A (en) Training method, device, equipment and storage medium for image classification model
CN115797795B (en) Remote sensing image question-answer type retrieval system and method based on reinforcement learning
CN111144453A (en) Method and equipment for constructing multi-model fusion calculation model and method and equipment for identifying website data
CN116012903A (en) Automatic labeling method and system for facial expressions
CN114168780A (en) Multimodal data processing method, electronic device, and storage medium
Goundar Improved deep learning model based on integrated convolutional neural networks and transfer learning for shoeprint image classification
Sohail et al. Selection of optimal texture descriptors for retrieving ultrasound medical images
CN117171413B (en) Data processing system and method for digital collection management
Wang et al. Image Classification Based on Improved Unsupervised Clustering Algorithm
Darsana et al. DICOM Image Retrieval Based on Neural Network Classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant