CN113889228A - Semantic enhanced Hash medical image retrieval method based on mixed attention - Google Patents

Semantic enhanced Hash medical image retrieval method based on mixed attention

Info

Publication number
CN113889228A
Authority
CN
China
Prior art keywords
medical
hash
images
medical image
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111106128.2A
Other languages
Chinese (zh)
Inventor
陈亚雄
李小玉
汤一博
王凡
熊盛武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202111106128.2A priority Critical patent/CN113889228A/en
Publication of CN113889228A publication Critical patent/CN113889228A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00: ICT specially adapted for the handling or processing of medical images
    • G16H 30/20: ICT specially adapted for the handling or processing of medical images, e.g. DICOM, HL7 or PACS
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/53: Querying
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to a semantic enhanced hash medical image retrieval method based on mixed attention. First, a data set is divided into a training set and a test retrieval set, and images are randomly selected from the training set to form medical triplets. An overall network model is then constructed, with the medical triplet samples as its input. Finally, the overall network model is trained and retrieval results are obtained with the trained network. The invention combines a channel attention module and a spatial attention module into a mixed attention mechanism that can efficiently extract region-of-interest (ROI) information. Class-level semantic information is used to constrain the learning process of the hash codes, which helps distinguish similar hash codes belonging to different classes. When depth embeddings are mapped to discrete hash codes, a quantization loss term reduces the quantization error between the depth embedding and the hash code, further improving the precision of medical image retrieval.

Description

Semantic enhanced Hash medical image retrieval method based on mixed attention
Technical Field
The invention belongs to the field of medical image retrieval, and particularly relates to a semantic enhanced hash medical image retrieval method based on mixed attention.
Background
With the rapid development of radiographic imaging technology, medical data have gradually been digitized and the number of medical images has increased sharply. Mining useful information from large-scale medical images is critical to better assisting medical diagnosis and assessment. Medical image retrieval has therefore attracted wide attention.
Medical image retrieval can be divided into two categories: text-based and content-based. Text-based medical image retrieval appeared early; it avoids analyzing the visual elements of medical images, indexes them by name, size, type, etc., and typically queries them by keyword. However, it relies on highly subjective manual labeling, and text cannot fully express the rich semantic content of medical images. Content-based medical image retrieval instead extracts low-dimensional visual features and high-dimensional semantic features directly from the medical images, forming feature vectors that serve as an objective basis for indexing and matching the images to be retrieved. However, most existing content-based methods learn only the relative relationships of medical images to extract deep features, ignoring the class-level semantics of the images and their labels; this under-uses high-level semantic information and ultimately harms retrieval performance.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a semantic enhanced hash medical image retrieval method based on mixed attention. First, a data set is divided into a training set and a test retrieval set, and images are randomly selected from the training set to form medical triplets. An overall network model is then constructed, with the medical triplet samples as its input. Finally, the overall network model is trained and retrieval results are obtained with the trained network.
In order to achieve the above object, the technical solution provided by the present invention is a semantic enhanced hash medical image retrieval method based on mixed attention, comprising the following steps:
step 1, dividing a data set into a training set and a test retrieval set;
step 2, randomly selecting images to form a medical triple;
step 3, constructing an integral network model, and taking the medical triple sample as the input of the network model;
step 4, training an integral network model;
and 5, obtaining a retrieval result by using the trained network.
Furthermore, in step 1, three data sets are used: the chest X-ray image data set COVID-19 Radiography, the curated COVID-19 chest X-ray image data set Curated X-Ray, and the dermoscopy image data set HAM10000. For each data set, 70% of the data is selected as the training set and the remaining 30% as the test retrieval set. Medical images in the same data set are of the same class, and medical images in different data sets are of different classes.
Furthermore, in step 2, given m training images forming a training set I = {I_1, I_2, ..., I_m}, two medical images of the same class are randomly selected from the training set as the anchor image Q_i and the positive image P_i; then one medical image of a class different from Q_i and P_i is randomly selected as the negative image N_i, forming a medical triplet T = {Q_i, P_i, N_i}, i ∈ {1, ..., m}. In each triplet, the anchor image Q_i and the positive image P_i are similar, while the negative image N_i is dissimilar to both. When constructing the medical triplet sample units, medical images of classes with few samples are treated as rare images and used as the negative images of common samples, so that rare images are multiplexed during the training stage, alleviating the sample-imbalance problem in the medical image retrieval field.
Moreover, in step 3, for each triplet, the three medical images are simultaneously input into a weight-sharing twin neural network, which consists of a convolution block, a dense block, a convolution block and a fully connected layer that outputs the hash code. A channel attention module is added between the first convolution block and the dense block, and a spatial attention module between the dense block and the second convolution block, forming a mixed attention mechanism. The channel attention module and the spatial attention module acquire region-of-interest information, capturing both inter-channel dependencies and salient spatial features, and thus attend more effectively to the discriminative differences of medical images.
First, a medical image passes through the first convolution block to obtain a feature map X ∈ R^(C×H×W), where H and W denote the height and width of the feature map and C the number of channels. The channel attention module then compresses the input feature map using average pooling and max pooling operations. The channel attention module comprises two consecutive convolution layers: the first 1 × 1 convolution projects the pooled features onto a hidden layer with fewer parameters and uses the ReLU function as activation; the second 1 × 1 convolution restores the number of channels. The average-pooled and max-pooled outputs are then added element-wise, passed through a sigmoid function to obtain the channel weights, and finally multiplied with the feature map X.
The channel attention module can be expressed as:
M_C(X) = σ(Conv_1×1(ReLU(Conv_1×1(AvgPool(X)))) + Conv_1×1(ReLU(Conv_1×1(MaxPool(X)))))    (1)
X' = M_C(X) ⊗ X    (2)
where M_C(X) is the one-dimensional channel attention map of size C × 1 × 1; Conv_1×1 denotes a convolution operation with filter size 1 × 1; σ denotes the sigmoid function; AvgPool(·) is the average pooling function; MaxPool(·) is the max pooling function; ⊗ denotes element-wise multiplication; and X' is the channel-refined feature map.
To make full use of the feature map and enhance feature propagation, the feature map is input into a dense block composed of four dense layers. The output of each dense layer is passed to every subsequent layer, creating short paths from early layers to later layers. The spatial attention module complements the channel attention module by focusing on the most informative spatial locations of the sample. Let Y ∈ R^(C×H×W) denote the feature map extracted from the last dense layer, where H and W denote the height and width of the feature map and C the number of channels; the spatial attention module can then be expressed as:
M_S(Y) = σ(Conv_7×7([AvgPool(Y); MaxPool(Y)]))    (3)
Y' = M_S(Y) ⊗ Y    (4)
where M_S(Y) is the two-dimensional spatial attention map of size 1 × H × W; Conv_7×7 denotes a convolution operation with filter size 7 × 7; σ denotes the sigmoid function; AvgPool(·) and MaxPool(·) are the channel-wise average and max pooling functions; [·;·] denotes concatenation along the channel dimension; and Y' is the spatially refined feature map.
Finally, the depth embedding is mapped to the hash code generation layer, where it is constrained by the semantic enhancement loss, the quantization (regularization) loss and the triplet loss.
In step 4, based on the mixed attention mechanism and the twin neural network, the model is trained by optimizing an overall loss function comprising a hash triplet term, a semantic enhancement term and a quantization term.
The hash function maps each medical instance to a compact hash code while preserving the semantic information of the matched medical images and labels in the original space. Since the Hamming distance between discrete hash codes is inconvenient to optimize in a deep learning network, the invention replaces it with the Euclidean distance between the depth embeddings output by the linear layer. To capture relative relevance in the hash space, the basic triplet term over the medical images can be expressed as:
L_tri = Σ_{i=1}^{m} max(0, ‖h_i^Q - h_i^P‖_2^2 - ‖h_i^Q - h_i^N‖_2^2 + δ)    (5)
where ‖·‖_2 denotes the ℓ2 norm used to measure distance; h_i^Q, h_i^P and h_i^N denote the k-bit depth embeddings, not yet discretized, of Q_i, P_i and N_i; and δ denotes the margin threshold.
Class-level semantics help distinguish similar hash codes belonging to different classes. To capture the class-level semantics of medical images, the learning process of the hash codes is constrained using the matched images and their true labels. The semantic enhancement term can be expressed as:
L_se = Σ_{i=1}^{m} [ℓ_ce(h_i^Q, y_i^Q) + ℓ_ce(h_i^P, y_i^P) + ℓ_ce(h_i^N, y_i^N)]    (6)
where ℓ_ce(·,·) denotes the cross-entropy loss function, and y_i^Q, y_i^P and y_i^N denote the label information of Q_i, P_i and N_i, respectively.
Since the triplet loss is computed on depth embeddings that have not been discretized, a quantization error arises. Inspired by iterative quantization, a quantization term is used to reduce the quantization error between the depth embeddings and the hash codes. The quantization term can be expressed as:
L_qu = Σ_{i=1}^{m} (‖h_i^Q - b_i^Q‖_2^2 + ‖h_i^P - b_i^P‖_2^2 + ‖h_i^N - b_i^N‖_2^2)    (7)
where ‖·‖_2 denotes the ℓ2 norm used to measure distance; h_i^Q, h_i^P and h_i^N denote the k-bit depth embeddings not yet discretized; and b_i^Q, b_i^P and b_i^N denote the k-bit hash codes of Q_i, P_i and N_i, respectively.
Combining the above three parts, the overall loss function can be expressed as:
L_total = L_tri + α × L_se + β × L_qu    (8)
where α and β are hyperparameters that control the weights of the loss terms.
When training the overall network model, the medical triplet images are resized to 256 × 256 and random samples are taken as network input in each training round. The margin threshold δ of the triplet loss is set to 0.5, and the parameters α and β of the overall loss function are set to 1 and 0.8, respectively. The network optimizes the loss with the Adam optimizer at a learning rate of 0.001. Performance is evaluated for hash code lengths of 8, 16, 32, 48 and 64 bits and for the top 5, 10, 15, 20, 25 and 30 most similar images. The trained model is obtained after 100 rounds of training, or once the loss no longer decreases.
In step 5, the trained network is used to compute the mean hit rate (mHR), mean average precision (mAP) and mean reciprocal rank (mRR) over the sample images in the test data set, and retrieval performance is evaluated on these three indices. The hit rate (HR) measures how many images in the returned list are similar to the query image; the average precision (AP) averages precision over the ranked positions of the images similar to the query, thereby measuring ranking quality; and the reciprocal rank (RR) is the reciprocal of the position of the first similar image in the returned list.
Compared with the prior art, the invention has the following advantages: 1) the channel attention module and the spatial attention module are combined into a mixed attention mechanism that can efficiently extract region-of-interest (ROI) information; 2) class-level semantic information constrains the learning process of the hash codes, which helps distinguish similar hash codes belonging to different classes; 3) when the depth embeddings are mapped to discrete hash codes, a quantization loss term reduces the quantization error between the depth embedding and the hash code, further improving the precision of medical image retrieval.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a network structure diagram according to an embodiment of the present invention.
Fig. 3 compares the retrieval performance of the method of the present invention with other methods on different data sets: fig. 3(a) shows the top-10 mAP medical retrieval performance with different hash bit lengths on the COVID-19 Radiography data set; fig. 3(b) shows the same on the Curated X-Ray data set; and fig. 3(c) shows the same on the HAM10000 data set.
Fig. 4 compares the retrieval performance of the method of the present invention with other methods at different retrieval points: fig. 4(a) shows the medical retrieval performance of 48-bit hash codes at different retrieval points on the COVID-19 Radiography data set; fig. 4(b) shows the same on the Curated X-Ray data set; and fig. 4(c) shows the same on the HAM10000 data set.
FIG. 5 shows the top 10 similar images returned by the method of the present invention for query images on the Curated X-Ray and HAM10000 data sets: FIG. 5(a) for the Curated X-Ray data set and FIG. 5(b) for the HAM10000 data set, where incorrectly retrieved images are labeled with their differing class names below.
Detailed Description
The invention provides a semantic enhanced hash medical image retrieval method based on mixed attention: a data set is divided into a training set and a test retrieval set; images are randomly selected from the training set to form medical triplets; an overall network model is constructed with the medical triplet samples as input; the overall network model is trained; and retrieval results are obtained with the trained network.
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in fig. 1, the process of the embodiment of the present invention includes the following steps:
step 1, dividing a data set into a training set and a test retrieval set.
Three data sets were used: the chest X-ray image data set COVID-19 Radiography, the curated COVID-19 chest X-ray image data set Curated X-Ray, and the dermoscopy image data set HAM10000. For each data set, 70% of the data is selected as the training set and the remaining 30% as the test retrieval set. Medical images in the same data set are of the same class, and medical images in different data sets are of different classes.
And 2, randomly selecting images to form a medical triple.
Given m training images forming a training set I = {I_1, I_2, ..., I_m}, two medical images of the same class are randomly selected from the training set as the anchor image Q_i and the positive image P_i; then one medical image of a class different from Q_i and P_i is randomly selected as the negative image N_i, forming a medical triplet T = {Q_i, P_i, N_i}, i ∈ {1, ..., m}. In each triplet, the anchor image Q_i and the positive image P_i are similar, while the negative image N_i is dissimilar to both. When constructing the medical triplet sample units, medical images of classes with few samples are treated as rare images and used as the negative images of common samples, so that rare images are multiplexed during the training stage, alleviating the sample-imbalance problem in the medical image retrieval field.
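The triplet construction above can be sketched as follows. This is a minimal illustration, not the patent's exact sampler: `build_triplets` and the deterministic bias toward the rarest other class as the negative source are assumptions introduced here to mirror the rare-image multiplexing idea.

```python
import random
from collections import defaultdict

def build_triplets(labels, m, rng=random):
    """Sample m (anchor, positive, negative) index triplets from a labeled
    training set, preferentially reusing under-represented classes as
    negatives (illustrative rendering of the rare-image multiplexing idea)."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    classes = list(by_class)
    # rarest classes first, so their images are drawn most often as negatives
    rare_order = sorted(classes, key=lambda c: len(by_class[c]))
    triplets = []
    for _ in range(m):
        # anchor/positive: two distinct images of one class with >= 2 samples
        anchor_cls = rng.choice([c for c in classes if len(by_class[c]) >= 2])
        q, p = rng.sample(by_class[anchor_cls], 2)
        # negative: an image of the rarest class different from the anchor's
        neg_cls = next(c for c in rare_order if c != anchor_cls)
        n = rng.choice(by_class[neg_cls])
        triplets.append((q, p, n))
    return triplets
```

A real implementation would mix rare and common negatives stochastically rather than always taking the rarest class.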
And 3, constructing an integral network model, and taking the medical triple sample as the input of the network model.
For each triplet, the three medical images are simultaneously input into the weight-sharing twin neural network. As shown in fig. 2, the twin neural network is composed of a convolution block, a dense block, a convolution block and a fully connected layer for hash code output. A channel attention module is added between the first convolution block and the dense block, and a spatial attention module between the dense block and the second convolution block, forming a mixed attention mechanism. The channel attention module and the spatial attention module acquire region-of-interest (ROI) information, capturing both inter-channel dependencies and salient spatial features, and thus attend more effectively to the discriminative differences of medical images.
First, a medical image passes through the first convolution block to obtain a feature map X ∈ R^(C×H×W), where H and W denote the height and width of the feature map and C the number of channels. The channel attention module then compresses the input feature map using average pooling and max pooling operations. The channel attention module comprises two consecutive convolution layers: the first 1 × 1 convolution projects the pooled features onto a hidden layer with fewer parameters and uses the ReLU function as activation; the second 1 × 1 convolution restores the number of channels. The average-pooled and max-pooled outputs are then added element-wise, passed through a sigmoid function to obtain the channel weights, and finally multiplied with the feature map X.
The channel attention module can be expressed as:
M_C(X) = σ(Conv_1×1(ReLU(Conv_1×1(AvgPool(X)))) + Conv_1×1(ReLU(Conv_1×1(MaxPool(X)))))    (1)
X' = M_C(X) ⊗ X    (2)
where M_C(X) is the one-dimensional channel attention map of size C × 1 × 1; Conv_1×1 denotes a convolution operation with filter size 1 × 1; σ denotes the sigmoid function; AvgPool(·) is the average pooling function; MaxPool(·) is the max pooling function; ⊗ denotes element-wise multiplication; and X' is the channel-refined feature map.
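The channel attention computation described above can be sketched in NumPy, treating each 1 × 1 convolution on a pooled C-dimensional vector as a plain matrix multiply. The weight shapes, the reduction ratio and the function names are illustrative assumptions; the patent does not fix them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(X, W1, W2):
    """X: feature map of shape (C, H, W).
    W1: (C//r, C) weights of the first 1x1 conv (channel reduction + ReLU).
    W2: (C, C//r) weights of the second 1x1 conv (channel recovery).
    Returns the reweighted feature map M_C(X) * X, shape (C, H, W)."""
    avg = X.mean(axis=(1, 2))   # AvgPool over the spatial dims -> (C,)
    mx = X.max(axis=(1, 2))     # MaxPool over the spatial dims -> (C,)
    # a 1x1 conv applied to a (C,) pooled vector is just a matmul
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)
    m_c = sigmoid(mlp(avg) + mlp(mx))     # element-wise sum, sigmoid weights
    return X * m_c[:, None, None]         # broadcast multiply over H, W
```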
To make full use of the feature map and enhance feature propagation, the feature map is input into a dense block composed of four dense layers. The output of each dense layer is passed to every subsequent layer, creating short paths from early layers to later layers. The spatial attention module complements the channel attention module by focusing on the most informative spatial locations of the sample. Let Y ∈ R^(C×H×W) denote the feature map extracted from the last dense layer, where H and W denote the height and width of the feature map and C the number of channels; the spatial attention module can then be expressed as:
M_S(Y) = σ(Conv_7×7([AvgPool(Y); MaxPool(Y)]))    (3)
Y' = M_S(Y) ⊗ Y    (4)
where M_S(Y) is the two-dimensional spatial attention map of size 1 × H × W; Conv_7×7 denotes a convolution operation with filter size 7 × 7; σ denotes the sigmoid function; AvgPool(·) and MaxPool(·) are the channel-wise average and max pooling functions; [·;·] denotes concatenation along the channel dimension; and Y' is the spatially refined feature map.
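The spatial attention module can likewise be sketched in NumPy. The naive `conv2d_same` helper (a correlation-style 7 × 7 sliding window with zero padding) and the kernel shape are illustrative assumptions standing in for a learned convolution layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, k):
    """Naive 'same'-padded 2-D correlation: x (Cin, H, W), k (Cin, kh, kw) -> (H, W)."""
    cin, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    H, W = x.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * k)
    return out

def spatial_attention(Y, K):
    """Y: (C, H, W) feature map from the last dense layer.
    K: (2, 7, 7) kernel of the 7x7 conv over the [AvgPool; MaxPool] maps.
    Returns M_S(Y) * Y."""
    pooled = np.stack([Y.mean(axis=0), Y.max(axis=0)])  # channel-wise pools -> (2, H, W)
    m_s = sigmoid(conv2d_same(pooled, K))               # (H, W) attention map
    return Y * m_s[None, :, :]
```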
Finally, the depth embedding is mapped to the hash code generation layer, where it is constrained by the semantic enhancement loss, the quantization (regularization) loss and the triplet loss.
And 4, training the whole network model.
Based on a hybrid attention mechanism and a twin neural network, a model is trained by optimizing an overall loss function, which includes hash triples, semantic enhancement terms, and quantization terms.
The hash function maps each medical instance to a compact hash code while preserving the semantic information of the matched medical images and labels in the original space. Since the Hamming distance between discrete hash codes is inconvenient to optimize in a deep learning network, the invention replaces it with the Euclidean distance between the depth embeddings output by the linear layer. To capture relative relevance in the hash space, the basic triplet term over the medical images can be expressed as:
L_tri = Σ_{i=1}^{m} max(0, ‖h_i^Q - h_i^P‖_2^2 - ‖h_i^Q - h_i^N‖_2^2 + δ)    (5)
where ‖·‖_2 denotes the ℓ2 norm used to measure distance; h_i^Q, h_i^P and h_i^N denote the k-bit depth embeddings, not yet discretized, of Q_i, P_i and N_i; and δ denotes the margin threshold.
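A minimal NumPy rendering of the triplet term over a batch of k-bit depth embeddings, assuming squared Euclidean distances and mean reduction; `triplet_term` is an illustrative name, not the patent's code.

```python
import numpy as np

def triplet_term(hq, hp, hn, delta=0.5):
    """Hinge triplet loss on depth embeddings of shape (batch, k):
    pull anchor/positive together, push anchor/negative apart by margin delta."""
    d_pos = np.sum((hq - hp) ** 2, axis=1)   # squared L2, anchor vs positive
    d_neg = np.sum((hq - hn) ** 2, axis=1)   # squared L2, anchor vs negative
    return np.maximum(0.0, d_pos - d_neg + delta).mean()
```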
Class-level semantics help distinguish similar hash codes belonging to different classes. To capture the class-level semantics of medical images, the learning process of the hash codes is constrained using the matched images and their true labels. The semantic enhancement term can be expressed as:
L_se = Σ_{i=1}^{m} [ℓ_ce(h_i^Q, y_i^Q) + ℓ_ce(h_i^P, y_i^P) + ℓ_ce(h_i^N, y_i^N)]    (6)
where ℓ_ce(·,·) denotes the cross-entropy loss function, and y_i^Q, y_i^P and y_i^N denote the label information of Q_i, P_i and N_i, respectively.
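A sketch of the semantic enhancement term, assuming the class-level constraint is a cross-entropy over a shared linear classifier `Wc` applied to each depth embedding; the classifier, its shape and the integer-label encoding are assumptions introduced for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy; logits (batch, n_cls), labels (batch,) int class ids."""
    z = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))   # log-softmax
    return -logp[np.arange(len(labels)), labels].mean()

def semantic_term(hq, hp, hn, yq, yp, yn, Wc):
    """Constrain each depth embedding with its true label through a shared
    (n_cls, k) linear classifier Wc: L_se = ce(Q) + ce(P) + ce(N)."""
    return (cross_entropy(hq @ Wc.T, yq)
            + cross_entropy(hp @ Wc.T, yp)
            + cross_entropy(hn @ Wc.T, yn))
```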
Since the triplet loss is computed on depth embeddings that have not been discretized, a quantization error arises. Inspired by iterative quantization, a quantization term is used to reduce the quantization error between the depth embeddings and the hash codes. The quantization term can be expressed as:
L_qu = Σ_{i=1}^{m} (‖h_i^Q - b_i^Q‖_2^2 + ‖h_i^P - b_i^P‖_2^2 + ‖h_i^N - b_i^N‖_2^2)    (7)
where ‖·‖_2 denotes the ℓ2 norm used to measure distance; h_i^Q, h_i^P and h_i^N denote the k-bit depth embeddings not yet discretized; and b_i^Q, b_i^P and b_i^N denote the k-bit hash codes of Q_i, P_i and N_i, respectively.
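A sketch of the quantization term, assuming (as in iterative quantization) that the discrete hash code is obtained element-wise as b = sign(h) ∈ {-1, +1}; `quantization_term` is an illustrative name.

```python
import numpy as np

def quantization_term(hq, hp, hn):
    """Penalize the gap between each (batch, k) depth embedding and its
    discrete {-1, +1} hash code b = sign(h)."""
    loss = 0.0
    for h in (hq, hp, hn):
        b = np.where(h >= 0, 1.0, -1.0)           # discretized hash code
        loss += np.mean(np.sum((h - b) ** 2, axis=1))
    return loss
```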
Combining the above three parts, the overall loss function can be expressed as:
L_total = L_tri + α × L_se + β × L_qu    (8)
where α and β are hyperparameters that control the weights of the loss terms.
When training the overall network model, the medical triplet images are resized to 256 × 256 and random samples are taken as network input in each training round. The margin threshold δ of the triplet loss is set to 0.5, and the parameters α and β of the overall loss function are set to 1 and 0.8, respectively. The network optimizes the loss with the Adam optimizer at a learning rate of 0.001. Performance is evaluated for hash code lengths of 8, 16, 32, 48 and 64 bits and for the top 5, 10, 15, 20, 25 and 30 most similar images. Training runs for 100 rounds, or until the loss no longer decreases, yielding the trained model.
And 5, obtaining a retrieval result by using the trained network.
The trained network is used to compute the mean hit rate (mHR), mean average precision (mAP) and mean reciprocal rank (mRR) over the sample images in the test data set, and retrieval performance is evaluated on these three indices. The hit rate (HR) measures how many images in the returned list are similar to the query image; the average precision (AP) averages precision over the ranked positions of the images similar to the query, thereby measuring ranking quality; and the reciprocal rank (RR) is the reciprocal of the position of the first similar image in the returned list.
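The three per-query indices can be sketched for a single returned list as follows; the mean variants mHR, mAP and mRR simply average these values over all query images. Representing the returned list as a boolean relevance list is an assumption for illustration.

```python
def hit_rate(relevant):
    """relevant: list of booleans over the returned list (True = similar)."""
    return sum(relevant) / len(relevant)

def average_precision(relevant):
    """Average of precision@rank taken at each rank holding a similar image."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(relevant):
    """Reciprocal of the rank of the first similar image, 0 if none."""
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            return 1.0 / rank
    return 0.0
```

For example, a returned list [similar, dissimilar, similar] gives HR = 2/3, AP = (1/1 + 2/3)/2 and RR = 1.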
To evaluate the effectiveness of the method of the invention, an ablation experiment was first performed: first, features are extracted without the channel attention module (HASE-C); second, the hash function is learned without the semantic enhancement loss (HASE-S); third, the hash function is learned without the quantization term (HASE-Q); finally, the full method of the invention (HASE) is applied. The method of the invention is then compared in retrieval performance with state-of-the-art methods such as ASH, ATH, DHN, DPSH, DSH, DTSH and IDHN.
TABLE 1
[Table 1: top-10 mAP of HASE versus HASE-C, HASE-S and HASE-Q on the COVID-19 Radiography data set for different hash bit lengths; the table is rendered as an image in the original and its values are not reproduced here.]
Table 1 shows the comparative results of the invention against HASE-C, HASE-S and HASE-Q on the COVID-19 Radiography data set for different hash bit lengths. The comparison shows that the proposed method achieves the highest mean average precision for the top 10 retrieval results on the COVID-19 Radiography data set at every hash bit length.
TABLE 2
[Table 2: mHR@10, mAP@10 and mRR@10 of the invention versus other methods on the COVID-19 Radiography, Curated X-Ray and HAM10000 data sets; the table is rendered as an image in the original and its values are not reproduced here.]
Table 2 shows the comparative results of the invention against other methods on the COVID-19 Radiography, Curated X-Ray and HAM10000 data sets under the mHR@10, mAP@10 and mRR@10 indices. The comparison shows that the proposed method achieves the highest mean average precision for the top 10 retrieval results on all three data sets.
In specific implementation, the above process can adopt computer software technology to realize automatic operation process.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments, or alternatives may be employed, by those skilled in the art without departing from the spirit of the invention or the scope defined in the appended claims.

Claims (9)

1. A semantic enhanced hash medical image retrieval method based on mixed attention is characterized by comprising the following steps:
step 1, dividing a data set into a training set and a test retrieval set;
step 2, randomly selecting images to form a medical triple;
step 3, constructing an integral network model, and taking the medical triple sample as the input of the network model;
step 4, training an integral network model;
and 5, obtaining a retrieval result by using the trained network.
2. The semantic enhanced hash medical image retrieval method based on mixed attention as claimed in claim 1, wherein: in step 1, n data sets are used, for each data set, 70% of data is selected as a training set, the remaining 30% of data is selected as a testing and searching set, medical images in the same data set are similar medical images, and medical images in different data sets are different medical images.
3. The semantic enhanced hash medical image retrieval method based on mixed attention as claimed in claim 2, characterized in that: in step 2, given m training images forming a training set I = {I_1, I_2, ..., I_m}, two medical images of the same class are randomly selected from the training set as the anchor image Q_i and the positive image P_i; then one medical image of a class different from Q_i and P_i is randomly selected as the negative image N_i, forming a medical triplet T = {Q_i, P_i, N_i}, i ∈ {1, ..., m}; when constructing the medical triplet sample units, medical images of classes with few samples are treated as rare images and used as the negative images of common samples, so that rare images are multiplexed during the training stage, alleviating the sample-imbalance problem in the medical image retrieval field.
4. The semantic-enhanced hash medical image retrieval method based on mixed attention as claimed in claim 3, wherein: in step 3, for each triplet, the three medical images are simultaneously input into a weight-sharing Siamese (twin) neural network, which consists of a convolution block, a dense block, a convolution block, and a fully connected layer that outputs the hash codes; a channel attention module is added between the first convolution block and the dense block, and a spatial attention module is added between the dense block and the second convolution block, forming a mixed attention mechanism; the channel attention module and the spatial attention module are used to acquire region-of-interest information, so that inter-channel dependencies and salient spatial-domain features can be captured simultaneously and the distinctive differences of medical images attended to more effectively.
5. The semantic-enhanced hash medical image retrieval method based on mixed attention as claimed in claim 4, wherein: in step 3, a medical image first passes through the first convolution block to obtain a feature map X ∈ R^{C×H×W}, where H and W denote the height and width of the feature map, respectively, and C denotes the number of channels; the channel attention module compresses the input feature map with average-pooling and max-pooling operations; the channel attention module comprises two consecutive convolution layers: the first 1×1 convolution projects the pooled features into a hidden layer with fewer parameters and uses the ReLU function as its activation, while the second 1×1 convolution restores the number of channels and uses the sigmoid function as its activation; the average-pooled and max-pooled vectors are then added element-wise, weighted with a sigmoid function, and finally multiplied with the feature map X;
the channel attention module may be expressed as:
M_C(X) = σ( Conv_{1×1}(ReLU(Conv_{1×1}(AvgPool(X)))) + Conv_{1×1}(ReLU(Conv_{1×1}(MaxPool(X)))) )   (1)

X′ = M_C(X) ⊗ X   (2)

In the formula, M_C(X) is a one-dimensional channel attention map of size C × 1 × 1; Conv_{1×1} denotes a convolution operation with filter size 1 × 1; σ denotes the sigmoid function; AvgPool(·) is the average-pooling function; MaxPool(·) is the max-pooling function; ⊗ denotes element-wise multiplication; X′ is the channel-refined feature map.
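A minimal pure-Python sketch of this channel attention computation (the feature map is a nested list, the 1×1 convolutions are stand-in weight matrices `w1`/`w2`; all names and the toy weights are assumptions, not from the patent):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(x, w1, w2):
    """Squeeze each channel by average and max pooling, pass both descriptors
    through a shared two-layer 1x1-conv MLP (ReLU after the first layer),
    add the two results, apply a sigmoid, and rescale the input channels."""
    C = len(x)
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(row) for row in ch) for ch in x]

    def mlp(v):
        # w1: [hidden][C] projects down; w2: [C][hidden] restores the channels.
        hidden = [max(0.0, sum(w1[j][c] * v[c] for c in range(C))) for j in range(len(w1))]
        return [sum(w2[c][j] * hidden[j] for j in range(len(w1))) for c in range(C)]

    weights = [sigmoid(a + b) for a, b in zip(mlp(avg), mlp(mx))]
    return [[[weights[c] * v for v in row] for row in x[c]] for c in range(C)]

# Toy 2-channel 2x2 feature map with a single hidden unit.
x = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [0.0, 2.0]]]
out = channel_attention(x, [[1.0, 1.0]], [[1.0], [1.0]])
```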
6. The semantic-enhanced hash medical image retrieval method based on mixed attention as claimed in claim 5, wherein: in step 3, in order to fully utilize the feature maps and strengthen their propagation, the feature map is input into a dense block consisting of four dense layers, where the output of each dense layer is passed to every subsequent layer so as to create short paths from early layers to later layers; the spatial attention module complements the channel attention module and focuses on the most informative part of the sample; let Y ∈ R^{C×H×W} denote the feature map extracted from the last dense layer, where H and W denote the height and width of the feature map, respectively, and C denotes the number of channels; the spatial attention module can be expressed as:
M_S(Y) = σ( Conv_{7×7}([AvgPool(Y); MaxPool(Y)]) )   (3)

Y′ = M_S(Y) ⊗ Y   (4)

In the formula, M_S(Y) is a two-dimensional spatial attention map of size 1 × H × W; Conv_{7×7} denotes a convolution operation with filter size 7 × 7; σ denotes the sigmoid function; AvgPool(·) and MaxPool(·) are the average-pooling and max-pooling functions, here applied along the channel dimension; ⊗ denotes element-wise multiplication; Y′ is the spatially refined feature map;
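As a toy illustration of the spatial attention step (channel-wise average/max pooling, a small same-padding convolution over the concatenated two-channel map, sigmoid, then rescaling); the kernel here is a stand-in supplied by the caller rather than a learned 7×7 filter, and all names are hypothetical:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv2d_same(img2ch, kernel):
    """Convolve a 2-channel map with kernel[2][k][k]; zero padding keeps H x W."""
    H, W = len(img2ch[0]), len(img2ch[0][0])
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for ch in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:
                            s += kernel[ch][di][dj] * img2ch[ch][ii][jj]
            out[i][j] = s
    return out

def spatial_attention(y, kernel):
    """Pool across channels at each position, convolve the [avg; max] map,
    apply a sigmoid, and rescale every channel by the resulting H x W map."""
    C, H, W = len(y), len(y[0]), len(y[0][0])
    avg = [[sum(y[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(y[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    att = [[sigmoid(v) for v in row] for row in conv2d_same([avg, mx], kernel)]
    return [[[att[i][j] * y[c][i][j] for j in range(W)] for i in range(H)] for c in range(C)]

# Single-channel 2x2 map with a 1x1 stand-in kernel.
y = [[[2.0, 0.0], [0.0, 0.0]]]
out = spatial_attention(y, [[[1.0]], [[1.0]]])
```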
Finally, the deep embedding is mapped to the hash code generation layer, where it is constrained by the semantic enhancement loss, the regularization loss, and the triplet cross-entropy loss.
7. The semantic-enhanced hash medical image retrieval method based on mixed attention as claimed in claim 1, wherein: in step 4, the model based on the mixed attention mechanism and the Siamese neural network is trained by optimizing an overall loss function comprising a hash triplet term, a semantic enhancement term, and a quantization term;
the hash function maps a medical instance to a compact hash code while preserving the semantic information matching the medical image and its label in the original space; since the Hamming distance between discrete hash codes is inconvenient to optimize in a deep learning network, it is replaced by the Euclidean distance between the deep embeddings output by the linear layer; to capture relative relevance in the hash space, the basic triplet term for medical images can be expressed as:

L_tri = Σ_i max( ||h_i^Q − h_i^P||₂² − ||h_i^Q − h_i^N||₂² + δ, 0 )   (5)

In the formula, ||·||₂ denotes the ℓ2 norm, used to measure distance; h_i^Q, h_i^P and h_i^N denote the k-bit deep embeddings, not yet discretized, of Q_i, P_i and N_i; δ denotes the margin threshold;
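A minimal sketch of one such triplet term on plain-list embeddings (squared Euclidean distances with a hinge at the margin; function and argument names are assumptions):

```python
def triplet_loss(q, p, n, margin=0.5):
    """Hinge-style triplet term on deep embeddings: the anchor-positive
    distance must undercut the anchor-negative distance by at least `margin`."""
    d_qp = sum((a - b) ** 2 for a, b in zip(q, p))  # squared distance Q-P
    d_qn = sum((a - b) ** 2 for a, b in zip(q, n))  # squared distance Q-N
    return max(0.0, d_qp - d_qn + margin)
```

When the negative is already farther than the positive by more than the margin, the term is zero and contributes no gradient.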
class-level semantics help distinguish similar hash codes of different classes; to capture the class-level semantics of medical images, the learning of the hash codes is constrained using the matched images and their ground-truth labels; the semantic enhancement term can be expressed as:

L_se = Σ_i [ L_ce(Q_i, y_i^Q) + L_ce(P_i, y_i^P) + L_ce(N_i, y_i^N) ]   (6)

In the formula, L_ce(·,·) denotes the cross-entropy loss function, and y_i^Q, y_i^P and y_i^N denote the label information of Q_i, P_i and N_i, respectively;
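A sketch of the semantic enhancement term, assuming the network outputs a class probability distribution per image (the probability lists and names below are hypothetical):

```python
import math

def cross_entropy(pred_probs, true_label):
    """Cross-entropy of a predicted class distribution against the true label index."""
    return -math.log(pred_probs[true_label])

def semantic_loss(preds, labels):
    """Sum the classification loss over the anchor, positive and negative predictions."""
    return sum(cross_entropy(p, y) for p, y in zip(preds, labels))

# Toy predictions for (Q, P, N) over two classes, with their true labels.
preds = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
sem = semantic_loss(preds, [0, 1, 0])  # three maximally uncertain predictions
```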
since the triplet loss is computed on deep embeddings that have not been discretized, quantization errors arise; inspired by iterative quantization, a quantization term is used to reduce the quantization error between the deep embeddings and the hash codes; the quantization term can be expressed as:

L_qu = Σ_i ( ||b_i^Q − h_i^Q||₂² + ||b_i^P − h_i^P||₂² + ||b_i^N − h_i^N||₂² )   (7)

In the formula, ||·||₂ denotes the ℓ2 norm, used to measure distance; h_i^Q, h_i^P and h_i^N denote the k-bit deep embeddings not yet discretized; b_i^Q, b_i^P and b_i^N denote the k-bit hash codes of Q_i, P_i and N_i, respectively;
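A minimal sketch of the quantization term, assuming the hash code is the sign-binarized embedding (a common choice in hashing methods; the patent does not spell out the binarization, so `sign_hash` is an assumption):

```python
def sign_hash(embedding):
    """Binarize a deep embedding into a +/-1 hash code (assumed binarization)."""
    return [1.0 if v >= 0 else -1.0 for v in embedding]

def quantization_loss(embeddings):
    """Squared L2 gap between each embedding and its binarized hash code."""
    total = 0.0
    for h in embeddings:
        b = sign_hash(h)
        total += sum((bi - hi) ** 2 for bi, hi in zip(b, h))
    return total

# One 2-bit embedding: the second coordinate already saturates at -1.
qloss = quantization_loss([[0.5, -1.0]])
```

Driving this term to zero pushes the continuous embeddings toward ±1, so the final sign step loses as little information as possible.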
combining the above three parts, the overall loss function can be expressed as:

L_total = L_tri + α × L_se + β × L_qu   (8)

In the formula, α and β denote hyperparameters that control the weights of the loss terms.
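The weighted combination itself is a one-liner; a sketch with the default weights reported in the training claim (α = 1, β = 0.8):

```python
def total_loss(l_tri, l_se, l_qu, alpha=1.0, beta=0.8):
    """Overall loss: triplet term plus weighted semantic and quantization terms."""
    return l_tri + alpha * l_se + beta * l_qu

total = total_loss(0.5, 2.0, 0.25)
```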
8. The semantic-enhanced hash medical image retrieval method based on mixed attention as claimed in claim 7, wherein: when training the overall network model in step 4, the medical triplet images are resized to 256 × 256 and randomly sampled in each training epoch as the input of the network; the margin threshold δ of the triplet loss is set to 0.5, and the parameters α and β of the overall loss function are set to 1 and 0.8, respectively; the network optimizes the loss with the Adam optimizer at a learning rate of 0.001; performance is evaluated for hash code lengths of 8, 16, 32, 48 and 64 bits and for the 5, 10, 15, 20, 25 and 30 most similar returned images; the trained model is obtained after 100 training epochs or once the loss no longer decreases.
9. The semantic-enhanced hash medical image retrieval method based on mixed attention as claimed in claim 1, wherein: in step 5, the trained network is used to compute the average hit ratio, the average precision, and the mean reciprocal rank of the sample images in the test data set, and retrieval performance is evaluated with these three indicators; the hit ratio measures how many images in the returned list are similar to the query image; average precision averages the precision at the rank positions of images similar to the query image in the returned list, thereby measuring ranking quality; the reciprocal rank is the reciprocal of the position of the first similar image in the returned list.
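The three retrieval metrics described in this claim can be sketched directly (function names are illustrative):

```python
def hit_ratio(returned, relevant):
    """Fraction of images in the returned list that are similar to the query."""
    return sum(1 for r in returned if r in relevant) / len(returned)

def average_precision(returned, relevant):
    """Mean of precision@k taken at each rank k where a similar image appears."""
    hits, precisions = 0, []
    for k, r in enumerate(returned, 1):
        if r in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(returned, relevant):
    """Reciprocal of the rank of the first similar image in the returned list."""
    for k, r in enumerate(returned, 1):
        if r in relevant:
            return 1.0 / k
    return 0.0
```

Averaging these three values over all queries in the test set yields the average hit ratio, the mean average precision, and the mean reciprocal rank.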
CN202111106128.2A 2021-09-22 2021-09-22 Semantic enhanced Hash medical image retrieval method based on mixed attention Pending CN113889228A (en)


Publications (1)

Publication Number Publication Date
CN113889228A true CN113889228A (en) 2022-01-04

Family

ID=79009681


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114792398A (en) * 2022-06-23 2022-07-26 阿里巴巴(中国)有限公司 Image classification method and target data classification model construction method
CN114863138A (en) * 2022-07-08 2022-08-05 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, storage medium, and device
CN115292532A (en) * 2022-06-24 2022-11-04 中南大学 Remote sensing image domain adaptive retrieval method based on pseudo label consistency learning
CN115329118A (en) * 2022-10-14 2022-11-11 山东省凯麟环保设备股份有限公司 Image similarity retrieval method and system for garbage image
CN116662490A (en) * 2023-08-01 2023-08-29 山东大学 Confusion-free text hash algorithm and confusion-free text hash device for fusing hierarchical label information



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination