CN111797267A

CN111797267A - Medical image retrieval method and system, electronic device and storage medium

Info

Publication number: CN111797267A
Application number: CN202010675891.6A
Authority: CN
Inventors: 刘伟; 裴世宇; 折强; 卫毅然; 刘承乾
Original assignee: Xian University of Posts and Telecommunications
Current assignee: Xian University of Posts and Telecommunications
Priority date: 2020-07-14
Filing date: 2020-07-14
Publication date: 2020-10-20

Abstract

The invention discloses a medical image retrieval method, which classifies images to be queried and retrieves the images in the same category as the images to be queried by using a sorting fusion algorithm.

Description

Medical image retrieval method and system, electronic device and storage medium

Technical Field

The invention belongs to the technical field of image retrieval and relates to a medical image retrieval method and system, an electronic device and a storage medium, and in particular to a medical image retrieval method and system, electronic device and storage medium based on multi-class ranking fusion.

Background

Images today are highly complex, and retrieval tasks often involve large amounts of multi-label image data. For example, an image of a person holding a dog carries both the label 'person' and the label 'dog'. This increases retrieval difficulty and lowers retrieval accuracy.

For example, in the field of breast molybdenum target (mammography) lesion image retrieval, the retrieval method provided by the Picture Archiving and Communication System (PACS) widely used in hospitals at all levels is text-based, and images cannot be retrieved according to their content. Moreover, the physical and visual characteristics of the lesion area of a breast image are difficult to describe precisely in words, so the text retrieval function of a PACS cannot be used for this purpose. A Content-Based Image Retrieval (CBIR) method, by contrast, can retrieve similar cases according to the content of the lesion image.

Many researchers at home and abroad have applied image retrieval methods to breast lesion image retrieval. Although this research has made some progress, it mainly focuses on extracting visual features of lesion images and on improving classification accuracy with machine learning models. Two problems remain open. First, for image retrieval, the image set itself contains important information such as the ranking of images; how to use this information to improve the retrieval accuracy of lesion images is worth studying. Second, deep learning models have achieved great success in image recognition in recent years; how to apply them to tumor lesion image retrieval is also worth exploring.

The traditional 'Query By Example' (QBE) image retrieval framework directly calculates the distance between the query image and each database image using a distance formula, and selects the group of images with the smallest distances as the retrieval result according to their ranking. However, this method has low classification accuracy, a high false-positive rate in the retrieval results, and a long retrieval time.

Disclosure of Invention

In view of the above-mentioned deficiencies in the prior art, the present invention provides a medical image retrieval method.

The invention discloses a medical image retrieval method, which comprises the following steps:

training a classifier model using the image dataset;

classifying the image to be queried by using the trained classifier model to obtain the category of the image to be queried, and obtaining an image set of the category of the image to be queried from the image data set;

calculating the similarity between the image to be queried and each image in the image set of its category, sorting all images in that image set in descending order of similarity, and selecting the first 2K images as the neighborhood of the image to be queried, wherein K is a natural number;

constructing an undirected graph centered on the image features of the image to be queried, calculating the edge weight values between the image features of the image to be queried and each image in the neighborhood, sorting in descending order of edge weight value, and selecting the first K images as the undirected subgraph of the image to be queried;

fusing the plurality of undirected subgraphs, obtained from a plurality of different similarity algorithms applied to the image to be queried and each image in the image set of its category, to form a weight graph;

sorting all images in the weight graph in descending order of weight value, and selecting the first K images as the retrieval list;

and calculating a regression score value for the first i images of the retrieval list taken as a whole to obtain the final retrieval result for the i-th image, and outputting the retrieval result, wherein i is a positive integer not greater than K.
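The classification and neighborhood steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `category_neighborhood`, the tuple layout of `dataset`, and the use of Euclidean distance as the similarity measure are all hypothetical choices, and the query's category is passed in directly where the method would use the output of the trained classifier.

```python
import math


def category_neighborhood(query_feat, query_label, dataset, k):
    """Return the ids of the 2K most similar images in the query's category.

    dataset is a list of (image_id, label, feature_vector) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Restrict the search to the image set of the query's category.
    same_category = [
        (img_id, feat) for img_id, label, feat in dataset if label == query_label
    ]
    # A smaller Euclidean distance means a higher similarity, so sorting by
    # ascending distance is sorting by descending similarity.
    ranked = sorted(same_category, key=lambda item: math.dist(query_feat, item[1]))
    return [img_id for img_id, _ in ranked[: 2 * k]]
```

Restricting the candidate set before ranking is what lets the later graph steps work on at most 2K images instead of the full data set.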

The formula of the edge weight value is as follows:

w(I_m, I_n) = α(I_m, I_n) · J(I_m, I_n);

α(I_m, I_n) = d(I_m, I_n) · b(I_m, I_n);

wherein w(I_m, I_n) is the edge weight value; α(I_m, I_n) is the gain factor; J(I_m, I_n) is the Jaccard similarity coefficient; I_m is the image to be queried; I_n is one image in the image set of the category; N is the number of images in the image set of the category; N(I_m) is the set of all images mutually adjacent to I_m; N(I_n) is the set of all images mutually adjacent to I_n; d(I_m, I_n) is the distance gain factor; b(I_m, I_n) is the balance coefficient of the distance gain.
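As a rough illustration of how the edge weight combines its factors, the sketch below uses the standard definition of the Jaccard coefficient, |A ∩ B| / |A ∪ B|, applied to the two neighbor sets N(I_m) and N(I_n). The function names are hypothetical, and d and b are passed in as precomputed numbers because this sketch does not attempt to compute them.

```python
def jaccard(neigh_m, neigh_n):
    """Standard Jaccard coefficient of two neighbor sets: |A & B| / |A | B|."""
    set_m, set_n = set(neigh_m), set(neigh_n)
    union = set_m | set_n
    # Two empty neighborhoods share nothing; return 0.0 to avoid dividing by zero.
    return len(set_m & set_n) / len(union) if union else 0.0


def edge_weight(neigh_m, neigh_n, d, b):
    """w = alpha * J with alpha = d * b (d and b supplied by the caller)."""
    alpha = d * b
    return alpha * jaccard(neigh_m, neigh_n)
```

For example, neighbor sets {1, 2, 3} and {2, 3, 4} share two of four distinct images, so J is 0.5, and with d = 2.0 and b = 0.5 the edge weight is also 0.5.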

Preferably, d(I_m, I_n) and b(I_m, I_n) are respectively as follows:

wherein R(I_m, I_n) denotes the similarity rank of I_m within the set of all images mutually adjacent to I_n; R(I_n, I_m) denotes the similarity rank of I_n within the set of all images mutually adjacent to I_m.

Preferably, the score value formula is:

wherein I is the number of images in the retrieval list that belong to the category and whose sequence numbers are within K'; J is the number of images in the retrieval list that do not belong to the category and whose sequence numbers are within K'; K' is the position of the current image in the retrieval list, K' being a positive integer 1, 2, 3, ..., K; w_i is the feature distance between an image of the retrieval list and the image to be queried; w_j is the feature distance between an image of the non-retrieval list and the image to be queried;

according to the score value formula, the score value lies in the range [0,1]; a threshold Q is set, and when the score value is greater than Q the retrieval result is judged to belong to the category, otherwise it is judged not to belong to the category;

wherein Q = 1 / (number of categories).
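The threshold rule can be made concrete with a short sketch. The function name `belongs_to_category` is hypothetical, and the score is assumed to have been computed already by the score value formula.

```python
def belongs_to_category(score, num_categories):
    """Apply the threshold Q = 1 / (number of categories) to a score in [0, 1]."""
    q = 1.0 / num_categories
    # The rule is a strict comparison: a score equal to Q does not pass.
    return score > q
```

For a two-class task such as benign versus malignant, Q is 0.5, so a score of 0.6 is accepted while a score of exactly 0.5 is not.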

Preferably, the formula for w_i is:

wherein y_q is the query image, x_i is the feature of each image, and d is the similarity measure between the query image and the retrieved image.

Preferably, the formula for d is:

wherein the similarity is measured by the difference of the feature function f_r, and f_r is a feature function defined on the high-dimensional feature vector space of the query image and the retrieved image.

Preferably, a medical image retrieval system includes:

a classifier training module, used for performing feature training on the image data set to obtain a classifier model;

a classifier identification module, used for classifying the image to be queried according to the model obtained by the classifier training module and obtaining the image set of the category of the image to be queried;

a similarity calculation module, used for calculating the feature distances between the query image and all images in the image set of its category, sorting all images in that image set from most to least similar according to the feature distances, and selecting the first 2K images as the neighborhood of the image to be queried;

an undirected subgraph generation module, used for constructing an undirected graph centered on the image features of the image to be queried, calculating the edge weight values between the image features of the image to be queried and the images in the neighborhood, sorting in descending order of edge weight value, and selecting the first K images as the undirected subgraph of the image to be queried;

a fusion module, used for fusing the plurality of undirected subgraphs, obtained from a plurality of different similarity algorithms applied to the image to be queried and each image in the image set of its category, to form a weight graph;

and an evaluation index submodule, used for sorting all images in the weight graph in descending order of weight value, selecting the first K images as the retrieval list, calculating a regression score value for the first i images of the retrieval list taken as a whole to obtain the final retrieval result for the i-th image, and outputting the retrieval result, wherein i is a positive integer not greater than K.

Preferably, an electronic device for medical image retrieval comprises a processor and a memory; the memory is used for storing a computer program; the processor is configured to perform the above medical image retrieval method when executing the computer program.

Preferably, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, causes the processor to execute the above-mentioned medical image retrieval method.

Compared with the prior art, the invention has the beneficial effects that:

1. According to the method, the query image is classified by the classifier model, the image set of the same category as the query image is obtained, and that image set is used as the retrieval image library. This reduces the size of the image data set to be searched, shortens retrieval time, saves computing resources, and yields efficient retrieval;

2. Retrieval is carried out in the retrieval image library according to a ranking fusion algorithm, which can improve retrieval accuracy.

Drawings

FIG. 1 is a flow chart of the medical image retrieval method of the present invention;

FIG. 2 is a block diagram of the QBE retrieval method;

FIG. 3 shows the Precision results for hand-crafted feature retrieval;

FIG. 4 shows the Precision results for deep feature retrieval;

FIG. 5 shows the Precision results for retrieval with different features;

FIG. 6 is a comparison of the Precision of different retrieval algorithms;

FIG. 7 is a comparison of the AVG-R of different retrieval algorithms;

FIG. 8 is a comparison of the AVG-P of different retrieval algorithms;

FIG. 9 is a comparison of the ANMRR of different retrieval algorithms;

FIG. 10 is a comparison of the retrieval times of different retrieval algorithms.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention is described in further detail below with reference to the accompanying drawings.

Referring to fig. 1, a medical image retrieval method includes:

training a classifier model using the image dataset;

classifying the image features of the image to be queried using the trained classifier model to obtain the category of the image to be queried, and obtaining the image set of that category from the image data set;

calculating the similarity between the image to be queried and each image in the image set of the category, sorting all images in that image set in descending order of similarity, and selecting the first 2K images as the neighborhood of the image to be queried, wherein K is a natural number;

constructing an undirected graph centered on the image features of the image to be queried, calculating the edge weight values between the image features of the image to be queried and each image in the neighborhood, sorting in descending order of edge weight value, and selecting the first K images as the undirected subgraph of the image to be queried;

and calculating a regression score value for the first i images of the retrieval list taken as a whole to obtain the final retrieval result for the i-th image, and outputting the retrieval result, wherein i is a positive integer not greater than K.

In the present embodiment, let the image set D of the category contain the images I = {I_1, I_2, ..., I_N}. Define the image I_m = q as the query image and the image I_n ∈ D as one image of the image set D of the category; N(I_m) is the set of all images mutually adjacent to I_m; N is the number of images in the image set D of the category.

A plurality of preliminary retrieval lists with different orderings are obtained from a plurality of different similarity algorithms applied to the image to be queried and the images in the image set of the category. For each preliminary retrieval list, an undirected graph G = (V, E, w) is constructed with the query image I_m as its center. If the query image I_m and an image I_n satisfy the mutual adjacency relation, the images (I_m, I_n ∈ D) are nodes in the graph and the two images I_m and I_n are connected by an edge e(I_m, I_n) ∈ E. The edge weight w is defined as a weighted correction of the Jaccard similarity coefficient J(I_m, I_n) between the neighborhoods of I_m and I_n:

w(I_m, I_n) = α(I_m, I_n) · J(I_m, I_n);

wherein I_m is the image to be queried; I_n is one image in the image set of the category; N is the number of images in the image set of the category; N(I_m) is the set of all images mutually adjacent to I_m; N(I_n) is the set of all images mutually adjacent to I_n; α(I_m, I_n) is the gain factor; J(I_m, I_n) is the Jaccard similarity coefficient.

Furthermore, because different similarity algorithms are used, the similarity scores they produce follow different scales, so it is difficult to compare or weigh the importance of the features and their gain factors. In addition, a preliminary retrieval list usually contains false-positive images; particularly when the quality of the retrieved images is poor, the ranking of the preliminary retrieval list is noisy, so the similarity scores are not very reliable and cannot accurately represent the similarity ordering between images. The distance gain factor between two images is therefore defined according to the rank of each image in the retrieval queue of the other:

Still further, the distance gain factor represents the magnitude of the overall similarity when the image to be queried matches an image in the image data set. When I_m is ranked near the top in the list of I_n, R(I_m, I_n) takes a small value; when I_n is also ranked near the top in the list of I_m, R(I_n, I_m) also takes a small value. In that case d(I_m, I_n) takes a large value, indicating that the distance gain is positive and effective. However, because the local densities of the neighborhood vectors of I_m and I_n differ, the rank weight of I_m in the list of I_n and the rank weight of I_n in the list of I_m are not fully equivalent, and a balance coefficient b(I_m, I_n) is required to correct the distance gain. The formula for b(I_m, I_n) is as follows:

The gain factor is then:

α(I_m, I_n) = d(I_m, I_n) · b(I_m, I_n).

In this embodiment, the edge weight values are obtained through the above formulas and sorted in descending order, and the first K images are selected as the undirected subgraph of the image to be queried. The plurality of undirected subgraphs, obtained from a plurality of different similarity algorithms applied to the image to be queried and each image in the image set of the category, are then fused to form a weight graph.
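The fusion step can be pictured with the following sketch. The text does not state the fusion rule, so summing the edge weights that each image receives across the subgraphs is purely an assumption made for illustration; the function name `fuse_subgraphs` and the dictionary representation of a subgraph are likewise hypothetical.

```python
from collections import defaultdict


def fuse_subgraphs(subgraphs, k):
    """Fuse query-centered subgraphs and return the top-k (image_id, weight) pairs.

    Each subgraph is a dict {image_id: edge weight to the query image}.
    Assumption: fusion adds up the weights an image receives in every subgraph.
    """
    fused = defaultdict(float)
    for subgraph in subgraphs:
        for image_id, weight in subgraph.items():
            fused[image_id] += weight
    # Descending order of fused weight; the first k entries form the retrieval list.
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

Under this assumption, an image that appears in the subgraphs of several similarity algorithms accumulates weight and rises in the fused ranking, which is one plausible way for agreement between algorithms to be rewarded.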

In this embodiment, all images in the weight graph are sorted in descending order of weight value, and the first K images are selected as the retrieval list. To determine whether the retrieval of the i-th (i = 1, 2, ..., K) image in the retrieval list is correct, a score is calculated for the first i images of the retrieval list taken together as a whole, and this overall score is taken as the result for the i-th image. The score is calculated as follows:

wherein I is the number of images in the retrieval list that belong to the category and whose sequence numbers are within K'; J is the number of images in the retrieval list that do not belong to the category and whose sequence numbers are within K'; K' is the position of the current image in the retrieval list, K' being a positive integer 1, 2, 3, ..., K; w_i is the feature vector distance between an image of the retrieval list and the image to be queried; w_j is the feature vector distance between an image of the non-retrieval list and the image to be queried;

according to the score value formula, the score value lies in the range [0,1]; a threshold Q is set, and when the score value is greater than Q the retrieval result is judged to belong to the category, otherwise it is judged not to belong to the category;

wherein Q = 1 / (number of categories).

Further, the formula for w_i is:

Still further, the formula for d is:

In this embodiment, the pseudo code based on the above ranking fusion algorithm is:

Example 1

The retrieval method is applied to the retrieval of breast molybdenum target lesion images.

The image data set consists of 464 images collected from actual cases. The data are derived from the public database DDSM and from case data collected at a domestic tumor hospital. Each image is accompanied by a matching lesion boundary map manually annotated by a physician, and all images were pathologically validated. Of these, 177 cases are benign and 287 are malignant. This embodiment is therefore a two-class retrieval task.

In this embodiment, several feature extraction methods and ranking fusion algorithms are available for the above method; the combination best suited to breast molybdenum target lesion images is selected through the following experiments.

1. Feature-based tumor focus retrieval

This experiment examines the application of hand-crafted features and deep features to tumor lesion retrieval, using the classical query-by-example (QBE) retrieval approach shown in fig. 2.

The experimental scheme is as follows:

Ten rounds of retrieval are performed for each hand-crafted feature. In each round, 15 images are randomly selected from the data set as query objects and retrieved, with feature matching measured by Euclidean distance; only the 30 top-ranked results are kept, and the retrieval evaluation index is calculated. The average of the 10 evaluation indexes is taken as the final result. The retrieval results are shown in fig. 3, wherein TD represents the TumorDescriptor tumor lesion feature, MS-C represents the multi-scale complexity feature, MS-FD represents the multi-scale fractal dimension feature, Haar represents the Haar wavelet feature, DB2 represents the DB2 wavelet feature, Gabor represents the Gabor feature, EHD represents the MPEG-7 feature, and LBP represents the local binary pattern feature.

Referring to fig. 3, the abscissa shows the different hand-crafted feature extraction methods, arranged in ascending order of feature dimension (15, 18, 24, 80 and 256 dimensions); the ordinate is the accuracy of the retrieval result. Accuracy is calculated as the ratio of the number of correctly retrieved images to the total number of retrieved images, where a retrieved image counts as correct if its category matches that of the query image and is otherwise not counted. The figure shows that all feature extraction algorithms achieve a retrieval accuracy above 50% in the QBE framework, i.e. more than 15 of the 30 retrieved images are correct. The best result is the TD tumor lesion feature, which achieves 67.53% accuracy with a 15-dimensional feature, i.e. about 20 images are correctly retrieved.
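The QBE baseline and its accuracy measure can be sketched as follows. This is a minimal illustration, not the experimental code: the names `qbe_retrieve` and `precision` and the (label, feature_vector) layout of `database` are hypothetical, while the Euclidean distance and the cut-off of 30 results follow the description above.

```python
import math


def qbe_retrieve(query_feat, database, top_n=30):
    """Rank every database image by Euclidean distance and keep the top_n labels.

    database is a list of (label, feature_vector) pairs (hypothetical layout).
    """
    ranked = sorted(database, key=lambda item: math.dist(query_feat, item[1]))
    return [label for label, _ in ranked[:top_n]]


def precision(retrieved_labels, query_label):
    """Share of retrieved images whose category matches the query's category."""
    if not retrieved_labels:
        return 0.0
    correct = sum(1 for label in retrieved_labels if label == query_label)
    return correct / len(retrieved_labels)
```

With 30 retrieved images, an accuracy of 67.53% corresponds to 0.6753 × 30 ≈ 20.3 correct images on average, which matches the figure of about 20 quoted above.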

Referring to fig. 4, deep features are extracted using pre-trained VGG, Resnet and Inception deep networks, and each image is converted into a 1024-dimensional feature vector. Because deep features are high-dimensional and contain redundant information, PCA (Principal Component Analysis) is used to reduce them to 16, 32, 64, 128, 256, 512 and 1024 dimensions respectively, and QBE retrieval is then carried out on the reduced features.

As can be seen from fig. 4, the Inception deep feature gives better retrieval results than the Vgg and Resnet deep features in all dimensions. The retrieval accuracy of the Resnet deep feature is the lowest of the three. The Resnet deep feature achieves its best retrieval result at 1024 dimensions, with an accuracy of 54.13%, and its next best at 256 dimensions, with an accuracy of 54.51%; its retrieval accuracy first falls and then rises as the dimension increases. The Vgg deep feature achieves its best retrieval result at 128 dimensions, with an accuracy of 45.48%, and its accuracy is relatively stable as the dimension increases. The Inception deep feature achieves its best retrieval result at 16 dimensions, with an accuracy of 60.48%, and its retrieval performance gradually declines as the dimension increases.

Referring to fig. 5, which compares the retrieval results of the different feature methods, the feature dimension giving the best retrieval accuracy is selected for each deep feature; from left to right the feature dimensions are 15, 16, 18, 18, 24, 24, 24, 80, 128, 256 and 256. As can be seen from fig. 5, the Inception feature obtains the best retrieval accuracy among the deep features with the smallest dimension, but the traditional TD feature obtains a retrieval accuracy about 7 percentage points higher with an even smaller feature dimension.

In conclusion, the TD feature among the traditional hand-crafted features achieves the best result of all methods in QBE retrieval of molybdenum target lesions, with an accuracy of 67.53%.

2. Classifier selection and analysis

In the first stage of the hierarchical system, the features obtained by the 8 feature extraction algorithms are compared, and 5 classifiers are used to measure classification accuracy. The training set accounts for 65% of the total samples and the test set for the remaining 35%.
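As a small worked example of the 65% / 35% split, the sketch below divides a list of samples after a seeded shuffle. The function name `split_dataset`, the random shuffle, the seed, and the rounding of the training size to the nearest integer are all assumptions; the text gives only the two proportions.

```python
import random


def split_dataset(items, train_fraction=0.65, seed=0):
    """Shuffle the samples reproducibly and split them into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the training size is rounded to the nearest whole sample.
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

Applied to the 464 images of the data set, this rounding gives 302 training samples and 162 test samples.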

In the experiment, the BP classifier is run with the Weka tool, the ELM classifier with Matlab, and the SVM, RF and GBDT classifiers with the Python library sklearn; the deep learning classifiers, including the Resnet, VGG and Inception models, are built on the Keras platform, with ImageNet as the pre-training data set for transfer learning. All parameters are left at their defaults. In addition, TD represents the TumorDescriptor tumor lesion feature, MS-C represents the multi-scale complexity feature, MS-FD represents the multi-scale fractal dimension feature, Haar represents the Haar wavelet feature, DB2 represents the DB2 wavelet feature, Gabor represents the Gabor feature, EHD represents the MPEG-7 feature, and LBP represents the local binary pattern feature. Table 1 is sorted in ascending order of feature dimension and shows the classification results of the conventional classifiers; the bold and underlined entries in the table are the best classification results for each classifier model.

TABLE 1

Feature	GBDT	BP	ELM	RF	SVM
TD	0.7727	0.5763	0.6809	0.7070	0.7521
MS-C	0.6319	0.5040	0.5812	0.5766	0.6198
MS-FD	0.7055	0.5237	0.6609	0.6503	0.6329
Haar	0.6809	0.5214	0.6342	0.6564	0.6090
DB2	0.6993	0.5092	0.6124	0.6380	0.6176
Gabor	0.6319	0.5178	0.5837	0.6073	0.6109
EHD	0.5828	0.5132	0.5578	0.5644	0.6068
LBP	0.6503	0.5178	0.6032	0.6380	0.6198

Deep learning has a fitting ability that traditional classifiers cannot match, but its training process is long and it requires a very large training set to support the network structure. In the preprocessing phase, the deep neural networks VGG19, Resnet50 and Inception V3 were tested. Table 2 shows the classification results of the deep learning models.

TABLE 2

Model	Accuracy
VGG	0.5818
ResNet	0.5924
Inception	0.6058

In the classification stage, with the TD descriptor as the feature extraction algorithm, GBDT gives the best classification result, with an accuracy of 0.7727; GBDT also gives the best result with the MS-C feature, 0.6319. Likewise, GBDT achieves the best classification performance under every feature except EHD, where the SVM achieves the best result with an accuracy of 0.6068. By comparison, the classification accuracy of the deep learning VGG model is 0.5818, that of the ResNet model is 0.5924, and that of the Inception model is 0.6058.

This comparison shows that deep learning classifies worse than traditional machine learning here, which is attributed to the particular nature of breast molybdenum target lesion images. These images lack rich multi-channel information, and the loss of gray-level information caused by image scaling during training degrades the deep learning result. Deep learning owes its strength to a very large number of convolution kernels and a rich nonlinear fitting capability, which is most effective on multi-channel images. A breast molybdenum target image, however, is a single-channel grayscale image with a high number of gray levels, so much of this advantage is lost. Moreover, because the gray-level depth is so high, deep learning easily overfits during training, so its accuracy is lower than that of traditional machine learning. Even without overfitting, for single-channel images with very many gray levels it lacks a good ability to extract local features that characterize the lesion region. In traditional machine learning, by contrast, feature extraction converts the image data from two dimensions to one dimension; features free of redundant information are easy to organize and integrate, and combined with machine learning they give good results on grayscale images.

Experiments prove that the GBDT decision tree model has the best effect when the mammary molybdenum target lesion data sets are classified.

3. Selection and analysis of feature extraction methods

The preceding section briefly presented QBE retrieval under various features, showing that the TD feature works well under QBE and supporting the effectiveness of the feature selection. However, the effectiveness of the TD feature under different ranking fusion algorithms still needs to be demonstrated experimentally, so that the most effective feature extraction method can be selected and applied to the hierarchical system.

In this experiment, 5 ranking fusion algorithms are used to compare the retrieval performance of the 8 traditional hand-crafted features on the complete data set, and QBE retrieval results are added as a baseline for the precision of the ranking fusion algorithms. The experiment consists of 10 rounds. In each round, 15 images are randomly selected from each semantic class of the data set as query objects and retrieved; only the first 30 results are kept, and the retrieval evaluation indexes are calculated. The average of the 10 evaluation indexes is taken as the final result. Similarity is measured using Euclidean distance and cosine distance, and the retrieval algorithms are QBE, QSFR, Diffusion_Process, Image_Graph, CombMNZ and Mutual_Rank. Table 3 shows the Precision results, Table 4 the Average-r results, Table 5 the Average-p results, and Table 6 the ANMRR results of the different retrieval algorithms for the different features; the bold and underlined entries in the tables mark the feature giving the best retrieval result for each ranking fusion algorithm.

TABLE 3

For the Precision index, the TD feature gives the best result under all ranking fusion algorithms, with the highest accuracy, 87.64%, under the Mutual_Rank algorithm. The LBP feature is second only to TD, and its retrieval accuracy is significantly higher than that of the remaining features. In summary, the TD and LBP features are at the highest level under each ranking fusion algorithm, while the retrieval accuracy of the other features is lower, but without large differences between them. The TD feature is the preferred feature on the Precision index.

TABLE 4

Results for the Average-r index: the MS-FD feature is optimal 3 times, the TD feature 2 times, and the DB2-WL and Gabor features once each. The Average-r index is the average rank, within the retrieval sequence, of the correctly retrieved images. When few correct images are retrieved and their ranks are average, the Average-r index is low; when many correct images are retrieved and their ranks are average, the Average-r index is high. A lower Average-r is therefore not always better, since it may simply result from too few correct images being retrieved. Only when the number of correctly retrieved images is comparable does a lower Average-r indicate a better retrieval algorithm. The MS-FD feature does not perform well on the Precision index, i.e. it retrieves too few correct images; so although the TD feature is optimal on Average-r fewer times than MS-FD, its results differ little from the optimum, and the TD feature also performs at a high level on the Average-r index. The remaining features fluctuate within a reasonable range.
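The caveat about Average-r can be seen in a small sketch that follows the description above, i.e. the mean rank of the correctly retrieved images. The function name `average_rank` and the boolean-list input are hypothetical choices for this illustration.

```python
def average_rank(relevance):
    """Mean 1-based rank of the correct images in a retrieval list.

    relevance lists, in retrieval order, whether each result is correct.
    Returns None when no correct image was retrieved.
    """
    ranks = [pos for pos, ok in enumerate(relevance, start=1) if ok]
    return sum(ranks) / len(ranks) if ranks else None
```

A list with a single correct image at rank 1, `[True, False, False, False]`, scores 1.0, while a list with three correct images, `[True, True, True, False]`, scores 2.0. The second retrieval is clearly better, yet its Average-r is higher, which is why the index is only comparable when the number of correct images is similar.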

TABLE 5

Results for the Average-p index: the TD features are optimal 4 times among all ranking fusion algorithms, and the DB2-WL and LBP features once each. In the algorithms where TD is not optimal, the gap between the TD result and the optimal result is small. The Average-p index is the average degree to which correct images are placed toward the front of the retrieval sequence. When correct images rank toward the back of the retrieval list, Average-p is small; when they rank toward the front, Average-p is large. The larger the Average-p, the better the retrieval algorithm. The TD features are therefore optimal on the Average-p index.
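The behaviour described for Average-p matches the usual average precision measure, so the sketch below assumes that definition (the mean of the precision values taken at the rank of each correct image). This is an assumption for illustration; the function name and example lists are hypothetical.

```python
def average_precision(relevant_flags):
    """Mean of precision-at-rank over the ranks where a correct image appears."""
    hits = 0
    precisions = []
    for pos, ok in enumerate(relevant_flags, start=1):
        if ok:
            hits += 1
            precisions.append(hits / pos)  # precision among the first `pos` results
    return sum(precisions) / len(precisions) if precisions else 0.0

front_loaded = [True, True, False, False]   # correct images at the front
back_loaded = [False, False, True, True]    # correct images at the back

print(average_precision(front_loaded))
print(average_precision(back_loaded))
```

Both lists contain the same number of correct images, but the front-loaded list scores higher, which is the property the text relies on.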

TABLE 6

Results for the ANMRR index: the TD features are optimal 4 times, and the DB2-WL and LBP features once each. The ANMRR index is a normalized correction of the Average-r index. Average-r is a valid measure only when the numbers of retrieved images are comparable, whereas for the corrected ANMRR index a smaller value always indicates better retrieval.
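As an illustration of how ANMRR normalizes the average rank, the sketch below follows the commonly used MPEG-7 definition of NMRR. Whether the experiment uses exactly these constants is an assumption; the function names and the parameters `ng` (ground-truth size for the query) and `gtm` (largest ground-truth size over all queries) are introduced here for illustration.

```python
def nmrr(ranks_of_relevant, ng, gtm):
    """Normalized modified retrieval rank for one query (0 = perfect, 1 = nothing found).

    ranks_of_relevant : 1-based ranks at which ground-truth images were retrieved
    ng                : number of ground-truth images for this query
    gtm               : largest ground-truth size over all queries
    """
    k = min(4 * ng, 2 * gtm)        # only the first k positions count as "found"
    penalty = 1.25 * k              # rank assigned to missed ground-truth images
    capped = [r if r <= k else penalty for r in ranks_of_relevant]
    capped += [penalty] * (ng - len(capped))   # ground-truth images never retrieved
    avr = sum(capped) / ng                     # average rank
    mrr = avr - 0.5 * (1 + ng)                 # subtract the best achievable average rank
    return mrr / (penalty - 0.5 * (1 + ng))    # normalize to [0, 1]

def anmrr(per_query):
    """Average NMRR over queries; each item is (ranks_of_relevant, ng, gtm)."""
    return sum(nmrr(*q) for q in per_query) / len(per_query)
```

Because missed images are charged a fixed penalty rank, a list with few correct images can no longer obtain a good score simply by ranking those few images early.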

In summary, in the comparison experiments of different features under different retrieval algorithms, the TD features are at the highest level on all 4 indexes, so using the TD features as the general feature extraction algorithm of the hierarchical image retrieval framework based on ranking fusion is most effective.

4. Selection and analysis of rank fusion algorithms

With the classifier comparison experiment and the feature selection experiment complete, the comparison experiment combining classification and retrieval can be carried out. First, TD features are extracted from the query image and the extracted feature matrix is fed into a GBDT decision tree for classification, which gives the probability of the query image belonging to each class. The TD feature matrix of the query image is then sent to the database for retrieval, and the classification probabilities determine the proportions of the retrieval quantity, that is, the share of results retrieved from each class corresponds to the probability of that class. Finally, the evaluation indexes of the retrieval list are measured.
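One way to turn classification probabilities into per-class retrieval quantities is sketched below. It assumes the probabilities sum to 1 and uses largest-remainder rounding so that the counts add up to the requested total; the function name, the rounding rule and the class names are illustrative assumptions, not details taken from the method.

```python
def allocate_retrieval_counts(class_probs, total=30):
    """Split `total` retrieval slots across classes in proportion to class probability."""
    raw = {c: p * total for c, p in class_probs.items()}
    counts = {c: int(v) for c, v in raw.items()}          # round down first
    leftover = total - sum(counts.values())
    # hand the remaining slots to the classes with the largest fractional parts
    by_fraction = sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)
    for c in by_fraction[:leftover]:
        counts[c] += 1
    return counts

# Hypothetical classifier output for one query image
probs = {"benign": 0.2, "malignant": 0.7, "normal": 0.1}
counts = allocate_retrieval_counts(probs, total=30)
print(counts)
```

With this allocation, a query that the classifier assigns mostly to one class still retrieves a few images from the other classes, so a classification error does not exclude the correct class entirely.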

Referring to FIG. 6, the Mutual_Rank algorithm performs best among the rank fusion algorithms. The accuracy of the other algorithms changes smoothly, and the differences between them are within 0.1, which indicates that their retrieval quantities are comparable. This supports evaluating the Average-r index at the same level. The retrieval accuracy of the QSRF algorithm is low because its threshold is set slightly high, so images below the threshold are not retrieved and accuracy degrades. In addition, when the classification effect is not good enough, the performance of the retrieval algorithm also suffers, which indicates that the retrieval framework has a weakest-link (short board) effect: the overall result is well optimized only when both stages perform well.

Referring to FIG. 7, for the Average-r index, the Mutual_Rank algorithm again performs best. Because the algorithms are at a comparable level on the Precision index, the Average-r index is persuasive here. The retrieval algorithms as a whole tend toward a similar level. Compared with the other algorithms, Mutual_Rank has higher accuracy and a lower Average-r, which indicates that the correct images rank nearer the top of its retrieval list; this is because the algorithm corrects the local neighborhood weights, which moves the correct images forward.

Referring to FIG. 8, for the Average-p index, the Mutual_Rank algorithm performs best for all features, with a slightly larger Average-p than the other algorithms. The differences between the algorithms on this index are small and lie within a similar range. This shows that the retrieval list of the Mutual_Rank algorithm generally ranks the correct images near the top, which further corroborates the Average-r conclusion.

Referring to FIG. 9, for the ANMRR index, the Mutual_Rank algorithm performs best for all features, with a slightly smaller ANMRR than the other algorithms. The QSRF algorithm does not perform well on this index because the correct images in its retrieval list are not contiguous and are too scattered.

In summary, the Mutual_Rank algorithm performs well for three reasons: first, it achieves fusion in the feature dimension; second, it draws on and combines the advantages of several sorting fusion algorithms; third, it achieves sorting fusion of the retrieval results. The traditional framework only judges accuracy and other indexes of a single result, and does not incorporate the feedback or diagnostic reasoning of doctors. The present method groups and aggregates the retrieval list and normalizes the result again, so that the doctor can fully understand and trust the retrieval result.

Because QBE is currently the most commonly used framework, it serves as the baseline. Compared with QBE, the framework provided by the method improves precision by 22%, clearly reduces AVG-R, improves AVG-P by 12%, and reduces ANMRR to 0.003, so it is clearly superior to QBE on every index.

From the above experiments, it can be verified that the graph-based ranking fusion algorithm provided herein improves retrieval accuracy and is superior to the other ranking fusion algorithms used in the comparison, and that the hierarchical retrieval system based on ranking fusion makes up for the deficiencies of QBE and outperforms the QBE retrieval framework.

5. Algorithm performance analysis

(1) Time overhead

In the experiment, 10 rounds of calculation are carried out for each feature. In each round, 15 images are randomly selected from each semantic class in the data set as the objects to be retrieved, retrieval is performed, and only the 30 highest-weighted results are retained. The average of the evaluation indexes over the 10 rounds is taken as the result. The average retrieval time of each round is shown in FIG. 10.

Referring to FIG. 10, the average retrieval times per round (in seconds) are clearly differentiated; in ascending order of time, the algorithms are QSRF, Mutual_Rank, Image_Graph, Diffusion_Process, CombomNZ and QBE. Because the database is small, the retrieval time of the QBE algorithm is short, whereas the sorting fusion algorithms must construct and fuse weight graphs, which adds time overhead.

The time overhead of Mutual_Rank can be expressed in terms of the sub-dataset scale N, the number of similarity measurement methods S, and the retrieval quantity K. When the data set scale increases linearly, the time overhead of classifying the query image is O(C), the overhead of calculating the neighborhood of the query image is O(N + lgN), the overhead of calculating the neighborhood of each image in the neighborhood of the query image is O(N·lgN·K), and the overhead of calculating the weight graph is O(K²). The time overhead of applying one similarity measurement method is therefore O(N·lgN·K + K² + N), and the total cost of using S methods is O(S·(lgN·K + K² + N)). The overhead of fusing the weight subgraphs is O(K²), the overhead of reordering is O(K·lgK), and the overhead of calculating the regression value is O(K²). Therefore, the total time overhead of Mutual_Rank is O(C + S·(lgN·K + K² + N) + K² + K·lgK).

(2) Space overhead

The space overhead generated by the method in on-line calculation can be expressed as follows: the space overhead of the sub-feature database is O(N), the neighborhood space overhead of the query image is O(K), the total neighborhood space overhead of the images in the neighborhood of the query image is O(K²), the space overhead of storing the weight subgraphs is O(S·K²), the space overhead of the fused weight graph is O(K²), and the space overhead of reordering and storing the regression values is O(K). The on-line space overhead is therefore O(N + K²).

The space overhead of the off-line stage can be expressed as follows: given a data set I = {I_1, I_2, ..., I_n}, each image is taken as a query image and its similarity scores and initial neighborhood range are obtained, so the resulting space overhead is O(N² + N·K). The total space overhead of Mutual_Rank is O(N² + N·K + K²).

Through the comparison, the retrieval precision is improved, the time is reduced, and the deficiency of QBE is made up.

In this embodiment 1, because the diagnosis of a breast lesion by a doctor is a very complicated information fusion process, the retrieval result cannot be regarded merely as separate images in a queue; the seemingly independent images need to be considered together. Thus, in the above method, the score value formula is:

wherein I is the number of images in the retrieval list that are true positive images and have serial numbers within k'; J is the number of images in the retrieval list that are false positive images and have serial numbers within k'; k' is the position of the current image in the retrieval list, k' being a positive integer in 1, 2, 3, ..., K. If the score value is greater than 0.5, the retrieval result is judged to be a 'malignant' lesion; otherwise, the image is judged to be 'benign'.

In embodiment 1, the optimal result is selected through the above comparison experiments, and the accuracy of the other results is still higher than that of a general retrieval method. To obtain the optimal result, the extracted features and the algorithms differ for different types of image sets, and the optimal features and algorithm for each type of image can be obtained through a simple comparison calculation.

In this embodiment, a medical image retrieval system includes:

the undirected subgraph generation module is used for constructing an undirected graph centered on the image features of the image to be queried, calculating the edge weight values between the image features of the image to be queried and all images in its neighborhood, arranging them in descending order of edge weight value, and selecting the top K images as the undirected subgraph of the image to be queried;
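The undirected subgraph generation step can be sketched as follows, using the edge weight w = α·J given in the claims, where J is the Jaccard similarity coefficient of the two neighbor sets. Since the formulas of the distance gain factor and its balance coefficient are not reproduced in this text, the gain coefficient α is represented by a hypothetical `gain` callable that defaults to 1; the function names and the precomputed `neighbors` mapping are likewise assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two neighbor sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top_k_subgraph(query, neighbors, k, gain=lambda m, n: 1.0):
    """Return the K heaviest edges around `query` as (image_id, weight) pairs.

    neighbors : dict image_id -> iterable of neighbor ids (assumed precomputed)
    gain      : hypothetical stand-in for the gain coefficient alpha
    """
    edges = []
    for cand in neighbors[query]:
        weight = gain(query, cand) * jaccard(neighbors[query], neighbors.get(cand, ()))
        edges.append((cand, weight))
    edges.sort(key=lambda edge: edge[1], reverse=True)  # descending edge weight
    return edges[:k]

# Hypothetical neighbor lists for a query image "q"
neighbors = {
    "q": ["a", "b", "c"],
    "a": ["q", "b", "c"],
    "b": ["q", "x"],
    "c": ["y"],
}
print(top_k_subgraph("q", neighbors, k=2))
```

One such subgraph would be produced per similarity measurement method, and the resulting subgraphs are then fused into the weight graph.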

In this embodiment, an electronic device for medical image retrieval includes a processor and a memory; the memory for storing a computer program; the processor, when executing the computer program, is configured to perform a medical image retrieval method as described above.

In this embodiment, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, causes the processor to execute one of the above-described medical image retrieval methods.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A medical image retrieval method, comprising:

training a classifier model using the image dataset;

classifying by using the trained classifier model and using the image characteristics of the image to be queried to obtain the category of the image to be queried, and obtaining an image set of the category of the image to be queried from the image data set;

constructing an undirected graph by taking the image feature vector of the image to be queried as a center, calculating edge weight values of the image feature of the image to be queried and each image feature vector of the neighborhood, arranging the edge weight values in a descending order, and selecting the first K pieces of image feature vectors as undirected subgraphs of the image to be queried;

fusing a plurality of undirected subgraphs obtained based on a plurality of different similarity algorithms of the image to be inquired and each image in the image set belonging to the category to form a weight graph;

2. A medical image retrieval method according to claim 1, wherein the edge weight value formula is:

w(I_m,I_n)＝α(I_m,I_n)·J(I_m,I_n)；

α(I_m,I_n)＝d(I_m,I_n)·b(I_m,I_n)；

wherein w(I_m,I_n) is the edge weight value; α(I_m,I_n) is the gain coefficient; J(I_m,I_n) is the Jaccard similarity coefficient; I_m is the image to be queried; I_n is one image in the image set of the category; N is the number of images in the image set of the category; N(I_m) is the set of all images mutually adjacent to I_m; N(I_n) is the set of all images mutually adjacent to I_n; d(I_m,I_n) is the distance gain factor; b(I_m,I_n) is the balance coefficient of the distance gain.

3. A medical image retrieval method according to claim 2, wherein d(I_m,I_n) and b(I_m,I_n) are respectively as follows:

4. The medical image retrieval method of claim 1, wherein the score value formula is:

wherein I is the number of images in the retrieval list that belong to the category and have serial numbers within k'; J is the number of images in the retrieval list that do not belong to the category and have serial numbers within k'; k' is the position of the current image in the retrieval list, k' being a positive integer in 1, 2, 3, ..., K; w_i is the feature distance between an image of the retrieval list and the image to be queried; w_j is the feature distance between an image of the non-retrieval list and the image to be queried;

wherein Q is 1/number of categories.

5. A medical image retrieval method according to claim 4, wherein the formula of w_i is:

wherein y_q is the query image, x_i is each image feature, and d is a similarity measure between the query image and the retrieved image.

6. A medical image retrieval method according to claim 5, wherein the formula of d is:

wherein the similarity is measured by the difference of the feature functions f_r, and f_r are feature functions defined in the high-dimensional feature vector spaces of the query image and the retrieved image.

7. A medical image retrieval system, comprising:

and the evaluation index submodule is used for arranging all the images in the weight graph in descending order of weight value, selecting the top K retrieved images as a retrieval list, calculating a regression score value by taking the first i images in the retrieval list as a whole to obtain a final retrieval result for the i-th image, and outputting the retrieval result, wherein i is a positive integer in 1, 2, 3, ..., K.

8. An electronic device for medical image retrieval comprising a processor and a memory; the memory for storing a computer program; the processor, when executing the computer program, for implementing the method of any of claims 1-6.

9. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method according to any one of claims 1-6.