Hyperplane nearest neighbor classification method for microangioma medical record images
Technical field:
The invention relates to medical image classification, and in particular to a hyperplane nearest neighbor classification method for microangioma medical record images.
Background art:
Diabetes is a disease with high morbidity and a major threat to human health. Treating advanced diabetes is expensive, and detecting the disease as early as possible can effectively reduce treatment costs. Diabetes often causes retinal abnormalities; this microvascular complication is known as diabetic retinopathy. Because fundus images reveal retinal abnormalities, fundus image classification has become an effective method for non-invasive detection of diabetes. Fundus images can be converted into feature vectors by segmenting lesion areas and then classified by machine learning, but because the feature vectors are high-dimensional or the learning is insufficient, classification accuracy and efficiency still need to be improved.
The support vector machine largely avoids probability measures, the law of large numbers, and similar notions, and therefore differs from existing statistical methods. In essence, it bypasses the traditional path from induction to deduction, performs efficient transductive inference from training samples to prediction samples, and greatly simplifies common classification and regression problems. The final result is determined by a small number of support vectors, which helps to capture the key samples and discard a large number of redundant ones, and also makes the algorithm simple and robust. However, the classification accuracy of a support vector machine tends to be low near the classification hyperplane. The kNN nearest neighbor method makes no assumptions about the data, is accurate and insensitive to outliers, and is well suited to automatic classification of class domains with large sample sizes; its training time complexity is lower than that of a support vector machine, but its overall classification accuracy is not higher. The traditional kNN method generally uses the Euclidean distance, which is computationally expensive. The encoding of image feature vectors by spectral hashing can be regarded as a graph partitioning problem: analyzing the eigenvalues and eigenvectors of the Laplacian matrix of the similarity graph gives a relaxed solution, and thresholding the eigenvectors generates the binary codes. Computing the Hamming distance between binary codes requires far less computation than the Euclidean distance used by the traditional kNN method. Combining the three methods is therefore well suited to image feature classification.
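For reference, the graph partitioning view of spectral hashing mentioned above is commonly written as follows. This is background from the general spectral hashing literature rather than a formula given in this text; the symbols $W$ (similarity matrix of the graph), $D$ (its diagonal degree matrix), and $y_i$ (the nb-bit code of sample $i$, stacked as the rows of $Y$) are introduced here only for illustration.

```latex
\begin{aligned}
\min_{\{y_i\}} \quad & \sum_{i,j} W_{ij}\,\lVert y_i - y_j \rVert^{2} \\
\text{s.t.} \quad & y_i \in \{-1, 1\}^{nb}, \qquad \sum_i y_i = 0, \qquad \frac{1}{n}\sum_i y_i y_i^{T} = I .
\end{aligned}
% Dropping the binary constraint gives the relaxed problem
\min_{Y} \ \operatorname{trace}\!\left( Y^{T} (D - W)\, Y \right)
\quad \text{s.t.} \quad Y^{T}\mathbf{1} = 0, \qquad Y^{T} Y = n I .
```

The relaxed problem is solved by the nb eigenvectors of the Laplacian $D - W$ with the smallest eigenvalues, excluding the trivial constant eigenvector; thresholding each eigenvector at zero then yields one bit of the code.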
Summary of the invention:
The invention aims to provide a hyperplane nearest neighbor classification method for microangioma medical record images that classifies accurately and rapidly, comprising the following specific steps:
A. First, preprocess the diabetic fundus medical record image: remove the image background by median filtering, remove the blood vessel structure by image morphology, extract the lesion information in the image, apply an erosion operation to the image with a linear structuring element, and apply Gaussian filtering to the diabetic fundus medical record image to enhance the contrast of the microangioma regions;
B. Select the microangioma lesions in the diabetic fundus medical record image as the targets for template matching, and design a function model to match the targets according to the gray-level and shape parameters of the microangioma in the image, the gray level of the microangioma image obeying a Gaussian distribution; the matching formula of the function template is as follows:
where $l$ is the lowest gray value, $h$ is the gray-level height, $e$ is the base of the natural logarithm, $d$ is the distance from a point to the center of the template circle, $r$ is the radius, $s$ is the gray-level steepness, and $(x_0, y_0)$ is the center point of the microangioma;
C. Transform the morphological, texture, and gray-level features of the obtained microangioma lesion image into an $l$-dimensional data vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{il})$, $i = 1, 2, 3, \ldots$;
D. Divide the microangioma medical record image data into training data $X_{tr} = (x_1, x_2, \ldots, x_n)$, $n = 1, 2, 3, \ldots$, and test data $X_{te} = (x_1', x_2', \ldots, x_m')$, $m = 1, 2, 3, \ldots$;
E. Train on the microangioma medical record image training data $X_{tr} = (x_1, x_2, \ldots, x_n)$ to obtain a classification model Model comprising the classification hyperplane, the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, the distance threshold $t$, the spectral hash code length nb, and the number of nearest neighbors $k$;
F. Classify the microangioma medical record image test set $X_{te} = (x_1', x_2', \ldots, x_m')$, $m = 1, 2, 3, \ldots$, according to the classification model Model. During prediction, first judge whether the distance $\mathrm{distHyper}_i$, $i = 1, 2, \ldots, m$, from each test sample $x_i' = (x_{i1}', x_{i2}', \ldots, x_{il}')$ to the classification hyperplane is greater than the distance threshold $t$. If the distance is greater than $t$, predict with the support vector machine model; if the distance is not greater than $t$, predict with the neighbor algorithm combined with the spectral hash algorithm. Finally, combine the two sets of predictions to obtain the prediction result $Y_{te} = (y_1', y_2', \ldots, y_m')$, $m = 1, 2, 3, \ldots$.
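The routing rule of step F can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the labels +1 and -1, and the pluggable `knn_predict` callback are hypothetical, and the absolute value of $w^T x + b$ is compared with $t$ (the text writes the comparison without an absolute value).

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def signed_distance(w: Vector, b: float, x: Vector) -> float:
    """distHyper for one sample, as written in the text: w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hybrid_predict(
    samples: List[Vector],
    w: Vector,
    b: float,
    t: float,
    knn_predict: Callable[[Vector], int],
) -> List[int]:
    """Route each sample to the SVM or to the neighbor classifier by threshold t."""
    labels = []
    for x in samples:
        d = signed_distance(w, b, x)
        if abs(d) > t:  # assumption: magnitude of the decision value is compared with t
            # Far from the hyperplane: use the sign of the SVM decision value.
            labels.append(1 if d > 0 else -1)
        else:
            # Near the hyperplane: defer to the neighbor-based classifier.
            labels.append(knn_predict(x))
    return labels


# Toy usage: the stub neighbor classifier returns 0 so routed samples are visible.
w, b, t = [1.0, 0.0], 0.0, 0.5
samples = [[2.0, 1.0], [-3.0, 0.0], [0.2, 5.0]]
result = hybrid_predict(samples, w, b, t, knn_predict=lambda x: 0)
```

Keeping the predictions in input order means the combined result $Y_{te}$ needs no separate merge step.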
The invention is further improved in that step E specifically comprises: training on the microangioma medical record image training data $X_{tr} = (x_1, x_2, \ldots, x_n)$, $n = 1, 2, 3, \ldots$, to obtain a classification model Model comprising the classification hyperplane, the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, the distance threshold $t$, the spectral hash code length nb, and the number of nearest neighbors $k$, with the following steps:
a. Train a support vector machine model on the microangioma medical record image training data $X_{tr} = (x_1, x_2, \ldots, x_n)$, $n = 1, 2, 3, \ldots$, to obtain the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, and the classification hyperplane Hyper: $w^T \cdot x + b = 0$,
where $w$ is the normal vector of the classification hyperplane, $T$ denotes transposition, $b$ is the offset, and $x$ is a point on the hyperplane;
b. Compute the distance matrix from the microangioma medical record image training data $X_{tr} = (x_1, x_2, \ldots, x_n)$, $n = 1, 2, 3, \ldots$, to the classification hyperplane Hyper: $\mathrm{distHyper} = w^T \cdot X_{tr} + b$;
c. Using the set distance threshold $t$, obtain the training data set predicted with the support vector machine, $X_{trsvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, n\}$, and the training data set predicted with the neighbor algorithm, $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$;
d. For the microangioma medical record image training data set $X_{trsvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, n\}$, predict with the trained support vector machine model to obtain the microangioma medical record image prediction label set $Y_{trsvm} = \{y_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, n\}$;
e. For the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, train the corresponding parameter SHAparam by the spectral hash method with the set code length nb;
f. According to the parameter SHAparam, compress the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, of the training data $X_{tr} = (x_1, x_2, \ldots, x_n)$ and the microangioma medical record image training data set predicted with the neighbor algorithm, $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$, into binary codes of the same length by the spectral hash method;
g. For the binary codes obtained by compressing the training data set $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$ and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, compute the Hamming distances between all corresponding binary codes and store them in the Hamming distance table Dhamm_train;
h. Read the Hamming distance table Dhamm_train, take the labels of the $k$ nearest support vectors, and record the most frequent label as the label of the corresponding sample in the training data set $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$, obtaining the microangioma medical record image training set prediction label set $Y_{trknn} = \{y_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$;
i. Combine the two partial prediction label sets $Y_{trsvm}$ and $Y_{trknn}$ into the microangioma medical record image training set prediction label set $Y_{tr} = (y_1, y_2, \ldots, y_n)$, $n = 1, 2, 3, \ldots$, and optimize the three parameters, namely the distance threshold $t$, the binary code length nb, and the number of nearest neighbors $k$, according to the prediction accuracy to obtain the optimal solution;
j. Obtain the classification model Model comprising the classification hyperplane, the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, the distance threshold $t$, the spectral hash code length nb, and the number of nearest neighbors $k$.
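The parameter optimization in sub-step i can be illustrated as follows. The text does not say how the optimal $(t, nb, k)$ is searched for, so exhaustive grid search is an assumption here; `grid_search`, the candidate value lists, and the toy `accuracy` function are hypothetical stand-ins for running sub-steps b to h and scoring $Y_{tr}$ against the true labels.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Params = Tuple[float, int, int]


def grid_search(
    thresholds: Iterable[float],
    code_lengths: Iterable[int],
    neighbor_counts: Iterable[int],
    accuracy: Callable[[float, int, int], float],
) -> Tuple[Optional[Params], float]:
    """Try every (t, nb, k) combination and keep the most accurate one."""
    best_params: Optional[Params] = None
    best_acc = float("-inf")
    for t, nb, k in product(thresholds, code_lengths, neighbor_counts):
        acc = accuracy(t, nb, k)  # in practice: training-set accuracy of Y_tr
        if acc > best_acc:
            best_params, best_acc = (t, nb, k), acc
    return best_params, best_acc


# Toy accuracy surface (hypothetical) that peaks at t=0.5, nb=32, k=3.
def toy_accuracy(t: float, nb: int, k: int) -> float:
    return 1.0 - abs(t - 0.5) - abs(nb - 32) / 64 - abs(k - 3) / 10


best_params, best_acc = grid_search(
    thresholds=[0.25, 0.5, 1.0],
    code_lengths=[16, 32, 64],
    neighbor_counts=[1, 3, 5],
    accuracy=toy_accuracy,
)
```

With only three parameters and a handful of candidates each, an exhaustive search stays cheap; a held-out validation split could replace the training accuracy if overfitting is a concern.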
The invention is further improved in that step F specifically comprises: classifying the microangioma medical record image test set $X_{te} = (x_1', x_2', \ldots, x_m')$, $m = 1, 2, 3, \ldots$, according to the classification model Model. During prediction, first judge whether the distance $\mathrm{distHyper}_i$, $i = 1, 2, \ldots, m$, from each test sample $x_i' = (x_{i1}', x_{i2}', \ldots, x_{il}')$ to the classification hyperplane is greater than the distance threshold $t$. If the distance is greater than $t$, predict with the support vector machine model; if the distance is not greater than $t$, predict with the neighbor algorithm combined with the spectral hash algorithm. Finally, combine the two sets of predictions to obtain the prediction result $Y_{te} = (y_1', y_2', \ldots, y_m')$, $m = 1, 2, 3, \ldots$, with the following steps:
a. Compute the distance matrix from the microangioma medical record image test data $X_{te} = (x_1', x_2', \ldots, x_m')$, $m = 1, 2, 3, \ldots$, to the classification hyperplane Hyper: $\mathrm{distHyper} = w^T \cdot X_{te} + b$;
b. Using the set distance threshold $t$, obtain the microangioma medical record test data set predicted with the support vector machine, $X_{tesvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$, and the microangioma medical record test data set predicted with the neighbor algorithm, $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$;
c. For $X_{tesvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$, perform support vector machine prediction with the trained classification hyperplane Hyper to obtain the prediction label set $Y_{tesvm} = \{y_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$;
d. Compress the microangioma medical record test data set $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$ and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, in the classification model Model into binary codes of the same length by the spectral hash method, using the spectral hash code length nb in the classification model Model; compute the Hamming distances between all corresponding binary codes and store them in the Hamming distance table Dhamm_test;
e. Read the Hamming distance table Dhamm_test, take the labels of the $k$ nearest support vectors, and record the most frequent label as the label of the corresponding sample in the microangioma medical record image test data set $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$, obtaining the microangioma medical record prediction label set $Y_{teknn} = \{y_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$;
f. Combine the microangioma medical record prediction label set $Y_{tesvm} = \{y_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$ predicted with the support vector machine model and the microangioma medical record prediction label set $Y_{teknn} = \{y_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$ predicted with the nearest neighbor algorithm combined with the spectral hash method to obtain the final microangioma medical record prediction result $Y_{te} = (y_1', y_2', \ldots, y_m')$, $m = 1, 2, 3, \ldots$.
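Sub-steps d and e can be sketched as below, assuming the spectral hash codes are already available as Python integers whose bits are the code bits. The function names and the toy 4-bit codes and labels are hypothetical; only the table name Dhamm_test comes from the text.

```python
from collections import Counter
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def hamming_table(query_codes: List[int], sv_codes: List[int]) -> List[List[int]]:
    """Row i holds the distances from query code i to every support vector code."""
    return [[hamming(q, s) for s in sv_codes] for q in query_codes]


def knn_vote(table: List[List[int]], sv_labels: List[int], k: int) -> List[int]:
    """For each row, take the k nearest support vectors and return the majority label."""
    predictions = []
    for row in table:
        nearest = sorted(range(len(row)), key=lambda j: row[j])[:k]
        votes = Counter(sv_labels[j] for j in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data: four support vectors with 4-bit codes and labels -1 / +1.
sv_codes = [0b0000, 0b0001, 0b1111, 0b1110]
sv_labels = [-1, -1, 1, 1]
query_codes = [0b0011, 0b1101]

dhamm_test = hamming_table(query_codes, sv_codes)
y_teknn = knn_vote(dhamm_test, sv_labels, k=3)
```

Each distance is one XOR followed by a bit count, which is the source of the speed advantage over Euclidean distances on the original feature vectors.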
First, the diabetic fundus image data are preprocessed and segmented, and the microangioma lesion areas are extracted from the processed fundus medical record images. Next, the morphological, texture, and gray-level features of the microangioma lesion image areas are converted into data vectors $x_i$ of dimension $l$. The data are then divided into training data $X_{tr}$ and test data $X_{te}$, and a support vector machine is trained on $X_{tr}$ to obtain an efficient classification model comprising the classification hyperplane, the support vector set $X_{sv}$, the distance threshold $t$, and the spectral hash code length nb. Finally, when predicting the test data $X_{te}$, the support vector machine model or the neighbor algorithm fused with the spectral hash algorithm is used, depending on how the distance from the test sample to the classification hyperplane compares with the distance threshold $t$, and the respective prediction results are combined.
Compared with the prior art, the invention has the following advantages:
1. The method has better classification accuracy: it combines the neighbor algorithm with the support vector machine and, by setting the distance threshold $t$, divides the diabetic fundus image data into a part classified with the support vector machine and a part classified with the neighbor algorithm. This addresses both the low classification accuracy of the support vector machine near the classification hyperplane and the fact that the overall classification accuracy of the neighbor algorithm is not higher than that of the support vector machine, and thereby improves classification accuracy.
2. The method is fast: the traditional kNN nearest neighbor method uses the Euclidean distance and is computationally expensive. To address this, the invention trains the parameter SHAparam_train from the set spectral hash code length nb and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, of the part of the diabetic fundus image data classified with the support vector machine. According to the parameter SHAparam_train, the part of the diabetic fundus image data classified with the neighbor algorithm, $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$, and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, are compressed into binary codes of the same length, and the neighbor algorithm then performs classification prediction on these binary codes in Hamming space. Compared with computing the Euclidean distance in the traditional kNN nearest neighbor method, computing the Hamming distance takes far less time, so the method is very fast.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flow chart of a specific joint classifier training process;
FIG. 3 is a flow chart of a specific joint classifier test.
Detailed Description
For the purpose of enhancing understanding of the present invention, the present invention will be further described in detail with reference to the following examples, which are provided for illustration only and are not to be construed as limiting the scope of the present invention.
Fig. 1-3 show a specific embodiment of a hyperplane nearest neighbor classification method for a medical record image of microangioma:
the method comprises the following specific steps:
A. First, preprocess the diabetic fundus medical record image: remove the image background by median filtering, remove the blood vessel structure by image morphology, extract the lesion information in the image, apply an erosion operation to the image with a linear structuring element, and apply Gaussian filtering to the diabetic fundus medical record image to enhance the contrast of the microangioma regions;
B. Select the microangioma lesions in the diabetic fundus medical record image as the targets for template matching, and design a function model to match the targets according to the gray-level and shape parameters of the microangioma in the image, the gray level of the microangioma image obeying a Gaussian distribution; the matching formula of the function template is as follows:
where $l$ is the lowest gray value, $h$ is the gray-level height, $e$ is the base of the natural logarithm, $d$ is the distance from a point to the center of the template circle, $r$ is the radius, $s$ is the gray-level steepness, and $(x_0, y_0)$ is the center point of the microangioma;
C. Transform the morphological, texture, and gray-level features of the obtained microangioma lesion image into an $l$-dimensional data vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{il})$, $i = 1, 2, 3, \ldots$;
D. Divide the microangioma medical record image data into training data $X_{tr} = (x_1, x_2, \ldots, x_n)$, $n = 1, 2, 3, \ldots$, and test data $X_{te} = (x_1', x_2', \ldots, x_m')$, $m = 1, 2, 3, \ldots$;
E. Train on the microangioma medical record image training data $X_{tr} = (x_1, x_2, \ldots, x_n)$ to obtain a classification model Model comprising the classification hyperplane, the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, the distance threshold $t$, the number of nearest neighbors $k$, and the spectral hash code length nb; this step specifically comprises the following steps:
a. Train a support vector machine model on the microangioma medical record image training data $X_{tr} = (x_1, x_2, \ldots, x_n)$, $n = 1, 2, 3, \ldots$, to obtain the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, and the classification hyperplane Hyper: $w^T \cdot x + b = 0$,
where $w$ is the normal vector of the classification hyperplane, $T$ denotes transposition, $b$ is the offset, and $x$ is a point on the hyperplane;
b. Compute the distance matrix from the microangioma medical record image training data $X_{tr} = (x_1, x_2, \ldots, x_n)$, $n = 1, 2, 3, \ldots$, to the classification hyperplane Hyper: $\mathrm{distHyper} = w^T \cdot X_{tr} + b$;
c. Using the set distance threshold $t$, obtain the training data set predicted with the support vector machine, $X_{trsvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, n\}$, and the training data set predicted with the neighbor algorithm, $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$;
d. For the microangioma medical record image training data set $X_{trsvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, n\}$, predict with the trained support vector machine model to obtain the microangioma medical record image prediction label set $Y_{trsvm} = \{y_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, n\}$;
e. For the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, train the corresponding parameter SHAparam by the spectral hash method with the set code length nb;
f. According to the parameter SHAparam, compress the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, of the training data $X_{tr} = (x_1, x_2, \ldots, x_n)$ and the microangioma medical record image training data set predicted with the neighbor algorithm, $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$, into binary codes of the same length by the spectral hash method;
g. For the binary codes obtained by compressing the training data set $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$ and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, compute the Hamming distances between all corresponding binary codes and store them in the Hamming distance table Dhamm_train;
h. Read the Hamming distance table Dhamm_train, take the labels of the $k$ nearest support vectors, and record the most frequent label as the label of the corresponding sample in the training data set $X_{trknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$, obtaining the microangioma medical record image training set prediction label set $Y_{trknn} = \{y_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, n\}$;
i. Combine the two partial prediction label sets $Y_{trsvm}$ and $Y_{trknn}$ into the microangioma medical record image training set prediction label set $Y_{tr} = (y_1, y_2, \ldots, y_n)$, and optimize the three parameters, namely the distance threshold $t$, the number of nearest neighbors $k$, and the spectral hash code length nb, according to the prediction accuracy to obtain the optimal solution;
j. Obtain the classification model Model comprising the classification hyperplane, the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, the distance threshold $t$, the number of nearest neighbors $k$, and the spectral hash code length nb;
F. Classify the microangioma medical record image test set $X_{te} = (x_1', x_2', \ldots, x_m')$, $m = 1, 2, 3, \ldots$, according to the classification model Model. During prediction, first judge whether the distance $\mathrm{distHyper}_i$, $i = 1, 2, \ldots, m$, from each test sample $x_i' = (x_{i1}', x_{i2}', \ldots, x_{il}')$ to the classification hyperplane is greater than the distance threshold $t$. If the distance is greater than $t$, predict with the support vector machine model; if the distance is not greater than $t$, predict with the neighbor algorithm combined with the spectral hash algorithm. Finally, combine the two sets of predictions to obtain the prediction result $Y_{te} = (y_1', y_2', \ldots, y_m')$, $m = 1, 2, 3, \ldots$; this step specifically comprises the following steps:
a. Compute the distance matrix from the microangioma medical record image test data $X_{te} = (x_1', x_2', \ldots, x_m')$, $m = 1, 2, 3, \ldots$, to the classification hyperplane Hyper: $\mathrm{distHyper} = w^T \cdot X_{te} + b$;
b. Using the set distance threshold $t$, obtain the microangioma medical record test data set predicted with the support vector machine, $X_{tesvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$, and the microangioma medical record test data set predicted with the neighbor algorithm, $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$;
c. For $X_{tesvm} = \{x_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$, perform support vector machine prediction with the trained classification hyperplane Hyper to obtain the prediction label set $Y_{tesvm} = \{y_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$;
d. Compress the microangioma medical record test data set $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$ and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, in the classification model Model into binary codes of the same length by the spectral hash method, using the spectral hash code length nb in the classification model Model; compute the Hamming distances between all corresponding binary codes and store them in the Hamming distance table Dhamm_test;
e. Read the Hamming distance table Dhamm_test, take the labels of the $k$ nearest support vectors, and record the most frequent label as the label of the corresponding sample in the microangioma medical record image test data set $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$, obtaining the microangioma medical record prediction label set $Y_{teknn} = \{y_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$;
f. Combine the microangioma medical record prediction label set $Y_{tesvm} = \{y_i \mid \mathrm{distHyper}_i > t,\ i = 1, 2, \ldots, m\}$ predicted with the support vector machine model and the microangioma medical record prediction label set $Y_{teknn} = \{y_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$ predicted with the nearest neighbor algorithm combined with the spectral hash method to obtain the final microangioma medical record prediction result $Y_{te} = (y_1', y_2', \ldots, y_m')$, $m = 1, 2, 3, \ldots$.
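Returning to step A, two of its preprocessing operations could be prototyped along the following lines. This is a minimal pure-Python sketch under stated assumptions, not the implementation of the embodiment: the window radius, the horizontal orientation and length of the linear structuring element, and the helper names `median_filter`, `subtract_background`, and `erode_horizontal` are all hypothetical, and background removal is assumed to mean subtracting a median-filtered copy of the image. Vessel removal and Gaussian filtering are omitted.

```python
from statistics import median
from typing import List

Image = List[List[float]]


def median_filter(img: Image, radius: int = 1) -> Image:
    """Median over a (2*radius+1)^2 window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out


def subtract_background(img: Image, radius: int = 1) -> Image:
    """Assumed background removal: image minus its median-filtered copy."""
    background = median_filter(img, radius)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(img, background)]


def erode_horizontal(img: Image, length: int = 3) -> Image:
    """Grayscale erosion with a horizontal line element: minimum along the line."""
    half = length // 2
    w = len(img[0])
    return [
        [min(row[max(0, c - half): min(w, c + half + 1)]) for c in range(w)]
        for row in img
    ]


# Toy image: a single bright pixel on a flat background.
img = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
detail = subtract_background(img)  # keeps only the bright spot
eroded = erode_horizontal(img)     # the isolated spot is removed
```

On real fundus images a library such as SciPy or OpenCV would normally replace these loops; the sketch only shows what each operation does to a small, isolated bright region.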
The invention has better classification accuracy: it combines the neighbor algorithm with the support vector machine and, by setting the distance threshold $t$, divides the diabetic fundus image data into a part classified with the support vector machine and a part classified with the neighbor algorithm. This addresses both the low classification accuracy of the support vector machine near the classification hyperplane and the fact that the overall classification accuracy of the neighbor algorithm is not higher than that of the support vector machine, and thereby improves classification accuracy. The method is also fast: the traditional kNN nearest neighbor method uses the Euclidean distance and is computationally expensive. To address this, the invention trains the parameter SHparam_train from the set spectral hash code length nb and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, of the part of the diabetic fundus image data classified with the support vector machine. According to the parameter SHparam_train, the part of the diabetic fundus image data classified with the neighbor algorithm, $X_{teknn} = \{x_i \mid \mathrm{distHyper}_i \le t,\ i = 1, 2, \ldots, m\}$, and the support vector set $X_{sv} = (x_1'', x_2'', \ldots, x_s'')$, $s = 1, 2, 3, \ldots$, are compressed into binary codes of the same length, and the neighbor algorithm then performs classification prediction on these binary codes in Hamming space. Compared with computing the Euclidean distance in the traditional kNN nearest neighbor method, computing the Hamming distance takes far less time, so the method is very fast.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention.
Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.