CN111340135A - Renal mass classification method based on random projection - Google Patents


Info

Publication number
CN111340135A
CN111340135A
Authority
CN
China
Prior art keywords
classifier
data
matrix
projection
prediction
Prior art date
Legal status
Granted
Application number
CN202010171801.XA
Other languages
Chinese (zh)
Other versions
CN111340135B (en)
Inventor
甄鑫
莫天澜
王琳婧
何强
Original Assignee
Guangzhou Lingtuo Medical Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Lingtuo Medical Technology Co ltd filed Critical Guangzhou Lingtuo Medical Technology Co ltd
Priority to CN202010171801.XA priority Critical patent/CN111340135B/en
Publication of CN111340135A publication Critical patent/CN111340135A/en
Application granted granted Critical
Publication of CN111340135B publication Critical patent/CN111340135B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24155 Bayesian classification
    • G06N20/00 Machine learning
    • G06T7/0012 Biomedical image inspection
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G06T2207/20081 Training; Learning
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G06T2207/30084 Kidney; Renal


Abstract

The application relates to a renal mass classification method based on random projection, which comprises the following steps: acquiring N target object data describing renal small masses; performing target region delineation on each plain-scan CT image according to the corresponding mask image to obtain a region of interest for each plain-scan CT image, and extracting radiologic characteristic data from each region of interest to obtain N pieces of radiologic characteristic data; projecting the N pieces of radiologic characteristic data through L random projection matrices to obtain L sets of projection characteristic data; training multiple classifiers on the L sets of projection characteristic data respectively to obtain each trained classifier and the prediction matrix of each classifier, and determining the weight of each classifier; and fusing the outputs of the trained classifiers on the data to be classified according to the corresponding weights to determine the corresponding class. The method improves the robustness of class identification for the data to be classified and thereby the reliability of the classification result.

Description

Renal mass classification method based on random projection
Technical Field
The application relates to the technical field of machine learning, in particular to a method for classifying renal small masses based on random projection.
Background
In recent years, multi-classifier systems have been widely used in machine learning to obtain more reliable and accurate predictions than a single classifier in supervised and unsupervised learning tasks. They have been successfully applied in many fields, including bioinformatics, remote sensing, network security, astrophysics, clinical applications and chemoinformatics. Most current research on multi-classifier systems falls into two categories: non-generative and generative. Non-generative multi-classifier systems focus on selecting classifiers or on the fusion of multi-classifier outputs to optimize the system structure and thereby improve predictive ability, while generative multi-classifier systems focus on generating the base classifiers so as to increase the diversity and difference within the system and thereby improve prediction accuracy. Much previous research has concentrated on building new ensemble architectures or on finding ways to improve classifier diversity. How to construct diverse base classifiers and integrate them into a logical fusion architecture is the key to building a successful multi-classifier system. Exploring a reasonable balance between ensemble architecture and ensemble diversity has been a hotspot of many studies in recent years and remains an open problem.
An effective multi-classifier system is urgently needed for medical decision-making in clinical tasks such as diagnosis, prognosis and prediction of treatment response, using information acquired from radiological images (computed tomography, positron emission tomography, magnetic resonance imaging and the like) and from clinical treatment. Since clinical information is diverse, such as image information of different modalities, treatment parameters, dose parameters and other clinical features, a fusion system is needed to integrate the various sources of information and support clinical judgement or the determination of a treatment plan. Meanwhile, medical decision problems are usually problem-specific: different classifiers may perform differently for different diseases and clinical endpoints, and even for the same clinical task different classifiers rarely achieve consistent results. For example, Yang R et al. compared 224 classification models and found significant differences in classification accuracy between models built with different combinations of classifiers and feature selection methods. Furthermore, the wide variety of clinical information combined with different classifiers yields even more candidate models, and a truly optimal solution is unlikely to be approached by traversing all available models by trial and error. An efficient multi-classifier framework is therefore always desirable in a clinical environment to fully exploit diverse medical data.
Renal angiomyolipoma (renal hamartoma) is the most common benign renal tumor, accounting for about 3% of renal tumors, and can usually be diagnosed reliably and accurately by imaging through detection of typical macroscopic intratumoral fat. However, the amount of fat is variable, and some renal angiomyolipomas contain little or no fat; these so-called atypical, fat-poor angiomyolipomas, or angiomyolipomas without visible fat (AMLwvf), behave similarly to renal cell carcinoma (RCC) on CT, are prone to misdiagnosis and can lead to unnecessary surgery. Recent advances in radiomics and its successful application in previous studies have helped improve the accuracy of tumor prediction and classification. Based on machine learning, many researchers have attempted to distinguish AMLwvf from RCC using CT texture analysis. However, these studies are limited in that texture features are typically extracted from a single CT phase, or randomly selected classifiers are built for classification modeling, and no comprehensive survey has shown which phase and classifier, or which combination of them, has higher discriminative power. Yang R et al. compared 224 classification models and found that image features extracted from non-enhanced CT images had higher discriminatory power than those from the other three phases (corticomedullary, nephrographic and excretory phases). However, a truly optimal solution is unlikely to be approached by traversing all available models by trial and error, so conventional procedures for classifying small renal masses often lack robustness, which easily results in low reliability of the corresponding classification results.
Disclosure of Invention
In view of the above, there is a need to provide a method for classifying renal small masses based on random projection, which can improve the robustness of the renal small mass classification process.
A method for classifying renal masses based on random projection, the method comprising:
S10, acquiring N target object data describing renal small masses; the target object data include a plain-scan CT image, a mask image and label data of the corresponding renal small mass; the label data characterize the corresponding renal small mass as benign or malignant;
S20, performing target region delineation on each plain-scan CT image according to the corresponding mask image to obtain a region of interest for each plain-scan CT image, and extracting radiologic characteristic data from each region of interest to obtain N pieces of radiologic characteristic data;
S30, projecting the N pieces of radiologic characteristic data through L random projection matrices to obtain L sets of projection characteristic data;
S40, training multiple classifiers on the L sets of projection characteristic data respectively to obtain each trained classifier and the prediction matrix of each classifier, and setting the weight of each classifier according to its prediction matrix;
and S50, fusing the outputs of the trained classifiers on the data to be classified according to the corresponding weights to determine the class of the data to be classified.
In the above method, the radiologic characteristic data extracted from the N target object data describing renal small masses are randomly projected L times to obtain L sets of projection characteristic data; the projection characteristic data are input into different classifiers for training to obtain each trained classifier and its prediction matrix, from which the weight of each classifier is determined; the trained classifiers are then used to fuse the data to be classified according to the corresponding weights. The classifiers thus form a hierarchical structure that integrates both the diversity and the structural advantages of the classifiers, improving the robustness of class identification for the data to be classified and thereby the reliability of the identification result.
Drawings
FIG. 1 is a flow chart of a method for classifying renal masses based on stochastic projection according to an embodiment;
FIG. 2 is a schematic representation of target region delineation on a plain-scan CT image in one embodiment;
FIG. 3 is a schematic structural diagram of a small kidney mass classifying device based on random projection according to an embodiment;
FIG. 4 is a schematic diagram of a computer device of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The random projection-based renal mass classification method of the present application can be applied to an intelligent terminal device, such as an identification terminal or a classification terminal, for identifying whether images representing renal small masses are benign or malignant. The intelligent terminal device acquires N target object data describing renal small masses; delineates the target region on each plain-scan CT image according to the mask image in the target object data to obtain a region of interest for each image; extracts radiologic characteristic data from each region of interest to obtain N pieces of radiologic characteristic data; projects the N pieces of radiologic characteristic data through L random projection matrices to obtain L sets of projection characteristic data; trains multiple classifiers on the L sets of projection characteristic data to obtain each trained classifier and its prediction matrix; sets the weight of each classifier according to its prediction matrix; and fuses the outputs of the trained classifiers on the data to be classified according to the corresponding weights to determine the class of the renal small mass to be examined. Based on an ensemble framework built on random projection, the intelligent terminal device maps the original data set of target object data to low-dimensional spaces through a large number of random projection matrices to generate diversified training data sets, inputs the projected training data sets into multiple classifiers, and uses a two-level hierarchical fusion scheme that logically integrates all outputs into a final classification, so as to classify renal small masses as benign or malignant more reliably.
In one embodiment, as shown in fig. 1, a method for classifying renal masses based on random projection is provided, which is exemplified by an intelligent terminal device such as an identification terminal or a classification terminal for identifying benign or malignant images representing renal masses, and includes the following steps:
s10, acquiring N target object data describing the kidney small tumor, wherein the target object data comprises a CT flat scan image, a mask image and label data of the corresponding kidney small tumor; the label data characterizes the corresponding renal mass as benign or malignant; wherein N is a positive integer.
Specifically, the target object data may be derived from a renal small mass of a patient with a pathologically confirmed renal small mass. Each target object data may describe a small kidney mass, and a target object data specifically describes the corresponding small kidney mass through a CT flat scan image, a mask image and tag data, where for a small kidney mass, the CT flat scan image is an image obtained by CT scanning the small kidney mass, the mask image is an image mask corresponding to the small kidney mass, and the tag data is data indicating that the small kidney mass is benign or malignant, for example, when the small kidney mass is pathologically confirmed to be fatty renal vascular smooth muscle lipoma, the tag data of the small kidney mass is benign, and when the small kidney mass is pathologically confirmed to be renal cell carcinoma, the corresponding tag data is malignant.
Specifically, the CT panned image is a non-contrast enhanced CT scanned image, so that the corresponding renal small tumor can be more accurately represented by the CT panned image.
And S20, performing target region delineation on each plain-scan CT image according to the corresponding mask image to obtain a region of interest for each plain-scan CT image, and extracting radiologic characteristic data from each region of interest to obtain N pieces of radiologic characteristic data.
Specifically, the region of interest is the region of the plain-scan CT image that contains valid information; it can be obtained by experienced users, such as radiological diagnosis experts, delineating the target region on the corresponding plain-scan CT image according to the mask image. In one example, a schematic illustration of delineating the target region on a plain-scan CT image to obtain the region of interest is shown in fig. 2.
Further, each piece of radiologic characteristic data comprises a plurality of radiologic features; the radiologic features may include shape features, first-order statistical features and texture features. The first-order statistical features may include statistics obtained by histogram analysis, and the texture features may include second-order statistics such as the image gray-level distribution.
In one example, one piece of radiologic characteristic data may include 103 radiologic features, extracted from a region of interest (ROI) obtained by two experienced radiological diagnosticians delineating the corresponding non-enhanced CT image; specifically, these may include 12 shape features, 17 first-order statistical features and 74 texture features, so that the subsequent training of the multiple classifiers has a sufficiently complete set of radiologic features as its basis, ensuring the accuracy of the training result.
Preferably, after the N pieces of radiologic characteristic data are obtained, data standardization may be applied to them, so that features whose values differ greatly across the N pieces of radiologic characteristic data fall within a set range (e.g. [0, 1]), thereby eliminating the adverse effect of singular feature values that are excessively large or small. In one example, feature selection may then be applied to the standardized data with the f_score feature selection method to avoid over-fitting, and the SMOTE algorithm may be used to balance the classes and overcome the negative influence of class imbalance, so that the values, feature distribution and class distribution of the N pieces of radiologic characteristic data submitted to matrix projection are kept in a balanced state, ensuring the validity of the L sets of projection characteristic data obtained subsequently.
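As an illustrative sketch of this preprocessing pipeline (not the patent's exact implementation), the normalization, f_score-based feature selection and SMOTE class balancing can be chained with scikit-learn and the imbalanced-learn package; the feature matrix X, label vector y and the number of retained features k are assumed placeholders:

```python
# Hedged sketch of the preprocessing described above: scale features to [0, 1],
# select features by F-score, then balance classes with SMOTE.
# X (N x p radiologic features) and y (benign/malignant labels) are assumed inputs.
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_classif
from imblearn.over_sampling import SMOTE  # assumes the imbalanced-learn package

def preprocess(X, y, k=20, random_state=0):
    # 1) bring every feature into the [0, 1] range to suppress singular values
    X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
    # 2) F-score (ANOVA F-test) feature selection to reduce over-fitting
    selector = SelectKBest(score_func=f_classif, k=min(k, X.shape[1]))
    X_selected = selector.fit_transform(X_scaled, y)
    # 3) SMOTE over-sampling of the minority class to balance the data set
    X_balanced, y_balanced = SMOTE(random_state=random_state).fit_resample(X_selected, y)
    return X_balanced, y_balanced
```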
And S30, projecting the N sets of radiologic characteristic data through L random projection matrixes to obtain L sets of projection characteristic data.
The number L of the random projections may be set according to the classification accuracy of the renal mass, for example, L may be set to be 10.
Specifically, the foregoing step may create L random projection (RP) matrices according to the Johnson-Lindenstrauss (J-L) theorem and use them to randomly project the N pieces of radiologic characteristic data, thereby obtaining the L sets of projection characteristic data.
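A minimal sketch of generating the L projected data sets follows; scikit-learn's SparseRandomProjection implements an Achlioptas-style sparse projection consistent with the J-L lemma and can stand in for the hand-built projection matrices described later, but it is not the patent's own construction. D (the N x p feature matrix), L and the target dimension q are assumed inputs:

```python
# Hedged sketch: produce L randomly projected copies of the feature matrix D.
# SparseRandomProjection draws Achlioptas-type sparse matrices; this is one
# possible realisation of the random projection step, not the patent's exact code.
from sklearn.random_projection import SparseRandomProjection

def project_l_times(D, L=10, q=32):
    projected_sets, projectors = [], []
    for l in range(L):
        rp = SparseRandomProjection(n_components=q, density=1/3, random_state=l)
        projected_sets.append(rp.fit_transform(D))   # N x q representation in domain l
        projectors.append(rp)                        # kept to project test data later
    return projected_sets, projectors
```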
And S40, respectively carrying out multiple classifier training on the L sets of projection characteristic data to obtain a prediction matrix of each classifier and each trained classifier, and setting the weight of each classifier according to the prediction matrix of each classifier.
The number and types of the classifiers can be determined according to the required classification precision for renal small masses. Specifically, the L sets of projection characteristic data are respectively input into each classifier branch, and five-fold cross training can be performed in each classifier to obtain the prediction matrix of each classifier with respect to each target object data.
In one example, the L sets of projection characteristics data may be input into 7 classifiers for training, respectively, where the 7 classifiers may include a bayesian classifier (e.g., a gaussian bayesian classifier), a logistic regression classifier, a quadratic discriminant analysis classifier, a K-nearest neighbor classifier, a decision tree classifier, a random forest classifier, and an XGBoost classifier.
Specifically, the training of multiple classifiers on the L sets of projection characteristic data respectively to obtain the prediction matrix of each classifier and each trained classifier includes:
constructing each classifier model with the scikit-learn machine learning package in a Python programming language environment, inputting the L sets of projection characteristic data and the corresponding label data into each classifier model respectively, and calling the fit function for each classifier model to train it, so as to obtain the prediction matrix of each classifier.
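The seven classifier models mentioned below can be instantiated and trained per projection domain roughly as follows; this is a hedged sketch with default hyper-parameters (XGBClassifier comes from the separate xgboost package), not the exact configuration of the patent:

```python
# Hedged sketch: build the seven base classifiers and fit one copy of each
# on every projected training set. Default hyper-parameters are assumptions.
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier  # assumes the xgboost package is installed

def make_classifiers():
    return {
        "bayes": GaussianNB(),
        "logistic": LogisticRegression(max_iter=1000),
        "qda": QuadraticDiscriminantAnalysis(),
        "knn": KNeighborsClassifier(),
        "tree": DecisionTreeClassifier(),
        "forest": RandomForestClassifier(),
        "xgboost": XGBClassifier(eval_metric="logloss"),
    }

def train_per_projection(projected_sets, y):
    # trained[name][l] is classifier `name` fitted on the l-th projected data set
    trained = {name: [] for name in make_classifiers()}
    for D_l in projected_sets:
        for name, clf in make_classifiers().items():
            trained[name].append(clf.fit(D_l, y))
    return trained
```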
And S50, fusing the data to be classified by adopting the trained classifiers according to the corresponding weights to determine the category of the data to be classified.
The data to be classified may include object data describing the renal masses to be tested during the test, object data describing the renal masses of the category to be identified during the actual classification, and so on.
Specifically, in the above steps, each classifier may be used to perform classification prediction on the data to be classified, and the weight of each classifier is used to perform fusion processing on the classification prediction result of the corresponding classifier, so as to identify the category of the renal mass represented by the data to be classified, thereby improving the reliability of the determined category.
In this embodiment, the radiologic characteristic data extracted from the N target object data describing renal small masses are randomly projected L times to obtain L sets of projection characteristic data; the projection characteristic data are input into different classifiers for training to obtain each trained classifier and its prediction matrix, from which the weight of each classifier is determined; the trained classifiers are then used to fuse the data to be classified according to the corresponding weights. The classifiers thus form a hierarchical structure that integrates both the diversity and the structural advantages of the classifiers, which improves the robustness of class identification for the data to be classified and thereby the reliability of the identification result.
In one embodiment, the setting the weight of each classifier according to the prediction matrix of each classifier includes:
calculating a first average prediction matrix corresponding to the benign renal small mass and a second average prediction matrix corresponding to the malignant renal small mass according to the prediction matrixes of the classifiers;
calculating Euclidean distances from the prediction matrix of each classifier to the first average prediction matrix and the second average prediction matrix respectively;
determining the prediction labels of the N target object data on each classifier according to the Euclidean distance from the prediction matrix of each classifier to the first average prediction matrix and the second average prediction matrix respectively;
and calculating the prediction accuracy parameter of each classifier according to the prediction label of the N target object data on each classifier and the label data respectively included by the N target object data, and determining the weight of each classifier according to the prediction accuracy parameter of each classifier.
The prediction matrix of each classifier may be the matrix obtained when the classifier performs classification training on each target object data. In the prediction matrix of a classifier, each row corresponds to the predicted posterior probabilities of the current classifier in one projection domain: the first column contains the posterior probabilities for the first class under the L projections, and the second column contains the posterior probabilities for the second class under the L projections. In the corresponding distance matrix, each row corresponds to the distance result of one classifier: the first column is the distance from the classifier's prediction matrix to the first average prediction matrix, and the second column is the distance to the second average prediction matrix.
Specifically, the first or second average prediction matrix is determined as

$$\bar{Q}_m^{\,g} = \frac{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)\, Q_m(x_i)}{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)}, \qquad g = 1, 2$$

where $\bar{Q}_m^{\,g}$ denotes the g-th average prediction matrix of the m-th classifier, $x_i$ denotes the i-th target object data, N denotes the number of target object data, $Q_m(x_i)$ denotes the prediction matrix of the m-th classifier with respect to the i-th target object data, $y_g$ denotes the class label g, and $\mathbb{1}(\cdot)$ is the indicator function selecting the target objects whose label $y_i$ equals $y_g$. When the class label g takes the value 1, the renal small mass described by the corresponding target object data is benign; when g takes the value 2, it is malignant.
Specifically, the Euclidean distance is calculated as

$$d_{i,m}^{\,g} = \left\| Q_m(x_i) - \bar{Q}_m^{\,g} \right\|_2 = \sqrt{\sum_{l=1}^{L} \sum_{k=1}^{G} \left( Q_m(x_i)_{lk} - \bar{Q}_{m,lk}^{\,g} \right)^2 }$$

where $d_{i,m}^{\,g}$ denotes the Euclidean distance from the prediction matrix of the m-th classifier with respect to the i-th target object data to the g-th average prediction matrix, L denotes the number of random projection matrices, and G denotes the number of classes; since the classes of renal small masses comprise benign and malignant, G takes the value 2.
Specifically, the prediction label of a target object datum on a classifier is determined as

$$x_i \in y_s \;\Leftarrow\; d_{i,m}^{\,s} = \min_{g \in \{1, \dots, G\}} d_{i,m}^{\,g}$$

where $x_i$ denotes the target object data (e.g. the i-th target object data) being classified by the classifier, $y_s$ denotes the label data predicted by the classifier for this target object data (i.e. its prediction label), the symbol $\Leftarrow$ means that the class membership on its left holds when the equation on its right is satisfied, min denotes the minimum value, $d_{i,m}^{\,g}$ denotes the Euclidean distance from the prediction matrix of the m-th classifier with respect to the i-th target object data to the g-th average prediction matrix, and the subscript s denotes the class index with the minimum distance.
Specifically, the weight of each classifier is determined from its prediction accuracy parameter as

$$\omega_m = \frac{acc_m - acc_{\min}}{acc_{\max} - acc_{\min}}$$

where $\omega_m$ denotes the weight of the m-th classifier, $acc_m$ denotes the prediction accuracy parameter of the m-th classifier, $acc_{\min}$ denotes the minimum and $acc_{\max}$ the maximum of the prediction accuracy parameters over all classifiers, m = 1, 2, ..., M, and $\omega_m \in [0, 1]$.
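To make the weighting concrete, the following hedged sketch stacks each classifier's per-projection posterior probabilities into an L x 2 prediction matrix, derives class-wise average prediction matrices, assigns labels by minimum Euclidean distance, and min-max normalizes the resulting accuracies into weights; all function and variable names are illustrative, not taken from the patent:

```python
# Hedged sketch of the weight-setting scheme described above:
# per-sample L x 2 prediction matrices -> class-averaged matrices ->
# distance-based labels -> accuracy -> min-max normalized weights.
import numpy as np

def prediction_matrix(models_per_projection, x_projected):
    # rows: L projection domains, columns: posterior probability of each class
    return np.vstack([m.predict_proba(xl.reshape(1, -1))[0]
                      for m, xl in zip(models_per_projection, x_projected)])

def classifier_weights(Q, y, classes=(0, 1)):
    # Q[m][i] is the L x 2 prediction matrix of classifier m for training sample i
    M, N = len(Q), len(y)
    acc = np.zeros(M)
    for m in range(M):
        # class-wise average prediction matrices
        Q_bar = {g: np.mean([Q[m][i] for i in range(N) if y[i] == g], axis=0)
                 for g in classes}
        # label each sample by the nearest average prediction matrix
        pred = [min(classes, key=lambda g: np.linalg.norm(Q[m][i] - Q_bar[g]))
                for i in range(N)]
        acc[m] = np.mean(np.array(pred) == np.array(y))
    # min-max normalization of the accuracies (epsilon guards against equal accuracies)
    return (acc - acc.min()) / (acc.max() - acc.min() + 1e-12)
```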
In this embodiment, the weight of each classifier is calculated from the prediction accuracy of that classifier on the target object data, which ensures the validity of the calculated weights and hence the reliability of the subsequent fusion processing of the data to be classified according to these weights.
Further, performing fusion processing on the data to be classified by adopting each trained classifier according to the corresponding weight to determine the category of the data to be classified comprises:
splicing Euclidean distances from each prediction matrix to a first average prediction matrix and a second average prediction matrix according to rows to obtain prediction distance matrices, weighting Euclidean distances corresponding to corresponding classifiers in the prediction distance matrices by adopting the weights of the classifiers to obtain first weighted distance matrices, and grouping and averaging the first weighted distance matrices according to label data of target object data to obtain a first average distance matrix and a second average distance matrix;
projecting the data to be classified through L random projection matrixes to obtain L sets of classified projection data, inputting the L sets of classified projection data into each trained classifier for prediction respectively, and obtaining a classified prediction matrix obtained by each classifier according to the prediction of the data to be classified;
calculating Euclidean distances from each classified prediction matrix to the first average prediction matrix and the second average prediction matrix respectively;
splicing Euclidean distances from each classified prediction matrix to the first average prediction matrix and the second average prediction matrix according to rows to obtain a classified distance matrix, and weighting Euclidean distances corresponding to corresponding classifiers in the classified distance matrix by adopting the weight of each classifier to obtain a second weighted distance matrix;
substituting the second weighted distance matrix, the first average distance matrix and the second average distance matrix into a classification formula to determine the class of the data to be classified; the classification formula comprises:

$$d_{test}^{\,g} = \left\| D^{w}(x_{test}) - \bar{D}^{\,g} \right\|_2$$

$$x_{test} \in y_s \;\Leftarrow\; d_{test}^{\,s} = \min_{g \in \{1, \dots, G\}} d_{test}^{\,g}$$

where $x_{test}$ denotes the data to be classified (such as test data or data describing a renal small mass whose class is to be determined), $D^{w}(x_{test})$ denotes the second weighted distance matrix, $\bar{D}^{\,g}$ denotes the g-th average distance matrix, G denotes the number of classes of the data to be classified, $y_s$ denotes the label data predicted by the classifiers for the data to be classified, and the subscript s denotes the class index with the minimum distance.
Preferably, the first or second average distance matrix is determined as

$$\bar{D}^{\,g} = \frac{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)\, D^{w}(x_i)}{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)}, \qquad g = 1, 2$$

where $D^{w}(x_i)$ denotes the first weighted distance matrix of the i-th target object data.
In one example, the computation of the average prediction matrices (the first and second average prediction matrices) from the prediction matrices of each classifier over the N target object data follows the same idea as the computation of the classification prediction matrices for the data to be classified. Correspondingly, in each case the Euclidean distances are computed and assembled into the corresponding weighted distance matrix (the first or the second weighted distance matrix) in the same way. The computation of the various distance matrices is described below:

S501, for the i-th target object data $x_i$, the Euclidean distances from its prediction matrices to the first and second average prediction matrices are spliced by rows across all classifiers to obtain the distance matrix $D(x_i)$ (the prediction distance matrix), as shown in the following equation:

$$D(x_i) = \begin{bmatrix} d_{i,1}^{\,1} & d_{i,1}^{\,2} \\ \vdots & \vdots \\ d_{i,M}^{\,1} & d_{i,M}^{\,2} \end{bmatrix}$$

S502, the distance matrix is weighted with the weight $\omega_m$ of each classifier to obtain the weighted distance matrix $D^{w}(x_i)$ (the first weighted distance matrix), as shown in the following equation:

$$D^{w}(x_i) = \operatorname{diag}(\omega_1, \dots, \omega_M)\, D(x_i)$$

S503, the weighted distance matrices obtained in step S502 are grouped by the label data of the target objects and averaged according to the following equation to obtain the average distance matrices $\bar{D}^{\,g}$ (the first and second average distance matrices):

$$\bar{D}^{\,g} = \frac{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)\, D^{w}(x_i)}{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)}, \qquad g = 1, 2$$

S511, if the data to be classified is a given test target object $x_{test}$, $x_{test}$ is projected through the L random projection matrices to obtain L sets of classification projection data $\hat{x}_{test}$;

S512, the prediction matrices $Q_m(x_{test})$ (the classification prediction matrices) obtained by each classifier on the classification projection data are computed;

S513, the Euclidean distances from each classification prediction matrix to the first and second average prediction matrices are calculated;

S514, the weighted distance matrix $D^{w}(x_{test})$ (the second weighted distance matrix) is obtained according to the computations shown in steps S501 to S502;

S515, the Euclidean distance between $D^{w}(x_{test})$ and the average distance matrices $\bar{D}^{\,g}$ of step S503 is calculated according to the first of the classification formulas, and the final class of the test target object $x_{test}$ is obtained according to the second of the classification formulas;

the first of the classification formulas is:

$$d_{test}^{\,g} = \left\| D^{w}(x_{test}) - \bar{D}^{\,g} \right\|_2$$

the second of the classification formulas is:

$$x_{test} \in y_s \;\Leftarrow\; d_{test}^{\,s} = \min_{g \in \{1, \dots, G\}} d_{test}^{\,g}$$
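Tying steps S501 to S515 together, the following hedged sketch shows one way the weighted distance matrices and the final nearest-class decision can be computed; Q_train[m][i] and Q_test_sample[m] are the L x 2 prediction matrices produced by the classifiers, Q_bar[m][g] the class-average prediction matrices, and all names are illustrative rather than taken from the patent:

```python
# Hedged sketch of the two-level fusion: build weighted distance matrices for the
# training objects, average them per class, then classify a test object by the
# nearest class-average weighted distance matrix.
import numpy as np

def distance_matrix(Q_sample, Q_bar, classes=(0, 1)):
    # Q_sample[m]: L x 2 prediction matrix of classifier m for one object;
    # Q_bar[m][g]: class-g average prediction matrix of classifier m.
    # Result: M x G matrix of Euclidean distances (steps S501 / S513).
    return np.array([[np.linalg.norm(Q_sample[m] - Q_bar[m][g]) for g in classes]
                     for m in range(len(Q_sample))])

def class_average_weighted_distances(Q_train, Q_bar, y_train, weights, classes=(0, 1)):
    # First weighted distance matrices of the training objects, averaged per class
    # (steps S501 to S503).
    W = np.diag(weights)
    Dw = [W @ distance_matrix([Q_train[m][i] for m in range(len(Q_train))], Q_bar, classes)
          for i in range(len(y_train))]
    return {g: np.mean([Dw[i] for i in range(len(y_train)) if y_train[i] == g], axis=0)
            for g in classes}

def classify(Q_test_sample, Q_bar, D_bar, weights, classes=(0, 1)):
    # Second weighted distance matrix of the test object (S511 to S514), then the
    # class whose average distance matrix is closest (S515).
    W = np.diag(weights)
    Dw_test = W @ distance_matrix(Q_test_sample, Q_bar, classes)
    return min(classes, key=lambda g: np.linalg.norm(Dw_test - D_bar[g]))
```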
according to the embodiment, the data to be classified is fused according to the corresponding weights through a plurality of different classifiers, a hierarchical structure is provided, the diversity and the structural advantages of the classifiers are integrated, the data to be classified is classified, the robustness of the classification process can be improved, and the reliability of the classification result is further improved.
In an embodiment, the projection of the N pieces of radiologic characteristic data through the L random projection matrices to obtain the L sets of projection characteristic data is performed as

$$\hat{D}_l = D\, P_l, \qquad l = 1, \dots, L$$

where $\hat{D}_l$ denotes the l-th set of projection characteristic data, which may also be written as $\hat{D}_l \in \mathbb{R}^{N \times q}$; D denotes the N pieces of radiologic characteristic data, i.e. the set comprising the N pieces of radiologic characteristic data; $P_l$ denotes the l-th random projection matrix; q denotes the data dimension of the projection domain corresponding to the random projection; and l ranges from 1 to L. In addition, the result of the L random projections may be denoted by $\hat{D} = \{\hat{D}_1, \dots, \hat{D}_L\}$, which is the representation of the N pieces of radiologic characteristic data in the new projection domains; the superscript ∧ denotes the projection domain.
Specifically, each random projection matrix used in the projection can be determined as

$$P = (r_{ij})_{p \times q}$$

where P denotes a random projection matrix, each element $r_{ij}$ is drawn at random from a set of values, the subscript i denotes the row index of P and the subscript j denotes the column index of P. The set of values may be chosen according to the desired projection properties; for example, the set may be $\{+\sqrt{3}, 0, -\sqrt{3}\}$, with $r_{ij}$ drawn according to the probabilities $pro(r_{ij} = +\sqrt{3}) = pro(r_{ij} = -\sqrt{3}) = 1/6$ and $pro(r_{ij} = 0) = 2/3$.
Further, the data dimension q is determined as follows: when $p > q_0$, $q = q_0$; when $p \le q_0$, $q = p/2$; where $q_0 = [2 \times \ln(N) / \varepsilon^2]$, $\varepsilon$ is taken as 0.25, N is the number of target object data, and p denotes the dimension of the radiologic characteristic data before projection.
In this embodiment, the above formula is used to project the N pieces of radiologic characteristic data D a total of L times: as l takes values in turn over the interval [1, L], the corresponding projection matrix $P_l$ is applied to the N pieces of radiologic characteristic data D, yielding the L sets of projection characteristic data and ensuring the validity of the L sets of projection characteristic data obtained.
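Under the assumptions above (sparse entries with probabilities 1/6, 2/3, 1/6 and the stated rule for q), a from-scratch sketch of building the random projection matrices and projecting the feature matrix might look as follows; the sqrt(3) scaling follows the common Achlioptas construction and is an assumption where the original rendering is ambiguous:

```python
# Hedged sketch of the projection formulas: draw sparse p x q matrices and
# right-multiply the feature matrix. The sqrt(3) scaling and the rule for q
# follow the description above; exact constants are assumptions.
import numpy as np

def projection_dim(p, n, eps=0.25):
    q0 = int(np.ceil(2.0 * np.log(n) / eps ** 2))
    return q0 if p > q0 else p // 2

def random_projection_matrix(p, q, rng):
    values = np.sqrt(3) * np.array([1.0, 0.0, -1.0])
    probs = [1 / 6, 2 / 3, 1 / 6]
    return rng.choice(values, size=(p, q), p=probs)

def project(D, L=10, eps=0.25, seed=0):
    rng = np.random.default_rng(seed)
    n, p = D.shape
    q = projection_dim(p, n, eps)
    mats = [random_projection_matrix(p, q, rng) for _ in range(L)]
    return [D @ P for P in mats], mats   # L sets of N x q projected data
```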
In one embodiment, the classifiers specifically include: a Bayesian classifier, a logistic regression classifier, a quadratic discriminant analysis classifier, a K-nearest neighbor classifier, a decision tree classifier, a random forest classifier and an XGBoost classifier. Each classifier model is constructed with the scikit-learn machine learning package in a Python programming language environment, the L sets of projection characteristic data and the corresponding label data are input into each classifier model, and the fit function is called for each classifier model for training to obtain the prediction matrix of each classifier. The training process of each classifier is specifically as follows:
Training process of the Bayesian classifier: construct a Bayesian model with the scikit-learn machine learning package in a Python programming language environment, input the L sets of projection characteristic data and the corresponding label data into the Bayesian model, and call the fit function for training to obtain the prediction matrix of the Bayesian classifier.
Training process of the logistic regression classifier: construct a logistic regression model with the scikit-learn machine learning package in a Python programming language environment, input the L sets of projection characteristic data and the corresponding label data into the logistic regression model, and call the fit function for training to obtain the prediction matrix of the logistic regression classifier.
Training process of the quadratic discriminant analysis classifier: construct a quadratic discriminant analysis model with the scikit-learn machine learning package in a Python programming language environment, input the L sets of projection characteristic data and the corresponding label data into the quadratic discriminant analysis model, and call the fit function for training to obtain the prediction matrix of the quadratic discriminant analysis classifier.
Training process of the K-nearest neighbor classifier: construct a K-nearest neighbor model with the scikit-learn machine learning package in a Python programming language environment, input the L sets of projection characteristic data and the corresponding label data into the K-nearest neighbor model, and call the fit function for training to obtain the prediction matrix of the K-nearest neighbor classifier.
Training process of the decision tree classifier: construct a decision tree model with the scikit-learn machine learning package in a Python programming language environment, input the L sets of projection characteristic data and the corresponding label data into the decision tree model, and call the fit function for training to obtain the prediction matrix of the decision tree classifier.
Training process of the random forest classifier: construct a random forest model with the scikit-learn machine learning package in a Python programming language environment, input the L sets of projection characteristic data and the corresponding label data into the random forest model, and call the fit function for training to obtain the prediction matrix of the random forest classifier.
Training process of the XGBoost classifier: construct an XGBoost model with the scikit-learn-compatible interface in a Python programming language environment, input the L sets of projection characteristic data and the corresponding label data into the XGBoost model, and call the fit function for training to obtain the prediction matrix of the XGBoost classifier.
Further, in the training process of the 7 classifiers, the process of obtaining the prediction matrix includes:
the Bayes classifier calls a predict _ proba function to predict to obtain the predicted posterior probability of each set of projection characteristic data, and then the predicted posterior probabilities are spliced into a prediction matrix Q related to the Bayes classifier according to rows1
The logistic regression classifier calls a predict _ proba function to predict to obtain the predicted posterior probability of each set of projection characteristic data, and the predicted posterior probabilities are spliced into a prediction matrix Q related to the quadratic discriminant logistic regression classifier according to rows2
The secondary discriminant analysis classifier calls a predict _ proba function to predict to obtain the predicted posterior probability of each set of projection characteristic data, and then the predicted posterior probabilities are spliced into a prediction matrix Q related to the secondary discriminant analysis classifier according to rows3
The K neighbor classifier calls a predict _ proba function to predict to obtain the predicted posterior probability of each set of projection characteristic data, and the predicted posterior probabilities are spliced into a prediction matrix Q related to the K neighbor classifier according to rows4
The decision tree classifier calls a predict _ proba function to predict to obtain the predicted posterior probability of each set of projection characteristic data, and the predicted posterior probabilities are spliced into a prediction matrix Q related to the decision tree classifier according to rows5
The random forest classifier calls a predict _ proba function to predict to obtain the predicted posterior probability of each set of characteristic data, and the predicted posterior probabilities are spliced into a prediction matrix Q related to the random forest classifier according to rows6
The XGboost classifier calls a predict _ proba function to predict to obtain the predicted posterior probability of each set of characteristic data, and then the predicted posterior probabilities are spliced into a prediction matrix Q related to the XGboost classifier according to rows7
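Continuing the hedged sketches above, the per-classifier prediction matrices can be assembled by calling predict_proba on each trained copy (one per projection domain) and splicing the resulting posterior probabilities by rows; trained and projected_sets follow the naming of the earlier training sketch and are assumptions, not the patent's code:

```python
# Hedged sketch: for every classifier and training object i, stack the
# predict_proba outputs from the L projection domains into an L x 2 matrix.
import numpy as np

def build_prediction_matrices(trained, projected_sets):
    # trained[name][l]: classifier `name` fitted on the l-th projected set;
    # projected_sets[l]: N x q projected training data in domain l.
    n_samples = projected_sets[0].shape[0]
    Q = {}
    for name, models in trained.items():
        # proba[l] has shape (N, 2): posterior probabilities in projection domain l
        proba = [models[l].predict_proba(projected_sets[l]) for l in range(len(models))]
        # splice by rows: Q[name][i] is the L x 2 prediction matrix of object i
        Q[name] = [np.vstack([proba[l][i] for l in range(len(models))])
                   for i in range(n_samples)]
    return Q
```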
This embodiment specifically adopts a Bayesian classifier, a logistic regression classifier, a quadratic discriminant analysis classifier, a K-nearest neighbor classifier, a decision tree classifier, a random forest classifier and an XGBoost classifier, classifiers of stable performance, to train on the L sets of projection characteristic data respectively, which improves the stability of the training process and thus ensures the reliability of the training result.
In one embodiment, the process by which the random projection-based renal mass classification method obtains the classifier weights and the average distance matrices $\bar{D}^{\,g}$ from target object data collected from 130 pathologically confirmed renal small mass patients is explained.
The target object data collected from the 130 pathologically confirmed renal small mass patients include the non-enhanced CT images of the patients' renal small masses, the corresponding mask images and the label data for the type of each renal small mass. The clinical information of these patients is given in Table 1; in Table 1, AMLwvf denotes fat-poor renal angiomyolipoma, RCC denotes renal cell carcinoma, and the P value is used to test the null hypothesis (here, that the characteristic in the table does not differ between the two groups of renal small mass patients).
TABLE 1 clinical information on patients with renal small tumors
The data for these 130 pathologically confirmed renal small mass patients comprised 94 patients with renal cell carcinoma and 36 patients with fat-poor renal angiomyolipoma. Based on the collected target object data of the 130 pathologically confirmed renal small mass patients, the above random projection-based renal mass classification method may specifically include:
step one, data input:
respectively inputting target object data from 130 patients with renal small masses, wherein the target object data comprises target object non-enhanced CT images, corresponding mask images and target object label data, and thus obtaining 130 non-enhanced CT images, 130 corresponding mask image data and 130 target object renal small mass type label data;
step two, outputting data characteristics:
Step 2.1, feature data extraction. The radiologic characteristic data are acquired using the open-source Python package radiomics (pyradiomics); extraction is performed on the ROI (region of interest) delineated on the non-enhanced CT (computed tomography) image of each target object to obtain the imaging characteristic data of that target object, which are output as the characteristic data (radiologic characteristic data).
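As a hedged illustration of this extraction step, the pyradiomics package (imported as radiomics) can compute shape, first-order and texture features from an image/mask pair roughly as follows; the file paths and the enabled feature classes are placeholders, and the exact extractor settings of the patent are not known:

```python
# Hedged sketch: extract shape, first-order and texture features from one
# non-enhanced CT image and its ROI mask with pyradiomics. Paths and settings
# are placeholders, not the configuration used in the patent.
from radiomics import featureextractor

def extract_features(image_path, mask_path):
    extractor = featureextractor.RadiomicsFeatureExtractor()
    extractor.disableAllFeatures()
    for feature_class in ("shape", "firstorder", "glcm", "glrlm", "glszm"):
        extractor.enableFeatureClassByName(feature_class)
    result = extractor.execute(image_path, mask_path)
    # keep only the numeric feature values, dropping diagnostic metadata
    return {k: float(v) for k, v in result.items() if not k.startswith("diagnostics")}

# usage (hypothetical file names):
# features = extract_features("patient001_ct.nrrd", "patient001_mask.nrrd")
```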
Step 2.2, 103 radiologic features can be obtained from step 2.1, as shown in Table 2. The radiologic features fall into three categories: 1) shape features; 2) first-order statistical features (histogram analysis); 3) texture features (image gray-level distribution). Since the target object data may be class-imbalanced, this embodiment employs the synthetic minority over-sampling technique (SMOTE), which over-samples the minority class of target objects with fat-poor renal angiomyolipoma by introducing synthetic feature samples.
Table 2 The 103 radiologic features
Step 2.3, in addition to the class balancing of the target objects in step 2.2, the target data are further processed with the f_score feature selection method to perform feature selection and reduce the dimension of the feature space, so as to avoid over-fitting.
Step three, data processing
Step 3.1, create the random projection matrices P according to formula (I), following the theory of random projection (RP) and the J-L theorem:

$$P = (r_{ij})_{p \times q} \qquad \text{(I)}$$

where q is the data dimension in the new projection domain, and each element $r_{ij}$ of the random matrix is drawn at random, for example from $\{+\sqrt{3}, 0, -\sqrt{3}\}$, according to the probabilities $pro(r_{ij} = +\sqrt{3}) = pro(r_{ij} = -\sqrt{3}) = 1/6$ and $pro(r_{ij} = 0) = 2/3$.
Step 3.2, project the 130 pieces of characteristic data obtained in step two into the new feature space through the L random projection matrices obtained in step 3.1 according to formula (II), obtaining L sets of new characteristic data (the projection characteristic data):

$$\hat{D}_l = D\, P_l, \qquad l = 1, \dots, L \qquad \text{(II)}$$

where $D \in \mathbb{R}^{130 \times p}$ is the set of the 130 target objects obtained in step two, p is the original data dimension of the target objects, $P_l \in \mathbb{R}^{p \times q}$ is the l-th random projection matrix, and $\hat{D} = \{\hat{D}_1, \dots, \hat{D}_L\}$, $\hat{D}_l \in \mathbb{R}^{130 \times q}$, is the representation of the 130 target objects in the new projection domains; the symbol ∧ denotes the projection domain.
Step four, constructing a multi-classifier system model for classifying the benign and malignant renal masses based on random projection:
step 4.1, training a classifier,
Specifically, the Bayesian training consists of constructing a Bayesian model with the scikit-learn machine learning package in a Python programming language environment, inputting the L sets of new characteristic data and the label data into the Bayesian model, calling the fit function for training, and saving and outputting the result;
specifically, the logistic regression training consists of constructing a logistic regression model with the scikit-learn machine learning package in a Python programming language environment, inputting the L sets of new characteristic data and the label data into the logistic regression model, calling the fit function for training, and saving and outputting the result;
the quadratic discriminant analysis training specifically consists of constructing a quadratic discriminant analysis model with the scikit-learn machine learning package in a Python programming language environment, inputting the L sets of new characteristic data and the label data into the quadratic discriminant analysis model, calling the fit function for training, and saving and outputting the result;
the K-nearest neighbor training specifically consists of constructing a K-nearest neighbor model with the scikit-learn machine learning package in a Python programming language environment, inputting the L sets of new characteristic data and the label data into the K-nearest neighbor model, calling the fit function for training, and saving and outputting the result;
specifically, the decision tree training consists of constructing a decision tree model with the scikit-learn machine learning package in a Python programming language environment, inputting the L sets of new characteristic data and the label data into the decision tree model, calling the fit function for training, and saving and outputting the result;
specifically, the random forest training consists of constructing a random forest model with the scikit-learn machine learning package in a Python programming language environment, inputting the L sets of new characteristic data and the label data into the random forest model, calling the fit function for training, and saving and outputting the result;
the XGBoost training specifically consists of constructing an XGBoost model in a Python programming language environment, inputting the L sets of new characteristic data and the label data into the XGBoost model, calling the fit function for training, and saving and outputting the result;
step 4.2, determining the weight of each classifier,
Step 4.2.1, input the L sets of new characteristic data obtained in step three into the Bayesian classifier, the logistic regression classifier, the quadratic discriminant analysis classifier, the K-nearest neighbor classifier, the decision tree classifier, the random forest classifier and the XGBoost classifier respectively; in each classifier, call the predict_proba function to obtain the predicted posterior probabilities for each set of characteristic data, and splice these predicted posterior probabilities by rows into the corresponding prediction matrix of each classifier, comprising: the prediction matrix Q1 of the Bayesian classifier, the prediction matrix Q2 of the logistic regression classifier, the prediction matrix Q3 of the quadratic discriminant analysis classifier, the prediction matrix Q4 of the K-nearest neighbor classifier, the prediction matrix Q5 of the decision tree classifier, the prediction matrix Q6 of the random forest classifier and the prediction matrix Q7 of the XGBoost classifier.
Step 4.2.2, from the prediction matrices obtained in step 4.2.1, compute the average prediction matrices $\bar{Q}_m^{\,g}$ of the prediction matrix of each classifier, grouped by the label data of the target objects, according to formula (III):

$$\bar{Q}_m^{\,g} = \frac{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)\, Q_m(x_i)}{\sum_{i=1}^{N} \mathbb{1}(y_i = y_g)} \qquad \text{(III)}$$

where g = 1, 2.
Step 4.2.3, from the average prediction matrices $\bar{Q}_m^{\,g}$ obtained in step 4.2.2, calculate the Euclidean distance between the prediction matrix of each classifier obtained in step 4.2.1 and the corresponding average prediction matrix according to formula (VI):

$$d_{i,m}^{\,g} = \left\| Q_m(x_i) - \bar{Q}_m^{\,g} \right\|_2 \qquad \text{(VI)}$$

and then obtain the prediction labels of the 130 target objects on each classifier according to formula (VII):

$$x_i \in y_s \;\Leftarrow\; d_{i,m}^{\,s} = \min_{g \in \{1, \dots, G\}} d_{i,m}^{\,g} \qquad \text{(VII)}$$

where the subscript s denotes the class index with the minimum distance.
Step 4.2.4, respectively calculating the prediction accuracy parameter acc_m of each classifier according to the predicted labels of the 130 target objects on each classifier obtained in step 4.2.3 and the label data obtained in step one, and proceeding to step 4.2.5;
step 4.2.5, according to the prediction accuracy parameter acc_m of each classifier obtained in step 4.2.4, calculating the weight of each classifier according to formula (VIII):

ω_m = (acc_m − acc_min) / (acc_max − acc_min)        (VIII)

where acc_min and acc_max denote the minimum and maximum of the prediction accuracy parameters over all classifiers.
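A minimal sketch of the weight computation. The original formula image is not reproduced in the text; formula (VIII) is read here as a min-max normalisation of the per-classifier accuracies, which is an assumption consistent with the symbols acc_m, acc_min and acc_max listed in claim 3. Names are illustrative.

import numpy as np

def classifier_weights(acc):
    """Min-max normalisation of the per-classifier accuracies acc_m (assumed
    reading of formula VIII); degenerate case of equal accuracies returns 1s."""
    acc = np.asarray(acc, dtype=float)
    span = acc.max() - acc.min()
    return np.ones_like(acc) if span == 0 else (acc - acc.min()) / span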
Step 4.3, obtaining the average distance matrices,
step 4.3.1, for each target object x_i, the Euclidean distances calculated in step 4.2 between the prediction matrix of each classifier and each corresponding average prediction matrix are spliced by rows over all classifiers to obtain the distance matrix D^(i), as shown in formula (IX):

D^(i) = [ d_m^(i,g) ]_{M×G}        (IX)

wherein G is 2 and M is 7.
Step 4.3.2, weighting the row of the distance matrix obtained in step 4.3.1 that corresponds to each classifier by the weight ω_m of that classifier obtained in step 4.2, to obtain the weighted distance matrix D_w^(i), as shown in formula (X):

D_w^(i) = [ ω_m · d_m^(i,g) ]_{M×G}        (X)
Step 4.3.3, averaging the weighted distance matrices D_w^(i) obtained in step 4.3.2, grouped by the label data of the target objects, according to formula (XI), to obtain the average distance matrices D̄^(g):

D̄^(g) = (1 / N_g) · Σ_{i: y_i = y_g} D_w^(i)        (XI)
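A minimal sketch of formulas (IX) to (XI) as read above, assuming dists_per_classifier maps each classifier name to the (n_samples × G) distance array returned by the earlier predicted_labels sketch, and weights is the ω_m vector in the same classifier order. Names are illustrative.

import numpy as np

def weighted_distance_matrices(dists_per_classifier, weights):
    """Formulas (IX)/(X): stack the per-classifier distance rows of each sample
    into an M x G matrix and multiply row m by the weight of classifier m."""
    names = list(dists_per_classifier)                                    # M classifiers
    stacked = np.stack([dists_per_classifier[m] for m in names], axis=1)  # (n, M, G)
    return stacked * np.asarray(weights)[None, :, None]

def average_distance_matrices(D_w, y):
    """Formula (XI): group the weighted distance matrices by label and average."""
    y = np.asarray(y)
    return {g: D_w[y == g].mean(axis=0) for g in np.unique(y)}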
Further, pre-operative data of 33 patients with pathologically confirmed renal small masses were collected as test data, and the classifier weights ω_m and the average distance matrices D̄^(g) obtained in this example were applied to the test data. The specific steps are as follows:
step one, data input:
respectively inputting the non-enhanced CT image, the corresponding mask image and the label data of the target object for each of the 33 patients with pathologically confirmed renal small masses, so as to obtain 33 non-enhanced CT images, 33 corresponding mask image data and 33 label data of the renal small mass type of the target objects;
step two, outputting data characteristics: performing characteristic output on the non-enhanced CT image and the corresponding mask image data of each target object according to steps 2.1 to 2.3 of the data characteristic output part for a single pathologically confirmed renal small mass patient;
step three, data processing: randomly projecting the 33 target objects according to steps 3.1 to 3.2 of the data processing part to obtain new characteristic data in different projection domains.
Step four, processing by the multiple classifiers: inputting the characteristic data of the third step into each classifier for corresponding fusion processing.
Step five, classifying by the multiple classifiers: inputting the step-three characteristic data of a single pathologically confirmed renal small mass patient into the Bayes, logistic regression, quadratic discriminant analysis, K-nearest neighbor, decision tree, random forest and XGBoost classifiers constructed in step four, and obtaining the weighted distance matrix of the single patient target object according to steps 4.2.2 to 4.3.2.
Step six, classifying the benign or malignant renal masses of a single pathologically confirmed renal mass patient subject according to step 4.4.
Seventhly, repeating the sixth step until all 33 patients with the pathologically confirmed renal small masses are classified into benign and malignant renal small masses.
Step eight, calculating the classification accuracy, AUC, sensitivity and specificity of the whole system according to the benign/malignant classification results of the 33 patients with pathologically confirmed renal small masses, and comparing the performance of the random projection-based renal small mass classification method provided by the present invention with that of each base classifier. The comparison results are shown in Table 3, where the marked entries indicate a statistically significant difference between the results obtained in this example and the corresponding single classifier.
TABLE 3
[Table 3: accuracy, AUC, sensitivity and specificity of the proposed fusion method compared with each single base classifier; table image not reproduced]
As can be seen from Table 3, after fusing the multiple classifiers, this embodiment is generally superior to every single classifier in terms of accuracy and AUC. In addition, the Wilcoxon signed-rank test at a significance level of 0.05 was used to test whether the prediction results of this embodiment differ significantly from those of each single classifier; the marks in Table 3 denote significant differences, and the p-values were generally below 0.05, indicating that the multi-classifier fusion scheme provided by this embodiment differs significantly from the single classifiers.
In this embodiment, the imaging characteristics extracted from the non-enhanced CT image of the target object data and the corresponding mask image are preprocessed and then randomly projected multiple times, so that multiple new data sets are generated from the original data. These data sets are passed through different classifiers, the results obtained on the new data sets are fused within each classifier, and the results of the different classifiers are further fused, yielding a hierarchical structure that takes both the diversity and the structure of the multiple classifiers into account and improves the robustness of the integrated multi-classifier application. The classification scheme implemented above for renal small masses can be applied in individualized disease diagnosis to assist in guiding clinical decisions.
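A minimal sketch of the random projection step referred to throughout (steps 3.1 to 3.2 and claims 6 and 7). The dimension rule q0 = 2·ln(n)/ε² with ε = 0.25 is taken from claim 7, and n is assumed to be the number of samples; the preset value set for the entries r_ij is not given in the text, so a standard Gaussian draw is used purely as a stand-in. Names are illustrative.

import numpy as np

def projection_dim(p, n, eps=0.25):
    """Dimension rule quoted in claim 7: q0 = 2*ln(n)/eps**2; use q0 when p > q0,
    otherwise p // 2."""
    q0 = int(2 * np.log(n) / eps ** 2)
    return q0 if p > q0 else max(1, p // 2)

def random_projections(D, L, rng=None):
    """Generate L random p x q projection matrices and project the feature
    matrix D (n x p); Gaussian entries are an assumption, not the claimed set."""
    rng = np.random.default_rng(rng)
    n, p = D.shape
    q = projection_dim(p, n)                 # assumes n in the claim-7 rule is the sample count
    mats = [rng.standard_normal((p, q)) for _ in range(L)]
    return [D @ P for P in mats], mats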
In one embodiment, as shown in fig. 3, there is provided a random projection-based renal mass classification apparatus including:
an obtaining module 10, configured to obtain N target object data describing a renal mass; the target object data includes a CT scout image, a mask image and label data of the corresponding kidney small tumor; the label data characterizes the respective renal mass as benign or malignant;
the extraction module 20 is configured to perform target region delineation on each CT scout image according to each mask image to obtain the region of interest of each CT scout image, and to perform radiologic characteristic data extraction on each region of interest to obtain N sets of radiologic characteristic data;
the projection module 30 is configured to project the N sets of radiologic characteristic data through L random projection matrices to obtain L sets of projection characteristic data;
the setting module 40 is configured to perform multiple classifier training on the L sets of projection characteristic data, to obtain a prediction matrix of each classifier and each trained classifier, and set a weight of each classifier according to the prediction matrix of each classifier;
and the determining module 50 is configured to perform fusion processing on the data to be classified according to the corresponding weights by using the trained classifiers, so as to determine the category of the data to be classified.
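A minimal sketch of the fusion decision carried out by the determining module 50, assuming the weighted distance matrix of the data to be classified and the class-wise average distance matrices are available, and following the nearest-average-distance rule spelled out in claim 4. Names are illustrative, not the patented implementation.

import numpy as np

def classify_subject(D_w_subject, D_bar):
    """Assign the test subject to the class g whose average distance matrix is
    nearest (in Euclidean norm) to the subject's weighted distance matrix."""
    classes = sorted(D_bar)
    scores = [np.linalg.norm(D_w_subject - D_bar[g]) for g in classes]
    return classes[int(np.argmin(scores))]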
For the specific definition of the renal small mass classification device based on random projection, reference may be made to the above definition of the renal small mass classification method based on random projection, and details are not repeated here. The modules of the above device can be implemented in whole or in part by software, hardware, or a combination thereof. Each module can be embedded in or independent of a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a method for classifying renal small masses based on random projection. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse, and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Based on the above examples, in one embodiment, an intelligent terminal device is further provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement any one of the above methods for classifying renal small masses based on random projection.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium, and executed by at least one processor of a computer system according to the embodiments of the present invention, to implement the processes of the embodiments including the above random projection-based renal mass classification method. The storage medium may be a magnetic disk, an optical disk, a Read-only Memory (ROM), a Random Access Memory (RAM), or the like.
Accordingly, in an embodiment, there is also provided a computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any one of the above-described methods for classifying renal small masses based on random projection.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
Reference herein to "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application.

Claims (10)

1. A method for classifying renal masses based on random projection, the method comprising:
S10, acquiring N target object data describing the kidney small tumor; the target object data includes a CT scout image, a mask image and label data of the corresponding kidney small tumor; the label data characterizes the respective renal mass as benign or malignant;
S20, performing target region delineation on each CT scout image according to each mask image to obtain a region of interest of each CT scout image, and performing radiologic characteristic data extraction on each region of interest to obtain N sets of radiologic characteristic data;
S30, projecting the N sets of radiologic characteristic data through L random projection matrices to obtain L sets of projection characteristic data;
S40, respectively carrying out multiple classifier training on the L sets of projection characteristic data to obtain a prediction matrix of each classifier and each trained classifier, and setting the weight of each classifier according to the prediction matrix of each classifier;
and S50, fusing the data to be classified by adopting the trained classifiers according to the corresponding weights to determine the category of the data to be classified.
2. The method of claim 1, wherein setting the weight of each classifier according to the prediction matrix of each classifier comprises:
calculating a first average prediction matrix corresponding to the benign renal small mass and a second average prediction matrix corresponding to the malignant renal small mass according to the prediction matrixes of the classifiers;
calculating Euclidean distances from the prediction matrix of each classifier to the first average prediction matrix and the second average prediction matrix respectively;
determining the prediction labels of the N target object data on each classifier according to the Euclidean distance from the prediction matrix of each classifier to the first average prediction matrix and the second average prediction matrix respectively;
and calculating the prediction accuracy parameter of each classifier according to the prediction label of the N target object data on each classifier and the label data respectively included by the N target object data, and determining the weight of each classifier according to the prediction accuracy parameter of each classifier.
3. The method of claim 2, wherein the determining of the first average prediction matrix or the second average prediction matrix comprises:
Q̄_m^(g) = (1 / N_g) · Σ_{x_i: y_i = y_g} Q_m(x_i),   N_g = Σ_{i=1}^{N} I(y_i = y_g)
in the formula, Q̄_m^(g) represents the g-th average prediction matrix, the value of g is 1 or 2, x_i indicates the i-th target object data, N indicates the number of target object data, Q_m(x_i) represents the prediction matrix of the m-th classifier with respect to the i-th target object data, y_i represents the label data of x_i, and y_g represents the category label g;
the calculation process of the Euclidean distance comprises:
d_m^(i,g) = ‖ Q_m(x_i) − Q̄_m^(g) ‖₂ = sqrt( Σ_{l=1}^{L} Σ_{k=1}^{G} ( Q_m(x_i)[l,k] − Q̄_m^(g)[l,k] )² )
in the formula, d_m^(i,g) represents the Euclidean distance from the prediction matrix of the m-th classifier for the i-th target object data to the g-th average prediction matrix, L represents the number of random projection matrices, and G represents the number of categories;
the process of determining a predicted label for target object data on a classifier comprises:
x_i ∈ y_s  ⇐  s = argmin_g d_m^(i,g)
in the formula, x_i indicates the target object data being classified by the classifier, y_s represents the label data predicted by the classifier for the target object data, ŷ(x_i) represents the predicted label of x_i, and the symbol "⇐" means that when the equation after the symbol holds, the category membership before the symbol is obtained;
the process of determining the weight of each classifier according to the prediction accuracy parameter of each classifier comprises:
ω_m = (acc_m − acc_min) / (acc_max − acc_min)
in the formula, ω_m represents the weight of the m-th classifier, acc_m represents the prediction accuracy parameter of the m-th classifier, acc_min represents the minimum of the prediction accuracy parameters, and acc_max represents the maximum of the prediction accuracy parameters.
4. The method according to claim 3, wherein the using of the trained classifiers to perform fusion processing on the data to be classified according to the corresponding weights to determine the category of the data to be classified comprises:
splicing Euclidean distances from each prediction matrix to a first average prediction matrix and a second average prediction matrix according to rows to obtain prediction distance matrices, weighting Euclidean distances corresponding to corresponding classifiers in the prediction distance matrices by adopting the weights of the classifiers to obtain first weighted distance matrices, and grouping and averaging the first weighted distance matrices according to label data of target object data to obtain a first average distance matrix and a second average distance matrix;
projecting the data to be classified through L random projection matrixes to obtain L sets of classified projection data, inputting the L sets of classified projection data into each trained classifier for prediction respectively, and obtaining a classified prediction matrix obtained by each classifier according to the prediction of the data to be classified;
calculating Euclidean distances from each classified prediction matrix to the first average prediction matrix and the second average prediction matrix respectively;
splicing Euclidean distances from each classified prediction matrix to the first average prediction matrix and the second average prediction matrix according to rows to obtain a classified distance matrix, and weighting Euclidean distances corresponding to corresponding classifiers in the classified distance matrix by adopting the weight of each classifier to obtain a second weighted distance matrix;
substituting the second weighted distance matrix, the first average distance matrix and the second average distance matrix into a classification formula to determine the category of the data to be classified; the classification formula comprises:
x ∈ y_s  ⇐  s = argmin_{g = 1, ..., G} ‖ D_w(x) − D̄^(g) ‖₂
in the formula, x represents the data to be classified, D_w(x) represents the second weighted distance matrix, D̄^(g) denotes the g-th average distance matrix, G denotes the number of classes of the data to be classified, y_s represents the label data predicted by the classifiers for the data to be classified, and ŷ(x) represents the predicted label of x.
5. The method of claim 4, wherein the formula for determining the first or second average distance matrix comprises:
D̄^(g) = (1 / N_g) · Σ_{x_i: y_i = y_g} D_w(x_i)
in the formula, D_w(x_i) represents the first weighted distance matrix of the target object data x_i, and N_g is the number of target object data whose label is y_g.
6. The method of claim 1, wherein the projecting the N sets of radiologic characteristic data through L random projection matrices to obtain L sets of projection characteristic data comprises:
D_l = D · P_l
in the formula, D_l represents the l-th set of projection characteristic data, D represents the N sets of radiologic characteristic data, P_l represents the l-th random projection matrix, and q represents the data dimension of the projection domain corresponding to the random projection matrix.
7. The method of claim 6, wherein the random projection matrix is determined by:
P = (r_ij)_{p×q}
wherein P represents a random projection matrix, r_ij randomly takes its value from a preset set, the subscript i denotes the row number of P, and the subscript j denotes the column number of P;
and/or the data dimension q is determined as follows: when p > q_0, q = q_0; when p ≤ q_0, q = p/2; wherein q_0 = [2 × ln(n) / ε²], ε = 0.25, and p represents the dimension of the radiologic characteristic data before projection.
8. The method according to any one of claims 1 to 7, wherein the training of a plurality of classifiers is performed on each of the L sets of projection characteristic data, and obtaining the prediction matrix of each classifier comprises:
constructing each classifier model by adopting the scikit-learn machine learning software package in a Python programming language environment, respectively inputting the L sets of projection characteristic data and the corresponding label data into each classifier model, and respectively calling the fit function of each classifier model for training, so as to obtain the prediction matrix of each classifier.
9. The method of any of claims 1 to 7, wherein the classifier comprises: a Bayes classifier, a logistic regression classifier, a quadratic discriminant analysis classifier, a K-nearest neighbor classifier, a decision tree classifier, a random forest classifier and an XGBoost classifier.
10. The method of any one of claims 1 to 7, wherein the CT scout image is a non-contrast-enhanced CT scan image;
and/or each of the radiologic characteristic data comprises a plurality of radiologic features; the radiologic features include shape features, first order statistical features, and texture features.
CN202010171801.XA 2020-03-12 2020-03-12 Renal mass classification method based on random projection Active CN111340135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010171801.XA CN111340135B (en) 2020-03-12 2020-03-12 Renal mass classification method based on random projection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010171801.XA CN111340135B (en) 2020-03-12 2020-03-12 Renal mass classification method based on random projection

Publications (2)

Publication Number Publication Date
CN111340135A true CN111340135A (en) 2020-06-26
CN111340135B CN111340135B (en) 2021-07-23

Family

ID=71182399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010171801.XA Active CN111340135B (en) 2020-03-12 2020-03-12 Renal mass classification method based on random projection

Country Status (1)

Country Link
CN (1) CN111340135B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101965588A (en) * 2008-01-31 2011-02-02 伊利诺伊大学评议会 Recognition via high-dimensional data classification
CN103164710A (en) * 2013-02-19 2013-06-19 华南农业大学 Selection integrated face identifying method based on compressed sensing
CN104966100A (en) * 2015-06-17 2015-10-07 北京交通大学 A benign and malignant image lump classification method based on texture primitives
CN107403201A (en) * 2017-08-11 2017-11-28 强深智能医疗科技(昆山)有限公司 Tumour radiotherapy target area and jeopardize that organ is intelligent, automation delineation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RUIMENG YANG 等: "Radiomics of small renal masses on multiphasic CT: accuracy of machine learning–based classification models for the differentiation of renal cell carcinoma and angiomyolipoma without visible fat", 《SPRINGER》 *
TIEN THANH NGUYEN 等: "A weighted multiple classifier framework based on random projection", 《ELSEVIER》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011462A (en) * 2021-02-22 2021-06-22 广州领拓医疗科技有限公司 Classification and device of tumor cell images
CN113011462B (en) * 2021-02-22 2021-10-22 广州领拓医疗科技有限公司 Classification and device of tumor cell images
CN113902724A (en) * 2021-10-18 2022-01-07 广州医科大学附属肿瘤医院 Method, device, equipment and storage medium for classifying tumor cell images
CN114897796A (en) * 2022-04-22 2022-08-12 深圳市铱硙医疗科技有限公司 Method, device, equipment and medium for judging stability of atherosclerotic plaque
CN114897796B (en) * 2022-04-22 2023-06-30 深圳市铱硙医疗科技有限公司 Method, device, equipment and medium for judging stability of atherosclerosis plaque
CN116611025A (en) * 2023-05-19 2023-08-18 贵州师范大学 Multi-mode feature fusion method for pulsar candidate signals
CN116611025B (en) * 2023-05-19 2024-01-26 贵州师范大学 Multi-mode feature fusion method for pulsar candidate signals
CN116805536A (en) * 2023-08-22 2023-09-26 乐陵市人民医院 Data processing method and system based on tumor case follow-up

Also Published As

Publication number Publication date
CN111340135B (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN111340135B (en) Renal mass classification method based on random projection
Hu et al. Unsupervised learning for cell-level visual representation in histopathology images with generative adversarial networks
US20220367053A1 (en) Multimodal fusion for diagnosis, prognosis, and therapeutic response prediction
WO2017151759A1 (en) Category discovery and image auto-annotation via looped pseudo-task optimization
US20220051060A1 (en) Methods for creating privacy-protecting synthetic data leveraging a constrained generative ensemble model
Kumar et al. Future of machine learning (ML) and deep learning (DL) in healthcare monitoring system
Li et al. A hybrid approach for approximating the ideal observer for joint signal detection and estimation tasks by use of supervised learning and markov-chain monte carlo methods
Tian et al. Radiomics and its clinical application: artificial intelligence and medical big data
Gangadharan et al. Comparative analysis of deep learning-based brain tumor prediction models using MRI scan
Jung et al. Weakly supervised thoracic disease localization via disease masks
Ann et al. Multi-scale conditional generative adversarial network for small-sized lung nodules using class activation region influence maximization
Qiu et al. Spiculation sign recognition in a pulmonary nodule based on spiking neural p systems
CN113011462B (en) Classification and device of tumor cell images
Prasad et al. Lung cancer detection and classification using deep neural network based on hybrid metaheuristic algorithm
Doraiswami et al. Jaya‐tunicate swarm algorithm based generative adversarial network for COVID‐19 prediction with chest computed tomography images
Goel et al. Improving YOLOv6 using advanced PSO optimizer for weight selection in lung cancer detection and classification
JP2024500470A (en) Lesion analysis methods in medical images
Branikas et al. Instance selection techniques for multiple instance classification
Wang et al. A comprehensive survey on deep active learning in medical image analysis
Rocha et al. Confident-CAM: Improving Heat Map Interpretation in Chest X-Ray Image Classification
CN115830020B (en) Lung nodule feature extraction method, classification method, device and medium
Gaikwad Deepsampling: Image sampling technique for cost-effective deep learning
Malla et al. Artificial intelligence in breast cancer: An opportunity for early diagnosis
Thanataveerat Clustering algorithm for zero-inflated data
Owais et al. Volumetric Model Genesis in Medical Domain for the Analysis of Multimodality 2-D/3-D Data Based on the Aggregation of Multilevel Features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210208

Address after: Room 144, 30th floor, Xihu commercial building, 12 Xihu Road, Yuexiu District, Guangzhou, Guangdong 510000

Applicant after: Zhen Xin

Address before: 510030 room 144, 30th floor, Xihu commercial building, 12 Xihu Road, Yuexiu District, Guangzhou City, Guangdong Province

Applicant before: Guangzhou lingtuo Medical Technology Co.,Ltd.

GR01 Patent grant