CN114999628B - Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning - Google Patents

Publication number
CN114999628B
Authority
CN
China
Prior art keywords
feature
knee osteoarthritis
features
encoder
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210445525.0A
Other languages
Chinese (zh)
Other versions
CN114999628A (en)
Inventor
张佳
张子龙
龙锦益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN202210445525.0A
Publication of CN114999628A
Application granted
Publication of CN114999628B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a method for searching for the significant features of degenerative knee osteoarthritis by using machine learning, relating to the technical field of intelligent medical treatment and comprising the following steps: S1, acquiring traditional Chinese medicine and western medicine information of a clinical patient to be diagnosed, preprocessing the information, and constructing a knee osteoarthritis feature data set; S2, training an encoder to learn the risk features of knee osteoarthritis by exploiting the feature dimension reduction capability of the self-encoder; S3, ranking the features of the knee osteoarthritis feature data set with 6 existing feature selection algorithms; S4, training a model with an SVM classifier; S5, taking the features that occur with high frequency in the results of the 6 algorithms, and comparing the self-encoder with the traditional feature selection methods in selecting risk factors. The risk factors screened by the invention can provide a scientific and reliable reference for diagnosing knee osteoarthritis in traditional Chinese medicine and support the construction of a more accurate and reliable disease identification model.

Description

Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning
Technical Field
The invention relates to the technical field of intelligent medical treatment, in particular to a method for searching for obvious characteristics of degenerative knee osteoarthritis by utilizing machine learning.
Background
Degenerative knee osteoarthritis is one of the diseases in which traditional Chinese medicine (TCM) orthopaedics has a recognized advantage. Over the long medical practice of TCM, generations of physicians have accumulated rich clinical diagnostic experience and formed a complete diagnostic system unique to China, namely the four diagnostic methods (inspection, listening and smelling, inquiry, and pulse-taking) and the differentiation of syndromes and of symptoms. The distinctive diagnostic methods of TCM and its understanding of the state of human vital activity have played an important clinical role since ancient times, have been continuously enriched and developed, and have had some influence on medicine abroad. Owing to historical limitations, however, TCM diagnostic methods are to some degree subjective. For example, tongue diagnosis and pulse diagnosis are unique to TCM and valuable in diagnosis, but they rely on the practitioner's experience and on what the eyes and fingers perceive, and they lack objective indexes as standards for judging tongue and pulse manifestations. Clarifying the value of tongue and pulse diagnosis and making them objective and practical is therefore necessary for the development of TCM. Accordingly, as modern medical models change, studying a method that uses artificial intelligence to find the significant features of degenerative knee osteoarthritis helps verify the scientific basis and feasibility of TCM diagnosis, build a more accurate and reliable disease identification model, exploit the strengths of artificial intelligence, and promote the joint development of interdisciplinary fields.
The purpose of data normalization is to eliminate scale differences between features so that their weights can be learned on an equal footing. In machine learning, different evaluation indexes (that is, the different features in a feature vector) often have different dimensions and units, which can distort the results of data analysis. To eliminate this effect, the data must be standardized so that the indexes become comparable. After standardization, all indexes are on the same order of magnitude and are suitable for comprehensive comparison and evaluation. The most typical approach is data normalization, which is needed when values in the knee osteoarthritis dataset differ too widely.
The self-encoder (autoencoder) is an unsupervised learning algorithm whose output reproduces its input data. The concept, first proposed by Rumelhart et al., is a data compression algorithm that uses an encoder to compress the data and a decoder to decompress it. The encoding stage maps high-dimensional data to a low-dimensional representation, reducing the amount of data; the decoding stage does the reverse, so that the input data are reproduced. As it has been refined, the self-encoder has been applied in fields such as image classification, face recognition and natural language processing, with good results. Beyond feature dimension reduction, the new features learned by the self-encoder can be fed into a supervised learning model, so the self-encoder can also serve as a feature extractor. In this patent it is used to extract the risk factors of knee osteoarthritis.
Feature selection is an important issue in feature engineering; its goal is to find the optimal feature subset. Feature selection can eliminate irrelevant or redundant features, thereby reducing the number of features, improving model accuracy and shortening running time. It has received wide attention and is applied in pattern recognition, text classification, biological genetics, information retrieval, data analysis and other fields. Specifically, the disease features of clinical patients include both traditional Chinese medicine and western medicine features and number in the hundreds, so the optimal feature subset must be found in order to identify the significant features. Feature selection is therefore introduced into the analysis of the significant features of degenerative knee osteoarthritis by artificial intelligence.
Disclosure of Invention
Degenerative knee osteoarthritis is a disease in which traditional Chinese medicine orthopaedics has an advantage, but its diagnostic information comes from varied and subjective sources. To address this, the invention provides a method for searching for the significant features of degenerative knee osteoarthritis by using machine learning.
In order to achieve the above purpose, the present invention provides the following technical solutions: a method for searching for the significant characteristics of degenerative knee osteoarthritis by using machine learning, which comprises the following specific steps:
S1, acquiring traditional Chinese medicine information (inspection, listening and smelling, inquiry, pulse-taking and the like) and western medicine information of a clinical patient, preprocessing the collected information so that it is more complete and of better quality, and constructing a knee osteoarthritis feature data set;
S2, training an encoder to learn the risk features of knee osteoarthritis by exploiting the feature dimension reduction capability of the self-encoder, so as to achieve feature selection;
S3, ranking the features of the knee osteoarthritis feature data set with 6 existing feature selection algorithms, which preserves the physical meaning of the risk factors;
S4, training a model with an SVM classifier, predicting classification performance as the size of the ranked feature subset increases from small to large, keeping for each of the 6 algorithms the feature subset with the best classification performance, and comparing the self-encoder with the traditional feature selection methods in selecting risk features;
S5, taking the features that occur with high frequency in the results of the 6 algorithms, so that the finally obtained features generalize better and are more significant.
Further, the step S1 specifically includes:
S11, extracting the information of a clinical patient to be diagnosed from the early-stage identification table of knee osteoarthritis, recording it as case data, and writing the related data dictionary. Where one field covers several different symptoms, each symptom is recorded as a separate feature, with 0 meaning no and 1 meaning yes;
S12, constructing a knee osteoarthritis data set and labelling the knee osteoarthritis status of each patient in clinical treatment, with 0 meaning not diseased and 1 meaning diseased;
S13, removing rows and columns in which the proportion of missing values is too large;
S14, normalizing continuous data by maximum normalization to reduce the differences between data and speed up training, deleting useless features, and retaining only the features that are meaningful for analysis;
S15, splitting discrete features that have multiple states, so that the degree of influence of each state of the same feature on osteoarthropathy can be obtained.
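Steps S14 and S15 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: "maximum normalization" is assumed here to mean dividing each continuous column by its largest absolute value, and the function names `max_normalize` and `split_discrete` as well as the sample fields are hypothetical.

```python
def max_normalize(column):
    """Scale a continuous column by its largest absolute value.

    This is an assumed reading of 'maximum normalization'; an all-zero
    column is returned unchanged to avoid dividing by zero.
    """
    peak = max(abs(v) for v in column)
    if peak == 0:
        return list(column)
    return [v / peak for v in column]


def split_discrete(column, name):
    """Split a multi-state discrete feature into one 0/1 feature per state."""
    states = sorted(set(column))
    return {f"{name}={s}": [1 if v == s else 0 for v in column] for s in states}


# Hypothetical sample data: one continuous field and one multi-state field.
age = [40, 60, 80]
pain_site = ["medial", "lateral", "medial"]

age_scaled = max_normalize(age)
pain_features = split_discrete(pain_site, "pain_site")
```

Splitting a multi-state field this way lets a later ranking step score each state separately, which matches the stated aim of S15.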
Further, step S2 specifically includes:
S21, constructing and compiling a traditional three-layer self-encoding model and setting the number of neurons in the intermediate hidden layer. A traditional self-encoder consists mainly of an encoding stage and a decoding stage and is symmetrical in structure. Its purpose is to reconstruct the input data at the output layer; in the ideal case the output signal y is identical to the input signal x. According to the structure shown in FIG. 1, the encoding and decoding processes of the traditional self-encoder can be described as:
The encoding process is: $h_1 = \sigma_e(W_1 x + b_1)$ (1);
The decoding process is: $y = \sigma_d(W_2 h_1 + b_2)$ (2);
where $W_1, b_1$ are the encoding weights and offsets, $W_2, b_2$ are the decoding weights and offsets, and $\sigma_e$ is the activation function of the nonlinear transformation, for which sigmoid, tanh, ReLU and the like are commonly used at present; $\sigma_d$ may be the same activation function as in the encoding process. The loss function of the self-encoder therefore minimizes the error between y and x:
[Equation (3): mean-square-error loss between y and x; formula image not reproduced]
S22, setting up five-fold cross-validation on the data set, and training the unsupervised self-encoding model with the training set serving as both the input signal x and the target output signal y;
S23, compressing the features in the training set with the trained encoder, and evaluating the effect of feature selection with an SVM classifier. The encoding stage can be regarded as a deterministic mapping that converts the input signal into a hidden-layer representation, while the decoding stage maps the hidden-layer representation back to the input signal as closely as possible. Besides the mean square error given by equation (3), cross entropy can also be chosen as the loss function, expressed as:
[Equation: cross-entropy loss between y and x; formula image not reproduced]
S24, repeating steps S21-S23 to obtain the SVM classification performance for different hidden-layer sizes, and saving the evaluation indexes of the best-performing configuration.
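The two loss functions named in S21 and S23 appear only as images in this text. For orientation, commonly used forms are sketched below. They are assumptions, not the patent's exact formulas: the sample count $n$, the feature count $d$, the superscript $(i)$ for the i-th sample and the averaging constant are introduced here for illustration, and $y^{(i)} = \sigma_d(W_2\,\sigma_e(W_1 x^{(i)} + b_1) + b_2)$ follows equations (1) and (2).

```latex
% Assumed standard forms; the constants used in the patent are not recoverable.
% Mean square error between reconstruction y and input x:
J_{\mathrm{MSE}}(W, b) = \frac{1}{n} \sum_{i=1}^{n} \left\lVert y^{(i)} - x^{(i)} \right\rVert^{2}

% Cross entropy between reconstruction y and input x (features scaled to [0, 1]):
J_{\mathrm{CE}}(W, b) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{d}
  \left[ x^{(i)}_{j} \log y^{(i)}_{j} + \left(1 - x^{(i)}_{j}\right) \log\left(1 - y^{(i)}_{j}\right) \right]
```

The cross-entropy form presupposes that inputs and outputs lie in [0, 1], which is consistent with normalized data and a sigmoid output layer.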
Further, the step S3 specifically includes:
S31, using each of the existing feature selection algorithms to rank the features in the knee osteoarthritis feature data set by importance, and storing the subscripts that the features have in the original data set in descending order of importance.
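The ranking step can be sketched as below. The six algorithms are not named in this text, so the scoring function here (absolute Pearson correlation with the label) is only a stand-in chosen for illustration; `abs_correlation`, `rank_features` and the toy data are hypothetical.

```python
import math


def abs_correlation(feature, labels):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(feature)
    mean_f = sum(feature) / n
    mean_l = sum(labels) / n
    cov = sum((f - mean_f) * (l - mean_l) for f, l in zip(feature, labels))
    var_f = sum((f - mean_f) ** 2 for f in feature)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_f == 0 or var_l == 0:
        return 0.0  # a constant column carries no ranking information
    return abs(cov) / math.sqrt(var_f * var_l)


def rank_features(columns, labels, score=abs_correlation):
    """Return column subscripts sorted by descending importance score."""
    scores = [score(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data: three feature columns for four patients.
columns = [
    [0, 1, 0, 1],  # unrelated to the label
    [0, 0, 1, 1],  # identical to the label
    [1, 1, 1, 1],  # constant
]
labels = [0, 0, 1, 1]
ranking = rank_features(columns, labels)
```

Because only subscripts are stored, every selected feature remains an original clinical field and keeps its physical meaning.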
Further, the step S4 specifically includes:
S41, selecting the top 1 to N ranked features (N being the total number of features in the knee osteoarthritis data set), and extracting the corresponding features from the original data set by subscript to form a new data set for training;
S42, validating the algorithm by five-fold cross-validation: the preprocessed, normalized data are split into training data and test data at a ratio of 4:1;
S43, training a model with an SVM classifier and predicting the disease state of the patient. Each feature selection algorithm is trained N times; across the N classification tests, the size M of the best-performing feature subset and its accuracy index are saved. The first M of the N ranked features are taken as the final selection result of that feature selection algorithm.
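The sweep over subset sizes in S41-S43 can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM named in the text; it is not an SVM, and a real run would substitute one. The fold construction, function names and toy data are likewise assumptions.

```python
def five_fold_splits(n_samples):
    """Yield (train, test) index lists; each fold holds out about 1/5 (a 4:1 split)."""
    indices = list(range(n_samples))
    for fold in range(5):
        test = indices[fold::5]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def centroid_accuracy(rows, labels, train, test):
    """Stand-in classifier (NOT an SVM): assign each test row to the nearest class centroid."""
    centroids = {}
    for cls in set(labels[i] for i in train):
        members = [rows[i] for i in train if labels[i] == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    correct = 0
    for i in test:
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(rows[i], centroids[c])),
        )
        correct += pred == labels[i]
    return correct / len(test)


def best_subset_size(rows, labels, ranking):
    """Try the top-1 ... top-N ranked features; return (M, best mean accuracy)."""
    best_m, best_acc = 0, -1.0
    for m in range(1, len(ranking) + 1):
        subset = [[row[j] for j in ranking[:m]] for row in rows]
        accs = [
            centroid_accuracy(subset, labels, train, test)
            for train, test in five_fold_splits(len(rows))
        ]
        mean_acc = sum(accs) / len(accs)
        if mean_acc > best_acc:  # ties keep the smaller subset
            best_m, best_acc = m, mean_acc
    return best_m, best_acc


# Hypothetical toy data: feature 0 equals the label, feature 1 is noise.
labels = [i % 2 for i in range(10)]
rows = [[labels[i], (i * 7) % 3] for i in range(10)]
best_m, best_acc = best_subset_size(rows, labels, ranking=[0, 1])
```

Keeping the smaller subset on ties is a design choice made here for the sketch; the text only says that the best-performing subset size M is saved.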
Further, step S5 specifically comprises: combining the prediction results of the feature subsets of the various algorithms and taking the features that occur with high frequency as the final result, so that the resulting features are more robust and significant.
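Step S5 amounts to counting how many algorithms selected each feature. A minimal sketch follows; the threshold `min_count` is an assumption, since the text does not define how often a feature must occur to count as high-frequency, and the feature names are hypothetical.

```python
from collections import Counter


def frequent_features(subsets, min_count):
    """Return the features selected by at least `min_count` of the algorithms."""
    counts = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, c in counts.items() if c >= min_count)


# Hypothetical selected subsets from six feature selection algorithms.
subsets = [
    ["knee_pain", "age", "bmi"],
    ["knee_pain", "age"],
    ["knee_pain", "stiffness"],
    ["knee_pain", "age", "stiffness"],
    ["age", "bmi"],
    ["knee_pain", "age"],
]
final = frequent_features(subsets, min_count=4)
```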
The invention has the technical effects and advantages that:
the invention can screen out the risk factors of the degenerative knee osteoarthritis, so that a more accurate and reliable disease identification model is constructed, and scientific reference is provided for the diagnosis of the knee osteoarthritis by traditional Chinese medicine.
Compared with the prior art, the invention can integrate the traditional Chinese medicine and western medicine characteristic information of the clinical patient, thereby obtaining more accurate and reliable characteristic analysis results.
In a word, the invention can provide accurate and reliable risk factors of the degenerative knee osteoarthritis and provide scientific and reliable basis for traditional Chinese medicine diagnosis.
Drawings
FIG. 1 is a conventional self-encoder network architecture;
FIG. 2 is a schematic diagram of a prior art feature selection using a machine learning method;
FIG. 3 is a flow chart of the present invention;
FIG. 4 is a traditional Chinese medical auxiliary diagnostic tool for osteoarthropathy;
fig. 5 is a two-dimensional code diagram of an applet.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-5 of the specification, the invention provides a method for searching for the significant characteristics of degenerative knee osteoarthritis by using machine learning, which comprises the following steps:
s1, collecting 5025 cases of information such as blood, CT and the like of Western medicine of clinical patients, information such as looking, smelling, asking, cutting and the like of traditional Chinese medicine, preprocessing the collected information, and constructing a knee osteoarthritis characteristic data set, wherein 254 characteristic numbers are obtained in total.
S11, extracting information of a clinical patient to be diagnosed from the early-stage identification table of the knee osteoarthritis, recording the information into case data, and writing a related data dictionary.
S12, constructing a knee osteoarthritis data set, and marking the knee osteoarthritis disease state of a patient in clinical treatment, wherein 0 is not diseased and 1 is diseased.
S13, removing rows and columns with a large proportion of the number of the vacancies (30% of the vacancies in the example).
S14, carrying out normalization processing on continuous data by using the maximum normalization, reducing the difference between the data, accelerating the training speed, deleting useless features, and only retaining the features with analysis significance.
S15, splitting discrete features with multiple states so as to obtain the influence degree of the same features on osteoarthropathy. After treatment, 3338 cases were obtained, and a total of 178 cases were characterized by the knee osteoarthritis characterization data set.
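The removal of sparse rows and columns in S13 can be sketched as below. Several details are assumptions: missing values are represented as `None`, the 30% figure is treated as an upper limit on the share of missing values, and columns are filtered before rows. The function name `drop_sparse` and the toy table are hypothetical.

```python
def drop_sparse(rows, max_missing=0.30):
    """Drop columns, then rows, whose share of missing values exceeds max_missing.

    Returns the kept rows and the subscripts of the kept columns.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    keep_cols = [
        j for j in range(n_cols)
        if sum(r[j] is None for r in rows) / n_rows <= max_missing
    ]
    if not keep_cols:
        return [], []
    reduced = [[r[j] for j in keep_cols] for r in rows]
    kept = [
        r for r in reduced
        if sum(v is None for v in r) / len(r) <= max_missing
    ]
    return kept, keep_cols


# Hypothetical toy table: 4 patients, 4 fields, None marks a missing value.
table = [
    [1.0, None, 0, 1],
    [2.0, None, 1, 0],
    [None, None, None, 1],
    [3.0, 5.0, 1, 0],
]
kept_rows, kept_cols = drop_sparse(table)
```

Filtering columns first means a row is judged only on the fields that survive, so a patient is not discarded because of fields that were going to be dropped anyway.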
S2, training the encoder to learn the risk characteristics of the knee osteoarthritis by utilizing the characteristic dimension reduction characteristics of the self-encoder, so as to achieve the purpose of characteristic selection.
S21, constructing and compiling a traditional three-layer self-encoding model and setting the number of neurons in the intermediate hidden layer. A traditional self-encoder consists mainly of an encoding stage and a decoding stage and is symmetrical in structure. Its purpose is to reconstruct the input data at the output layer; in the ideal case the output signal y is identical to the input signal x. According to the structure shown in FIG. 1, the encoding and decoding processes of the traditional self-encoder can be described as:
The encoding process is: $h_1 = \sigma_e(W_1 x + b_1)$ (1)
The decoding process is: $y = \sigma_d(W_2 h_1 + b_2)$ (2)
where $W_1, b_1$ are the encoding weights and offsets, $W_2, b_2$ are the decoding weights and offsets, and $\sigma_e$ is the activation function of the nonlinear transformation, for which sigmoid, tanh, ReLU and the like are commonly used at present; $\sigma_d$ may be the same activation function as in the encoding process. In this example, both the encoding and decoding activation functions are sigmoid, and the model is compiled with the RMSProp optimizer.
S22, setting up five-fold cross-validation on the data set, and training the unsupervised self-encoding model with the training set serving as both the input signal x and the target output signal y.
S23, compressing the features in the training set with the trained encoder, and evaluating the effect of feature selection with an SVM classifier. The encoding stage can be regarded as a deterministic mapping that converts the input signal into a hidden-layer representation, while the decoding stage maps the hidden-layer representation back to the input signal as closely as possible. The loss function chosen in this example is cross entropy, expressed as:
[Equation: cross-entropy loss between y and x; formula image not reproduced]
S24, repeating steps S21-S23 to obtain the SVM classification performance for different hidden-layer sizes, and saving the evaluation indexes of the best-performing configuration.
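The forward pass of equations (1) and (2) with sigmoid activations, together with a cross-entropy reconstruction loss, can be sketched in plain Python as below. Training with RMSProp is omitted. The 3-2-3 layer sizes, the all-zero parameters and the exact form of the loss are assumptions made for illustration; the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def encode(x, w1, b1):
    """Equation (1): h1 = sigmoid(W1 x + b1)."""
    return [sigmoid(z) for z in affine(w1, b1, x)]


def decode(h1, w2, b2):
    """Equation (2): y = sigmoid(W2 h1 + b2)."""
    return [sigmoid(z) for z in affine(w2, b2, h1)]


def cross_entropy(x, y):
    """Binary cross entropy between input x and reconstruction y (assumed standard form)."""
    return -sum(
        xi * math.log(yi) + (1 - xi) * math.log(1 - yi)
        for xi, yi in zip(x, y)
    )


# Hypothetical 3-2-3 network with all-zero parameters, so every sigmoid outputs 0.5.
w1, b1 = [[0.0] * 3 for _ in range(2)], [0.0, 0.0]
w2, b2 = [[0.0] * 2 for _ in range(3)], [0.0, 0.0, 0.0]
x = [1.0, 0.0, 1.0]
h1 = encode(x, w1, b1)   # compressed representation passed to the classifier
y = decode(h1, w2, b2)   # reconstruction compared against x during training
loss = cross_entropy(x, y)
```

After training, only `encode` is needed for S23: the hidden vector `h1` replaces the original features as input to the classifier.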
S3, ranking the features of the knee osteoarthritis data set with each of 6 existing feature selection algorithms, as listed in Table 1;
[Table 1: the 6 existing feature selection algorithms; table image not reproduced]
S31, using each existing feature selection algorithm to sort the features in the knee osteoarthritis feature data set in descending order of importance, and storing their subscripts in an array as indexes.
S4, evaluating the feature selection effect by combining with the SVM classifier, and respectively reserving feature subsets with the best classification results in the 6 algorithms.
S41, iterating over the top 1 to 178 ranked features (178 being the total number of features in the knee osteoarthritis data set), and extracting the corresponding features from the original data set by subscript to form a new data set for training.
S42, validating the algorithm by five-fold cross-validation: the preprocessed, normalized data are split into training data and test data at a ratio of 4:1.
S43, predicting whether a patient is diseased with an SVM classifier, and saving, across the 178 classification runs, the size X of the best-performing feature subset and each of its evaluation indexes, where X can be regarded as the optimal dimension under that feature selection algorithm. The first X of the 178 ranked features are taken as the final selection result of the feature selection algorithm. Classification performance is evaluated using the 6 indexes.
S44, repeating steps S41-S43 to obtain the feature subset results of the different feature selection algorithms, and comparing these results with those of the self-encoder, as shown in Table 2. As can be seen from Table 2, the self-encoder obtains the best result on every evaluation index, so its feature selection effect is the more satisfactory one, while the 6 traditional feature selection algorithms, although their classification performance is poorer, select risk factors that have actual physical meaning:
Table 2: Performance index comparison of the self-encoder and the 6 feature selection algorithms under the SVM classifier
[Table 2 image not reproduced]
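The claims name accuracy, precision, recall, the balanced F-score and AUC as evaluation indexes. A minimal sketch of how these can be computed for a binary classifier follows; the function names and the toy labels and scores are hypothetical, and AUC is computed directly from its pairwise definition rather than from an ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-score for binary labels (1 = diseased)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half. Requires at least one positive and one negative.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six patients.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```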
S5, combining prediction results of feature subsets of various algorithms, so that the finally obtained features have better generalization and significance.
The prediction results of the feature subsets of the various algorithms are combined, and the features that occur with high frequency are taken as the final result. Normal human findings and factors unrelated to the disease, such as K&L grade 0, no history of myocardial infarction and no family history of genetic disease, are removed from the feature subsets, so that the resulting risk factors are more robust and significant. The risk factors finally extracted for degenerative knee osteoarthritis are shown in Table 3. As can be seen from Tables 3 and 4, after a round of feature selection and extraction, 11 risk factors with traditional Chinese medicine meaning are finally obtained, of which 5 features are consistent with the traditional Chinese medicine differentiation types, which makes the result more interpretable. In theory, therefore, the risk factors obtained by the machine learning method can serve as a scientific reference for diagnosing knee osteoarthritis in traditional Chinese medicine and support the construction of a more accurate and reliable disease identification model, and they indirectly support the scientific basis and practicability of traditional Chinese medicine diagnosis.
Table 3: Risk factors for degenerative knee osteoarthritis
[Table 3 image not reproduced]
Table 4: Traditional Chinese medicine differentiation types of knee osteoarthritis
[Table 4 image not reproduced]
Finally: the foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (4)

1. A method for searching for the significant characteristics of degenerative knee osteoarthritis by using machine learning, which is characterized by comprising the following steps: the method comprises the following specific steps:
S1, acquiring traditional Chinese medicine and western medicine information of a clinical patient to be diagnosed, preprocessing the collected information, and constructing a knee osteoarthritis characteristic data set;
S2, training the encoder to learn the risk characteristics of the knee osteoarthritis by utilizing the characteristic dimension reduction characteristics of the self-encoder;
the step S2 specifically comprises the following steps:
S21, constructing and compiling a traditional three-layer self-coding model, and setting the number of neurons of an intermediate hidden layer; the conventional self-encoder comprises an encoding stage and a decoding stage, and the structure is symmetrical, and the encoding and decoding processes of the conventional self-encoder are described as follows:
the encoding process is: $h_1 = \sigma_e(W_1 x + b_1)$ (1);
the decoding process is: $y = \sigma_d(W_2 h_1 + b_2)$ (2);
wherein $W_1, b_1$ are the encoding weights and offsets, $W_2, b_2$ are the decoding weights and offsets, $\sigma_e$ is an activation function for nonlinear transformation, and $\sigma_d$ is the same activation function as in the encoding process:
[Equation (3): mean-square-error loss between y and x; formula image not reproduced]
S22, setting five-fold cross validation for a data set, and training an unsupervised self-coding model by taking a training set as both the output signal y and the input signal x;
S23, compressing the characteristics in the training set by using a trained encoder, and evaluating the effect of characteristic selection by using an SVM classifier; the encoding stage can be seen as a deterministic mapping that converts the input signal into a hidden layer representation, while the decoding stage remaps the hidden layer representation into the input signal; in addition to the mean square error given by equation (3), the loss function can also select the cross entropy, expressed in particular as:
[Equation: cross-entropy loss between y and x; formula image not reproduced]
S24, repeating the steps S21-S23 to respectively obtain SVM classification performances with different hidden layers, and storing performance evaluation indexes;
S3, performing feature ordering on the knee osteoarthritis feature data set by using 6 existing feature selection algorithms, and reserving physical meanings of risk factors;
S4, training a model by using an SVM classifier, predicting classification performance from less to more according to the number of the ordered feature subsets, respectively reserving feature subsets with good classification performance in 6 algorithms, and comparing the effect of a self-encoder and a traditional feature selection method on the selection of risk features;
the step S4 specifically comprises the following steps:
S41, selecting the first 1-N features subjected to feature sequencing in the knee osteoarthritis dataset, and taking out corresponding features in the original dataset according to subscripts to serve as a new dataset for training;
S42, verifying an algorithm by adopting a five-fold cross verification method: the processed normalized data are divided into training data and test data at a ratio of 4:1;
S43, training a model by using an SVM classifier, predicting the disease state of a patient, training each feature selection algorithm for N times, storing the number M of primary feature subsets with the best performance and the precision index thereof in N times of classification tests, taking out the first M features from the N features as the final selection result of the feature selection algorithm, and evaluating the classification performance by adopting the following five indexes:
A. accuracy: the ratio of the number of correctly classified samples to the total number of samples, i.e. the probability of a correct prediction;
B. precision: the proportion of samples predicted to be positive that are truly positive, defined with respect to the predicted results;
C. recall: the proportion of positive samples that are correctly predicted, defined with respect to the original samples;
D. balanced F score: the harmonic mean of Precision and Recall, which takes both into account;
E. AUC: defined as the area under the ROC curve; AUC is an evaluation index for measuring the quality of a binary classification model and represents the probability that a positive case is ranked ahead of a negative case;
S44, repeating the steps S41-S43 to respectively obtain feature subset results based on different feature selection algorithms, and comparing the effect of the self-encoder and the traditional feature selection method on the selection of risk features;
S5, taking out the features which occur at high frequency in the results of the 6 algorithms.
2. A method for finding a salient feature of degenerative knee osteoarthritis using machine learning as claimed in claim 1, wherein: the step S1 specifically comprises the following steps:
S11, extracting the information of clinical patients to be diagnosed from an early-stage identification table of knee osteoarthritis, recording the information as case data, and writing a related data dictionary; when the same field contains different symptoms, each symptom is split out as a separate feature, wherein 0 is no and 1 is yes;
S12, constructing a knee osteoarthritis dataset, and marking the knee osteoarthritis disease state of each patient in clinical treatment, wherein 0 is not diseased and 1 is diseased;
S13, removing rows and columns with a large proportion of blank values;
S14, normalizing the continuous data using the maximum normalization;
S15, splitting discrete features with multiple states to obtain the degree of influence of the same feature on osteoarthropathy.
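The preprocessing steps S13 to S15 can be illustrated with a minimal sketch in plain Python. Several points are assumptions that the claim does not state: blank cells are represented as `None`, the threshold `max_missing` for a "large" blank proportion is a hypothetical parameter, and the normalization is assumed to be min-max scaling to the range [0, 1]. All function names are illustrative.

```python
def drop_sparse(rows, max_missing=0.5):
    """Drop columns, then rows, whose share of blank (None) cells is too high.

    Returns the reduced table and the subscripts of the columns that were kept.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    keep_cols = [j for j in range(n_cols)
                 if sum(r[j] is None for r in rows) / n_rows <= max_missing]
    if not keep_cols:
        return [], []
    reduced = [[r[j] for j in keep_cols] for r in rows]
    kept = [r for r in reduced
            if sum(v is None for v in r) / len(r) <= max_missing]
    return kept, keep_cols


def min_max_normalize(values):
    """Scale a continuous column to [0, 1] (assumed form of the normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_states(values):
    """Split a multi-state discrete feature into one 0/1 column per state."""
    states = sorted(set(values))
    return {s: [1 if v == s else 0 for v in values] for s in states}
```

Splitting a multi-state feature into indicator columns lets a later feature ranking score each state separately, which is one way to read the purpose given in S15.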
3. A method for finding a salient feature of degenerative knee osteoarthritis using machine learning as claimed in claim 1, wherein: the step S3 specifically comprises the following steps: using each of the existing feature selection algorithms in turn to rank the importance of the features in the knee osteoarthritis feature dataset, and storing the subscripts of the features in the original dataset in descending order of importance.
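Storing feature subscripts in descending order of importance, as described for step S3, can be sketched in a few lines of Python. The importance scores themselves would come from each feature selection algorithm; here they are simply passed in as a list, and the function name is hypothetical.

```python
def rank_features(importances):
    """Return feature subscripts sorted by descending importance.

    `importances[j]` is the score that one feature selection algorithm
    assigned to column j of the original dataset. Features with equal
    scores keep their original relative order.
    """
    return sorted(range(len(importances)),
                  key=lambda j: importances[j],
                  reverse=True)


# Usage: one ranking per algorithm, keyed by a hypothetical algorithm name.
scores_by_algorithm = {
    "algorithm_a": [0.1, 0.7, 0.3],
    "algorithm_b": [0.5, 0.2, 0.9],
}
rankings = {name: rank_features(s) for name, s in scores_by_algorithm.items()}
```

Keeping subscripts rather than column values means the same ranking can later be used to slice the first 1 to N features out of the original dataset.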
4. A method for finding a salient feature of degenerative knee osteoarthritis using machine learning as claimed in claim 1, wherein: the step S5 specifically comprises the following steps: combining the prediction results of the feature subsets of the various algorithms, and taking out the features that occur frequently as the final result.
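Combining the feature subsets of the different algorithms and keeping the frequently occurring features can be sketched as a simple vote count in Python. The claim does not say how often a feature must occur to count as frequent, so the `min_votes` threshold below is a hypothetical parameter, as is the function name.

```python
from collections import Counter


def frequent_features(subsets, min_votes):
    """Return the features selected by at least `min_votes` algorithms.

    `subsets` holds one list of feature subscripts per feature selection
    algorithm. A feature is counted at most once per algorithm.
    """
    counts = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, c in counts.items() if c >= min_votes)
```

For example, with six subsets in which feature 2 appears five times and feature 5 appears four times, `frequent_features(subsets, 4)` keeps exactly those two features.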
CN202210445525.0A 2022-04-26 2022-04-26 Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning Active CN114999628B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210445525.0A CN114999628B (en) 2022-04-26 2022-04-26 Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210445525.0A CN114999628B (en) 2022-04-26 2022-04-26 Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning

Publications (2)

Publication Number Publication Date
CN114999628A CN114999628A (en) 2022-09-02
CN114999628B CN114999628B (en) 2023-06-02

Family

ID=83024713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210445525.0A Active CN114999628B (en) 2022-04-26 2022-04-26 Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning

Country Status (1)

Country Link
CN (1) CN114999628B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115458162A (en) * 2022-11-10 2022-12-09 四川京炜数字科技有限公司 Bone-related disease treatment plan prediction system and method based on machine learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492873A (en) * 2018-03-13 2018-09-04 山东大学 A kind of knowledge migration learning method for auxiliary diagnosis Alzheimer's disease
US11145416B1 (en) * 2020-04-09 2021-10-12 Tempus Labs, Inc. Predicting likelihood and site of metastasis from patient records

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10713590B2 (en) * 2015-04-30 2020-07-14 Biodesix, Inc. Bagged filtering method for selection and deselection of features for classification
CN107273925B (en) * 2017-06-12 2020-10-09 太原理工大学 Lung parenchyma CT image processing device based on local receptive field and semi-supervised depth self-coding
RU2678716C1 (en) * 2017-12-11 2019-01-31 Общество с ограниченной ответственностью "Аби Продакшн" Use of autoencoders for learning text classifiers in natural language
CN108899086A (en) * 2018-06-11 2018-11-27 浙江大学 A kind of system that osteoarthritis hypotype is diagnosed by blood sample based on machine learning
CN114023444A (en) * 2021-11-22 2022-02-08 广东工业大学 Method, system, computer equipment and medium for predicting osteoarthritis condition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
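Pathological image classification based on wavelet-decomposition convolutional neural network; Ding Xie; Cui Haoyang; Zhang Jingyi; Computer Systems & Applications; 30(009); full text *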

Also Published As

Publication number Publication date
CN114999628A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN109785976B (en) Gout disease stage prediction system based on Soft-Voting
CN109036553B (en) Disease prediction method based on automatic extraction of medical expert knowledge
CN110349676B (en) Time-series physiological data classification method and device, storage medium and processor
CN108459955B (en) Software defect prediction method based on deep self-coding network
CN112614538A (en) Antibacterial peptide prediction method and device based on protein pre-training characterization learning
CN112712118A (en) Medical text data oriented filtering method and system
CN110633725A (en) Method and device for training classification model and classification method and device
CN110940523A (en) Unsupervised domain adaptive fault diagnosis method
CN114970605A (en) Multi-mode feature fusion neural network refrigeration equipment fault diagnosis method
CN111968741A (en) Diabetes complication high-risk early warning system based on deep learning and integrated learning
CN113052271B (en) Biological fermentation data prediction method based on deep neural network
CN114530249A (en) Disease risk assessment model construction method based on intestinal microorganisms and application
CN111899869A (en) Depression patient identification system and identification method thereof
CN114093515A (en) Age prediction method based on intestinal flora prediction model ensemble learning
CN108154924A (en) Alzheimer's disease tagsort method and system based on support vector machines
CN114999628B (en) Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning
CN113674862A (en) Acute renal function injury onset prediction method based on machine learning
CN116259415A (en) Patient medicine taking compliance prediction method based on machine learning
CN113643756A (en) Protein interaction site prediction method based on deep learning
Hidayat et al. Comparison of K-Nearest Neighbor and Decision Tree Methods using Principal Component Analysis Technique in Heart Disease Classification
CN117195027A (en) Cluster weighted clustering integration method based on member selection
Dadgar et al. A hybrid method of feature selection and neural network with genetic algorithm to predict diabetes
CN110265151B (en) Learning method based on heterogeneous temporal data in EHR
CN112465054A (en) Multivariate time series data classification method based on FCN
CN111400685A (en) Security identity authentication method adopting competition matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant