CN111326260A - Medical analysis method, device, equipment and storage medium - Google Patents

Info

Publication number
CN111326260A
Authority
CN
China
Prior art keywords
biomarker
potential
biomarkers
medical analysis
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010022726.0A
Other languages
Chinese (zh)
Inventor
王君兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Applied Protein Technology Co Ltd
Original Assignee
Shanghai Applied Protein Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Applied Protein Technology Co Ltd filed Critical Shanghai Applied Protein Technology Co Ltd
Priority to CN202010022726.0A priority Critical patent/CN111326260A/en
Publication of CN111326260A publication Critical patent/CN111326260A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • G16H70/40ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Abstract

The invention belongs to the technical field of computers, and in particular relates to a medical analysis method, device, equipment and storage medium. The medical analysis method comprises: determining and outputting a medical analysis result according to a biomarker classification model generated by training based on an ensemble learning algorithm and the acquired detection data of the biomarkers in the biomarker classification model; wherein the comprehensive evaluation score of the biomarker classification model is the highest, and the comprehensive evaluation comprises at least two of accuracy evaluation, specificity evaluation and sensitivity evaluation. In the medical analysis method provided by the embodiment of the invention, the biomarker classification model is the biomarker integrated model with the highest comprehensive evaluation score, generated in advance by ensemble training under a plurality of feature selection algorithms and classification algorithms, so that the analysis result obtained by performing medical analysis with this model has higher accuracy.

Description

Medical analysis method, device, equipment and storage medium
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a medical analysis method, a medical analysis device, medical analysis equipment and a storage medium.
Background
A biomarker can be defined as "a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention". Screening biomarkers with a good classification effect from a large amount of experimental data and training a corresponding medical analysis model can greatly simplify subsequent medical analysis. Screening biomarkers with a good classification effect is generally called feature selection.
However, existing feature selection methods generally use a single machine learning algorithm for feature screening, and the biomarkers selected in this way are not necessarily highly reproducible. Reproducibility requires that the selected biomarkers consistently perform well, so that they distinguish cases from controls across different studies. In other words, an analysis model trained on biomarkers selected by a single machine learning algorithm has poor stability and low practical applicability.
Therefore, analysis models trained on biomarkers screened by existing feature selection methods suffer from the technical problems of poor stability and low practical applicability.
Disclosure of Invention
An embodiment of the invention aims to provide a medical analysis method that solves the technical problems of poor stability and low practical applicability of analysis models trained on biomarkers screened by existing feature selection methods.
The embodiment of the invention is realized as a medical analysis method comprising the following steps:
determining and outputting a medical analysis result according to a biomarker classification model generated by training based on an ensemble learning algorithm and the acquired detection data of the biomarkers in the biomarker classification model; wherein
the comprehensive evaluation score of the biomarker classification model generated by training based on the ensemble learning algorithm is the highest; and the comprehensive evaluation comprises at least two of accuracy evaluation, specificity evaluation and sensitivity evaluation.
Another object of an embodiment of the present invention is to provide an integrated model-based medical analysis apparatus, including:
an analysis module, used for determining and outputting a medical analysis result according to a biomarker classification model generated by training based on an ensemble learning algorithm and the acquired detection data of the biomarkers in the biomarker classification model; wherein
the comprehensive evaluation score of the biomarker classification model generated by training based on the ensemble learning algorithm is the highest; and the comprehensive evaluation comprises at least two of accuracy evaluation, specificity evaluation and sensitivity evaluation.
A further object of an embodiment of the invention is to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the medical analysis method described above.
It is a further object of an embodiment of the present invention to provide a computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which, when executed by a processor, causes the processor to perform the steps of the medical analysis method as described above.
The medical analysis method provided by the embodiment of the invention determines and outputs a medical analysis result according to the biomarker classification model generated by training based on the ensemble learning algorithm and the acquired detection data of the biomarkers in the biomarker classification model. Because this biomarker classification model is the one with the highest comprehensive evaluation score, generated in advance by ensemble training under a plurality of classification algorithms, the accuracy and stability of the medical analysis result determined directly from the classification model can be kept at a high level. Compared with existing biomarker classification models generated by training with other machine learning methods, the efficiency and effect of model analysis are markedly improved; that is, a medical analysis method with high stability and good practical application effect is provided.
Drawings
FIG. 1 is a flow chart illustrating steps of a medical analysis method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating the steps of training to generate the biomarker classification model according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another procedure of training to generate the biomarker classification model according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating still another procedure of training to generate the biomarker classification model according to an embodiment of the present invention;
FIG. 5 is a flowchart of a step of obtaining a set of potential biomarkers in advance according to an embodiment of the present invention;
FIG. 6 is a flowchart of one step of calculating the importance scores of potential biomarkers according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating the steps of calculating a comprehensive evaluation score according to an embodiment of the invention;
FIG. 8 is a schematic structural diagram of a medical analysis apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an internal structure of a computer device for executing a medical analysis method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a flowchart illustrating the steps of an integrated model-based medical analysis method according to an embodiment of the present invention, which specifically includes the following steps:
Step S102, determining and outputting a medical analysis result according to the biomarker classification model generated by training based on the ensemble learning algorithm and the acquired detection data of the biomarkers in the biomarker classification model.
In the embodiment of the invention, the comprehensive evaluation score of the biomarker classification model generated based on the ensemble learning algorithm training is the highest.
In the embodiment of the present invention, the detection data of a biomarker mainly refers to the measured content of substances commonly used as biomarkers in medicine, such as genes, proteins and metabolites. The present invention does not limit the specific type of biomarker; any substance that can be used as a biomarker falls within the scope of the present invention.
In the embodiment of the invention, an ensemble learning algorithm is used to screen, from a plurality of potential biomarkers, a plurality of candidate biomarkers that contribute significantly to the classification effect under different algorithms. The screening uses the AUC (area under the ROC curve, an index for evaluating a model). The screened candidate biomarkers are then trained with a preset classification algorithm to generate the biomarker classification model with the highest comprehensive evaluation score, based on indices such as AUC, accuracy, specificity and sensitivity. For the training of the biomarker classification model, refer to the description of FIG. 2 to FIG. 7.
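As an illustration of the AUC index mentioned above, the following is a minimal sketch and not the implementation of the invention. It computes the area under the ROC curve through the equivalent pairwise (Mann-Whitney) formulation; the function name `auc_score` and the label convention (1 for case, 0 for control) are assumptions made for this example.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a case, 0 for a control (assumed convention).
    scores: classifier outputs, where a higher value means "more case-like".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every case is scored above every control, while 0.5 corresponds to a random ranking.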
The medical analysis method provided by the embodiment of the invention determines and outputs a medical analysis result according to the biomarker classification model generated by training based on the ensemble learning algorithm and the acquired detection data of the biomarkers in the biomarker classification model. Because this biomarker classification model is the one with the highest comprehensive evaluation score, generated in advance by ensemble training under a plurality of classification algorithms, the accuracy and stability of the medical analysis result determined directly from the classification model can be kept at a high level. Compared with existing biomarker classification models generated by training with other machine learning methods, the efficiency and effect of model analysis are markedly improved; that is, a medical analysis method with high stability and good practical application effect is provided.
Fig. 2 is a flowchart of the steps for training to generate the biomarker integrated model provided in the embodiment of the present invention, which specifically includes the following steps:
Step S202, determining a plurality of potential biomarker subsets respectively corresponding to a plurality of classification algorithms according to sample data, the plurality of classification algorithms and a pre-obtained potential biomarker set; the potential biomarker set includes a plurality of potential biomarkers.
In the embodiment of the present invention, the plurality of classification algorithms may include, for example, the random forest algorithm, support vector machines, logistic regression and other common classification algorithms. The present invention does not limit the specific classification algorithms; any step that uses a plurality of algorithms for feature screening to obtain potential biomarker subsets respectively corresponding to those algorithms falls within the claimed scope of the present invention.
Step S204, according to a preset scoring rule and the plurality of potential biomarker subsets, determining the importance degree score of each potential biomarker, and screening out a plurality of candidate biomarkers.
In the embodiment of the invention, each potential biomarker subset comprises a plurality of different potential biomarkers, and the potential biomarkers are evaluated by scoring. A potential biomarker that appears in a potential biomarker subset can be considered to contribute to classification under the corresponding algorithm, and clearly, the more subsets a potential biomarker appears in, the more important it is. One specific procedure for determining the importance score is illustrated in FIG. 6 and its description.
In the embodiment of the present invention, either a certain number of potential biomarkers with the highest importance scores are determined as candidate biomarkers, or the potential biomarkers whose importance scores exceed a certain threshold are determined as candidate biomarkers.
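The two selection rules described above (a fixed number of top-scoring biomarkers, or a score threshold) can be sketched as follows. This is a minimal illustration; the function name `select_candidates`, its parameters and the biomarker names are hypothetical and do not come from the patent.

```python
def select_candidates(importance, top_k=None, threshold=None):
    """Pick candidate biomarkers from a {name: importance score} mapping.

    top_k:     keep the top_k highest-scoring biomarkers (optional).
    threshold: keep biomarkers whose score is strictly above it (optional).
    """
    # Sort by descending score; break ties by name for reproducibility.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    if top_k is not None:
        ranked = ranked[:top_k]
    if threshold is not None:
        ranked = [(name, score) for name, score in ranked if score > threshold]
    return [name for name, _ in ranked]


scores = {"protein_A": 0.9, "protein_B": 0.4, "metabolite_C": 0.7}
print(select_candidates(scores, top_k=2))
print(select_candidates(scores, threshold=0.8))
```

Either rule alone matches the text; passing both simply applies them in sequence.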
Step S206, calculating, according to the sample data, the comprehensive evaluation scores of the candidate biomarker screening models under each of the plurality of classification algorithms.
In the embodiment of the present invention, the comprehensive evaluation includes several of AUC, accuracy, specificity and sensitivity. Usually the Youden index is used as the index for measuring the comprehensive evaluation, and the index may also be chosen according to the differing requirements for sensitivity and specificity. High sensitivity is typically required when: the disease is serious but responds well to treatment, so that missed diagnoses must be prevented; several diseases could explain the condition and the test is used to rule out one of them; or a disease is screened for in a general survey or regular health examination. High specificity is typically required when: a patient with a high probability of having a certain disease is tested in order to confirm the diagnosis; the disease is serious but its treatment outcome and prognosis are poor, so that misdiagnosis must be prevented; or the radical treatment of the disease is highly damaging, so that the diagnosis must be confirmed to avoid unnecessary harm to the patient.
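To make the evaluation indices concrete, the sketch below computes accuracy, sensitivity, specificity and the Youden index (sensitivity + specificity - 1) from true and predicted labels. It is only an illustration; the function name `evaluation_indices` and the 1/0 label convention are assumptions.

```python
def evaluation_indices(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and the Youden index.

    Labels are assumed to be 1 (case) or 0 (control).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
    }
```

The Youden index weighs sensitivity and specificity equally; when one of them matters more, as in the screening and confirmation scenarios above, a differently weighted index could be substituted.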
Step S208, training the candidate biomarker screening model with the highest comprehensive evaluation score according to a preset classification algorithm to generate the biomarker classification model.
Compared with the process of training biomarker models in the prior art, the training process for the biomarker integrated model provided by the embodiment of the invention first determines the biomarker models under different algorithms separately. It then screens out, from the biomarkers contained in those models, the biomarkers that contribute under multiple algorithms simultaneously and determines them as candidate biomarkers. Finally, it selects from the candidate biomarkers the candidate biomarker integrated model, formed by a plurality of candidate biomarkers, that has the highest comprehensive evaluation score. This model is the biomarker integrated model used in the method shown in fig. 1.
Fig. 3 is a flowchart of another procedure for training generation of a biomarker integration model according to an embodiment of the present invention, which is described in detail below.
In the embodiment of the present invention, the difference from the training procedure for the biomarker integrated model shown in fig. 2 is that, before step S202, the method further includes:
Step S302, filling in the missing sample data to generate complete sample data.
In the embodiment of the invention, a large amount of data must be acquired for training, so samples with missing data readily occur in practice. A missing value is usually filled with 1/2 of the minimum value of the corresponding data in the other samples, so that the data is complete. Other filling methods can of course be used, such as K-nearest-neighbor imputation, hot-deck imputation and multiple imputation.
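The half-minimum filling rule described above can be sketched as follows. This is a minimal illustration assuming the data is a list of samples, each a list of feature values, with `None` marking a missing value; the function name `fill_half_minimum` is hypothetical.

```python
def fill_half_minimum(rows):
    """Fill each missing value with half of the minimum observed value
    of the same feature (column) across the other samples.

    rows: list of samples; each sample is a list of feature values,
          with None marking a missing measurement (assumed encoding).
    """
    filled = [list(row) for row in rows]  # do not mutate the input
    n_features = len(rows[0])
    for j in range(n_features):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"feature {j} has no observed values")
        fill_value = min(observed) / 2.0
        for row in filled:
            if row[j] is None:
                row[j] = fill_value
    return filled


samples = [[4.0, None], [2.0, 10.0], [None, 6.0]]
print(fill_half_minimum(samples))  # [[4.0, 3.0], [2.0, 10.0], [1.0, 6.0]]
```

Half-minimum filling treats a missing value as a signal below the detection limit, which is one common reading for abundance data; the alternative imputation methods named above make different assumptions.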
In the embodiment of the invention, the actual sample data is filled, so that the experimental data is enriched, and the accuracy of the biomarker integrated model generated by training is further improved.
Step S304, carrying out logarithmic transformation on the complete sample data to generate normal distribution sample data.
In the embodiment of the invention, the sample data is subjected to log10 conversion, so that the data distribution is approximate to normal distribution, and the subsequent analysis is convenient.
In the embodiment of the present invention, other processing methods, such as the arcsine square-root transformation and Z-score normalization, may also be used to preprocess the sample data.
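The log10 transformation and the Z-score alternative can be sketched as below. This is an illustration only; the function names are hypothetical, the data layout (list of samples, each a list of positive feature values) is an assumption, and the sample standard deviation is used for the Z-score.

```python
import math
import statistics


def log10_transform(rows):
    """Apply log10 to every value; all values must be strictly positive."""
    return [[math.log10(v) for v in row] for row in rows]


def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit sample
    standard deviation; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]
```

Because log10 is undefined for zero and negative values, filling missing or zero measurements first, as in step S302, keeps the transformation well defined.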
The step S202 specifically includes:
Step S306, determining a plurality of potential biomarker subsets respectively corresponding to the plurality of classification algorithms according to the normal distribution sample data, the plurality of classification algorithms and the pre-obtained potential biomarker set.
In this other method for training to generate the biomarker integrated model, provided by the embodiment of the invention, preprocessing the sample data in advance effectively cleans the data, which further improves the effect of the model generated by training.
Fig. 4 is a flowchart illustrating a further procedure for training generation of a biomarker integration model according to an embodiment of the present invention, which is described in detail below.
In the embodiment of the present invention, the difference from the training procedure for the biomarker integrated model shown in fig. 2 is that, after step S206, the method further includes:
Step S206, calculating, according to test data, the comprehensive evaluation scores under the plurality of classification algorithms of the candidate biomarker integrated model with the highest comprehensive evaluation score.
In the embodiment of the present invention, the candidate biomarker integrated model generated by training generally needs to be tested in order to verify its effect. The calculation of the comprehensive evaluation scores of this model under the plurality of classification algorithms is similar to that in the training process. When the comprehensive evaluation scores obtained with the test data under the plurality of classification algorithms are close to the scores obtained during training, the trained biomarker integrated model is considered to have relatively high stability.
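The closeness check between training and test scores can be sketched as follows. The text does not say how close the scores must be, so the `tolerance` parameter and its default value are purely hypothetical, as are the function name and the algorithm labels.

```python
def is_stable(train_scores, test_scores, tolerance=0.05):
    """Return True when, for every classification algorithm, the test
    score is within `tolerance` of the training score.

    train_scores, test_scores: {algorithm name: comprehensive score}.
    tolerance: hypothetical closeness threshold, not given in the text.
    """
    if set(train_scores) != set(test_scores):
        raise ValueError("both score sets must cover the same algorithms")
    return all(
        abs(train_scores[name] - test_scores[name]) <= tolerance
        for name in train_scores
    )


train = {"random_forest": 0.88, "svm": 0.85}
test = {"random_forest": 0.86, "svm": 0.84}
print(is_stable(train, test))  # True
```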
According to the medical analysis method based on the integrated model, provided by the embodiment of the invention, the stability of the integrated model of the trained biomarker can be further ensured by further testing the integrated model of the trained biomarker by using the test data.
As shown in fig. 5, a flowchart of steps for obtaining a set of potential biomarkers in advance is provided, which specifically includes the following steps:
Step S502, performing differential analysis on the sample data according to a statistical test method, and determining the potential biomarker set.
In the embodiment of the invention, the statistical test method is a data analysis method that performs hypothesis testing on the sample data based on statistical regularities. A large part of the sample data is irrelevant, and the statistical test method can effectively screen out and eliminate this irrelevant data. The remaining biomarkers can be regarded as potential biomarkers possibly associated with the medical analysis result; the specific degree of association is determined in subsequent steps.
The embodiment of the invention provides a step of obtaining the potential biomarker set in advance. Performing differential analysis on the sample data with a statistical test method filters out features that show no significant difference between the comparison groups, which effectively reduces the training difficulty in the subsequent training process, improves training efficiency, and reduces the influence of irrelevant statistical variables on the effective statistical variables (the potential biomarkers).
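The text does not name a specific statistical test. As one possible illustration, the sketch below uses a two-sided permutation test on the difference of group means and keeps the features whose p-value is below a significance level. The choice of test, the function names, the `alpha` default and the seeded random generator are all assumptions for this example.

```python
import random


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = abs(sum(cases) / len(cases) - sum(controls) / len(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        a, b = pooled[:n_cases], pooled[n_cases:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            extreme += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def potential_biomarkers(case_data, control_data, alpha=0.05):
    """Keep the features that differ significantly between the groups.

    case_data, control_data: {feature name: list of measured values}.
    """
    return [
        name
        for name in case_data
        if permutation_p_value(case_data[name], control_data[name]) < alpha
    ]
```

With many features, a multiple-testing correction would normally be applied to the p-values; it is omitted here to keep the sketch short.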
As shown in fig. 6, a flowchart of the steps for calculating the importance scores of the potential biomarkers provided by the embodiment of the present invention specifically includes the following steps:
Step S602, determining the selection frequency of each potential biomarker in the plurality of potential biomarker subsets.
In the embodiment of the invention, the selection frequency of each potential biomarker is obtained by counting the number of potential biomarker subsets in which it was selected and dividing that count by the total number of potential biomarker subsets. The higher the selection frequency of a potential biomarker, the better its classification effect across the various classification algorithms.
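The selection frequency described above can be sketched as follows: count the subsets in which each biomarker appears and divide by the number of subsets. The function name and the algorithm labels in the example are hypothetical.

```python
from collections import Counter


def selection_frequency(subsets):
    """Compute how often each potential biomarker was selected.

    subsets: {algorithm name: iterable of selected biomarker names}.
    Returns {biomarker name: times selected / number of subsets}.
    """
    counts = Counter()
    for selected in subsets.values():
        counts.update(set(selected))  # count each biomarker once per subset
    n_subsets = len(subsets)
    return {name: count / n_subsets for name, count in counts.items()}


subsets = {
    "random_forest": {"A", "B"},
    "svm": {"A", "C"},
    "logistic_regression": {"A", "B"},
}
print(selection_frequency(subsets))
```

In this example biomarker A is selected by all three algorithms (frequency 1.0), B by two of three and C by one of three.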
Step S604, determining a contribution score of each potential biomarker in the plurality of subsets of potential biomarkers.
In the embodiment of the present invention, within the different potential biomarker subsets determined by different feature selection algorithms, each algorithm assigns its own weighting coefficients to the potential biomarkers. Because the weighting coefficients assigned by different algorithms are on different scales, the weighting coefficients of the potential biomarkers within each potential biomarker subset are ranked, and the contribution score of each potential biomarker is re-determined from the ranking result.
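One way to turn algorithm-specific weighting coefficients into comparable contribution scores is to rank them within each subset and map the ranks to the interval (0, 1]. The mapping used below (top rank gets 1.0, lowest gets 1/n) and the use of absolute magnitude for ranking are assumptions for this sketch; the text only states that the score is derived from the ranking.

```python
def rank_contributions(weights):
    """Convert one algorithm's weighting coefficients into rank-based
    contribution scores that are comparable across algorithms.

    weights: {biomarker name: coefficient assigned by the algorithm}.
    Returns {biomarker name: score}, where the largest-magnitude
    coefficient gets 1.0 and the smallest gets 1/n (assumed mapping).
    """
    # Rank by absolute magnitude; break ties by name for reproducibility.
    ordered = sorted(weights, key=lambda name: (-abs(weights[name]), name))
    n = len(ordered)
    return {name: (n - i) / n for i, name in enumerate(ordered)}


print(rank_contributions({"A": 0.02, "B": 0.5, "C": -0.9}))
```

Ranking discards the raw scale, so a random forest importance of 0.02 and a regression coefficient of 0.9 become directly comparable once each is expressed relative to its own subset.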
Step S606, calculating importance scores of the potential biomarkers according to the selected frequencies of the potential biomarkers, the contribution scores of the potential biomarkers in the plurality of potential biomarker subsets, and the preset weights of the potential biomarker subsets.
In the embodiment of the present invention, because the features screened by different algorithms differ in practical application effect, weights may further be preset for the potential biomarker subsets corresponding to the different algorithms. For example, the weight of correlation-based feature selection may be preset to 3, the weight of feature selection using a base model with a penalty term to 2, and the weight of support vector machine recursive feature elimination to 1. The importance score of each potential biomarker is then determined by combining the weight of the potential biomarker subset corresponding to each algorithm, the selection frequency of each potential biomarker, and the contribution score of each potential biomarker in the potential biomarker subsets.
In the embodiment of the present invention, it is understood that the selection frequency of each potential biomarker, the weight of each algorithm (that is, of each potential biomarker subset) and the contribution score of each potential biomarker in the potential biomarker subsets are three different indicators used to calculate the importance score. These three indicators may be determined simultaneously or sequentially in any order; the present invention does not limit the order in which they are determined, that is, the order of steps S602 and S604 is not limited.
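The text names the three indicators but does not give the formula that combines them. The sketch below uses one plausible combination, stated here as an assumption: the selection frequency multiplied by the weight-normalized sum of contribution scores. The function name and the algorithm labels are hypothetical; the preset weights 3, 2 and 1 are taken from the example in the text.

```python
def importance_scores(contributions, subset_weights):
    """Combine selection frequency, contribution scores and subset weights.

    contributions:  {algorithm: {biomarker: contribution score}}, listing
                    only the biomarkers that algorithm selected.
    subset_weights: {algorithm: preset weight of its biomarker subset}.

    Assumed formula (not given in the text):
        importance = frequency * sum(weight * contribution) / sum(weights)
    """
    if not contributions:
        return {}
    n_subsets = len(contributions)
    total_weight = sum(subset_weights[algo] for algo in contributions)
    names = set().union(*contributions.values())
    scores = {}
    for name in names:
        hits = [algo for algo in contributions if name in contributions[algo]]
        frequency = len(hits) / n_subsets
        weighted = sum(
            subset_weights[algo] * contributions[algo][name] for algo in hits
        ) / total_weight
        scores[name] = frequency * weighted
    return scores


contributions = {
    "correlation": {"A": 1.0, "B": 0.5},
    "penalized_model": {"A": 0.5},
    "svm_rfe": {"B": 1.0},
}
weights = {"correlation": 3, "penalized_model": 2, "svm_rfe": 1}
print(importance_scores(contributions, weights))
```

Under this assumed formula, biomarker A scores higher than B because its top rank comes from the most heavily weighted subset.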
As shown in fig. 7, a flowchart of the step of calculating the comprehensive evaluation score according to the embodiment of the present invention specifically includes the following steps:
Step S702, ranking the candidate biomarkers according to their importance scores.
In embodiments of the invention, the higher the importance score, the higher the ranking in the candidate biomarker ranking.
Step S704, according to the sample data and the numerical value N, calculating comprehensive evaluation scores of the candidate biomarker integrated model composed of the first N candidate biomarkers under the multiple classification algorithms respectively.
In the embodiment of the invention, the value of N is incremented by 1 after each calculation, and the calculation is repeated.
In the embodiment of the present invention, specifically, the comprehensive evaluation scores under the plurality of classification algorithms are calculated for the candidate biomarker integrated model composed of the first 1 candidate biomarker, the first 2 candidate biomarkers, the first 3 candidate biomarkers, and so on, up to the model composed of all candidate biomarkers. The candidate biomarker integrated model with the highest comprehensive evaluation score is the biomarker integrated model generated by training.
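The incremental loop over N can be sketched as follows. To stay self-contained, the sketch takes the scoring step as a callable per classification algorithm instead of training real classifiers; the function name `best_top_n`, the `evaluators` parameter and the toy scores in the example are hypothetical.

```python
def best_top_n(ranked_candidates, evaluators):
    """Evaluate the panels made of the first 1, 2, ..., all candidates
    under every algorithm and return the best (n, algorithm, score).

    ranked_candidates: candidate biomarkers sorted by descending importance.
    evaluators: {algorithm name: callable(panel) -> comprehensive score};
                each callable stands in for training and scoring a model.
    """
    best = None
    for n in range(1, len(ranked_candidates) + 1):
        panel = ranked_candidates[:n]  # first N candidate biomarkers
        for algorithm, evaluate in evaluators.items():
            score = evaluate(panel)
            if best is None or score > best[2]:
                best = (n, algorithm, score)
    return best


# Toy evaluators returning fixed scores by panel size, for illustration only.
toy = {
    "random_forest": lambda panel: {1: 0.70, 2: 0.85, 3: 0.80}[len(panel)],
    "svm": lambda panel: {1: 0.60, 2: 0.82, 3: 0.88}[len(panel)],
}
print(best_top_n(["A", "B", "C"], toy))  # (3, 'svm', 0.88)
```

In practice each evaluator would fit the corresponding classifier on the sample data restricted to the panel and return an index such as the Youden index or AUC.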
Fig. 8 shows an integrated model-based medical analysis apparatus according to an embodiment of the present invention, which is described in detail below.
In an embodiment of the present invention, the integrated model-based medical analysis apparatus specifically comprises an analysis module 810.
The analysis module 810 determines and outputs a medical analysis result according to the biomarker integration model and the acquired content data of the biomarkers in the biomarker integration model.
In the embodiment of the invention, an ensemble learning algorithm is used to screen, from a plurality of potential biomarkers, a plurality of candidate biomarkers that contribute significantly to the classification effect under different algorithms. The screening uses the AUC (area under the ROC curve, an index for evaluating a model). The screened candidate biomarkers are then trained with a preset classification algorithm to generate the biomarker classification model with the highest comprehensive evaluation score, based on indices such as AUC, accuracy, specificity and sensitivity.
The medical analysis device provided by the embodiment of the invention determines and outputs a medical analysis result according to the biomarker classification model generated by training based on the ensemble learning algorithm and the acquired detection data of the biomarkers in the biomarker classification model. Because this biomarker classification model is the one with the highest comprehensive evaluation score, generated in advance by ensemble training under a plurality of classification algorithms, the accuracy and stability of the medical analysis result determined directly from the classification model can be kept at a high level. Compared with existing biomarker classification models generated by training with other machine learning methods, the efficiency and effect of model analysis are markedly improved; that is, a medical analysis device with high stability and good practical application effect is provided.
FIG. 9 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device comprises a processor, a memory, a network interface, an input device and a display screen which are connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the integrated model based medical analysis method. The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform an integrated model-based medical analysis method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in FIG. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, the integrated model-based medical analysis apparatus provided herein may be implemented in the form of a computer program that is executable on a computer device such as that shown in FIG. 9. The memory of the computer device may store the program modules constituting the integrated model-based medical analysis apparatus, such as the analysis module 810 shown in FIG. 8. The computer program constituted by these program modules causes the processor to execute the steps of the integrated model-based medical analysis method according to the embodiments of the present application described in this specification.
For example, the computer device shown in FIG. 9 may perform step S102 by means of the analysis module 810 in the integrated model-based medical analysis apparatus shown in FIG. 8.
In one embodiment, a computer device is proposed, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
determining and outputting a medical analysis result according to the biomarker integration model and the acquired content data of the biomarkers in the biomarker integration model; wherein
the comprehensive evaluation score of the biomarker integration model under the plurality of classification algorithms is the highest, and the comprehensive evaluation comprises at least two of accuracy evaluation, specificity evaluation and sensitivity evaluation.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, causes the processor to perform the following steps: determining and outputting a medical analysis result according to the biomarker integration model and the acquired content data of the biomarkers in the biomarker integration model; wherein
the comprehensive evaluation score of the biomarker integration model under the plurality of classification algorithms is the highest, and the comprehensive evaluation comprises at least two of accuracy evaluation, specificity evaluation and sensitivity evaluation.
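For illustration only, a comprehensive evaluation that combines accuracy, specificity and sensitivity might be computed as in the following Python sketch. The function names, the binary label encoding, and the choice of an unweighted mean as the combining rule are assumptions; the embodiments only require that at least two of the three evaluations be included.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classification."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def comprehensive_score(y_true, y_pred,
                        metrics=("accuracy", "specificity", "sensitivity")):
    """Return (combined score, individual metric values).

    The combined score is the unweighted mean of the chosen metrics,
    which is an illustrative assumption rather than the patented rule.
    """
    if len(metrics) < 2:
        raise ValueError("at least two evaluation metrics are required")
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples to evaluate")
    values = {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }
    combined = sum(values[m] for m in metrics) / len(metrics)
    return combined, values


# Hypothetical labels: 1 = disease group, 0 = control group.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
score, parts = comprehensive_score(y_true, y_pred)
print(round(score, 4), parts)
```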
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the sequence indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not restricted to the exact order shown and described and may be performed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present invention, and although their description is relatively specific and detailed, they are not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method of medical analysis, comprising:
determining and outputting a medical analysis result according to a biomarker classification model generated based on ensemble learning algorithm training and the acquired detection data of the biomarkers in the biomarker classification model; wherein
the comprehensive evaluation score of the biomarker classification model generated based on the ensemble learning algorithm training is the highest, and the comprehensive evaluation comprises at least two of accuracy evaluation, specificity evaluation and sensitivity evaluation.
2. The medical analysis method according to claim 1, wherein the step of training and generating the biomarker classification model specifically comprises:
determining a plurality of potential biomarker subsets respectively corresponding to a plurality of classification algorithms according to sample data, the plurality of classification algorithms and a pre-obtained potential biomarker set; the set of potential biomarkers comprises a plurality of potential biomarkers;
determining the importance degree score of each potential biomarker according to a preset scoring rule and the plurality of potential biomarker subsets, and screening out a plurality of candidate biomarkers;
respectively calculating comprehensive evaluation scores of a plurality of candidate biomarker screening models under the various classification algorithms according to the sample data; the candidate biomarker screening model comprises a number of candidate biomarkers;
and training the candidate biomarker screening model with the highest comprehensive evaluation score according to a preset classification algorithm to generate a biomarker classification model.
3. The medical analysis method according to claim 2, wherein, before the step of determining a plurality of potential biomarker subsets respectively corresponding to a plurality of classification algorithms according to the sample data, the plurality of classification algorithms and the pre-obtained potential biomarker set, the method further comprises:
filling in missing sample data to generate complete sample data;
performing logarithmic conversion on the complete sample data to generate normally distributed sample data;
and wherein the step of determining the plurality of potential biomarker subsets respectively corresponding to the plurality of classification algorithms according to the sample data, the plurality of classification algorithms and the pre-obtained potential biomarker set specifically comprises:
determining the plurality of potential biomarker subsets respectively corresponding to the plurality of classification algorithms according to the normally distributed sample data, the plurality of classification algorithms and the pre-obtained potential biomarker set.
4. The medical analysis method according to claim 2 or 3, wherein after the step of training the candidate biomarker screening model with the highest comprehensive evaluation score according to a preset classification algorithm to generate the ensemble learning biomarker classification model, the method further comprises:
and calculating the comprehensive evaluation score of the ensemble learning biomarker classification model according to the test data.
5. The medical analysis method according to any one of claims 2 to 4, wherein the step of pre-obtaining the potential biomarker set comprises:
performing differential analysis on the sample data according to a statistical test method, and determining the potential biomarker set.
6. The medical analysis method according to claim 2, wherein the step of determining the importance score of each of the potential biomarkers according to a preset scoring rule and the plurality of subsets of potential biomarkers comprises:
determining a selected frequency of each potential biomarker in the plurality of potential biomarker subsets;
determining a contribution score for each potential biomarker in the plurality of subsets of potential biomarkers;
calculating the importance score of each potential biomarker according to the selected frequency of each potential biomarker, the contribution score of each potential biomarker in the plurality of potential biomarker subsets and the preset weight of each potential biomarker subset.
7. The medical analysis method according to claim 6, wherein the step of calculating the comprehensive evaluation scores of the candidate biomarker screening models under the classification algorithms according to the sample data comprises:
ranking the plurality of candidate biomarkers according to the importance scores of each candidate biomarker;
respectively calculating comprehensive evaluation scores of a candidate biomarker screening model consisting of the first N candidate biomarkers under the various classification algorithms according to the sample data and the numerical value N; the N is a positive integer no greater than the number of candidate biomarkers.
8. A medical analysis apparatus, comprising:
an analysis module, configured to determine and output a medical analysis result according to a biomarker classification model generated based on ensemble learning algorithm training and the acquired detection data of the biomarkers in the biomarker classification model; wherein
the comprehensive evaluation score of the biomarker classification model generated based on the ensemble learning algorithm training is the highest, and the comprehensive evaluation comprises at least two of accuracy evaluation, specificity evaluation and sensitivity evaluation.
9. A computer device, comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the medical analysis method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, causes the processor to carry out the steps of the medical analysis method of any one of claims 1 to 7.
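
The preprocessing, importance scoring and top-N ranking recited in claims 3, 6 and 7 can be illustrated with the following Python sketch. It is a hedged example under stated assumptions, not the claimed implementation: median imputation, the log2(x + 1) transform, the formula importance = selected frequency × weighted sum of contribution scores, and all function, marker and algorithm names are hypothetical choices that the claims do not specify.

```python
import math
from statistics import median


def fill_and_log(samples):
    """Fill missing values (None) with the per-marker median, then apply log2(x + 1).

    Median imputation and the +1 offset are illustrative assumptions.
    """
    markers = sorted({m for s in samples for m in s})
    medians = {}
    for m in markers:
        observed = [s[m] for s in samples if s.get(m) is not None]
        medians[m] = median(observed) if observed else 0.0
    return [
        {m: math.log2((s[m] if s.get(m) is not None else medians[m]) + 1.0)
         for m in markers}
        for s in samples
    ]


def importance_scores(subsets, weights):
    """Score each potential biomarker from the per-algorithm subsets.

    `subsets` maps an algorithm name to {marker: contribution score};
    `weights` maps an algorithm name to the preset weight of its subset.
    Assumed formula: selected frequency * weighted sum of contributions.
    """
    n_subsets = len(subsets)
    markers = {m for contrib in subsets.values() for m in contrib}
    scores = {}
    for m in markers:
        selected_in = [a for a, contrib in subsets.items() if m in contrib]
        frequency = len(selected_in) / n_subsets
        weighted = sum(weights[a] * subsets[a][m] for a in selected_in)
        scores[m] = frequency * weighted
    return scores


def top_n_panels(scores):
    """Rank markers by importance and return the top-N panel for each valid N."""
    ranked = sorted(scores, key=lambda m: (-scores[m], m))
    return [ranked[:n] for n in range(1, len(ranked) + 1)]


# Hypothetical subsets selected by three classification algorithms.
subsets = {
    "svm": {"P1": 0.9, "P2": 0.5},
    "random_forest": {"P1": 0.8, "P3": 0.6},
    "lasso": {"P1": 0.7, "P2": 0.4, "P3": 0.2},
}
weights = {"svm": 1.0, "random_forest": 1.0, "lasso": 1.0}

scores = importance_scores(subsets, weights)
panels = top_n_panels(scores)
filled = fill_and_log([{"P1": 3.0}, {"P1": None}, {"P1": 7.0}])
print(panels)
```

Each panel returned by `top_n_panels` corresponds to one candidate biomarker screening model, which would then be evaluated under every classification algorithm to find the panel with the highest comprehensive evaluation score.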
CN202010022726.0A 2020-01-09 2020-01-09 Medical analysis method, device, equipment and storage medium Pending CN111326260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010022726.0A CN111326260A (en) 2020-01-09 2020-01-09 Medical analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111326260A (en) 2020-06-23

Family

ID=71171243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010022726.0A Pending CN111326260A (en) 2020-01-09 2020-01-09 Medical analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111326260A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022121516A1 (en) * 2020-12-08 2022-06-16 International Business Machines Corporation Biomarker selection and modeling for targeted microbiomic testing

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050223236A1 (en) * 2004-03-30 2005-10-06 Fujitsu Limited Biometric information verifying apparatus, biometric information narrowing apparatus, and computer-readable recording medium storing a biometric information narrowing program thereon
CN101901345A (en) * 2009-05-27 2010-12-01 复旦大学 Classification method of differential proteomics
CN103336914A (en) * 2013-05-31 2013-10-02 中国人民解放军国防科学技术大学 Method and device for extracting meta biomarkers
WO2015066564A1 (en) * 2013-10-31 2015-05-07 Cancer Prevention And Cure, Ltd. Methods of identification and diagnosis of lung diseases using classification systems and kits thereof
CN106650314A (en) * 2016-11-25 2017-05-10 中南大学 Method and system for predicting amino acid mutation
CN108764486A (en) * 2018-05-23 2018-11-06 哈尔滨工业大学 A kind of feature selection approach and device based on integrated study
CN108921197A (en) * 2018-06-01 2018-11-30 杭州电子科技大学 A kind of classification method based on feature selecting and Integrated Algorithm
CN109460825A (en) * 2018-10-24 2019-03-12 阿里巴巴集团控股有限公司 For constructing the Feature Selection Algorithms, device and equipment of machine learning model
CN110031624A (en) * 2019-02-28 2019-07-19 中国科学院上海高等研究院 Tumor markers detection system based on multiple neural networks classifier, method, terminal, medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200623