CN112102945B - Device for predicting severe condition of COVID-19 patient - Google Patents

Publication number
CN112102945B
Authority
CN
China
Prior art keywords
input
feature
module
data
patient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011235506.2A
Other languages
Chinese (zh)
Other versions
CN112102945A (en)
Inventor
罗嘉庆
周凌云
冯韵宇
陈子蝶
郭姝瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011235506.2A
Publication of CN112102945A
Application granted
Publication of CN112102945B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Primary Health Care (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a device for predicting the severity of a COVID-19 patient, belonging to the field of intelligent processing of medical data. The device comprises: an input module for inputting patient information; a data preprocessing module for preprocessing the data output by the input module, which sends the processing result to the feature selection module if the data is training data and to the prediction processing module if the data is to-be-predicted data; a feature selection module, which selects a certain number of features from the input features as the input feature selection result; and a prediction processing module, which inputs the patient's feature information into a preset prediction model and sends the prediction result to the prediction result output module for visual output. By selecting key features from the patient's blood test results, the invention ensures accurate prediction of the severity of COVID-19 patients, supports rapid triage of COVID-19 patients, and helps optimize medical resources and deliver timely medical intervention.

Description

Device for predicting severe condition of COVID-19 patient
Technical Field
The invention belongs to the technical field of intelligent processing of medical data, and particularly relates to a device for predicting the severity of a COVID-19 patient.
Background
Currently, over 20 million people worldwide have been infected with the novel coronavirus SARS-CoV-2, and 6 million are receiving treatment. This poses a great threat to the health and life of people worldwide and also puts unprecedented pressure on medical systems.
Most COVID-19 patients are mild or moderate cases and recover on their own. However, about 14% of patients are severe cases and 5% are critical cases. Severe and critical cases often develop Acute Respiratory Distress Syndrome (ARDS) or Multiple Organ Dysfunction Syndrome (MODS) within 2 weeks after infection, which consumes substantial medical resources and leads to a higher fatality rate (up to 49%). Early prediction of the severity of COVID-19 enables rapid triage of patients (i.e., home isolation, hospitalization, or ICU admission), which helps to optimize the use of medical resources and to deliver timely medical intervention.
Most patients with suspicious symptoms first visit the fever clinic of a community hospital. They generally undergo 4 initial tests: a SARS-CoV-2 RNA test, a blood test, a blood biochemical test, and a chest computed tomography (CT) scan. The first test determines whether a patient is infected with SARS-CoV-2; the latter 3 tests are used to predict the severity of COVID-19. However, because the resources of community hospitals are limited, completing all four examinations in a short time is constrained in many ways (for example, by the capacity of waiting rooms, the waiting time for examination results, and the sterilization of examination instruments). How to make an accurate prediction with the simplest and fastest test is therefore an urgent and challenging problem.
Of all the initial tests, the blood test is the most common and typically yields results within 2 hours. In carrying out the present invention, the inventors found that key features can be selected from blood test results to predict the severity of COVID-19 patients quickly and accurately, thereby helping to optimize the use of medical resources and to perform medical interventions in a timely manner.
Disclosure of Invention
The object of the invention is as follows: in view of the above problems, a device for predicting the severity of a COVID-19 patient is provided, which assists the rapid triage of COVID-19 patients, optimizes the use of medical resources, and enables timely medical intervention.
The device for predicting the severity of a COVID-19 patient disclosed by the invention comprises an input module, a data preprocessing module, a feature selection module, a prediction processing module and a prediction result output module;
the input module is used for inputting patient information, and if the current data is training data, the input patient information comprises patient personal information, blood detection information and severity; if the current data is the data to be predicted, the input patient information comprises patient personal information and blood detection information;
the data preprocessing module is used for preprocessing the data output by the input module, performing different processing on the training data and the data to be predicted, sending the processing result of the training data to the feature selection module and sending the processing result of the data to be predicted to the prediction processing module;
the feature selection module selects T features from the input features as the input feature selection result, where T ≥ 1;
the prediction processing module inputs the characteristic information of the patient into a preset prediction model and sends a prediction result to the prediction result output module;
the prediction result output module is used for visually outputting the prediction result;
the data preprocessing module processes the training data and the data to be predicted as follows:
if the current data is training data, the following preprocessing steps are executed:
taking each specified item in the patient personal information as an input feature item, taking each item in the blood detection information as an input feature item, and taking the severity as the output feature item; obtaining a feature table based on all input feature items and output feature items;
defining x as an input feature index, X as the input feature index set, y as an output feature index, and Y as the output feature index set;
calculating the correlation value between every two features in the feature table to obtain a correlation matrix R;
calculating the p-value between every two features in the feature table to obtain a p-value matrix P;
preprocessing the correlation matrix R:
for x ∈ X and y ∈ Y, if P[x, y] = P[y, x] > α, let R[x, y] = R[y, x] = 0;
for i, x ∈ X, if P[x, i] = P[i, x] > α, let R[x, i] = R[i, x] = 1; where the threshold α is a preset value;
sending the feature tables of a plurality of patients, the input feature index set X, the output feature index set Y and the preprocessed correlation matrix R to the feature selection module;
if the current data is the data to be predicted, the following preprocessing step is executed:
based on the input feature selection result sent by the feature selection module, reading the matching information from the data to be predicted to generate the feature information of the current patient, and sending the feature information of the patient to the prediction processing module.
Further, when determining the input feature selection result, the feature selection module defines feature selection as a multi-criteria decision-making (MCDM) problem over the correlation among input features and the correlation between input and output features, and obtains the input feature selection result by solving this problem.
Further, the feature selection module determines that the input feature selection result specifically is:
step 1: acquiring a marking feature set L:
step T1: initializing a marking characteristic set L as an empty set;
step T2: judging whether the input feature index set X is empty or not; if not, go to step T3; if yes, executing the step 2 based on the current marking characteristic set L;
step T3: updating the marking feature set L:
step T301: judging whether |X| > min{m-1, ⌈β × m⌉}; if so, sorting the elements of the union of the marking feature set L and the output feature index set Y in ascending order to obtain the sequence $(r_i)_{i=1}^{|L|+n}$, and executing step T302; where m denotes the number of input feature items, n denotes the number of output feature items, and the parameter β takes a value in [0.6, 0.8];
otherwise, directly sorting the elements of the set L in ascending order to form the sequence $(r_i)_{i=1}^{|L|}$, and executing step T302;
step T302: sorting the elements of the input feature index set X in ascending order to form the sequence $(c_j)_{j=1}^{|X|}$;
step T303: extracting a sub-matrix E from the correlation matrix R, whose elements are E[i, j] = R[r_i, c_j];
the worst condition $w_i$ and the best condition $b_i$ of the elements E[i, j] are, respectively:
[Equation image: definitions of $w_i$ and $b_i$]
calculating the similarity $s_j$ of each column of the matrix E, denoting the column index with the maximum similarity as $j^*$, adding the element $c_{j^*}$ to the marking feature set L while deleting it from the input feature index set X, and then returning to step T2;
the similarity $s_j$ is calculated as
$$s_j = \frac{d_{wj}}{d_{wj} + d_{bj}},$$
where the first Euclidean distance is
$$d_{wj} = \sqrt{\sum_{i=1}^{k} \left(E[i,j] - w_i\right)^2},$$
the second Euclidean distance is
$$d_{bj} = \sqrt{\sum_{i=1}^{k} \left(E[i,j] - b_i\right)^2},$$
and the parameters k and q denote the number of rows and the number of columns of the matrix E, respectively;
step 2: selecting features from the marking feature set L:
starting from the first feature of the marking feature set L, features are added one at a time in order, yielding a series of feature combinations; the classification performance of each combination is then tested with a preset classifier model, and the combination with the best classification performance is selected as the input feature selection result.
Further, the feature selection module performs classification performance testing on the features of each combination by using naive Bayes classification.
Further, the feature selection module sets the input feature selection result as: age, white blood cell count, and lymphocyte count, or set to: age, neutrophil count, and lymphocyte count.
Further, the feature selection module performs classification performance testing on the features of each combination based on the classification accuracy, and selects the combination with the highest classification accuracy as an input feature screening result.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the invention selects key features from the patient's blood test results to predict the severity of COVID-19 patients quickly and accurately; the device works on small-sample data, obtains relatively stable results, and can find feature combinations with higher accuracy, while the feature selection process is visualizable and interpretable, thereby meeting clinical needs.
Drawings
FIG. 1 is an exemplary diagram of a correlation matrix R for a COVID-19 data set, according to an embodiment;
FIG. 2 is an exemplary diagram of the p-value matrix P of the data set, according to an embodiment (the p-value is the parameter used to decide the outcome of a hypothesis test, i.e., the probability of obtaining a result at least as extreme as the observed sample when the null hypothesis is true);
FIG. 3 is a diagram illustrating an example of a correlation matrix after preprocessing, in accordance with an embodiment;
FIG. 4 is a diagram illustrating a characteristic ordering process of a COVID-19 data set according to an embodiment;
FIG. 5 is a schematic diagram of feature ordering in an embodiment;
FIG. 6 is a graphical illustration of a predicted performance evaluation of the present invention in an exemplary embodiment;
FIG. 7 is a diagram illustrating an average feature number according to an embodiment;
FIG. 8 is a graph illustrating average performance comparison in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
The invention aims to achieve rapid triage of patients through early prediction of the severity of COVID-19, thereby improving the utilization of medical resources and providing timely medical intervention. The device for predicting the severity of a COVID-19 patient (severity prediction device for short) selects key features from blood test results so as to predict the severity of COVID-19 patients quickly and accurately. The invention first defines feature selection as a multi-criteria decision-making (MCDM) problem that considers both the correlation among input features and the correlation between input and output features, and then combines the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) with a Naive Bayes (NB) classifier to achieve the highest prediction accuracy with the fewest features. Preliminary results indicate that the invention needs only 3 features (i.e., age, white blood cell count (WBC) or neutrophil count (NEUT), and lymphocyte count (LYMC)), even when the impact of dataset uncertainty on machine learning model predictions is taken into account.
In the specific embodiment, COVID-19 cases diagnosed according to World Health Organization (WHO) guidelines at Wuhan Red Cross Hospital from 1 February 2020 to 15 March 2020 were collected. As shown in Table 1, the data set contains 9 features: 8 input features (age, sex, white blood cell count (WBC), lymphocyte count (LYMC), lymphocyte ratio (LYMPH), neutrophil count (NEUT), neutrophil ratio (NEU) and neutrophil-to-lymphocyte ratio (NLR)) and 1 output feature (severity).
Table 1: Clinical features of COVID-19 cases
[Table image not reproduced]
According to the "Guidelines for the Diagnosis and Treatment of COVID-19 Infection" (5th edition) issued by the National Health Commission of China, the cases are divided into 4 types:
(1) mild cases: mild clinical symptoms but no imaging manifestations of pneumonia;
(2) moderate cases: fever, respiratory symptoms and imaging manifestations of pneumonia;
(3) severe cases: any one of the following: respiratory distress with respiratory rate (RR) > 30/min, oxygen saturation at rest < 93%, or PaO2/FiO2 < 300 mmHg (1 mmHg = 0.133 kPa);
(4) critical cases: any one of the following: respiratory failure requiring mechanical ventilation, shock, or other organ failure requiring ICU care.
To reduce the time overhead of prediction and improve the accuracy of predicting the severity of COVID-19, the severity prediction device in this embodiment reduces the severity types to two: the first type is mild/moderate, and the second type is severe/critical. That is, the severity prediction device can quickly predict whether a COVID-19 patient is in a severe (including critical) state.
The severity prediction device comprises an input module, a data preprocessing module, a feature selection module, a prediction processing module and a prediction result output module. The input module is used for inputting patient information: if the current data is training data, the input patient information comprises patient personal information (name, age, sex, etc.), blood detection information and severity (that is, patients are classified by disease severity, and a severity value is assigned to each class); if the current data is data to be predicted, the input patient information comprises patient personal information and blood detection information. The data preprocessing module preprocesses the data output by the input module and handles training data and data to be predicted differently: training data is cleaned, mainly to remove noise from the collected raw data; for data to be predicted, part of the information (the name and the items of the blood detection information that match the input feature selection result) is extracted to generate the feature information of the current patient, which is sent to the prediction processing module. The feature selection module ranks and selects features: feature ranking orders the features by the value of a scoring function, which usually measures feature relevance; feature selection picks a small subset of relevant features from the original features by removing irrelevant, redundant or noisy ones.
The prediction processing module inputs the patient's feature information into a preset prediction model (already trained) and sends the prediction result to the prediction result output module for visual output; that is, the currently input data to be predicted is classified for severity based on the selected features and the preset prediction model, and the prediction result is output visually. To verify the prediction performance of the severity prediction device, its binary classification performance is also measured with statistical metrics: accuracy (ACC), sensitivity (TPR), false positive rate (FPR), and F1 score (the harmonic mean of precision and recall, with a maximum of 1 and a minimum of 0; a larger value indicates a better model).
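The four statistical measures named above can be illustrated with a minimal sketch. The function name `binary_metrics` and the toy labels are hypothetical and serve only as an illustration; label 1 is assumed to stand for severe/critical and 0 for mild/moderate.

```python
def binary_metrics(y_true, y_pred):
    """Return ACC, TPR, FPR and F1 for binary labels (1 = severe/critical)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0        # sensitivity (recall)
    fpr = fp / (fp + tn) if fp + tn else 0.0        # false positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "F1": f1}


# Toy labels, invented for illustration only (not patient data)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

A high TPR matters most for triage, since a missed severe case is costlier than a false alarm; FPR indicates how many mild/moderate patients would be escalated unnecessarily.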
The concrete implementation processes of the prediction processing and the prediction performance evaluation of the invention are as follows:
(1) Preprocessing.
In this embodiment, the data set is randomly divided into 2 subsets: a training set (50%) and a test set (50%). Across the four stages of this embodiment, the test set is used only for performance evaluation.
Assume there are m input features and n output features. Let X = {x | 1 ≤ x ≤ m} be the input feature set and Y = {y | m+1 ≤ y ≤ m+n} be the output feature set, where the elements x and y are feature indices. The full feature set is F = X ∪ Y = {i | 1 ≤ i ≤ m+n}. An (m+n) × (m+n) correlation matrix R and an (m+n) × (m+n) p-value matrix P are calculated and visualized to show the correlation between all feature pairs.
To simplify data processing, the correlation matrix R is preprocessed in two steps.
Step 1: ignore the sign of R[i, j] by letting R[i, j] = |R[i, j]|, so that the range of R[i, j] changes from [-1, 1] to [0, 1], where i, j ∈ F.
Step 2: filter R through P.
For x ∈ X and y ∈ Y, if P[x, y] = P[y, x] > α, then R[x, y] and R[y, x] can be ignored, i.e., let R[x, y] = R[y, x] = 0. For i, x ∈ X, if P[x, i] = P[i, x] > α, let R[x, i] = R[i, x] = 1. In general, the threshold α is set to 0.01 or 0.05, preferably 0.05.
Based on the personal information (sex and age), blood test information, and severity (whether or not severe) of the patients given in Table 1, the number of input features is m = 8 and the number of output features is n = 1, which gives the 9 × 9 correlation matrix R shown in fig. 1 and the 9 × 9 p-value matrix P shown in fig. 2.
After the correlation matrix R is preprocessed, the values of the elements R[i, j] of the preprocessed correlation matrix are as shown in fig. 3, where i, j ∈ F and R[i, j] ∈ [0, 1].
Since P[1,9] = P[9,1] = 0.3865 > 0.05 and P[3,9] = P[9,3] = 0.1055 > 0.05, R[1,9], R[9,1], R[3,9] and R[9,3] are negligible, i.e., set to 0. As can also be seen from fig. 3, R[1,9] = R[9,1] = R[3,9] = R[9,3] = 0, R[1, 1:8] = R[3, 1:8] = the 1 × 8 all-ones vector, and R[1:8, 1] = R[1:8, 3] = the 8 × 1 all-ones vector.
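The two preprocessing steps can be sketched as follows. This is only an illustration under stated assumptions: the function name `preprocess_correlation` is hypothetical, the matrices are plain nested lists with 0-based indices (the text uses 1-based indices), and the 3 × 3 toy matrices are invented, apart from the p-value 0.3865 borrowed from the example above.

```python
def preprocess_correlation(R, P, X, Y, alpha=0.05):
    """Return a preprocessed copy of R; X and Y hold 0-based feature indices."""
    size = len(R)
    # Step 1: drop the sign so that every entry lies in [0, 1]
    out = [[abs(R[i][j]) for j in range(size)] for i in range(size)]
    # Step 2a: input-output pairs that are not significant are ignored (set to 0)
    for x in X:
        for y in Y:
            if P[x][y] > alpha:
                out[x][y] = out[y][x] = 0.0
    # Step 2b: input-input pairs that are not significant are set to 1
    for x in X:
        for i in X:
            if P[x][i] > alpha:
                out[x][i] = out[i][x] = 1.0
    return out


# Toy data: two input features (indices 0 and 1) and one output (index 2)
R = [[1.0, -0.30, 0.10],
     [-0.30, 1.0, 0.50],
     [0.10, 0.50, 1.0]]
P = [[0.0, 0.20, 0.3865],
     [0.20, 0.0, 0.001],
     [0.3865, 0.001, 0.0]]
R_pre = preprocess_correlation(R, P, X=[0, 1], Y=[2])
print(R_pre)
```

In the toy result, feature 0 loses its link to the output (its p-value exceeds 0.05), and the non-significant pair (0, 1) is set to 1, which later makes either feature look maximally redundant with the other.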
(2) Feature ranking.
A marking feature set L is defined and initialized to L = ∅.
The input features x ∈ X are ranked iteratively, and the top-ranked feature of each round is moved from X to L. The ranking criterion comprises 2 evaluation items:
Evaluation item 1 (EVAL 1): the correlation between the input feature x ∈ X and the output feature y ∈ Y, i.e., R[x, y] or R[y, x].
Evaluation item 2 (EVAL 2): the correlation between the input feature x ∈ X and the marked feature v ∈ L, i.e., R[x, v] or R[v, x]. Multiple conflicting criteria are thus evaluated in the decision.
The invention solves this multi-criteria decision-making (MCDM) problem with the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), a compensatory aggregation method. An evaluation matrix E containing k conditions (criteria) and q alternatives is first created to rank the input features. According to the Pareto principle, x is classified into the following 2 types:
Type 1:
If |X| > min{m-1, ⌈β × m⌉}, the input feature x to be marked is a core feature, which should have the lowest R[v, x] under evaluation item 2 and the highest R[y, x] under evaluation item 1. The elements of the sets L ∪ Y and X are sorted in ascending order to obtain the sequences $(r_i)_{i=1}^{|L|+n}$ and $(c_j)_{j=1}^{|X|}$. The parameter β takes a value in [0.6, 0.8], preferably 0.8, i.e., the first 20% of the input features are core features.
Let k = |L| + n and q = |X|, and extract a k × q sub-matrix E from the preprocessed correlation matrix R such that E[i, j] = R[r_i, c_j].
The worst condition $w_i$ and the best condition $b_i$ of the elements E[i, j] are, respectively:
[Equation image: definitions of $w_i$ and $b_i$]
Referring to fig. 4, when |X| = 8 > min{8-1, ⌈0.8 × 8⌉} = 7, L ∪ Y = ∅ ∪ {9} = {9}. Then $(r_i)_{i=1}^{1} = (9)$ and $(c_j)_{j=1}^{8} = (1, …, 8)$. Since k = |L| + n = 1 and q = |X| = 8, E is a 1 × 8 sub-matrix of R.
Type 2:
If |X| ≤ min{m-1, ⌈β × m⌉}, the x to be marked is an auxiliary feature (the remaining 80%), and only the lowest R[v, x] under evaluation item 2 is required.
The elements of the sets L and X are sorted in ascending order to obtain the sequences $(r_i)_{i=1}^{|L|}$ and $(c_j)_{j=1}^{|X|}$.
Let k = |L| and q = |X|; E is the k × q matrix with E[i, j] = R[r_i, c_j].
As can be seen from fig. 4, when |X| = 5 < 7, L = {2, 6, 4} and X = {1, 3, 5, 7, 8}, so $(r_i)_{i=1}^{3} = (2, 6, 4)$ and $(c_j)_{j=1}^{5} = (1, 3, 5, 7, 8)$. Since k = |L| = 3 and q = |X| = 5, E is a 3 × 5 sub-matrix of R.
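The choice between the two types and the extraction of E can be sketched as below. The helper `evaluation_matrix` is a hypothetical name, feature indices are 1-based as in the text, and the 9 × 9 matrix is filled with placeholder values because the real entries of fig. 3 are not available here. Rows are sorted in ascending order as the text states; row order does not affect the column-wise distances computed later.

```python
import math


def evaluation_matrix(R, X, L, Y, m, beta=0.8):
    """Return (rows, cols, E) with E[i][j] = R[rows[i]][cols[j]] (1-based ids)."""
    threshold = min(m - 1, math.ceil(beta * m))
    if len(X) > threshold:
        rows = sorted(set(L) | set(Y))   # Type 1: core feature, uses EVAL 1 and 2
    else:
        rows = sorted(L)                 # Type 2: auxiliary feature, EVAL 2 only
    cols = sorted(X)
    E = [[R[r - 1][c - 1] for c in cols] for r in rows]
    return rows, cols, E


# Placeholder 9 x 9 matrix (values invented; only the shapes matter here)
R = [[1.0 if i == j else 0.5 for j in range(9)] for i in range(9)]

# First iteration of the example: |X| = 8 > 7, so the only row is the output 9
rows1, cols1, E1 = evaluation_matrix(R, X=range(1, 9), L=[], Y=[9], m=8)

# Later iteration of the example: |X| = 5 <= 7, rows come from L only
rows2, cols2, E2 = evaluation_matrix(R, X=[1, 3, 5, 7, 8], L=[2, 6, 4], Y=[9], m=8)
print(rows1, len(E1), len(E1[0]))
print(rows2, cols2, len(E2), len(E2[0]))
```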
The L2 distance (Euclidean distance) between alternative j and the worst condition is calculated according to equation (1):
$$d_{wj} = \sqrt{\sum_{i=1}^{k} \left(E[i,j] - w_i\right)^2} \qquad (1)$$
The L2 distance between alternative j and the best condition is then calculated according to equation (2):
$$d_{bj} = \sqrt{\sum_{i=1}^{k} \left(E[i,j] - b_i\right)^2} \qquad (2)$$
The similarity to the worst condition is then calculated according to equation (3):
$$s_j = \frac{d_{wj}}{d_{wj} + d_{bj}} \qquad (3)$$
$s_j = 1$ only when alternative j attains the best condition; $s_j = 0$ only when alternative j attains the worst condition. Let $j^* = \arg\max_j \{s_j\}$; then $X = X \setminus \{c_{j^*}\}$ and $L = L \cup \{c_{j^*}\}$.
Example 4: as shown in fig. 4, when |X| = 8 > 7, $w_i = 1$ and $b_i = 0$; equations (1) and (2) give $d_{w2} = 0.5251$ and $d_{b2} = 0.4749$, and equation (3) gives $s_2 = 0.5251$. When |X| = 5 < 7, $w_i = 1$ and $b_i = 0$; equations (1) and (2) give $d_{w8} = 0.9685$ and $d_{b8} = 0.8615$, and equation (3) gives $s_8 = 0.5293$.
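Equations (1) to (3) can be sketched in a few lines. The function name `topsis_similarity` and the 2 × 2 toy matrix are hypothetical; the worst and best vectors are passed in explicitly, and the toy call uses w_i = 1 and b_i = 0 as stated in the example for rows taken from L. The last lines simply plug the rounded distances reported in the example into equation (3).

```python
import math


def topsis_similarity(E, worst, best):
    """Return s_j for every column j of E, following equations (1)-(3)."""
    k, q = len(E), len(E[0])
    scores = []
    for j in range(q):
        d_w = math.sqrt(sum((E[i][j] - worst[i]) ** 2 for i in range(k)))  # eq. (1)
        d_b = math.sqrt(sum((E[i][j] - best[i]) ** 2 for i in range(k)))   # eq. (2)
        total = d_w + d_b
        scores.append(d_w / total if total > 0 else 0.0)                   # eq. (3)
    return scores


# Toy 2 x 2 evaluation matrix (values invented): rows are marked features,
# columns are candidate features; a low correlation with marked features is good.
E = [[0.2, 0.9],
     [0.1, 0.8]]
scores = topsis_similarity(E, worst=[1.0, 1.0], best=[0.0, 0.0])
j_star = max(range(len(scores)), key=lambda j: scores[j])
print(scores, j_star)

# Plugging the rounded distances from the example into equation (3)
reported = 0.9685 / (0.9685 + 0.8615)
print(round(reported, 3))
```

With the rounded inputs the ratio is about 0.529, in line with the 0.5293 reported above; the small difference comes from rounding of the two distances.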
Namely, the invention marks a plurality of characteristics of patients based on MCDM, and obtains the specific realization process of the marked characteristic set as follows:
step S1: acquiring patient characteristics as input characteristics, acquiring prediction types as output characteristics, and acquiring a characteristic set based on all the input characteristics and the output characteristics;
obtaining a correlation matrix R for any two characteristics in the characteristic set based on a correlation value between the characteristics, wherein the dimensionality of the correlation matrix R is (m + n) x (m + n), m represents the number of input characteristics, and n represents the number of output characteristics;
for any two features in the feature set, obtaining a matrix P with dimensions of (m + n) × (m + n) based on a P value between the features;
setting an input feature index set X = { X |1 is not less than X and not more than m }, and setting an output feature index set Y = { Y | m +1 is not less than Y and not more than m + n };
initializing a marking characteristic set L as an empty set;
step S2: preprocessing a correlation matrix R:
the elements of the correlation matrix R are set to: r [ i, j ] = | R [ i, j ] |, where i, j represent the rows and columns, respectively, of the correlation matrix R;
and (3) filtering the correlation matrix R based on the matrix P: for X ∈ X and Y ∈ Y, if P [ X, Y ] = P [ Y, X ] >0.05, let R [ X, Y ] = R [ Y, X ] = 0; for u ∈ X and X ∈ X, if P [ X, u ] = P [ u, X ] >0.05, let R [ X, u ] = R [ u, X ] = 1;
step S3: judging whether the set X is empty; if yes, go to step S5; otherwise, executing step S4;
step S4: updating the marking feature set L:
step S401: judging whether | X | is more than min { m-1, ⌈ beta X m ⌉ }, if so, sorting the elements of the set L | > Y and the set X in an ascending order to obtain a sequence
Figure 684392DEST_PATH_IMAGE001
And performing step 402;
that is, when the number of elements in the set X is greater than the value of min { m-1, ⌈ β × m ⌉ }, the elements of L { [ Y ] } are ordered in ascending order to form a sequence
Figure 568035DEST_PATH_IMAGE001
Otherwise, directly sorting the elements of the set L in ascending order to form a sequence
Figure 900796DEST_PATH_IMAGE002
And performing step 402;
step S402: sorting the elements of the set X in ascending order to form a sequence
Figure 466294DEST_PATH_IMAGE003
step S403: extracting a sub-matrix E from the correlation matrix R, whose elements are E[i, j] = R[r_i, c_j];
calculating the similarity s_j of each column j of the matrix E, denoting the column with the maximum similarity as j*, adding the element c_{j*} to the marking feature set L and deleting it from the input feature index set X, and then returning to step S3;
step S5: obtaining and outputting the marking feature set L.
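The labeling loop of steps S1 to S5 can be sketched in Python as follows. The formulas for the best condition, the worst condition and the similarity are given as images in the original and are not reproduced in this text, so the sketch assumes a TOPSIS-style relative closeness: for a row taken from Y the best value is the row maximum, for a row taken from L the best value is the row minimum, and s_j is the distance to the worst condition divided by the sum of both distances. The function name `label_features` and the 0-based indices are also assumptions.

```python
import math

def label_features(R, m, n, beta=0.8):
    """Return a labeling order of the m input features (0-based indices).

    R is the preprocessed (m+n) x (m+n) correlation matrix as nested lists;
    inputs are indices 0..m-1 and outputs are indices m..m+n-1.
    """
    X = set(range(m))
    Y = set(range(m, m + n))
    L = []
    threshold = min(m - 1, math.ceil(beta * m))
    while X:                                            # step S3
        use_outputs = len(X) > threshold                # step S401
        rows = sorted(set(L) | Y) if use_outputs else sorted(L)
        cols = sorted(X)                                # step S402
        E = [[R[r][c] for c in cols] for r in rows]     # step S403
        # ASSUMPTION: best/worst conditions per row (formula image missing).
        best, worst = [], []
        for r, row in zip(rows, E):
            hi, lo = max(row), min(row)
            if r in Y:      # high correlation with the output is desirable
                best.append(hi)
                worst.append(lo)
            else:           # low correlation with labeled inputs is desirable
                best.append(lo)
                worst.append(hi)
        # ASSUMPTION: TOPSIS-style relative closeness as the similarity s_j.
        scores = []
        for j in range(len(cols)):
            d_best = math.sqrt(sum((E[i][j] - best[i]) ** 2 for i in range(len(rows))))
            d_worst = math.sqrt(sum((E[i][j] - worst[i]) ** 2 for i in range(len(rows))))
            total = d_best + d_worst
            scores.append(d_worst / total if total > 0 else 0.0)
        j_star = max(range(len(cols)), key=scores.__getitem__)
        L.append(cols[j_star])
        X.remove(cols[j_star])
    return L
```

Because the threshold never exceeds m-1, the first iteration always uses the output rows, so the row set is not empty while L is still empty.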
Referring to Fig. 4, the labeling order of the input features is (2, 6, 4, 7, 8, 5, 1, 3). If only evaluation item 1 is considered, i.e. the features x ∈ X are ordered by their statistically significant correlation R[x, y] with the output, a different sequence (2, 5, 4, 7, 8, 6) results, as shown in Fig. 5. As can be seen from Fig. 3, feature 5 is more strongly correlated with the output than feature 6 (R[5, 9] = 0.3526 > R[6, 9] = 0.2179), but it is also more strongly correlated with features 2 and 4 (R[5, 2] = 0.2471 > R[6, 2] = 0.06803 and R[5, 4] = 0.7023 > R[6, 4] = 0.2827). This indicates that the subset {2, 5, 4} may contain redundant features that do not contribute independently to the prediction.
(3) Feature selection.
The goal of feature subset selection is to find the best subset of input features. The number of labeled features is increased one at a time, following the labeling order, and a naive Bayes classifier is trained on each resulting subset. To find the best subset, the accuracy of each trained model is then tested in turn on the training set. Fig. 5 shows that, with evaluation item 1 alone, the accuracy peaks at 0.765 when the 4 features {2, 5, 4, 7} are selected. With evaluation item 1 plus evaluation item 2, a higher accuracy of 0.816 is reached using fewer features, {2, 6, 4}.
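The incremental search over prefixes of the labeling order can be sketched in Python as follows. This is a minimal sketch: the function name `best_prefix` and the pluggable `score` callable are assumptions, and the scoring function stands in for "train a naive Bayes classifier on this subset and return its accuracy".

```python
def best_prefix(order, score):
    """Return (best_subset, best_score) over order[:1], order[:2], ...

    `order` is the labeling order of feature indices. `score` is any callable
    that maps a list of feature indices to a number, for example the accuracy
    of a classifier trained on those features. Ties keep the shorter prefix.
    """
    best_subset, best_score = [], float("-inf")
    for t in range(1, len(order) + 1):
        subset = list(order[:t])
        s = score(subset)
        if s > best_score:  # strict comparison favors fewer features on ties
            best_subset, best_score = subset, s
    return best_subset, best_score
```

Keeping the shorter prefix on ties is a design choice made for the sketch; the source only states that the combination with the best classification performance is selected.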
(4) Prediction processing and output.
The prediction processing module is preset with a trained prediction model (for example, the classifier model used in feature selection). The patient's feature information is input into this model, and the prediction of the patient's severe condition is obtained and output based on the classification result. The prediction model in the prediction processing module is not specifically limited: any conventional classifier model may be used, provided that it has been trained until it meets the training requirements. The prediction result output module may output the prediction result as graphics, text, light signals, or the like.
(5) Performance evaluation.
In this embodiment, accuracy (ACC), sensitivity (TPR), false positive rate (FPR) and F1 score, computed on the set-aside test set, are used to evaluate the predictive performance of the selected features. Fig. 6 shows the prediction performance obtained with the different feature subsets. As shown in Fig. 6, {2, 6, 4} has the fewest features but scores highest on several performance indicators. Specifically, the accuracies of {2, 5, 4, 7, 8, 6}, {2, 5, 4, 7} and {2, 6, 4} are 0.7959, 0.8469 and 0.8673, respectively, and their F1 scores are 0.7561, 0.7761 and 0.806, respectively.
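The four measures can be computed from the confusion counts of a binary prediction, as in the following Python sketch. The function name `binary_metrics` and the choice of which class counts as positive (here label 1, for example severe/critical) are assumptions; the source does not state the label encoding.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return ACC, TPR, FPR and F1 for two equal-length label sequences."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "ACC": (tp + tn) / total if total else 0.0,
        "TPR": tp / (tp + fn) if tp + fn else 0.0,   # sensitivity
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "F1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }
```

A lower FPR is better, while higher values are better for the other three measures.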
In this embodiment, 306 collected cases of COVID-19 are divided into two groups: 141 moderate cases and 165 severe/critical cases. The blood test results of the two groups are shown in Table 1.
To test the prediction stability of the severe condition prediction device of the invention and to observe the effect of dataset uncertainty on feature selection, the dataset was split into a 50% training set and a 50% test set, and the experiment was repeated for 100 runs. Fig. 7 shows the average number of features selected under 3 different criteria: EVAL1, EVAL2 (subset) and EVAL1 + EVAL2 (subset) select on average 6.29 (95% confidence interval (CI): 6.13-6.45), 3.11 (95% CI: 2.79-3.43) and 2.98 (95% CI: 2.81-3.15) features, respectively. As can be seen from Fig. 8, the criterion EVAL1 + EVAL2 (subset) used by the device improves most performance indicators. For EVAL1 + EVAL2 (subset), ACC, TPR, FPR and F1 score are 0.803 (95% CI: 0.794-0.812), 0.685 (95% CI: 0.673-0.697), 0.117 (95% CI: 0.104-0.131) and 0.724 (95% CI: 0.71-0.739), respectively, whereas for EVAL1 they are 0.75 (95% CI: 0.741-0.76), 0.599 (95% CI: 0.583-0.616), 0.093 (95% CI: 0.083-0.103) and 0.698 (95% CI: 0.688-0.708), respectively. Referring to Fig. 8, although feature selection is affected by dataset uncertainty, it is dominated by the two subsets {Age, NEUT, LYMC} and {Age, WBC, LYMC}, with selection rates of up to 31%. These two subsets achieve high accuracy with a small number of features.
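A mean with a 95% confidence interval over repeated runs can be computed as in the following Python sketch. The source does not state how its intervals were obtained; the normal approximation (mean ± 1.96 standard errors) and the function name `mean_ci95` are assumptions made for illustration.

```python
import math
import statistics

def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% CI."""
    n = len(values)
    mean = statistics.fmean(values)
    if n < 2:
        # A single run gives no spread estimate, so the interval collapses.
        return mean, mean, mean
    # Standard error of the mean from the sample standard deviation.
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width
```

Applied to the per-run accuracy of 100 runs, this yields a summary of the form "mean (95% CI: lower-upper)" used in the paragraph above.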
Furthermore, current treatment experience indicates that appropriate intervention in the first and second weeks of the disease is important for preventing disease progression and reducing mortality. Previous studies have shown that the severity of COVID-19 is closely related to the patient's age, underlying diseases and general immune status. The severe condition prediction device requires only the patient's age and blood test results as input. Based on the preset feature selection, it selects the corresponding features (WBC or NEUT, and LYMC) from the blood test results, performs the prediction, and outputs the COVID-19 patient type (mild, moderate, severe or critical) of the current patient, with a prediction accuracy above 80%. During the COVID-19 pandemic, such a device better meets clinical needs and is easier to deploy in regions with different levels of medical care. In other words, the device selects effective features from blood test results, and preliminary experimental results show that a prediction accuracy of 0.803 (95% CI: 0.794-0.812) can be achieved with only 3 key features, namely age, white blood cell count (WBC) or neutrophil count (NEUT), and lymphocyte count (LYMC). This high accuracy (80.3% on average) is very helpful for the rapid diagnosis of COVID-19 patients. Using only the most common blood tests, medical institutions can better decide on home isolation, hospitalization or ICU allocation for COVID-19 patients.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (8)

1. A device for predicting the severe condition of a COVID-19 patient, characterized by comprising an input module, a data preprocessing module, a feature selection module, a prediction processing module and a prediction result output module;
the input module is used for inputting patient information, and if the current data is training data, the input patient information comprises patient personal information, blood detection information and severity; if the current data is the data to be predicted, the input patient information comprises patient personal information and blood detection information;
the data preprocessing module is used for preprocessing the data output by the input module, performing different processing on the training data and the data to be predicted, sending the processing result of the training data to the feature selection module and sending the processing result of the data to be predicted to the prediction processing module;
the feature selection module selects T features from the input features as the input feature selection result, wherein T ≥ 1;
the prediction processing module inputs the characteristic information of the patient into a preset prediction model and sends a prediction result to the prediction result output module;
the prediction result output module is used for visually outputting the prediction result;
the data preprocessing module processes the training data and the data to be predicted as follows:
if the current data is training data, executing the following preprocessing steps:
respectively taking the specified items in the personal information of the patient as input characteristic items, respectively taking each item in the blood detection information as an input characteristic item, and taking the severity as an output characteristic item; obtaining a feature table based on all input feature items and output feature items;
defining x to represent an input feature index, X to represent the input feature index set, y to represent an output feature index, and Y to represent the output feature index set;
calculating a correlation value between any two characteristics in the characteristic table to obtain a correlation matrix R;
calculating a P value between any two characteristics in the characteristic table to obtain a P value matrix P;
preprocessing the correlation matrix R:
if the elements of the matrix P satisfy P[x, y] = P[y, x] > α, setting the value of the element R[x, y] of the correlation matrix R to 0;
for i ∈ X and x ∈ X, if P[x, i] = P[i, x] > α, letting R[x, i] = R[i, x] = 1; wherein the threshold α is a preset value;
sending the feature tables of a plurality of patients, an input feature index set X, an output feature index set Y and the preprocessed correlation matrix R to a feature selection module;
if the current data is the data to be predicted, executing the following preprocessing steps:
based on the input feature selection result sent by the feature selection module, reading the matching information from the data to be predicted to generate the feature information of the current patient, and sending the feature information of the patient to the prediction processing module.
2. The apparatus of claim 1, wherein the feature selection module, when determining the input feature selection result, defines the feature selection as a multi-criteria decision problem of the correlation between the input features and the correlation between the input and output features, and obtains the input feature selection result based on a solution of the multi-criteria decision problem.
3. The apparatus of claim 1, wherein the feature selection module determines the input feature selection result as follows:
step 1: acquiring a marking feature set L:
step T1: initializing a marking characteristic set L as an empty set;
step T2: judging whether the input feature index set X is empty or not; if not, go to step T3; if yes, executing the step 2 based on the current marking characteristic set L;
step T3: updating the marking feature set L:
step T301: judging whether |X| > min{m-1, ⌈β × m⌉}; if so, sorting the elements of the union of the marking feature set L and the output feature index set Y in ascending order to obtain a row-index sequence (r_1, …, r_k), and executing step T302; wherein m represents the number of input feature items, n represents the number of output feature items, and the parameter β takes a value in the range [0.6, 0.8];
otherwise, sorting the elements of the set L alone in ascending order to obtain the row-index sequence (r_1, …, r_k), and executing step T302;
step T302: sorting the elements of the input feature index set X in ascending order to obtain a column-index sequence (c_1, …, c_q);
step T303: extracting a sub-matrix E from the correlation matrix R, whose elements are E[i, j] = R[r_i, c_j];
the worst condition w_i and the best condition b_i of the elements E[i, j] being respectively:
[formula image defining w_i and b_i; not reproduced in this text]
calculating the similarity s_j of each column j of the matrix E, denoting the column with the maximum similarity as j*, adding the element c_{j*} to the marking feature set L and deleting it from the input feature index set X, and then returning to step T2;
the similarity s_j being calculated as:
[formula image defining s_j; not reproduced in this text]
wherein the first Euclidean distance is:
[formula image; not reproduced in this text]
and the second Euclidean distance is:
[formula image; not reproduced in this text]
the parameters k and q respectively represent the number of rows and the number of columns of the matrix E;
step 2: and (3) selecting the features in the marking feature set L:
starting from the first feature of the marking feature set L, forming a plurality of feature combinations by adding one feature at a time in sequence; then performing a classification performance test on each combination with a preset classifier model, and selecting the combination with the best classification performance as the input feature selection result.
4. The apparatus of claim 1, wherein the feature selection module sets the input feature selection result to: age, white blood cell count, and lymphocyte count, or set to: age, neutrophil count, and lymphocyte count.
5. The apparatus of claim 3, wherein the feature selection module performs the classification performance test on each combination of features using a naive Bayes classifier.
6. The apparatus of claim 3, wherein the feature selection module performs a classification performance test on the features of each combination based on the classification accuracy, and selects the combination with the highest classification accuracy as the input feature selection result.
7. The apparatus of claim 1, wherein the threshold α is set to a value of 0.01 or 0.05.
8. The apparatus of claim 3, wherein the parameter β is set to 0.8.
CN202011235506.2A 2020-11-09 2020-11-09 Device for predicting severe condition of COVID-19 patient Active CN112102945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011235506.2A CN112102945B (en) 2020-11-09 2020-11-09 Device for predicting severe condition of COVID-19 patient

Publications (2)

Publication Number    Publication Date
CN112102945A (en)    2020-12-18
CN112102945B (en)    2021-02-05

Family

ID=73785242

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant