WO2023241012A1 - Method for establishing deep learning-based model for predicting functions after post-cerebral stroke early rehabilitation

Info

Publication number: WO2023241012A1
Application number: PCT/CN2022/143730
Authority: WO
Inventors: 陆晓; 郑瑜; 顾昭华; 龚晨; 李健; 彭丽君
Original assignee: 南京医科大学
Priority date: 2022-06-16
Filing date: 2022-12-30
Publication date: 2023-12-21
Also published as: CN115019919A; CN115019919B

Abstract

A method for establishing a deep learning-based model for predicting function after early post-stroke rehabilitation. A hybrid deep learning model consisting of a convolutional neural network (CNN) and a long short-term memory (LSTM) network is constructed and combined with clinical data and early rehabilitation-related data to predict, early and accurately, the functional prognosis after early rehabilitation of ischemic stroke. Long-term functional prediction is performed in combination with the time-dependent modified Rankin Scale (mRS), which also guides the formulation of an individualized early rehabilitation strategy. The resulting machine learning-based prediction model, the CNN-LSTM model, can predict function in the early stage of stroke, providing accurate guidance for the subsequent rehabilitation training plan so that the patient's function recovers better, while conserving medical resources and reducing unnecessary consumption of labor and materials.

Description

Method for establishing functional prediction model after early stroke rehabilitation based on deep learning

Technical field

The invention relates to methods for establishing a functional prognosis prediction model after early stroke rehabilitation, and specifically to a deep learning-based method for establishing a functional prediction model after early stroke rehabilitation.

Background art

Stroke has the characteristics of high incidence, high mortality and high recurrence rate. According to the Global Burden of Disease Report 2010, stroke has become the second leading cause of death in the world and is also the disease with the highest disability rate among single diseases. Over the past two decades, tremendous advances have been made in the treatment of patients with acute ischemic stroke, resulting in significant reductions in mortality. However, as mortality rates decrease, the disability burden among stroke survivors increases.

Treatment after acute ischemic stroke includes intravenous thrombolysis, mechanical thrombectomy, and other measures. The difficulty lies in assessing each patient's risk and likely benefit from treatment so as to support early treatment decisions. Early rehabilitation is currently proposed as a means of promoting functional recovery and reducing mortality and disability in stroke patients. However, there are currently no studies that predict the level of functional prognosis after acute ischemic stroke, nor any that predict the level of function and disability after early rehabilitation, even though early and accurate prediction of functional prognosis is an important reference for decision-making by patients and their families. The prognosis of ischemic stroke is highly heterogeneous and difficult to predict. Research into deep learning has grown steadily in recent years and may provide methods for addressing these problems: deep neural networks are adept at handling complex inputs and have been used to predict long-term prognosis.

The present invention constructs a hybrid deep learning model composed of a convolutional neural network and a long short-term memory network, and combines clinical data with early rehabilitation-related data to make early and accurate predictions of the functional prognosis after early rehabilitation of ischemic stroke. Long-term functional prediction is performed in combination with the time-dependent modified Rankin Scale (mRS), which also guides the development of individualized early rehabilitation strategies.

Summary of the invention

The purpose of the present invention is to overcome the shortcomings of the prior art by proposing a deep learning-based method for establishing a functional prediction model after early stroke rehabilitation.

To achieve this purpose, the present invention adopts the following technical solution: a deep learning-based method for establishing a functional prediction model after early stroke rehabilitation. The model establishment method includes the following steps:

S1: Establish a disease database;

Prepare medical record data: collect patient electronic medical records from the hospital electronic medical record platform, specifically the electronic medical records of ischemic stroke patients undergoing early rehabilitation; the medical record data with a first diagnosis of ischemic stroke are taken as the qualified electronic medical record data;

S2: Extract patient medical characteristic data;

Extract the ischemic stroke medical features and their medical feature values from the qualified electronic medical record data obtained in S1; the ischemic stroke features include demographic information, laboratory and clinical examination-related information, information related to drugs and invasive treatments, and information related to rehabilitation interventions; these serve as the material for prediction;

The medical characteristic values are specific values of each medical characteristic among demographics, laboratory and clinical examinations, drugs and invasive treatment and rehabilitation intervention characteristics;

The demographic information includes: gender, age, occupation, marital status, education, height, weight, BMI, systolic blood pressure, diastolic blood pressure, heart rate, whether it is the first cerebrovascular accident, TOAST classification, OCSP classification, past history, duration of hypertension, duration of diabetes, smoking status, smoking age, number of cigarettes smoked per day, smoking index, drinking history, regular physical activities, family history;

The laboratory and clinical examination-related information includes: glycosylated hemoglobin, triglycerides, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, lipoprotein(a), homocysteine, partial thromboplastin time, prothrombin time-international normalized ratio, electrocardiogram, structural imaging examination results, common carotid artery stenosis, carotid bulb stenosis, internal carotid artery stenosis, subclavian artery stenosis, left internal carotid artery intracranial stenosis, left anterior cerebral artery stenosis, left middle cerebral artery stenosis, left posterior cerebral artery stenosis, left vertebral artery stenosis, right internal carotid artery intracranial stenosis, right anterior cerebral artery stenosis, right middle cerebral artery stenosis, right posterior cerebral artery stenosis, right vertebral artery stenosis, vertebrobasilar artery stenosis, swallowing function assessment, Kubota drinking test;

The information related to the drugs and invasive treatments includes: intravenous thrombolysis, endovascular treatment, antiplatelet treatment within 48 hours, anticoagulant treatment within 48 hours, antihypertensive drugs, lipid-lowering drugs, and hypoglycemic drugs;

The information related to rehabilitation intervention includes: the duration from onset to the first rehabilitation intervention, the duration from onset to the first mobilization, whether early mobilization was completed during the first rehabilitation intervention, the duration of early mobilization in the first rehabilitation intervention, the total duration of early mobilization over 14 days, the average duration of early mobilization over 14 days, duration of physical therapy, duration of occupational therapy, duration of speech therapy, continuous physical therapy for the first 14 days, continuous occupational therapy for the first 14 days, continuous speech therapy for the first 14 days;

S3: Extract target outcome feature data;

Extract the post-stroke mRS scores, which are the prediction target, at different time steps. The extracted time steps are the day of admission, 15 days after stroke, 30 days after stroke, 90 days after stroke, and 180 days after stroke. The post-stroke mRS score is dichotomized: a favorable outcome is an mRS score of 0-2, and an unfavorable outcome is an mRS score of 3-6, corresponding to moderate or severe disability, or death;

Big data on the clinical manifestations of ischemic stroke can be obtained through steps S1 to S3.
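The dichotomization in S3 can be illustrated with a minimal sketch. The function names, the 0/1 encoding of favorable and unfavorable outcomes, and the example patient are assumptions made for illustration; the source specifies only the score ranges and the time steps.

```python
# Time steps (days after stroke) named in S3.
TIME_STEPS = (0, 15, 30, 90, 180)

def dichotomize_mrs(score: int) -> int:
    """Return 0 for a favorable outcome (mRS 0-2), 1 for an unfavorable one (mRS 3-6).

    The 0/1 encoding is an assumption; the source defines only the two ranges.
    """
    if not 0 <= score <= 6:
        raise ValueError(f"mRS must be in 0..6, got {score}")
    return 0 if score <= 2 else 1

def label_patient(mrs_by_day: dict) -> dict:
    """Map every time step to its binary outcome; unrecorded steps stay None."""
    return {
        day: (None if mrs_by_day.get(day) is None
              else dichotomize_mrs(mrs_by_day[day]))
        for day in TIME_STEPS
    }

# Hypothetical patient whose 180-day follow-up is missing.
example = {0: 4, 15: 3, 30: 3, 90: 2}
labels = label_patient(example)
```

Keeping missing follow-ups as `None` rather than guessing a value leaves the handling of missing scores to the later cleaning and modeling steps.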

S4: Feature data standardization and data cleaning;

Standardize the feature data in the big data on the clinical manifestations of ischemic stroke obtained in S3, and adopt a missing-data strategy: patients with more than 50% of feature variables missing are excluded, and the remaining missing values are filled in from the existing data for the same feature, with the mean for continuous variables and the mode for categorical variables; all data are then standardized to zero mean and unit variance;
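The cleaning steps in S4 can be sketched with the Python standard library as follows. The record layout (a list of dictionaries with `None` for missing values), the function name `clean`, and the toy feature names are assumptions. The sketch standardizes only the continuous variables, because the source does not say how categorical variables are encoded before standardization.

```python
from statistics import mean, mode, pstdev

def clean(records, continuous, categorical, max_missing=0.5):
    """Apply the S4 steps to a list of dicts in which None marks a missing value."""
    features = continuous + categorical
    # 1. Exclude patients with more than 50% of feature variables missing.
    kept = [
        dict(r) for r in records
        if sum(r.get(f) is None for f in features) / len(features) <= max_missing
    ]
    # 2. Impute: mean for continuous variables, mode for categorical variables.
    for f in features:
        observed = [r[f] for r in kept if r.get(f) is not None]
        fill = mean(observed) if f in continuous else mode(observed)
        for r in kept:
            if r.get(f) is None:
                r[f] = fill
    # 3. Standardize continuous variables to zero mean and unit variance.
    for f in continuous:
        values = [r[f] for r in kept]
        mu, sigma = mean(values), pstdev(values)
        for r in kept:
            r[f] = (r[f] - mu) / sigma if sigma > 0 else 0.0
    return kept

# Hypothetical toy records; the last one has 2 of 3 features missing.
records = [
    {"age": 60, "sbp": 140, "sex": "M"},
    {"age": 70, "sbp": None, "sex": "F"},
    {"age": None, "sbp": 160, "sex": "M"},
    {"age": None, "sbp": None, "sex": "F"},
]
cleaned = clean(records, continuous=["age", "sbp"], categorical=["sex"])
ages = [r["age"] for r in cleaned]
```

In practice the imputation and scaling statistics would be computed on the development set only and then reused on the test set, to avoid leaking test information into training.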

S5: Establish machine learning model 1—XGBoost;

Input the demographic information, laboratory and clinical examination-related information, drug and invasive treatment-related information, and rehabilitation intervention-related information extracted in step S2 into the XGBoost model for binary mRS-90 prediction;

The XGBoost model consists of XGBoost decision trees and the relationship between them; each XGBoost decision tree includes multiple nodes, and each node is a medical feature with a threshold; the trees are related through a gradient descent optimization algorithm, in which each subsequent decision tree is obtained from the preceding one;

Feature screening in the XGBoost model: XGBoost is used to automatically find the features most relevant to the binary mRS-90 target outcome; the estimator is trained on the development set with the initial features, and hyperparameters are optimized through grid search or three-fold cross-validation; the trained model produces a ranked list of key features and quantifies their relative importance by assigning a weight to each variable; the "weight" is the total number of times a feature is used to split the data across all trees, and is the measure of feature importance in XGBoost;
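The "weight" importance described above amounts to counting split nodes per feature over all trees. The following sketch shows that counting on a hypothetical nested-dictionary tree representation; the tree layout, feature names, and thresholds are illustrative assumptions and are not the XGBoost library's internal format.

```python
from collections import Counter

def count_splits(node, counts):
    """Recursively count how often each feature is used as a split in one tree."""
    if "feature" not in node:          # leaf node: no split to count
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)

def weight_importance(trees):
    """Total number of splits per feature over all trees, most used first."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts.most_common()

# Two hypothetical trees; feature names and thresholds are illustrative only.
leaf = {"value": 0.0}
trees = [
    {"feature": "age", "threshold": 65, "left": leaf,
     "right": {"feature": "systolic_bp", "threshold": 150,
               "left": leaf, "right": leaf}},
    {"feature": "age", "threshold": 72, "left": leaf, "right": leaf},
]
ranking = weight_importance(trees)
```

With the XGBoost Python package, the same quantity is available from a trained booster through `Booster.get_score(importance_type="weight")`.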

Feature analysis in the XGBoost model: standard data samples are computed, and the statistical methods used to screen relevant features are the t-test, the Mann-Whitney U test, and Kruskal-Wallis one-way analysis of variance, all of which are commonly used statistical methods; the present invention uses these methods and related software to calculate the probability value P. With the threshold set at P < 0.05, the selected features can be considered significantly associated with the binary mRS-90 target for ischemic stroke, and it is reasonable to build the model on them; next, hierarchical clustering analysis is performed on the selected feature variables and all rehabilitation intervention-related information; the distance metric for the hierarchical clustering is 'euclidean', the linkage method is Ward's method, and the implementation uses the open-source library seaborn;

First, the selected demographic and clinical feature information, all rehabilitation intervention-related feature information, and the mRS are used as input information to conduct a modeling experiment;

The modeling experiment uses four machine learning algorithms, with automated hyperparameter tuning performed systematically by grid search. During the grid search, the F1 score is used as the model evaluation criterion and 5-fold cross-validation is used to select the optimal model;
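The selection procedure (grid search scored by F1 under 5-fold cross-validation) can be sketched in plain Python. The source does not name the four algorithms, so a one-dimensional k-nearest-neighbour classifier is used here purely as a stand-in; the data, the interleaved fold assignment, and all function names are assumptions.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def k_fold_indices(n, k=5):
    """Yield (train, validation) index lists; fold i holds samples i, i+k, i+2k, ..."""
    for i in range(k):
        val = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, val

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points nearest to x (stand-in model)."""
    nearest = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))[:k]
    votes = sum(train_y[j] for j in nearest)
    return 1 if 2 * votes > k else 0

def grid_search(xs, ys, grid, folds=5):
    """Return (best hyperparameter, its mean cross-validated F1 score)."""
    results = {}
    for k in grid:
        scores = []
        for train, val in k_fold_indices(len(xs), folds):
            tx, ty = [xs[j] for j in train], [ys[j] for j in train]
            preds = [knn_predict(tx, ty, xs[j], k) for j in val]
            scores.append(f1_score([ys[j] for j in val], preds))
        results[k] = sum(scores) / folds
    best = max(results, key=results.get)
    return best, results[best]

# Hypothetical one-feature data set: low values are class 0, high values class 1.
xs = list(range(20))
ys = [0] * 10 + [1] * 10
best_k, best_f1 = grid_search(xs, ys, grid=(1, 3, 5))
```

F1 is a reasonable selection criterion when favorable and unfavorable outcomes are imbalanced, since it ignores true negatives and balances precision against recall.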

S6: Establish machine learning model 2—CNN-LSTM;

The convolutional neural network (CNN) is used as the backbone network and combined with the long short-term memory (LSTM) network model with forget gate to perform time series modeling that focuses on the patient's recovery at each time step and on the development of mRS recovery.

The information used by the model includes the demographic feature information and clinical feature information screened out by XGBoost, all rehabilitation intervention-related feature information, the mRS scores, and the corresponding time step information; the demographic, clinical, and rehabilitation intervention-related feature information is non-sequential information, and the mRS scores are temporal information; the mRS scores include the mRS-0, mRS-15, mRS-30, mRS-90, and mRS-180 scores;

Using the above information as input, a network structure that cascades a convolutional neural network with a recurrent neural network is constructed. So that the patient's non-sequential state information is available at every time step, a convolutional neural network built from multiple stacked fully connected layers first aggregates and extracts features from the non-sequential state information, and a sigmoid activation function then produces the score of the non-sequential state information;

The generated score is combined with the temporal information and the corresponding time step information and fed into the LSTM network;

The LSTM model is trained to learn how each patient's mRS evolves during rehabilitation;

Finally, an attention mechanism performs weighted fusion of features across time steps, so that the mRS prediction at each time step is tied more closely to the mRS at all time steps before the current one;
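The final fusion step can be pictured as attention restricted to the current and earlier time steps. The sketch below is one possible reading, not the patented network: it assumes dot-product scores with softmax weights, takes the per-step LSTM outputs as given, and uses made-up two-dimensional hidden states for the five time steps.

```python
import math

def causal_attention_fusion(hidden):
    """For each time step t, fuse hidden states 0..t with softmax dot-product weights.

    Dot-product scoring is an assumption; the source states only that features
    are fused across time steps with weights.
    """
    fused = []
    for t, query in enumerate(hidden):
        history = hidden[: t + 1]              # only steps up to the current one
        scores = [sum(q * h for q, h in zip(query, state)) for state in history]
        top = max(scores)                      # subtract max for numerical stability
        exp = [math.exp(s - top) for s in scores]
        total = sum(exp)
        weights = [e / total for e in exp]
        fused.append([
            sum(w * state[d] for w, state in zip(weights, history))
            for d in range(len(query))
        ])
    return fused

# Hypothetical LSTM outputs for days 0, 15, 30, 90 and 180.
hidden_states = [[0.2, 0.1], [0.4, 0.3], [0.5, 0.9], [0.1, 0.8], [0.3, 0.7]]
fused_states = causal_attention_fusion(hidden_states)
```

Restricting the weights to earlier steps keeps each prediction from depending on follow-up scores that would not yet be known at that point in time.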

S7: Establish machine learning model 3 - simulation observation - key point selection;

Use the trained CNN-LSTM time series model to conduct testing and evaluation under different conditions of missing mRS scores, to better explore the impact of the mRS score at each time step of the follow-up process on the patient's recovery;

The time series model here simulates rehabilitation progress: it learns to track the patient's recovery at each time step, and changing the model's input yields the rehabilitation progress under different circumstances;

By varying the model input in this way, the impact of the mRS score at different time steps is compared; in this model, mRS-180 represents the patient's final recovery status, and mRS-15, mRS-30, and mRS-90 are the time steps at which the impact of a missing mRS score is explored and analyzed;
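The input variations used in S7 can be enumerated as below. This is a sketch under stated assumptions: a withheld score is replaced by 0 and accompanied by an observed flag of 0, which is one common encoding but is not specified in the source; the function name and the example patient are also hypothetical.

```python
from itertools import combinations

MASKABLE_STEPS = (15, 30, 90)   # steps whose mRS score is withheld in turn
TARGET_STEP = 180               # mRS-180 represents the final recovery status

def masked_variants(mrs_by_day):
    """Yield (masked_steps, sequence); each entry is (day, score, observed_flag)."""
    for r in range(len(MASKABLE_STEPS) + 1):
        for masked in combinations(MASKABLE_STEPS, r):
            sequence = []
            for day in (0,) + MASKABLE_STEPS:
                hidden = day in masked
                # Assumed encoding: withheld scores become 0 with flag 0.
                sequence.append(
                    (day, 0 if hidden else mrs_by_day[day], 0 if hidden else 1))
            yield masked, sequence

# Hypothetical patient with a complete follow-up.
patient = {0: 4, 15: 3, 30: 3, 90: 2, 180: 2}
variants = dict(masked_variants(patient))
target = patient[TARGET_STEP]
```

Feeding each variant to the trained model and comparing the resulting mRS-180 predictions against `target` shows which intermediate follow-up scores matter most.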

S8: Compare results and select model;

Comparison of the results of S5, S6 and S7 shows that the CNN-LSTM model has the best specificity and sensitivity, and it is therefore selected as the prediction model for functional prognosis after early rehabilitation of ischemic stroke.

Compared with the prior art, the beneficial effects of the present invention are:

(1) The present invention uses a machine learning CNN-LSTM integrated algorithm to establish a prediction model for functional prognosis after early rehabilitation of ischemic stroke. Based on demographic information, laboratory and clinical examination-related information, drug and invasive treatment-related information, and rehabilitation intervention-related information, it predicts the mRS-90 outcome of patients with ischemic stroke. The CNN-LSTM model performed well in predicting the functional prognosis of ischemic stroke patients after early rehabilitation and showed better prediction performance than four traditional algorithms. The AUCs of the CNN-LSTM model on the test set are 0.829 (mRS-15), 0.706 (mRS-30), 0.809 (mRS-90) and 0.730 (mRS-180) respectively. In addition, the mRS-15 and mRS-30 information is a key feature by which the CNN-LSTM model improves its prediction of mRS-180.

(2) The machine learning-based prediction model of functional prognosis after early rehabilitation of ischemic stroke constructed by this invention, CNN-LSTM, can perform functional prediction in the early stage of stroke and provide accurate guidance for the formulation of subsequent rehabilitation training programs, so that the functions of stroke patients are better restored, while saving medical resources and reducing unnecessary consumption of manpower and material resources.

Description of the drawings

Figure 1 is a flow chart of the method of model construction in the present invention.

Detailed description

In order to further understand the purpose, structure, characteristics, and functions of the present invention, detailed descriptions are given below with reference to the embodiments.

1. Please refer to Figure 1. The present invention provides a method for establishing a functional prediction model after early recovery from stroke based on deep learning, which includes the following steps:

S1: Establish a disease database;

Prepare medical record data, collect patient electronic medical records from the hospital electronic medical record platform, and collect electronic medical records of ischemic stroke patients undergoing early rehabilitation.

S2: Extract patient medical characteristic data;

Extract the ischemic stroke medical features and their medical feature values from the qualified electronic medical record data obtained in S1; the ischemic stroke features include demographic information, laboratory and clinical examination-related information, information related to drugs and invasive treatments, and information related to rehabilitation interventions;

The past medical history includes ischemic stroke, hemorrhagic stroke, subarachnoid hemorrhage, unclassified stroke, hypertension, diabetes, dyslipidemia, atrial fibrillation, coronary heart disease, myocardial infarction, congenital heart disease, valvular heart disease, other types of heart disease, and peripheral arterial disease; the family history includes stroke, coronary heart disease, hypertension, diabetes, dyslipidemia, and intracranial aneurysm.

The electrocardiogram findings include atrial fibrillation, atrial flutter, left ventricular hypertrophy, Q wave, acute myocardial infarction, myocardial ischemia and others; the structural imaging examination results include hemorrhagic transformation after cerebral infarction, new cerebral infarction, old cerebral infarction, and others.

The information related to the drugs and invasive treatments includes: intravenous thrombolysis, endovascular treatment, antiplatelet treatment within 48 hours, anticoagulant treatment within 48 hours, antihypertensive drugs, lipid-lowering drugs, and hypoglycemic drugs;

The endovascular treatment includes stent thrombectomy, direct thrombus aspiration, balloon dilatation, intravascular stent-assisted angioplasty, intra-arterial thrombolysis and mechanical thrombolysis; the antiplatelet treatment within 48 hours includes aspirin, clopidogrel, ozagrel, dipyridamole, ticlopidine, cilostazol and others; the anticoagulant treatment within 48 hours includes warfarin, rivaroxaban, dabigatran, apixaban, edoxaban, low molecular weight heparin, unfractionated heparin and others; the antihypertensive drugs include angiotensin-converting enzyme inhibitors, angiotensin receptor blockers, diuretics, beta-blockers, calcium channel blockers and others; the lipid-lowering drugs include statins, niacin and its derivatives, fibrates, cholesterol absorption inhibitors and others; the hypoglycemic drugs include insulin, sulfonylureas, biguanides, glycosidase inhibitors, insulin sensitizers, insulin secretion promoters and others.

The information related to rehabilitation intervention includes: the duration from onset to the first rehabilitation intervention, the duration from onset to the first mobilization, whether early mobilization was completed during the first rehabilitation intervention, the duration of early mobilization in the first rehabilitation intervention, the total duration of early mobilization over 14 days, the average duration of early mobilization over 14 days, duration of physical therapy, duration of occupational therapy, duration of speech therapy, continuous physical therapy for the first 14 days, continuous occupational therapy for the first 14 days, continuous speech therapy for the first 14 days.

S3: Extract target outcome feature data;

The post-stroke mRS score is the prediction target and is extracted at different time steps: mRS-0 (baseline, the day of admission), mRS-15 (15 days after stroke), mRS-30 (30 days after stroke), mRS-90 (90 days after stroke), and mRS-180 (180 days after stroke);

The post-stroke mRS score is dichotomized: a favorable outcome is an mRS score of 0-2, indicating no or minimal disability; an unfavorable outcome is an mRS score of 3-6, indicating moderate or severe disability, or death. Through steps S1 to S3, big data on the clinical manifestations of ischemic stroke can be obtained.

S4: Feature data standardization and data cleaning;

Standardize the feature data in the big data on the clinical manifestations of ischemic stroke obtained in S3, and adopt a missing-data strategy: patients with more than 50% of feature variables missing are excluded, and the remaining missing values are filled in from the existing data for the same feature, with the mean for continuous variables and the mode for categorical variables; all data are then standardized to zero mean and unit variance.

S5: Establish machine learning model 1—XGBoost;

Input demographic information, laboratory and clinical examination-related information, drug and invasive treatment-related information, and rehabilitation intervention-related information into the XGBoost model for binary mRS-90 prediction;

Feature analysis in the XGBoost model: standard data samples are computed, and the statistical methods used to screen relevant features are the t-test, the Mann-Whitney U test, and Kruskal-Wallis one-way analysis of variance, all of which are commonly used statistical methods; the present invention uses these methods and related software to calculate the probability value P. With the threshold set at P < 0.05, the selected features can be considered significantly associated with the binary mRS-90 target for ischemic stroke, and it is reasonable to build the model on them; next, hierarchical clustering analysis is performed on the selected feature variables and all rehabilitation intervention-related information; the distance metric for the hierarchical clustering is 'euclidean', the linkage method is Ward's method, and the implementation uses the open-source library seaborn;

The characteristic information filtered in the above steps includes the following:

Demographic and clinical information: smoking age, antidiabetic drugs (biguanides), number of cigarettes smoked per day, past medical history (diabetes), antiplatelet treatment within 48 hours, anticoagulant treatment within 48 hours (other), drinking history, family history (stroke), carotid artery vascular examination (carotid bulb stenosis), family history (hypertension), occupation, duration of diabetes (years), family history (coronary heart disease), systolic blood pressure, whether it is the first cerebrovascular accident, structural imaging examination results (new cerebral infarction), education, swallowing function assessment, anticoagulant treatment within 48 hours, structural imaging examination results (old cerebral infarction), lipoprotein(a), structural imaging examination results (other), past medical history (hypertension), duration of hypertension (years), antihypertensive drugs (angiotensin-converting enzyme inhibitors), heart rate, gender, lipid-lowering drugs, age, triglycerides, OCSP classification, partial thromboplastin time, total cholesterol, weight, prothrombin time-international normalized ratio, homocysteine, high-density lipoprotein cholesterol, glycosylated hemoglobin, Kubota drinking test, diastolic blood pressure, right internal carotid artery intracranial stenosis, internal carotid artery stenosis, height, TOAST classification, BMI, low-density lipoprotein cholesterol;

Rehabilitation intervention-related information: time from onset to first rehabilitation treatment (hours), time from onset to first getting out of bed (hours), whether getting out of bed was completed during the first rehabilitation treatment, time spent out of bed during the first out-of-bed rehabilitation treatment (minutes), total time out of bed in 14 days, average time out of bed in 14 days (minutes), physical therapy duration (days), occupational therapy duration (days), speech therapy duration (days), continuous physical therapy for 14 days, continuous occupational therapy for 14 days, continuous speech therapy for 14 days;

Time step information (days): The value is {0, 15, 30, 90, 180}.

S6: Establish machine learning model 2—CNN-LSTM;

Using the convolutional neural network (CNN) as the backbone network, combined with the long short-term memory (LSTM) network model with forget gate, time series modeling is performed that focuses on the patient's recovery at each time step and on the development of mRS recovery.

The loss function used is as follows:

$$L_{total} = \theta L_{mse} + (1 - \theta) L_{fn}$$

where $L_{total}$ is the overall loss; $L_{mse}$ is the weighted mean square error loss (the mean square error term makes the prediction performance at different time steps more balanced); $L_{fn}$ is the weighted focal loss (focal loss effectively alleviates the complex class imbalance between time steps and performs hard-sample mining); $W_{mask}$ is the weighting coefficient; $\theta$ is the weighting factor, with value range [0, 1], set to 0.25 in this model; $P$ is the prediction result; $p'$ is the label; $N$ is the number of samples; $\alpha$ is a weighting factor, with value range [0, 1], set to 1 in this model; and $\gamma$ is the focusing parameter, with value range generally [0, 5], set to 2 in this model.
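The combined loss can be sketched as follows. The text gives only the combination rule and the parameter values, not the component formulas, so this sketch assumes the standard binary focal loss form, -α(1 - p_t)^γ log(p_t), and a mask-weighted mean for both terms; normalizing by the sum of mask weights rather than by N is also an assumption. Here `pred`, `label`, and `mask` stand for P, p', and W_mask.

```python
import math

def weighted_mse(pred, label, mask):
    """Mask-weighted mean squared error (assumed form of L_mse)."""
    total = sum(w * (p - y) ** 2 for p, y, w in zip(pred, label, mask))
    return total / max(sum(mask), 1e-12)

def weighted_focal(pred, label, mask, alpha=1.0, gamma=2.0, eps=1e-7):
    """Mask-weighted binary focal loss (assumed form of L_fn)."""
    total = 0.0
    for p, y, w in zip(pred, label, mask):
        p_t = p if y == 1 else 1.0 - p          # probability given to the true class
        p_t = min(max(p_t, eps), 1.0 - eps)     # clamp to keep log() finite
        total += w * -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / max(sum(mask), 1e-12)

def total_loss(pred, label, mask, theta=0.25, alpha=1.0, gamma=2.0):
    """L_total = theta * L_mse + (1 - theta) * L_fn, with the stated defaults."""
    return (theta * weighted_mse(pred, label, mask)
            + (1.0 - theta) * weighted_focal(pred, label, mask, alpha, gamma))

# Hypothetical predictions for three time steps; the third label is missing,
# so its mask weight is 0 and it does not contribute to the loss.
loss = total_loss(pred=[0.9, 0.2, 0.6], label=[1, 0, 1], mask=[1, 1, 0])
```

With γ = 2, well-classified entries (p_t close to 1) are down-weighted strongly, so training concentrates on the hard, minority-class time steps, while the mean square error term keeps the per-step predictions balanced.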

Use the trained time series model to conduct testing and evaluation under different conditions of missing mRS scores, to better explore the impact of the mRS score at each time step of the follow-up process on the patient's recovery;

The time series model here simulates rehabilitation progress: it learns to track the patient's recovery at each time step, and changing the model's input yields the rehabilitation progress under different circumstances;

By varying the model input in this way, the impact of the mRS score at different time steps is compared; in this model, mRS-180 represents the patient's final recovery status, and mRS-15, mRS-30, and mRS-90 are the time steps at which the impact of a missing mRS score is explored and analyzed.

S8: Compare results and select model;

The present invention has been described by the above-mentioned relevant embodiments, but the above-mentioned embodiments are only examples of implementing the present invention. It must be noted that the disclosed embodiments do not limit the scope of the present invention. On the contrary, any changes and modifications made without departing from the spirit and scope of the present invention shall fall within the scope of patent protection of the present invention.

Claims

A deep learning-based method for establishing a functional prediction model after early recovery from stroke, characterized in that: the model establishment method includes the following steps:

S1: Establish a disease database;

Prepare medical record data: collect patient electronic medical records from the hospital electronic medical record platform, specifically the electronic medical records of ischemic stroke patients undergoing early rehabilitation; the medical record data with a first diagnosis of ischemic stroke are taken as the qualified electronic medical record data;

S2: Extract patient medical characteristic data;

Extract the ischemic stroke medical features and their medical feature values from the qualified electronic medical record data obtained in S1; the ischemic stroke features include demographic information, laboratory and clinical examination-related information, information related to drugs and invasive treatments, and information related to rehabilitation interventions; these serve as the material for prediction;

S3: Extract target outcome feature data;

Extract the post-stroke mRS scores, which are the prediction target, at different time steps; the extracted time steps are the day of admission, 15 days after stroke, 30 days after stroke, 90 days after stroke, and 180 days after stroke;

The post-stroke mRS score is dichotomized: a favorable outcome is an mRS score of 0-2, and an unfavorable outcome is an mRS score of 3-6, corresponding to moderate or severe disability, or death; big data on the clinical manifestations of ischemic stroke can be obtained through steps S1 to S3;

S4: Feature data standardization and data cleaning;

Standardize the feature data in the big data on the clinical manifestations of ischemic stroke obtained in S3, and adopt a missing-data strategy: patients with more than 50% of feature variables missing are excluded, and the remaining missing values are filled in from the existing data for the same feature, with the mean for continuous variables and the mode for categorical variables; all data are then standardized to zero mean and unit variance;

S5: Establish machine learning model 1—XGBoost;

Input the demographic information, laboratory and clinical examination-related information, drug and invasive treatment-related information, and rehabilitation intervention-related information extracted in step S2 into the XGBoost model for binary mRS-90 prediction;

The XGBoost model consists of XGBoost decision trees and the relationship between them; each XGBoost decision tree includes multiple nodes, and each node is a medical feature with a threshold; the trees are related through a gradient descent optimization algorithm, in which each subsequent decision tree is obtained from the preceding one; feature screening and feature analysis are performed on the XGBoost model; the feature variables are screened out and modeled using XGBoost, and an XGBoost prediction model is finally established;

S6: Establish machine learning model 2—CNN-LSTM;

The convolutional neural network CNN is used as the backbone network and combined with the long short-term memory network model with forgetting gate - LSTM, focusing on the patient's recovery at each time step, as well as the mRS recovery development situation for time-series modeling.

The information used by the CNN-LSTM model includes the demographic feature information and clinical feature information screened by XGBoost, all rehabilitation intervention-related feature information, the mRS scores, and the corresponding time step information; among these, the demographic feature information, the clinical feature information, and all rehabilitation intervention-related feature information are non-sequential information, and the mRS scores are temporal information; the mRS scores include the scores of mRS-0, mRS-15, mRS-30, mRS-90, and mRS-180;

Using the above demographic feature information and clinical feature information, all rehabilitation intervention-related feature information, the mRS scores, and the corresponding time step information as input, a network structure that cascades a convolutional neural network and a recurrent neural network is constructed. So that each patient's non-sequential state information is available at every time step, a convolutional neural network stacked with multiple fully connected layers first aggregates and extracts features from the non-sequential state information, and a sigmoid activation function finally produces the score of the non-sequential state information; the generated score is then combined with the temporal information and the corresponding time step information and fed into the LSTM network; the LSTM model is trained to learn the development of each patient's mRS recovery; finally, an attention mechanism performs weighted fusion of features across time steps, so that the mRS prediction at each time step attends more closely to the mRS at all time steps before the current time step;
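The data flow of this paragraph can be sketched in plain Python under strong simplifying assumptions. A single linear layer with a sigmoid stands in for the CNN and fully connected stack, the LSTM itself is omitted (the raw per-step vectors stand in for its hidden states), and the attention is a simple dot-product softmax over the current and all earlier steps. All function names, weights, and patient values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def static_score(static_features, weights, bias):
    # Stand-in for the CNN / fully connected stack: linear layer plus sigmoid.
    return sigmoid(sum(w * x for w, x in zip(weights, static_features)) + bias)


def build_sequence(score, mrs_by_day, horizon=180.0):
    # Every time step carries the same non-sequential score, its mRS score,
    # and its time step scaled to [0, 1].
    return [[score, float(mrs), day / horizon]
            for day, mrs in sorted(mrs_by_day.items())]


def causal_attention(states):
    """Fuse each step with itself and all earlier steps (softmax weights)."""
    fused = []
    for t, query in enumerate(states):
        logits = [sum(q * k for q, k in zip(query, states[s]))
                  for s in range(t + 1)]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        fused.append([sum(w * states[s][d] for s, w in enumerate(weights))
                      for d in range(len(query))])
    return fused


# Hypothetical patient: three static features and five follow-up scores.
score = static_score([0.3, -1.2, 0.8], [0.5, 0.1, -0.4], 0.0)
sequence = build_sequence(score, {0: 4, 15: 4, 30: 3, 90: 2, 180: 2})
fused = causal_attention(sequence)
```

In a trained model the attention input would be the LSTM hidden states rather than the raw step vectors; the sketch only shows how the static score is repeated at every step and how later steps are fused with earlier ones.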

Step S6 uses the feature variables screened out in step S5 to establish a CNN-LSTM prediction model by combining the convolutional neural network (CNN), as the backbone network, with the long short-term memory (LSTM) network model with a forget gate;

S7: Establish machine learning model 3 - simulation observation - key point selection;

Use the CNN-LSTM model trained in step S6 for testing and evaluation under different conditions of missing mRS scores, to better explore the impact of the mRS scores at each time step of the follow-up process on the patient's recovery;

The CNN-LSTM model simulates the rehabilitation progress by learning the patient's recovery at each time step; changing the input of the model yields the rehabilitation progress under different circumstances;

By changing the input of the CNN-LSTM model, the impact of the mRS scores at different time steps is compared; in this model, mRS-180 represents the patient's final recovery status, and the impact of a missing mRS score at each of the three time steps mRS-15, mRS-30, and mRS-90 is then explored and analyzed;

Step S7 repeats the modeling process of step S6 several times, each time removing one variable relative to step S6 (the mRS score at one time step), compares the results of the different models, and obtains the best-performing prediction model;
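The input variants for this comparison could be generated as in the sketch below: one full input plus one copy per withheld time step, each of which would then be passed through the modeling process of step S6. The dictionary layout, the use of `None` to mark a withheld score, and the patient values are assumptions made for illustration.

```python
ABLATED_STEPS = ["mRS-15", "mRS-30", "mRS-90"]


def ablation_variants(patient_mrs):
    """Return the full input plus one variant per withheld time step."""
    variants = {"full": dict(patient_mrs)}
    for step in ABLATED_STEPS:
        masked = dict(patient_mrs)   # copy, so the original stays untouched
        masked[step] = None          # None marks the score withheld from the model
        variants["without " + step] = masked
    return variants


# Hypothetical patient; mRS-180 stays in place as the final recovery status.
patient = {"mRS-0": 4, "mRS-15": 4, "mRS-30": 3, "mRS-90": 2, "mRS-180": 2}
variants = ablation_variants(patient)
```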

S8: Compare results and select model;

After comparing the results of S5, S6 and S7, the CNN-LSTM model was found to have the best specificity and sensitivity, and was selected as the predictive model for functional prognosis after early recovery from ischemic stroke.
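Sensitivity and specificity, the two criteria used in this comparison, can be computed from binary predictions as sketched below. Treating the unfavorable outcome as the positive class (label 1) is an assumption, and the label vectors are placeholders rather than results from the source.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels; 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Placeholder labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)  # (0.75, 0.75)
```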
A method for establishing a functional prediction model after early recovery from stroke based on deep learning according to claim 1, characterized in that: in step S2, the medical feature values are the specific values of each medical feature among the demographic, laboratory and clinical examination, drug and invasive treatment, and rehabilitation intervention features; the demographic information includes gender, age, occupation, marital status, education, height, weight, BMI, systolic blood pressure, diastolic blood pressure, heart rate, whether it is the first cerebrovascular accident, TOAST classification, OCSP classification, past history, hypertension duration, diabetes duration, smoking status, years of smoking, number of cigarettes smoked per day, smoking index, drinking history, regular physical activity, and family history; the laboratory and clinical examination-related information includes glycosylated hemoglobin, triglycerides, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, lipoprotein(a), homocysteine, partial thromboplastin time, prothrombin time-international normalized ratio, electrocardiogram, structural imaging examination results, common carotid artery stenosis, carotid bulb stenosis, internal carotid artery stenosis, subclavian artery stenosis, left internal carotid artery intracranial stenosis, left anterior cerebral artery stenosis, left middle cerebral artery stenosis, left posterior cerebral artery stenosis, left vertebral artery stenosis, right internal carotid artery intracranial stenosis, right anterior cerebral artery stenosis, right middle cerebral artery stenosis, right posterior cerebral artery stenosis, right vertebral artery stenosis, vertebrobasilar artery stenosis, swallowing function assessment, and the Kubota drinking test; the drug and invasive treatment-related information includes intravenous thrombolysis, endovascular treatment, antiplatelet treatment within 48 hours, anticoagulant treatment within 48 hours, antihypertensive drugs, lipid-lowering drugs, and hypoglycemic drugs; the rehabilitation intervention-related information includes the duration from onset to the first rehabilitation intervention, the duration from onset to the first mobilization, the benefits of early mobilization in the first rehabilitation intervention, the duration of early mobilization in the first rehabilitation intervention, the total duration of early mobilization over 14 days, the average duration of early mobilization over 14 days, the duration of physical therapy, the duration of occupational therapy, the duration of speech therapy, continuous physical therapy for the first 14 days, continuous physical therapy for the first 14 days, and continuous speech therapy for the first 14 days.
A method for establishing a functional prediction model after early recovery from stroke based on deep learning as claimed in claim 1, characterized in that: the XGBoost model feature screening in step S5 refers to using XGBoost to automatically find the features most relevant to the target outcome, the binary mRS90 classification; the estimator is trained with the initial features on the development set, and three-fold cross-validation with grid search is used for parameter tuning or hyperparameter optimization; the trained model generates a ranking of key features, quantifying their relative importance by assigning a weight to each variable; the XGBoost model feature analysis refers to calculation on standardized data samples, and the statistical methods for screening relevant features are the t-test, the Mann-Whitney U test, and the Kruskal-Wallis one-way analysis of variance; next, hierarchical cluster analysis is performed on the selected feature variables and all rehabilitation intervention-related information; the distance metric used for the hierarchical clustering is 'euclidean', the linkage method is Ward's method, and the open source library seaborn is used for the implementation; the selected demographic feature information and clinical feature information, all rehabilitation intervention-related feature information, and mRS are used as input information for the initial modeling test.
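Of the statistical tests named in this claim, the Mann-Whitney U statistic is simple enough to sketch directly. The version below computes only the U statistics, with average ranks for ties, and omits the p-value; the function name and sample values are hypothetical, and in practice a statistics library would normally be used.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two samples, using average ranks for ties."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the mean of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    return u_a, n_a * n_b - u_a


# Hypothetical feature values in the favorable and unfavorable groups.
u_fav, u_unfav = mann_whitney_u([1, 2, 3], [4, 5, 6])  # (0.0, 9.0)
```

A U value of 0 for one group means every value in that group lies below every value in the other, which is the strongest possible separation for a candidate feature.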
A method for establishing a functional prediction model after early recovery from stroke based on deep learning as claimed in claim 3, characterized in that: the modeling test refers to building models on the development set with four machine learning algorithms: XGBoost, SVM, random forest (RF), and logistic regression (LR). During the modeling process, each machine learning method uses grid search to perform automated hyperparameter tuning systematically; the F1 score is the model evaluation criterion in the grid search, and 5-fold cross-validation is used to select the optimal model.
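The selection procedure in this claim (grid search scored by F1 under 5-fold cross-validation) can be sketched without any library. A one-dimensional k-nearest-neighbors classifier stands in for the four algorithms named in the claim, purely so that the example has a hyperparameter to tune; the data, the grid, and all function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_x, train_y, x, n_neighbors):
    # Stand-in model: majority vote among the nearest training points.
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))
    votes = [label for _, label in nearest[:n_neighbors]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def grid_search_cv(xs, ys, grid, k=5):
    """Return (best hyperparameter, mean validation F1) over k folds."""
    folds = k_fold_indices(len(xs), k)
    best_param, best_f1 = None, -1.0
    for n_neighbors in grid:
        fold_scores = []
        for fold in folds:
            held_out = set(fold)
            train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
            train_y = [ys[i] for i in range(len(xs)) if i not in held_out]
            preds = [knn_predict(train_x, train_y, xs[i], n_neighbors)
                     for i in fold]
            fold_scores.append(f1_score([ys[i] for i in fold], preds))
        mean_f1 = sum(fold_scores) / len(fold_scores)
        if mean_f1 > best_f1:
            best_param, best_f1 = n_neighbors, mean_f1
    return best_param, best_f1


# Hypothetical toy data, ordered so that every fold contains both classes.
xs = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0, 4.0, 6.5, 5.0, 10.0]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best_param, best_f1 = grid_search_cv(xs, ys, grid=[1, 3, 5])
```

Contiguous folds are used for brevity; with real data the folds would normally be shuffled or stratified so that each one preserves the class balance.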