CN117727458A

CN117727458A - BEFAST stroke screening system and method based on online learning

Info

Publication number: CN117727458A
Application number: CN202410174998.0A
Authority: CN
Inventors: 许杰; 王拥军; 冯致远; 薛婧; 缪中荣; 孙瑄; 万俊豪; 王博; 孙文
Original assignee: Beijing Idmed Decision Medical Technology Co ltd; Beidou Yunfang Beijing Health Technology Co ltd; Beijing Tiantan Hospital
Current assignee: Beijing Idmed Decision Medical Technology Co ltd; Beidou Yunfang Beijing Health Technology Co ltd; Beijing Tiantan Hospital
Priority date: 2024-02-07
Filing date: 2024-02-07
Publication date: 2024-03-19
Anticipated expiration: 2044-02-07

Abstract

A BEFAST stroke screening system and method based on online learning. The system is configured with a historical database, a limited data set module, a weight distribution module and a model construction module. The limited data set module screens the historical database for patients whose index distance to a specific patient is small, forming a historical data subset. At the same moment, the weight distribution module performs decision tree classification on the historical data subset and calculates the weight of each item of BEFAST data, and the model construction module constructs a screening model for the specific patient based on the historical data subset and those weights. In each subsequent iteration, the weights calculated by the weight distribution module at the previous moment participate in constructing the historical data subset at the next moment, and thereby influence the next weight calculation and the next construction of the screening model. The screening model is thus optimized continuously through iteration, so that BEFAST multi-modal data acquisition and stroke screening become more accurate over time.

Description

BEFAST stroke screening system and method based on online learning

Technical Field

The invention relates to a BEFAST stroke screening system and method based on online learning.

Background

Stroke screening is the identification of whether a patient is at risk for stroke or has symptoms of early stroke by different medical techniques and methods. When stroke risk factor screening is performed, common technical methods include investigation of lifestyle (smoking, drinking), monitoring of physiological parameters (e.g., measurement of blood pressure, blood sugar, cholesterol levels), head and neck vascular ultrasound examination, heart structure and rhythm related examination, and the like. Regarding the recognition of early symptoms of stroke, a series of scales, including CPSS, ROSIER, LAPSS, are developed at home and abroad. The above techniques and tools are widely used to assist doctors in estimating the risk of a patient's stroke and in rapidly detecting symptoms of stroke. Neuroimaging (MRI, CT scan) is often used to aid diagnosis in cases where stroke is highly suspected.

However, existing stroke screening methods have several drawbacks. First, they are often fragmented, requiring coordination among multiple tests and medical professionals, which makes screening cumbersome and time-consuming. Second, they often rely heavily on subjective judgment, which can lead to inconsistencies and errors in the analysis. In addition, they usually perform a one-time evaluation based on the data currently available to the doctor and cannot continuously evaluate an individual's risk by incorporating past data. Accordingly, there is a need for more efficient, automated and accurate stroke screening methods to improve the early detection and identification of stroke.

Disclosure of Invention

The invention provides a BEFAST stroke screening system and method based on online learning, which effectively solve the problems existing in the prior art.

Specifically, the invention provides a BEFAST stroke screening system based on online learning, which screens a specific patient for stroke once every unit time starting from the 0th moment. The system comprises a historical database, a limited data set module, a weight distribution module and a model construction module. The historical database stores the BEFAST data of N patients. The BEFAST data of the specific patient from the 0th moment to the j-th moment are, respectively, balance indexes B0 to Bj, eye indexes E0 to Ej, face indexes F0 to Fj, arm indexes A0 to Aj and language indexes S0 to Sj. The BEFAST data of each patient in the historical database from the 0th moment to the j-th moment are, respectively, balance indexes b0 to bj, eye indexes e0 to ej, face indexes f0 to fj, arm indexes a0 to aj and language indexes s0 to sj. The limited data set module calculates the index distance between each patient and the specific patient at the 0th moment:

A distance threshold MaxL is set, and the BEFAST data of a patient in the historical database are introduced into the 0th historical data subset M0 only if that patient's index distance is less than or equal to MaxL. The weight distribution module then builds a 0th-level decision tree, taking M0 as the training data set, to perform the weight vector calculation: the mild, moderate and severe difficulty levels corresponding to any one index in the BEFAST data are taken as the root node classification, and stroke risk as the leaf node classification, forming five 0th-level decision trees. The gain coefficients gB0, gE0, gF0, gA0 and gS0 of the five items of BEFAST data are thereby calculated as their weights at the 0th moment and combined into a weight vector g0. The model construction module then constructs a screening model for the specific patient through supervised learning, with the function expression Y = f(M0, g0, θ), where Y is the model output, i.e. the predicted stroke risk of the specific patient, θ denotes the model construction parameters, and the 0th historical data subset M0 and the weight vector g0 are the model inputs. The model construction parameters θ are determined by calculating θ = argmin(y0, f(M0, g0, θ)), where y0 is the actual stroke risk at the 0th moment and f(M0, g0, θ) is the predicted stroke risk. At the 1st moment, the limited data set module introduces the weight vector g0 of the 0th moment to calculate the index distance between each patient in the historical database and the specific patient at the 1st moment.

The limited data set module introduces the BEFAST data of a patient in the historical database into the 1st historical data subset M1 only if that patient's index distance is less than or equal to MaxL. The weight distribution module builds a 1st-level decision tree with M1 as the training data set and performs the weight vector calculation to obtain the weight vector g1 at the 1st moment. The model construction module then inputs g1 and M1 into the function to iteratively determine θ = argmin(y1, f(M1, g1, θ)), where y1 is the actual stroke risk at the 1st moment and f(M1, g1, θ) is the predicted stroke risk.

Further, as time progresses, the integer j takes values starting from 2, so that the moments run from the 0th and 1st moments up to the j-th moment. At the j-th moment, the limited data set module is called to calculate the index distance between each patient in the historical database and the specific patient, using the following calculation formula:

wherein gB_(j-1), gE_(j-1), gF_(j-1), gA_(j-1) and gS_(j-1) are the weights of the five indexes, namely the balance index B, the eye index E, the face index F, the arm index A and the language index S, calculated by the weight distribution module at the (j-1)-th moment.

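Still further, the limited data set module introduces the BEFAST data of a patient in the historical database into the j-th historical data subset Mj only if that patient's index distance is less than or equal to MaxL. The weight distribution module builds a j-th-level decision tree with Mj as the training data set and performs the weight vector calculation to obtain the weight vector gj at the j-th moment. The model construction module then inputs gj and Mj into the function to iteratively determine θ = argmin(yj, f(Mj, gj, θ)), wherein yj is the actual stroke risk probability at the j-th moment and f(Mj, gj, θ) is the predicted stroke risk probability.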

Optionally, the final value of the integer j is 200, 500 or 1000.

Further, when any one of the 1st-level to j-th-level decision trees is calculated, that tree is denoted the p-th-level decision tree. The mild, moderate and severe difficulty levels corresponding to any one of the balance index B, the eye index E, the face index F, the arm index A and the language index S are taken as the root node classification, and stroke risk as the leaf node classification, forming five p-th-level decision trees, from which the gain coefficients gBp, gEp, gFp, gAp and gSp of the five indexes are calculated. The empirical entropy of the p-th-level decision tree is calculated as follows:

$$H(D) = -\sum_{k=1}^{K} \frac{C_k}{D} \log \frac{C_k}{D}$$

wherein K = 3, meaning that all patients in the p-th historical data subset Mp can be classified into three basic categories (low, medium and high risk of stroke), D is the total number of samples in the training data set, i.e. the number of patients in the p-th historical data subset Mp, and C_k is the number of samples in the k-th basic category. Each of the balance index B, the eye index E, the face index F, the arm index A and the language index S is referred to collectively as an index Z, and the mild, moderate and severe difficulty levels corresponding to the index Z are used as the root node classification. This introduces the empirical conditional entropy, calculated as follows:

$$H(D \mid Z) = -\sum_{i=1}^{n} \frac{D_i}{D} \sum_{k=1}^{K} \frac{D_{ik}}{D_i} \log \frac{D_{ik}}{D_i}$$

where H(D|Z) is the empirical conditional entropy with the index Z as the root node classification, D_i is the number of patients in each of the three cases (mild, moderate and severe difficulty) into which the index Z divides the training data set, and D_ik is the number of patients within the i-th case who fall into each of the three categories (low, medium and high risk of stroke), so that n = 3. The gain coefficient gZp of the index Z under the p-th-level decision tree is then gZp = H(D) - H(D|Z). In this way the gain coefficients gBp, gEp, gFp, gAp and gSp of the balance index B, the eye index E, the face index F, the arm index A and the language index S at the p-th moment are obtained as the weights of the five items of BEFAST data at the p-th moment, and are combined to form a weight vector gp.

Optionally, the unit time is 1 hour, 12 hours, 1 day or 1 month.

Optionally, N >100.

The invention also provides a BEFAST stroke screening method based on online learning, which is executed by the system.

In summary, the invention provides a BEFAST stroke screening system and method based on online learning. The system is configured with a historical database, a limited data set module, a weight distribution module and a model construction module. The limited data set module screens the historical database for patients whose index distance to the specific patient is small, forming a historical data subset. At the same moment, the weight distribution module performs decision tree classification on the historical data subset to calculate the weight of each item of BEFAST data, and the model construction module then constructs a screening model for the specific patient based on the historical data subset and those weights. In each subsequent iteration, the weights calculated by the weight distribution module at the previous moment participate in constructing the historical data subset at the next moment, and thereby influence the weight calculation and the construction of the screening model at that moment. The screening model is thus optimized continuously through iteration, so that the BEFAST multi-modal data acquisition and stroke screening of the invention become more accurate over time.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described here show only some embodiments of the present invention; a person skilled in the art can obtain other embodiments and drawings from them without inventive effort.

FIG. 1 illustrates an exemplary schematic diagram of decision tree classification (exemplified by a balance index) by a weight distribution module in a multi-modal stroke screening system according to the present invention;

FIG. 2 shows an iterative evolution diagram of the BEFAST multi-modal data acquisition and stroke screening system based on online learning according to the present invention.

Detailed Description

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art can obtain from the embodiments described here without inventive effort fall within the scope of the invention.

This patent aims to provide an innovative BEFAST stroke screening system and method based on online learning that overcomes the limitations of the prior art and achieves more efficient, automated, accurate and real-time stroke screening and monitoring. The system integrates different types of medical data, including image data, sound data and clinical text data, and in particular the data on balance (B), eyes (E), face (F), arms (A) and speech (S) that are relevant to stroke screening. Through online learning and intelligent algorithms it can identify a patient's stroke risk factors and early stroke symptoms in real time and give timely (T) early warning. In this way, doctors and patients are given a more intelligent and efficient early screening tool with which to prevent and manage the risk of stroke.

It should be noted that the five items of data, balance (B), eyes (E), face (F), arms (A) and speech (S), often differ in importance from one person to another. Because the present invention aims to build a system model for real-time stroke screening, it assigns each of the five items its own weight before the model is built. In addition, of the six elements of BEFAST, time (T) is taken into account in the present invention mainly as the trend of the other five items of data over time.

Medically, the five items are broadly characterized as follows: "B" (Balance) refers to loss of balance or coordination and sudden difficulty walking; "E" (Eyes) refers to sudden changes in vision and difficulty seeing; "F" (Face) refers to facial asymmetry or facial distortion; "A" (Arms) refers to sudden weakness or numbness of an arm, usually on one side of the body; "S" (Speech) refers to slurred speech that others cannot understand.

The stroke screening system of the present invention first needs to build a historical database containing, for N patients, the trends over time (T) of the balance index B, the eye index E, the face index F, the arm index A and the language index S.

It should be noted that the balance index B, the eye index E, the face index F, the arm index A and the language index S can all be quantified. For example, the eye index E, as described above, characterizes difficulty with vision. It can be graded on a scale of 1 to 10 according to the degree of difficulty, where grades 1-3 indicate mild difficulty, grades 4-7 indicate moderate difficulty and grades 8-10 indicate severe difficulty.

Similarly, the other indexes, namely the balance index B, the face index F, the arm index A and the language index S, can likewise be graded into multiple levels according to the degree of difficulty, and those levels can in turn be grouped into mild difficulty, moderate difficulty and severe difficulty.
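The grading described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `difficulty_level` is hypothetical, and the cut points (1-3, 4-7, 8-10) follow the eye index example and are assumed to apply to the other indexes as well.

```python
def difficulty_level(score: int) -> str:
    """Map a 1-10 index score to one of the three difficulty levels.

    The cut points follow the eye index example in the text; applying
    the same cut points to B, F, A and S is an assumption.
    """
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return "mild"
    if score <= 7:
        return "moderate"
    return "severe"


# Hypothetical eye index readings for one patient on three days.
levels = [difficulty_level(s) for s in (2, 5, 9)]
```

Bucketing the raw score in this way is what later allows each index to serve as a three-way root node classification in the decision trees.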

For example, the BEFAST data for a particular patient contained in the database is as follows:

The historical database is assumed to store the BEFAST data of N patients (in practice, N is likely to be a very large integer, often greater than 100). The screening system screens a specific patient once every unit time. The BEFAST data of the specific patient at the initial 0th moment T0 are, respectively, a balance index B0, an eye index E0, a face index F0, an arm index A0 and a language index S0.

It should be noted that the "unit time" may be 1 hour, 12 hours, 1 day, 1 month and so on, depending on the actual situation; for example, the unit time in the above table is 1 day.

At this time, the screening system calls its limited data set module, which at the 0th moment T0 screens the 0th historical data subset M0 out of the historical database according to the index distance.

As described above, the historical database contains the BEFAST data of N patients, where the BEFAST data of each patient at the 0th moment are, respectively, a balance index b0, an eye index e0, a face index f0, an arm index a0 and a language index s0.

It should be noted that the 0th moment of a patient in the historical database is the initial moment of that patient's own treatment period and does not coincide in real time with the 0th moment T0 of the specific patient being examined. For example, if the specific patient, Zhang San, begins stroke screening on January 1, 2024, then the 0th moment for Zhang San is January 1, 2024. If one of the patients in the historical database, Li Si, began stroke screening on March 4, 2023, then the 0th moment for Li Si is March 4, 2023. In this case, the 1st moment for Zhang San is January 2, 2024 and the 1st moment for Li Si is March 5, 2023, and likewise for the 2nd moment, the 3rd moment, and so on.

The index distance between each patient and the specific patient at the 0th moment is calculated by the following formula:

A distance threshold MaxL is set and compared with the index distance. If the index distance of a patient in the historical database is less than or equal to MaxL, that patient's BEFAST data are introduced into the 0th historical data subset M0; conversely, if the index distance is greater than MaxL, that patient's data do not enter the 0th historical data subset M0.

Thus, the 0th historical data subset M0 contains the BEFAST data of m0 patients, where m0 ≤ N.
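The screening step at the 0th moment can be sketched as follows. The exact index distance formula is not reproduced in this text, so the sketch assumes an unweighted Euclidean distance over the five indexes; the function names, the dictionary layout and the sample values are likewise hypothetical.

```python
import math

INDEX_NAMES = ("B", "E", "F", "A", "S")


def index_distance_t0(specific, other):
    """Assumed form of the index distance at the 0th moment: Euclidean."""
    return math.sqrt(sum((specific[k] - other[k]) ** 2 for k in INDEX_NAMES))


def build_subset_m0(specific, history, max_l):
    """Keep only the historical patients whose distance is <= MaxL."""
    return [rec for rec in history if index_distance_t0(specific, rec) <= max_l]


# Hypothetical scores on the 1-10 scale.
specific = {"B": 5, "E": 4, "F": 6, "A": 3, "S": 5}
history = [
    {"id": "p1", "B": 5, "E": 5, "F": 6, "A": 3, "S": 4},  # close to the patient
    {"id": "p2", "B": 9, "E": 9, "F": 1, "A": 8, "S": 9},  # far from the patient
    {"id": "p3", "B": 4, "E": 4, "F": 7, "A": 2, "S": 5},  # close to the patient
]
m0 = build_subset_m0(specific, history, max_l=2.0)
m0_ids = [rec["id"] for rec in m0]
```

With these sample values, p1 and p3 fall inside the threshold and p2 does not, so m0 = 2 ≤ N = 3.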

At this point, the work of the limited data set module is complete. The screening system then calls its weight distribution module to calculate the weights of the five indexes, namely the balance index B, the eye index E, the face index F, the arm index A and the language index S, at the 0th moment.

In the weight distribution module, a 0th-level decision tree is established. The decision tree takes the 0th historical data subset M0 as its training data set, takes the mild, moderate and severe difficulty levels corresponding to any one of the balance index B, the eye index E, the face index F, the arm index A and the language index S (collectively called BEFAST data) as the root node classification, and takes stroke risk as the leaf node classification. Five 0th-level decision trees are formed in this way, from which the gain coefficients gB0, gE0, gF0, gA0 and gS0 of the balance index B, the eye index E, the face index F, the arm index A and the language index S are calculated. An example is shown in FIG. 1.

Fig. 1 shows an exemplary schematic diagram of decision tree classification (exemplified by a balance index) by a weight distribution module in a multi-modal stroke screening system according to the present invention.

The calculation of the gain factor described above will be described in detail below.

Before the root node classification is taken into account, the empirical entropy of the 0th-level decision tree is calculated as:

$$H(D) = -\sum_{k=1}^{K} \frac{C_k}{D} \log \frac{C_k}{D}$$

Here K = 3, meaning that all patients in the 0th historical data subset M0 can be classified into three basic categories: low risk of stroke, medium risk of stroke and high risk of stroke.

D in this formula is the total number of samples in the training data set, i.e. the number m0 of patients in the 0th historical data subset M0. C_k is the number of samples in the k-th basic category. For example, if the 0th historical data subset M0 contains m0 = 10 patients, of whom 4 are at low risk of stroke, 3 are at medium risk and 3 are at high risk, then C₁ = 4, C₂ = 3, C₃ = 3.
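The worked example above (C₁ = 4, C₂ = 3, C₃ = 3) can be checked numerically. The sketch below assumes a base-2 logarithm, which the text does not specify; a different base only rescales the result by a constant factor. The function name `empirical_entropy` is illustrative.

```python
import math


def empirical_entropy(class_counts):
    """H(D) = -sum_k (C_k / D) * log2(C_k / D); base 2 is an assumption."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


# Counts from the example: 4 low-risk, 3 medium-risk, 3 high-risk patients.
h_d = empirical_entropy([4, 3, 3])
```

Under the base-2 assumption this gives H(D) of roughly 1.571 bits, slightly below the maximum of log2(3) ≈ 1.585 that three equally sized categories would produce.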

For convenience of description, each of the balance index B, the eye index E, the face index F, the arm index A and the language index S is referred to collectively as an index Z. The mild, moderate and severe difficulty levels corresponding to the index Z are used as the root node classification, which introduces the empirical conditional entropy, calculated as follows:

$$H(D \mid Z) = -\sum_{i=1}^{n} \frac{D_i}{D} \sum_{k=1}^{K} \frac{D_{ik}}{D_i} \log \frac{D_{ik}}{D_i}$$

where H(D|Z) is the empirical conditional entropy with the index Z as the root node classification, D_i is the number of patients in each of the three cases (mild, moderate and severe difficulty) into which the index Z divides the training data set, and D_ik is the number of patients within the i-th case who fall into each of the three categories (low, medium and high risk of stroke). Thus, in the above formula, n = 3 and K = 3.

Then, the gain factor gZ of the decision tree at the index Z can be calculated:

gZ=H（D）-H（D|Z）

as described above, the index Z is a generalized index of each of the balance index B, the eye index E, the face index F, the arm index a, and the language index S.

Therefore, using the above calculation procedure, the gain coefficients gB0, gE0, gF0, gA0 and gS0 of the balance index B, the eye index E, the face index F, the arm index A and the language index S at the 0th moment can be obtained.

The gain coefficients gB0, gE0, gF0, gA0 and gS0 represent how strongly the balance index B, the eye index E, the face index F, the arm index A and the language index S are each correlated with the patient's stroke risk. They can therefore be used as the weights of the five indexes at the 0th moment and combined into the weight vector g0.
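The gain coefficient gZ = H(D) - H(D|Z) can be sketched from a 3 × 3 table of patient counts, with one row per difficulty level of the index Z and one column per risk category. The table values below are hypothetical, the base-2 logarithm is an assumption, and the function names are illustrative.

```python
import math


def entropy(counts):
    """Entropy of a list of counts, using a base-2 logarithm (assumed)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gain_coefficient(table):
    """gZ = H(D) - H(D|Z), where table[i][k] is the count D_ik."""
    class_totals = [sum(col) for col in zip(*table)]  # C_k for each category
    total = sum(class_totals)                         # D
    h_d = entropy(class_totals)
    h_d_given_z = sum((sum(row) / total) * entropy(row) for row in table)
    return h_d - h_d_given_z


# Hypothetical counts: rows = mild, moderate, severe; columns = low, medium, high.
informative = [[3, 1, 0], [1, 2, 0], [0, 0, 3]]
uninformative = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
g_informative = gain_coefficient(informative)
g_uninformative = gain_coefficient(uninformative)
```

In the first table the difficulty level largely determines the risk category, so the gain is high (about 0.971 under the base-2 assumption). In the second table the difficulty level says nothing about risk, so the gain is zero and the index would receive no weight.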

With this, the work of the weight distribution module is also complete. The screening system then calls its model construction module to build a screening model for the specific patient at the 0th moment.

The model construction module constructs a screening model for the specific patient through supervised learning. This screening model may be constructed, for example, with a convolutional neural network (CNN).

The constructed screening model is expressed as a function:

Y=f（M0，g0，θ）

where Y is the output of the model, i.e. the predicted stroke risk of the specific patient; M0, an input of the model, is the data in the 0th historical data subset described above; g0, a parallel input of the model, is the weight vector formed by the gain coefficients gB0, gE0, gF0, gA0 and gS0 at the 0th moment described above; and θ denotes the model construction parameters.

With M0 and g0 as inputs, the model construction parameters θ are determined by the following formula:

θ=argmin（y0，f（M0，g0，θ））

where y0 represents the actual risk of stroke at time 0, and f (M0, g0, θ) is the predicted risk of stroke.
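To make the argmin step concrete, the sketch below fits θ by gradient descent on a mean squared error. It is an illustration under strong assumptions rather than the model of the invention: a small linear model stands in for the CNN mentioned above, the loss function is assumed to be squared error, each index is scaled by its weight before entering the model, and all data values and function names are hypothetical.

```python
def predict(features, g, theta):
    """Assumed linear form of f: bias plus sum of theta_k * g_k * x_k."""
    return theta[0] + sum(t * w * x for t, w, x in zip(theta[1:], g, features))


def mean_squared_error(records, risks, g, theta):
    """Average squared gap between predicted and actual risk."""
    return sum((predict(x, g, theta) - y) ** 2
               for x, y in zip(records, risks)) / len(records)


def fit_theta(records, risks, g, steps=2000, lr=0.1):
    """Approximate argmin over theta of the squared error by gradient descent."""
    theta = [0.0] * (len(g) + 1)
    n = len(records)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(records, risks):
            err = predict(x, g, theta) - y
            grad[0] += 2 * err / n
            for k, (w, xk) in enumerate(zip(g, x), start=1):
                grad[k] += 2 * err * w * xk / n
        theta = [t - lr * d for t, d in zip(theta, grad)]
    return theta


# Hypothetical subset M0: B, E, F, A, S scores rescaled from 1-10 to 0-1.
m0 = [
    [0.2, 0.3, 0.1, 0.2, 0.2],
    [0.5, 0.4, 0.6, 0.5, 0.4],
    [0.9, 0.8, 0.9, 1.0, 0.9],
    [0.3, 0.2, 0.3, 0.1, 0.2],
]
y0 = [0.1, 0.5, 0.9, 0.2]        # hypothetical actual stroke risks
g0 = [0.9, 0.2, 0.5, 0.4, 0.3]   # hypothetical weight vector at the 0th moment
theta = fit_theta(m0, y0, g0)
loss_before = mean_squared_error(m0, y0, g0, [0.0] * 6)
loss_after = mean_squared_error(m0, y0, g0, theta)
```

Because the weights scale the inputs, an index with a gain close to zero contributes little to the prediction, whatever value its parameter takes.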

Under this model, stroke risk at time 0 for a particular patient can be screened. Thus, the stroke risk screening for a particular patient at time 0 ends.

Next, the screening system will screen for risk of stroke for the particular patient at time 1.

The limited data set module is called to calculate the index distance between each patient in the historical database and the specific patient at the 1st moment, using the following calculation formula:

where the BEFAST data of each patient in the historical database at the 1st moment t1 are, respectively, a balance index b1, an eye index e1, a face index f1, an arm index a1 and a language index s1, and the BEFAST data of the specific patient at the 1st moment t1 are, respectively, a balance index B1, an eye index E1, a face index F1, an arm index A1 and a language index S1.

Comparing the index distance formulas at the 1st moment and the 0th moment shows that the two differ significantly. Most importantly, the formula at the 1st moment introduces the weights from the 0th moment, so that an index with a larger weight among the five plays a larger role in the index distance. Through iteration, the index distance therefore becomes more accurate.
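The effect of introducing the previous weights can be illustrated with a short sketch. The exact weighted formula is not reproduced in this text; the sketch assumes that each squared index difference is multiplied by the corresponding weight from the previous moment before the square root is taken. The function name and all values are hypothetical.

```python
import math

INDEX_NAMES = ("B", "E", "F", "A", "S")


def weighted_index_distance(specific, other, prev_weights):
    """Assumed form: sqrt(sum_k g_k * (X_k - x_k)^2), with g from the previous moment."""
    return math.sqrt(sum(prev_weights[k] * (specific[k] - other[k]) ** 2
                         for k in INDEX_NAMES))


# Hypothetical weight vector g0: the balance index dominates.
g0 = {"B": 0.9, "E": 0.1, "F": 0.1, "A": 0.1, "S": 0.1}
specific_t1 = {"B": 5, "E": 5, "F": 5, "A": 5, "S": 5}
differs_in_b = {"B": 7, "E": 5, "F": 5, "A": 5, "S": 5}
differs_in_s = {"B": 5, "E": 5, "F": 5, "A": 5, "S": 7}

d_b = weighted_index_distance(specific_t1, differs_in_b, g0)
d_s = weighted_index_distance(specific_t1, differs_in_s, g0)
```

Both historical patients differ from the specific patient by two points in a single index, yet the one who differs in the heavily weighted balance index ends up three times as far away, which is the behaviour the paragraph above describes.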

Further, the limited data set module builds the 1st historical data subset M1 on the basis of the index distance.

If the index distance of a patient in the historical database is less than or equal to MaxL, that patient's BEFAST data are introduced into the 1st historical data subset M1; conversely, if the index distance is greater than MaxL, that patient's data do not enter the 1st historical data subset M1. Thus, the 1st historical data subset M1 contains the BEFAST data of m1 patients, where m1 ≤ N.

Next, in the weight distribution module, a 1st-level decision tree is established. The decision tree takes the 1st historical data subset M1 as its training data set, takes the mild, moderate and severe difficulty levels corresponding to any one of the balance index B, the eye index E, the face index F, the arm index A and the language index S as the root node classification, and takes stroke risk as the leaf node classification. Five 1st-level decision trees are formed in this way, from which the gain coefficients gB1, gE1, gF1, gA1 and gS1 of the five indexes are calculated. These can be regarded as the weights of the five indexes at the 1st moment and are combined into the weight vector g1 at the 1st moment.

The screening model in the model construction module is then called and expressed as a function: Y = f(M1, g1, θ).

The model parameters are further optimized as θ = argmin(y1, f(M1, g1, θ)), where y1 is the actual stroke risk at the 1st moment and f(M1, g1, θ) is the predicted stroke risk.

This further iteration at the 1st moment makes the screening model more accurate. The 1st historical data subset M1, the weight vector g1 and the optimized model constructed at the 1st moment all build on the weight vector g0 from the 0th moment, so the model is further optimized during the iteration at the 1st moment.

And so on: as time evolves, the integer j takes values starting from 2, so that the moments run from the 0th and 1st moments up to the j-th moment. In practice, j may eventually reach 200, 500 or even 1000.

At the j-th moment, the limited data set module is called to calculate the index distance between each patient in the historical database and the specific patient, using the following calculation formula:

wherein gB_(j-1), gE_(j-1), gF_(j-1), gA_(j-1) and gS_(j-1) are the weights of the five indexes, namely the balance index B, the eye index E, the face index F, the arm index A and the language index S, calculated by the weight distribution module at the (j-1)-th moment; bj, ej, fj, aj and sj are the balance index, eye index, face index, arm index and language index of each patient in the historical database at the j-th moment; and Bj, Ej, Fj, Aj and Sj are the corresponding indexes of the specific patient at the j-th moment.

In other words, at the j-th moment, the weights of the indexes calculated by the weight distribution module at the (j-1)-th moment are introduced into the index distance formula to calculate the index distance between the specific patient and each patient in the historical database at the j-th moment.

Further, the limited data set module builds the j-th historical data subset Mj on the basis of the index distance.

If the index distance of a patient in the historical database is less than or equal to MaxL, that patient's BEFAST data are introduced into the j-th historical data subset Mj; conversely, if the index distance is greater than MaxL, that patient's data do not enter the j-th historical data subset Mj. Thus, the j-th historical data subset Mj contains the BEFAST data of mj patients, where mj ≤ N.

Next, in the weight distribution module, a j-th-level decision tree is established. The decision tree takes the j-th historical data subset Mj as its training data set, takes the mild, moderate and severe difficulty levels corresponding to any one of the balance index B, the eye index E, the face index F, the arm index A and the language index S as the root node classification, and takes stroke risk as the leaf node classification. Five j-th-level decision trees are formed in this way, from which the gain coefficients gBj, gEj, gFj, gAj and gSj of the five indexes are calculated. These can be regarded as the weights of the five indexes at the j-th moment and are combined into the weight vector gj at the j-th moment.

The screening model in the model construction module is then called and expressed as a function: Y = f(Mj, gj, θ).

The model parameters are further optimized as θ = argmin(yj, f(Mj, gj, θ)), where yj is the actual stroke risk at the j-th moment and f(Mj, gj, θ) is the predicted stroke risk.

Thus, through the j iterations from the 1st moment to the j-th moment, the screening model becomes more accurate, and the later the iteration, the closer the model comes to the actual situation. The j-th historical data subset Mj, the weight vector gj and the optimized model constructed at the j-th moment are all based on the weight vector of the (j-1)-th moment, so the model is optimized step by step as time progresses. The iterative evolution is illustrated in FIG. 2.
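The iteration as a whole can be sketched as a loop in which the weights from one moment feed the subset selection of the next. This is a simplified illustration, not the implementation of the invention: the distance is assumed to be a weighted Euclidean distance (unweighted at the 0th moment), the logarithm is assumed to be base 2, the model fitting step is omitted, and the data layout, function names and sample values are all hypothetical.

```python
import math

INDEX_NAMES = ("B", "E", "F", "A", "S")


def level(score):
    """Bucket a 1-10 score: 0 = mild, 1 = moderate, 2 = severe."""
    return 0 if score <= 3 else (1 if score <= 7 else 2)


def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0) if total else 0.0


def gain(subset, name, t):
    """gZ = H(D) - H(D|Z) for one index on one historical data subset."""
    table = [[0, 0, 0] for _ in range(3)]
    for p in subset:
        table[level(p["befast"][t][name])][p["risk"][t]] += 1
    h_d = entropy([sum(col) for col in zip(*table)])
    h_dz = sum(sum(row) / len(subset) * entropy(row) for row in table)
    return max(0.0, h_d - h_dz)  # clamp tiny negative rounding errors


def distance(a, b, weights):
    return math.sqrt(sum(weights[n] * (a[n] - b[n]) ** 2 for n in INDEX_NAMES))


def run(specific, history, max_l):
    weights = {n: 1.0 for n in INDEX_NAMES}  # 0th moment: unweighted distance
    trace = []
    for t, current in enumerate(specific):
        subset = [p for p in history
                  if distance(current, p["befast"][t], weights) <= max_l]
        if not subset:
            break
        weights = {n: gain(subset, n, t) for n in INDEX_NAMES}
        trace.append(([p["id"] for p in subset], weights))
        # Fitting theta on (subset, weights) would happen here.
    return trace


def rec(b, e=5, f=5, a=5, s=5):
    return {"B": b, "E": e, "F": f, "A": a, "S": s}


# Hypothetical data for two moments; risk classes: 0 = low, 1 = medium, 2 = high.
specific = [rec(5), rec(6)]
history = [
    {"id": "p1", "befast": [rec(5), rec(6)], "risk": [1, 1]},
    {"id": "p2", "befast": [rec(2), rec(2)], "risk": [0, 0]},
    {"id": "p3", "befast": [rec(9), rec(9)], "risk": [2, 2]},
    {"id": "p4", "befast": [rec(5, 9, 9, 9, 9), rec(5, 9, 9, 9, 9)], "risk": [1, 1]},
]
trace = run(specific, history, max_l=4.5)
```

In this toy data only the balance index separates the risk categories at the 0th moment, so it receives all of the weight. At the 1st moment the selection changes as a result: p2 drops out because its balance index is far from the specific patient's, while p4 enters because its differences lie only in indexes that now carry zero weight.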

FIG. 2 shows an iterative evolution diagram of the BEFAST stroke screening system based on online learning according to the present invention.

In summary, the invention provides a BEFAST stroke screening system and method based on online learning. The system is configured with a historical database, a limited data set module, a weight distribution module and a model construction module. The limited data set module screens the historical database for patients whose index distance to the specific patient is small, forming a historical data subset. At the same moment, the weight distribution module performs decision tree classification on the historical data subset to calculate the weight of each item of BEFAST data, and the model construction module then constructs a screening model for the specific patient based on the historical data subset and those weights. In each subsequent iteration, the weights calculated by the weight distribution module at the previous moment participate in constructing the historical data subset at the next moment, and thereby influence the weight calculation and the construction of the screening model at that moment. The screening model is thus optimized continuously through iteration, so that the BEFAST multi-modal data acquisition and stroke screening of the invention become more accurate over time.

The foregoing description of the exemplary embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, and variations which fall within the spirit and scope of the invention are intended to be included in the scope of the invention.

Claims

1. A BEFAST stroke screening system based on online learning, characterized in that the system screens a specific patient for stroke at every unit time starting from time 0, and the system comprises a history database, a shrinkage-limited data set module, a weight distribution module, and a model construction module, wherein the history database stores the BEFAST data of each of N patients from time 0 to time j, and the BEFAST data of the specific patient are recorded from time 0 to time j,

the shrinkage-limited data set module calculates the index distance between each patient in the history database and the specific patient according to the BEFAST data of each patient and of the specific patient at time 0, and sets a distance threshold MaxL; only patients in the history database whose index distance satisfies the threshold condition on MaxL are imported into the 0th historical data subset M0,

the weight distribution module establishes a 0th-level decision tree with the 0th historical data subset M0 as the training data set, calculates the gain coefficient of each item of BEFAST data, and combines the gain coefficients into a weight vector g0; the model construction module constructs a screening model for the specific patient with the 0th historical data subset M0 and the weight vector g0 as inputs and the predicted risk probability of stroke as output, and thereby determines the model construction parameters θ,

the shrinkage-limited data set module imports the weight vector g0 and, based on the BEFAST data at time 1, calculates the index distance between each patient in the history database and the specific patient at time 1; the shrinkage-limited data set module requires that only patients in the history database whose index distance satisfies the threshold condition are imported into the 1st historical data subset M1, from which the weight vector g1 is calculated; and the model construction module inputs g1 and M1 into the screening model so as to iteratively optimize the model construction parameters θ.

2. The system of claim 1, wherein the BEFAST data of the specific patient at times 0 to j are, respectively, balance indexes B0 to Bj, eye indexes E0 to Ej, face indexes F0 to Fj, arm indexes A0 to Aj, and language indexes S0 to Sj, and the BEFAST data of each patient in the history database at times 0 to j are, respectively, balance indexes B0 to Bj, eye indexes E0 to Ej, face indexes F0 to Fj, arm indexes A0 to Aj, and language indexes S0 to Sj,

at time 0, the shrinkage-limited data set module calculates the index distance between each patient and the specific patient according to the following formula:

[formula omitted in the source text].

3. The system of claim 2, wherein the weight distribution module establishes a 0th-level decision tree with the 0th historical data subset M0 as the training data set, calculates the respective gain coefficients gB0, gE0, gF0, gA0, gS0 of the five items of BEFAST data as their weights at time 0, and combines them into the weight vector g0; and at time 1, the shrinkage-limited data set module imports the weight vector g0 from time 0 to calculate the index distance between each patient in the history database and the specific patient at time 1 according to the following formula:

[formula omitted in the source text].

4. The system of claim 3, wherein, when the weight distribution module calculates the weights at time 0, it takes the mild difficulty, moderate difficulty, and severe difficulty levels corresponding to each index in the BEFAST data as the root node classification and the stroke risk as the leaf node classification, forming five 0th-level decision trees, and calculates the gain coefficients gB0, gE0, gF0, gA0, and gS0 of the five items of BEFAST data as their respective weights at time 0, which are combined into the weight vector g0.

5. The system of claim 4, wherein, when the model construction module constructs the screening model for the specific patient at time 0, it does so by supervised learning, with the function expressed as Y = f(M0, g0, θ), where Y represents the model output, i.e., the predicted risk of stroke for the specific patient, θ represents the model construction parameters, and the 0th historical data subset M0 and the weight vector g0 are the inputs of the model; the model construction parameters θ are determined by calculating θ = argmin(Y0, f(M0, g0, θ)), where Y0 represents the actual risk of stroke at time 0 and f(M0, g0, θ) is the predicted risk of stroke.

6. The system of claim 5, wherein, when the model construction module constructs the screening model for the specific patient at time 1, the model construction module inputs g1 and M1 into the function and iteratively determines θ = argmin(y1, f(M1, g1, θ)), where y1 represents the actual risk probability of stroke at time 1 and f(M1, g1, θ) is the predicted risk probability of stroke.
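The two argmin steps above can be illustrated with a small warm-started fit. The sketch below is hypothetical throughout: the text does not specify the form of f, the loss inside argmin, or the optimizer, so it assumes a logistic model on gain-weighted index scores, a mean squared error loss, and plain gradient descent. The names `predict`, `loss`, and `fit` and all sample values are invented for illustration.

```python
import math


def predict(theta, x, g):
    """ASSUMED f: logistic model on gain-weighted indexes; theta[0] is the bias."""
    z = theta[0] + sum(t * w * v for t, w, v in zip(theta[1:], g, x))
    return 1.0 / (1.0 + math.exp(-z))


def loss(theta, X, y, g):
    """ASSUMED loss: mean squared error between predicted and actual risk."""
    return sum((predict(theta, x, g) - yi) ** 2 for x, yi in zip(X, y)) / len(y)


def fit(theta, X, y, g, lr=0.1, steps=500):
    """Gradient descent on the loss, starting from the theta that is passed in."""
    theta = list(theta)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, yi in zip(X, y):
            p = predict(theta, x, g)
            common = 2.0 * (p - yi) * p * (1.0 - p) / len(y)  # chain rule term
            grad[0] += common
            for i, (w, v) in enumerate(zip(g, x), start=1):
                grad[i] += common * w * v
        theta = [t - lr * gi for t, gi in zip(theta, grad)]
    return theta


# Hypothetical B, E, F, A, S scores for two patients in the subset, with known outcomes.
X0 = [[0, 0, 0, 0, 0], [2, 2, 2, 2, 2]]
y0 = [0.0, 1.0]
g0 = [0.2, 0.2, 0.2, 0.2, 0.2]

theta_init = [0.0] * 6
theta0 = fit(theta_init, X0, y0, g0)      # time 0: determine theta

# Time 1: new subset and weights, but start from the previous theta (warm start).
X1, y1 = X0, y0
g1 = [0.3, 0.3, 0.3, 0.3, 0.3]
theta1 = fit(theta0, X1, y1, g1)
```

Starting each fit from the previous θ is one simple way to make the optimization iterative in the sense of claim 6; other online-learning update rules would serve the same purpose.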

7. The system of claim 6, wherein, as time passes from time 0 and time 1 to time j, where the integer j takes values starting from 2, the shrinkage-limited data set module is invoked to calculate the index distance between each patient in the history database and the specific patient at time j according to the following formula:

[formula omitted in the source text].

8. The system of claim 7, wherein the shrinkage-limited data set module requires that only patients in the history database whose BEFAST data satisfy the threshold condition are imported into the j-th historical data subset Mj, and the model construction module inputs gj and Mj into the function to iteratively determine θ = argmin(yj, f(Mj, gj, θ)), where yj represents the actual risk probability of stroke at time j and f(Mj, gj, θ) represents the predicted risk probability of stroke.

9. The system of claim 8, wherein, when any one of the 1st-level to j-th-level decision trees is calculated, that decision tree is denoted the p-th-level decision tree; the mild difficulty, moderate difficulty, and severe difficulty levels corresponding to each of the balance index B, the eye index E, the face index F, the arm index A, and the language index S are taken as the root node classification, and the stroke risk is taken as the leaf node classification, forming five p-th-level decision trees, from which the gain coefficients gBp, gEp, gFp, gAp, gSp of the balance index B, the eye index E, the face index F, the arm index A, and the language index S are calculated,

the empirical entropy of the p-th-level decision tree is calculated as follows:

$$H(D) = -\sum_{k=1}^{K} \frac{C_k}{D} \log \frac{C_k}{D},$$

where K = 3, meaning that all patients in the p-th historical data subset Mp are classified into three basic categories, namely low risk of stroke, medium risk of stroke, and high risk of stroke; D is the total number of samples in the training data set, i.e., the number of patients in the p-th historical data subset Mp; and C_k is the number of samples in the k-th basic category,

each of the balance index B, the eye index E, the face index F, the arm index A, and the language index S is referred to generically as an index Z; the mild difficulty, moderate difficulty, and severe difficulty levels corresponding to the index Z are taken as the root node classification, and the empirical conditional entropy is introduced, calculated as follows:

$$H(D \mid Z) = \sum_{i=1}^{n} \frac{D_i}{D} H(D_i) = -\sum_{i=1}^{n} \frac{D_i}{D} \sum_{k=1}^{K} \frac{D_{ik}}{D_i} \log \frac{D_{ik}}{D_i},$$

where H(D|Z) represents the empirical conditional entropy with the index Z as the root node classification; D_i represents the number of patients in the training data set falling in the i-th of the three root node classes of the index Z (mild difficulty, moderate difficulty, severe difficulty), so that n = 3; and D_ik represents the number of patients within the i-th root node class who belong to the k-th of the three categories of low risk of stroke, medium risk of stroke, and high risk of stroke,

from this, the gain coefficient gZp of the index Z under the p-th-level decision tree is obtained: gZp = H(D) - H(D|Z); that is, the gain coefficients gBp, gEp, gFp, gAp, gSp of the balance index B, the eye index E, the face index F, the arm index A, and the language index S at time p are obtained as the weights of the five items of BEFAST data at time p, and are combined into the weight vector gp.
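The gain calculation of claim 9 can be sketched in a few lines of Python. This is an illustrative sketch rather than the system's code: the function names, the record layout (a dict per patient with keys B, E, F, A, S, and `risk`), and the use of base-2 logarithms are assumptions, since the text does not state the logarithm base.

```python
import math
from collections import Counter

INDEXES = ("B", "E", "F", "A", "S")  # balance, eyes, face, arm, speech


def entropy(labels):
    """Empirical entropy H(D) of the risk labels (log base 2 is an assumption)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(levels, labels):
    """Empirical conditional entropy H(D|Z): entropy within each difficulty level,
    weighted by the share of patients at that level."""
    groups = {}
    for level, label in zip(levels, labels):
        groups.setdefault(level, []).append(label)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain(levels, labels):
    """Gain coefficient gZp = H(D) - H(D|Z) for one index Z."""
    return entropy(labels) - conditional_entropy(levels, labels)


def weight_vector(subset):
    """Gain coefficient of each BEFAST index over one historical data subset."""
    labels = [p["risk"] for p in subset]
    return {z: gain([p[z] for p in subset], labels) for z in INDEXES}


# Hypothetical subset: the balance level separates the risk classes, the eye level does not.
subset = [
    {"B": "mild", "E": "mild", "F": "mild", "A": "mild", "S": "mild", "risk": "low"},
    {"B": "mild", "E": "severe", "F": "mild", "A": "mild", "S": "mild", "risk": "low"},
    {"B": "severe", "E": "mild", "F": "mild", "A": "mild", "S": "mild", "risk": "high"},
    {"B": "severe", "E": "severe", "F": "mild", "A": "mild", "S": "mild", "risk": "high"},
]
gp = weight_vector(subset)
```

In this made-up subset the balance index receives the largest weight because its difficulty level fully determines the risk class, while the indexes that carry no information about the risk class receive a weight of zero.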

10. A BEFAST stroke screening method based on online learning, characterized in that the method is performed by the system according to any one of claims 1-9.