CN116741385B - Infectious disease cross-border propagation modeling prediction method - Google Patents
- Publication number
- CN116741385B (CN202310383533.1A)
- Authority
- CN
- China
- Prior art keywords
- country
- infection
- curve
- sir
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
- G16H50/80—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
Abstract
The invention discloses a modeling and prediction method for the cross-border spread of infectious diseases. It uses a regression model and quantitative indicators to assess the importation risk of infectious diseases, and so provides a more accurate basis for evaluating the effectiveness of prevention and control measures at ports of entry. It improves on the current situation, in which customs risk analysis and assessment of infectious diseases mostly remains at the level of status description and qualitative analysis; it strengthens the capacity for scientific analysis, accurate prediction, dynamic adjustment and objective assessment; it accumulates experience and lays a foundation for research on dynamic models of the cross-border spread of other infectious diseases; and it improves the ability of customs ports to prevent and control such spread.
Description
Technical Field
The present invention relates to the field of infection prediction, and more particularly to a modeling and prediction method for the cross-border spread of infectious diseases.
Background
At present, customs use of infectious disease data remains at the stage of descriptive summary. Prevention and control of cross-border transmission risk relies mainly on methods such as expert consultation, and more intuitive and accurate modeling methods are needed to predict and evaluate outcomes. Customs has accumulated a large amount of laboratory data on positive cases of importable infectious diseases, but analysis of these data is still descriptive, and no accurate data model has been established.
To predict the course of an infectious disease in a timely, accurate and reliable way, mathematical and statistical models should be combined to model the disease more precisely and to produce research results grounded in those models. At present there is no model for studying the dynamics of cross-border infectious disease transmission. The importation risk is closely related to the incidence in the exporting country, the number of flights between the exporting and importing countries, passenger volume, aircraft type, airport personnel density, environmental conditions at different ports, and prevention measures. It is also related to factors such as the population distribution, regional distribution, vaccination status and variant distribution of the disease in the exporting country. For some of these factors the logical relationship can be defined by a mathematical formula; others have to be inferred iteratively within the model. A flexible, dynamic, real-time simulation system is therefore needed to assess and control importation risk accurately, and to provide theoretical support and visual presentation for customs decision making at each stage of infectious disease prevention and control.
Disclosure of Invention
The invention aims to provide a modeling and prediction method for the cross-border spread of infectious diseases that gives customs theoretical support and visual presentation for decision making at each stage of infectious disease prevention and control.
To achieve this purpose, the invention adopts the following technical scheme:
the first aspect of the invention provides a modeling and prediction method for the cross-border spread of infectious diseases, comprising the following steps:
step 1, acquiring raw data comprising the number of infected persons in each country worldwide, the number of inbound aircraft, the origin and number of inbound persons, the number of laboratory-detected positive cases and the number of published imported cases, and screening and preprocessing the raw data in sequence to obtain an infection curve I(t) for each country;
step 2, from the infection curves I(t) obtained by preprocessing, selecting the curve segments that can be fitted by an SIR propagation process, to obtain SIR-fittable curves I_s(t), where s is the index of the segment within the infection curve I(t);
step 3, fitting an SIR propagation process to each curve I_s(t) of each country to obtain a fitted SIR curve and fitting parameters for each country;
step 4, according to the fitted SIR curve and the fitting parameters, dividing the infection process represented by the curve I_s(t) according to its distribution to obtain divided curve segments, and aggregating over them to obtain the exporting-country infection sequence as the first feature variable and the exporting-country infection stage as the second feature variable;
step 5, aggregating the number of inbound persons in the raw data over the divided curve segments to obtain the total number of inbound persons as the third feature variable;
step 6, aggregating the number of imported cases in the raw data over the divided curve segments to obtain the number of overseas imported cases as the target variable;
step 7, entering the first, second and third feature variables and the target variable into a regression model, constructing a linear regression model of cross-border infectious disease spread, and predicting the result.
Preferably, the preprocessing of the data comprises stitching the acquired data and, based on the raw data, generating for each country an infection curve I(t) over the epidemic period, namely a daily time series I(t) of the number of currently infected persons.
Preferably, the screening of the data comprises:
s1: rejecting countries whose number of non-zero sequence values is less than or equal to 300;
s2: rejecting sequences with min(I(t)) ≥ 10000000 or max(I(t)) ≤ 1000, i.e. sequences whose minimum is too large or whose maximum is too small;
s3: deleting the run of zeros at the head of each country's sequence;
s4: deleting the run of zeros at the tail of each country's sequence.
Preferably, dividing the infection process represented by the curve I_s(t) according to its distribution further comprises starting from the peak of the number of infected persons and dividing to the left and to the right at equal intervals according to one complete infection period, to obtain the divided curve segments.
Preferably, the SIR model is constructed from a parameter combination consisting of the number of susceptible persons S, the infection rate β, the recovery rate γ and the initial number of infected persons I; N groups of initial values are randomly drawn from a predetermined vector of initial parameter values, N being a constant; the objective function is the residual between the infected-count sequence of the fitted SIR model and the actual infected-count sequence; each parameter combination is input as an initial value, L-BFGS-B is used as the optimization method, and S, β and γ are optimized to output the optimal parameters.
Preferably, the method further comprises using R² to evaluate the fitting effect of the fitted SIR curve:
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
where $y_i$, $\hat{y}_i$ and $\bar{y}$ denote each true value, each predicted value and the mean of the sequence, respectively.
Preferably, the linear regression model of cross-border infectious disease spread is constructed as
$$y = \alpha + \lambda x_{inject} + \omega x_{input} + \delta x_{stage} + \epsilon$$
where y is the number of imported positive cases over a period of time; x_inject, x_input and x_stage are, respectively, the number of infections in the exporting country over the same period, the number of persons entering from that country, and the infection stage of the disease in that country; ε is a random error term; α is a constant; and λ, ω and δ are the coefficients of x_inject, x_input and x_stage.
Preferably, during prediction, SIR process curves that cannot currently be described accurately are recorded as they arise in application; newly appearing curves that were predicted incorrectly are added to the configuration table in a timely manner; and SIR curves that could be predicted but were predicted incorrectly are re-fitted or flagged for the next round of analysis and assessment.
A second aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method provided in the first aspect of the invention when executing the program.
A third aspect of the invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method provided by the first aspect of the invention.
The model of the invention can be used to study, flexibly, dynamically and in real time, how the exporting country's disease incidence, variant prevalence and vaccination status, the frequency of flights between the exporting and importing countries, the aircraft type, airport personnel density, the environmental conditions at different ports, and prevention and control measures affect the importation risk of infectious diseases;
to find regularities in importation risk even when factors such as the population distribution and regional distribution of the disease in the exporting country are unknown;
and to provide a theoretical basis and data support for risk assessment, importation early warning and decision making for infectious diseases at national border ports, to evaluate prevention and control measures comprehensively, and to present the results in intuitive graphical form.
The beneficial effects of the invention are as follows:
The invention improves on the current situation, in which customs risk analysis and assessment of infectious diseases mostly remains at the level of status description and qualitative analysis. It strengthens the capacity for scientific analysis, accurate prediction, dynamic adjustment and objective assessment, accumulates experience and lays a foundation for research on dynamic models of the cross-border spread of other infectious diseases, and improves the ability of customs ports to prevent and control such spread.
Drawings
The following describes the embodiments of the present invention in further detail with reference to the drawings.
FIG. 1 shows an overall flowchart for modeling infectious disease cross-border propagation.
Fig. 2 shows a schematic diagram of peak intervals.
Fig. 3 shows a SIR model schematic.
Fig. 4 shows a schematic diagram of SIR model fitting process.
Fig. 5 shows a schematic of the fitting result.
Figure 6 shows a plot of SIR model goodness-of-fit density.
Fig. 7 shows a schematic diagram of a prediction process.
Detailed Description
In order to illustrate the present invention more clearly, it is further described below with reference to the preferred embodiments and the accompanying Figs. 1 to 7. Like parts in the drawings are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and that this invention is not limited to the details given herein.
To describe the importation risk more directly, the overall scenario is as follows:
prediction target: the number of overseas imported cases within a time period T;
prediction variables: the explanatory variable injectFeatureList for the infection situation of a country within the time period T;
the explanatory variable stageList for the infection stage of that country within the time period T;
the explanatory variable totalFeatureList for the inbound arrivals from that country within the time period T.
Prediction model: linear regression model
The construction of specific prediction variables comprises the following steps:
step 1, acquiring raw data comprising the number of infected persons in each country worldwide, the number of inbound aircraft, the origin and number of inbound persons, the number of laboratory-detected positive cases and the number of published imported cases, and screening and preprocessing the raw data in sequence to obtain an infection curve I(t) for each country;
step 2, from the infection curves I(t) obtained by preprocessing, selecting the curve segments that can be fitted by an SIR propagation process, to obtain SIR-fittable curves I_s(t), where s is the index of the segment within the infection curve I(t);
step 3, fitting an SIR propagation process to each curve I_s(t) of each country to obtain a fitted SIR curve and fitting parameters for each country;
step 4, according to the fitted SIR curve and the fitting parameters, dividing the infection process represented by the curve I_s(t) according to its distribution to obtain divided curve segments, and aggregating over them to obtain the exporting-country infection sequence (injectFeatureList) as the first feature variable and the exporting-country infection stage (stageList) as the second feature variable;
step 5, aggregating the number of inbound persons in the raw data over the divided curve segments to obtain the total number of inbound persons (totalFeatureList) as the third feature variable;
step 6, aggregating the number of imported cases in the raw data over the divided curve segments to obtain the number of overseas imported cases (caseNum) as the target variable;
step 7, entering the first, second and third feature variables and the target variable into a regression model, constructing a linear regression model of cross-border infectious disease spread, and predicting the result.
1.1 data preprocessing and stitching
The raw data provide the sequence of currently infected persons for each of the three years 2020, 2021 and 2022, and from these a daily time series I(t) of currently infected persons must be generated for each country over the epidemic period. First, the 2020 and 2021 data are inner-joined with the country (including its attribute fields) as the key, giving a wide table of historical infection counts for each country over the two years. Its numerical matrix is denoted H_ij, where the row index represents time, the column index represents the country, and the element h_ij is the historical number of infected persons. With the same notation, the historical infection sequence of a single country, i.e. the column H_·j, is written H(t).
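The stitching step can be sketched as follows. This is a minimal illustration, assuming each year's table is a mapping from country name to a list of daily counts; the function name `stitch_years` and the toy data are hypothetical and not taken from the original.

```python
from typing import Dict, List

def stitch_years(yearly: List[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Inner-join yearly tables on country and concatenate them in time order."""
    if not yearly:
        return {}
    # Inner join: keep only the countries that appear in every yearly table.
    common = set(yearly[0])
    for table in yearly[1:]:
        common &= set(table)
    # Concatenate each country's yearly sequences into one series H(t).
    return {c: [v for table in yearly for v in table[c]] for c in sorted(common)}

# Hypothetical toy data: three "days" per year for brevity.
y2020 = {"A": [0, 1, 2], "B": [5, 6, 7]}
y2021 = {"A": [3, 4, 5], "C": [1, 1, 1]}
stitched = stitch_years([y2020, y2021])
print(stitched)
```

Only country "A" survives the join here, because "B" and "C" are each missing from one of the two years.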
1.2 data screening and missing value handling
Data screening and missing-value handling comprise the following steps:
step1: rejecting countries with length({I(t) ≠ 0}) ≤ 300, i.e. countries whose non-zero sequence has 300 or fewer values, because these sequences are too short for effective analysis;
step2: rejecting sequences with min(I(t)) ≥ 10000000 or max(I(t)) ≤ 1000, i.e. sequences whose minimum is too large or whose maximum is too small, so that the remaining country sequences of infected counts, and the corresponding sequences of new infections, are suitable for and worth further analysis;
step3: deleting the run of zeros at the head of each country's sequence;
step4: deleting the run of zeros at the tail of each country's sequence.
The first step removes data whose non-zero sequence is too short, because such sequences cannot be fitted with the SIR model. The second step removes sequences whose minimum is too large or whose maximum is too small, because such data are difficult to fit and account for only a very small proportion of the data, so they contribute little to subsequent analysis. The third and fourth steps remove zeros at the head and tail, because values equal to 0 cannot be fitted by the SIR model; zeros at the head and tail can be deleted directly without affecting subsequent analysis.
Through the steps, an analyzable sequence of the number of infected persons is obtained.
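The four screening steps above can be sketched as a single filter. This is an illustrative sketch only: the thresholds come from the text, while the function name `screen_sequence` and the constant names are hypothetical.

```python
from typing import List, Optional

MIN_NONZERO_LEN = 300          # step1: reject if non-zero count <= 300
MAX_ALLOWED_MIN = 10_000_000   # step2: reject if min(I) >= this value
MIN_ALLOWED_MAX = 1_000        # step2: reject if max(I) <= this value

def screen_sequence(seq: List[float]) -> Optional[List[float]]:
    """Return the trimmed sequence, or None if the country is rejected."""
    # step1: too few non-zero observations to fit an SIR process.
    if sum(1 for v in seq if v != 0) <= MIN_NONZERO_LEN:
        return None
    # step2: minimum too large or maximum too small.
    if min(seq) >= MAX_ALLOWED_MIN or max(seq) <= MIN_ALLOWED_MAX:
        return None
    # step3 and step4: strip the runs of zeros at the head and the tail.
    start = 0
    while start < len(seq) and seq[start] == 0:
        start += 1
    end = len(seq)
    while end > start and seq[end - 1] == 0:
        end -= 1
    return seq[start:end]

kept = screen_sequence([0, 0] + [2000] * 301 + [0])
print(len(kept))
```

Zeros inside the sequence are left untouched; only the leading and trailing runs are removed, as the text describes.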
Epidemic processes are mostly complex: several infection processes may overlap in the same time period, or errors may arise during data collection, and these show up in the data as complications such as abnormal fluctuations in the number of infected persons (the t2-t3 interval shown in Fig. 2). By screening the sequences of currently infected persons in each country, the curve portions that an SIR propagation process fits well are selected for simulation.
As shown in Fig. 2, (t_1, t_2) and (t_3, t_4) are the curve segments selected for fitting the SIR propagation process.
2.1 SIR model overview
The SIR transmission model describes how the uninfected, infected and recovered populations change over time after an infectious disease emerges. Specifically:
the total population (N) in the propagation process is divided into three parts: S (susceptible), I (infected) and R (recovered). During transmission, infected persons (I) come into contact with susceptible persons (S), who become infected with a certain probability. After a period of time the infected recover and move out of the infected group to become recovered persons (R). For various reasons, recovered persons do not become infected again in the short term. With t denoting time, β the infection rate and γ the recovery rate, the model is expressed as a system of differential equations in S, I and R.
By optimizing model parameters such as I_0, S, β and γ, the spread of the whole epidemic over a period of time can be determined.
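As an illustration, the sketch below assumes the standard SIR system dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI with N = S + I + R. This form is an assumption; the original may use a mass-action term βSI without the 1/N factor, which only rescales β. The function name `simulate_sir`, the forward-Euler scheme and the parameter values are hypothetical choices for the example.

```python
from typing import List

def simulate_sir(s0: float, i0: float, beta: float, gamma: float,
                 days: int, steps_per_day: int = 10) -> List[float]:
    """Return the daily infected count I(t) using forward-Euler integration."""
    s, i, r = float(s0), float(i0), 0.0
    n = s + i + r                      # total population, constant over time
    dt = 1.0 / steps_per_day
    infected = []
    for _ in range(days):
        infected.append(i)             # record I at the start of each day
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt   # susceptible -> infected
            new_rec = gamma * i * dt          # infected -> recovered
            s -= new_inf
            i += new_inf - new_rec
            r += new_rec
    return infected

# Hypothetical parameters chosen only to produce a single clear peak.
curve = simulate_sir(s0=10_000, i0=10, beta=0.4, gamma=0.1, days=120)
peak_day = max(range(len(curve)), key=curve.__getitem__)
print(peak_day, round(curve[peak_day]))
```

Given I_0, S, β and γ, the whole trajectory is determined, which is why fitting these few parameters is enough to characterize one infection wave.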
2.2 Role of the SIR epidemic model
In fact, the SIR model above cannot fully describe the course of an infectious disease, mainly because it treats the transmission process too simply: only the two parameters γ and β are used to describe recovery and infection. In practice, most national and regional governments take measures to limit the spread of an infectious disease after it appears. The effect of such interventions is difficult to represent in the existing model, so corresponding deviations are unavoidable.
Nevertheless, although the SIR model does not accurately describe the real infection process, the collected observations show that the basic SIR model can effectively describe the infection process over a certain period. In this scheme, therefore, the SIR model is used to identify infection processes and remove noise points, and ultimately to form features that are directly relevant to predicting imported cases.
2.3 model fitting and parameter solving
In this project, constructing the SIR model requires four parameters: S (number of susceptible persons), β (infection rate), γ (recovery rate) and I (initial number of infected persons). To search the parameter space efficiently and improve the SIR fit, different parameter combinations are used as initial inputs and the best fit is searched for across them. N groups of initial values are randomly drawn from the predetermined vector of initial parameter values, and the following optimization is run for each.
The parameters are optimized with the optim function of the R language. The objective function is the sum of the residuals between the infected-count sequence of the fitted SIR model and the actual infected-count sequence (computed with the dist function in R). Each parameter combination is input as an initial value, L-BFGS-B is used as the optimization method, and S, β and γ are optimized to output the optimal parameters. For each data segment there are N optimization runs from different initial values, and the set of optimized parameters with the smallest residual is output. The optimization can be written as
$$(S^*, \beta^*, \gamma^*) = \arg\min_{S,\beta,\gamma} \lVert I_{SIR}(S, \beta, \gamma, I) - I_{real} \rVert_2$$
where I_SIR(S, β, γ, I) is the infection sequence of the SIR model determined by the parameters, I_real is the true infection sequence, and the objective function is the residual between the two (i.e. the L2 norm of their difference). With N = 25, the result with the smallest residual among the 25 parameter groups is taken as the final output.
The optim function in R provides several parameter optimization methods; L-BFGS-B is the one selected here.
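The multi-start procedure can be sketched in Python as follows. Several things here are assumptions rather than the original implementation: the SIR form (with a βSI/N term), the parameter bounds, the synthetic data, and above all the local optimizer, which is a simple bounded coordinate search used as a dependency-free stand-in for L-BFGS-B. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

def sir_infected(s0: float, i0: float, beta: float, gamma: float,
                 days: int, steps_per_day: int = 4) -> List[float]:
    """Daily infected counts from a forward-Euler SIR run (assumed form)."""
    s, i = float(s0), float(i0)
    n = s + i
    dt = 1.0 / steps_per_day
    out = []
    for _ in range(days):
        out.append(i)
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt
            s -= new_inf
            i += new_inf - gamma * i * dt
    return out

def residual(params: Sequence[float], i0: float,
             observed: Sequence[float]) -> float:
    """L2 norm of the difference between fitted and observed sequences."""
    s0, beta, gamma = params
    fitted = sir_infected(s0, i0, beta, gamma, len(observed))
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fitted, observed)))

def local_search(start, i0, observed, bounds, rounds: int = 40):
    """Bounded coordinate search; a stand-in for L-BFGS-B, not the same method."""
    best = list(start)
    best_val = residual(best, i0, observed)
    steps = [0.25 * (hi - lo) for lo, hi in bounds]
    for _ in range(rounds):
        improved = False
        for k, (lo, hi) in enumerate(bounds):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[k] = min(hi, max(lo, best[k] + sign * steps[k]))
                val = residual(trial, i0, observed)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            steps = [0.5 * step for step in steps]   # refine the step size
    return best, best_val

def multi_start_fit(observed, i0, bounds, n_starts: int = 25, seed: int = 0):
    """Run the local search from N random starts and keep the smallest residual."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        results.append(local_search(start, i0, observed, bounds))
    return min(results, key=lambda item: item[1])

# Synthetic "observed" segment generated from hypothetical parameters.
observed = sir_infected(5000.0, 5.0, 0.5, 0.2, days=60)
bounds: List[Tuple[float, float]] = [(1000.0, 20000.0), (0.05, 1.0), (0.01, 0.5)]
best_params, best_residual = multi_start_fit(observed, 5.0, bounds)
print(best_params, best_residual)
```

Running several starts guards against a single poor initial value trapping the optimizer in a bad local minimum, which is the stated reason for drawing N = 25 initial parameter groups.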
When the fit is poor, the numerical precision of the calculation should be examined and the parameters rescaled by constant factors. In this fit the parameter b (β) had to be multiplied by a constant factor of 10000, while the other parameters needed less scaling. The optimal S, β, γ and I and the peak coordinates are obtained for each interval; an example is shown in Fig. 5, where the solid line is the fitted curve and the points are the extracted original interval. The parameters output for the fitted curve are shown, for example, in Table 1.
Table 1 parameter output table
As shown in Table 1, the data output after fitting contain, in addition to the original interval data (country, left and right interval bounds, peak value), the SIR model parameters bVecOpt (infection rate β), gVeclopt (recovery rate γ), initial opt (initial number of infected persons I) and sVecOpt (number of susceptible persons S).
2.4 evaluation of the Effect of SIR model
The optimal parameters of the SIR model are obtained by the parameter-solving process above. The fit of each curve is evaluated using R², defined as
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
where $y_i$, $\hat{y}_i$ and $\bar{y}$ denote each true value, each predicted value and the mean of the sequence, respectively. The closer R² is to 1, the better the fit; a curve is considered well fitted when R² > 0.4. In the SIR model evaluation, R² is computed for every fitted segment and a density plot is drawn. As shown in Fig. 6, most curves have R² above 0.75, and only a very small fraction have R² < 0.4. Overall, the mean R² is about 0.86, which indicates that the SIR model describes the changes in infection status across countries and regions well.
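A short sketch of the R² check, using the usual coefficient-of-determination definition (assumed to match the formula referred to above) and the 0.4 threshold from the text; the function name and the toy values are hypothetical.

```python
from typing import Sequence

GOOD_FIT_THRESHOLD = 0.4   # threshold stated in the text

def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant sequence")
    return 1.0 - ss_res / ss_tot

# Toy example: one of four points is off by 1.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
r2 = r_squared(actual, predicted)
print(r2, r2 > GOOD_FIT_THRESHOLD)
```

For these values SS_res = 1 and SS_tot = 5, so R² = 0.8 and the segment would count as well fitted.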
3.1 sample dataset introduction
After preliminary data processing and SIR fitting, the results are aggregated into the data set that is finally input to the regression model. The variables in the data set and sample records are shown, for example, in Tables 2 and 3;
table 2 regression analysis sample data interpretation table
Table 3 regression analysis sample example
After the SIR model output is obtained, the original data sequences are cut into segments and processed. In addition to the SIR model parameters, each record contains the infection stage stageList, the peak feature peaksListVec of the corresponding period, the number of infections in the same period injectFeatureList (computed as the sum of infected counts over the period), the total number of inbound persons totalFeatureList, and the total number of imported cases caseFeatureList.
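The per-segment aggregation can be sketched as below. The layout is an assumption made for illustration: daily series are plain lists, each segment is a half-open index range [start, end) with an integer stage label, and the row keys and function name `build_rows` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int, int]   # (start index, end index exclusive, stage label)

def build_rows(infected: Sequence[float], inbound: Sequence[float],
               cases: Sequence[float],
               segments: Sequence[Segment]) -> List[Dict[str, float]]:
    """Sum each daily series over every segment to form one regression row."""
    rows = []
    for start, end, stage in segments:
        rows.append({
            "injectFeature": sum(infected[start:end]),  # infections in exporting country
            "stage": stage,                             # infection stage of the segment
            "totalFeature": sum(inbound[start:end]),    # inbound persons
            "caseNum": sum(cases[start:end]),           # imported cases (target)
        })
    return rows

# Hypothetical six-day toy series split into two segments.
infected = [10, 20, 30, 40, 30, 20]
inbound = [100, 100, 100, 100, 100, 100]
cases = [0, 1, 2, 3, 2, 1]
rows = build_rows(infected, inbound, cases, [(0, 3, 1), (3, 6, 2)])
print(rows)
```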
3.2 training procedure of model
Training uses N-fold cross-validation to ensure that the model is stable and reasonable on the sample set. Specifically, the original data set is randomly sampled into a training set and a test set in the ratio 7:3, the linear regression model is fitted and tested, and the more suitable model result is selected.
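A minimal sketch of the random 7:3 split and of fold construction for N-fold cross-validation. The function names, the fixed seed and the use of round-robin folds are illustrative assumptions, not details given in the text.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_test_split(rows: Sequence[T], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the rows and cut them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Partition the indices 0..n-1 into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

samples = list(range(10))
train, test = train_test_split(samples)
folds = k_fold_indices(len(samples), k=5)
print(len(train), len(test), [len(f) for f in folds])
```

Each fold serves once as the validation set while the remaining folds are used for fitting, so every sample is evaluated exactly once.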
3.3 fitting model analysis specification
In this project, the model structure obtained by training is as follows:
$$y = \alpha + \lambda x_{inject} + \omega x_{input} + \delta x_{stage} + \epsilon$$
where y is the number of imported positive cases over a period of time; x_inject, x_input and x_stage are, respectively, the number of infections in the exporting country over the same period, the number of persons entering from that country, and the infection stage of the disease in that country; ε is a random error term; α is a constant; and λ, ω and δ are the coefficients of x_inject, x_input and x_stage.
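The coefficients α, λ, ω and δ can be estimated by ordinary least squares. The sketch below solves the normal equations with Gaussian elimination so that it needs no external libraries; the function names and the noise-free toy data are hypothetical, and a statistics package would normally be used instead.

```python
from typing import List, Sequence

def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(features: Sequence[Sequence[float]],
            targets: Sequence[float]) -> List[float]:
    """Return [alpha, lambda, omega, delta] minimizing the squared error."""
    design = [[1.0] + list(row) for row in features]   # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(p)]
    return solve_linear(xtx, xty)

# Toy rows of (x_inject, x_input, x_stage) with a known, noise-free relation.
features = [(1, 2, 1), (2, 1, 2), (3, 5, 1), (4, 3, 3), (5, 8, 2), (6, 4, 1)]
targets = [1 + 2 * a + 0.5 * b + 3 * c for a, b, c in features]
coef = fit_ols(features, targets)
print([round(c, 6) for c in coef])
```

Because the toy targets are generated without noise, the fit recovers the generating coefficients up to rounding error.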
4.1 prediction Process
The prediction process is shown schematically in FIG. 7. Because of the properties of the SIR model itself (the overall course can be determined from partial parameters), two prediction processes are given here from the point of view of practical application:
Prediction process 1: construct variables of the same form as those of the linear model and predict directly with the linear model. The specific process is as follows:
Step 1: determine the prediction formula as follows:
$case_t = pre(Inject_t, input_t, stage_t)$
Step 2: construct the following variables:
$Inject_t$: the summed number of infected persons in the same time period (or the same type of time period), taken from a well-fitted SIR curve;
$input_t$: the summed number of persons input in the same time period;
$stage_t$: the infection stage determined in advance (according to an originally provided table, or randomly selected). If the infection waveform is incomplete, the SIR model can be fitted first and the regression model then evaluated for each possible stage of the input (this part is treated as an input; if enough data have been observed, it can be defined and explained). In short, for input risk prediction, the prediction can be run several times to obtain upper and lower limits of the input;
Step 3: use the linear model to predict the number of infected persons $case_t$, i.e. the input risk.
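Prediction process 1, including the repeated prediction over several possible stages to obtain upper and lower limits, could look like the sketch below. The coefficient values and the numeric stage codes are placeholders chosen for illustration; the text does not specify how stages are encoded.

```python
def predict_cases(coeffs, inject_t, input_t, stage_t):
    """Evaluate case_t = alpha + lam*Inject_t + omega*input_t + delta*stage_t."""
    alpha, lam, omega, delta = coeffs
    return alpha + lam * inject_t + omega * input_t + delta * stage_t


def predict_case_bounds(coeffs, inject_t, input_t, candidate_stages):
    """Predict once per candidate stage and return (lower, upper) limits."""
    predictions = [
        predict_cases(coeffs, inject_t, input_t, stage)
        for stage in candidate_stages
    ]
    return min(predictions), max(predictions)


# Placeholder coefficients (alpha, lam, omega, delta); not values from the patent.
coeffs = (1.0, 0.001, 0.01, 2.0)
lower, upper = predict_case_bounds(coeffs, inject_t=1000, input_t=100,
                                   candidate_stages=[1, 2, 3])
```

When the stage is known, a single call to `predict_cases` is enough; the bounds are only useful while the infection waveform is still incomplete.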
Prediction process 2: prediction is performed in combination with the feature table (the table of initial values for SIR fitting). The specific process is as follows:
Step 1: determine the input as the initial infection fragment $object_{[t,T]}$, where $[t,T]$ is the observation interval;
Step 2: perform simulation with the relevant epidemic parameters in the feature table (Table 2) to obtain curves $I(t)$;
Step 3: extract the part $I(t,T)$ of each curve that corresponds to the initial input infection fragment, calculate its distance from the input $object_{[t,T]}$, and output the curve with the smaller distance;
Step 4: using the obtained epidemic curve, summarize the features, execute prediction process 1, and output the corresponding predicted value.
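Steps 1 to 3 of prediction process 2 can be sketched as below. Several points are assumptions rather than details from the text: the SIR equations are integrated with a simple forward-Euler scheme, the distance is Euclidean, the fragment is compared at the same day indices as the observation interval, and each feature-table row is represented by a dict with hypothetical keys.

```python
def simulate_sir(population, beta, gamma, initial_infected, days, steps_per_day=10):
    """Forward-Euler integration of the SIR equations; returns daily I(t)."""
    dt = 1.0 / steps_per_day
    s = population - initial_infected
    i = float(initial_infected)
    curve = [i]
    for _ in range(days - 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - recoveries
        curve.append(i)
    return curve


def euclidean_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def closest_curve(observed, start, feature_table, days):
    """Return (distance, params, curve) for the row whose I(t) best matches."""
    if start + len(observed) > days:
        raise ValueError("observation interval exceeds the simulated horizon")
    best = None
    for params in feature_table:
        curve = simulate_sir(days=days, **params)
        window = curve[start:start + len(observed)]  # the part I(t, T)
        dist = euclidean_distance(window, observed)
        if best is None or dist < best[0]:
            best = (dist, params, curve)
    return best


# Hypothetical feature-table rows; real rows would come from Table 2.
table = [
    dict(population=1000.0, beta=0.3, gamma=0.1, initial_infected=1.0),
    dict(population=1000.0, beta=0.5, gamma=0.2, initial_infected=1.0),
]
reference = simulate_sir(days=60, **table[1])
dist, params, curve = closest_curve(reference[10:20], 10, table, 60)
```

The selected curve then supplies the summed infection count for Step 4, after which prediction process 1 is applied unchanged.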
4.2 predictive monitoring Range
In practice, besides the conventional items to be monitored, the prediction process must also record SIR process curves that the current feature table (Table 2) cannot describe accurately. Newly appearing and mispredicted curves are added to the configuration table in time, and SIR curves that can be predicted but are predicted inaccurately are re-fitted or marked for the next round of analysis and calculation.
The prediction results can be applied as follows:
(1) According to the predicted number of input positive cases, judge the current epidemic situation of the output country and the probability of inputting infectious cases given the traffic with the input country.
(2) According to the number of input positive cases, judge, grade and dynamically adjust the infectious disease prevention and control measures at ports and on vehicles.
(3) When the virus mutates or vaccination takes place, re-estimate the infectious disease input risk by adjusting the model parameters to reflect foreign incidence or the mutated virus.
(4) Estimate the probability of infecting other passengers during travel by jointly considering factors related to vehicle density, such as the aircraft load factor and the seat spacing.
The present embodiment also provides a nonvolatile computer storage medium, which may be the nonvolatile computer storage medium included in the apparatus described in the above embodiment, or may be a nonvolatile computer storage medium that exists alone and is not incorporated in a terminal.
In the description of the present invention, it should be noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
It should be understood that the foregoing examples of the present invention are provided merely for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention, and that various other changes and modifications may be made therein by one skilled in the art without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (9)
1. A method for modeling and predicting cross-border spread of infectious diseases, comprising:
Step 1, acquiring raw data comprising the number of infected persons in each country worldwide, the number of inbound aircraft, the origin and number of inbound persons, the number of laboratory-detected positive cases and the number of published input cases, and sequentially screening and preprocessing the raw data to obtain an infection curve I(t) of each country;
Step 2, dividing, from the infection curve I(t) of each country obtained by the preprocessing, the curve segments that can be fitted by the SIR propagation process, to obtain SIR-fittable curves $I_s(t)$, wherein s is the index of the segment into which the infection curve I(t) is divided;
Step 3, fitting the SIR propagation process to each curve $I_s(t)$ of each country to obtain the fitted SIR curve and the fitting parameters of each country;
Step 4, according to the fitted SIR curve and the fitting parameters, dividing the infection process characterized by the curve $I_s(t)$ according to its distribution to obtain divided curve segments, and performing summary calculation to output an output-country infection sequence as a first characteristic variable and an output-country infectious disease infection stage as a second characteristic variable;
Step 5, summarizing the number of inbound persons in the raw data over the divided curve segments to obtain the total number of inbound persons as a third characteristic variable;
Step 6, summarizing the number of input cases in the raw data over the divided curve segments to obtain the number of overseas input cases as a target variable;
Step 7, adding the first characteristic variable, the second characteristic variable, the third characteristic variable and the target variable into a regression model, constructing a cross-border transmission linear regression model of infectious diseases, and predicting results;
wherein the cross-border transmission linear regression model of infectious diseases is constructed as
$y = \alpha + \lambda x_{inject} + \omega x_{input} + \delta x_{stage} + \epsilon$
where $y$ is the number of input (imported) positive cases over a period of time; $x_{inject}$, $x_{input}$ and $x_{stage}$ are, respectively, the number of infections in the output country, the number of persons input from the output country, and the infection stage of the infectious disease in the output country over the same period; $\epsilon$ is a random error term; $\alpha$ is a constant; and $\lambda$, $\omega$ and $\delta$ are the coefficients of $x_{inject}$, $x_{input}$ and $x_{stage}$, respectively.
2. The method according to claim 1, wherein the preprocessing of the data comprises stitching the acquired data to generate an infection curve I (t), i.e. a time series I (t) of the number of existing infections per day, for each country during the infection of the infectious disease on the basis of the raw data.
3. The method of claim 1, wherein the screening the data comprises,
s1: rejecting countries with non-zero sequence length less than or equal to 300;
s2: rejecting sequences with min(I(t)) ≥ 10000000 or max(I(t)) ≤ 1000, i.e. rejecting sequences whose minimum value is too large or whose maximum value is too small;
s3: deleting all header data of 0 in each country sequence;
s4: and deleting tail data with all 0 in each country sequence.
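The four screening rules above can be sketched as one filter. This is a reading under stated assumptions: "non-zero sequence length" is taken to mean the count of non-zero entries, the comparison operators in s2 are assumed to be "minimum at least 10000000" and "maximum at most 1000", and the function and parameter names are hypothetical.

```python
def strip_zero_ends(sequence):
    """s3 and s4: drop the all-zero head and the all-zero tail of a sequence."""
    start, end = 0, len(sequence)
    while start < end and sequence[start] == 0:
        start += 1
    while end > start and sequence[end - 1] == 0:
        end -= 1
    return sequence[start:end]


def screen_countries(curves, min_nonzero_length=300,
                     min_too_large=10_000_000, max_too_small=1_000):
    """Apply rules s1 to s4 to a dict mapping country name to I(t)."""
    kept = {}
    for country, sequence in curves.items():
        nonzero_count = sum(1 for value in sequence if value != 0)
        if nonzero_count <= min_nonzero_length:
            continue  # s1: too few non-zero observations
        if min(sequence) >= min_too_large or max(sequence) <= max_too_small:
            continue  # s2 (assumed operators): implausible magnitude
        kept[country] = strip_zero_ends(sequence)  # s3, s4
    return kept


# Invented toy curves, only to exercise the rules.
curves = {
    "A": [0, 0] + [2000] * 301 + [0],
    "B": [0] + [5] * 400,
    "C": [1, 2, 3],
}
kept = screen_countries(curves)
```

In this toy input only country "A" survives: "B" never exceeds the assumed maximum threshold and "C" is too short.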
4. The method according to claim 1, wherein dividing the infection process characterized by the curve $I_s(t)$ according to its distribution further comprises: starting from the peak of the number of infected persons, dividing toward the left and right sides at equal distances according to a complete infection period, to obtain the divided curve segments.
5. The method of claim 1, wherein the SIR model is constructed from the number of susceptible persons S, the infection rate β, the recovery rate γ and the initial value of the number of infected persons I; parameter combinations are constructed by randomly drawing N groups from the determined vectors of initial parameter values, N being a constant; the objective function is the residual between the infected-person sequence of the fitted SIR model and the actual infected-person sequence; and, with the parameter combinations input as initial values, S, β and γ are optimized using L-BFGS-B as the optimization method and the optimal parameters are output.
6. The method of claim 5, further comprising using $R^2$ to evaluate the fitting effect of the fitted SIR curve,
wherein $Y_i$, $\hat{Y}_i$ and $\bar{Y}$ represent each true value, each predicted value and the mean value of the sequence, respectively.
7. The method of claim 1, wherein during the result prediction, it is further required to record SIR process curves which cannot be accurately described at present during the application process, and add new and mispredicted curves to the configuration table in time, and at the same time, re-fit or mark SIR curves which can be predicted and are not accurately predicted for further analysis and measurement.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-7 when the program is executed by the processor.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310383533.1A CN116741385B (en) | 2023-04-11 | 2023-04-11 | Infectious disease cross-border propagation modeling prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116741385A CN116741385A (en) | 2023-09-12 |
CN116741385B true CN116741385B (en) | 2023-11-14 |
Family
ID=87912200
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310383533.1A Active CN116741385B (en) | 2023-04-11 | 2023-04-11 | Infectious disease cross-border propagation modeling prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116741385B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108364694A (en) * | 2018-03-09 | 2018-08-03 | 中华人民共和国陕西出入境检验检疫局 | Airport Disease Warning Mechanism based on multi-data source big data and prevention and control system constituting method |
KR101960504B1 (en) * | 2017-12-18 | 2019-07-15 | 세종대학교산학협력단 | Apparatus and method for epidemic spread prediction modeling |
CN115424737A (en) * | 2022-09-15 | 2022-12-02 | 医渡云(北京)技术有限公司 | Infectious disease spreading number prediction method and device, storage medium and electronic equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102349270B1 (en) * | 2021-03-05 | 2022-01-10 | 한국과학기술원 | Method and apparatus for predicting confirmed patients of infectious disease based on deep neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |