CN115631869B - Method for constructing infectious disease prediction model - Google Patents


Info

Publication number
CN115631869B
Authority
CN
China
Prior art keywords
time
wiener
model
training
infectious disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211496712.8A
Other languages
Chinese (zh)
Other versions
CN115631869A (en
Inventor
叶建宏
王依宁
史文彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202211496712.8A
Publication of CN115631869A
Application granted
Publication of CN115631869B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for constructing an infectious disease prediction model and belongs to the technical field of epidemic prediction. Search volumes of epidemic-related keywords are collected from a search-engine big data platform, and multiple types of climate variable data are collected from a meteorological platform, for use in infectious disease prediction. The model is validated with a time series cross-validation (TSCV) algorithm and outputs variables such as the number of newly confirmed cases in future time periods; its prediction performance can be evaluated with metrics such as the root mean square error and the Spearman rank correlation coefficient. Building on the advantages of the Wiener cascade model, namely strong interpretability, low computational cost and nonlinear decoding, the invention provides, for the first time, an infectious disease epidemic prediction system based on the Wiener cascade model. The system integrates search query big data with climate data, including temperature and humidity, to achieve accurate detection of infectious disease epidemics, and is expected to provide a feasible new approach for the dynamic allocation of epidemic prevention resources.

Description

Method for constructing infectious disease prediction model
Technical Field
The invention relates to a method for constructing an infectious disease prediction model. The method is suitable for decoding and predicting epidemic dynamics in real time from infectious-disease-related search queries and multiple climate variables, and belongs to the technical field of epidemic prediction.
Background
In view of the serious impact of the COVID-19 epidemic on health and the socioeconomy, timely and accurate prediction of regional disease incidence can greatly facilitate local health resource management and effective epidemic prevention. Several predictive models have been proposed for analyzing infectious disease dynamics. The susceptible-infected-recovered (SIR) model is one of the most widely used epidemic models in the literature; its mathematical formulation mainly considers the numbers of susceptible, infected and recovered individuals, together with the transmission and recovery rates. Another common model, the susceptible-exposed-infected-recovered (SEIR) model, extends the SIR model with an additional variable, the proportion of individuals who have been infected but are asymptomatic, which further improves the accuracy of transmission prediction. However, most of these models are deterministic, that is, the modeling process is fixed, and they involve too few variables and parameters. Furthermore, these models pay little attention to human behavior patterns, such as getting vaccinated, buying surgical masks or willingness to go to clubs, which may increase or decrease the spread of an epidemic and have a crucial impact on predicting infections.
Therefore, in the search for new methods of monitoring and predicting disease outbreaks, Internet search data reflecting patterns of human behavior are expected to become an important resource for improving the performance of predictive models. Search query data obtained from the Google Trends big data platform are widely used in fields such as business, economics, communication and disease prediction. For example, in 2009 Ginsberg et al. detected dynamic changes in influenza-like illness (ILI) in the United States using query data for 45 selected search terms. In another study, Araz et al. reported that the additional use of Google Trends data significantly improved the performance of a multiple linear regression model in predicting ILI visit information. To date, a considerable body of literature shows that search query big data (e.g., Google Trends) have significant value in the dynamic analysis of epidemics such as Ebola, influenza-like illness, measles and Zika.
Currently, researchers have proposed various models that predict epidemic dynamics from the search query data of big data platforms, such as the autoregressive integrated moving average (ARIMA) model and various machine learning algorithms. However, a major problem of the ARIMA model is its poor stability: small changes in the observed data or model parameters can make the model unstable and degrade its decoding and prediction performance for epidemics. Machine-learning-based methods, including linear regression, deep neural networks and long short-term memory (LSTM) networks, have the disadvantage of requiring large amounts of data, a requirement that is often difficult to meet in practical prediction.
Disclosure of Invention
The technical solution of the invention is as follows: the method adopts a Wiener cascade model and is based on search query data from a big data platform together with climate data.
The technical scheme of the invention is as follows:
A method for constructing an infectious disease prediction model, the method comprising the following steps:
Step one: acquire the signal s(t) over the experimental time period. The signal s(t) includes the daily number of newly confirmed cases, the daily temperature, the daily humidity, and the search volume of keywords related to the infectious disease. The number of newly confirmed cases is used as the target variable, and the daily temperature, daily humidity and keyword search volume are used as feature variables;
Step two: perform time-frequency analysis on the signal s(t) acquired in step one, based on the continuous wavelet transform (CWT), to obtain a time-frequency spectrogram of the signal;
In step two, the time-frequency analysis is performed as follows:
Step S1: select a wavelet function that satisfies the admissibility condition, expressed as:
[Equation: time-domain expression of the wavelet function ψ(f_k, t); image not reproduced]
where ψ(f_k, t) denotes the wavelet function, f_k is the oscillation frequency, t is time, and σ is the time-domain resolution parameter;
Step S2: apply the continuous wavelet transform to the signal s(t), based on the wavelet function of step S1, to obtain the wavelet coefficients W_s(f_k, t) of s(t), expressed as:
$$W_s(f_k, t) = \int s(\tau)\, \psi^{*}(f_k, \tau - t)\, d\tau$$
where * denotes the complex conjugate and τ is the time integration variable.
Step three: based on the signal over the experimental time period acquired in step one and the time-frequency spectrogram obtained in step two, select the feature variables x(n) related to the infectious disease and establish a Wiener cascade model, which consists of a dynamic linear unit P (i.e., a Wiener filter) and a static nonlinear unit Q;
The output of the Wiener filter P, which also serves as the input of the nonlinear unit Q, is expressed as:
$$u(n) = \sum_{i=1}^{N} \sum_{j=1}^{M} A_{ij}\, x_i(n - j)$$
where u(n) is the predicted incidence at time n, x_i(n - j) is the value of the i-th feature variable at time n - j, N is the total number of feature variables, M is the experimental time (the upper limit of the lag index j), A_ij is the Wiener filter coefficient of the i-th feature at lag j, the coefficient matrix composed of the coefficients A_ij is defined as A, and the set of all feature variables is defined as the feature set F(n) = { x_i(n - j), i = 1, 2, …, N, j = 1, 2, …, M }. The nonlinear unit Q is then constructed to adjust the first-stage output u(n) and generate the final output
y(n) = g(u(n))
where y(n) is the target variable, n is the discrete time variable, and g(u(n)) is a third-order polynomial whose weights are determined by least-squares optimization.
The Wiener cascade model is determined as follows:
First step: estimate the coefficient matrix A based on the Wiener-Hopf equation, so as to achieve the best dynamic linear fit between the feature set F(n) and the target variable y(n), expressed as:
$$A = (X^{T} X)^{-1} X^{T} Y$$
where X is the matrix of all feature variables and Y is the target variable matrix; with a regularization term added, the estimate becomes:
$$A = (X^{T} X + \lambda I)^{-1} X^{T} Y$$
where λ is the regularization constant and I is the identity matrix;
Second step: compute u(n) from the coefficient matrix A estimated in the first step and the feature set F(n);
Third step: using u(n) from the second step and the target variable y(n), fit a third-order polynomial between u(n) and y(n) to obtain the estimate of the target variable y(n); the weights of the third-order polynomial are determined by least-squares optimization.
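One way to read the matrices X and Y in the estimate above is to write them out row by row. The layout below is an illustrative assumption: the text does not specify the row or column ordering, and the indices n_0 and n_1 (first and last usable time points) are hypothetical names. Each row of X holds the lagged feature values used to predict one entry of Y, and A is read as the column of coefficients A_ij stacked in the same order as the columns of X.

```latex
\[
X =
\begin{bmatrix}
x_1(n_0-1) & \cdots & x_N(n_0-1) & \cdots & x_1(n_0-M) & \cdots & x_N(n_0-M) \\
\vdots     &        & \vdots     &        & \vdots     &        & \vdots     \\
x_1(n_1-1) & \cdots & x_N(n_1-1) & \cdots & x_1(n_1-M) & \cdots & x_N(n_1-M)
\end{bmatrix},
\qquad
Y =
\begin{bmatrix}
y(n_0) \\ \vdots \\ y(n_1)
\end{bmatrix}
\]

% Under this layout each entry of XA is the double sum over features i and lags j:
\[
(XA)_n = \sum_{i=1}^{N} \sum_{j=1}^{M} A_{ij}\, x_i(n-j) = u(n)
\]

% The regularized estimate is the minimizer of a penalized least-squares cost:
\[
A = (X^{T}X + \lambda I)^{-1} X^{T} Y
  = \arg\min_{A} \; \lVert Y - XA \rVert^{2} + \lambda \lVert A \rVert^{2}
\]
```

Setting λ = 0 recovers the unregularized estimate A = (X^T X)^{-1} X^T Y, which requires X^T X to be invertible.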
Step four: train and validate the Wiener cascade model established in step three using a sliding-window time series cross-validation algorithm. The cross-validation process is mainly based on the sliding-window time series cross-validation (TSCV) algorithm, in which a subset of the data preceding the point to be predicted is used as the training set and the data point following the training window is used as the test set. Each run yields the prediction accuracy for the tested data point; in the next run, the training and test windows are moved several steps toward the end of the time series and the estimation process is repeated;
The TSCV algorithm comprises the following steps:
Step S1: select the window length L, which is the number of samples contained in the training set;
Step S2: according to the window length L selected in step S1, take the first L samples of the original time series as the training set, and select the single observation following the training window as the test set;
Step S3: train the Wiener cascade model using the training set from step S2;
Step S4: predict the test set data using the model trained in step S3, and evaluate the prediction accuracy;
Step S5: move the training window to the right and repeat steps S3 to S4 until the whole time series is covered;
Step S6: calculate the average of all prediction accuracies obtained in step S4.
Advantageous effects
(1) Compared with traditional linear regression models and machine learning algorithms, the Wiener cascade model additionally accounts for the time-lag effects of the feature variables and fully integrates their historical information. It also offers strong interpretability, low computational cost and support for modeling nonlinear relationships, all of which help improve the dynamic decoding and prediction of infectious diseases;
(2) Previous studies indicate that climate variables such as temperature and humidity have a significant influence on the incidence of infectious diseases such as COVID-19. The method therefore also integrates multiple climate variables (such as temperature and humidity) into the Wiener cascade prediction model, with the expectation that these variables will improve dynamic prediction of infectious disease incidence. To our knowledge, this is also a novel technique in that it integrates population search query big data with climate variables and uses a Wiener cascade model to decode and predict infectious diseases.
(3) To meet the need to predict and decode serious infectious diseases such as COVID-19, the invention provides, for the first time, an infectious disease epidemic prediction system based on the Wiener cascade model. It integrates Internet search query big data with climate data, including temperature and humidity, to achieve accurate detection of infectious disease epidemics, and is expected to provide a feasible new approach for the dynamic allocation of epidemic prevention resources. The new model is suitable for accurate decoding and prediction of infectious disease epidemics, is efficient and reliable, and is easy to implement in software.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 (A) shows the time series of the key variables and their time-frequency spectra, including the number of newly confirmed cases, the average temperature, the average humidity, and the search volume for the search query keyword "COVID-19 symptoms";
FIG. 2 (B) shows the predicted trend (grey) and actual trend (black) of weekly newly confirmed COVID-19 cases.
Detailed Description
Based on search query data and climate variables related to an infectious disease, the invention builds a prediction model using the Wiener cascade method, thereby enabling prediction and monitoring of infectious disease epidemics.
The technical flow chart of the invention is shown in FIG. 1, and the detailed process is as follows:
Step 1) Acquire the signal s(t) over the experimental time period. The signal s(t) includes the daily number of newly confirmed cases, the daily temperature, the daily humidity, and the search volume of keywords related to the infectious disease. The number of newly confirmed cases is used as the target variable, and the daily temperature, daily humidity and keyword search volume are used as feature variables;
Step 2) To verify that climate variables and search volumes of infectious-disease-related keywords are effective for epidemic prediction, the time series from step 1) are processed with a time-frequency representation (TFR) method.
The original TFR methods are based on the Fourier transform, such as the short-time Fourier transform (STFT), which obtains frequency components by applying the Fourier transform over fixed time intervals and is therefore generally unsuitable for processing transient signals of short duration. To address this limitation, we use a wavelet-transform-based TFR algorithm to generate a clearer time-frequency spectrogram. The advantages of the continuous wavelet transform (CWT) over the traditional Fourier transform include the following: (1) The wavelet transform has adaptive time-frequency resolution, that is, an adjustable time-frequency window whose width changes with frequency; the window automatically narrows as the frequency increases, which improves time resolution. (2) In wavelet transform algorithms, the user is free to choose a mother wavelet function that better matches the analyzed signal; common mother wavelets include the Morlet wavelet, the bump wavelet and the Haar wavelet. Given these advantages, the continuous wavelet transform can be used to analyze transient behavior in a variety of application scenarios and to extract oscillatory components with time-varying frequencies and amplitudes. The computational procedure and mathematical details of the wavelet transform are as follows.
Step S1: select a wavelet function that satisfies the admissibility condition. Fourier analysis decomposes a signal into sine waves of specific frequencies, whereas wavelet analysis decomposes a signal into scaled and shifted versions of a mother wavelet function. Unlike a sine wave, a mother wavelet is a rapidly decaying oscillation. Taking the Morlet wavelet as an example, its time-domain expression is
[Equation (1): time-domain expression of the Morlet wavelet ψ(f_k, t); image not reproduced]
where ψ(f_k, t) denotes the wavelet function, f_k is the oscillation frequency, t is time, and σ is the time-domain resolution parameter;
Step S2: apply the continuous wavelet transform to the signal s(t), based on the wavelet function of step S1, to obtain the wavelet coefficients W_s(f_k, t) of s(t), expressed as:
$$W_s(f_k, t) = \int s(\tau)\, \psi^{*}(f_k, \tau - t)\, d\tau \qquad (2)$$
where * denotes the complex conjugate and τ is the time integration variable.
The time-frequency spectrogram generated by the above steps is shown in FIG. 2 (A).
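The two wavelet steps above can be sketched numerically. This is a minimal illustration, not the patented implementation: the Morlet formula in `morlet` is a commonly used unit-energy form assumed here because the patent's own expression is not reproduced in this text, and the function names, the `n_cycles` parameter (which ties σ to f_k) and the toy signal are all hypothetical.

```python
import numpy as np

def morlet(f_k, t, sigma):
    """Complex Morlet wavelet in an ASSUMED common form (not taken from the source):
    a unit-energy Gaussian envelope of width sigma times a complex sinusoid at f_k."""
    norm = 1.0 / np.sqrt(sigma * np.sqrt(np.pi))
    return norm * np.exp(-t**2 / (2.0 * sigma**2)) * np.exp(2j * np.pi * f_k * t)

def cwt_morlet(s, dt, freqs, n_cycles=6.0):
    """Discretised W_s(f_k, t) = integral of s(tau) * conj(psi(f_k, tau - t)) d tau.

    n_cycles is a hypothetical knob: sigma = n_cycles / (2 * pi * f_k), so the
    analysis window narrows as the frequency increases.
    """
    s = np.asarray(s, dtype=float)
    W = np.empty((len(freqs), len(s)), dtype=complex)
    for k, f_k in enumerate(freqs):
        sigma = n_cycles / (2.0 * np.pi * f_k)
        half = int(np.ceil(4.0 * sigma / dt))       # truncate the wavelet at +/- 4 sigma
        t = np.arange(-half, half + 1) * dt
        psi = morlet(f_k, t, sigma)
        # np.correlate conjugates its second argument, which supplies psi* in the integral
        full = np.correlate(s, psi, mode="full")
        W[k] = full[half:half + len(s)] * dt        # keep the lags aligned with each sample
    return W

# Toy signal: a 0.1 cycles-per-sample oscillation standing in for a real time series.
n = np.arange(256)
signal = np.cos(2.0 * np.pi * 0.1 * n)
freqs = np.array([0.05, 0.1, 0.2])
power = np.abs(cwt_morlet(signal, dt=1.0, freqs=freqs)) ** 2   # time-frequency spectrogram
```

Each row of `power` is the spectrogram at one frequency; for this toy signal the energy concentrates in the 0.1 row away from the edges of the series, where the truncated wavelet fits entirely inside the signal.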
Step 3) Based on the time-frequency spectrogram from step 2), select the search query and climate variable data related to the infectious disease and establish a Wiener cascade model. A schematic diagram of the Wiener cascade model is shown in FIG. 1; it consists of a dynamic linear unit P (i.e., a Wiener filter) and a static nonlinear unit Q. Specifically:
The output of the Wiener filter P, which also serves as the input of the nonlinear unit Q, can be expressed as
$$u(n) = \sum_{i=1}^{N} \sum_{j=1}^{M} A_{ij}\, x_i(n - j) \qquad (3)$$
where u(n) is the predicted incidence at time n, x_i(n - j) is the value of the i-th feature variable at time n - j, N is the total number of feature variables, M is the experimental time (the upper limit of the lag index j), A_ij is the Wiener filter coefficient of the i-th feature at lag j, the coefficient matrix composed of the coefficients A_ij is defined as A, and the set of all feature variables is defined as the feature set F(n) = { x_i(n - j), i = 1, 2, …, N, j = 1, 2, …, M }.
After the linear unit, the nonlinear unit Q is constructed to adjust the first-stage output u(n) and generate the final output
y(n) = g(u(n)) (4)
where y(n) is the target variable, n is the discrete time variable, and g(u(n)) is a third-order polynomial whose weights are determined by least-squares optimization.
The Wiener cascade model is computed as follows:
First step: estimate the coefficient matrix A based on the Wiener-Hopf equation, so as to achieve the best dynamic linear fit between the feature set F(n) and the target variable y(n), expressed as:
$$A = (X^{T} X)^{-1} X^{T} Y \qquad (5)$$
where X is the matrix of all feature variables and Y is the target variable matrix; with a regularization term added, the estimate becomes:
$$A = (X^{T} X + \lambda I)^{-1} X^{T} Y \qquad (6)$$
where λ is the regularization constant and I is the identity matrix;
Second step: compute u(n) from the coefficient matrix A estimated in the first step and the feature set F(n);
Third step: using u(n) from the second step and the target variable y(n), fit a third-order polynomial between u(n) and y(n) to obtain the estimate of the target variable y(n); the weights of the third-order polynomial are determined by least-squares optimization.
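The three-step fit can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions rather than the inventors' code: the function names, the row layout of the lagged matrix, the default value of `lam` and the synthetic data are all hypothetical, and the linear stage is read as u(n) = Σ_i Σ_j A_ij x_i(n - j) with no intercept term.

```python
import numpy as np

def lagged_matrix(features, M):
    """One row per time n >= M, holding x_i(n - j) for lags j = 1..M and all features i.
    The column order (lag-major) is an arbitrary choice made for this sketch."""
    features = np.asarray(features, dtype=float)        # shape (T, N)
    T = features.shape[0]
    rows = [np.concatenate([features[n - j] for j in range(1, M + 1)])
            for n in range(M, T)]
    return np.asarray(rows)                             # shape (T - M, N * M)

def fit_wiener_cascade(features, y, M, lam=1e-3):
    X = lagged_matrix(features, M)
    Y = np.asarray(y, dtype=float)[M:]                  # targets aligned with the rows of X
    # First step: regularized estimate A = (X^T X + lam * I)^{-1} X^T Y
    A = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    # Second step: first-stage output u(n)
    u = X @ A
    # Third step: least-squares cubic g(u); np.polyfit returns the highest power first
    coeffs = np.polyfit(u, Y, deg=3)
    return A, coeffs

def predict_wiener_cascade(features, A, coeffs, M):
    u = lagged_matrix(features, M) @ A
    return np.polyval(coeffs, u)                        # y(n) = g(u(n))

# Synthetic stand-ins for temperature, humidity and search volume (not real data).
rng = np.random.default_rng(0)
features = rng.normal(size=(120, 3))
M = 4
true_A = rng.normal(size=3 * M)
y = np.zeros(120)
y[M:] = lagged_matrix(features, M) @ true_A             # toy target, exactly linear in the lags
A, coeffs = fit_wiener_cascade(features, y, M, lam=1e-8)
y_hat = predict_wiener_cascade(features, A, coeffs, M)
```

On this toy target the linear stage already explains the data, so the fitted cubic is close to the identity; with real incidence data the polynomial stage is what captures a nonlinear relation between u(n) and y(n).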
Step 4) Train and validate the model constructed in step 3) using cross-validation. There are many cross-validation methods, of which k-fold cross-validation is one of the most widely used. It is simple to implement and is often used to evaluate and select machine learning models. However, applying conventional k-fold cross-validation to a time series prediction problem destroys the temporal structure of the data and can lead to past data being predicted from future data, so that the estimated prediction performance is overly optimistic.
Therefore, the cross-validation process of the invention is based mainly on the sliding-window time series cross-validation (TSCV) algorithm rather than on traditional k-fold cross-validation. Unlike standard cross-validation, time series cross-validation uses a subset of the data preceding the point to be predicted as the training set and the data point following the training window as the test set. Each run yields the prediction accuracy for the tested data point. In the next run, the training and test windows are moved several steps toward the end of the time series and the estimation process is repeated. The workflow of the TSCV algorithm is as follows:
Step S1: select an appropriate window length L, which is the number of samples contained in the training set;
Step S2: according to the window length L selected in step S1, take the first L samples of the original time series as the training set, and select the single observation following the training window as the test set. The diagram in FIG. 1 illustrates the split into training and test sets, where dark gray observations represent the training set and black observations represent the test set;
Step S3: train the Wiener cascade model using the training set from step S2;
Step S4: predict the test set data using the model trained in step S3, and evaluate the prediction accuracy (for example, the root mean square error, RMSE, between the actual and predicted data points of the test set can be used);
Step S5: move the training window to the right and repeat steps S3 to S4 until the whole time series is covered, as shown in FIG. 1;
Step S6: calculate the average of all prediction accuracies obtained in step S4 to evaluate the overall performance of the prediction model.
Specifically, in the example of FIG. 1, 5 consecutive data points are used as the training set in each run, the subsequent single data point is predicted with the trained model, and the window slides 1 step toward the end of the time series after each run.
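The sliding-window procedure can be sketched as follows. The code is an illustrative assumption, not the patented implementation: the function names are hypothetical, the per-run score is taken to be the absolute error (which is what the RMSE reduces to for a single test point), and a trivial "repeat the last value" predictor stands in for the Wiener cascade model so that the example stays self-contained.

```python
import numpy as np

def sliding_window_splits(n_samples, L, step=1):
    """Yield (train_indices, test_index): L consecutive points, then the single next point."""
    for start in range(0, n_samples - L, step):
        yield np.arange(start, start + L), start + L

def tscv_score(y, fit_predict, L=5, step=1):
    """Average the one-step absolute error over all windows."""
    y = np.asarray(y, dtype=float)
    errors = []
    for train_idx, test_idx in sliding_window_splits(len(y), L, step):
        prediction = fit_predict(y[train_idx])          # steps S3 and S4
        errors.append(abs(prediction - y[test_idx]))
    return float(np.mean(errors))                       # step S6

def last_value(train):
    """Placeholder model (NOT the Wiener cascade): repeat the last training value."""
    return train[-1]

series = np.arange(8, dtype=float)                      # toy series 0, 1, ..., 7
splits = list(sliding_window_splits(len(series), L=5))  # windows of 5, sliding by 1
score = tscv_score(series, last_value, L=5)
```

With 8 points and L = 5 the generator produces three runs, testing points 5, 6 and 7 in turn. No test point ever precedes its training window, which is the property that ordinary k-fold cross-validation does not guarantee.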
Step 5) Evaluate the prediction performance of the model using the root mean square error and the Spearman rank correlation coefficient. The root mean square error (RMSE) measures the quantitative difference between the predicted and actual values, and can be written as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (t_i - \theta_i)^2} \qquad (7)$$
where n, t and θ denote the number of samples, the actual value and the predicted value, respectively.
In addition, the Spearman rank correlation coefficient rho of the constructed model and its significance p-value are calculated. The Spearman rank correlation coefficient rho measures the strength and direction of the monotonic relationship between two variables without requiring any assumption about the distribution of the input data, and can be expressed as
$$rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} \qquad (8)$$
where d_i is the rank difference between the two time series at the i-th sample and n is the length of each time series. The significance level for all statistical tests was set at 5%.
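Both metrics are short to compute directly. The sketch below assumes the usual definitions: RMSE as the square root of the mean squared difference between actual and predicted values, and Spearman's rho from the rank differences d_i, which is exact only when neither series contains tied values. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual values t and predicted values theta."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def ranks(x):
    """Ranks 1..n in ascending order; assumes no tied values."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman_rho(a, b):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i the rank differences."""
    d = ranks(a) - ranks(b)
    n = len(d)
    return float(1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1)))

actual = [1.0, 2.0, 3.0, 4.0]                # toy numbers, not case counts from the study
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(actual, predicted)              # one prediction is off by 2
rho_same = spearman_rho(actual, predicted)   # identical ordering
rho_reversed = spearman_rho(actual, actual[::-1])
```

The toy case shows why the two metrics complement each other: the prediction of 6 instead of 4 raises the RMSE but leaves rho unchanged, because the ordering of the points is preserved. For the significance p-value, a library routine such as `scipy.stats.spearmanr`, which also handles ties, is the more practical choice.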
The invention has been validated on weekly newly confirmed COVID-19 case data for Washington, D.C. (District of Columbia). Based on search query data and climate variables related to COVID-19, a prediction model was constructed using the Wiener cascade method, and it accurately predicted the complex dynamics of the weekly number of newly confirmed COVID-19 cases in Washington, D.C. The results show that the predicted trend is significantly correlated with the actual trend, which verifies that the proposed method can predict and decode complex infectious disease dynamics in a real data set. In conclusion, the Wiener cascade model constructed from user search query data and climate data can serve as a new tool for predicting and decoding infectious diseases, can help health policy makers allocate health resources and plan preventive measures before potential outbreaks, and has potential value and application prospects in the field of infectious disease epidemic prediction. FIG. 2 (A) shows the time series of the key variables and their time-frequency spectra, including the number of newly confirmed cases, the average temperature, the average humidity, and the search volume for the search query keyword "COVID-19 symptoms"; FIG. 2 (B) shows the predicted trend (grey) and actual trend (black) of weekly newly confirmed cases.

Claims (4)

1. A construction method of an infectious disease prediction model is characterized by comprising the following steps:
step one, acquiring signals in an experimental time period;
step two, carrying out time-frequency analysis on the signals obtained in the step one based on continuous wavelet transformation to obtain a signal time-frequency spectrogram;
step three, a wiener cascade model is established according to the signals in the experimental time period acquired in the step one and the signal time-frequency spectrogram acquired in the step two, wherein the wiener cascade model consists of a dynamic linear unit, namely a wiener filter P and a static nonlinear unit Q;
in the first step, s (t) in the experimental time period is obtained, wherein the s (t) comprises the number of newly-increased diagnosis cases per day, the daily temperature, the daily humidity and the keyword search quantity related to infectious diseases;
the number of newly added diagnosis cases is used as a target variable, and the daily temperature, the daily humidity and the keyword search quantity related to infectious diseases are used as characteristic variables;
in the second step, the method for performing time-frequency analysis comprises the following steps:
step S1, selecting a wavelet function satisfying the allowable condition, expressed as:
Figure FDA0004103628190000011
wherein ψ (f) k T) represents a wavelet function, f k The oscillation frequency, t is time, and sigma is a resolution parameter of a time domain;
step S2, performing continuous wavelet transformation on S (t) based on the wavelet function in step S1 to obtain a wavelet coefficient W of S (t) s (f k T), the expression is:
Figure FDA0004103628190000021
wherein τ is the time integral variable;
in the third step, the output u (n) of the wiener filter P in the wiener cascade model is taken as the input of the nonlinear unit Q, and is expressed as:
Figure FDA0004103628190000022
wherein u (n) is the predicted incidence at time n, n is a discrete time variable, x i (N-j) represents the value of the ith characteristic variable at the time of N-j, N represents the total number of characteristic variables, M represents the experimental time, A ij For the wiener filter coefficient of the ith feature at the moment j, the wiener filter coefficient A ij The coefficient matrix of the composition is defined as a, and the set of all feature variables is defined as feature set F (n) = { x i (n-j),i=1,2,...,N;j=1,2,...,M};
The nonlinear unit Q adjusts the output u (n) to generate a final output
y(n)=g(u(n))
Wherein y (n) is a target variable, and g (u (n)) is a third-order polynomial;
the method for determining the Wiener cascade model comprises the following steps:
the first step: estimating the coefficient matrix A from the Wiener-Hopf equation so as to obtain the best dynamic linear fit between the feature set F'(n) and y'(n), expressed as:
A = (X^T X + λI)^{-1} X^T Y
wherein λ is a regularization constant and I is the identity matrix;
the second step: computing u'(n) from the coefficient matrix A estimated in the first step and the feature set F'(n);
the third step: fitting a third-order polynomial between the u'(n) obtained in the second step and y'(n), wherein the weights of the third-order polynomial are determined by least-squares optimization.
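The three fitting steps above can be sketched in Python with NumPy. This is only an illustrative sketch, not the patented implementation: it assumes that u(n) is the linear combination of the lagged feature values x_i(n-j) weighted by A_ij (the claimed expression is given only as a figure), that each row of X holds those lagged values for one time n, and that Y is the target series. The names lagged_design, fit_wiener_cascade and predict are hypothetical, and the data are synthetic.

```python
import numpy as np

def lagged_design(features, n_lags):
    """Build X from a (T, N) feature array: the row for time n holds
    x_i(n - j) for every feature i and every lag j = 1..n_lags."""
    T = features.shape[0]
    cols = [features[n_lags - j : T - j] for j in range(1, n_lags + 1)]
    return np.hstack(cols)  # shape (T - n_lags, N * n_lags)

def fit_wiener_cascade(features, target, n_lags, lam=1.0):
    X = lagged_design(features, n_lags)
    y = target[n_lags:]
    # First step: regularized solution A = (X^T X + lam * I)^{-1} X^T Y.
    A = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    # Second step: linear-stage output u'(n) on the training data.
    u = X @ A
    # Third step: least-squares fit of a third-order polynomial g with y ~ g(u).
    poly = np.polyfit(u, y, deg=3)
    return A, poly

def predict(features, A, poly, n_lags):
    u = lagged_design(features, n_lags) @ A
    return np.polyval(poly, u)

# Synthetic demonstration data (assumption: not taken from the patent).
rng = np.random.default_rng(0)
T, N, M = 300, 3, 4
feats = rng.normal(size=(T, N))
u_true = lagged_design(feats, M) @ rng.normal(size=N * M) * 0.3
target = np.concatenate([np.zeros(M), u_true + 0.2 * u_true**3])
target = target + rng.normal(scale=0.05, size=T)

A, poly = fit_wiener_cascade(feats, target, M, lam=1e-2)
y_hat = predict(feats, A, poly, M)
mse_linear = np.mean((lagged_design(feats, M) @ A - target[M:]) ** 2)
mse_cascade = np.mean((y_hat - target[M:]) ** 2)
print(f"linear stage MSE: {mse_linear:.4f}, cascade MSE: {mse_cascade:.4f}")
```

On the training data the cubic stage cannot fit worse than the linear stage alone, because the identity map is itself a third-order polynomial; whether it helps on unseen data has to be checked by cross-validation.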
2. The method for constructing an infectious disease prediction model according to claim 1, wherein:
the Wiener cascade model established in the third step is trained and validated using a sliding-window time-series cross-validation algorithm.
3. The method for constructing an infectious disease prediction model according to claim 2, wherein:
the cross-validation method comprises: based on the sliding-window time-series cross-validation algorithm TSCV, a subset of the data preceding the point to be predicted is used as the training set, and the data points following the training window are used as the test set; each run yields the prediction accuracy for the data points under test, and in the next run the training and test windows are shifted several steps toward the end of the time series.
4. The method for constructing an infectious disease prediction model according to claim 3, wherein:
the TSCV algorithm comprises the following steps:
step S1: selecting a window length L representing the number of samples contained in the training set;
step S2: according to the window length L selected in step S1, taking the first L samples of the original time series as the training set, and selecting the single observation immediately following the training window as the test set;
step S3: training the wiener cascade model by using the training set in the step S2;
step S4: predicting the data of the test set by using the model trained in the step S3, and evaluating the prediction precision;
step S5: moving the training window to the right and repeating steps S3 and S4 until the entire time series has been covered;
step S6: calculating the average of all prediction accuracies obtained in step S4.
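Steps S1 to S6 can be sketched as a short Python loop. This is a hedged illustration under stated assumptions: the claim does not specify the accuracy measure, so absolute error is used here; the window is moved one step at a time by default; and a window-mean forecaster stands in for the Wiener cascade model. The names sliding_window_tscv, fit_mean and predict_mean are hypothetical.

```python
import numpy as np

def sliding_window_tscv(series, window, fit, predict, step=1):
    """Slide a training window of length `window` over `series` and test
    each fitted model on the single observation that follows the window."""
    errors = []
    for start in range(0, len(series) - window, step):
        train = series[start : start + window]   # S2: training window
        actual = series[start + window]          # S2: single test point
        model = fit(train)                       # S3: train the model
        forecast = predict(model, train)         # S4: predict the test point
        errors.append(abs(forecast - actual))    # S4: assumed error measure
    return float(np.mean(errors)), errors        # S6: average over all runs

# Stand-in model (assumption): forecast the mean of the training window.
def fit_mean(train):
    return train.mean()

def predict_mean(model, train):
    return model

series = np.arange(10.0)
mean_err, errors = sliding_window_tscv(series, window=4,
                                       fit=fit_mean, predict=predict_mean)
print(len(errors), mean_err)
```

Passing the model as a pair of fit and predict callables keeps the validation loop independent of the model being evaluated, so the same loop could wrap a Wiener cascade fit.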
CN202211496712.8A 2022-11-28 2022-11-28 Method for constructing infectious disease prediction model Active CN115631869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211496712.8A CN115631869B (en) 2022-11-28 2022-11-28 Method for constructing infectious disease prediction model

Publications (2)

Publication Number Publication Date
CN115631869A (en) 2023-01-20
CN115631869B (en) 2023-05-05

Family

ID=84910079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211496712.8A Active CN115631869B (en) 2022-11-28 2022-11-28 Method for constructing infectious disease prediction model

Country Status (1)

Country Link
CN (1) CN115631869B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168847B (en) * 2023-04-26 2023-08-11 南京邮电大学 Infectious disease prediction method based on optimized next generation reserve pool calculation
CN116525135B (en) * 2023-04-27 2024-03-19 兰州大学 Method for predicting epidemic situation development situation by space-time model based on meteorological factors

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327682A (en) * 2020-02-28 2021-08-31 天津职业技术师范大学(中国职业培训指导教师进修中心) Infectious disease epidemic situation prediction and monitoring system and method based on keyword search time sequence and application thereof
CN113793693A (en) * 2021-09-18 2021-12-14 北京大学第三医院(北京大学第三临床医学院) Infectious disease prevalence trend prediction method and device
CN115240869A (en) * 2022-07-18 2022-10-25 石会文 Intelligent infectious disease monitoring and early warning system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10332637B2 (en) * 2013-02-15 2019-06-25 Battelle Memorial Institute Use of web-based symptom checker data to predict incidence of a disease or disorder
CN109656918A (en) * 2019-01-04 2019-04-19 平安科技(深圳)有限公司 Prediction method, device, equipment and readable storage medium for epidemic disease index
CN110085327A (en) * 2019-04-01 2019-08-02 东莞理工学院 Multichannel LSTM neural network influenza epidemic situation prediction method based on attention mechanism
US20210065914A1 (en) * 2019-09-04 2021-03-04 SIVOTEC BioInformatics LLC Dynamic, real-time, genomics decision support, research, and simulation
CN111415752B (en) * 2020-03-01 2023-05-12 集美大学 Hand-foot-and-mouth disease prediction method integrating meteorological factors and search indexes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant