CN109243619B - Generation method and device of prediction model and computer readable storage medium - Google Patents

Generation method and device of prediction model and computer readable storage medium

Publication number
CN109243619B
Authority
CN
China
Prior art keywords
sequence
influenza
model
prediction
period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810768332.2A
Other languages
Chinese (zh)
Other versions
CN109243619A (en
Inventor
李弦
徐亮
阮晓雯
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810768332.2A priority Critical patent/CN109243619B/en
Priority to PCT/CN2018/107488 priority patent/WO2020010710A1/en
Publication of CN109243619A publication Critical patent/CN109243619A/en
Application granted granted Critical
Publication of CN109243619B publication Critical patent/CN109243619B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80: ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method for generating a prediction model, comprising the following steps: determining a target area, a target time unit to be predicted, and a preset period; acquiring the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit; lagging the influenza-like case percentage data sequence, according to the preset period and a preset order k, by 0 to k orders before and after the preset period to obtain 2k + 1 data sequences; calculating the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence, and determining a prediction period from the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and calculating model parameters, and establishing an autoregressive integrated moving average (ARIMA) model as the prediction model according to the model parameters and the prediction period. The invention also provides a device for generating the prediction model and a computer-readable storage medium. The method improves the prediction accuracy of the autoregressive integrated moving average model.

Description

Generation method and device of prediction model and computer readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a prediction model, and a computer-readable storage medium.
Background
For influenza prediction, the approach currently most common in epidemiology is to predict the percentage of influenza-like cases with an autoregressive integrated moving average (ARIMA) model. When this model is used for influenza prediction, a fixed period, such as one year or half a year, is generally set for the prediction region according to the pattern of change in the region's historical influenza-like case percentage data, and the model is built on that period. However, a fixed period ignores the effects of factors that do not recur with the same period, such as differences in the length of the year and in the timing of the solar terms, and this leads to large deviations in the prediction results. For example, years differ in length, and some years, such as 2013, have 53 weeks, so the period length changes from year to year. Therefore, if a fixed period is used for modeling, the prediction result of the model deviates considerably.
Disclosure of Invention
The invention provides a method and a device for generating a prediction model and a computer-readable storage medium, and mainly aims to improve the prediction accuracy of an autoregressive integrated moving average (ARIMA) model.
In order to achieve the above object, the present invention provides a method for generating a prediction model, including:
determining a target area and a target time unit to be predicted, and acquiring a preset period;
obtaining the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit;
according to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences;
respectively calculating the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence, and determining a prediction period according to the lag order of the first data sequence whose autocorrelation coefficient is greater than a preset threshold;
and calculating model parameters according to the influenza-like case percentage data sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
Optionally, after the step of obtaining the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit, the method further comprises the steps of:
detecting whether the influenza-like case percentage data sequence is a stationary sequence;
if yes, executing the step of acquiring a preset period determined according to the periodicity exhibited by the influenza-like case percentage data;
if not, converting the influenza-like case percentage data sequence into a stationary sequence by differencing.
Optionally, the step of detecting whether the influenza-like case percentage data sequence is a stationary sequence comprises:
performing a unit root test on the influenza-like case percentage data to detect whether the influenza-like case percentage data sequence is a stationary sequence, wherein if a unit root is detected in the sequence, the sequence is judged to be non-stationary, and otherwise the sequence is judged to be stationary.
Optionally, the step of obtaining the preset period includes:
determining the preset period according to the periodicity exhibited by the influenza-like case percentage data.
Optionally, the step of calculating model parameters according to the influenza-like case percentage data sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period, comprises:
calculating the autocorrelation coefficients and the partial autocorrelation coefficients of the stationary influenza-like case percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
and judging from the autocorrelation plot and the partial autocorrelation plot whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an autoregressive integrated moving average model according to the judgment result to fit the stationary influenza-like case percentage data sequence, thereby obtaining the prediction model.
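The tail-off versus cut-off judgment is usually made by eye from the two plots. A minimal sketch of the same rule of thumb in Python is given below. It assumes the common approximate 95% significance band of plus or minus 2/sqrt(n); the function names and the `max_cut` limit are hypothetical and are not taken from the patent.

```python
import math


def last_significant_lag(coeffs, n_obs):
    """Largest lag whose coefficient lies outside the approximate 95% band.

    coeffs[i] is the coefficient at lag i + 1 (lag 0 is excluded).
    Returns 0 when no coefficient is significant.
    """
    bound = 2.0 / math.sqrt(n_obs)
    last = 0
    for lag, value in enumerate(coeffs, start=1):
        if abs(value) > bound:
            last = lag
    return last


def suggest_model(acf, pacf, n_obs, max_cut=3):
    """Hypothetical rule of thumb for choosing the model family.

    A coefficient series is treated as cutting off when its last significant
    lag is at most max_cut (an illustrative assumption), else as tailing off.
    Returns (family, p, q); p and q are None when the plots do not fix them.
    """
    acf_cut = last_significant_lag(acf, n_obs)
    pacf_cut = last_significant_lag(pacf, n_obs)
    acf_cuts_off = acf_cut <= max_cut
    pacf_cuts_off = pacf_cut <= max_cut
    if pacf_cuts_off and not acf_cuts_off:
        return ("AR", pacf_cut, 0)   # PACF cuts off at p, ACF tails off
    if acf_cuts_off and not pacf_cuts_off:
        return ("MA", 0, acf_cut)    # ACF cuts off at q, PACF tails off
    return ("ARMA", None, None)      # orders must be chosen by comparing fits
```

In the ARMA case the plots alone do not determine p and q, so candidate orders would still have to be compared after fitting.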
In order to achieve the above object, the present invention further provides a prediction model generation apparatus, including a memory and a processor, wherein the memory stores a model generation program operable on the processor, and the model generation program, when executed by the processor, implements the steps of:
determining a target area and a target time unit to be predicted, and acquiring a preset period;
obtaining the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit;
according to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences;
respectively calculating the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence, and determining a prediction period according to the lag order of the first data sequence whose autocorrelation coefficient is greater than a preset threshold;
calculating model parameters according to the influenza-like case percentage data sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
Optionally, the model generation program is further executable by the processor to, after the step of obtaining the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit, further implement the steps of:
detecting whether the influenza-like case percentage data sequence is a stationary sequence;
if yes, executing the step of acquiring a preset period determined according to the periodicity exhibited by the influenza-like case percentage data;
if not, converting the influenza-like case percentage data sequence into a stationary sequence by differencing.
Optionally, the step of detecting whether the influenza-like case percentage data sequence is a stationary sequence comprises:
performing a unit root test on the influenza-like case percentage data to detect whether the influenza-like case percentage data sequence is a stationary sequence, wherein if a unit root is detected in the sequence, the sequence is determined to be non-stationary, and otherwise the sequence is determined to be stationary.
Optionally, the step of obtaining the preset period includes:
determining the preset period according to the periodicity exhibited by the influenza-like case percentage data.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a model generation program, which is executable by one or more processors to implement the steps of the generation method of the prediction model as described above.
The generation method, the generation device and the computer-readable storage medium of the prediction model provided by the invention determine a target area and a target time unit to be predicted and acquire a preset period; acquire the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit; lag the influenza-like case percentage data sequence, according to the preset period and a preset order k, by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences; respectively calculate the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence, and determine a prediction period according to the lag order of the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and calculate model parameters according to the influenza-like case percentage data sequence for which the prediction period has been determined, and establish an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period. When the data of the target time unit are predicted, the data sequence preceding the target time unit is lagged by several orders and the autocorrelation coefficients between the resulting sequences are calculated; a prediction period suited to the current target time unit is then determined from these autocorrelation coefficients and used for modeling, which improves the prediction accuracy of the autoregressive integrated moving average model.
Drawings
Fig. 1 is a schematic flow chart of a method for generating a prediction model according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating an internal structure of a prediction model generation apparatus according to an embodiment of the present invention;
Fig. 3 is a block diagram illustrating a model generation program in the prediction model generation apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a generation method of a prediction model. Fig. 1 is a schematic flow chart of a method for generating a prediction model according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the method for generating the prediction model includes:
Step S10, determining a target area and a target time unit to be predicted, and acquiring a preset period.
Step S20, acquiring the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit.
The target area in this embodiment is the area for which the percentage of influenza-like cases is to be predicted, for example a city or a province. The scheme of this embodiment is described with the week as the time unit. Assuming that the percentage of influenza-like cases in the current week is to be predicted, an ARIMA (Autoregressive Integrated Moving Average) model is constructed from the historical data sequence of weekly influenza-like case percentages over the 5 years before the current week, that is, from the influenza-like case percentage data of the 260 weeks preceding the week to be predicted. As time passes, the target time unit moves forward and the historical data change with it; in addition, the periodicity that the historical data actually exhibit may change because the number of weeks and the timing of the solar terms differ from year to year. Absent these factors, observation of the influenza-like case percentage data sequence over a long period shows an annual periodicity, that is, a period of 52 weeks; when these factors are present, the period may be shorter or longer than 52 weeks. Therefore, in the method of this embodiment, before the prediction period is dynamically adjusted, the preset period is determined to be 52 weeks according to the periodicity exhibited by the influenza-like case percentage data, and the prediction period is then adjusted with 52 weeks as the reference. For example, if the percentage of influenza-like cases in the first week of June 2018 is to be predicted, the weekly influenza-like case percentage data sequence from 2013 to 2017 needs to be obtained.
In other embodiments, the number of time units in the historical data is set by the user according to actual prediction requirements.
Further, in order to improve the prediction accuracy of the established ARIMA model, after the data sequence is obtained, it is checked whether the data sequence is stationary. If so, the subsequent steps are executed; if not, the data sequence is converted into a stationary influenza-like case percentage data sequence by differencing. Specifically, the step of detecting whether the influenza-like case percentage data sequence is a stationary sequence comprises: performing a unit root test on the influenza-like case percentage data, wherein if a unit root is detected in the sequence, the sequence is judged to be non-stationary, and otherwise the sequence is judged to be stationary.
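The differencing step can be sketched in a few lines of Python. This is only an illustration: the function name `difference` is hypothetical, and the unit root test itself (for example an augmented Dickey-Fuller test from a statistics library) is assumed to be applied separately to decide how many times to difference.

```python
def difference(series, order=1):
    """Apply first differencing `order` times to a list of numbers.

    Each pass replaces the sequence x[0..n-1] with x[1]-x[0], ..., x[n-1]-x[n-2],
    so the result is `order` elements shorter than the input.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# A quadratic trend becomes constant after two rounds of differencing.
squares = [1, 4, 9, 16, 25]
first = difference(squares, order=1)    # [3, 5, 7, 9]
second = difference(squares, order=2)   # [2, 2, 2]
```

The number of differencing passes needed to reach a stationary sequence is the quantity a caller would record as the differencing order of the model.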
Step S30, according to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences.
Step S40, respectively calculating the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence, and determining a prediction period according to the lag order of the first data sequence whose autocorrelation coefficient is greater than a preset threshold.
In this embodiment, the preset order k is a positive integer, preferably from 2 to 6. The following takes a preset period T0 = 52 and k = 2 as an example: lagging the influenza-like case percentage data sequence by each order within k of the preset period T0 yields 2k + 1 data sequences. Suppose that the historical data from 2013 to 2017 constitute the original data sequence [W1, W2, W3, ..., W260]. According to the preset period T0 = 52, the sequence L0 = [W1, W2, W3, ..., W206] is extracted from the original data sequence, and lagging this sequence by 0 to 2 orders before and after the preset period gives the following 5 sequences:
L1 is [W51, W52, W53, ..., W256];
L2 is [W52, W53, W54, ..., W257];
L3 is [W53, W54, W55, ..., W258];
L4 is [W54, W55, W56, ..., W259];
L5 is [W55, W56, W57, ..., W260].
That is, the data sequence L1 is obtained by lagging the sequence L0 by 50 weeks, L2 by lagging it by 51 weeks, L3 by 52 weeks, L4 by 53 weeks, and L5 by 54 weeks. The autocorrelation coefficients between each of the sequences L1, L2, L3, L4, L5 and the sequence L0 are then calculated.
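The construction of the base sequence and its lagged copies can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `lagged_windows` is hypothetical, and the rule that the base sequence keeps the first n - (T0 + k) observations is inferred from the 260-week example, where L0 ends at W206.

```python
def lagged_windows(series, period, k):
    """Split a series into a base window and 2k + 1 lagged windows.

    The base window is the first len(series) - (period + k) observations.
    For every lag from period - k to period + k, the lagged window is the
    slice of the same length that starts `lag` observations later.
    Returns (base, {lag: window}).
    """
    base_len = len(series) - (period + k)
    if k < 0 or period - k < 0 or base_len <= 0:
        raise ValueError("series too short for the requested period and k")
    base = series[:base_len]
    windows = {
        lag: series[lag:lag + base_len]
        for lag in range(period - k, period + k + 1)
    }
    return base, windows


# Week labels W1..W260 stand in for five years of weekly data.
labels = [f"W{i}" for i in range(1, 261)]
base, windows = lagged_windows(labels, period=52, k=2)
for lag, window in windows.items():
    print(lag, window[0], window[-1], len(window))
```

With 260 weekly observations, a period of 52 and k = 2, this produces five windows for lags 50 to 54, each 206 weeks long, all aligned against the same base window.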
When the period is not affected by factors such as the length of the year or the solar terms, that is, when the prediction period is 52 weeks, the data from week 1 to week 206 and the data from week 53 to week 258 are the same within a small error range, and the autocorrelation coefficient between L0 and L3 is the largest. However, if the period of the data sequence has changed under the influence of such factors, the correlation between L0 and L3 is weaker and the correlation between L0 and one of the other data sequences is stronger. The autocorrelation coefficients obtained above are therefore examined in order from L1 to L5, the first sequence whose autocorrelation coefficient is greater than the preset threshold is identified, and the number of weeks by which that sequence is lagged is used as the prediction period. For example, if L4 is the first of the sequences L1 to L5 whose autocorrelation coefficient with L0 is greater than the preset threshold, the current prediction period is determined to be 53 weeks.
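The selection rule can be sketched as below, using the Pearson correlation between the base window and each lagged window as the autocorrelation coefficient. The function names, the choice of Pearson correlation, and the fallback to the preset period when no lag exceeds the threshold are assumptions made for illustration; the text does not specify them.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def choose_period(series, preset_period, k, threshold):
    """Return the first lag in preset_period - k .. preset_period + k whose
    correlation with the base window exceeds `threshold`.

    Falls back to preset_period when no lag qualifies (an assumption).
    """
    base_len = len(series) - (preset_period + k)
    if base_len < 2:
        raise ValueError("series too short for the requested period and k")
    base = series[:base_len]
    for lag in range(preset_period - k, preset_period + k + 1):
        if pearson(base, series[lag:lag + base_len]) > threshold:
            return lag
    return preset_period


# Synthetic weekly data that repeats every 53 weeks.
weekly = [math.sin(2 * math.pi * t / 53) for t in range(260)]
period = choose_period(weekly, preset_period=52, k=2, threshold=0.999)
```

Because the lags are scanned in ascending order and the first one above the threshold wins, the result depends on the threshold: a low threshold tends to select the smallest lag even when a later lag correlates more strongly.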
Step S50, calculating model parameters according to the influenza-like case percentage data sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
For the stationary influenza-like illness (ILI) sequence with the determined prediction period, the autocorrelation coefficients and partial autocorrelation coefficients are computed in order to determine the AR and MA parameters of the model, namely the values of p and q, and an ARIMA model is then established from these parameters and the prediction period. Specifically, the stationary data sequence obtained by differencing is taken, and the order of differencing used in the conversion is used as the value of the parameter d of the ARIMA model. The autocorrelation coefficients and partial autocorrelation coefficients of the stationary data sequence with the determined prediction period are computed, and an autocorrelation plot and a partial autocorrelation plot are drawn. From these plots it is judged whether the partial autocorrelation coefficients and the autocorrelation coefficients tail off or cut off, and a corresponding ARIMA model is selected to fit the stationary data sequence. Parameter estimation is performed on the fitted ARIMA model to obtain the optimal orders p and q, and the validity of the model is then checked to determine the final prediction model. Specifically, under an exponential smoothing model, it is observed whether the residuals of the ARIMA model follow a normal distribution with mean 0 and constant variance, and whether consecutive residuals are correlated; if the residuals follow such a distribution and consecutive residuals are uncorrelated, the model is judged to pass the check.
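The two coefficient series used for order selection can be computed without any library, as the sketch below shows. It uses the standard sample autocorrelation estimator and the Durbin-Levinson recursion for the partial autocorrelations; the function names are hypothetical, and the patent does not state which estimator it uses.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant")
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / c0
        for lag in range(max_lag + 1)
    ]


def pacf_from_acf(acf):
    """Partial autocorrelations for lags 0..len(acf)-1 via Durbin-Levinson.

    acf[0] must be 1.0. The value at lag 0 is defined as 1.0.
    """
    pacf = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1..k-1} from the previous step
    for k in range(1, len(acf)):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: the PACF cuts
# off after lag 1 while the ACF tails off geometrically.
ar1_acf = [0.5 ** lag for lag in range(4)]
ar1_pacf = pacf_from_acf(ar1_acf)
```

Applying `sample_acf` to the model residuals gives a quick view of whether consecutive residuals remain correlated after fitting.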
When the obtained ARIMA model is used for real-time prediction, the influenza-like case percentage is predicted one week ahead in a rolling manner. As time passes, the period of the historical influenza-like case percentage data sequence may differ from one prediction point to the next. Therefore, each time a one-week-ahead prediction is made, a new prediction period is determined from the influenza-like case percentage data of the consecutive weeks preceding that week and the model is dynamically updated, which improves the prediction accuracy of the autoregressive integrated moving average model.
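The rolling scheme can be sketched as a loop that rebuilds its inputs for every target week. In the illustration below the period selection and the forecasting model are passed in as callables, and the demo uses a fixed 52-week period with a seasonal-naive forecast as a stand-in. That stand-in is not an ARIMA model; it is used only so the sketch runs without a statistics library, and all names are hypothetical.

```python
def rolling_forecast(series, start, window, choose_period, forecast_one):
    """One-step-ahead rolling forecasts over series[start:].

    For each target index t, only the `window` observations before t are
    used: the period is re-determined from them and a fresh one-step
    forecast is produced. Returns a list of (t, period, prediction).
    """
    if start < window:
        raise ValueError("not enough history before the first target")
    results = []
    for t in range(start, len(series)):
        history = series[t - window:t]       # data strictly before the target
        period = choose_period(history)      # re-determined at every step
        prediction = forecast_one(history, period)
        results.append((t, period, prediction))
    return results


def fixed_period(history):
    """Stand-in period selector: always 52 weeks (assumption for the demo)."""
    return 52


def seasonal_naive(history, period):
    """Stand-in model: repeat the value observed one period earlier."""
    return history[-period]


# Demo on a simple increasing series; each forecast equals series[t - 52].
demo_series = list(range(263))
demo = rolling_forecast(demo_series, start=260, window=260,
                        choose_period=fixed_period,
                        forecast_one=seasonal_naive)
```

In practice `choose_period` would apply the threshold rule on lagged autocorrelations and `forecast_one` would refit the ARIMA model on the window, so both are recomputed for every target week.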
In the generation method of the prediction model provided by this embodiment, a target area and a target time unit to be predicted are determined, and a preset period is acquired; the influenza-like case percentage data sequence of the target area over a plurality of time units before the target time unit is acquired; according to the preset period and a preset order k, the influenza-like case percentage data sequence is lagged by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences; the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence are calculated, and a prediction period is determined according to the lag order of the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and model parameters are calculated according to the influenza-like case percentage data sequence for which the prediction period has been determined, and an autoregressive integrated moving average model is established as the prediction model according to the model parameters and the prediction period. When the data of the target time unit are predicted, the data sequence preceding the target time unit is lagged by several orders and the autocorrelation coefficients between the resulting sequences are calculated; a prediction period suited to the current target time unit is then determined from these autocorrelation coefficients and used for modeling, which improves the prediction accuracy of the autoregressive integrated moving average model.
The invention also provides a device for generating the prediction model. Fig. 2 is a schematic diagram illustrating an internal structure of a prediction model generation apparatus according to an embodiment of the present invention.
In the present embodiment, the prediction model generation device 1 may be a PC (Personal Computer), or may be a terminal device such as a smartphone, a tablet Computer, or a mobile Computer. The predictive model generating device 1 comprises at least a memory 11, a processor 12, a network interface 13, and a communication bus.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the generation apparatus 1 of the prediction model, for example a hard disk of the generation apparatus 1 of the prediction model. The memory 11 may also be an external storage device of the predictive model generation apparatus 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the predictive model generation apparatus 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the generation apparatus 1 of the prediction model. The memory 11 may be used not only to store application software installed in the prediction model generation apparatus 1 and various types of data, such as a code of the model generation program 01, but also to temporarily store data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, executes program code or processes data stored in the memory 11, for example executing the model generation program 01.
The network interface 13 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used for establishing a communication connection between the apparatus 1 and other electronic devices.
The communication bus is used to enable connection communication between these components.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a display, an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying the information processed in the prediction model generation apparatus 1 and for displaying a visual user interface.
Fig. 2 shows only the prediction model generation apparatus 1 with the components 11-13 and the model generation program 01. It will be understood by those skilled in the art that the structure shown in Fig. 2 does not constitute a limitation of the prediction model generation apparatus 1, which may comprise fewer or more components than those shown, a combination of some components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, a model generation program 01 is stored in the memory 11; the processor 12, when executing the model generation program 01 stored in the memory 11, implements the following steps:
Determining a target area and a target time unit to be predicted, and acquiring a preset period.
Acquiring the influenza-like case percentage data sequence of the target area over a plurality of consecutive time units before the target time unit.
According to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences.
Respectively calculating the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence, and determining a prediction period according to the lag order of the first data sequence whose autocorrelation coefficient is greater than a preset threshold.
Calculating model parameters according to the influenza-like case percentage data sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
The target area in this embodiment is a certain area where the prediction of the percentage of influenza-like cases is to be performed, for example, a certain city or a certain province. In addition, the scheme of the present embodiment is described in units of weeks, and assuming that the percentage of influenza-like cases in the present week is to be predicted, an ARIMA (Autoregressive Integrated Moving Average) model is constructed from the historical data series of the percentage of influenza-like cases in each week in 5 years before the present week, that is, the percentage of influenza-like cases in 260 weeks before the present week is to be predicted, so that the ARIMA model needs to be constructed using the data of the percentage of influenza-like cases in the previous week. Over time, the target time unit may move forward and its historical data may change, and furthermore, the periodicity that these historical data actually exhibit may change due to the difference in the number of weeks and solar terms per year. Without the influence of these factors, it can be seen from the observation of the data series of percent influenza-like cases over a long period of time that the data is annually periodic, i.e., with a period of 52 weeks, but that the period is less than or greater than 52 weeks when the above factors are present. Therefore, in the method of the present embodiment, before the dynamic adjustment of the prediction period, the preset period may be determined to be 52 weeks according to the periodicity presented by the data of the percentage of cases of the flu sample, and then the prediction period is adjusted with 52 weeks as a reference. Assuming that the percentage of influenza-like cases for the first week of 6 months in 2018 is currently to be predicted, a data sequence of the percentage of influenza-like cases for each week from 2013 to 2017 needs to be obtained. 
In other embodiments, the number of time units in the historical data is set by the user based on actual predicted demand.
Further, in order to improve the prediction accuracy of the established ARIMA model, after the data sequence is obtained, it is detected whether the data sequence is a stationary sequence. If so, the subsequent steps are executed; if not, the data sequence is converted into a stationary influenza-like case percentage data sequence by a differencing operation. Specifically, the step of detecting whether the influenza-like case percentage data sequence is a stationary sequence comprises: performing a unit root test on the influenza-like case percentage data, wherein if a unit root is detected in the sequence, the sequence is determined to be non-stationary, and otherwise the sequence is determined to be stationary.
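The difference-until-stationary loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `difference` and `make_stationary`, the `max_d` limit, and the injected `is_stationary` callable are all assumptions introduced here. In practice the callable would wrap a unit root test (for example an augmented Dickey-Fuller test); the toy predicate in the demo only recognizes a constant sequence and is a placeholder.

```python
from typing import Callable, List, Sequence, Tuple


def difference(series: Sequence[float]) -> List[float]:
    """First-order difference: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def make_stationary(
    series: Sequence[float],
    is_stationary: Callable[[Sequence[float]], bool],
    max_d: int = 2,
) -> Tuple[List[float], int]:
    """Difference `series` until `is_stationary` accepts it.

    Returns the (possibly differenced) sequence and the differencing order d,
    which later serves as the d parameter of the ARIMA model.
    `is_stationary` is a hypothetical stand-in for a unit root test.
    """
    current = list(series)
    for d in range(max_d + 1):
        if is_stationary(current):
            return current, d
        if d < max_d:
            current = difference(current)
    raise ValueError(f"still non-stationary after {max_d} differences")


# Toy predicate for illustration only: a constant sequence counts as stationary.
def toy_test(xs: Sequence[float]) -> bool:
    return max(xs) - min(xs) < 1e-9


quadratic = [float(t * t) for t in range(10)]
stationary, d = make_stationary(quadratic, toy_test)
```

Here the quadratic toy sequence needs two rounds of differencing before the placeholder predicate accepts it, so `d` is 2.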
In this embodiment, the preset order k is a positive integer, preferably between 2 and 6. Taking a preset period T0 = 52 and k = 2 as an example below, lagging the influenza-like case percentage data sequence by up to k orders on either side of the preset period T0 yields 2k+1 data sequences. Assume that the historical data from 2013 to 2017 constitutes the original data sequence [W1, W2, W3, ..., W260]. According to the preset period T0 = 52, the sequence L0 = [W1, W2, W3, ..., W206] is extracted from the original data sequence, and the sequence is lagged by 0 to 2 orders before and after the preset period to obtain the following 5 sequences:
L1 is [W51, W52, W53, ..., W256];
L2 is [W52, W53, W54, ..., W257];
L3 is [W53, W54, W55, ..., W258];
L4 is [W54, W55, W56, ..., W259];
L5 is [W55, W56, W57, ..., W260].
That is, L1 is the original data sequence lagged by 50 weeks, L2 by 51 weeks, L3 by 52 weeks, L4 by 53 weeks, and L5 by 54 weeks. The autocorrelation coefficients between each of the sequences L1, L2, L3, L4, L5 and the sequence L0 are then calculated.
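The construction of the lagged sequences and their coefficients can be sketched as below. The text does not give the exact formula for the autocorrelation coefficient, so this sketch assumes the Pearson correlation between the base window L0 and each lagged window; the names `pearson` and `lagged_correlations` and the synthetic sine series are illustrative assumptions, not part of the original.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def lagged_correlations(
    series: Sequence[float], t0: int = 52, k: int = 2
) -> Dict[int, float]:
    """Correlate L0 with the 2k+1 windows lagged by t0-k .. t0+k weeks."""
    window = len(series) - (t0 + k)  # 260 - 54 = 206 in the example above
    if window < 2:
        raise ValueError("series too short for the requested lags")
    base = series[:window]  # L0 = [W1 .. W206]
    return {
        lag: pearson(base, series[lag:lag + window])
        for lag in range(t0 - k, t0 + k + 1)
    }


# Synthetic 260-week series with a true period of 53 weeks (illustration only).
weeks = [math.sin(2 * math.pi * t / 53) for t in range(260)]
coeffs = lagged_correlations(weeks)
best_lag = max(coeffs, key=coeffs.get)
```

With this synthetic input the coefficient is highest at a lag of 53 weeks rather than at the preset 52, which is the situation the adjustment of the prediction period is meant to catch.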
When the period is not affected by factors such as the year or the solar terms, that is, when the prediction period is 52 weeks, the data from week 1 to week 206 and the data from week 53 to week 258 are the same within a small error range, and the autocorrelation coefficient between L0 and L3 is the largest. However, if the period of the data sequence has shifted under the influence of other factors, the correlation between L0 and L3 is weaker and the correlation between L0 and one of the other sequences is stronger. Therefore, the autocorrelation coefficients calculated above are examined in order from L1 to L5, the first sequence whose autocorrelation coefficient is greater than the preset threshold is identified, and the number of weeks by which that sequence is lagged is used as the prediction period. For example, if L4 is the first of the sequences L1 to L5 whose autocorrelation coefficient with L0 is greater than the preset threshold, the current prediction period is determined to be 53 weeks.
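The selection rule can be sketched as a scan over the coefficients in order of increasing lag. The function name `select_prediction_period`, the `fallback` argument (the text does not say what happens when no coefficient exceeds the threshold), and the coefficient values in the demo are all hypothetical and chosen only to mirror the L4 example above.

```python
from typing import Mapping, Optional


def select_prediction_period(
    coeffs: Mapping[int, float],
    threshold: float,
    fallback: Optional[int] = None,
) -> int:
    """Return the first lag, in ascending order, whose coefficient exceeds
    `threshold`.

    `fallback` is an assumption of this sketch: it is returned when no lag
    qualifies (for example the preset period); without it an error is raised.
    """
    for lag in sorted(coeffs):
        if coeffs[lag] > threshold:
            return lag
    if fallback is None:
        raise ValueError("no lag exceeds the threshold")
    return fallback


# Made-up coefficients for L1..L5 (lags 50..54), for illustration only.
example = {50: 0.61, 51: 0.72, 52: 0.78, 53: 0.91, 54: 0.88}
period = select_prediction_period(example, threshold=0.8)
period_default = select_prediction_period(example, threshold=0.95, fallback=52)
```

Note that the rule takes the first qualifying lag, not the lag with the largest coefficient, so the choice of threshold directly affects which period is selected.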
The autocorrelation coefficients and partial autocorrelation coefficients of the stationary influenza-like case (ILI) percentage sequence for which the prediction period has been determined are calculated in order to determine the AR and MA parameters of the model, namely the values of p and q, and an ARIMA model is then established from the obtained parameters and the prediction period. Specifically, the stationary data sequence obtained by the differencing operation is taken, and the order of differencing used in that conversion is the value of the parameter d of the ARIMA model. The autocorrelation coefficients and partial autocorrelation coefficients of the stationary data sequence are calculated and plotted as an autocorrelation graph and a partial autocorrelation graph. From these graphs it is judged whether the partial autocorrelation coefficients and the autocorrelation coefficients are trailing (tailing off) or truncated (cutting off), and a corresponding ARIMA model is selected to fit the stationary data sequence. Parameter estimation is performed on the fitted ARIMA model to obtain the optimal orders p and q, and a validity check is then performed on the model to determine the final prediction model. Specifically, under an exponential smoothing model, it is observed whether the residuals of the ARIMA model follow a normal distribution with mean 0 and constant variance, and whether consecutive residuals are correlated; if the residuals are normally distributed in this way and are not correlated, the model is judged to pass the check.
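As an illustration of the quantities read off the two graphs, the sketch below computes sample autocorrelation coefficients directly and partial autocorrelation coefficients with the Durbin-Levinson recursion, then flags lags outside the usual approximate 95% band of 1.96/sqrt(n). The function names and the toy alternating series are assumptions of this sketch; the text does not specify how the coefficients are computed or how truncation is judged.

```python
import math
from typing import List, Sequence


def acf(series: Sequence[float], nlags: int) -> List[float]:
    """Sample autocorrelation coefficients for lags 0..nlags."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(nlags + 1)
    ]


def pacf(series: Sequence[float], nlags: int) -> List[float]:
    """Partial autocorrelations for lags 0..nlags (Durbin-Levinson)."""
    r = acf(series, nlags)
    result = [1.0]
    prev: List[float] = []  # coefficients of the order k-1 fit
    for k in range(1, nlags + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def significant_lags(values: Sequence[float], n: int) -> List[int]:
    """Lags >= 1 whose coefficient lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    return [lag for lag, v in enumerate(values) if lag >= 1 and abs(v) > bound]


# Toy series: strictly alternating values, for illustration only.
x = [1.0, -1.0] * 10
acf_vals = acf(x, 2)
pacf_vals = pacf(x, 2)
pacf_sig = significant_lags(pacf_vals, len(x))
```

For this toy series the partial autocorrelation is significant only at lag 1 while the autocorrelation stays large, the pattern usually read as a truncated PACF with a trailing ACF and therefore as a candidate AR order of 1.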
In the process of using the obtained ARIMA model for real-time prediction, the influenza-like case percentage is predicted one week ahead in a rolling manner. Over time, the period of the historical influenza-like case percentage data sequence may differ between prediction points. Therefore, each time a prediction is made one week ahead, a new prediction period is determined from the influenza-like case percentage data of the consecutive weeks preceding that week, and the model is dynamically updated, which improves the prediction accuracy of the autoregressive integrated moving average model.
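The rolling scheme can be sketched as a loop that, for every target week, takes the preceding window of history, re-estimates the period, and rebuilds the model before forecasting one step. The callables `estimate_period` and `fit_and_forecast` are hypothetical placeholders for the period determination and ARIMA fitting described above; the demo substitutes a fixed period and a seasonal-naive forecast, which is not an ARIMA model, purely to make the control flow runnable.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_forecast(
    series: Sequence[float],
    window: int,
    estimate_period: Callable[[Sequence[float]], int],
    fit_and_forecast: Callable[[Sequence[float], int], float],
) -> List[Tuple[int, int, float]]:
    """One-week-ahead rolling prediction.

    For each target index t >= window, the model is rebuilt from the
    `window` observations immediately before t, using a freshly
    estimated period. Returns (target index, period, prediction) tuples.
    """
    results = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        period = estimate_period(history)            # re-determined every week
        prediction = fit_and_forecast(history, period)
        results.append((t, period, prediction))
    return results


# Stand-ins for illustration only: fixed period, seasonal-naive forecast.
data = [float(i) for i in range(300)]
results = rolling_forecast(
    data,
    window=260,
    estimate_period=lambda h: 52,
    fit_and_forecast=lambda h, p: h[-p],
)
```

Passing the two steps in as callables keeps the loop independent of any particular statistics library; a real deployment would plug in the lag-correlation period search and an ARIMA fit.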
The generation device of the prediction model provided by this embodiment determines a target area and a target time unit to be predicted, and acquires a preset period; acquires the influenza-like case percentage data sequence of the target area for a plurality of time units before the target time unit; lags the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, according to the preset period and a preset order k, to obtain 2k+1 data sequences; calculates the autocorrelation coefficient between each of the 2k+1 data sequences and the influenza-like case percentage data sequence, and determines the prediction period from the first data sequence, in order of lag, whose autocorrelation coefficient is greater than a preset threshold; and calculates model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishes an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period. When the data of the target time unit is predicted, the data sequence before the target time unit is lagged by several orders, the autocorrelation coefficients between the sequences are calculated, a prediction period suited to the current target time unit is determined from those coefficients, and modeling is performed with that prediction period, which improves the prediction accuracy of the autoregressive integrated moving average model.
Alternatively, in other embodiments, the model generating program may be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention, where the module referred to in the present invention refers to a series of computer program instruction segments capable of performing specific functions for describing the execution process of the model generating program in the prediction model generating device.
For example, referring to fig. 3, a schematic diagram of program modules of a model generation program in an embodiment of the prediction model generation apparatus of the present invention is shown, in this embodiment, the model generation program may be divided into a data determination module 10, a sequence acquisition module 20, a data calculation module 30, and a model generation module 40, which exemplarily:
the data determination module 10 is configured to: determine a target area and a target time unit to be predicted, and acquire a preset period;
and obtain an influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit;
the sequence acquisition module 20 is configured to: lag the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, according to the preset period and a preset order k, to obtain 2k+1 data sequences;
the data calculation module 30 is configured to: calculate the autocorrelation coefficient between each of the 2k+1 data sequences and the influenza-like case percentage data sequence, and determine the prediction period from the first data sequence, in order of lag, whose autocorrelation coefficient is greater than a preset threshold;
the model generation module 40 is configured to: calculate model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establish an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
The functions or operation steps implemented by the program modules such as the data determining module 10, the sequence acquiring module 20, the data calculating module 30, and the model generating module 40 when executed are substantially the same as those of the above embodiments, and are not described herein again.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, on which a model generation program is stored, where the model generation program is executable by one or more processors to implement the following operations:
determining a target area and a target time unit to be predicted, and acquiring a preset period;
obtaining an influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit;
lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, according to the preset period and a preset order k, to obtain 2k+1 data sequences;
calculating the autocorrelation coefficient between each of the 2k+1 data sequences and the influenza-like case percentage data sequence, and determining the prediction period from the first data sequence, in order of lag, whose autocorrelation coefficient is greater than a preset threshold;
calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period. The embodiments of the computer-readable storage medium of the present invention are substantially the same as the embodiments of the apparatus and the method for generating the prediction model, and are not described here again.
It should be noted that the above numbering of the embodiments of the present invention is for description only and does not represent the advantages or disadvantages of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, apparatus, article, or method comprising the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A method of generating a predictive model, the method comprising:
determining a target area and a target time unit to be predicted, and acquiring a preset period, wherein the preset period is a preset number of weeks;
obtaining an influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit;
lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, according to the preset period and a preset order k, to obtain 2k+1 data sequences;
calculating the autocorrelation coefficient between each of the 2k+1 data sequences and the influenza-like case percentage data sequence, determining, in order of lag, the first data sequence whose autocorrelation coefficient is greater than a preset threshold, and taking the number of weeks by which that data sequence is lagged as the prediction period;
calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period;
wherein the step of calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period comprises: calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary influenza-like case percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation graph and a partial autocorrelation graph; and judging, according to the autocorrelation graph and the partial autocorrelation graph, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients are trailing or truncated, and selecting an autoregressive integrated moving average model according to the judgment result to fit the stationary influenza-like case percentage data sequence to obtain the prediction model.
2. The method of generating a predictive model according to claim 1, wherein after the step of obtaining a data sequence of percent of influenza-like cases in a plurality of consecutive time units before the target time unit of the target region, the method further comprises the steps of:
detecting whether the influenza-like case percentage data sequence is a stationary sequence;
if yes, proceeding to the step of lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, according to the preset period and a preset order k, to obtain 2k+1 data sequences;
if not, converting the influenza-like case percentage data sequence into a stationary sequence by a differencing operation.
3. The method of generating a predictive model of claim 2, wherein the step of detecting whether the influenza-like case percentage data sequence is a stationary sequence comprises:
performing a unit root test on the influenza-like case percentage data to detect whether the influenza-like case percentage data sequence is a stationary sequence, wherein if a unit root is detected in the sequence, the sequence is judged to be a non-stationary sequence, and if not, the sequence is judged to be a stationary sequence.
4. A method for generating a prediction model according to any one of claims 1 to 3, wherein the step of obtaining a preset period comprises:
determining the preset period according to the periodicity presented by the percentage data of influenza-like cases.
5. An apparatus for generating a predictive model, the apparatus comprising a memory and a processor, the memory having stored thereon a model generator executable on the processor, the model generator when executed by the processor implementing the steps of:
determining a target area and a target time unit to be predicted, and acquiring a preset period, wherein the preset period is a preset number of weeks;
obtaining a data sequence of percent of influenza-like cases for a plurality of consecutive time units of the target region prior to the target time unit;
lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, according to the preset period and a preset order k, to obtain 2k+1 data sequences;
calculating the autocorrelation coefficient between each of the 2k+1 data sequences and the influenza-like case percentage data sequence, determining, in order of lag, the first data sequence whose autocorrelation coefficient is greater than a preset threshold, and taking the number of weeks by which that data sequence is lagged as the prediction period;
calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period;
wherein the step of calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period comprises: calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary influenza-like case percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation graph and a partial autocorrelation graph; and judging, according to the autocorrelation graph and the partial autocorrelation graph, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients are trailing or truncated, and selecting an autoregressive integrated moving average model according to the judgment result to fit the stationary influenza-like case percentage data sequence to obtain the prediction model.
6. The generation apparatus of a predictive model as recited in claim 5, wherein the model generation program is further executable by the processor to, after the step of obtaining a series of percentage data of influenza-like cases in a plurality of consecutive time units prior to the target time unit of the target region, further implement the steps of:
detecting whether the influenza-like case percentage data sequence is a stationary sequence;
if yes, proceeding to the step of lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, according to the preset period and a preset order k, to obtain 2k+1 data sequences;
if not, converting the influenza-like case percentage data sequence into a stationary sequence by a differencing operation.
7. The apparatus for generating a predictive model according to claim 6, wherein the step of detecting whether the influenza-like case percentage data sequence is a stationary sequence comprises:
performing a unit root test on the influenza-like case percentage data to detect whether the influenza-like case percentage data sequence is a stationary sequence, wherein if a unit root is detected in the sequence, the sequence is judged to be a non-stationary sequence, and if not, the sequence is judged to be a stationary sequence.
8. The generation apparatus of a prediction model according to any one of claims 5 to 7, wherein the step of obtaining a preset period includes:
determining the preset period according to the periodicity presented by the percentage influenza-like case data.
9. A computer-readable storage medium having stored thereon a model generation program executable by one or more processors to implement the steps of a method of generating a predictive model of any of claims 1 to 4.
CN201810768332.2A 2018-07-13 2018-07-13 Generation method and device of prediction model and computer readable storage medium Active CN109243619B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810768332.2A CN109243619B (en) 2018-07-13 2018-07-13 Generation method and device of prediction model and computer readable storage medium
PCT/CN2018/107488 WO2020010710A1 (en) 2018-07-13 2018-09-26 Method and apparatus for generating prediction model, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810768332.2A CN109243619B (en) 2018-07-13 2018-07-13 Generation method and device of prediction model and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109243619A CN109243619A (en) 2019-01-18
CN109243619B true CN109243619B (en) 2023-03-31

Family

ID=65072559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810768332.2A Active CN109243619B (en) 2018-07-13 2018-07-13 Generation method and device of prediction model and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN109243619B (en)
WO (1) WO2020010710A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109916361B (en) * 2019-03-04 2020-12-29 中国计量科学研究院 Roundness measurement signal processing method without angle and position information
CN112259239B (en) * 2020-10-21 2023-07-11 平安科技(深圳)有限公司 Parameter processing method and device, electronic equipment and storage medium
CN113035368A (en) * 2021-04-13 2021-06-25 桂林电子科技大学 Disease propagation prediction method based on differential migration diagram neural network
CN113380423A (en) * 2021-05-24 2021-09-10 首都医科大学 Epidemic situation scale prediction method, device, electronic equipment and storage medium
CN113537631B (en) * 2021-08-04 2023-11-10 北方健康医疗大数据科技有限公司 Medicine demand prediction method, device, electronic equipment and storage medium
CN113706269A (en) * 2021-09-13 2021-11-26 华润电力技术研究院有限公司 Frequency modulation service quotation method and frequency modulation service quotation device
CN117147807B (en) * 2023-11-01 2024-01-26 中海(天津)能源科技有限公司 Oil quality monitoring system and method for petroleum exploration
CN117457096B (en) * 2023-12-26 2024-03-22 山东省海洋资源与环境研究院(山东省海洋环境监测中心、山东省水产品质量检验中心) Dynamic monitoring and adjusting system for simulating carbon dioxide dissolution in ocean acidification device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633254A (en) * 2017-07-25 2018-01-26 平安科技(深圳)有限公司 Establish device, method and the computer-readable recording medium of forecast model
CN107688872A (en) * 2017-08-20 2018-02-13 平安科技(深圳)有限公司 Forecast model establishes device, method and computer-readable recording medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268967B (en) * 2017-01-04 2021-01-26 北京京东尚科信息技术有限公司 Method and system for predicting telephone traffic
CN107145714B (en) * 2017-04-07 2020-05-22 浙江大学城市学院 Multi-factor-based public bicycle usage amount prediction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
姜世强 等.基于ARIMA模型的流感样病例预警预测分析.华南预防医学.2013,(第05期),全文. *
李广智 等.自回归求和移动平均模型在流感发病预测中的应用.中国热带医学.2016,(第12期),全文. *

Also Published As

Publication number Publication date
CN109243619A (en) 2019-01-18
WO2020010710A1 (en) 2020-01-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant