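WO2020010710A1 - Method, device, and computer-readable storage medium for generating a prediction model - Google Patents

Method, device, and computer-readable storage medium for generating a prediction model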

Info

Publication number
WO2020010710A1
Authority
WO
WIPO (PCT)
Prior art keywords
influenza
model
sequence
prediction
period
Prior art date
Application number
PCT/CN2018/107488
Other languages
English (en)
French (fr)
Inventor
李弦
徐亮
阮晓雯
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020010710A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, a device, and a computer-readable storage medium for generating a prediction model.
  • for influenza forecasting, the autoregressive integrated moving average model is generally built with a fixed period, such as one year or six months, that is set for the forecast region according to the pattern of change in its historical influenza-like case percentage data.
  • however, a fixed period may ignore the effects of factors that do not recur periodically, such as year-to-year differences in the number of weeks and in the timing of the solar terms, which leads to large deviations in the prediction results. For example, years differ in length and some years have 53 weeks, such as 2013, so the length of the yearly period changes. Modeling with a fixed period therefore causes large deviations in the model's predictions.
  • the present application provides a method, a device, and a computer-readable storage medium for generating a prediction model, the main purpose of which is to improve the prediction accuracy of the autoregressive integral moving average model.
  • the present application further provides a method for generating a prediction model, which method includes:
  • a model parameter is calculated according to the influenza-like case percentage sequence for which a prediction period is determined, and an autoregressive integrated moving average model is established as the prediction model according to the model parameter and the prediction period.
  • the present application further provides a device for generating a prediction model.
  • the device includes a memory and a processor.
  • the memory stores a model generating program that can be run on the processor.
  • when the model generating program is executed by the processor, the following steps are implemented:
  • a model parameter is calculated according to the influenza-like case percentage sequence for which a prediction period is determined, and an autoregressive integrated moving average model is established as the prediction model according to the model parameter and the prediction period.
  • the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores a model generation program, and the model generation program can be executed by one or more processors to implement the following steps:
  • a model parameter is calculated according to the influenza-like case percentage sequence in which the prediction period is determined, and an autoregressive integral moving average model is established as the prediction model according to the model parameter and the prediction period.
  • the method, device, and computer-readable storage medium provided by the present application determine a target area and a target time unit to be predicted, and obtain a preset period; obtain the influenza-like case percentage data sequence of the target area for multiple consecutive time units before the target time unit; according to the preset period and a preset order k, lag the influenza-like case percentage data sequence by 0 to k orders before and after the preset period to obtain 2k + 1 data sequences; calculate the autocorrelation coefficient between each of the 2k + 1 data sequences and the influenza-like case percentage data sequence and, in lag order, determine the prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and calculate model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establish an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
  • when predicting data for the target time unit, the present application lags the data sequence before the target time unit by multiple orders, calculates the autocorrelation coefficients between the sequences, determines from those coefficients a prediction period applicable to the current target time unit, and uses that prediction period for modeling, which improves the prediction accuracy of the autoregressive integrated moving average model.
  • FIG. 1 is a schematic flowchart of a method for generating a prediction model according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of an internal structure of a prediction model generating device according to an embodiment of the present application
  • FIG. 3 is a schematic block diagram of a model generation program in a prediction model generation device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for generating a prediction model according to an embodiment of the present application. The method may be performed by a device, which may be implemented by software and / or hardware.
  • a method for generating a prediction model includes:
  • Step S10 Determine a target area and a target time unit to be predicted, and obtain a preset period.
  • Step S20 Obtain a data sequence of the percentage of influenza-like cases in the target area in multiple consecutive time units before the target time unit.
  • the target area in this embodiment is a certain area where the percentage of influenza-like cases is to be predicted, for example, a certain city or a certain province.
  • the scheme of this embodiment is described with weeks as the time unit. Assuming that the percentage of influenza-like cases for the current week is to be predicted, an ARIMA (Autoregressive Integrated Moving Average) model is constructed from the historical influenza-like case percentage data sequence for each week in the 5 years before the current week; that is, to predict the percentage of influenza-like cases for week 261, the data for the preceding 260 weeks are used to build the ARIMA model. Over time, the target time unit moves forward and its historical data change.
  • in addition, because the number of weeks in a year and the timing of the solar terms may differ from year to year, the periodicity that these historical data actually exhibit may change.
  • the preset period may be determined to be 52 weeks based on the periodicity exhibited by the influenza-like case percentage data, and the prediction period is then adjusted using 52 weeks as the baseline.
  • the number of time units in the historical data is set by the user according to the actual predicted demand.
  • to improve the prediction accuracy of the ARIMA model, the acquired data sequence is checked for stationarity; if it is not stationary, it is converted into a stationary influenza-like case percentage data sequence by differencing. The step of detecting whether the influenza-like case percentage data sequence is a stationary sequence includes: performing a unit root test on the influenza-like case percentage data sequence, wherein if a unit root is detected in the sequence, the sequence is determined to be non-stationary; otherwise, the sequence is determined to be stationary.
  • Step S30 According to the preset period and the preset order k, the influenza-like case percentage data sequence is lagged by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences.
  • Step S40 Calculate the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence and, in lag order, determine the prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold.
  • the preset order k is a positive integer, and k is preferably set to a value from 2 to 6.
  • taking the preset period T0 = 52 and k = 2 as an example, lagging the influenza-like case percentage data sequence by up to k orders around the preset period T0 yields 2k + 1 data sequences.
  • the original data sequence composed of historical data from 2013 to 2017 is [W1, W2, W3, ..., W260], and the base sequence extracted according to the preset period T0 = 52 is L0 = [W1, W2, W3, ..., W206]
  • L1 is [W51, W52, W53, ..., W256]
  • L2 is [W52, W53, W54, ..., W257]
  • L3 is [W53, W54, W55, ..., W258]
  • L4 is [W54, W55, W56, ..., W259]
  • L5 is [W55, W56, W57, ..., W260].
  • the data sequence L1 is obtained by lagging the original data sequence by 50 weeks
  • the data sequence L2 is obtained by lagging the original data sequence by 51 weeks
  • the data sequence L3 is obtained by lagging the original data sequence by 52 weeks
  • the data sequence L4 is obtained by lagging the original data sequence by 53 weeks
  • the data sequence L5 is obtained by lagging the original data sequence by 54 weeks.
  • when the period is not affected by factors such as the year or the solar terms, that is, when the prediction period is 52 weeks, the data sequences for weeks 1 to 206 and weeks 53 to 258 are the same within a small margin of error.
  • in this case, the autocorrelation coefficient between L0 and L3 is the largest.
  • however, if other factors cause the period of the data sequence to change, the correlation between L0 and L3 becomes weaker and the correlation between L0 and other data sequences becomes stronger. Therefore, using the autocorrelation coefficients calculated above and proceeding in order from L1 to L5, the first sequence whose autocorrelation coefficient is greater than a preset threshold is determined, and the number of weeks by which that sequence is lagged is used as the prediction period. For example, if L4 is the first of the sequences L1 to L5 whose autocorrelation coefficient with L0 is greater than the preset threshold, the current prediction period is determined to be 53 weeks.
  • Step S50 A model parameter is calculated according to the influenza-like case percentage sequence for which the prediction period has been determined, and an autoregressive integrated moving average model is established as the prediction model according to the model parameter and the prediction period.
  • the autocorrelation coefficients and partial autocorrelation coefficients are obtained to determine the AR and MA parameters of the model, that is, the values of p and q, and the ARIMA model is then built according to these parameters and the prediction period.
  • the stationary data sequence obtained through the differencing operation is acquired, and the order of differencing used in the conversion is taken as the value of the parameter d of the ARIMA model.
  • it is observed whether the residuals of the ARIMA model are normally distributed with a mean of 0 and constant variance, and whether successive residuals are correlated; if the residuals satisfy these conditions and successive residuals are uncorrelated, the model is judged to pass the check.
  • a rolling prediction method is used to predict the influenza-like case percentage data one week in advance. Over time, the period of the historical influenza-like case percentage data sequence may change from one prediction point to the next. Therefore, each time a prediction is made one week in advance, a new prediction period needs to be determined from the influenza-like case percentage data for the consecutive weeks before the current week, and the model is dynamically updated to improve the prediction accuracy of the autoregressive integrated moving average model.
  • the method for generating a prediction model proposed in this embodiment determines a target area and a target time unit to be predicted, and obtains a preset period; acquires the influenza-like case percentage data sequence of the target area for multiple consecutive time units before the target time unit; according to the preset period and the preset order k, lags the influenza-like case percentage data sequence by 0 to k orders before and after the preset period to obtain 2k + 1 data sequences; calculates the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence and, in lag order, determines the prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and calculates model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and builds an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
  • when predicting data for the target time unit, the present application lags the data sequence before the target time unit by multiple orders, calculates the autocorrelation coefficients between the sequences, determines from those coefficients a prediction period applicable to the current target time unit, and uses that prediction period for modeling, which improves the prediction accuracy of the autoregressive integrated moving average model.
  • the present application also provides a device for generating a prediction model.
  • referring to FIG. 2, a schematic diagram of the internal structure of a prediction model generating device according to an embodiment of the present application is shown.
  • the device 1 for generating a prediction model may be a PC (Personal Computer) or a terminal device such as a smart phone, a tablet computer, or a portable computer.
  • the prediction model generating device 1 includes at least a memory 11, a processor 12, a network interface 13, and a communication bus.
  • the memory 11 includes at least one type of readable storage medium.
  • the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 11 may be an internal storage unit of the prediction model generating device 1 in some embodiments, such as a hard disk of the prediction model generating device 1.
  • the memory 11 may also be an external storage device of the prediction model generating device 1 in other embodiments, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card provided on the prediction model generating device 1.
  • the memory 11 may include both an internal storage unit of the prediction model generating device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various types of data installed in the prediction model generating device 1, such as the code of the model generation program 01, but also to temporarily store data that has been or will be output.
  • the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip in some embodiments, and is configured to run program code stored in the memory 11 or to process data, for example, to execute the model generation program 01.
  • the network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the device 1 and other electronic devices.
  • the communication bus is used to implement connection communication between these components.
  • the device 1 may further include a user interface.
  • the user interface may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-type liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like.
  • the display may also be referred to as a display screen or a display unit, and is used for displaying information processed in the prediction model generating device 1 and for displaying a visualized user interface.
  • FIG. 2 only shows a prediction model generating device 1 having components 11-13 and a model generating program 01. Those skilled in the art can understand that the structure shown in FIG. 2 does not constitute a limitation on the prediction model generating device 1, which may include fewer or more components than shown, combine some components, or use a different arrangement of components.
  • a model generating program 01 is stored in the memory 11; when the processor 12 executes the model generating program 01 stored in the memory 11, the following steps are implemented:
  • a data sequence of the percentage of influenza-like cases in the target region in a plurality of consecutive time units before the target time unit is acquired.
  • the influenza-like case percentage data sequence is lagged by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences.
  • a model parameter is calculated according to the influenza-like case percentage sequence for which a prediction period is determined, and an autoregressive integrated moving average model is established as the prediction model according to the model parameter and the prediction period.
  • the apparatus for generating a prediction model proposed in this embodiment determines a target area and a target time unit to be predicted, and obtains a preset period; acquires the influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit; according to the preset period and the preset order k, lags the influenza-like case percentage data sequence by 0 to k orders before and after the preset period to obtain 2k + 1 data sequences; calculates the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence and, in lag order, determines the prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and calculates model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and builds an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
  • when predicting data for the target time unit, the present application lags the data sequence before the target time unit by multiple orders, calculates the autocorrelation coefficients between the sequences, determines from those coefficients a prediction period applicable to the current target time unit, and uses that prediction period for modeling, which improves the prediction accuracy of the autoregressive integrated moving average model.
  • the model generation program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to complete the present application.
  • the modules referred to in the present application refer to a series of computer program instruction segments capable of performing specific functions, and are used to describe the execution process of the model generation program in the generation device of the prediction model.
  • referring to FIG. 3, a schematic diagram of the program modules of the model generation program in an embodiment of the device for generating a prediction model of the present application is shown.
  • the model generation program may be divided into a data determination module 10, a sequence acquisition module 20, a data calculation module 30, and a model generation module 40. Exemplarily:
  • the data determining module 10 is configured to determine a target area and a target time unit to be predicted, and obtain a preset period;
  • the sequence acquisition module 20 is configured to: according to the preset period and the preset order k, lag the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k + 1 data sequences;
  • the data calculation module 30 is configured to calculate the autocorrelation coefficients between the 2k + 1 data sequences and the influenza-like case percentage data sequence, respectively, and, in lag order, determine the prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold;
  • the model generating module 40 is configured to calculate a model parameter according to the influenza-like case percentage sequence in which the prediction period is determined, and establish an autoregressive integral moving average model as the prediction model according to the model parameter and the prediction period.
  • an embodiment of the present application further provides a computer-readable storage medium.
  • the computer-readable storage medium stores a model generation program, and the model generation program can be executed by one or more processors to implement the following operations:
  • a model parameter is calculated according to the influenza-like case percentage sequence for which a prediction period is determined, and an autoregressive integrated moving average model is established as the prediction model according to the model parameter and the prediction period.
  • the specific implementation manner of the computer-readable storage medium of the present application is basically the same as each embodiment of the foregoing apparatus and method for generating a prediction model, and is not described in detail here.

Landscapes

  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

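The present application discloses a method for generating a prediction model. The method includes: determining a target area, a target time unit to be predicted, and a preset period; acquiring the influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit; according to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period to obtain 2k+1 data sequences; calculating the autocorrelation coefficients between the 2k+1 data sequences and the influenza-like case percentage data sequence, and determining a prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and calculating model parameters and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period. The present application also proposes a device for generating a prediction model and a computer-readable storage medium. The present application improves the prediction accuracy of the autoregressive integrated moving average model.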

Description

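Method, device, and computer-readable storage medium for generating a prediction model
This application claims, under the Paris Convention, priority to Chinese patent application No. 201810768332.2, filed on July 13, 2018 and entitled "Method, device, and computer-readable storage medium for generating a prediction model", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a method, a device, and a computer-readable storage medium for generating a prediction model.
Background
For influenza forecasting, the method most commonly used in epidemiology at present is to predict the percentage of influenza-like cases with an autoregressive integrated moving average model. Such a model is generally built with a fixed period, such as one year or six months, that is set for the forecast region according to the pattern of change in its historical influenza-like case percentage data. However, a fixed period may ignore the effects of factors that do not recur periodically, such as year-to-year differences in the number of weeks and in the timing of the solar terms, which leads to large deviations in the prediction results. For example, years differ in length and some years have 53 weeks, such as 2013, so the length of the yearly period changes. Modeling with a fixed period therefore causes large deviations in the model's predictions.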
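Summary
The present application provides a method, a device, and a computer-readable storage medium for generating a prediction model, the main purpose of which is to improve the prediction accuracy of the autoregressive integrated moving average model.
To achieve the above object, the present application provides a method for generating a prediction model, the method including:
determining a target area and a target time unit to be predicted, and obtaining a preset period;
acquiring the influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit;
according to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
calculating the autocorrelation coefficients between the 2k+1 data sequences and the influenza-like case percentage data sequence, respectively, and, in lag order, determining a prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold;
calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.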
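In addition, to achieve the above object, the present application further provides a device for generating a prediction model. The device includes a memory and a processor, the memory stores a model generation program that can run on the processor, and the following steps are implemented when the model generation program is executed by the processor:
determining a target area and a target time unit to be predicted, and obtaining a preset period;
acquiring the influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit;
according to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
calculating the autocorrelation coefficients between the 2k+1 data sequences and the influenza-like case percentage data sequence, respectively, and, in lag order, determining a prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold;
calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.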
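In addition, to achieve the above object, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a model generation program, and the model generation program can be executed by one or more processors to implement the following steps:
determining a target area and a target time unit to be predicted, and obtaining a preset period;
acquiring the influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit;
according to the preset period and a preset order k, lagging the influenza-like case percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
calculating the autocorrelation coefficients between the 2k+1 data sequences and the influenza-like case percentage data sequence, respectively, and, in lag order, determining a prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold;
calculating model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establishing an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period.
The method, device, and computer-readable storage medium proposed by the present application determine a target area and a target time unit to be predicted, and obtain a preset period; acquire the influenza-like case percentage data sequence of the target area for a plurality of consecutive time units before the target time unit; according to the preset period and a preset order k, lag the influenza-like case percentage data sequence by 0 to k orders before and after the preset period to obtain 2k+1 data sequences; calculate the autocorrelation coefficients between the 2k+1 data sequences and the influenza-like case percentage data sequence and, in lag order, determine a prediction period according to the first data sequence whose autocorrelation coefficient is greater than a preset threshold; and calculate model parameters according to the influenza-like case percentage sequence for which the prediction period has been determined, and establish an autoregressive integrated moving average model as the prediction model according to the model parameters and the prediction period. When predicting data for the target time unit, the present application lags the data sequence before the target time unit by multiple orders, calculates the autocorrelation coefficients between the sequences, determines from those coefficients a prediction period applicable to the current target time unit, and uses that prediction period for modeling, which improves the prediction accuracy of the autoregressive integrated moving average model.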
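Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for generating a prediction model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the internal structure of a device for generating a prediction model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the modules of the model generation program in a device for generating a prediction model according to an embodiment of the present application.
The realization of the purpose, the functional features, and the advantages of the present application are further described below with reference to the embodiments and the accompanying drawings.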
具体实施方式
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请提供一种预测模型的生成方法。参照图1所示,为本申请一实施例提供的预测模型的生成方法的流程示意图。该方法可以由一个装置执行,该装置可以由软件和/或硬件实现。
在本实施例中,预测模型的生成方法包括:
步骤S10,确定目标区域和待预测的目标时间单元,并获取预设周期。
步骤S20,获取所述目标区域在所述目标时间单元之前的连续多个时间单元的流感样病例百分比数据序列。
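The target region in this embodiment is the area for which the influenza-like illness (ILI) percentage is to be predicted, for example a city or a province. The scheme of this embodiment is described with the week as the time unit. Suppose the ILI percentage of the current week is to be predicted; an ARIMA (Autoregressive Integrated Moving Average) model is then built from the historical ILI percentage data sequence of every week in the five years before the current week. That is, to predict the ILI percentage of week 261, the ILI percentage data of the preceding 260 weeks are used to build the ARIMA model. As time passes, the target time unit moves forward and its historical data change; in addition, because the number of weeks in a year and the timing of the solar terms may differ from year to year, the periodicity actually exhibited by the historical data also changes. Absent these factors, observation of a long ILI percentage data sequence shows that the data have an annual periodicity, i.e. a period of 52 weeks; when these factors are present, the period may be shorter or longer than 52 weeks. Therefore, in the method of this embodiment, before the prediction period is dynamically adjusted, the preset period may be set to 52 weeks according to the periodicity exhibited by the ILI percentage data, and the prediction period is then adjusted using 52 weeks as the baseline. Suppose the ILI percentage of the first week of June 2018 is to be predicted; the weekly ILI percentage data sequence from 2013 to 2017 then needs to be obtained. In other embodiments, the number of time units in the historical data is set by the user according to actual prediction needs.
Further, to improve the prediction accuracy of the ARIMA model to be built, after the above data sequence is obtained, it is checked whether the data sequence is stationary. If so, the subsequent steps are performed; if not, the data sequence is converted into a stationary ILI percentage data sequence by differencing. Specifically, the step of detecting whether the ILI percentage data sequence is stationary comprises: performing a unit root test on the ILI percentage data, wherein the sequence is judged non-stationary if a unit root is detected, and stationary otherwise.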
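The differencing step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `difference` and `make_stationary`, the `max_d` cap, and the `is_stationary` hook are all assumptions introduced here. The hook stands in for whatever unit root test is used, which the text does not specify further.

```python
from typing import Callable, List, Tuple


def difference(series: List[float]) -> List[float]:
    """First-order differencing: y[t] = x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def make_stationary(
    series: List[float],
    is_stationary: Callable[[List[float]], bool],
    max_d: int = 2,
) -> Tuple[List[float], int]:
    """Difference the series until the supplied test accepts it.

    is_stationary is a hypothetical hook standing in for a unit root test.
    Returns the transformed series and the number of differences applied,
    which would serve as the ARIMA parameter d.
    """
    current = list(series)
    for d in range(max_d + 1):
        if is_stationary(current):
            return current, d
        if d < max_d:
            current = difference(current)
    raise ValueError(f"series still non-stationary after {max_d} differences")
```

The number of differences returned is the value later used as d when the ARIMA model is built.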
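Step S30: according to the preset period and a preset order k, lag the influenza-like illness (ILI) percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences.
Step S40: calculate the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determine a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold.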
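In this embodiment, the preset order k is a positive integer, preferably between 2 and 6. The following takes a preset period T0 = 52 and k = 2 as an example: lagging the influenza-like illness (ILI) percentage data sequence by up to k orders around the preset period T0 yields 2k+1 data sequences. Suppose the historical data from 2013 to 2017 form the original data sequence [W1, W2, W3, ..., W260], and the base sequence extracted according to the preset period T0 = 52 is L0 = [W1, W2, W3, ..., W206]. Lagging this sequence by 0 to 2 orders before and after the preset period gives the following five sequences:
L1 is [W51, W52, W53, ..., W256];
L2 is [W52, W53, W54, ..., W257];
L3 is [W53, W54, W55, ..., W258];
L4 is [W54, W55, W56, ..., W259];
L5 is [W55, W56, W57, ..., W260].
That is, lagging the base sequence by 50 weeks gives L1, by 51 weeks gives L2, by 52 weeks gives L3, by 53 weeks gives L4, and by 54 weeks gives L5. The autocorrelation coefficients between L0 and each of L1, L2, L3, L4, and L5 are then calculated.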
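When the period is not affected by factors such as the year or the solar terms, i.e. when the prediction period is 52 weeks, the data sequences of weeks 1 to 206 and weeks 53 to 258 are the same within a small margin of error, and the autocorrelation coefficient between L0 and L3 is the largest. However, if other factors change the period of the data sequence, the correlation between L0 and L3 weakens while the correlation between L0 and the other data sequences strengthens. Therefore, using the autocorrelation coefficients calculated above, the first sequence in the order L1 to L5 whose autocorrelation coefficient exceeds the preset threshold is identified, and its lag in weeks is taken as the prediction period. For example, if the calculation shows that L4 is the first of L1 to L5 whose autocorrelation coefficient with L0 exceeds the preset threshold, the current prediction period is determined to be 53 weeks.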
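Steps S30 and S40 can be sketched in a few lines. The sketch below makes several assumptions that the text does not state: the "autocorrelation coefficient" is computed as the sample Pearson correlation between L0 and each lagged sequence, the threshold defaults to 0.8, and `None` is returned when no lag exceeds the threshold. The function names are illustrative only.

```python
import math
from typing import Optional, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def select_period(
    series: Sequence[float],
    preset_period: int = 52,
    k: int = 2,
    threshold: float = 0.8,  # assumed value; the text only says "preset threshold"
) -> Optional[int]:
    """Return the first lag in [T0 - k, T0 + k] whose correlation with L0
    exceeds the threshold, or None if no lag qualifies."""
    base_len = len(series) - (preset_period + k)  # 260 - 54 = 206 in the example
    if base_len < 2:
        raise ValueError("series too short for the requested lags")
    base = series[:base_len]  # L0 = [W1 .. W206]
    for lag in range(preset_period - k, preset_period + k + 1):
        lagged = series[lag:lag + base_len]  # lag 50 gives [W51 .. W256]
        if pearson(base, lagged) > threshold:
            return lag
    return None
```

With 260 weekly values, T0 = 52, and k = 2, the loop visits lags 50 to 54 in order, which corresponds to L1 through L5 in the example above.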
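Step S50: calculate model parameters from the influenza-like illness (ILI) percentage sequence for which the prediction period has been determined, and build an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model.
For the stationary ILI sequence for which the prediction period has been determined, the autocorrelation coefficients and partial autocorrelation coefficients are computed to determine the AR and MA parameters of the model, i.e. the values of p, d, and q, and the ARIMA model is then built from these parameters and the prediction period. Specifically, the stationary data sequence produced by differencing is obtained, and the order of differencing applied during the conversion is used as the value of the ARIMA parameter d. For the stationary data sequence for which the prediction period has been determined, the autocorrelation coefficients and partial autocorrelation coefficients are computed and the autocorrelation plot and partial autocorrelation plot are drawn; from these plots it is judged whether the partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and the corresponding ARIMA model is selected accordingly to fit the stationarized data sequence. Parameter estimation is performed on the fitted ARIMA model to obtain the optimal orders p and q, and the model is then tested for validity to determine the final prediction model. Specifically, under the exponential smoothing model, it is checked whether the residuals of the ARIMA model follow a normal distribution with zero mean and constant variance, and whether consecutive residuals are uncorrelated; if so, the model is judged to pass validation.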
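The autocorrelation and partial autocorrelation coefficients that drive the choice of p and q can be computed as sketched below. This is an illustration under stated assumptions rather than the method the applicant used: the partial autocorrelations come from the Durbin-Levinson recursion, and the "cut off" judgement is approximated by the common 1.96/sqrt(n) significance bound. Neither choice is specified in the text, and the function names are hypothetical.

```python
import math
from typing import List


def acf(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    if c0 == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return [
        sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / c0
        for lag in range(max_lag + 1)
    ]


def pacf(series: List[float], max_lag: int) -> List[float]:
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson)."""
    r = acf(series, max_lag)
    result = [1.0]
    prev: List[float] = []  # coefficients of the order k-1 autoregression
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def last_significant_lag(coeffs: List[float], n: int) -> int:
    """Largest lag >= 1 whose coefficient lies outside +/- 1.96/sqrt(n).

    A heuristic reading of "cut off": 0 means no lag is significant.
    """
    bound = 1.96 / math.sqrt(n)
    significant = [lag for lag in range(1, len(coeffs)) if abs(coeffs[lag]) > bound]
    return max(significant) if significant else 0
```

Under the usual rule of thumb, a partial autocorrelation that cuts off after lag p while the autocorrelation tails off suggests an AR(p) component, and the reverse pattern suggests an MA(q) component.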
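When the resulting ARIMA model is used for real-time prediction, the influenza-like illness (ILI) percentage is predicted one week ahead by rolling prediction. Over time, the period of the historical ILI percentage data sequence may differ from one prediction point to the next. Therefore, each time a one-week-ahead prediction is made, a new prediction period is determined from the ILI percentage data of the consecutive weeks preceding the current week, and the model is dynamically updated, which improves the prediction accuracy of the ARIMA model.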
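The rolling one-week-ahead procedure can be pictured as a loop that re-selects the period and refits before every forecast. The sketch below is only a skeleton under assumptions: `choose_period` and `fit_and_forecast` are hypothetical callables standing in for the period selection and ARIMA fitting described above, and the 260-week window mirrors the five-year example.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_forecast(
    history: Sequence[float],
    incoming: Sequence[float],
    choose_period: Callable[[Sequence[float]], int],
    fit_and_forecast: Callable[[Sequence[float], int], float],
    window: int = 260,
) -> List[Tuple[int, float]]:
    """For each new week, pick a period from the latest window, refit,
    forecast one step ahead, then absorb the observed value."""
    data = list(history)
    results: List[Tuple[int, float]] = []
    for actual in incoming:
        recent = data[-window:]          # most recent `window` weeks
        period = choose_period(recent)   # prediction period for this week
        results.append((period, fit_and_forecast(recent, period)))
        data.append(actual)              # observed value joins the history
    return results
```

Passing the two steps in as callables keeps the loop independent of any particular ARIMA library.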
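According to the method for generating a prediction model proposed in this embodiment, a target region and a target time unit to be predicted are determined, and a preset period is obtained; an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit is obtained; according to the preset period and a preset order k, the ILI percentage data sequence is lagged by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences; the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence is calculated, and, in lag order, a prediction period is determined from the first data sequence whose autocorrelation coefficient exceeds a preset threshold; model parameters are calculated from the ILI percentage sequence for which the prediction period has been determined, and an autoregressive integrated moving average (ARIMA) model is built from the model parameters and the prediction period as the prediction model. When predicting the data of a target time unit, this application lags the data sequence preceding that target time unit by several orders, calculates the autocorrelation coefficients between the sequences, determines from them a prediction period suited to the current target time unit, and models with that prediction period, which improves the prediction accuracy of the ARIMA model.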
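This application further provides an apparatus for generating a prediction model. Fig. 2 is a schematic diagram of the internal structure of the apparatus for generating a prediction model according to an embodiment of this application.
In this embodiment, the apparatus 1 for generating a prediction model may be a PC (Personal Computer), or a terminal device such as a smartphone, a tablet computer, or a portable computer. The apparatus 1 comprises at least a memory 11, a processor 12, a network interface 13, and a communication bus.
The memory 11 comprises at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments the memory 11 may be an internal storage unit of the apparatus 1, for example its hard disk. In other embodiments the memory 11 may be an external storage device of the apparatus 1, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the apparatus 1. Further, the memory 11 may comprise both an internal storage unit and an external storage device of the apparatus 1. The memory 11 can be used not only to store application software installed on the apparatus 1 and various kinds of data, such as the code of the model generation program 01, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run program code stored in the memory 11 or to process data, for example to execute the model generation program 01.
The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the apparatus 1 and other electronic devices.
The communication bus is used to implement connection and communication between these components.
Optionally, the apparatus 1 may further include a user interface, which may include a display and an input unit such as a keyboard; the optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display the information processed in the apparatus 1 and a visual user interface.
Fig. 2 shows only the apparatus 1 with components 11-13 and the model generation program 01. Those skilled in the art will understand that the structure shown in Fig. 2 does not limit the apparatus 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.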
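In the embodiment of the apparatus 1 shown in Fig. 2, the memory 11 stores the model generation program 01, and the processor 12 implements the following steps when executing the model generation program 01 stored in the memory 11:
Determine a target region and a target time unit to be predicted, and obtain a preset period.
Obtain an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit.
According to the preset period and a preset order k, lag the ILI percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences.
Calculate the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determine a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold.
Calculate model parameters from the ILI percentage sequence for which the prediction period has been determined, and build an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model.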
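The target region in this embodiment is the area for which the influenza-like illness (ILI) percentage is to be predicted, for example a city or a province. The scheme of this embodiment is described with the week as the time unit. Suppose the ILI percentage of the current week is to be predicted; an ARIMA (Autoregressive Integrated Moving Average) model is then built from the historical ILI percentage data sequence of every week in the five years before the current week. That is, to predict the ILI percentage of week 261, the ILI percentage data of the preceding 260 weeks are used to build the ARIMA model. As time passes, the target time unit moves forward and its historical data change; in addition, because the number of weeks in a year and the timing of the solar terms may differ from year to year, the periodicity actually exhibited by the historical data also changes. Absent these factors, observation of a long ILI percentage data sequence shows that the data have an annual periodicity, i.e. a period of 52 weeks; when these factors are present, the period may be shorter or longer than 52 weeks. Therefore, in this embodiment, before the prediction period is dynamically adjusted, the preset period may be set to 52 weeks according to the periodicity exhibited by the ILI percentage data, and the prediction period is then adjusted using 52 weeks as the baseline. Suppose the ILI percentage of the first week of June 2018 is to be predicted; the weekly ILI percentage data sequence from 2013 to 2017 then needs to be obtained. In other embodiments, the number of time units in the historical data is set by the user according to actual prediction needs.
Further, to improve the prediction accuracy of the ARIMA model to be built, after the above data sequence is obtained, it is checked whether the data sequence is stationary. If so, the subsequent steps are performed; if not, the data sequence is converted into a stationary ILI percentage data sequence by differencing. Specifically, the step of detecting whether the ILI percentage data sequence is stationary comprises: performing a unit root test on the ILI percentage data, wherein the sequence is judged non-stationary if a unit root is detected, and stationary otherwise.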
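In this embodiment, the preset order k is a positive integer, preferably between 2 and 6. The following takes a preset period T0 = 52 and k = 2 as an example: lagging the influenza-like illness (ILI) percentage data sequence by up to k orders around the preset period T0 yields 2k+1 data sequences. Suppose the historical data from 2013 to 2017 form the original data sequence [W1, W2, W3, ..., W260], and the base sequence extracted according to the preset period T0 = 52 is L0 = [W1, W2, W3, ..., W206]. Lagging this sequence by 0 to 2 orders before and after the preset period gives the following five sequences:
L1 is [W51, W52, W53, ..., W256];
L2 is [W52, W53, W54, ..., W257];
L3 is [W53, W54, W55, ..., W258];
L4 is [W54, W55, W56, ..., W259];
L5 is [W55, W56, W57, ..., W260].
That is, lagging the base sequence by 50 weeks gives L1, by 51 weeks gives L2, by 52 weeks gives L3, by 53 weeks gives L4, and by 54 weeks gives L5. The autocorrelation coefficients between L0 and each of L1, L2, L3, L4, and L5 are then calculated.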
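The text does not give a formula for the coefficient computed between L0 and each lagged sequence. One plausible reading, stated here as an assumption, is the sample Pearson correlation between the base sequence and its lagged copy. With n = 206, lags \tau_i = T_0 - k + (i - 1) for i = 1, ..., 2k+1 (that is, 50 to 54 in the example), and W_t denoting the value for week t:

```latex
r_i \;=\;
\frac{\sum_{t=1}^{n}\bigl(W_t-\bar{W}^{(0)}\bigr)\bigl(W_{t+\tau_i}-\bar{W}^{(i)}\bigr)}
     {\sqrt{\sum_{t=1}^{n}\bigl(W_t-\bar{W}^{(0)}\bigr)^{2}}\;
      \sqrt{\sum_{t=1}^{n}\bigl(W_{t+\tau_i}-\bar{W}^{(i)}\bigr)^{2}}},
\qquad
\bar{W}^{(0)}=\frac{1}{n}\sum_{t=1}^{n}W_t,\quad
\bar{W}^{(i)}=\frac{1}{n}\sum_{t=1}^{n}W_{t+\tau_i}.
```

Under this reading, r_3 compares weeks 1 to 206 with weeks 53 to 258, and the prediction period is the lag \tau_i of the first i for which r_i exceeds the preset threshold.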
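When the period is not affected by factors such as the year or the solar terms, i.e. when the prediction period is 52 weeks, the data sequences of weeks 1 to 206 and weeks 53 to 258 are the same within a small margin of error, and the autocorrelation coefficient between L0 and L3 is the largest. However, if other factors change the period of the data sequence, the correlation between L0 and L3 weakens while the correlation between L0 and the other data sequences strengthens. Therefore, using the autocorrelation coefficients calculated above, the first sequence in the order L1 to L5 whose autocorrelation coefficient exceeds the preset threshold is identified, and its lag in weeks is taken as the prediction period. For example, if the calculation shows that L4 is the first of L1 to L5 whose autocorrelation coefficient with L0 exceeds the preset threshold, the current prediction period is determined to be 53 weeks.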
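For the stationary influenza-like illness (ILI) sequence for which the prediction period has been determined, the autocorrelation coefficients and partial autocorrelation coefficients are computed to determine the AR and MA parameters of the model, i.e. the values of p, d, and q, and the ARIMA model is then built from these parameters and the prediction period. Specifically, the stationary data sequence produced by differencing is obtained, and the order of differencing applied during the conversion is used as the value of the ARIMA parameter d. For the stationary data sequence for which the prediction period has been determined, the autocorrelation coefficients and partial autocorrelation coefficients are computed and the autocorrelation plot and partial autocorrelation plot are drawn; from these plots it is judged whether the partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and the corresponding ARIMA model is selected accordingly to fit the stationarized data sequence. Parameter estimation is performed on the fitted ARIMA model to obtain the optimal orders p and q, and the model is then tested for validity to determine the final prediction model. Specifically, under the exponential smoothing model, it is checked whether the residuals of the ARIMA model follow a normal distribution with zero mean and constant variance, and whether consecutive residuals are uncorrelated; if so, the model is judged to pass validation.
When the resulting ARIMA model is used for real-time prediction, the ILI percentage is predicted one week ahead by rolling prediction. Over time, the period of the historical ILI percentage data sequence may differ from one prediction point to the next. Therefore, each time a one-week-ahead prediction is made, a new prediction period is determined from the ILI percentage data of the consecutive weeks preceding the current week, and the model is dynamically updated, which improves the prediction accuracy of the ARIMA model.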
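The validity check on the residuals is described only qualitatively. As one possible concrete form, and purely as an assumption, the correlation between consecutive residuals could be tested with the Ljung-Box Q statistic, a technique the text does not name. The sketch below covers the zero-mean and no-correlation parts of the check only; it does not test normality or constant variance. The function names, the tolerance, and the way the critical value is supplied are all illustrative.

```python
from typing import List


def ljung_box_q(residuals: List[float], max_lag: int) -> float:
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals)
    if c0 == 0:
        raise ValueError("constant residuals have no defined autocorrelation")
    total = 0.0
    for lag in range(1, max_lag + 1):
        r = sum(
            (residuals[t] - mean) * (residuals[t + lag] - mean)
            for t in range(n - lag)
        ) / c0
        total += r * r / (n - lag)
    return n * (n + 2) * total


def residuals_look_valid(
    residuals: List[float],
    max_lag: int,
    chi2_critical: float,  # caller supplies the chi-square critical value
    mean_tol: float,       # assumed tolerance for "mean is zero"
) -> bool:
    """True when the mean is near zero and Q stays below the critical value."""
    mean = sum(residuals) / len(residuals)
    return abs(mean) <= mean_tol and ljung_box_q(residuals, max_lag) <= chi2_critical
```

For example, with one lag and a 5% significance level the chi-square critical value is about 3.841, so a residual series whose Q exceeds that value would fail the check.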
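The apparatus for generating a prediction model proposed in this embodiment determines a target region and a target time unit to be predicted, and obtains a preset period; obtains an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit; according to the preset period and a preset order k, lags the ILI percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences; calculates the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determines a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold; and calculates model parameters from the ILI percentage sequence for which the prediction period has been determined, and builds an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model. When predicting the data of a target time unit, this application lags the data sequence preceding that target time unit by several orders, calculates the autocorrelation coefficients between the sequences, determines from them a prediction period suited to the current target time unit, and models with that prediction period, which improves the prediction accuracy of the ARIMA model.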
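Optionally, in other embodiments, the model generation program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out this application. A module, as the term is used in this application, is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution of the model generation program in the apparatus for generating a prediction model.
For example, Fig. 3 is a schematic diagram of the program modules of the model generation program in an embodiment of the apparatus for generating a prediction model of this application. In this embodiment, the model generation program may be divided into a data determination module 10, a sequence acquisition module 20, a data calculation module 30, and a model generation module 40. By way of example:
the data determination module 10 is configured to: determine a target region and a target time unit to be predicted, and obtain a preset period;
and obtain an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit;
the sequence acquisition module 20 is configured to: according to the preset period and a preset order k, lag the ILI percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
the data calculation module 30 is configured to: calculate the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determine a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold;
the model generation module 40 is configured to: calculate model parameters from the ILI percentage sequence for which the prediction period has been determined, and build an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model.
The functions or operation steps implemented when the above program modules, namely the data determination module 10, the sequence acquisition module 20, the data calculation module 30, and the model generation module 40, are executed are substantially the same as in the above embodiments and are not repeated here.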
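In addition, an embodiment of this application further proposes a computer-readable storage medium. The computer-readable storage medium stores a model generation program, and the model generation program is executable by one or more processors to implement the following operations:
determining a target region and a target time unit to be predicted, and obtaining a preset period;
obtaining an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit;
according to the preset period and a preset order k, lagging the ILI percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
calculating the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determining a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold;
calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model. The specific implementation of the computer-readable storage medium of this application is substantially the same as the embodiments of the apparatus and method for generating a prediction model described above, and is not repeated here.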
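It should be noted that the serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments. The terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes that element.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.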

Claims (20)

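  1. A method for generating a prediction model, characterized in that the method comprises:
    determining a target region and a target time unit to be predicted, and obtaining a preset period;
    obtaining an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit;
    according to the preset period and a preset order k, lagging the ILI percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
    calculating the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determining a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold;
    calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model.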
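  2. The method for generating a prediction model according to claim 1, characterized in that, after the step of obtaining the influenza-like illness (ILI) percentage data sequence of the target region for the plurality of consecutive time units preceding the target time unit, the method further comprises the steps of:
    detecting whether the ILI percentage data sequence is a stationary sequence;
    if so, performing the step of obtaining the preset period determined according to the periodicity exhibited by the ILI percentage data;
    if not, converting the ILI percentage data sequence into a stationary sequence by differencing.
  3. The method for generating a prediction model according to claim 2, characterized in that the step of detecting whether the ILI percentage data sequence is a stationary sequence comprises:
    performing a unit root test on the ILI percentage data to detect whether the ILI percentage data sequence is a stationary sequence, wherein the sequence is judged to be non-stationary if a unit root is detected in the sequence, and stationary otherwise.
  4. The method for generating a prediction model according to claim 2, characterized in that the step of obtaining the preset period comprises:
    determining the preset period according to the periodicity exhibited by the ILI percentage data.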
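  5. The method for generating a prediction model according to claim 2, characterized in that the step of calculating model parameters from the influenza-like illness (ILI) percentage sequence for which the prediction period has been determined, and building an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.
  6. The method for generating a prediction model according to claim 3, characterized in that the step of calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an ARIMA model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.
  7. The method for generating a prediction model according to claim 4, characterized in that the step of calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an ARIMA model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.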
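  8. An apparatus for generating a prediction model, characterized in that the apparatus comprises a memory and a processor, the memory stores a model generation program executable on the processor, and the model generation program, when executed by the processor, implements the following steps:
    determining a target region and a target time unit to be predicted, and obtaining a preset period;
    obtaining an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit;
    according to the preset period and a preset order k, lagging the ILI percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
    calculating the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determining a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold;
    calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model.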
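  9. The apparatus for generating a prediction model according to claim 8, characterized in that the model generation program is further executable by the processor to implement, after the step of obtaining the influenza-like illness (ILI) percentage data sequence of the target region for the plurality of consecutive time units preceding the target time unit, the following steps:
    detecting whether the ILI percentage data sequence is a stationary sequence;
    if so, performing the step of obtaining the preset period determined according to the periodicity exhibited by the ILI percentage data;
    if not, converting the ILI percentage data sequence into a stationary sequence by differencing.
  10. The apparatus for generating a prediction model according to claim 9, characterized in that the step of detecting whether the ILI percentage data sequence is a stationary sequence comprises:
    performing a unit root test on the ILI percentage data to detect whether the ILI percentage data sequence is a stationary sequence, wherein the sequence is judged to be non-stationary if a unit root is detected in the sequence, and stationary otherwise.
  11. The apparatus for generating a prediction model according to claim 9, characterized in that the step of obtaining the preset period comprises:
    determining the preset period according to the periodicity exhibited by the ILI percentage data.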
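  12. The apparatus for generating a prediction model according to claim 9, characterized in that the step of calculating model parameters from the influenza-like illness (ILI) percentage sequence for which the prediction period has been determined, and building an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.
  13. The apparatus for generating a prediction model according to claim 10, characterized in that the step of calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an ARIMA model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.
  14. The apparatus for generating a prediction model according to claim 11, characterized in that the step of calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an ARIMA model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.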
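  15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a model generation program, and the model generation program is executable by one or more processors to implement the following steps:
    determining a target region and a target time unit to be predicted, and obtaining a preset period;
    obtaining an influenza-like illness (ILI) percentage data sequence of the target region for a plurality of consecutive time units preceding the target time unit;
    according to the preset period and a preset order k, lagging the ILI percentage data sequence by 0 to k orders before and after the preset period, respectively, to obtain 2k+1 data sequences;
    calculating the autocorrelation coefficient between each of the 2k+1 data sequences and the ILI percentage data sequence, and, in lag order, determining a prediction period from the first data sequence whose autocorrelation coefficient exceeds a preset threshold;
    calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model.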
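  16. The computer-readable storage medium according to claim 15, characterized in that the model generation program is further executable by the processor to implement, after the step of obtaining the influenza-like illness (ILI) percentage data sequence of the target region for the plurality of consecutive time units preceding the target time unit, the following steps:
    detecting whether the ILI percentage data sequence is a stationary sequence;
    if so, performing the step of obtaining the preset period determined according to the periodicity exhibited by the ILI percentage data;
    if not, converting the ILI percentage data sequence into a stationary sequence by differencing.
  17. The computer-readable storage medium according to claim 16, characterized in that the step of detecting whether the ILI percentage data sequence is a stationary sequence comprises:
    performing a unit root test on the ILI percentage data to detect whether the ILI percentage data sequence is a stationary sequence, wherein the sequence is judged to be non-stationary if a unit root is detected in the sequence, and stationary otherwise.
  18. The computer-readable storage medium according to claim 16, characterized in that the step of obtaining the preset period comprises:
    determining the preset period according to the periodicity exhibited by the ILI percentage data.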
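  19. The computer-readable storage medium according to claim 16, characterized in that the step of calculating model parameters from the influenza-like illness (ILI) percentage sequence for which the prediction period has been determined, and building an autoregressive integrated moving average (ARIMA) model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.
  20. The computer-readable storage medium according to claim 17, characterized in that the step of calculating model parameters from the ILI percentage sequence for which the prediction period has been determined, and building an ARIMA model from the model parameters and the prediction period as the prediction model, comprises:
    calculating the autocorrelation coefficients and partial autocorrelation coefficients of the stationary ILI percentage data sequence for which the prediction period has been determined, and drawing an autocorrelation plot and a partial autocorrelation plot;
    judging, from the autocorrelation plot and the partial autocorrelation plot, whether the calculated partial autocorrelation coefficients and autocorrelation coefficients tail off or cut off, and selecting an ARIMA model according to the judgment to fit the stationary ILI percentage data sequence, so as to obtain the prediction model.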
PCT/CN2018/107488 2018-07-13 2018-09-26 Method and apparatus for generating a prediction model, and computer-readable storage medium WO2020010710A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810768332.2A CN109243619B (zh) 2018-07-13 2018-07-13 Method and apparatus for generating a prediction model, and computer-readable storage medium
CN201810768332.2 2018-07-13

Publications (1)

Publication Number Publication Date
WO2020010710A1 true WO2020010710A1 (zh) 2020-01-16

Family

ID=65072559

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107488 WO2020010710A1 (zh) 2018-07-13 2018-09-26 Method and apparatus for generating a prediction model, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN109243619B (zh)
WO (1) WO2020010710A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
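CN112259239A (zh) * 2020-10-21 2021-01-22 平安科技(深圳)有限公司 Parameter processing method and apparatus, electronic device, and storage medium
CN113380423A (zh) * 2021-05-24 2021-09-10 首都医科大学 Epidemic scale prediction method and apparatus, electronic device, and storage medium
CN113706269A (zh) * 2021-09-13 2021-11-26 华润电力技术研究院有限公司 Frequency regulation service quotation method and frequency regulation service quotation apparatus
CN117147807A (zh) * 2023-11-01 2023-12-01 中海(天津)能源科技有限公司 Oil quality monitoring system and method for petroleum exploration
CN117457096A (zh) * 2023-12-26 2024-01-26 山东省海洋资源与环境研究院(山东省海洋环境监测中心、山东省水产品质量检验中心) System for dynamically monitoring and adjusting dissolved carbon dioxide in a simulated ocean acidification apparatus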

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
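CN109916361B (zh) * 2019-03-04 2020-12-29 中国计量科学研究院 Roundness measurement signal processing method requiring no angular position information
CN113035368A (zh) * 2021-04-13 2021-06-25 桂林电子科技大学 Disease transmission prediction method based on a differential migration graph neural network
CN113537631B (zh) * 2021-08-04 2023-11-10 北方健康医疗大数据科技有限公司 Drug demand prediction method and apparatus, electronic device, and storage medium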

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
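CN107145714A (zh) * 2017-04-07 2017-09-08 浙江大学城市学院 Multi-factor-based method for predicting public bicycle usage
CN107633254A (zh) * 2017-07-25 2018-01-26 平安科技(深圳)有限公司 Apparatus and method for building a prediction model, and computer-readable storage medium
CN108268967A (zh) * 2017-01-04 2018-07-10 北京京东尚科信息技术有限公司 Method and system for call traffic prediction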

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688872A (zh) * 2017-08-20 2018-02-13 平安科技(深圳)有限公司 Prediction model building apparatus and method, and computer-readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
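CN108268967A (zh) * 2017-01-04 2018-07-10 北京京东尚科信息技术有限公司 Method and system for call traffic prediction
CN107145714A (zh) * 2017-04-07 2017-09-08 浙江大学城市学院 Multi-factor-based method for predicting public bicycle usage
CN107633254A (zh) * 2017-07-25 2018-01-26 平安科技(深圳)有限公司 Apparatus and method for building a prediction model, and computer-readable storage medium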

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
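CN112259239A (zh) * 2020-10-21 2021-01-22 平安科技(深圳)有限公司 Parameter processing method and apparatus, electronic device, and storage medium
CN112259239B (zh) * 2020-10-21 2023-07-11 平安科技(深圳)有限公司 Parameter processing method and apparatus, electronic device, and storage medium
CN113380423A (zh) * 2021-05-24 2021-09-10 首都医科大学 Epidemic scale prediction method and apparatus, electronic device, and storage medium
CN113706269A (zh) * 2021-09-13 2021-11-26 华润电力技术研究院有限公司 Frequency regulation service quotation method and frequency regulation service quotation apparatus
CN117147807A (zh) * 2023-11-01 2023-12-01 中海(天津)能源科技有限公司 Oil quality monitoring system and method for petroleum exploration
CN117147807B (zh) * 2023-11-01 2024-01-26 中海(天津)能源科技有限公司 Oil quality monitoring system and method for petroleum exploration
CN117457096A (zh) * 2023-12-26 2024-01-26 山东省海洋资源与环境研究院(山东省海洋环境监测中心、山东省水产品质量检验中心) System for dynamically monitoring and adjusting dissolved carbon dioxide in a simulated ocean acidification apparatus
CN117457096B (zh) * 2023-12-26 2024-03-22 山东省海洋资源与环境研究院(山东省海洋环境监测中心、山东省水产品质量检验中心) System for dynamically monitoring and adjusting dissolved carbon dioxide in a simulated ocean acidification apparatus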

Also Published As

Publication number Publication date
CN109243619B (zh) 2023-03-31
CN109243619A (zh) 2019-01-18

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17/05/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18925723

Country of ref document: EP

Kind code of ref document: A1