WO2021151304A1 - Method and apparatus for hysteretic processing of time series data, electronic device, and storage medium

Info

Publication number
WO2021151304A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
time series
time
differential
curve
Prior art date
Application number
PCT/CN2020/119091
Other languages
French (fr)
Chinese (zh)
Inventor
阮晓雯
邓攀
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/17 Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • This application relates to the field of data processing technology, and in particular to a method, device, electronic equipment, and computer-readable storage medium for processing time series data with hysteresis.
  • Lag refers to the delay between one phenomenon and another closely related phenomenon. It is usually caused by the influence of the variable itself or of other variables.
  • The lag degree measures how large this delay is.
  • Lag assessment refers to using a time series analysis model to address the lag problem. For example, using information related to an influenza epidemic to predict next week's influenza intensity produces lagging predictions. Because influenza viruses are highly variable and uncertain, influenza prevention and control faces many challenges, and lag assessment of influenza epidemics has become a key strategy for influenza prevention and control.
  • The inventor realized that, as the acquired information samples grow larger, a traditional time series analysis model that predicts the next period from historical information extracted from those samples exhibits long-term lag, which increases the complexity of the calculation process and reduces the accuracy of the assessment.
  • a method for processing time series data hysteresis includes:
  • the standard time series analysis model is used to perform data analysis on the time series data of the preset event input by the user to obtain the development trend of the preset event.
  • The present application also provides an apparatus for processing time series data hysteresis, the apparatus including:
  • a differential data generation module used to obtain time series sample data, and perform time series differential processing on the time series sample data to obtain a differential data set;
  • the true curve generation module is used to perform curve fitting on the difference data set to obtain a true curve
  • a prediction curve generation module, used to analyze the time series sample data through multiple pre-built time series analysis models to obtain multiple time series analysis data, perform time series differential processing on the multiple time series analysis data to obtain multiple differential analysis data sets, and perform curve fitting on the multiple differential analysis data sets to obtain multiple prediction curves;
  • a loss value calculation module configured to use a loss function to calculate a plurality of loss values between the predicted curve and the true curve
  • a hysteresis calculation module configured to calculate the hysteresis factor of each prediction curve based on the loss value, and determine the hysteresis of each prediction curve according to the hysteresis factor;
  • the standard time series analysis model screening module selects standard time series analysis models from the plurality of time series analysis models;
  • the data analysis module is configured to use the standard time series analysis model to perform data analysis on the time series data of the preset event input by the user to obtain the development trend of the preset event.
  • This application also provides an electronic device, which includes:
  • Memory storing at least one instruction
  • the processor executes the instructions stored in the memory to implement the following steps:
  • the standard time series analysis model is used to perform data analysis on the time series data of the preset event input by the user to obtain the development trend of the preset event.
  • This application also provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:
  • the standard time series analysis model is used to perform data analysis on the time series data of the preset event input by the user to obtain the development trend of the preset event.
  • FIG. 1 is a schematic flowchart of a method for processing time-series data hysteresis according to an embodiment of the application
  • FIG. 2 is a schematic flowchart of one of the steps in the method for processing time series data hysteresis according to an embodiment of the application;
  • FIG. 3 is a schematic diagram of modules of a time series data hysteresis processing device provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of the internal structure of an electronic device for implementing a method for processing time series data hysteresis according to an embodiment of the application;
  • the execution subject of the time-series data hysteresis processing method provided by the embodiment of the present application includes but is not limited to at least one of the electronic devices that can be configured to execute the method provided by the embodiment of the present application, such as a server and a terminal.
  • the time-series data hysteresis processing method can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, etc.
  • Referring to FIG. 1, it is a schematic flowchart of a method for processing time series data hysteresis according to an embodiment of this application.
  • the method for processing time series data hysteresis includes:
  • The time series sample data includes any data with time series characteristics, such as data on how epidemic intensity changes over time or how an object's temperature changes over time.
  • A Java statement with a data-calling function may be used to obtain the time series sample data from the database that stores it.
  • the method further includes:
  • if the time series sample data is stationary, time series difference processing is not performed on the time series sample data;
  • if the time series sample data is not stationary, time series difference processing is performed on the time series sample data.
  • The embodiment of the present application can determine whether the time series sample data is stationary by detecting whether the time series sample data has a unit root.
  • A unit root is a root of modulus 1 of the characteristic equation of the time series sample data; for example, 1, -1, i, and -i all have modulus 1 (they are the fourth roots of unity).
  • When the time series sample data has no unit root, it is stationary; when it has a unit root, it is not stationary.
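  • The source does not name a specific unit-root test. As a minimal sketch, assuming the simplest Dickey-Fuller regression (no constant, no trend, no lagged differences) and the commonly tabulated 5% critical value of about -1.95 for that case, the stationarity check could look like the following. The function names `dickey_fuller_stat` and `is_stationary` are hypothetical.

```python
import math

def dickey_fuller_stat(series):
    """t-statistic of rho in the regression dy_t = rho * y_{t-1} + e_t."""
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(v * v for v in y_lag)
    rho = sum(x * d for x, d in zip(y_lag, dy)) / sxx      # OLS slope
    residuals = [d - rho * x for x, d in zip(y_lag, dy)]
    sigma2 = sum(r * r for r in residuals) / (len(dy) - 1)  # residual variance
    return rho / math.sqrt(sigma2 / sxx)                    # rho / standard error

def is_stationary(series, critical_value=-1.95):
    """Reject the unit-root hypothesis when the statistic is below the critical value."""
    return dickey_fuller_stat(series) < critical_value

# Assumed illustrative inputs: a steadily trending series and a bounded oscillation.
trending = [float(i) for i in range(1, 51)]
oscillating = [math.sin(2.5 * i) for i in range(100)]
needs_differencing = not is_stationary(trending)
```

  • A production system would more likely use an augmented test with lag selection from a statistics library; the sketch only shows where the stationarity decision fits before differencing.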
  • The embodiment of the present application applies a first-order difference to the time series sample data, where the first-order difference formula is as follows:
  • $\Delta y_x = y_{x+1} - y_x$
  • where $\Delta y_x$ is the first-order difference of the data on object temperature change over time, $y_{x+1}$ is the data at time $x+1$, and $y_x$ is the data at time $x$.
  • performing time-series difference processing on the time-series sample data can filter out unstable factors in the time-series sample data.
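  • As a small illustration of the first-order difference, the sketch below applies it to the object-temperature samples used as an example in this description. The function name `first_order_difference` and the choice to label each difference with the earlier time stamp are assumptions made for illustration.

```python
def first_order_difference(samples):
    """samples: list of (time, value) pairs sorted by time.

    Returns (time, delta) pairs where delta at time x is y_{x+1} - y_x.
    """
    return [(t0, y1 - y0) for (t0, y0), (_, y1) in zip(samples, samples[1:])]

samples = [(1, 2), (2, 4), (3, 5), (4, 8)]
diffs = first_order_difference(samples)
# diffs == [(1, 2), (2, 1), (3, 3)]
```

  • The differenced series is one element shorter than the input, which matters later when curves of different lengths are compared.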
  • the detailed implementation process of S2 includes:
  • the true curve is a stable curve, which can represent the curve of the difference data with time.
  • the time sequence feature refers to the time sequence of the differential data; the coordinate encoding of the differential data in the differential data set according to the time sequence feature includes the chronological sequence of the differential data.
  • the difference data is expressed in a preset coordinate system.
  • the time sequence analysis model is:
  • $S_t$ is the difference data at time $t$ in the difference data set
  • $S_{t-1}$ is the difference data at time $t-1$ in the difference data set
  • $X_t$ is the smoothed value at time $t$
  • $X_{t-1}$ is the smoothed value at time $t-1$
  • $Y_t$ is the trend value at time $t$
  • $Y_{t-1}$ is the trend value at time $t-1$
  • $S_{t+1}$ is the differential analysis data at time $t+1$ in the differential analysis data set
  • the two smoothing coefficients are preset and distinct.
  • The embodiment of the present application obtains multiple time series analysis models by setting different values for the two smoothing coefficients.
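  • The model formula itself is not reproduced in this text. The quantities it defines (a smoothed value, a trend value, two smoothing coefficients, and a value for the next time step) resemble Holt's double exponential smoothing, so the sketch below assumes that standard form; it is not confirmed to be the exact model of the application. The names `holt_one_step`, `alpha`, `beta`, and the coefficient grid are hypothetical.

```python
def holt_one_step(values, alpha, beta):
    """One-step-ahead forecasts under the (assumed) standard Holt recursion.

    level_t = alpha * s_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast for t+1 = level_t + trend_t
    """
    level, trend = values[0], values[1] - values[0]  # simple initialisation
    forecasts = []
    for s in values[1:]:
        prev_level = level
        level = alpha * s + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        forecasts.append(level + trend)
    return forecasts

# Each (alpha, beta) pair plays the role of one "time series analysis model".
configs = [(0.2, 0.1), (0.5, 0.3), (0.8, 0.5)]
values = [2, 4, 5, 8]
predictions = {cfg: holt_one_step(values, *cfg) for cfg in configs}
```

  • On a perfectly linear input every configuration reproduces the line, so differences between configurations only appear when the data changes slope, which is where lag becomes visible.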
  • The embodiment of the present application uses the multiple time series analysis models to analyze the time series sample data and obtain multiple time series analysis data. For example, take data on object temperature over time as the time series sample data: [(1,2), (2,4), (3,5), (4,8)], where the first number in each pair is the time identifier, indicating the time of the sample, and the second number is the data identifier, indicating the value of the object temperature at that time. Analyzing this data with a time series analysis model yields time series analysis data such as [(5,9), (6,11), (7,14), (8,16)].
  • Because the time series analysis data may contain unstable factors, the embodiment of the present application further performs time series differential processing on the time series analysis data to filter out those factors.
  • the embodiment of the present application performs the time series differential processing as described in S1 on the multiple time series analysis data to obtain multiple differential analysis data sets.
  • curve fitting is performed on multiple differential analysis data sets to obtain multiple prediction curves.
  • the method of performing curve fitting on the multiple differential analysis data sets is the same as the method of performing curve fitting on the differential data set in S2, and will not be repeated here.
  • the embodiment of the present application uses the following formula to calculate the initial loss value between the multiple predicted curves and the true curve:
  • It represents the predicted value of the prediction curve
  • represents the number of the time series analysis models
  • Y represents the true value of the true curve under different translation degrees
  • represents the error factor
  • calculating the root mean square error of the initial loss value according to the initial loss value to obtain the loss value includes:
  • the lag factor ⁇ m is calculated using the following algorithm:
  • arg min refers to the set of all independent variables that make the function obtain its minimum value
  • It is an analytical value
  • is a loss value
  • i is the starting value of the prediction curve
  • n is the end value of the prediction curve.
  • the embodiment of the present application determines the hysteresis of each prediction curve according to the hysteresis factor, including:
  • ⁇ m is the lag factor of the prediction curve; It is the smallest root mean square error after ⁇ m translation.
  • the lag factors of multiple prediction curves are substituted into the above algorithm to obtain the lag of multiple prediction curves.
  • the lag indicates the degree of lag in comparison between the prediction curve and the real curve.
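  • The loss and lag formulas are not reproduced in this text, so the following is only a sketch of the general idea described here: translate the prediction curve by different amounts, compute the root mean square error against the true curve for each translation, and take the translation that minimises it. Restricting translations to non-negative integer steps and the names `rmse` and `estimate_lag` are assumptions.

```python
import math

def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def estimate_lag(predicted, true, max_shift):
    """Return (best_shift, best_rmse).

    A shift of k compares predicted[i + k] with true[i], i.e. it asks whether
    the prediction reproduces the true curve k steps late.
    """
    best_shift, best_rmse = 0, float("inf")
    for k in range(max_shift + 1):
        overlap = len(true) - k
        if overlap < 2:
            break
        err = rmse(predicted[k:k + overlap], true[:overlap])
        if err < best_rmse:
            best_shift, best_rmse = k, err
    return best_shift, best_rmse

true_curve = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
lagging_prediction = [0, 0] + true_curve[:-2]   # true curve delayed by 2 steps
lag, err = estimate_lag(lagging_prediction, true_curve, max_shift=4)
# lag == 2, err == 0.0
```

  • Returning the minimal error alongside the shift keeps both quantities available, since the description uses the smallest root mean square error after translation together with the lag factor.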
  • a standard time series analysis model is selected from the plurality of time series analysis models.
  • the selection of a standard time series analysis model from the multiple time series analysis models according to the hysteresis of each of the prediction curves includes:
  • the multiple time series analysis models are screened according to the hysteresis, and a standard time series analysis model is obtained.
  • Models whose hysteresis is less than a preset threshold are selected from the multiple time series analysis models to obtain the standard time series analysis model.
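  • The threshold-based screening can be sketched as a simple filter. The model names, lag values, and threshold below are made up for illustration, and ordering the survivors by ascending lag is an added assumption rather than something stated in the application.

```python
def select_standard_models(lags, threshold):
    """lags: dict mapping a model identifier to its measured lag.

    Returns the identifiers whose lag is below the threshold, smallest lag first.
    """
    kept = [name for name, lag in lags.items() if lag < threshold]
    return sorted(kept, key=lags.get)

model_lags = {"model_a": 3, "model_b": 1, "model_c": 0}
standard_models = select_standard_models(model_lags, threshold=2)
# standard_models == ["model_c", "model_b"]
```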
  • the standard time series analysis model is a model selected after comparing the lag and error degree between different time series analysis models, which can solve the long-term lag problem in time series forecasting.
  • The lag measures the degree of lag. The smaller the lag of a time series analysis model, the more accurate the data processed by the resulting standard time series analysis model and the fewer its lag problems.
  • influenza viruses are highly variable and uncertain. Correctly predicting the peak of influenza epidemics is critical to preventing and controlling influenza.
  • the selected standard time series analysis model can solve the problems of lagging forecast results and large forecast errors when forecasting influenza peaks, and improve the accuracy of influenza forecasting.
  • the time sequence to be processed is analyzed according to the standard time sequence analysis model obtained through screening to obtain the development trend of the preset event.
  • the time series data of the preset event may be stored in the blockchain.
  • Influenza is a serious global public health problem.
  • the flu information obtained will not suffer from lag or large errors.
  • Referring to FIG. 3, it is a schematic diagram of the modules of the time series data hysteresis processing device of the present application.
  • the time series data hysteresis processing apparatus 100 described in this application can be installed in an electronic device.
  • The time series data hysteresis processing device 100 may include a differential data generation module 101, a true curve generation module 102, a prediction curve generation module 103, a loss value calculation module 104, a hysteresis calculation module 105, a standard time series analysis model screening module 106, and a data analysis module 107.
  • A module described in this application can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device to complete fixed functions, and that are stored in the memory of the electronic device.
  • The functions of each module/unit are as follows:
  • the differential data generating module 101 is configured to obtain time series sample data, and perform time series differential processing on the time series sample data to obtain a differential data set;
  • the true curve generation module 102 is configured to perform curve fitting on the difference data set to obtain a true curve
  • the prediction curve generation module 103 is configured to analyze the time series sample data through multiple pre-built time series analysis models to obtain multiple time series analysis data, perform time series difference processing on the multiple time series analysis data to obtain multiple differential analysis data sets, and perform curve fitting on the multiple differential analysis data sets to obtain multiple prediction curves;
  • the loss value calculation module 104 is configured to use a loss function to calculate a plurality of loss values between the predicted curve and the true curve;
  • the hysteresis calculation module 105 is configured to calculate the hysteresis factor of each prediction curve based on the loss value, and determine the hysteresis of each prediction curve according to the hysteresis factor;
  • the standard time series analysis model screening module 106 selects a standard time series analysis model from the plurality of time series analysis models according to the hysteresis of each of the prediction curves;
  • the data analysis module 107 is configured to use the standard time series analysis model to perform data analysis on the time series data of the preset event input by the user to obtain the development trend of the preset event.
  • The operations performed by each module of the time series data hysteresis processing device 100 are as follows:
  • Step 1: The differential data generating module 101 obtains time series sample data, and performs time series differential processing on the time series sample data to obtain a differential data set.
  • the time series sample data includes any data with time series characteristics, such as data of epidemic intensity changes over time, data of object temperature changes over time, and so on.
  • A Java statement with a data-calling function may be used to obtain the time series sample data from the database that stores it.
  • the difference data generation module 101 further executes:
  • if the time series sample data is stationary, time series difference processing is not performed on the time series sample data;
  • if the time series sample data is not stationary, time series difference processing is performed on the time series sample data.
  • The differential data generating module 101 of the embodiment of the present application can determine whether the time series sample data is stationary by detecting whether the time series sample data has a unit root.
  • A unit root is a root of modulus 1 of the characteristic equation of the time series sample data; for example, 1, -1, i, and -i all have modulus 1 (they are the fourth roots of unity).
  • When the time series sample data has no unit root, it is stationary; when it has a unit root, it is not stationary.
  • The differential data generation module 101 of the embodiment of the present application applies a first-order difference to the time series sample data, where the first-order difference formula is as follows:
  • $\Delta y_x = y_{x+1} - y_x$
  • where $\Delta y_x$ is the first-order difference of the data on object temperature change over time, $y_{x+1}$ is the data at time $x+1$, and $y_x$ is the data at time $x$.
  • performing time-series difference processing on the time-series sample data can filter out unstable factors in the time-series sample data.
  • Step 2: The true curve generation module 102 performs curve fitting on the difference data set to obtain a true curve.
  • the real curve generating module 102 is used to:
  • the difference data in the coordinate system is connected according to the time sequence feature to obtain the true curve.
  • the true curve is a stable curve, which can represent the curve of the difference data with time.
  • the time sequence feature refers to the time sequence of the differential data; the coordinate encoding of the differential data in the differential data set according to the time sequence feature includes the chronological sequence of the differential data.
  • the difference data is expressed in a preset coordinate system.
  • Step 3: The prediction curve generation module 103 analyzes the time series sample data through multiple pre-built time series analysis models to obtain multiple time series analysis data, performs time series difference processing on the multiple time series analysis data to obtain multiple differential analysis data sets, and performs curve fitting on the multiple differential analysis data sets to obtain multiple prediction curves.
  • the time sequence analysis model is:
  • $S_t$ is the difference data at time $t$ in the difference data set
  • $S_{t-1}$ is the difference data at time $t-1$ in the difference data set
  • $X_t$ is the smoothed value at time $t$
  • $X_{t-1}$ is the smoothed value at time $t-1$
  • $Y_t$ is the trend value at time $t$
  • $Y_{t-1}$ is the trend value at time $t-1$
  • $S_{t+1}$ is the differential analysis data at time $t+1$ in the differential analysis data set
  • the two smoothing coefficients are preset and distinct.
  • The prediction curve generation module 103 described in the embodiment of the present application obtains multiple time series analysis models by setting different values for the two smoothing coefficients.
  • The prediction curve generation module 103 of the embodiment of the present application uses the multiple time series analysis models to analyze the time series sample data and obtain multiple time series analysis data. For example, take data on object temperature over time as the time series sample data: [(1,2), (2,4), (3,5), (4,8)], where the first number in each pair is the time identifier, indicating the time of the sample, and the second number is the data identifier, indicating the value of the object temperature at that time. Analyzing this data with a time series analysis model yields time series analysis data such as [(5,9), (6,11), (7,14), (8,16)].
  • The prediction curve generation module 103 described in this embodiment of the present application further performs time series difference processing on the time series analysis data to filter out the unstable factors in the time series analysis data.
  • the embodiment of the present application performs the time-series differential processing performed by the differential data generating module 101 on the multiple time-series analysis data to obtain multiple differential analysis data sets.
  • curve fitting is performed on multiple differential analysis data sets to obtain multiple prediction curves.
  • the method of performing curve fitting on multiple differential analysis data sets is the same as the method of performing curve fitting on differential data sets executed by the real curve generating module 102 described above, and will not be repeated here.
  • Step 4: The loss value calculation module 104 uses a loss function to calculate loss values between multiple predicted curves and the true curve.
  • the loss value calculation module 104 in the embodiment of the present application uses the following formula to calculate the initial loss value between the multiple predicted curves and the true curve:
  • It represents the predicted value of the prediction curve
  • represents the number of the time series analysis models
  • Y represents the true value of the true curve under different translation degrees
  • represents the error factor
  • the loss value calculation module 104 calculates the root mean square error of the initial loss value according to the initial loss value to obtain the loss value, including:
  • Step 5: The hysteresis calculation module 105 calculates the hysteresis factors of multiple prediction curves based on the loss value, and determines the hysteresis of the multiple prediction curves according to the hysteresis factors.
  • the hysteresis calculation module 105 uses the following algorithm to calculate the hysteresis factor ⁇ m :
  • arg min is the set of all arguments that make the function obtain its minimum value
  • It is an analytical value
  • is a loss value
  • i is the starting value of the prediction curve
  • n is the end value of the prediction curve.
  • the hysteresis calculation module 105 of the embodiment of the present application determines the hysteresis of each prediction curve according to the hysteresis factor, including:
  • ⁇ m is the lag factor of the prediction curve; It is the smallest root mean square error after ⁇ m translation.
  • the hysteresis calculation module 105 described in the embodiment of the present application substitutes the hysteresis factors of multiple prediction curves into the above algorithm to obtain the hysteresis of the multiple prediction curves.
  • the hysteresis represents the degree to which the prediction curve lags the true curve.
  • Step 6: The standard time series analysis model screening module 106 selects a standard time series analysis model from the plurality of time series analysis models according to the hysteresis of each of the prediction curves.
  • the standard time series analysis model screening module 106 selects standard time series analysis models from the multiple time series analysis models through the following operations:
  • the multiple time series analysis models are screened according to the hysteresis, and a standard time series analysis model is obtained.
  • models with hysteresis less than a preset threshold among multiple time series analysis models are screened out to obtain a standard time series analysis model.
  • the standard time series analysis model is a model selected after comparing the lag and error degree between different time series analysis models, which can solve the long-term lag problem in time series forecasting.
  • The lag measures the degree of lag. The smaller the lag of a time series analysis model, the more accurate the data processed by the resulting standard time series analysis model and the fewer its lag problems.
  • influenza viruses are highly variable and uncertain. Correctly predicting the peak of influenza epidemics is critical to preventing and controlling influenza.
  • the selected standard time series analysis model can solve the problem of lagging forecast results and large forecast errors when forecasting influenza peaks, and improve the accuracy of influenza forecasting.
  • Step 7: The data analysis module 107 uses the standard time series analysis model to perform data analysis on the time series data of the preset event input by the user to obtain the development trend of the preset event.
  • the data analysis module 107 analyzes the time sequence to be processed according to the standard time sequence analysis model obtained by screening, and obtains the development trend of the preset event.
  • the time series data of the preset event may be stored in the blockchain.
  • Influenza is a serious global public health problem.
  • the flu information obtained will not suffer from lag or large errors.
  • Referring to FIG. 4, it is a schematic structural diagram of an electronic device implementing the method for processing time series data hysteresis according to the present application.
  • the electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and running on the processor 10, such as a time series data hysteresis processing program 12.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (such as SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, for example, a mobile hard disk of the electronic device 1.
  • the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the electronic device 1, such as the code of the time series data hysteresis processing program 12, etc., but also to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits in some embodiments, for example, a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more combinations of central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips.
  • the processor 10 is the control unit of the electronic device; it uses various interfaces and lines to connect the components of the entire electronic device, runs or executes programs or modules stored in the memory 11 (for example, the time series data hysteresis processing program), and calls the data stored in the memory 11 to execute the various functions of the electronic device 1 and process data.
  • the bus may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection and communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 4 only shows an electronic device with components. Those skilled in the art will understand that the structure shown in FIG. 4 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a power source (such as a battery) for supplying power to various components.
  • the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • the power supply may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
  • the electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface.
  • the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may also include a user interface.
  • the user interface may be a display and an input unit (such as a keyboard).
  • the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • the time series data hysteresis processing program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run by the processor 10, can implement:
  • the standard time series analysis model is used to perform data analysis on the time series data of the preset event input by the user to obtain the development trend of the preset event.
  • the integrated module/unit of the electronic device 1 can be stored in a computer-readable storage medium, which can be non-volatile or volatile.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
  • the computer-usable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, etc.; the data storage area may store data created according to the use of a blockchain node, etc.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

Abstract

A method and an apparatus for hysteretic processing of time series data, an electronic device, and a storage medium, relating to data processing technology. The method comprises: performing time series differencing and curve fitting on time series sample data to obtain a real curve; using a plurality of pre-constructed time series analysis models to analyze the time series sample data, and performing time series differencing and curve fitting on the results to generate a plurality of predicted curves; calculating degrees of hysteresis of the predicted curves on the basis of loss values between the predicted curves and the real curve; selecting a standard time series analysis model from the plurality of time series analysis models on the basis of the degrees of hysteresis; and using the standard time series analysis model to analyze time series data of a preset event to obtain a development trend of the preset event. The method can improve prediction accuracy in time series analysis.

Description

Time series data hysteresis processing method, apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202010656657.9, filed with the Chinese Patent Office on July 9, 2020 and entitled "Time series data hysteresis processing method, apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of data processing technology, and in particular to a time series data hysteresis processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Lag refers to the delay that arises between one phenomenon and another closely related phenomenon, and is usually caused by the phenomenon itself or by other variables. The degree of lag measures how large the lag is, and lag evaluation refers to using a time series analysis model to address the lag problem. For example, a method that uses influenza epidemic information to predict next week's influenza intensity will exhibit lag. Because influenza viruses are highly variable and uncertain, influenza prevention and control faces many challenges, and lag evaluation for influenza epidemics has become a key strategy for preventing and controlling influenza.
The inventors realized that, as the acquired information samples grow larger, a traditional time series analysis model exhibits long-term lag when it predicts the next period's information from the historical information extracted from those samples, which complicates the calculation and lowers the evaluation accuracy.
Summary of the invention
This application provides a time series data hysteresis processing method, comprising:
acquiring time series sample data, and performing time series differencing on the time series sample data to obtain a differential data set;
performing curve fitting on the differential data set to obtain a real curve;
analyzing the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of time series analysis data, performing time series differencing on the plurality of time series analysis data to obtain a plurality of differential analysis data sets, and performing curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
calculating, with a loss function, loss values between the plurality of prediction curves and the real curve;
calculating a lag factor of each prediction curve based on the loss values, and determining a degree of lag of each prediction curve according to the lag factor;
selecting a standard time series analysis model from the plurality of time series analysis models according to the degree of lag of each prediction curve; and
using the standard time series analysis model to analyze time series data of a preset event input by a user, to obtain a development trend of the preset event.
This application also provides a time series data hysteresis processing apparatus, the apparatus comprising:
a differential data generation module, configured to acquire time series sample data and perform time series differencing on the time series sample data to obtain a differential data set;
a real curve generation module, configured to perform curve fitting on the differential data set to obtain a real curve;
a prediction curve generation module, configured to analyze the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of time series analysis data, perform time series differencing on the plurality of time series analysis data to obtain a plurality of differential analysis data sets, and perform curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
a loss value calculation module, configured to calculate, with a loss function, loss values between the plurality of prediction curves and the real curve;
a lag calculation module, configured to calculate a lag factor of each prediction curve based on the loss values, and determine a degree of lag of each prediction curve according to the lag factor;
a standard time series analysis model screening module, configured to select a standard time series analysis model from the plurality of time series analysis models according to the degree of lag of each prediction curve; and
a data analysis module, configured to use the standard time series analysis model to analyze time series data of a preset event input by a user, to obtain a development trend of the preset event.
This application also provides an electronic device, the electronic device comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the following steps:
acquiring time series sample data, and performing time series differencing on the time series sample data to obtain a differential data set;
performing curve fitting on the differential data set to obtain a real curve;
analyzing the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of time series analysis data, performing time series differencing on the plurality of time series analysis data to obtain a plurality of differential analysis data sets, and performing curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
calculating, with a loss function, loss values between the plurality of prediction curves and the real curve;
calculating a lag factor of each prediction curve based on the loss values, and determining a degree of lag of each prediction curve according to the lag factor;
selecting a standard time series analysis model from the plurality of time series analysis models according to the degree of lag of each prediction curve; and
using the standard time series analysis model to analyze time series data of a preset event input by a user, to obtain a development trend of the preset event.
This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
acquiring time series sample data, and performing time series differencing on the time series sample data to obtain a differential data set;
performing curve fitting on the differential data set to obtain a real curve;
analyzing the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of time series analysis data, performing time series differencing on the plurality of time series analysis data to obtain a plurality of differential analysis data sets, and performing curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
calculating, with a loss function, loss values between the plurality of prediction curves and the real curve;
calculating a lag factor of each prediction curve based on the loss values, and determining a degree of lag of each prediction curve according to the lag factor;
selecting a standard time series analysis model from the plurality of time series analysis models according to the degree of lag of each prediction curve; and
using the standard time series analysis model to analyze time series data of a preset event input by a user, to obtain a development trend of the preset event.
Description of the drawings
FIG. 1 is a schematic flowchart of a time series data hysteresis processing method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of one of the steps of the time series data hysteresis processing method provided by an embodiment of this application;
FIG. 3 is a schematic module diagram of a time series data hysteresis processing apparatus provided by an embodiment of this application;
FIG. 4 is a schematic diagram of the internal structure of an electronic device that implements the time series data hysteresis processing method provided by an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described below in conjunction with the embodiments and with reference to the drawings.
Detailed description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The time series data hysteresis processing method provided by the embodiments of this application may be executed by, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method. In other words, the method may be executed by software or hardware installed on a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to a single server, a server cluster, a cloud server, or a cloud server cluster.
FIG. 1 is a schematic flowchart of the time series data hysteresis processing method provided by an embodiment of this application.
In this embodiment, the time series data hysteresis processing method includes:
S1. Acquire time series sample data, and perform time series differencing on the time series sample data to obtain a differential data set.
In the embodiments of this application, the time series sample data (also called time-sequence sample data) includes any data with time series characteristics, such as data on how epidemic intensity changes over time or how the temperature of an object changes over time. The time series sample data may be obtained from the database that stores it by using a Java statement with a data calling function.
Preferably, before the time series differencing is performed on the time series sample data, the method further includes:
judging whether the time series sample data is stationary;
when the time series sample data is stationary, not performing the time series differencing on the time series sample data;
when the time series sample data is not stationary, performing the time series differencing on the time series sample data.
Preferably, the embodiments of this application judge whether the time series sample data is stationary by detecting whether a unit root exists in the time series sample data. Here, a unit root is a root of modulus 1 associated with the time series sample data; for example, 1, -1, i, and -i (the fourth roots of unity) all have modulus 1.
When no unit root exists in the time series sample data, the data is stationary; when a unit root exists, the data is not stationary.
Preferably, the embodiments of this application use a first-order difference to difference the time series sample data, where the first-order difference formula is:
$\Delta y_x = y_{x+1} - y_x$
where $\Delta y_x$ is the first-order difference of the time series sample data (for example, the data on object temperature over time), $y_{x+1}$ is the data at time x+1, and $y_x$ is the data at time x.
In the embodiments of this application, performing time series differencing on the time series sample data filters out the non-stationary factors in the data.
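The first-order difference above can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function name `first_order_difference` and the representation of the data as `(time, value)` pairs are assumptions made for the example.

```python
def first_order_difference(series):
    """Return (x, y[x+1] - y[x]) for each pair of consecutive samples.

    `series` is a list of (time, value) pairs; it is sorted by time first,
    so the input order does not matter (an assumption of this sketch).
    """
    ordered = sorted(series)
    return [(t0, y1 - y0) for (t0, y0), (_, y1) in zip(ordered, ordered[1:])]


# Hypothetical temperature readings as (time, value) pairs.
samples = [(1, 2), (2, 4), (3, 5), (4, 8)]
print(first_order_difference(samples))  # [(1, 2), (2, 1), (3, 3)]
```

Each output pair keeps the earlier time index x, which matches the indexing of the difference formula. The differenced series is one element shorter than the input.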
S2. Perform curve fitting on the differential data set to obtain a real curve.
Specifically, referring to FIG. 2, the detailed implementation of S2 includes:
S20. Extract the time series feature of the differential data in the differential data set;
S21. Use the time series feature to coordinate-encode the differential data in the differential data set;
S22. Use the coordinate encoding to map the differential data in the differential data set into a pre-built coordinate system;
S23. Connect the differential data in the coordinate system according to the time series feature to obtain the real curve. The real curve is a stable curve that represents how the differential data changes over time.
In the embodiments of this application, the time series feature refers to the chronological order of the differential data; coordinate-encoding the differential data according to the time series feature includes representing the differential data in a preset coordinate system in chronological order.
For example, a coordinate system is established with the times t and t+1 on the horizontal axis and $L_t$ and $L_{t+1}$ on the vertical axis; the differential data in the differential data set is coordinate-encoded and mapped into this coordinate system, and the differential data in the coordinate system is then connected according to the time series feature to obtain the real curve.
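Steps S20 to S23 can be illustrated as follows. The sketch assumes that "connecting" the points means joining consecutive points with straight segments (piecewise-linear interpolation); the description does not specify the interpolation, so this choice, the function name `build_curve`, and the requirement that time values be distinct are all assumptions.

```python
from bisect import bisect_right


def build_curve(diff_data):
    """Order (time, value) pairs by time and connect them with straight segments.

    Returns a function curve(t) that evaluates the resulting piecewise-linear
    curve. Time values are assumed to be distinct.
    """
    points = sorted(diff_data)           # S20/S21: order by the time feature
    times = [t for t, _ in points]       # S22: horizontal coordinates
    values = [v for _, v in points]      #      vertical coordinates

    def curve(t):
        if not times[0] <= t <= times[-1]:
            raise ValueError("t lies outside the fitted time range")
        i = bisect_right(times, t) - 1   # index of the segment containing t
        if i == len(times) - 1:          # t is exactly the last time point
            return float(values[-1])
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)  # S23: connect the points

    return curve


real_curve = build_curve([(3, 3), (1, 2), (2, 1)])
print(real_curve(1.5))  # 1.5
```

A smoother fit (for example a spline) could be substituted without changing the rest of the method, since later steps only need to evaluate the curve at given times.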
S3. Analyze the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of time series analysis data, perform time series differencing on the plurality of time series analysis data to obtain a plurality of differential analysis data sets, and perform curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves.
In the embodiments of this application, the time series analysis model is:
$S_{t+1} = X_t + Y_t$
$X_t = \alpha S_t + (1-\alpha)(S_{t-1} - Y_{t-1})$
$Y_t = \beta (X_t - X_{t-1}) + (1-\beta) Y_{t-1}$
where $S_t$ denotes the differential data at time t in the differential data set, $S_{t-1}$ denotes the differential data at time t-1 in the differential data set, $X_t$ denotes the smoothed value at time t, $X_{t-1}$ denotes the smoothed value at time t-1, $Y_t$ denotes the trend value at time t, $Y_{t-1}$ denotes the trend value at time t-1, $S_{t+1}$ denotes the differential analysis data at time t+1 in the differential analysis data set, and $\alpha$ and $\beta$ are preset, distinct smoothing coefficients.
The embodiments of this application obtain the plurality of time series analysis models by defining different smoothing coefficients $\alpha$ and $\beta$.
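The recurrences can be sketched directly in Python. The code below follows the three equations as they are written in this description; note that the classical Holt double exponential smoothing uses $X_{t-1} + Y_{t-1}$ in the smoothing step, which differs from the form given here. The initial values $X_0 = S_0$ and $Y_0 = 0$, the function name `one_step_forecasts`, and the example coefficient pairs are assumptions made for illustration.

```python
def one_step_forecasts(series, alpha, beta):
    """Apply the smoothing recurrences to `series` = [S_0, S_1, ...].

    Returns the forecasts S_{t+1} = X_t + Y_t for t = 1 .. len(series) - 1.
    Initialisation (assumed, not given in the description): X_0 = S_0, Y_0 = 0.
    """
    if len(series) < 2:
        raise ValueError("at least two observations are required")
    x_prev = series[0]
    y_prev = 0.0
    forecasts = []
    for t in range(1, len(series)):
        # X_t = alpha * S_t + (1 - alpha) * (S_{t-1} - Y_{t-1}), as written above
        x_t = alpha * series[t] + (1 - alpha) * (series[t - 1] - y_prev)
        # Y_t = beta * (X_t - X_{t-1}) + (1 - beta) * Y_{t-1}
        y_t = beta * (x_t - x_prev) + (1 - beta) * y_prev
        forecasts.append(x_t + y_t)
        x_prev, y_prev = x_t, y_t
    return forecasts


# Several "models" are obtained simply by varying the coefficient pair.
coefficient_pairs = [(0.2, 0.1), (0.5, 0.5), (0.8, 0.3)]  # hypothetical values
data = [2, 4, 5, 8]
models = {pair: one_step_forecasts(data, *pair) for pair in coefficient_pairs}
print(models[(0.5, 0.5)])  # [3.5, 5.125, 7.40625]
```

Each coefficient pair yields its own forecast sequence, which is what the later steps compare against the real curve.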
Specifically, the embodiments of this application use the plurality of time series analysis models to analyze the time series sample data and obtain a plurality of time series analysis data. For example, take the data on object temperature over time as the time series sample data: [(1,2),(2,4),(3,5),(4,8)], where the first number in each pair is the time identifier, indicating the time of the sample, and the second number is the data identifier, indicating the value of the object temperature at that time. Analyzing this data with the time series analysis model yields the time series analysis data: [(5,9),(6,11),(7,14),(8,16)].
Further, because the time series analysis data may contain non-stationary factors, the embodiments of this application also perform time series differencing on the time series analysis data to filter out those factors.
In detail, the embodiments of this application perform the time series differencing described in S1 on the plurality of time series analysis data to obtain a plurality of differential analysis data sets.
The embodiments of this application then perform curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves.
Preferably, the method of curve fitting the plurality of differential analysis data sets is the same as the method of curve fitting the differential data set in S2 above, and is not repeated here.
S4. Calculate, with a loss function, the loss values between the plurality of prediction curves and the real curve.
Preferably, the embodiments of this application use the following formula to calculate the initial loss values between the plurality of prediction curves and the real curve:
Figure PCTCN2020119091-appb-000001
where the symbol shown in Figure PCTCN2020119091-appb-000002 denotes the initial loss value, the symbol shown in Figure PCTCN2020119091-appb-000003 denotes the predicted value of the prediction curve, σ denotes the number of time series analysis models, Y denotes the true value of the real curve under different degrees of translation, and α denotes the error factor.
Further, the root-mean-square error of the initial loss values is calculated from the initial loss values to obtain the loss value, as follows:
Figure PCTCN2020119091-appb-000004
where the symbol shown in Figure PCTCN2020119091-appb-000005 is the root-mean-square error of the initial loss values, that is, the loss value, the symbol shown in Figure PCTCN2020119091-appb-000006 denotes the initial loss value, and m denotes the number of time series analysis models.
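The loss formulas themselves survive only as image references in this text, so they cannot be reproduced here. As a rough illustration of the root-mean-square step, the sketch below computes the textbook root-mean-square error between predicted and true values sampled at the same times. It is an assumption that this matches the applicant's formula in spirit; the function name `rmse` is hypothetical.

```python
import math


def rmse(predicted, actual):
    """Textbook root-mean-square error between two equally long sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values sampled from one prediction curve and the real curve.
print(rmse([0.0, 0.0], [3.0, 3.0]))  # 3.0
```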
S5. Calculate the lag factors of the plurality of prediction curves based on the loss values, and determine the degrees of lag of the plurality of prediction curves according to the lag factors.
In the embodiments of this application, the lag factor $\tau_m$ is calculated with the following algorithm:
Figure PCTCN2020119091-appb-000007
where arg min denotes the set of all arguments at which the function attains its minimum, the symbol shown in Figure PCTCN2020119091-appb-000008 is the analysis value, τ is the loss value, i is the starting value of the prediction curve, and n is the end value of the prediction curve.
Further, the embodiments of this application determine the degree of lag of each prediction curve according to the lag factor, including:
calculating the degree of lag $score_{delay}$ with the following algorithm:
Figure PCTCN2020119091-appb-000009
where $\tau_m$ is the lag factor of the prediction curve, and the symbol shown in Figure PCTCN2020119091-appb-000010 is the minimum root-mean-square error after translation by $\tau_m$.
The embodiments of this application substitute the lag factors of the plurality of prediction curves into the above algorithm to obtain the degrees of lag of the plurality of prediction curves; the degree of lag indicates how far the prediction curve lags behind the real curve.
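One plausible reading of the arg min step is a search over candidate time shifts: the prediction curve is shifted against the real curve, and the shift that gives the smallest root-mean-square error is taken as the lag factor. The sketch below implements only that reading. The function name `lag_factor`, the `max_shift` parameter, and the restriction to non-negative integer shifts are assumptions; the degree-of-lag score is not computed because its formula is available only as an image reference.

```python
import math


def lag_factor(predicted, actual, max_shift):
    """Return (best_shift, min_rmse) over shifts 0..max_shift.

    For a shift s, predicted[s + i] is compared with actual[i], i.e. the
    prediction is assumed to trail the real curve by s time steps.
    """
    if len(predicted) != len(actual):
        raise ValueError("curves must be sampled at the same time points")
    best_shift, best_error = 0, math.inf
    for shift in range(max_shift + 1):
        overlap = len(actual) - shift
        if overlap <= 0:
            break
        error = math.sqrt(
            sum((predicted[shift + i] - actual[i]) ** 2 for i in range(overlap))
            / overlap
        )
        if error < best_error:  # keep the smallest shift on ties
            best_shift, best_error = shift, error
    return best_shift, best_error


actual = [0, 1, 4, 9, 16, 25]
predicted = [0, 0, 0, 1, 4, 9]  # the real curve delayed by two steps
print(lag_factor(predicted, actual, max_shift=3))  # (2, 0.0)
```

Because the overlap shrinks as the shift grows, large values of `max_shift` compare only a few points; a practical implementation would probably cap the shift well below the series length.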
S6. Select a standard time series analysis model from the plurality of time series analysis models according to the degree of lag of each prediction curve.
In the embodiments of this application, the plurality of time series analysis models are screened according to the degree of lag: the models whose degree of lag is less than a preset threshold are selected as the standard time series analysis model.
The standard time series analysis model is the model selected after comparing the degrees of lag and the errors of the different time series analysis models, and it addresses the long-standing lag problem in time series forecasting. The degree of lag measures how large the lag is: the smaller the degree of lag of a time series analysis model, the more accurate the data processed by the standard time series analysis model, and the fewer lag problems remain.
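The threshold screening can be sketched as a simple filter. The function name `select_standard_models`, the model labels, and the numeric values are hypothetical; ordering the survivors from smallest to largest degree of lag is an added convenience that the description does not require.

```python
def select_standard_models(lag_by_model, threshold):
    """Keep the models whose degree of lag is below `threshold`.

    `lag_by_model` maps a model label to its degree of lag. The result is
    ordered from smallest to largest lag, so the first entry is the best.
    """
    kept = [name for name, lag in lag_by_model.items() if lag < threshold]
    return sorted(kept, key=lambda name: lag_by_model[name])


# Hypothetical degrees of lag for three coefficient pairs.
lags = {
    "alpha=0.2,beta=0.1": 3.0,
    "alpha=0.5,beta=0.5": 1.0,
    "alpha=0.8,beta=0.3": 0.4,
}
print(select_standard_models(lags, threshold=1.5))
# ['alpha=0.8,beta=0.3', 'alpha=0.5,beta=0.5']
```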
In practice, influenza viruses are highly variable and uncertain, and correctly predicting the peak of an influenza epidemic is critical to preventing and controlling influenza. The selected standard time series analysis model addresses the lagging predictions and large prediction errors that occur when forecasting influenza peaks, and improves the accuracy of influenza forecasting.
S7. Use the standard time series analysis model to analyze the time series data of a preset event input by a user, to obtain the development trend of the preset event.
In the embodiments of this application, the time series data to be processed is analyzed with the standard time series analysis model obtained by screening, to obtain the development trend of the preset event.
Preferably, to ensure data security, the time series data of the preset event may be stored in a blockchain.
For example, influenza is a serious global public health problem. Using the standard time series analysis model to analyze historical influenza epidemic information yields next week's influenza intensity, and the resulting information does not suffer from the problems of lag and large error.
FIG. 3 is a schematic module diagram of the time series data hysteresis processing apparatus of this application.
The time series data hysteresis processing apparatus 100 described in this application can be installed in an electronic device. According to the functions implemented, the apparatus 100 may include a differential data generation module 101, a real curve generation module 102, a prediction curve generation module 103, a loss value calculation module 104, a lag calculation module 105, a standard time series analysis model screening module 106, and a data analysis module 107. A module in this application, which may also be called a unit, refers to a series of computer program segments that can be executed by the processor of the electronic device, that perform a fixed function, and that are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The differential data generation module 101 is configured to acquire time series sample data and perform time series differencing on the time series sample data to obtain a differential data set;
The real curve generation module 102 is configured to perform curve fitting on the differential data set to obtain a real curve;
The prediction curve generation module 103 is configured to analyze the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of time series analysis data, perform time series differencing on the plurality of time series analysis data to obtain a plurality of differential analysis data sets, and perform curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
The loss value calculation module 104 is configured to calculate, with a loss function, the loss values between the plurality of prediction curves and the real curve;
The lag calculation module 105 is configured to calculate the lag factor of each prediction curve based on the loss values, and determine the degree of lag of each prediction curve according to the lag factor;
The standard time series analysis model screening module 106 is configured to select a standard time series analysis model from the plurality of time series analysis models according to the degree of lag of each prediction curve;
The data analysis module 107 is configured to use the standard time series analysis model to analyze the time series data of a preset event input by a user, to obtain the development trend of the preset event.
In detail, the modules of the time series data hysteresis processing apparatus 100 are implemented as follows:
Step 1: The differential data generation module 101 obtains time series sample data and performs time series differencing on the time series sample data to obtain a differential data set.
In the embodiments of this application, the time series sample data includes any data with time series characteristics, such as data on epidemic intensity over time or data on object temperature over time. The time series sample data may be retrieved, using a Java statement with a data-calling function, from a database that stores the time series sample data.
Preferably, before the time series differencing is performed on the time series sample data, the differential data generation module 101 further performs the following:
determining whether the time series sample data is stationary;
when the time series sample data is stationary, skipping the time series differencing of the time series sample data;
when the time series sample data is not stationary, performing the time series differencing of the time series sample data.
Preferably, the differential data generation module 101 may determine whether the time series sample data is stationary by detecting whether a unit root is present in the time series sample data. Here, a unit root refers to a root of modulus 1 in the time series sample data; for example, 1, -1, i, and -i are all fourth roots of unity.
If no unit root is present in the time series sample data, the data is stationary; if a unit root is present, the data is not stationary.
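The text does not say which unit-root test is used. As an illustration only, the following minimal sketch assumes a basic Dickey-Fuller-style regression with no constant and no lagged terms, and an approximate 5% critical value of -1.95; the function name `has_unit_root` and its parameter are hypothetical. A practical system would more likely rely on an augmented test from a statistics library.

```python
import math

def has_unit_root(series, critical_value=-1.95):
    """Dickey-Fuller-style check (assumed form: no constant, no lagged terms).

    Regresses dy_t = rho * y_{t-1} + e_t and compares the t-statistic of rho
    with an approximate 5% critical value. Returns True when a unit root
    cannot be rejected, i.e. the series is treated as non-stationary.
    """
    y_prev = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(v * v for v in y_prev)
    if len(dy) < 3 or sxx == 0:
        raise ValueError("series is too short or identically zero")
    # Ordinary least squares slope for a regression through the origin.
    rho = sum(p * d for p, d in zip(y_prev, dy)) / sxx
    residuals = [d - rho * p for p, d in zip(y_prev, dy)]
    sigma2 = sum(r * r for r in residuals) / (len(dy) - 1)
    if sigma2 == 0:
        # Perfect fit: no standard error exists, so decide from the sign of rho.
        return rho >= 0
    t_stat = rho / math.sqrt(sigma2 / sxx)
    return t_stat > critical_value

# A steadily rising series is treated as non-stationary and would be differenced.
needs_differencing = has_unit_root([float(t) for t in range(50)])
```

Under this sketch, differencing is applied only when `has_unit_root` returns True, which mirrors the branch described above.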
Preferably, the differential data generation module 101 applies first-order differencing to the time series sample data, using the following formula:
$$\Delta y_x = y_{x+1} - y_x$$
where $\Delta y_x$ is the first-order difference of the data on object temperature over time, $y_{x+1}$ is the data at time $x+1$, and $y_x$ is the data at time $x$.
In the embodiments of this application, performing time series differencing on the time series sample data filters the non-stationary factors out of the data.
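The first-order difference can be sketched in a few lines. The function name `first_difference` is hypothetical, and the sample values are the temperature readings 2, 4, 5, 8 used as an example elsewhere in this description.

```python
def first_difference(values):
    """Return y[x+1] - y[x] for every consecutive pair of values."""
    return [nxt - cur for cur, nxt in zip(values, values[1:])]

temperatures = [2, 4, 5, 8]
diffs = first_difference(temperatures)  # [2, 1, 3]
```

The result has one element fewer than the input, because each difference consumes two neighbouring samples.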
Step 2: The true curve generation module 102 performs curve fitting on the differential data set to obtain a true curve.
Specifically, the true curve generation module 102 is configured to:
extract the time series feature of the differential data in the differential data set;
use the time series feature to coordinate-encode the differential data in the differential data set;
use the coordinate encoding to map the differential data in the differential data set into a pre-built coordinate system;
connect the differential data in the coordinate system according to the time series feature to obtain the true curve. The true curve is a stationary curve that represents how the differential data changes over time.
In the embodiments of this application, the time series feature refers to the chronological order of the differential data; coordinate-encoding the differential data according to the time series feature includes representing the differential data in a preset coordinate system in chronological order.
For example, a coordinate system is established with times $t$ and $t+1$ on the horizontal axis and $L_t$ and $L_{t+1}$ on the vertical axis; the differential data in the differential data set is coordinate-encoded and mapped into this coordinate system, and the points are then connected according to the time series feature to obtain the true curve.
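The curve construction described above can be sketched as follows, assuming that "connecting" the points means joining them with straight segments in time order and that all timestamps are distinct. The names `build_curve` and `curve_value` are hypothetical.

```python
def build_curve(diff_points):
    """Order (time, value) pairs chronologically and return the curve vertices."""
    ordered = sorted(diff_points, key=lambda point: point[0])
    return [(float(t), float(v)) for t, v in ordered]

def curve_value(curve, t):
    """Read the curve at time t by linear interpolation between neighbours."""
    for (t0, v0), (t1, v1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the time range of the curve")

# Differential data given out of order is first placed in chronological order.
true_curve = build_curve([(3, 3), (1, 2), (2, 1)])
midpoint = curve_value(true_curve, 1.5)  # halfway between (1, 2) and (2, 1)
```

The same routine can be reused for the prediction curves, since the description applies the identical curve-fitting procedure to them.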
Step 3: The prediction curve generation module 103 analyzes the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of sets of time series analysis data, performs time series differencing on the sets of time series analysis data to obtain a plurality of differential analysis data sets, and performs curve fitting on the differential analysis data sets to obtain a plurality of prediction curves.
In the embodiments of this application, the time series analysis model is:
$$S_{t+1} = X_t + Y_t$$
$$X_t = \alpha S_t + (1-\alpha)(S_{t-1} - Y_{t-1})$$
$$Y_t = \beta (X_t - X_{t-1}) + (1-\beta) Y_{t-1}$$
where $S_t$ denotes the differential data at time $t$ in the differential data set, $S_{t-1}$ denotes the differential data at time $t-1$ in the differential data set, $X_t$ denotes the smoothed value at time $t$, $X_{t-1}$ denotes the smoothed value at time $t-1$, $Y_t$ denotes the trend value at time $t$, $Y_{t-1}$ denotes the trend value at time $t-1$, $S_{t+1}$ denotes the differential analysis data at time $t+1$ in the differential analysis data set, and $\alpha$ and $\beta$ are distinct preset smoothing coefficients.
The prediction curve generation module 103 obtains the plurality of time series analysis models by specifying different values of the smoothing coefficients $\alpha$ and $\beta$.
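A minimal sketch of this recursion is given below. The text does not state how the smoothed value and trend value are initialised, so the sketch assumes the first smoothed value equals the first sample and the first trend value is zero; the names `smooth_predict` and `build_models` are hypothetical.

```python
def smooth_predict(series, alpha, beta):
    """One-step-ahead predictions following the three equations in the text.

    predictions[k] is the forecast for series index k + 2 (0-based); the last
    entry is therefore a forecast one step beyond the end of the series.
    """
    x_prev, y_prev = series[0], 0.0  # assumed initial smoothed and trend values
    predictions = []
    for t in range(1, len(series)):
        x_t = alpha * series[t] + (1 - alpha) * (series[t - 1] - y_prev)
        y_t = beta * (x_t - x_prev) + (1 - beta) * y_prev
        predictions.append(x_t + y_t)  # S_{t+1} = X_t + Y_t
        x_prev, y_prev = x_t, y_t
    return predictions

def build_models(alphas, betas):
    """Each (alpha, beta) pair stands for one candidate time series analysis model."""
    return [(a, b) for a in alphas for b in betas]

models = build_models([0.2, 0.5], [0.1, 0.3, 0.9])  # six candidate models
forecasts = smooth_predict([2, 4, 5, 8], 0.5, 0.5)   # [3.5, 5.125, 7.40625]
```

Running `smooth_predict` once per entry of `models` gives one set of analysis data per candidate model, which is the input to the comparison in the later steps.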
Specifically, the prediction curve generation module 103 uses the plurality of time series analysis models to analyze the time series sample data and obtain a plurality of sets of time series analysis data. For example, take object temperature over time as the time series sample data: [(1,2),(2,4),(3,5),(4,8)], where the first number in each pair is the time identifier, indicating the time of the sample, and the second number is the data identifier, indicating the value of the object temperature at that time. Analyzing this data with a time series analysis model yields time series analysis data: [(5,9),(6,11),(7,14),(8,16)].
Further, because the time series analysis data may contain non-stationary factors, the prediction curve generation module 103 also performs time series differencing on the time series analysis data to filter those factors out.
In detail, the same time series differencing performed by the differential data generation module 101 is applied to each set of time series analysis data, yielding a plurality of differential analysis data sets.
Curve fitting is then performed on the differential analysis data sets to obtain a plurality of prediction curves.
Preferably, the curve fitting applied to the differential analysis data sets is the same as that performed on the differential data set by the true curve generation module 102, and is not repeated here.
Step 4: The loss value calculation module 104 uses a loss function to calculate the loss values between the plurality of prediction curves and the true curve.
Preferably, the loss value calculation module 104 uses the following formula to calculate the initial loss value between the prediction curves and the true curve:
[Formula image PCTCN2020119091-appb-000011]
where [formula image PCTCN2020119091-appb-000012] denotes the initial loss value, [formula image PCTCN2020119091-appb-000013] denotes the predicted value of the prediction curve, σ denotes the number of time series analysis models, Y denotes the true value of the true curve under different degrees of translation, and α denotes the error factor.
Further, the loss value calculation module 104 calculates the root mean square error of the initial loss value to obtain the loss value, as follows:
[Formula image PCTCN2020119091-appb-000014]
where [formula image PCTCN2020119091-appb-000015] is the root mean square error of the initial loss value, that is, the loss value, [formula image PCTCN2020119091-appb-000016] denotes the initial loss value, and m denotes the number of time series analysis models.
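The loss formulas themselves are only available as images, so they cannot be restated here. As a rough illustration, the sketch below computes a conventional root mean square error between sampled values of one prediction curve and the true curve. Averaging over curve points (rather than over the number of models m) and the function name `rmse` are assumptions, not the formula of this application.

```python
import math

def rmse(predicted, actual):
    """Conventional root mean square error between two equally long sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# One loss value per candidate model: compare each prediction curve with the true curve.
true_values = [0.0, 0.0]
loss = rmse([3.0, 3.0], true_values)  # 3.0
```

A smaller value indicates that the prediction curve stays closer to the true curve over the sampled points.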
Step 5: The lag degree calculation module 105 calculates the lag factors of the plurality of prediction curves based on the loss value, and determines the lag degrees of the prediction curves according to the lag factors.
In the embodiments of this application, the lag degree calculation module 105 calculates the lag factor $\tau_m$ with the following algorithm:
[Formula image PCTCN2020119091-appb-000017]
where arg min denotes the set of all arguments at which the function attains its minimum, [formula image PCTCN2020119091-appb-000018] is the analytical value, τ is the loss value, i is the starting value of the prediction curve, and n is the end value of the prediction curve.
Further, the lag degree calculation module 105 determines the lag degree of each prediction curve according to the lag factor, including:
calculating the lag degree $score_{delay}$ with the following algorithm:
[Formula image PCTCN2020119091-appb-000019]
where $\tau_m$ is the lag factor of the prediction curve, and [formula image PCTCN2020119091-appb-000020] is the smallest root mean square error after translation by $\tau_m$.
The lag degree calculation module 105 substitutes the lag factor of each prediction curve into the above algorithm to obtain the lag degree of each prediction curve. The lag degree indicates how far the prediction curve lags behind the true curve.
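Because the lag-factor and lag-degree formulas are only available as images, the following is a sketch under stated assumptions rather than the algorithm of this application: it interprets the arg min as a search over integer shifts of the prediction curve, picking the shift that gives the smallest root mean square error against the true curve. The names `shifted_rmse` and `lag_factor` and the `max_shift` parameter are hypothetical, and the score formula itself is not reproduced.

```python
import math

def shifted_rmse(predicted, actual, shift):
    """RMSE after moving the prediction curve `shift` steps earlier in time."""
    pairs = list(zip(predicted[shift:], actual))  # zip truncates to the overlap
    if not pairs:
        raise ValueError("shift leaves no overlapping points")
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def lag_factor(predicted, actual, max_shift):
    """Return (tau, error): the shift in 0..max_shift with the smallest RMSE."""
    best = min(range(max_shift + 1),
               key=lambda s: shifted_rmse(predicted, actual, s))
    return best, shifted_rmse(predicted, actual, best)

# A prediction that reproduces the true values two steps late has lag factor 2.
actual = [0, 1, 4, 9, 16, 25, 36]
predicted = [0, 0, 0, 1, 4, 9, 16]
tau, min_error = lag_factor(predicted, actual, max_shift=3)  # (2, 0.0)
```

Note that each shift compares a different number of points, so a practical implementation may want to restrict the comparison to a common window.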
Step 6: The standard time series analysis model screening module 106 selects a standard time series analysis model from the plurality of time series analysis models according to the lag degree of each prediction curve.
In the embodiments of this application, the standard time series analysis model screening module 106 selects the standard time series analysis model through the following operation:
screening the plurality of time series analysis models according to the lag degree to obtain the standard time series analysis model.
Specifically, the models whose lag degree is less than a preset threshold are selected from the plurality of time series analysis models as the standard time series analysis model.
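The threshold-based screening can be sketched as below. The representation of a model as an (alpha, beta) pair, the example lag degrees, and the function name `select_standard_models` are illustrative assumptions.

```python
def select_standard_models(lag_by_model, threshold):
    """Keep the models whose lag degree is below the threshold, smallest lag first."""
    kept = [(model, lag) for model, lag in lag_by_model.items() if lag < threshold]
    return [model for model, _ in sorted(kept, key=lambda item: item[1])]

# Hypothetical lag degrees for three candidate (alpha, beta) models.
lag_by_model = {(0.2, 0.1): 0.8, (0.5, 0.3): 0.1, (0.9, 0.9): 0.4}
standard_models = select_standard_models(lag_by_model, threshold=0.5)
# [(0.5, 0.3), (0.9, 0.9)]
```

Sorting by lag degree is a convenience so that the first entry is the least lagging candidate; the text only requires the threshold comparison.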
The standard time series analysis model is the model selected after comparing the lag degree and error of the different time series analysis models, and it addresses the long-standing lag problem in time series prediction. The lag degree measures the extent of the lag: the smaller a model's lag degree, the more accurate the data it produces and the fewer lag problems remain.
In practice, influenza viruses are highly variable and unpredictable, and correctly predicting the peak of an influenza epidemic is critical to preventing and controlling influenza. The selected standard time series analysis model can address the lagging predictions and large prediction errors that occur when forecasting influenza peaks, and improve the accuracy of influenza prediction.
Step 7: The data analysis module 107 uses the standard time series analysis model to analyze time series data of a preset event input by a user, so as to obtain the development trend of the preset event.
In the embodiments of this application, the data analysis module 107 analyzes the time series data to be processed with the selected standard time series analysis model to obtain the development trend of the preset event.
Preferably, to ensure data security, the time series data of the preset event may be stored in a blockchain.
For example, influenza is a serious global public health problem. Analyzing historical influenza epidemic information with the standard time series analysis model yields the influenza intensity for the following week, and the resulting information does not suffer from the problems of lag and large error.
FIG. 4 is a schematic structural diagram of an electronic device that implements the method for processing time series data hysteresis of this application.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program stored in the memory 11 and executable on the processor 10, such as a time series data hysteresis processing program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a mobile hard disk of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the code of the time series data hysteresis processing program 12, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit or a plurality of packaged integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, and performs the functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, the time series data hysteresis processing program) and by calling data stored in the memory 11.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to provide connection and communication between the memory 11 and the at least one processor 10, among other components.
FIG. 4 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management apparatus, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management apparatus. The power supply may also include any of one or more DC or AC power sources, a recharging apparatus, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described here.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface. The user interface may be a display or an input unit (such as a keyboard); optionally, it may also be a standard wired interface or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display the information processed in the electronic device 1 and to display a visual user interface.
It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The time series data hysteresis processing program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement:
obtaining time series sample data, and performing time series differencing on the time series sample data to obtain a differential data set;
performing curve fitting on the differential data set to obtain a true curve;
analyzing the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of sets of time series analysis data, performing time series differencing on the sets of time series analysis data to obtain a plurality of differential analysis data sets, and performing curve fitting on the differential analysis data sets to obtain a plurality of prediction curves;
using a loss function to calculate the loss value between each of the prediction curves and the true curve;
calculating a lag factor of each prediction curve based on the loss value, and determining the lag degree of each prediction curve according to its lag factor;
selecting a standard time series analysis model from the plurality of time series analysis models according to the lag degree of each prediction curve;
using the standard time series analysis model to analyze time series data of a preset event input by a user, so as to obtain the development trend of the preset event.
Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disc, computer memory, or read-only memory (ROM).
Further, the computer-usable storage medium may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through the use of a blockchain node, and the like.
In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is evident to those skilled in the art that this application is not limited to the details of the exemplary embodiments above, and that this application can be implemented in other specific forms without departing from its spirit or essential characteristics.
Therefore, the embodiments should be regarded in every respect as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are intended to be included in this application. No reference sign in the claims should be regarded as limiting the claim concerned.
In addition, the word "including" clearly does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or apparatuses recited in the system claims may also be implemented by a single unit or apparatus through software or hardware. Terms such as "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

  1. A method for processing time series data hysteresis, wherein the method comprises:
    obtaining time series sample data, and performing time series differencing on the time series sample data to obtain a differential data set;
    performing curve fitting on the differential data set to obtain a true curve;
    analyzing the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of sets of time series analysis data, performing time series differencing on the sets of time series analysis data to obtain a plurality of differential analysis data sets, and performing curve fitting on the differential analysis data sets to obtain a plurality of prediction curves;
    using a loss function to calculate the loss value between each of the prediction curves and the true curve;
    calculating a lag factor of each prediction curve based on the loss value, and determining the lag degree of each prediction curve according to its lag factor;
    selecting a standard time series analysis model from the plurality of time series analysis models according to the lag degree of each prediction curve;
    using the standard time series analysis model to analyze time series data of a preset event input by a user, so as to obtain the development trend of the preset event.
  2. The method for processing time series data hysteresis according to claim 1, wherein, before the time series differencing is performed on the time series sample data, the method further comprises:
    determining whether the time series sample data is stationary;
    when the time series sample data is stationary, skipping the time series differencing of the time series sample data;
    when the time series sample data is not stationary, performing the time series differencing of the time series sample data.
  3. The method for processing lag in time series data according to claim 1 or 2, wherein performing the time series differencing on the time series sample data comprises:
    performing the time series differencing on the time series sample data using the following formula:
    $\Delta y_x = y_{x+1} - y_x$
    where $\Delta y_x$ is the first-order difference of the data on object temperature over time, $y_{x+1}$ is the data at time $x+1$, and $y_x$ is the data at time $x$.
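The first-order difference in claim 3 can be illustrated with a minimal Python sketch. The function name `first_difference` and the sample values are hypothetical and are not taken from the application; the sketch assumes the samples are equally spaced in time.

```python
def first_difference(series):
    """Return the list of y[x+1] - y[x] for each pair of consecutive samples."""
    # zip pairs each sample with its successor; the result has one fewer element.
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical sample data (not from the application).
samples = [3.0, 5.0, 4.0, 8.0]
print(first_difference(samples))  # [2.0, -1.0, 4.0]
```

A series with fewer than two samples yields an empty difference set, which a caller would need to handle before curve fitting.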
  4. The method for processing lag in time series data according to claim 1, wherein performing curve fitting on the differential data set to obtain the true curve comprises:
    extracting time series features of the differential data in the differential data set;
    performing coordinate encoding on the differential data in the differential data set using the time series features;
    mapping the differential data in the differential data set into a pre-built coordinate system using the coordinate encoding; and
    connecting the differential data in the coordinate system according to the time series features to obtain the true curve.
  5. The method for processing lag in time series data according to claim 1, wherein the time series analysis model is:
    $S_{t+1} = X_t + Y_t$
    $X_t = \alpha S_t + (1-\alpha)(S_{t-1} - Y_{t-1})$
    $Y_t = \beta (X_t - X_{t-1}) + (1-\beta) Y_{t-1}$
    where $S_t$ denotes the differential data at time $t$ in the differential data set, $S_{t-1}$ denotes the differential data at time $t-1$ in the differential data set, $X_t$ denotes the smoothed value at time $t$, $X_{t-1}$ denotes the smoothed value at time $t-1$, $Y_t$ denotes the trend value at time $t$, $Y_{t-1}$ denotes the trend value at time $t-1$, $S_{t+1}$ denotes the differential analysis data at time $t+1$ in the differential analysis data set, and $\alpha$ and $\beta$ are different preset smoothing coefficients.
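The recurrences in claim 5 resemble double exponential smoothing with a level term and a trend term. The sketch below follows the three equations exactly as written in the claim; note that the textbook Holt level update uses $(X_{t-1} + Y_{t-1})$ where the claim has $(S_{t-1} - Y_{t-1})$. The claim does not state initial values, so the initialization $X_0 = S_0$, $Y_0 = 0$ is an assumption, as are the function name `smooth_forecast` and the sample inputs.

```python
def smooth_forecast(s, alpha, beta):
    """One-step-ahead forecasts from the recurrences as written in claim 5.

    Assumed initialization (not given in the claim): X_0 = s[0], Y_0 = 0.
    Returns a list whose element t-1 is the forecast S_{t+1} = X_t + Y_t,
    for t = 1 .. len(s) - 1.
    """
    if len(s) < 2:
        raise ValueError("need at least two data points")
    x_prev, y_prev = s[0], 0.0  # assumed starting level and trend
    forecasts = []
    for t in range(1, len(s)):
        # X_t = alpha * S_t + (1 - alpha) * (S_{t-1} - Y_{t-1})
        x_t = alpha * s[t] + (1 - alpha) * (s[t - 1] - y_prev)
        # Y_t = beta * (X_t - X_{t-1}) + (1 - beta) * Y_{t-1}
        y_t = beta * (x_t - x_prev) + (1 - beta) * y_prev
        # S_{t+1} = X_t + Y_t
        forecasts.append(x_t + y_t)
        x_prev, y_prev = x_t, y_t
    return forecasts


# Hypothetical inputs chosen so the arithmetic is exact in floating point.
print(smooth_forecast([1.0, 2.0, 3.0], alpha=0.5, beta=0.5))  # [1.75, 2.9375]
```

Running the same function with several $(\alpha, \beta)$ pairs is one plausible way to obtain the "plurality of time series analysis models" that the method later compares, though the application does not confirm this reading here.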
  6. The method for processing lag in time series data according to claim 1, wherein calculating the lag factor of the prediction curve based on the loss value comprises:
    calculating the lag factor $\tau_m$ using the following formula:
    Figure PCTCN2020119091-appb-100001
    where arg min denotes the set of all arguments at which the function attains its minimum, the symbol shown in Figure PCTCN2020119091-appb-100002 is the analytical value, $\tau$ is the loss value, $i$ is the starting value of the prediction curve, and $n$ is the end value of the prediction curve.
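The formula for the lag factor survives only as an image reference, so its exact form cannot be recovered from this text. As a rough illustration of an arg-min lag search, the sketch below assumes the lag factor is the time shift that minimizes the root mean square error between the true curve and the shifted prediction curve. That choice of error measure, the candidate range `max_shift`, and the names `rmse_at_shift` and `lag_factor` are all assumptions, not the application's definition.

```python
import math


def rmse_at_shift(real, pred, tau):
    """RMSE between real[i] and pred[i + tau] over the overlapping indices."""
    pairs = [(real[i], pred[i + tau])
             for i in range(len(real)) if 0 <= i + tau < len(pred)]
    if not pairs:
        raise ValueError("shift leaves no overlapping points")
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


def lag_factor(real, pred, max_shift):
    """Return the shift in 0..max_shift with the smallest RMSE (assumed criterion).

    Ties resolve to the smallest shift, because min() keeps the first minimum.
    """
    return min(range(max_shift + 1),
               key=lambda tau: rmse_at_shift(real, pred, tau))


# Hypothetical curves: the prediction trails the true curve by two steps.
real = [0, 1, 4, 9, 16, 25]
pred = [0, 0, 0, 1, 4, 9]
print(lag_factor(real, pred, max_shift=3))  # 2
```

Under this assumption, a prediction curve that tracks the true curve without delay has a lag factor of 0, and the minimum error found at the chosen shift is the kind of quantity that claim 7 describes as the smallest root mean square error after shifting by $\tau_m$.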
  7. The method for processing lag in time series data according to claim 6, wherein determining the lag degree of each prediction curve according to the lag factor comprises:
    calculating the lag degree $score_{delay}$ using the following formula:
    Figure PCTCN2020119091-appb-100003
    where $\tau_m$ is the lag factor of the prediction curve, and the symbol shown in Figure PCTCN2020119091-appb-100004 is the minimum root mean square error after shifting by $\tau_m$.
  8. An apparatus for processing lag in time series data, wherein the apparatus comprises:
    a differential data generation module, configured to acquire time series sample data and perform time series differencing on the time series sample data to obtain a differential data set;
    a true curve generation module, configured to perform curve fitting on the differential data set to obtain a true curve;
    a prediction curve generation module, configured to analyze the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of sets of time series analysis data, perform time series differencing on the plurality of sets of time series analysis data to obtain a plurality of differential analysis data sets, and perform curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
    a loss value calculation module, configured to calculate, using a loss function, a loss value between each of the plurality of prediction curves and the true curve;
    a lag degree calculation module, configured to calculate a lag factor of each prediction curve based on the loss value, and determine a lag degree of each prediction curve according to the lag factor;
    a standard time series analysis model selection module, configured to select a standard time series analysis model from the plurality of time series analysis models according to the lag degree of each prediction curve; and
    a data analysis module, configured to perform data analysis, using the standard time series analysis model, on time series data of a preset event input by a user to obtain a development trend of the preset event.
  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following steps:
    acquiring time series sample data, and performing time series differencing on the time series sample data to obtain a differential data set;
    performing curve fitting on the differential data set to obtain a true curve;
    analyzing the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of sets of time series analysis data, performing time series differencing on the plurality of sets of time series analysis data to obtain a plurality of differential analysis data sets, and performing curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
    calculating, using a loss function, a loss value between each of the plurality of prediction curves and the true curve;
    calculating a lag factor of each prediction curve based on the loss value, and determining a lag degree of each prediction curve according to the lag factor;
    selecting a standard time series analysis model from the plurality of time series analysis models according to the lag degree of each prediction curve; and
    performing data analysis, using the standard time series analysis model, on time series data of a preset event input by a user to obtain a development trend of the preset event.
  10. The electronic device according to claim 9, wherein before the time series differencing is performed on the time series sample data, the method further comprises:
    determining whether the time series sample data is stationary;
    when the time series sample data is stationary, not performing the time series differencing on the time series sample data; and
    when the time series sample data is not stationary, performing the time series differencing on the time series sample data.
  11. The electronic device according to claim 9 or 10, wherein performing the time series differencing on the time series sample data comprises:
    performing the time series differencing on the time series sample data using the following formula:
    $\Delta y_x = y_{x+1} - y_x$
    where $\Delta y_x$ is the first-order difference of the data on object temperature over time, $y_{x+1}$ is the data at time $x+1$, and $y_x$ is the data at time $x$.
  12. The electronic device according to claim 9, wherein performing curve fitting on the differential data set to obtain the true curve comprises:
    extracting time series features of the differential data in the differential data set;
    performing coordinate encoding on the differential data in the differential data set using the time series features;
    mapping the differential data in the differential data set into a pre-built coordinate system using the coordinate encoding; and
    connecting the differential data in the coordinate system according to the time series features to obtain the true curve.
  13. The electronic device according to claim 9, wherein the time series analysis model is:
    $S_{t+1} = X_t + Y_t$
    $X_t = \alpha S_t + (1-\alpha)(S_{t-1} - Y_{t-1})$
    $Y_t = \beta (X_t - X_{t-1}) + (1-\beta) Y_{t-1}$
    where $S_t$ denotes the differential data at time $t$ in the differential data set, $S_{t-1}$ denotes the differential data at time $t-1$ in the differential data set, $X_t$ denotes the smoothed value at time $t$, $X_{t-1}$ denotes the smoothed value at time $t-1$, $Y_t$ denotes the trend value at time $t$, $Y_{t-1}$ denotes the trend value at time $t-1$, $S_{t+1}$ denotes the differential analysis data at time $t+1$ in the differential analysis data set, and $\alpha$ and $\beta$ are different preset smoothing coefficients.
  14. The electronic device according to claim 9, wherein calculating the lag factor of the prediction curve based on the loss value comprises:
    calculating the lag factor $\tau_m$ using the following formula:
    Figure PCTCN2020119091-appb-100005
    where arg min denotes the set of all arguments at which the function attains its minimum, the symbol shown in Figure PCTCN2020119091-appb-100006 is the analytical value, $\tau$ is the loss value, $i$ is the starting value of the prediction curve, and $n$ is the end value of the prediction curve.
  15. The electronic device according to claim 9, wherein determining the lag degree of each prediction curve according to the lag factor comprises:
    calculating the lag degree $score_{delay}$ using the following formula:
    Figure PCTCN2020119091-appb-100007
    where $\tau_m$ is the lag factor of the prediction curve, and the symbol shown in Figure PCTCN2020119091-appb-100008 is the minimum root mean square error after shifting by $\tau_m$.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring time series sample data, and performing time series differencing on the time series sample data to obtain a differential data set;
    performing curve fitting on the differential data set to obtain a true curve;
    analyzing the time series sample data with a plurality of pre-built time series analysis models to obtain a plurality of sets of time series analysis data, performing time series differencing on the plurality of sets of time series analysis data to obtain a plurality of differential analysis data sets, and performing curve fitting on the plurality of differential analysis data sets to obtain a plurality of prediction curves;
    calculating, using a loss function, a loss value between each of the plurality of prediction curves and the true curve;
    calculating a lag factor of each prediction curve based on the loss value, and determining a lag degree of each prediction curve according to the lag factor;
    selecting a standard time series analysis model from the plurality of time series analysis models according to the lag degree of each prediction curve; and
    performing data analysis, using the standard time series analysis model, on time series data of a preset event input by a user to obtain a development trend of the preset event.
  17. The computer-readable storage medium according to claim 16, wherein before the time series differencing is performed on the time series sample data, the method further comprises:
    determining whether the time series sample data is stationary;
    when the time series sample data is stationary, not performing the time series differencing on the time series sample data; and
    when the time series sample data is not stationary, performing the time series differencing on the time series sample data.
  18. The computer-readable storage medium according to claim 16 or 17, wherein performing the time series differencing on the time series sample data comprises:
    performing the time series differencing on the time series sample data using the following formula:
    $\Delta y_x = y_{x+1} - y_x$
    where $\Delta y_x$ is the first-order difference of the data on object temperature over time, $y_{x+1}$ is the data at time $x+1$, and $y_x$ is the data at time $x$.
  19. The computer-readable storage medium according to claim 16, wherein performing curve fitting on the differential data set to obtain the true curve comprises:
    extracting time series features of the differential data in the differential data set;
    performing coordinate encoding on the differential data in the differential data set using the time series features;
    mapping the differential data in the differential data set into a pre-built coordinate system using the coordinate encoding; and
    connecting the differential data in the coordinate system according to the time series features to obtain the true curve.
  20. The computer-readable storage medium according to claim 16, wherein the time series analysis model is:
    $S_{t+1} = X_t + Y_t$
    $X_t = \alpha S_t + (1-\alpha)(S_{t-1} - Y_{t-1})$
    $Y_t = \beta (X_t - X_{t-1}) + (1-\beta) Y_{t-1}$
    where $S_t$ denotes the differential data at time $t$ in the differential data set, $S_{t-1}$ denotes the differential data at time $t-1$ in the differential data set, $X_t$ denotes the smoothed value at time $t$, $X_{t-1}$ denotes the smoothed value at time $t-1$, $Y_t$ denotes the trend value at time $t$, $Y_{t-1}$ denotes the trend value at time $t-1$, $S_{t+1}$ denotes the differential analysis data at time $t+1$ in the differential analysis data set, and $\alpha$ and $\beta$ are different preset smoothing coefficients.
PCT/CN2020/119091 2020-07-09 2020-09-29 Method and apparatus for hysteretic processing of time series data, electronic device, and storage medium WO2021151304A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010656657.9A CN111814106A (en) 2020-07-09 2020-07-09 Time series data hysteresis processing method and device, electronic equipment and storage medium
CN202010656657.9 2020-07-09

Publications (1)

Publication Number Publication Date
WO2021151304A1 true WO2021151304A1 (en) 2021-08-05

Family

ID=72842854

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119091 WO2021151304A1 (en) 2020-07-09 2020-09-29 Method and apparatus for hysteretic processing of time series data, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111814106A (en)
WO (1) WO2021151304A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215438B (en) * 2020-11-09 2021-05-14 广东新禾道信息科技有限公司 Emergency disaster early warning analysis data processing method and system
CN113393179B (en) * 2021-08-18 2022-06-28 江苏中协智能科技有限公司 Data integration system based on time sequence difference

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968670A (en) * 2012-10-23 2013-03-13 北京京东世纪贸易有限公司 Method and device for predicting data
US20150112900A1 (en) * 2013-10-23 2015-04-23 Honda Motor Co., Ltd. Time-series data prediction device, time-series data prediction method, and program
CN107871538A (en) * 2016-12-19 2018-04-03 平安科技(深圳)有限公司 Big data Forecasting Methodology and system based on macroscopical factor
US20180341625A1 (en) * 2013-11-29 2018-11-29 Hitachi High-Technologies Corporation Data processing method, data processing apparatus and processing apparatus
CN109189762A (en) * 2018-09-03 2019-01-11 深圳市智物联网络有限公司 A kind of industry internet of things data analysis method, system and relevant device
CN110706823A (en) * 2019-11-15 2020-01-17 广州地理研究所 Method for predicting respiratory system disease morbidity based on lag analysis and LSTM

Also Published As

Publication number Publication date
CN111814106A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
Walker et al. Accurate and stable run-time power modeling for mobile and embedded CPUs
WO2021184727A1 (en) Data abnormality detection method and apparatus, electronic device and storage medium
WO2021189904A1 (en) Data anomaly detection method and apparatus, and electronic device and storage medium
CN109243619B (en) Generation method and device of prediction model and computer readable storage medium
WO2019019255A1 (en) Apparatus and method for establishing prediction model, program for establishing prediction model, and computer-readable storage medium
WO2021151304A1 (en) Method and apparatus for hysteretic processing of time series data, electronic device, and storage medium
US8635484B2 (en) Event based correlation of power events
JP2012521596A (en) Correlation between power distribution devices and devices
TWI663510B (en) Equipment maintenance forecasting system and operation method thereof
CN107357764B (en) Data analysis method, electronic device, and computer storage medium
WO2022088632A1 (en) User data monitoring and analysis method, apparatus, device, and medium
CN112365070A (en) Power load prediction method, device, equipment and readable storage medium
CN113268403A (en) Time series analysis and prediction method, device, equipment and storage medium
CN105447214B (en) Method and device for determining device parameters
JP2019219848A (en) Source code analysis method and source code analysis device
WO2023246391A1 (en) Extraction of risk feature description
JP2007164346A (en) Decision tree changing method, abnormality determination method, and program
US20150347174A1 (en) Method, Apparatus, and System for Migrating Virtual Machine
JP6930195B2 (en) Model identification device, prediction device, monitoring system, model identification method and prediction method
CN115394442A (en) Development evaluation method, device, equipment and medium
US11556685B1 (en) Time-based power analysis
WO2022178933A1 (en) Context-based voice sentiment detection method and apparatus, device and storage medium
CN113220551A (en) Index trend prediction and early warning method and device, electronic equipment and storage medium
CN112149880A (en) User scale prediction method, device, electronic equipment and storage medium
JP2013205970A (en) Device, program and method for estimating execution time

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20916806

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20916806

Country of ref document: EP

Kind code of ref document: A1