CN111967616A - Automatic time series regression method and device - Google Patents

Automatic time series regression method and device

Info

Publication number
CN111967616A
CN111967616A (granted as CN111967616B, application CN202010832356.7A)
Authority
CN
China
Prior art keywords
time
time series
data set
time sequence
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010832356.7A
Other languages
Chinese (zh)
Other versions
CN111967616B (en)
Inventor
陈海波
罗志鹏
王锦
姚灿美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyan Technology Beijing Co ltd
Original Assignee
Shenyan Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyan Technology Beijing Co ltd filed Critical Shenyan Technology Beijing Co ltd
Priority to CN202010832356.7A priority Critical patent/CN111967616B/en
Priority claimed from CN202010832356.7A external-priority patent/CN111967616B/en
Publication of CN111967616A publication Critical patent/CN111967616A/en
Application granted granted Critical
Publication of CN111967616B publication Critical patent/CN111967616B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Abstract

The invention provides an automatic time series regression method and device. The method comprises the following steps: acquiring a time series data set and preprocessing it; performing automatic time series feature engineering and data sampling on the preprocessed data set; establishing machine learning models of different types; and calculating dynamic weights based on a time sliding window from the preprocessed, feature-engineered and sampled data set so as to fuse the different types of machine learning models. In machine learning applications involving time series data, the invention makes it possible to obtain an application model conveniently, without relying on the experience and accumulated knowledge of data scientists, and to obtain more accurate outputs with that model.

Description

Automatic time series regression method and device
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to an automatic time series regression method, an automatic time series regression apparatus, a computer device, a non-transitory computer-readable storage medium, and a computer program product.
Background
With the advent of the data age, data volumes are growing exponentially, and this growth confronts the scientific community with a variety of challenges. One important form of data is the time series, a function whose main independent variable is time. Many sequences encountered in practice fall into this category, such as stock indexes, electrocardiograms and electroencephalograms, speech signals, or the wind speed at a given location on a grassland, each with its own characteristic variations. Time series regression seeks the deterministic, ordered regularities embedded in large amounts of data that appear random and disordered. Such regularities are not fixed: even a small change in a factor across different time periods or places can produce different behaviour, so the regularities are dynamic and of limited scope.
Time series relational data are very common in application scenarios such as economics, finance, insurance, online advertising, recommendation systems and healthcare, and such data are often used to build machine learning models that improve the corresponding services. Time series data are among the most important data for industrial scenarios, and big-data scenarios in particular; because of the particular way the data arrive, the way time series are stored and the way their databases are designed also differ greatly from ordinary relational databases. Current methods for time series pattern recognition fall mainly into two directions: one is known as complex-systems modelling, the other is machine learning. Complex-systems approaches fit the data to known models, such as the classical AR (Auto Regressive), MA (Moving Average), ARMA (Auto Regressive Moving Average) and ARIMA (Auto Regressive Integrated Moving Average, i.e. differenced autoregressive moving average) models, while machine learning uses a generic model, such as a neural network, to perform a "brute force" fit.
In conventional machine learning applications, an experienced expert is required to extract effective feature information from the time series data and use it to improve the machine learning model. Even with a deep knowledge base, experts must build valuable time series features through continual trial and error, and improve model performance using several associated tables. In addition, strong machine learning expertise is required to select appropriate machine learning models and hyper-parameters.
Disclosure of Invention
To solve the above technical problems, the invention provides an automatic time series regression method and device, so that in machine learning applications involving time series data an application model can be obtained conveniently, without relying on the experience and accumulated knowledge of data scientists, and more accurate outputs can be obtained with that model.
The technical scheme adopted by the invention is as follows:
an automatic time series regression method, comprising the steps of: acquiring a time series data set and preprocessing it; performing automatic time series feature engineering and data sampling on the preprocessed data set; establishing machine learning models of different types; and calculating dynamic weights based on a time sliding window from the preprocessed, feature-engineered and sampled data set so as to fuse the different types of machine learning models.
Preprocessing the time series data set includes: smoothing abnormal points in the time series data set.
The features obtained by performing automatic time series feature engineering on the preprocessed time series data set include target features based on a time sliding window, target statistical features based on the time sliding window, target trend features based on the time sliding window, important original features based on the time sliding window, and statistical features based on the time sliding window.
Performing data sampling on the preprocessed time series data set includes: randomly sampling IDs in the time series data set, wherein different sampling ratios are used for different data volumes.
Different types of machine learning models include linear regression and LightGBM models.
Calculating dynamic weights based on a time sliding window from the preprocessed, feature-engineered and sampled time series data set so as to fuse the different types of machine learning models includes: determining an initial fusion weight on the validation set; setting a time window for the test set and testing the first time window with the initial fusion weight; after each time window ends, obtaining the corresponding optimal fusion weight from that window's test results; and updating the optimal fusion weight of the window according to a set rule and testing the next time window with the updated fusion weight.
An automatic time series regression device, comprising: a preprocessing module for acquiring a time series data set and preprocessing it; a feature engineering and sampling module for performing automatic time series feature engineering and data sampling on the preprocessed data set; a model establishing module for establishing machine learning models of different types; and a fusion module for calculating dynamic weights based on a time sliding window from the preprocessed, feature-engineered and sampled data set so as to fuse the different types of machine learning models.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above automatic time series regression method when executing the program.
A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described automatic time series regression method.
A computer program product, wherein instructions, when executed by a processor, perform the above automatic time series regression method.
The invention has the beneficial effects that:
According to the invention, the time series data set is preprocessed, subjected to automatic time series feature engineering and sampled, and machine learning models of different types are fused by calculating dynamic weights based on a time sliding window. As a result, in machine learning applications involving time series data, an application model can be obtained conveniently without relying on the experience and accumulated knowledge of data scientists, and more accurate outputs can be obtained with that model.
Drawings
FIG. 1 is a flow chart of an automatic time series regression method according to an embodiment of the present invention;
fig. 2 is a block diagram of an automatic time-series regression apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the automatic time series regression method of the embodiment of the present invention includes the following steps:
and S1, acquiring the time sequence data set and preprocessing the time sequence data set.
Time series data are a sequence in which the same indicator is recorded in chronological order. In one embodiment of the invention, outliers in the time series data set can be smoothed to reduce their impact on model accuracy. Time series tasks usually contain some outlier points, and handling them is effective, but it is also more challenging than handling outliers in non-time-series data. Time series data are generally strongly correlated with time, and the target value may drift into a different value range over time, so if outliers are detected directly with the global mean and standard deviation, some non-outlier points may be treated as outliers. For this reason, the embodiment of the invention adopts combined global and local outlier smoothing.
The method considers the global mean and standard deviation while also considering the mean and standard deviation of the time window adjacent to the current point and the values of its neighbouring points, and it sets a relatively large multiple of the global standard deviation as the threshold so that normal values are not altered. It is also important to note that the training set and the test set are handled somewhat differently: the test data arrive gradually, one time step at a time, so the time steps after the current point are not visible, and the test set is therefore processed using only the data in the adjacent time window before the current point. Once an outlier has been detected in this combined global and local manner, a value in a relatively normal range can be computed from the local mean and standard deviation of the current point and its left and right neighbours, and this value is reassigned as the new value of the current point.
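By way of illustration, the following is a minimal sketch of the combined global and local smoothing described above, written in Python with pandas. The window size, the deviation multiples k_global and k_local, and the way the replacement value is formed are assumptions made for the example rather than values prescribed by the invention, and the series is assumed to belong to a single ID sorted by time.

```python
import pandas as pd

def smooth_outliers(target: pd.Series, window: int = 24,
                    k_global: float = 6.0, k_local: float = 3.0) -> pd.Series:
    """Smooth points that deviate strongly from BOTH the global mean/std and the
    local (adjacent past window) mean/std, then re-assign a value built from the
    local neighbourhood."""
    g_mean, g_std = target.mean(), target.std()
    # local statistics over the window preceding the current point (past-only,
    # mirroring how the test set is processed one time step at a time)
    local = target.shift(1).rolling(window, min_periods=1)
    l_mean = local.mean()
    l_std = local.std().fillna(g_std)

    is_outlier = ((target - g_mean).abs() > k_global * g_std) & \
                 ((target - l_mean).abs() > k_local * l_std)

    # a value in a "relatively normal" range built from the local mean and the
    # immediate left/right neighbours
    neighbour_avg = ((target.shift(1) + target.shift(-1)) / 2).fillna(l_mean)
    smoothed = target.copy()
    smoothed[is_outlier] = (l_mean[is_outlier] + neighbour_avg[is_outlier]) / 2
    return smoothed
```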
S2: perform automatic time series feature engineering and data sampling on the preprocessed time series data set.
In time-series-related tasks, what happened within some past time window has a large impact on predictions of future outcomes, and the relevant window differs for data of different temporal granularity. The embodiment of the invention therefore starts from the time series itself and mainly builds features around a time sliding window. The features obtained by automatic time series feature engineering include target features based on a time sliding window, target statistical features based on the time sliding window, target trend features based on the time sliding window, important original features based on the time sliding window, statistical features based on the time sliding window, and other features.
For the target features based on the time sliding window: in time series data the target does not deviate much from its values at adjacent time steps, and adjacent values are strongly correlated, so the past adjacent targets can be taken as features first. In addition, the time step interval of the data set is identified, i.e. whether the series steps in hours, minutes, days, weeks or months, and the feature window size is determined from this interval together with a model-validation search.
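A possible sketch of these lag features and of the time-step detection, assuming a pandas DataFrame for a single ID sorted by time; the column names "target" and "timestamp" and the default number of lags are illustrative assumptions.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame, target_col: str = "target",
                     time_col: str = "timestamp", n_lags: int = 7) -> pd.DataFrame:
    """Take the past adjacent targets as features (lag features)."""
    df = df.sort_values(time_col).copy()
    for lag in range(1, n_lags + 1):
        df[f"{target_col}_lag_{lag}"] = df[target_col].shift(lag)
    return df

def infer_time_step(timestamps: pd.Series) -> pd.Timedelta:
    """Identify the time step interval (hourly, daily, weekly, ...) as the most
    frequent difference between consecutive timestamps."""
    diffs = pd.to_datetime(timestamps).sort_values().diff().dropna()
    return diffs.value_counts().idxmax()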
For the target statistical features based on the time sliding window: after the sliding window is applied to the target, further statistics are computed over it. There are two statistical methods. The first is to compute statistics over the last N steps; the exact windows differ with the time step, and for a daily interval statistics are typically computed over the last 2, 3, 5 and 7 days, with memory limits also taken into account. The second is to divide one large time window into N segments and compute statistics on each segment separately. The statistics include the maximum, minimum, mean, standard deviation, and so on.
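The two statistical methods might be sketched as follows; the windows (2, 3, 5, 7) follow the daily example in the text, while the 28-step large window split into 4 segments is an illustrative assumption.

```python
import pandas as pd

def rolling_target_stats(df: pd.DataFrame, target_col: str = "target",
                         windows=(2, 3, 5, 7)) -> pd.DataFrame:
    """Method 1: statistics of the target over the last N steps;
    shift(1) keeps the current, unknown value out of its own features."""
    for w in windows:
        rolled = df[target_col].shift(1).rolling(w, min_periods=1)
        df[f"{target_col}_last{w}_max"] = rolled.max()
        df[f"{target_col}_last{w}_min"] = rolled.min()
        df[f"{target_col}_last{w}_mean"] = rolled.mean()
        df[f"{target_col}_last{w}_std"] = rolled.std()
    return df

def segmented_target_stats(df: pd.DataFrame, target_col: str = "target",
                           big_window: int = 28, n_segments: int = 4) -> pd.DataFrame:
    """Method 2: split one large window into N segments and compute a statistic
    (here the mean) on each segment separately."""
    seg = big_window // n_segments
    for i in range(n_segments):
        first_lag = i * seg + 1  # segment i covers lags i*seg+1 .. (i+1)*seg
        df[f"{target_col}_seg{i}_mean"] = (
            df[target_col].shift(first_lag).rolling(seg, min_periods=1).mean()
        )
    return df
```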
For the target trend features based on the time sliding window, the rate of change of the target is calculated to reflect its trend:
r_i = (t_{i-1} - t_{i-2}) / t_{i-2}
where r_i denotes the rate of change of the target at the current time, t_{i-1} denotes the target at the previous time node, and t_{i-2} denotes the target at the time node before that.
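A small illustration of this trend feature, using the rate-of-change form reconstructed above; the column name is an assumption.

```python
import pandas as pd

def add_trend_feature(df: pd.DataFrame, target_col: str = "target") -> pd.DataFrame:
    """Rate of change of the two most recent known targets:
    r_i = (t_{i-1} - t_{i-2}) / t_{i-2}."""
    prev1 = df[target_col].shift(1)
    prev2 = df[target_col].shift(2)
    df[f"{target_col}_rate_of_change"] = (prev1 - prev2) / prev2
    return df
```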
For the important original features based on the time sliding window, a model can be trained on the original features to obtain feature importances, and the features are then ranked by importance. Original features other than the target are less important than the historical target, so a smaller time window than that used for the target may be selected, and the number of features used is then determined from the time window and the system's limited resources.
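A sketch of ranking the original features by importance with a quick model; using LightGBM here, as well as the hyper-parameters and top_k cut-off, are illustrative assumptions.

```python
import lightgbm as lgb
import pandas as pd

def rank_original_features(X: pd.DataFrame, y: pd.Series, top_k: int = 10) -> list:
    """Train a quick model on the original features, rank them by importance,
    and keep only the top_k most important ones."""
    model = lgb.LGBMRegressor(n_estimators=100, learning_rate=0.1)
    model.fit(X, y)
    importance = pd.Series(model.feature_importances_, index=X.columns)
    return importance.sort_values(ascending=False).head(top_k).index.tolist()
```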
For the statistical features based on the time sliding window, categorical features and numerical features are handled separately. For categorical features, the frequency and ratio with which each value occurs within the time window are computed. For numerical features, the computation is the same as for the target-based statistics, counting the maximum, minimum, mean and standard deviation, but the time window is kept smaller.
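A rough sketch of these window statistics; the window sizes and column naming are assumptions, and the categorical frequency is computed for each row's own value.

```python
import numpy as np
import pandas as pd

def rolling_categorical_stats(df: pd.DataFrame, cat_col: str, window: int = 7) -> pd.DataFrame:
    """Frequency and ratio of each row's own categorical value within the last
    `window` rows."""
    dummies = pd.get_dummies(df[cat_col], dtype=int)
    counts = dummies.rolling(window, min_periods=1).sum()
    own_col = dummies.to_numpy().argmax(axis=1)            # index of the row's own category
    own_freq = counts.to_numpy()[np.arange(len(df)), own_col]
    df[f"{cat_col}_win_freq"] = own_freq
    df[f"{cat_col}_win_ratio"] = own_freq / counts.sum(axis=1).to_numpy()
    return df

def rolling_numeric_stats(df: pd.DataFrame, num_col: str, window: int = 5) -> pd.DataFrame:
    """Max/min/mean/std of a numerical feature over a (smaller) time window."""
    rolled = df[num_col].rolling(window, min_periods=1)
    df[f"{num_col}_win_max"] = rolled.max()
    df[f"{num_col}_win_min"] = rolled.min()
    df[f"{num_col}_win_mean"] = rolled.mean()
    df[f"{num_col}_win_std"] = rolled.std()
    return df
```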
As for other features, beyond those above, statistics computed on the training set can be used directly as features of the whole data set: for example, the global frequency and ratio of categorical features in the training set; the frequency and ratio of combinations of two high-importance categorical features; combinations of one high-importance categorical feature with one high-importance numerical feature; and numerical features aggregated by categorical feature. Cross combinations of the historical target with other features may also be considered, for example multiplying or dividing the target by other numerical features of higher importance.
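The training-set-level statistics and the category-wise aggregation of a numerical feature might look roughly like this; the choice of the mean as the aggregate and the column names are assumptions.

```python
import pandas as pd

def global_category_stats(train: pd.DataFrame, full: pd.DataFrame, cat_col: str) -> pd.DataFrame:
    """Global frequency and ratio of a categorical feature, computed on the
    training set and applied to the whole data."""
    counts = train[cat_col].value_counts()
    full[f"{cat_col}_train_freq"] = full[cat_col].map(counts).fillna(0)
    full[f"{cat_col}_train_ratio"] = full[f"{cat_col}_train_freq"] / len(train)
    return full

def numeric_stats_by_category(train: pd.DataFrame, full: pd.DataFrame,
                              cat_col: str, num_col: str) -> pd.DataFrame:
    """A numerical feature aggregated per category on the training set and
    mapped back onto the whole data."""
    means = train.groupby(cat_col)[num_col].mean()
    full[f"{num_col}_mean_by_{cat_col}"] = full[cat_col].map(means)
    return full
```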
The automatic feature engineering and automatic feature selection stages are usually time- and memory-consuming, so the data can be sampled to speed up the process. Sampling a time series requires care: if rows are sampled at random, data at different timestamps of the same ID are lost, the series become incomplete, and the result is poor and not comparable to that of the full data. The embodiment of the invention therefore randomly samples IDs in the time series data set and uses different sampling ratios for different data volumes: the larger the data, the smaller the sampling ratio. When the data volume is very large, the data are additionally truncated by time step and only the later time steps are retained.
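An illustrative sketch of the ID-level sampling and the time-step truncation; the cap on the number of IDs and the number of retained steps stand in for the size-dependent ratios mentioned above and are not values from the invention.

```python
import numpy as np
import pandas as pd

def sample_by_id(df: pd.DataFrame, id_col: str = "id",
                 max_ids: int = 10000, seed: int = 0) -> pd.DataFrame:
    """Randomly sample whole IDs rather than rows, so every kept ID retains all
    of its timestamps."""
    rng = np.random.default_rng(seed)
    ids = df[id_col].unique()
    if len(ids) > max_ids:
        ids = rng.choice(ids, size=max_ids, replace=False)
    return df[df[id_col].isin(ids)]

def truncate_recent(df: pd.DataFrame, time_col: str = "timestamp",
                    keep_last_steps: int = 365) -> pd.DataFrame:
    """For very large data, truncate by time step and keep only the most recent
    steps."""
    recent = sorted(df[time_col].unique())[-keep_last_steps:]
    return df[df[time_col].isin(recent)]
```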
S3: establish machine learning models of different types.
In an embodiment of the invention, two quite different models can be established: a linear model and a tree model. Specifically, a linear regression model and a LightGBM model may be established.
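The two models could be instantiated as follows; the LightGBM hyper-parameters shown are illustrative defaults, not values specified by the invention.

```python
from sklearn.linear_model import LinearRegression
import lightgbm as lgb

def build_models():
    """Two deliberately different model families: a linear model and a
    gradient-boosted tree model."""
    return {
        "linear": LinearRegression(),
        "lightgbm": lgb.LGBMRegressor(
            n_estimators=500,
            learning_rate=0.05,
            num_leaves=63,
        ),
    }
```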
S4: calculate dynamic weights based on the time sliding window from the preprocessed, feature-engineered and sampled time series data set so as to fuse the different types of machine learning models.
The linear regression and LightGBM models described above perform quite differently on different time series data sets: on some data sets their results are close, on some the linear regression performs better, and on others the LightGBM model performs better. Analysis shows that the data sets differ greatly in how they evolve over time; in some tasks the target keeps increasing with time, and such data do not suit a tree model. In addition, even on the same data set, the relative performance of different models can change considerably between time periods.
Time series data are strongly tied to time, so to reduce the influence of the time factor on the models, the models can be fused by calculating dynamic weights based on a time sliding window.
Specifically, an initial fusion weight w_0 may first be determined on the validation set, then a time window is set for the test set and the first time window is tested with w_0. After each time window ends, the optimal fusion weight for that window is obtained from its test results, the weight is updated according to a set rule, and the next time window is tested with the updated fusion weight. That is, the first time window is tested with the initial fusion weight w_0; when that window ends, the optimal fusion weight w_1 of the window is obtained from its test results, and w_1 is then updated with the following formula:
w'_1 = r × w_0 + (1 - r) × w_1
where r is a memory factor, i.e. the proportion of the previous window's weight carried into the current window's weight during the update.
The second window is then tested with w'_1 as the fusion weight, the update is iterated, and so on. In this way, as time passes, the influence of older results on the fusion becomes smaller.
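A compact sketch of the fusion and of the weight update. The update_weight function matches the rule above, while the grid search used to find each window's optimal weight is an assumed, simple choice, since the text does not specify how that weight is obtained.

```python
import numpy as np

def fuse_predictions(preds_linear, preds_lgbm, w):
    """Blend the two model outputs with weight w on the linear model."""
    return w * np.asarray(preds_linear) + (1.0 - w) * np.asarray(preds_lgbm)

def best_weight(y_true, preds_linear, preds_lgbm, grid=np.linspace(0.0, 1.0, 21)):
    """After a window closes, search the weight that minimises the squared error
    on that window's test results."""
    errors = [np.mean((fuse_predictions(preds_linear, preds_lgbm, w) - np.asarray(y_true)) ** 2)
              for w in grid]
    return float(grid[int(np.argmin(errors))])

def update_weight(w_prev: float, w_best: float, r: float = 0.5) -> float:
    """w'_t = r * w_{t-1} + (1 - r) * w_t, where r is the memory factor that
    controls how much of the previous window's weight carries over."""
    return r * w_prev + (1.0 - r) * w_best
```

Applied window after window, update_weight keeps feeding the previous blended weight forward, so the influence of older windows on the fusion shrinks over time, as described above.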
According to the automatic time series regression method of the embodiment of the invention, the time series data set is preprocessed, subjected to automatic time series feature engineering and sampled, and machine learning models of different types are fused by calculating dynamic weights based on a time sliding window. Thus, in machine learning applications involving time series data, an application model can be obtained conveniently without relying on the experience and accumulated knowledge of data scientists, and more accurate outputs can be obtained with that model.
The invention further provides an automatic time series regression device corresponding to the automatic time series regression method of the embodiment.
As shown in fig. 2, the automatic time series regression apparatus according to the embodiment of the present invention includes a preprocessing module 10, a feature engineering and sampling module 20, a model building module 30, and a fusion module 40. The preprocessing module 10 is configured to acquire a time series data set and preprocess it; the feature engineering and sampling module 20 is configured to perform automatic time series feature engineering and data sampling on the preprocessed data set; the model building module 30 is configured to establish machine learning models of different types; and the fusion module 40 is configured to calculate dynamic weights based on a time sliding window from the preprocessed, feature-engineered and sampled data set so as to fuse the different types of machine learning models.
Time series data are a sequence in which the same indicator is recorded in chronological order. In an embodiment of the invention, the preprocessing module 10 may smooth outliers in the time series data set so as to reduce their impact on model accuracy. Time series tasks usually contain some outlier points, and handling them is effective, but it is also more challenging than handling outliers in non-time-series data. Time series data are generally strongly correlated with time, and the target value may drift into a different value range over time, so if outliers are detected directly with the global mean and standard deviation, some non-outlier points may be treated as outliers. For this reason, the embodiment of the invention adopts combined global and local outlier smoothing.
The method considers the global mean and standard deviation while also considering the mean and standard deviation of the time window adjacent to the current point and the values of its neighbouring points, and it sets a relatively large multiple of the global standard deviation as the threshold so that normal values are not altered. It is also important to note that the training set and the test set are handled somewhat differently: the test data arrive gradually, one time step at a time, so the time steps after the current point are not visible, and the test set is therefore processed using only the data in the adjacent time window before the current point. Once an outlier has been detected in this combined global and local manner, a value in a relatively normal range can be computed from the local mean and standard deviation of the current point and its left and right neighbours, and this value is reassigned as the new value of the current point.
In time-series-related tasks, what happened within some past time window has a large impact on predictions of future outcomes, and the relevant window differs for data of different temporal granularity. The feature engineering and sampling module 20 of the embodiment of the invention therefore starts from the time series itself and mainly builds features around a time sliding window. The features obtained by automatic time series feature engineering include target features based on a time sliding window, target statistical features based on the time sliding window, target trend features based on the time sliding window, important original features based on the time sliding window, statistical features based on the time sliding window, and other features.
For the target features based on the time sliding window: in time series data the target does not deviate much from its values at adjacent time steps, and adjacent values are strongly correlated, so the past adjacent targets can be taken as features first. In addition, the time step interval of the data set is identified, i.e. whether the series steps in hours, minutes, days, weeks or months, and the feature window size is determined from this interval together with a model-validation search.
For the target statistical features based on the time sliding window: after the sliding window is applied to the target, further statistics are computed over it. There are two statistical methods. The first is to compute statistics over the last N steps; the exact windows differ with the time step, and for a daily interval statistics are typically computed over the last 2, 3, 5 and 7 days, with memory limits also taken into account. The second is to divide one large time window into N segments and compute statistics on each segment separately. The statistics include the maximum, minimum, mean, standard deviation, and so on.
For the target trend features based on the time sliding window, the rate of change of the target is calculated to reflect its trend:
r_i = (t_{i-1} - t_{i-2}) / t_{i-2}
where r_i denotes the rate of change of the target at the current time, t_{i-1} denotes the target at the previous time node, and t_{i-2} denotes the target at the time node before that.
For the important original features based on the time sliding window, a model can be trained on the original features to obtain feature importances, and the features are then ranked by importance. Original features other than the target are less important than the historical target, so a smaller time window than that used for the target may be selected, and the number of features used is then determined from the time window and the system's limited resources.
For the statistical features based on the time sliding window, categorical features and numerical features are handled separately. For categorical features, the frequency and ratio with which each value occurs within the time window are computed. For numerical features, the computation is the same as for the target-based statistics, counting the maximum, minimum, mean and standard deviation, but the time window is kept smaller.
As for other features, beyond those above, statistics computed on the training set can be used directly as features of the whole data set: for example, the global frequency and ratio of categorical features in the training set; the frequency and ratio of combinations of two high-importance categorical features; combinations of one high-importance categorical feature with one high-importance numerical feature; and numerical features aggregated by categorical feature. Cross combinations of the historical target with other features may also be considered, for example multiplying or dividing the target by other numerical features of higher importance.
The automatic feature engineering and automatic feature selection stages are usually time- and memory-consuming, so the feature engineering and sampling module 20 can sample the data to speed up the process. Sampling a time series requires care: if rows are sampled at random, data at different timestamps of the same ID are lost, the series become incomplete, and the result is poor and not comparable to that of the full data. The feature engineering and sampling module 20 of the embodiment of the invention therefore randomly samples IDs in the time series data set and uses different sampling ratios for different data volumes: the larger the data, the smaller the sampling ratio. When the data volume is very large, the data are additionally truncated by time step and only the later time steps are retained. Such a sampling scheme gives essentially the same result as using the full data, and the final feature selection is relatively stable.
In an embodiment of the invention, the model building module 30 may build two quite different models: a linear model and a tree model. Specifically, a linear regression model and a LightGBM model may be established.
The linear regression and LightGBM models described above perform quite differently on different time series data sets: on some data sets their results are close, on some the linear regression performs better, and on others the LightGBM model performs better. Analysis shows that the data sets differ greatly in how they evolve over time; in some tasks the target keeps increasing with time, and such data do not suit a tree model. In addition, even on the same data set, the relative performance of different models can change considerably between time periods.
Time series data are strongly tied to time, so to reduce the influence of the time factor on the models, the fusion module 40 can fuse the models by calculating dynamic weights based on a time sliding window.
Specifically, the fusion module 40 may first determine an initial fusion weight w_0 on the validation set, then set a time window for the test set and test the first time window with w_0. After each time window ends, the fusion module 40 obtains the optimal fusion weight for that window from its test results, updates it according to the set rule, and tests the next time window with the updated fusion weight. That is, the fusion module 40 tests the first time window with the initial fusion weight w_0; when that window ends, the optimal fusion weight w_1 of the window is obtained from its test results, and w_1 is then updated with the following formula:
w'_1 = r × w_0 + (1 - r) × w_1
where r is a memory factor, i.e. the proportion of the previous window's weight carried into the current window's weight during the update.
The second window is then tested with w'_1 as the fusion weight, the update is iterated, and so on. In this way, as time passes, the influence of older results on the fusion becomes smaller.
According to the automatic time series regression device of the embodiment of the invention, the time series data set is preprocessed, subjected to automatic time series feature engineering and sampled, and machine learning models of different types are fused by calculating dynamic weights based on a time sliding window. Thus, in machine learning applications involving time series data, an application model can be obtained conveniently without relying on the experience and accumulated knowledge of data scientists, and more accurate outputs can be obtained with that model.
The invention further provides a computer device corresponding to the embodiment.
The computer device according to the embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the automatic time-series regression method according to the above embodiment of the present invention may be implemented.
With the computer device of the embodiment of the invention, when the processor executes the computer program stored in the memory, the time series data set is preprocessed, subjected to automatic time series feature engineering and sampled, and dynamic weights based on a time sliding window are calculated to fuse machine learning models of different types, so that in machine learning applications involving time series data an application model can be obtained conveniently without relying on the experience and accumulated knowledge of data scientists, and more accurate outputs can be obtained with that model.
The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.
A non-transitory computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program that, when executed by a processor, can implement the automatic time series regression method according to the above-described embodiment of the present invention.
With the non-transitory computer-readable storage medium of the embodiment of the invention, when the processor executes the computer program stored thereon, the time series data set is preprocessed, subjected to automatic time series feature engineering and sampled, and dynamic weights based on a time sliding window are calculated to fuse machine learning models of different types, so that in machine learning applications involving time series data an application model can be obtained conveniently without relying on the experience and accumulated knowledge of data scientists, and more accurate outputs can be obtained with that model.
The present invention also provides a computer program product corresponding to the above embodiments.
When the instructions in the computer program product of the embodiment of the present invention are executed by the processor, the automatic time series regression method according to the above-mentioned embodiment of the present invention can be executed.
With the computer program product of the embodiment of the invention, when the processor executes the instructions, the time series data set is preprocessed, subjected to automatic time series feature engineering and sampled, and dynamic weights based on a time sliding window are calculated to fuse machine learning models of different types, so that in machine learning applications involving time series data an application model can be obtained conveniently without relying on the experience and accumulated knowledge of data scientists, and more accurate outputs can be obtained with that model.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, etc.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An automatic time series regression method, characterized by comprising the steps of:
acquiring a time sequence data set, and preprocessing the time sequence data set;
carrying out automatic time sequence characteristic engineering processing and data sampling on the preprocessed time sequence data set;
establishing different types of machine learning models;
and calculating dynamic weight based on a time sliding window according to the time sequence data set after preprocessing, automatic time sequence characteristic engineering processing and data sampling so as to fuse different types of machine learning models.
2. The automated time series regression method of claim 1, wherein preprocessing the time series data set comprises:
and smoothing abnormal points in the time sequence data set.
3. The automatic time series regression method of claim 2, wherein the features obtained by performing automatic time series feature engineering on the preprocessed time series data set comprise target features based on a time sliding window, target statistical features based on the time sliding window, target trend features based on the time sliding window, important original features based on the time sliding window, and statistical features based on the time sliding window.
4. The automated time series regression method of claim 3, wherein data sampling the pre-processed time series data set comprises:
randomly sampling IDs in the time-series data set, wherein different sampling ratios are used for different sizes of data volume.
5. The automated time series regression method of claim 4, wherein the different types of machine learning models comprise linear regression and LightGBM models.
6. The automated time series regression method of claim 5, wherein computing dynamic weights based on a time sliding window from the pre-processed, automated time series feature engineering processed and data sampled time series data sets to fuse different types of machine learning models comprises:
determining an initial fusion weight through the validation set;
setting a time window of a test set, and testing with the initial fusion weight in a first time window;
after each time window is finished, obtaining corresponding optimal fusion weight according to the test result of the time window;
and updating the optimal fusion weight of the time window according to a set rule, and testing the next time window by using the updated fusion weight.
7. An automatic time series regression device, comprising:
the preprocessing module is used for acquiring a time sequence data set and preprocessing the time sequence data set;
the characteristic engineering and sampling module is used for carrying out automatic time sequence characteristic engineering processing and data sampling on the preprocessed time sequence data set;
the model establishing module is used for establishing different types of machine learning models;
and the fusion module is used for calculating dynamic weight based on a time sliding window according to the time sequence data set after preprocessing, automatic time sequence characteristic engineering processing and data sampling so as to fuse different types of machine learning models.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the automated time series regression method according to any one of claims 1-6.
9. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the automated time series regression method according to any one of claims 1-6.
10. A computer program product, characterized in that instructions in the computer program product, when executed by a processor, perform the automated time series regression method according to any one of claims 1-6.
CN202010832356.7A 2020-08-18 Automatic time series regression method and device Active CN111967616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010832356.7A CN111967616B (en) 2020-08-18 Automatic time series regression method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010832356.7A CN111967616B (en) 2020-08-18 Automatic time series regression method and device

Publications (2)

Publication Number Publication Date
CN111967616A true CN111967616A (en) 2020-11-20
CN111967616B CN111967616B (en) 2024-04-23



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6115406A (en) * 1999-09-10 2000-09-05 Interdigital Technology Corporation Transmission using an antenna array in a CDMA communication system
US20070183629A1 (en) * 2006-02-09 2007-08-09 Porikli Fatih M Method for tracking objects in videos using covariance matrices
CN108302329A (en) * 2018-01-25 2018-07-20 福建双环能源科技股份有限公司 A kind of dew point data exception detection method
CN108777873A (en) * 2018-06-04 2018-11-09 江南大学 The wireless sensor network abnormal deviation data examination method of forest is isolated based on weighted blend
CN109299185A (en) * 2018-10-18 2019-02-01 上海船舶工艺研究所(中国船舶工业集团公司第十研究所) A kind of convolutional neural networks for timing flow data extract the analysis method of feature
CN109255506A (en) * 2018-11-22 2019-01-22 重庆邮电大学 A kind of internet finance user's overdue loan prediction technique based on big data
CN110348622A (en) * 2019-07-02 2019-10-18 创新奇智(成都)科技有限公司 A kind of Time Series Forecasting Methods based on machine learning, system and electronic equipment
CN110443373A (en) * 2019-07-12 2019-11-12 清华大学 Linear model stablizes learning method and device
CN110705692A (en) * 2019-09-25 2020-01-17 中南大学 Method for predicting product quality of industrial nonlinear dynamic process by long-short term memory network based on space and time attention

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JAE YOUNG CHOI等: "Combining LSTM Network Ensemble via Adaptive Weighting for Improved Time Series Forecasting", 《MATHEMATICAL PROBLEMS IN ENGINEERING》, pages 1 - 8 *
ZHONG-MIN WANG等: "An Isolation-Based Distributed Outlier Detection Framework Using Nearest Neighbor Ensembles for Wireless Sensor Networks", 《IEEE ACCESS》, pages 96319 - 96333 *
卢山: "Research on Financial Time Series Prediction Technology Based on Nonlinear Dynamics", China Doctoral Dissertations Full-text Database (Economics and Management Sciences), no. 04, pages 160 - 3 *
王景文: "Research on Key Technologies of Visual Understanding Based on Deep Representation", China Doctoral Dissertations Full-text Database (Information Science and Technology), no. 12, pages 138 - 114 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297481A (en) * 2021-05-08 2021-08-24 武汉卓尔数字传媒科技有限公司 Information pushing method, information pushing device and server based on streaming data processing

Similar Documents

Publication Publication Date Title
McFee et al. Analyzing Song Structure with Spectral Clustering.
CN108960269B (en) Feature acquisition method and device for data set and computing equipment
US20080071764A1 (en) Method and an apparatus to perform feature similarity mapping
CN106056136A (en) Data clustering method for rapidly determining clustering center
JPWO2019146189A1 (en) Neural network rank optimizer and optimization method
CN113642938A (en) Intelligent production management method and system
CN110634060A (en) User credit risk assessment method, system, device and storage medium
CN110633859A (en) Hydrological sequence prediction method for two-stage decomposition integration
CN110674940B (en) Multi-index anomaly detection method based on neural network
CN112203324B (en) MR positioning method and device based on position fingerprint database
CN110110447B (en) Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine
CN112445690B (en) Information acquisition method and device and electronic equipment
CN111967616A (en) Automatic time series regression method and device
CN111967616B (en) Automatic time series regression method and device
CN106874286B (en) Method and device for screening user characteristics
CN116977091A (en) Method and device for determining individual investment portfolio, electronic equipment and readable storage medium
CN116468102A (en) Pruning method and device for cutter image classification model and computer equipment
CN112148942A (en) Business index data classification method and device based on data clustering
CN107203916B (en) User credit model establishing method and device
CN114418097A (en) Neural network quantization processing method and device, electronic equipment and storage medium
CN114169758A (en) Air quality data determination method and device, readable storage medium and electronic equipment
CN112700275A (en) Product production method and platform based on big data
CN113448876A (en) Service testing method, device, computer equipment and storage medium
CN113296947A (en) Resource demand prediction method based on improved XGboost model
CN111177465A (en) Method and device for determining category

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant