WO2019214143A1 - Server, financial time sequence data processing method and storage medium - Google Patents

Info

Publication number
WO2019214143A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
bits
missing value
intercepted
time series
Prior art date
Application number
PCT/CN2018/107678
Other languages
French (fr)
Chinese (zh)
Inventor
李正洋
李海疆
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to JP2019556878A (patent JP6812573B2)
Publication of WO2019214143A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • The present application relates to the field of data processing technologies, and in particular to a server, a method for processing financial time series data, and a storage medium.
  • Financial time series data has the statistical characteristics of a time series and falls into many categories.
  • Price-type financial time series data includes the opening price, closing price, highest price, lowest price, and trading volume of stocks, futures, foreign exchange, and so on.
  • Financial time series data of derivative indicators includes: the spread between the ChinaBond government bond yield to maturity and the ChinaBond corporate bond yield to maturity, the risk premium, the dividend rate, the CR index, the large-cap to small-cap ratio, the RSRS indicator, the CSI 300 premium rate, and the CSI 300 active buy amount.
  • Financial time series data may be missing for various reasons, for example:
  • 1. the stock of a listed company is suspended, so that information such as the opening price, closing price, highest price, lowest price, and trading volume of that day is lost;
  • 2. the platform cannot obtain the corresponding financial time series data;
  • 3. the financial time series data obtained from a public platform deviates significantly from the actual value; and so on.
  • Traditional methods for handling missing values include manual filling, special-value filling, mean filling, nearest-value filling, cluster filling, and so on.
  • The missing values obtained by these simple traditional methods are inaccurate and cannot reproduce the distribution of real financial time series data to the greatest extent, which easily causes information loss and affects subsequent research on the financial time series data.
  • The purpose of the present application is to provide a server, a method for processing financial time series data, and a storage medium, which are intended to predict accurate and objective missing values.
  • The present application provides a server including a memory and a processor coupled to the memory, the memory storing a processing system operable on the processor, the processing system implementing the following steps when executed by the processor:
  • the predetermined recurrent neural network model is trained separately with the sample data corresponding to each predetermined time step, and the trained model corresponding to each predetermined time step is used as a prediction model;
  • the data to be input is input into each prediction model, the predicted value output by each prediction model is obtained, and the average of the predicted values is used as the filling value of the missing value.
  • the present application further provides a method for processing financial time series data, and the method for processing the financial time series data includes:
  • S4: input the data to be input into each prediction model, obtain the predicted value output by each prediction model, and use the average of the predicted values as the filling value of the missing value.
  • The application further provides a computer readable storage medium having a processing system stored thereon, the processing system implementing the following steps when executed by a processor:
  • the predetermined recurrent neural network model is trained separately with the sample data corresponding to each predetermined time step, and the trained model corresponding to each predetermined time step is used as a prediction model;
  • the data to be input is input into each prediction model, the predicted value output by each prediction model is obtained, and the average of the predicted values is used as the filling value of the missing value.
  • The present application uses recurrent neural network models to process and predict missing values in financial time series data, and can therefore capture dependencies between earlier and later points of the data.
  • Because the filling value of a missing value is given by the average of multiple models, it is more objective and accurate, and can restore the overall distribution of real financial time series data to the greatest extent.
  • FIG. 1 is a schematic diagram of a hardware architecture of an embodiment of a server according to the present application.
  • FIG. 2 is a schematic structural view of an LSTM model
  • FIG. 3 is a schematic structural view of the modified LSTM model shown in FIG. 2;
  • FIG. 4 is a schematic flowchart diagram of an embodiment of a method for processing financial time series data according to the present application.
  • Referring to FIG. 1, a schematic diagram of the hardware architecture of an embodiment of the server 1 according to the present application is shown.
  • The server 1 is a device capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are set or stored in advance.
  • The server 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 communicably connected to each other through a system bus, and the memory 11 stores a processing system operable on the processor 12. It is pointed out that Figure 1 only shows the server 1 with the components 11-13, but it should be understood that not all illustrated components are required to be implemented, and more or fewer components may be implemented instead.
  • The memory 11 includes an internal memory and at least one type of readable storage medium.
  • The internal memory provides a cache for the operation of the server 1.
  • The readable storage medium may be, for example, a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), or a non-volatile storage medium such as a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, or the like.
  • In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as a hard disk of the server 1; in other embodiments, it may be an external storage device of the server 1, for example a plug-in hard disk provided on the server 1, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, and the like.
  • The readable storage medium of the memory 11 is generally used to store the operating system installed on the server 1 and various types of application software, for example the program code of the processing system in an embodiment of the present application. The memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • The processor 12 is typically used to control the overall operation of the server 1, for example performing control and processing related to data interaction or communication with other devices.
  • The processor 12 is configured to run the program code or process the data stored in the memory 11, for example to run the processing system.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the server 1 and other electronic devices.
  • the network interface 13 is mainly used to connect the server 1 with one or more terminal devices 2, and establish a data transmission channel and a communication connection between the server 1 and one or more terminal devices 2.
  • the processing system is stored in the memory 11 and includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement the methods of various embodiments of the present application;
  • the at least one computer readable instruction can be classified into different logic modules depending on the functions implemented by its various parts.
  • The predetermined time steps include 6 time units, 11 time units, and 16 time units. A time unit is the granularity unit of the financial time series data: for example, for financial time series data with daily granularity the time unit is a day, and for high-frequency financial time series data at minute granularity the time unit is a minute.
  • For a sliding window of 6 time units, each window contains 6 data points (hereinafter "bits"), and the sample data obtained by sampling also has 6 bits.
  • For a sliding window of 11 time units, each window contains 11 bits and the sample data has 6 bits: the sample data is (x1, x3, x5, x7, x9, x11), i.e., the 1st, 3rd, 5th, 7th, 9th, and 11th bits of the window data.
  • For a sliding window of 16 time units, each window contains 16 bits and the sample data has 6 bits: the sample data is (x1, x4, x7, x10, x13, x16), i.e., the 1st, 4th, 7th, 10th, 13th, and 16th bits of the window data.
  • The purpose of setting sliding windows with different predetermined time steps is to capture longer-range relationships in the data without changing the length of the sample data.
  • The financial time series data without missing values is sampled to obtain sample data, and the sample data is used to train the models so as to obtain models with higher accuracy.
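  • The windowing and sampling described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `make_samples`, the hop of one time unit between consecutive windows, and the toy series are assumptions; the window lengths 6, 11, and 16 and the 6-bit samples come from the text.

```python
from typing import Dict, List, Sequence

# Window length (in time units) -> sampling stride inside the window.
# Strides 1, 2, 3 turn windows of 6, 11, 16 points into 6-point samples.
WINDOW_STRIDES: Dict[int, int] = {6: 1, 11: 2, 16: 3}


def make_samples(series: Sequence[float], window: int, stride: int,
                 hop: int = 1) -> List[List[float]]:
    """Slide a window over a gap-free series and subsample each window.

    `hop` (how far the window moves each time) is an assumption; the
    source does not state it.
    """
    samples = []
    for start in range(0, len(series) - window + 1, hop):
        window_data = series[start:start + window]
        samples.append(list(window_data[::stride]))
    return samples


# Toy gap-free series x1..x20 (each value equals its 1-based position).
series = list(range(1, 21))
samples_by_step = {w: make_samples(series, w, s)
                   for w, s in WINDOW_STRIDES.items()}
print(samples_by_step[11][0])  # [1, 3, 5, 7, 9, 11]
print(samples_by_step[16][0])  # [1, 4, 7, 10, 13, 16]
```

  • Every sample has 6 points regardless of window length, so the same model input size can be used for all three time steps while the wider windows span a longer stretch of history.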
  • The predetermined recurrent neural network model is trained separately with the sample data corresponding to each predetermined time step, and the trained model corresponding to each predetermined time step is used as a prediction model;
  • The predetermined recurrent neural network model is a hybrid model of two or more recurrent neural networks, preferably a hybrid model composed of a Long Short-Term Memory (LSTM) model and a Gated Recurrent Unit (GRU) model. Both the LSTM model and the GRU model can be used to capture dependencies between earlier and later points of a time series.
  • This step includes: dividing the sample data corresponding to each predetermined time step into a training set of a first ratio and a test set of a second ratio, where the sum of the first ratio and the second ratio is less than or equal to 1, and training the predetermined recurrent neural network model with the training set corresponding to each predetermined time step; extracting a predetermined number of sample data from the training set corresponding to each predetermined time step as a validation set, and using the validation set to test the parameters of the recurrent neural network model during training;
  • if the test error is greater than or equal to a predetermined error threshold, terminating the training to obtain the trained recurrent neural network model;
  • testing the accuracy of the trained recurrent neural network model with the test set; if the accuracy is greater than or equal to a predetermined accuracy threshold, using the trained recurrent neural network model as a prediction model; if the accuracy is less than the predetermined accuracy threshold, modifying the hidden layer structure of the recurrent neural network model and retraining to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
  • Because the sample data corresponding to each predetermined time step can be regarded as independent and identically distributed, the training set and the test set are drawn by simple random sampling, with the training set accounting for 70% and the test set for 30%; for example, the training set includes 70,000 sample data and the test set includes 30,000 sample data.
  • Training is performed by cross-validation: the sample data in the training set is divided into 10 parts, and each time 9 parts are used for training and 1 part is used as the validation set, on which the parameters of the recurrent neural network model are tested during training.
  • The model is trained on the training data and tested on the validation set. As the number of training iterations increases, if the test error on the validation set is found to be greater than or equal to a predetermined error threshold, training is stopped to obtain the trained recurrent neural network model.
  • The trained recurrent neural network model is then used for the test on the test set described below; stopping in this way effectively avoids over-fitting of the model.
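  • The data partitioning and stopping rule described above can be sketched as follows. This is only an illustrative sketch: the function names, the fixed random seed, and the way folds are formed are assumptions, and the model training itself is omitted; the 70/30 split, the 10 parts, and the error-threshold rule come from the text.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_train_test(samples: Sequence, train_ratio: float = 0.7,
                     seed: int = 0) -> Tuple[list, list]:
    """Randomly split samples into a training set and a test set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = round(len(samples) * train_ratio)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


def k_fold_indices(n_items: int,
                   k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx); each part serves once as validation set."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def should_stop(val_error: float, error_threshold: float) -> bool:
    """Stop training once the validation error reaches the threshold."""
    return val_error >= error_threshold


samples = list(range(100))            # stand-in for 6-point samples
train_set, test_set = split_train_test(samples)
folds = list(k_fold_indices(len(train_set)))
print(len(train_set), len(test_set), len(folds))  # 70 30 10
```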
  • The training set is used to train the LSTM model, whose structure may be a bi-directional LSTM structure. Each sample in the training set has the form (X1, X2, X3, X4, X5, X6). As shown in FIG. 2, (X1, X2, X3, X4, X5) are the input layer, A is the hidden layer, and St is the output.
  • The hidden layer A is the memory unit of the LSTM model and carries the parameters of the model; it is calculated from the input of the current input layer and the output of the hidden layer of the previous step.
  • The output St is compared with X6 in the sample data, and the test result indicates the ability of the model to characterize the distribution of the financial time series data. If the accuracy of the LSTM model is greater than or equal to a predetermined accuracy threshold (e.g., 0.9), the LSTM model meets the requirements and the trained LSTM model is used as a prediction model; if the accuracy of the LSTM model is less than the predetermined accuracy threshold, the LSTM model does not meet the requirements and its hidden layer structure is modified. As shown in FIG. 3, in this embodiment the hidden layer corresponding to the input sample data is changed from a single hidden layer to a stacked double hidden layer structure, and the model is retrained to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
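  • To illustrate how the hidden layer is computed from the current input and the previous hidden output, one step of a standard LSTM unit can be sketched with scalar weights. This is a textbook formulation, not the applicant's model: the parameter names and values are hypothetical and untrained, and reading St directly from the last hidden output is an assumption.

```python
import math
from typing import Dict, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              p: Dict[str, float]) -> Tuple[float, float]:
    """One step of a standard LSTM unit with scalar weights."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g     # updated cell memory
    h = o * math.tanh(c)       # hidden output passed to the next step
    return h, c


def run_lstm(inputs: Sequence[float], p: Dict[str, float]) -> float:
    """Feed the inputs in order and return the final hidden output."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, p)
    return h


NAMES = ("wf", "uf", "bf", "wi", "ui", "bi",
         "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.1 for name in NAMES}    # hypothetical, untrained values
sample = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)   # (X1, ..., X6)
s_t = run_lstm(sample[:5], params)        # output St computed from X1..X5
error = abs(s_t - sample[5])              # St is compared with X6
```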
  • The structure of the GRU model is similar to that of the LSTM model, except for the structure of the hidden layer: a GRU unit has two gates (update and reset) and no separate cell state, which makes it simpler than an LSTM unit.
  • the GRU model is trained by using the same training set as above.
  • the process of training the GRU model is basically consistent with the training of the LSTM model, and extracting part of the sample data as a verification set in the training set can effectively avoid over-fitting of the model.
  • the trained GRU model is tested by using the test set, so that the accuracy of the GRU model is greater than or equal to a predetermined accuracy threshold. If the accuracy of the GRU model is less than the accuracy threshold, then the structure of the GRU model is modified. The modification is similar to the LSTM model.
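  • For comparison with the LSTM unit, one step of a standard GRU unit can be sketched in the same scalar form. Again this is a textbook formulation with hypothetical parameter names, not the applicant's model; note that the convention for which term the update gate multiplies varies between references.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x: float, h_prev: float, p: Dict[str, float]) -> float:
    """One step of a standard GRU unit with scalar weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    # Candidate state: the reset gate scales how much history is used.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate (one common convention).
    return (1.0 - z) * h_prev + z * h_cand


GRU_NAMES = ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")
zero_gru = {name: 0.0 for name in GRU_NAMES}   # hypothetical values
h_next = gru_step(0.5, 1.0, zero_gru)
print(h_next)  # 0.5
```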
  • a hybrid model composed of LSTM model + GRU model corresponding to each predetermined time step is obtained as a prediction model.
  • For financial time series data with missing values, the position of each missing value is located first. Since financial time series data is ordered in time, the position of a missing value can be located by the time point at which it occurs. The number of bits of each missing value, that is, the number of consecutive missing data points, is then determined, for example 1 bit or 2 bits.
  • The number of bits of financial time series data to be input into the models is determined according to the number of bits of the missing value to be predicted, and that many bits of data in front of the missing value are intercepted as the data to be input.
  • the number of bits of the missing value is generally 1 or 2 bits
  • the data to be input is preferably 5 bits, 6 bits or 7 bits, and less than 5 bits and more than 7 bits are usually difficult to achieve better results because less than 5
  • As shown in Table 1, the correspondence between the number of bits of the missing value and the number of bits of the data to be input is:

    Table 1
    Number of bits of the missing value | Number of bits of the data to be input
    1                                   | 5
    1                                   | 6
    2                                   | 6
    1                                   | 7
    2                                   | 7
  • According to Table 1, if the number of bits of the missing value is 1, the number of bits of the intercepted data is determined to be 5, 6, or 7, and the 5, 6, or 7 bits of financial time series data in front of the position of the missing value are intercepted and used as the data to be input; if the number of bits of the missing value is 2, the number of bits of the intercepted data is determined to be 6 or 7, and the 6 or 7 bits of financial time series data in front of the position of the missing value are intercepted and used as the data to be input.
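  • Locating the missing values and intercepting the data in front of them can be sketched as follows. This is a minimal sketch: representing a missing value as `None`, the function names, and the error handling are assumptions; the allowed combinations are those of Table 1.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Table 1: number of missing bits -> allowed numbers of input bits.
ALLOWED_INPUT_BITS: Dict[int, Tuple[int, ...]] = {1: (5, 6, 7), 2: (6, 7)}


def find_missing_runs(
        series: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_index, run_length) for each run of missing values."""
    runs = []
    i = 0
    while i < len(series):
        if series[i] is None:
            start = i
            while i < len(series) and series[i] is None:
                i += 1
            runs.append((start, i - start))
        else:
            i += 1
    return runs


def intercept_inputs(series: Sequence[Optional[float]], start: int,
                     missing_bits: int, input_bits: int) -> List[float]:
    """Take `input_bits` values immediately in front of the missing run."""
    if input_bits not in ALLOWED_INPUT_BITS.get(missing_bits, ()):
        raise ValueError("combination not listed in Table 1")
    if start < input_bits:
        raise ValueError("not enough data in front of the missing value")
    window = list(series[start - input_bits:start])
    if any(v is None for v in window):
        raise ValueError("input window itself contains a missing value")
    return window


series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, None, None, 19.0]
runs = find_missing_runs(series)
start, missing_bits = runs[0]
to_input = intercept_inputs(series, start, missing_bits, input_bits=6)
print(runs, to_input)  # [(7, 2)] [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
```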
  • the data to be input is input to each prediction model, and the predicted values output by the respective prediction models are obtained, and the average value of each predicted value is obtained as the filling value of the missing value.
  • The data to be input is input into each prediction model, each of which is a hybrid model composed of a GRU model and an LSTM model; that is, it is input into the hybrid model corresponding to 6 time units, the hybrid model corresponding to 11 time units, and the hybrid model corresponding to 16 time units, respectively, and the predicted values they output are averaged to give the filling value.
  • When the missing value has 2 bits, the filling value for each position is likewise the average of the predicted values output for the corresponding position.
  • The filling value V of the missing value thus reflects the dependencies between earlier and later points of the financial time series data and, being the average of the three hybrid models, is more objective and accurate.
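  • The averaging step can be sketched as follows. The three lambda functions are hypothetical stand-ins for the trained hybrid models, used only so that the example runs; the position-by-position average is what the text describes.

```python
from typing import Callable, List, Sequence

# A model maps the data to be input to one prediction per missing position.
Model = Callable[[Sequence[float]], List[float]]


def fill_value(models: Sequence[Model],
               to_input: Sequence[float]) -> List[float]:
    """Average, position by position, the predictions of all models."""
    predictions = [model(to_input) for model in models]
    n_positions = len(predictions[0])
    if any(len(p) != n_positions for p in predictions):
        raise ValueError("models must predict the same number of positions")
    return [sum(p[k] for p in predictions) / len(predictions)
            for k in range(n_positions)]


# Hypothetical stand-ins for the hybrid models of 6, 11, and 16 time units.
model_6 = lambda xs: [xs[-1] + 1.0, xs[-1] + 2.0]
model_11 = lambda xs: [xs[-1] + 2.0, xs[-1] + 3.0]
model_16 = lambda xs: [xs[-1] + 3.0, xs[-1] + 4.0]

filled = fill_value([model_6, model_11, model_16],
                    [11.0, 12.0, 13.0, 14.0, 15.0, 16.0])
print(filled)  # [18.0, 19.0]
```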
  • The present application sets sliding windows with different time steps to intercept data from financial time series data without missing values, and then samples the intercepted data to obtain sample data corresponding to the different time steps. The sample data is divided into training sets and test sets to train a predetermined recurrent neural network model, yielding prediction models corresponding to the different time steps.
  • For financial time series data with missing values, the position of the missing value is located and its number of bits is determined; according to the position and number of bits of the missing value, the financial time series data in front of the position of the missing value is intercepted and input into each prediction model, the predicted value output by each prediction model is obtained, and the average of the predicted values is used as the filling value of the missing value.
  • By using recurrent neural network models to process and predict the missing values in financial time series data, the application can capture the dependencies between earlier and later points of the data; because the filling value of the missing value is given by the average of several models, it is more objective and accurate, and can restore the overall distribution of real financial time series data to the greatest extent.
  • FIG. 4 is a schematic flowchart diagram of an embodiment of a method for processing financial time series data according to the present application.
  • the method for processing the financial time series data includes the following steps:
  • Step S1 setting a sliding window with different predetermined time steps, using the set sliding window to slide on the financial time series data without missing values to obtain multiple window data, and sampling each window data to obtain corresponding predetermined time steps Sample data;
  • Step S2 training the predetermined cyclic neural network model by using sample data corresponding to each predetermined time step, and obtaining a model corresponding to each predetermined time step after the training as a prediction model;
  • Step S3: acquire financial time series data with missing values, obtain the position of the missing value in the financial time series data and the number of bits of the missing value, intercept the financial time series data in front of the position of the missing value according to the position and number of bits of the missing value, and use the intercepted data as the data to be input;
  • As shown in Table 1 above, if the number of bits of the missing value is 1, the number of bits of the intercepted data is determined to be 5, 6, or 7, and the 5, 6, or 7 bits of financial time series data in front of the position of the missing value are intercepted and used as the data to be input; if the number of bits of the missing value is 2, the number of bits of the intercepted data is determined to be 6 or 7, and the 6 or 7 bits of financial time series data in front of the position of the missing value are intercepted and used as the data to be input.
  • step S4 the data to be input is input to each prediction model, and the predicted value outputted by each prediction model is obtained, and the average value of each predicted value is obtained as the filling value of the missing value.
  • the present application also provides a computer readable storage medium having stored thereon a processing system, the processing system being executed by a processor to implement the steps of the processing method of the financial time series data described above.
  • the foregoing embodiment method can be implemented by means of software plus a necessary general hardware platform, and of course, can also be through hardware, but in many cases, the former is better.
  • Implementation Based on such understanding, the technical solution of the present application, which is essential or contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, disk,
  • the optical disc includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in various embodiments of the present application.

Abstract

The present application relates to a server, a method for processing financial time series data, and a storage medium. The method comprises: setting sliding windows with different predetermined time steps, sliding the windows over financial time series data that contains no missing values to obtain multiple pieces of window data, and sampling each piece of window data to obtain sample data; training a predetermined recurrent neural network model on each set of sample data and using each trained model as a prediction model; obtaining financial time series data that contains a missing value, obtaining the position of the missing value and the number of missing data points, intercepting the financial time series data immediately preceding the missing value according to that position and number, and using the intercepted data as the data to be input; and inputting the data to be input into each prediction model and taking the average of the predicted values output by the prediction models as the fill value for the missing value. The present application can predict accurate and objective values for missing data.

Description

Server, financial time series data processing method, and storage medium
Priority claim
Under the Paris Convention, the present application claims priority to Chinese patent application No. CN2018104414146, entitled "Server, financial time series data processing method, and storage medium" and filed on May 10, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of data processing technologies, and in particular to a server, a method for processing financial time series data, and a storage medium.
Background
Financial time series data has the statistical characteristics of a time series and falls into many categories. For example, price-and-volume financial time series data includes the opening price, closing price, highest price, lowest price, and trading volume of underlying assets such as stocks, futures, and foreign exchange. Derived-indicator financial time series data includes the ChinaBond government bond yield to maturity minus the ChinaBond corporate bond yield to maturity, the risk premium, the dividend yield, the CR indicator, the ratio of large-cap to small-cap turnover rates, the RSRS indicator, the CSI 300 premium rate, the CSI 300 active buy amount, and so on. In practice, financial time series data may be missing for various reasons, for example: (1) trading in a listed company's stock is suspended, so the opening price, closing price, highest price, lowest price, trading volume, and other information for that day are lost; (2) the corresponding financial time series data cannot be obtained from public platforms; (3) the financial time series data obtained from public platforms deviates significantly from the actual values.
Traditional methods for handling missing values include manual filling, special-value filling, mean filling, nearest-neighbor filling, cluster-based filling, and so on. However, because financial time series data exhibits temporal dependence, the values produced by these simple traditional methods are inaccurate and cannot reproduce the distribution of the real financial time series data as closely as possible. This easily causes information loss and affects subsequent research on the data.
Summary of the invention
The purpose of the present application is to provide a server, a method for processing financial time series data, and a storage medium, with the aim of predicting accurate and objective values for missing data.
To achieve the above purpose, the present application provides a server comprising a memory and a processor connected to the memory, the memory storing a processing system executable on the processor, wherein the processing system, when executed by the processor, implements the following steps:
setting sliding windows with different predetermined time steps, sliding the windows over financial time series data that contains no missing values to obtain multiple pieces of window data, and sampling each piece of window data to obtain sample data corresponding to each predetermined time step;
training a predetermined recurrent neural network model separately on the sample data corresponding to each predetermined time step, and using the trained model corresponding to each predetermined time step as a prediction model;
obtaining financial time series data that contains a missing value, obtaining the position of the missing value in that data and the number of missing data points, intercepting the financial time series data immediately preceding the position of the missing value according to that position and number, and using the intercepted data as the data to be input;
inputting the data to be input into each prediction model, obtaining the predicted value output by each prediction model, and taking the average of the predicted values as the fill value for the missing value.
To achieve the above purpose, the present application further provides a method for processing financial time series data, the method comprising:
S1, setting sliding windows with different predetermined time steps, sliding the windows over financial time series data that contains no missing values to obtain multiple pieces of window data, and sampling each piece of window data to obtain sample data corresponding to each predetermined time step;
S2, training a predetermined recurrent neural network model separately on the sample data corresponding to each predetermined time step, and using the trained model corresponding to each predetermined time step as a prediction model;
S3, obtaining financial time series data that contains a missing value, obtaining the position of the missing value in that data and the number of missing data points, intercepting the financial time series data immediately preceding the position of the missing value according to that position and number, and using the intercepted data as the data to be input;
S4, inputting the data to be input into each prediction model, obtaining the predicted value output by each prediction model, and taking the average of the predicted values as the fill value for the missing value.
The present application further provides a computer-readable storage medium storing a processing system, wherein the processing system, when executed by a processor, implements the following steps:
setting sliding windows with different predetermined time steps, sliding the windows over financial time series data that contains no missing values to obtain multiple pieces of window data, and sampling each piece of window data to obtain sample data corresponding to each predetermined time step;
training a predetermined recurrent neural network model separately on the sample data corresponding to each predetermined time step, and using the trained model corresponding to each predetermined time step as a prediction model;
obtaining financial time series data that contains a missing value, obtaining the position of the missing value in that data and the number of missing data points, intercepting the financial time series data immediately preceding the position of the missing value according to that position and number, and using the intercepted data as the data to be input;
inputting the data to be input into each prediction model, obtaining the predicted value output by each prediction model, and taking the average of the predicted values as the fill value for the missing value.
The beneficial effects of the present application are as follows. The present application uses recurrent neural network models to process and predict missing values in financial time series data, which captures the dependencies between earlier and later points in the data. The fill value for a missing value is given by the average over multiple models, which is more objective and accurate and restores the overall distribution of the real financial time series data as closely as possible.
Brief description of the drawings
FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of the server of the present application;
FIG. 2 is a schematic structural diagram of the LSTM model;
FIG. 3 is a schematic structural diagram of the LSTM model of FIG. 2 after modification;
FIG. 4 is a schematic flowchart of an embodiment of the method for processing financial time series data of the present application.
Detailed description
To make the purposes, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and do not limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present application, without creative effort, fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only to the extent that a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, the combination should be regarded as not existing and as falling outside the scope of protection claimed by the present application.
FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of the server of the present application. The server 1 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The server 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicatively connected to one another through a system bus; the memory 11 stores a processing system executable on the processor 12. Note that FIG. 1 shows only the server 1 with components 11-13. It should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the server 1. The readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as the hard disk of the server 1. In other embodiments, the non-volatile storage medium may be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the server 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the server 1, for example the program code of the processing system in an embodiment of the present application. The memory 11 may also be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the server 1, for example to perform control and processing related to data exchange or communication with other devices. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the server 1 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the server 1 to one or more terminal devices 2 and to establish a data transmission channel and a communication connection between the server 1 and the one or more terminal devices 2.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of the present application, and can be divided into different logical modules according to the functions implemented by its parts.
In one embodiment, the processing system, when executed by the processor 12, implements the following steps:
setting sliding windows with different predetermined time steps, sliding the windows over financial time series data that contains no missing values to obtain multiple pieces of window data, and sampling each piece of window data to obtain sample data corresponding to each predetermined time step;
The predetermined time steps include 6 time units, 11 time units, and 16 time units. A time unit is the granularity of the financial time series data: for daily data the time unit is a day; for high-frequency data at minute granularity the time unit is a minute; and so on.
For a sliding window of 6 time units, the window data contains 6 data points and the sample data obtained by sampling contains 6 data points. For a sliding window of 11 time units, the window data contains 11 data points and the sample data contains 6 data points, for example (x1, x3, x5, x7, x9, x11), that is, the 1st, 3rd, 5th, 7th, 9th, and 11th data points of the window. For a sliding window of 16 time units, the window data contains 16 data points and the sample data contains 6 data points, for example (x1, x4, x7, x10, x13, x16), that is, the 1st, 4th, 7th, 10th, 13th, and 16th data points of the window.
The purpose of setting sliding windows with different predetermined time steps is to extend how far back the captured information reaches, and the relationships it covers, while keeping the length of the sample data unchanged. The sample data is obtained by sampling financial time series data that contains no missing values and is then used to train the models, so as to obtain models with higher accuracy.
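The window-and-sample step described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the sampling strides 1, 2, and 3 are inferred from the example index patterns (x1, x3, ..., x11) and (x1, x4, ..., x16), while the name `sample_windows`, the toy series, and the choice to slide the window one time unit at a time are assumptions.

```python
from typing import Dict, List, Sequence

# Window length (in time units) -> sampling stride inside the window.
# Strides are inferred from the example index patterns in the text.
WINDOW_STRIDES: Dict[int, int] = {6: 1, 11: 2, 16: 3}


def sample_windows(series: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Slide a window of length `window` over `series` one step at a time
    (assumed slide step) and keep every `stride`-th point of each window."""
    samples = []
    for start in range(len(series) - window + 1):
        window_data = series[start:start + window]
        samples.append(list(window_data[::stride]))
    return samples


# Toy complete series (no missing values): 1.0, 2.0, ..., 20.0
series = [float(i) for i in range(1, 21)]
samples_by_step = {w: sample_windows(series, w, s) for w, s in WINDOW_STRIDES.items()}
```

Each sample has 6 data points regardless of the window length, so the three models see inputs of the same size that span 6, 11, and 16 time units respectively.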
training a predetermined recurrent neural network model separately on the sample data corresponding to each predetermined time step, and using the trained model corresponding to each predetermined time step as a prediction model;
The predetermined recurrent neural network model is a hybrid of two or more recurrent neural networks, preferably a hybrid of a long short-term memory (LSTM) model and a gated recurrent unit (GRU) model. Both the LSTM model and the GRU model can capture the dependencies between earlier and later points in a time series.
In one embodiment, this step includes: dividing the sample data corresponding to each predetermined time step into a training set of a first proportion and a test set of a second proportion, and training the predetermined recurrent neural network model on the training set corresponding to each predetermined time step, where the sum of the first proportion and the second proportion is less than or equal to 1; drawing a predetermined number of samples from the training set corresponding to each predetermined time step as a validation set, testing the parameters of the model under training on the validation set, and ending training when the test error is greater than or equal to a predetermined error threshold, to obtain the trained recurrent neural network model; testing the accuracy of the trained model on the test set; if the accuracy is greater than or equal to a predetermined accuracy threshold, using the trained model as the prediction model; if the accuracy is less than the predetermined accuracy threshold, modifying the hidden-layer structure of the model and retraining, to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
Because the sample data corresponding to each predetermined time step can be regarded as independent and identically distributed, the training set and the test set are drawn by random sampling. The training set accounts for 70% of the samples and the test set for 30%; for example, the training set contains 70,000 samples and the test set contains 30,000 samples.
Preferably, training on the training set uses cross-validation: the samples in the training set are divided into 10 parts, and each time 9 parts are used for training while the remaining part serves as the validation set on which the parameters of the model under training are tested. The model is trained on the training parts and evaluated on the validation set. As the number of training iterations increases, if the test error on the validation set is found to rise, that is, the test error becomes greater than or equal to the predetermined error threshold, training is stopped, and the resulting trained model is the one evaluated on the test set as described below. This effectively avoids overfitting.
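The data handling in this step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the applicant's code: the function names are hypothetical, the validation errors are passed in as a precomputed list so that no model is needed, and the stopping rule is one possible reading of the text (stop once the validation error has risen above its lowest value so far by at least the error threshold).

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_train_test(samples: Sequence, train_ratio: float = 0.7,
                     seed: int = 0) -> Tuple[list, list]:
    """Randomly split the samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def k_folds(train: Sequence, k: int = 10) -> Iterator[Tuple[list, list]]:
    """Yield (fit part, validation part) pairs: k-1 parts fit, 1 part validates."""
    folds = [list(train[i::k]) for i in range(k)]
    for i in range(k):
        fit_part = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield fit_part, folds[i]


def epochs_until_stop(validation_errors: Sequence[float],
                      error_threshold: float) -> int:
    """Return the epoch at which training stops.

    Assumed rule: stop when the validation error exceeds the best error
    seen so far by at least `error_threshold`.
    """
    best = float("inf")
    for epoch, err in enumerate(validation_errors, start=1):
        if err - best >= error_threshold:
            return epoch
        best = min(best, err)
    return len(validation_errors)


train_set, test_set = split_train_test(range(100))
fit_part, validation_part = next(k_folds(train_set))
stop_epoch = epochs_until_stop([0.9, 0.5, 0.3, 0.31, 0.45], error_threshold=0.1)
```

In practice the validation error would be measured after each training epoch of the LSTM or GRU model; the list above only stands in for those measurements.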
Specifically, the LSTM model is trained on the training set. The LSTM model may adopt a bi-directional LSTM structure. Each sample in the training set comprises (X1, X2, X3, X4, X5, X6). As shown in FIG. 2, (X1, X2, X3, X4, X5) is the input layer, A is the hidden layer, and St is the output. The hidden layer A is the memory unit of the LSTM model and a parameter of the model; it is computed from the current input of the input layer and the output of the hidden layer at the previous step. When the accuracy of the trained LSTM model is tested on the test set, the output St is compared with X6 of the sample; the test result indicates how well the model characterizes the distribution of the financial time series data. If the accuracy of the LSTM model is greater than or equal to the predetermined accuracy threshold (for example, 0.9), the LSTM model meets the requirement and the trained LSTM model is used as a prediction model. If the accuracy is less than the predetermined accuracy threshold, the LSTM model does not meet the requirement and its hidden-layer structure is modified. As shown in FIG. 3, in this embodiment the hidden layer for the sample data input at each time point is changed from a single hidden layer to a stack of two hidden layers, and training is repeated to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
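The accept-or-retrain logic above can be sketched as follows. The text does not define how accuracy is computed when St is compared with X6, so the `accuracy` function here (the share of predictions within a relative tolerance of the target) is purely an assumption, as are the names `select_model` and `train_and_score`; the stub scores stand in for real training runs.

```python
from typing import Callable, Sequence, Tuple


def accuracy(predictions: Sequence[float], targets: Sequence[float],
             tolerance: float = 0.05) -> float:
    """Assumed metric: share of predictions within a relative tolerance of X6."""
    hits = sum(abs(p - t) <= tolerance * abs(t) for p, t in zip(predictions, targets))
    return hits / len(targets)


def select_model(train_and_score: Callable[[int], float],
                 accuracy_threshold: float = 0.9,
                 max_hidden_layers: int = 2) -> Tuple[int, float]:
    """Try a single hidden layer first, then a stack of two, and return the
    first configuration whose test accuracy reaches the threshold."""
    for hidden_layers in range(1, max_hidden_layers + 1):
        score = train_and_score(hidden_layers)
        if score >= accuracy_threshold:
            return hidden_layers, score
    raise RuntimeError("no configuration reached the accuracy threshold")


# Stub standing in for "train with n hidden layers, then score on the test set".
stub_scores = {1: 0.85, 2: 0.93}
chosen_layers, chosen_score = select_model(lambda n: stub_scores[n])
```

With a real framework, `train_and_score` would build the LSTM (or GRU) with the given number of stacked hidden layers, train it on the training set, and return its accuracy on the test set.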
The GRU model is structurally similar to the LSTM model and differs only in the structure of the hidden layer (a standard GRU unit has fewer gates than an LSTM unit). The GRU model is trained on the same training set as above, and the process is essentially the same as for the LSTM model; part of the training set is again drawn as a validation set, which effectively avoids overfitting. After training, the trained GRU model is tested on the test set so that its accuracy is greater than or equal to the predetermined accuracy threshold. If the accuracy of the GRU model is less than the threshold, the structure of the GRU model is modified in a manner similar to that used for the LSTM model.
Through the above training and testing, a hybrid model combining the LSTM model and the GRU model is fitted for each predetermined time step and serves as the prediction model.
obtaining financial time series data that contains a missing value, obtaining the position of the missing value in that data and the number of missing data points, intercepting the financial time series data immediately preceding the position of the missing value according to that position and number, and using the intercepted data as the data to be input;
In this embodiment, the position of the missing value is located first. Because financial time series data is a time-ordered sequence, the missing value can be located by the time point at which it occurs. The number of missing data points at each gap is then determined, for example 1 or 2. The number of data points to feed to the model is determined according to the number of missing data points to be predicted, and that many data points immediately preceding the missing value are intercepted as the data to be input.
The number of missing data points is generally 1 or 2, and the data to be input preferably comprises 5, 6, or 7 data points. Fewer than 5 or more than 7 data points usually give poor results: with fewer than 5, too little temporal information is captured; with more than 7, the sequence is long and the information deviates more. Preferably, the correspondence between the number of missing data points and the number of data points to be input is as shown in Table 1 below:
Number of missing data points | Number of data points to be input
1 | 5
1 | 6
2 | 6
1 | 7
2 | 7
Table 1
In Table 1, if 1 data point is missing, the number of data points to intercept is determined to be 5, 6, or 7, and the 5, 6, or 7 data points of financial time series data immediately preceding the position of the missing value are intercepted and used as the data to be input. If 2 data points are missing, the number of data points to intercept is determined to be 6 or 7, and the 6 or 7 data points immediately preceding the position of the missing value are intercepted and used as the data to be input.
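The locate-and-intercept procedure can be illustrated with the sketch below. It assumes that missing values are represented as `None` in a Python list, which the text does not specify; the names `find_gaps` and `input_before_gap` are hypothetical, and the lookup table simply encodes Table 1.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Table 1: number of missing data points -> allowed numbers of input data points.
ALLOWED_INPUT_LENGTHS: Dict[int, Tuple[int, ...]] = {1: (5, 6, 7), 2: (6, 7)}


def find_gaps(series: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start index, length) for each run of consecutive missing values."""
    gaps, i = [], 0
    while i < len(series):
        if series[i] is None:
            start = i
            while i < len(series) and series[i] is None:
                i += 1
            gaps.append((start, i - start))
        else:
            i += 1
    return gaps


def input_before_gap(series: Sequence[Optional[float]], start: int,
                     gap_length: int, input_length: int) -> List[float]:
    """Cut the `input_length` data points immediately preceding the gap."""
    if input_length not in ALLOWED_INPUT_LENGTHS.get(gap_length, ()):
        raise ValueError("combination not listed in Table 1")
    if start < input_length:
        raise ValueError("not enough data before the missing value")
    window = list(series[start - input_length:start])
    if any(v is None for v in window):
        raise ValueError("preceding data contains another missing value")
    return window


series = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5, None, None, 10.9]
gaps = find_gaps(series)                       # one gap of two data points
model_input = input_before_gap(series, *gaps[0], input_length=6)
```

The checks for too little preceding data and for a second gap inside the cut window are additions for robustness; the text does not say how such cases are handled.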
inputting the data to be input into each prediction model, obtaining the predicted value output by each prediction model, and taking the average of the predicted values as the fill value for the missing value.
In this embodiment, the data to be input is fed into each prediction model, that is, into each hybrid model composed of a GRU model and an LSTM model: the hybrid model corresponding to 6 time units, the hybrid model corresponding to 11 time units, and the hybrid model corresponding to 16 time units. The predicted values V1, V2, and V3 output by the three hybrid models are obtained, and the fill value for the missing value is computed as V = (V1 + V2 + V3)/3. When 2 data points are missing, the fill value at each position is likewise the average of the predicted values output for that position. The fill value V captures the dependencies between earlier and later points in the financial time series data, and because it is the average over the three hybrid models it is more objective and accurate.
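The averaging rule V = (V1 + V2 + V3)/3, applied position by position when two data points are missing, can be written as the short sketch below. The function name `fill_values` and the sample numbers are illustrative only; each inner list stands for the output of one hybrid model.

```python
from typing import List, Sequence


def fill_values(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the models' predictions position by position.

    `predictions[m][p]` is the value predicted by model m for missing position p.
    """
    n_models = len(predictions)
    return [sum(position) / n_models for position in zip(*predictions)]


# One missing data point: V1, V2, V3 from the 6-, 11-, and 16-unit models.
single = fill_values([[10.0], [11.0], [12.0]])

# Two missing data points: each model outputs two values.
double = fill_values([[10.0, 20.0], [11.0, 22.0], [12.0, 24.0]])
```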
Compared with the prior art, the present application handles financial time series data without missing values by setting sliding windows of different time steps to extract data, sampling the extracted data to obtain sample data for each time step, and dividing each set of sample data into a training set and a test set to train a predetermined recurrent neural network model, which yields a prediction model for each time step. For financial time series data containing a missing value, the position of the missing value and the number of data points it spans are determined, the financial time series data preceding that position is extracted accordingly and fed into each prediction model, and the average of the predicted values output by the models is used as the fill value for the missing value. By using recurrent neural network models to process and predict missing values in financial time series data, the present application captures the dependencies between earlier and later points of the series. Because the fill value is the average over multiple models, it is more objective and accurate, and it restores the overall distribution of the real financial time series data to the greatest extent.
FIG. 4 is a schematic flowchart of an embodiment of the financial time series data processing method of the present application. As shown in FIG. 4, the method includes the following steps:
Step S1: set sliding windows of different predetermined time steps, slide each window over financial time series data that contains no missing values to obtain multiple sets of window data, and sample each set of window data to obtain the sample data corresponding to each predetermined time step;
The predetermined time steps include 6 time units, 11 time units, and 16 time units. A time unit is the granularity of the financial time series data: for data at daily granularity, the time unit is one day; for high-frequency data at minute granularity, the time unit is one minute; and so on.
For the sliding window of 6 time units, each set of window data contains 6 data points and the sampled data contains 6 data points. For the sliding window of 11 time units, each set of window data contains 11 data points and the sampled data contains 6 data points; for example, the sample is (x1, x3, x5, x7, x9, x11), that is, the 1st, 3rd, 5th, 7th, 9th, and 11th data points of the window. For the sliding window of 16 time units, each set of window data contains 16 data points and the sampled data contains 6 data points; for example, the sample is (x1, x4, x7, x10, x13, x16), that is, the 1st, 4th, 7th, 10th, 13th, and 16th data points of the window.
Sliding windows of different predetermined time steps are used so that, with the sample length held constant, the samples capture information from further back in time and longer-range relationships. Sampling from financial time series data that contains no missing values yields the sample data used to train the models, so that the resulting models are more accurate.
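The sampling in step S1 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `sample_windows`, the one-unit slide between windows, and the toy series are all assumptions made for the example. The stride is derived so that each window yields 6 evenly spaced points, which reproduces the (x1, x3, ..., x11) and (x1, x4, ..., x16) patterns described above.

```python
from typing import List, Sequence


def sample_windows(series: Sequence[float], window: int, n_points: int = 6) -> List[List[float]]:
    """Slide a window of `window` time units over `series` (one unit at a time,
    an assumption) and keep `n_points` evenly spaced values from every window."""
    if (window - 1) % (n_points - 1) != 0:
        raise ValueError("window length does not allow evenly spaced sampling")
    stride = (window - 1) // (n_points - 1)  # 6 -> 1, 11 -> 2, 16 -> 3
    samples = []
    for start in range(len(series) - window + 1):
        window_data = series[start:start + window]
        samples.append(list(window_data[::stride]))
    return samples


# Toy series x1..x20 with no missing values (illustrative data only).
series = [float(i) for i in range(1, 21)]
samples_by_step = {w: sample_windows(series, w) for w in (6, 11, 16)}
```

Every sample has the same length regardless of the window, so one model architecture can be trained per time step while the wider windows reach further back in time.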
Step S2: train a predetermined recurrent neural network model separately with the sample data corresponding to each predetermined time step, and take the trained model corresponding to each predetermined time step as a prediction model;
The predetermined recurrent neural network model is a hybrid of two or more recurrent neural networks, preferably a hybrid of a Long Short-Term Memory (LSTM) model and a Gated Recurrent Unit (GRU) model. Both LSTM and GRU models can capture the dependencies between earlier and later points of a time series.
In one embodiment, this step includes: dividing the sample data for each predetermined time step into a training set of a first proportion and a test set of a second proportion, where the sum of the first and second proportions is less than or equal to 1, and training the predetermined recurrent neural network model with the training set for each predetermined time step; drawing a predetermined number of samples from the training set for each predetermined time step as a validation set, using the validation set to test the parameters of the model during training, and ending training when the test error is greater than or equal to a predetermined error threshold, to obtain the trained recurrent neural network model; testing the accuracy of the trained model with the test set; if the accuracy is greater than or equal to a predetermined accuracy threshold, using the trained model as the prediction model; and if the accuracy is less than the predetermined accuracy threshold, modifying the hidden layer structure of the model and retraining, to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
Because the sample data for each predetermined time step can be regarded as independent and identically distributed, the training set and the test set are drawn by random sampling. The training set accounts for 70% and the test set for 30%; for example, the training set contains 70,000 samples and the test set contains 30,000 samples.
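The random 70%/30% split can be sketched as below. The function name `split_samples`, the fixed seed, and the integer placeholder samples are assumptions for illustration; the only constraints taken from the text are random sampling and a ratio sum of at most 1.

```python
import random
from typing import List, Sequence, Tuple


def split_samples(samples: Sequence, train_ratio: float = 0.7,
                  test_ratio: float = 0.3, seed: int = 0) -> Tuple[List, List]:
    """Randomly split samples into disjoint training and test sets."""
    if train_ratio + test_ratio > 1 + 1e-9:  # small tolerance for float rounding
        raise ValueError("train_ratio + test_ratio must not exceed 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    n_train = round(len(shuffled) * train_ratio)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]


# 100 placeholder samples, identified here simply by their index.
train_set, test_set = split_samples(range(100))
```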
Preferably, training on the training set uses cross-validation: the samples in the training set are divided into 10 parts; each time, 9 parts are used for training and the remaining part serves as the validation set, which is used to test the parameters of the recurrent neural network model during training. The model is trained on the training set and evaluated on the validation set. As the number of training iterations increases, if the test error on the validation set is found to rise, that is, the test error is greater than or equal to the predetermined error threshold, training is stopped. The resulting model is the trained recurrent neural network model that is evaluated on the test set as described below. This effectively prevents overfitting of the model.
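A minimal sketch of the 10-fold partition and the stopping rule follows. The helper names and the per-epoch error values are hypothetical. The stopping condition is one possible reading of the text: training stops at the first epoch whose validation error both exceeds the best error seen so far and is at least the threshold.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs; each of the k parts is the validation set once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def early_stopping_epoch(val_errors: Sequence[float], error_threshold: float) -> int:
    """Return the 1-based epoch at which training stops under the assumed rule."""
    best = float("inf")
    for epoch, err in enumerate(val_errors, start=1):
        if err > best and err >= error_threshold:
            return epoch
        best = min(best, err)
    return len(val_errors)  # the error never rose past the threshold


folds = list(k_fold_indices(100, k=10))
# Made-up validation errors: falling at first, then rising as overfitting sets in.
stop_epoch = early_stopping_epoch([0.9, 0.5, 0.3, 0.25, 0.4, 0.6], error_threshold=0.35)
```

In a real training loop the errors would be computed after each epoch rather than supplied up front; the list form is used here only to keep the sketch self-contained.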
Specifically, the LSTM model is trained with the training set. The LSTM model may use a bi-directional LSTM structure. Each sample in the training set has the form (X1, X2, X3, X4, X5, X6). As shown in FIG. 2, (X1, X2, X3, X4, X5) is the input layer, A is the hidden layer, and St is the output. The hidden layer A is the memory unit of the LSTM model and a parameter of the model; it is computed from the input at the current step and the hidden layer output of the previous step. When the accuracy of the trained LSTM model is tested on the test set, the output St is compared with X6 of the sample, and the test result indicates how well the model characterizes the distribution of the financial time series data. If the accuracy of the LSTM model is greater than or equal to the predetermined accuracy threshold (for example, 0.9), the LSTM model meets the requirement and the trained LSTM model is used as the prediction model. If the accuracy is less than the predetermined accuracy threshold, the LSTM model does not meet the requirement and its hidden layer structure is modified. As shown in FIG. 3, in this embodiment the hidden layer for the sample data input at each time point is changed from a single hidden layer to a stack of two hidden layers, and the model is retrained to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
The GRU model is structurally similar to the LSTM model; the two differ in the structure of the hidden layer, where the GRU unit uses fewer gates than the LSTM unit. The GRU model is trained with the same training set as above. Training the GRU model follows essentially the same process as training the LSTM model, and part of the training set is likewise drawn as a validation set, which effectively prevents overfitting. After training, the trained GRU model is tested with the test set to confirm that its accuracy is greater than or equal to the predetermined accuracy threshold. If the accuracy of the GRU model is less than the threshold, the structure of the GRU model is modified in a manner similar to that used for the LSTM model.
Through the above training and testing, a hybrid model combining the LSTM model and the GRU model is fitted for each predetermined time step and used as the prediction model.
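The accept-or-retrain logic of step S2 can be sketched as a small control loop. Everything here is a hypothetical stand-in: `train` and `accuracy` represent whatever training and test-set evaluation routines are actually used, the candidate structures (1, 2) mirror the single and stacked hidden layers mentioned above, and the scores are made-up placeholders rather than results.

```python
from typing import Callable, Sequence, Tuple


def fit_until_accurate(train: Callable[[int], object],
                       accuracy: Callable[[object], float],
                       hidden_layer_options: Sequence[int] = (1, 2),
                       threshold: float = 0.9) -> Tuple[object, int, float]:
    """Train with each candidate hidden-layer count in turn and return the
    first model whose test-set accuracy reaches the threshold."""
    for n_hidden in hidden_layer_options:
        model = train(n_hidden)
        score = accuracy(model)
        if score >= threshold:
            return model, n_hidden, score
    raise RuntimeError("no candidate structure reached the accuracy threshold")


# Stand-in training and evaluation with placeholder scores (not real results).
placeholder_scores = {1: 0.85, 2: 0.93}
model, n_hidden, score = fit_until_accurate(
    train=lambda n: {"hidden_layers": n},
    accuracy=lambda m: placeholder_scores[m["hidden_layers"]],
)
```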
Step S3: acquire financial time series data containing a missing value, determine the position of the missing value and the number of data points it spans, extract, according to that position and number, the financial time series data preceding the position of the missing value, and use the extracted data as the data to be input;
In this embodiment, the position of the missing value is located first. Because financial time series data is a time-ordered sequence, the missing value can be located by the time point at which it occurs. The number of data points spanned by each missing value is then determined, for example 1 or 2. The number of data points to feed into the model is determined from the number of missing data points to be predicted, and that many data points preceding the missing value are extracted as the data to be input.
The missing value generally spans 1 or 2 data points, and the data to be input preferably contains 5, 6, or 7 data points. Fewer than 5 or more than 7 data points usually gives poorer results: with fewer than 5, too little temporal information is captured, while with more than 7 the sequence is long and the information deviates more. The preferred combinations are shown in Table 1 above.
In Table 1, if the missing value spans 1 data point, the number of data points to extract is set to 5, 6, or 7: the 5, 6, or 7 data points of financial time series data preceding the position of the missing value are extracted and used as the data to be input. If the missing value spans 2 data points, the number of data points to extract is set to 6 or 7: the 6 or 7 data points of financial time series data preceding the position of the missing value are extracted and used as the data to be input.
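Step S3 can be sketched as follows, under several assumptions: missing values are represented as `None`, the extracted points are those immediately preceding the gap, and the dictionary `ALLOWED_INPUT_LENGTHS` is a hypothetical encoding of the combinations described for Table 1. The sketch does not handle the case where the preceding points themselves contain missing values.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding of Table 1: points missing -> allowed numbers of input points.
ALLOWED_INPUT_LENGTHS = {1: (5, 6, 7), 2: (6, 7)}


def find_missing_runs(series: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_index, run_length) for every run of consecutive missing values."""
    runs = []
    i = 0
    while i < len(series):
        if series[i] is None:
            start = i
            while i < len(series) and series[i] is None:
                i += 1
            runs.append((start, i - start))
        else:
            i += 1
    return runs


def extract_input(series: Sequence[Optional[float]], start: int, run_length: int,
                  input_length: Optional[int] = None) -> List[Optional[float]]:
    """Extract the points immediately preceding a missing run as model input."""
    allowed = ALLOWED_INPUT_LENGTHS[run_length]
    if input_length is None:
        input_length = allowed[0]  # default to the shortest allowed length
    if input_length not in allowed:
        raise ValueError("input length not allowed for this run length")
    if start < input_length:
        raise ValueError("not enough history before the missing value")
    return list(series[start - input_length:start])


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, None, 8.0, 9.0, 10.0,
          11.0, 12.0, 13.0, 14.0, None, None, 17.0]
runs = find_missing_runs(series)
inputs = [extract_input(series, start, length) for start, length in runs]
```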
Step S4: feed the data to be input into each prediction model, obtain the predicted value output by each prediction model, and take the average of the predicted values as the fill value for the missing value.
In this embodiment, the data to be input is fed into each prediction model, each of which is a hybrid of a GRU model and an LSTM model; that is, it is fed into the hybrid model for 6 time units, the hybrid model for 11 time units, and the hybrid model for 16 time units. The predicted values V1, V2, and V3 output by the three hybrid models are obtained, and the fill value for the missing value is calculated as V = (V1 + V2 + V3)/3. When the missing value spans 2 data points, the fill value for each position is likewise the average of the predicted values output for that position. The fill value V reflects the dependencies between earlier and later points of the financial time series data and, because it is the average over three hybrid models, is more objective and accurate.
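The averaging in step S4 can be sketched as below. The three callables are simple stand-ins chosen only so the example runs; they are not the GRU/LSTM hybrid models of the application, and the function name `fill_value` and the model call signature are assumptions.

```python
from typing import Callable, List, Sequence

# Assumed model interface: (input points, number of missing points) -> predictions.
Model = Callable[[Sequence[float], int], List[float]]


def fill_value(input_data: Sequence[float], models: Sequence[Model],
               n_missing: int = 1) -> List[float]:
    """Average, position by position, the predictions of every model."""
    predictions = [model(input_data, n_missing) for model in models]
    return [sum(p[j] for p in predictions) / len(predictions)
            for j in range(n_missing)]


# Stand-in predictors (placeholders for the three trained hybrid models).
def repeat_last(x: Sequence[float], n: int) -> List[float]:
    return [x[-1]] * n


def window_mean(x: Sequence[float], n: int) -> List[float]:
    return [sum(x) / len(x)] * n


def linear_trend(x: Sequence[float], n: int) -> List[float]:
    slope = x[-1] - x[-2]
    return [x[-1] + slope * (k + 1) for k in range(n)]


filled = fill_value([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                    [repeat_last, window_mean, linear_trend], n_missing=2)
```

With a single missing point, `n_missing=1` returns a one-element list that corresponds to V = (V1 + V2 + V3)/3.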
The present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the financial time series data processing method described above.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A server, characterized in that the server comprises a memory and a processor connected to the memory, the memory storing a processing system executable on the processor, wherein the processing system, when executed by the processor, implements the following steps:
    setting sliding windows of different predetermined time steps, sliding the set windows over financial time series data containing no missing values to obtain a plurality of sets of window data, and sampling each set of window data to obtain sample data corresponding to each predetermined time step;
    training a predetermined recurrent neural network model separately with the sample data corresponding to each predetermined time step, and taking the trained model corresponding to each predetermined time step as a prediction model;
    acquiring financial time series data containing a missing value, determining the position of the missing value and the number of data points it spans, extracting, according to that position and number, financial time series data preceding the position of the missing value, and using the extracted data as data to be input;
    inputting the data to be input into each prediction model, obtaining the predicted value output by each prediction model, and taking the average of the predicted values as the fill value for the missing value.
  2. The server according to claim 1, characterized in that the step of training a predetermined recurrent neural network model separately with the sample data corresponding to each predetermined time step, and taking the trained model corresponding to each predetermined time step as a prediction model, specifically comprises:
    dividing the sample data corresponding to each predetermined time step into a training set of a first proportion and a test set of a second proportion, and training the predetermined recurrent neural network model separately with the training set corresponding to each predetermined time step, the sum of the first proportion and the second proportion being less than or equal to 1;
    drawing a predetermined number of samples from the training set corresponding to each predetermined time step as a validation set, using the validation set to test the parameters of the recurrent neural network model during training, and ending training when the test error is greater than or equal to a predetermined error threshold, to obtain a trained recurrent neural network model;
    testing the accuracy of the trained recurrent neural network model with the test set;
    if the accuracy is greater than or equal to a predetermined accuracy threshold, taking the trained recurrent neural network model as the prediction model;
    if the accuracy is less than the predetermined accuracy threshold, modifying the hidden layer structure of the recurrent neural network model and retraining, to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
  3. The server according to claim 1, characterized in that the step of extracting, according to the position of the missing value and the number of data points it spans, financial time series data preceding the position of the missing value, and using the extracted data as data to be input, specifically comprises:
    determining the number of data points to extract according to the number of data points spanned by the missing value, extracting that number of data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input.
  4. The server according to claim 2, characterized in that the step of extracting, according to the position of the missing value and the number of data points it spans, financial time series data preceding the position of the missing value, and using the extracted data as data to be input, specifically comprises:
    determining the number of data points to extract according to the number of data points spanned by the missing value, extracting that number of data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input.
  5. The server according to claim 3, characterized in that the step of extracting, according to the position of the missing value and the number of data points it spans, financial time series data preceding the position of the missing value, and using the extracted data as data to be input, further comprises:
    if the missing value spans 1 data point, setting the number of data points to extract to 5, 6, or 7, extracting the 5, 6, or 7 data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input;
    if the missing value spans 2 data points, setting the number of data points to extract to 6 or 7, extracting the 6 or 7 data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input.
  6. The server according to claim 4, characterized in that the step of extracting, according to the position of the missing value and the number of data points it spans, financial time series data preceding the position of the missing value, and using the extracted data as data to be input, further comprises:
    if the missing value spans 1 data point, setting the number of data points to extract to 5, 6, or 7, extracting the 5, 6, or 7 data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input;
    if the missing value spans 2 data points, setting the number of data points to extract to 6 or 7, extracting the 6 or 7 data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input.
  7. The server according to claim 1 or 2, characterized in that the predetermined time steps are 6 time units, 11 time units, and 16 time units, and the predetermined recurrent neural network model is a hybrid model composed of a long short-term memory network model and a gated recurrent unit model.
  8. A method for processing financial time series data, characterized in that the method comprises:
    S1, setting sliding windows of different predetermined time steps, sliding the set windows over financial time series data containing no missing values to obtain a plurality of sets of window data, and sampling each set of window data to obtain sample data corresponding to each predetermined time step;
    S2, training a predetermined recurrent neural network model separately with the sample data corresponding to each predetermined time step, and taking the trained model corresponding to each predetermined time step as a prediction model;
    S3, acquiring financial time series data containing a missing value, determining the position of the missing value and the number of data points it spans, extracting, according to that position and number, financial time series data preceding the position of the missing value, and using the extracted data as data to be input;
    S4, inputting the data to be input into each prediction model, obtaining the predicted value output by each prediction model, and taking the average of the predicted values as the fill value for the missing value.
  9. The method for processing financial time series data according to claim 8, characterized in that step S2 specifically comprises:
    dividing the sample data corresponding to each predetermined time step into a training set of a first proportion and a test set of a second proportion, and training the predetermined recurrent neural network model separately with the training set corresponding to each predetermined time step, the sum of the first proportion and the second proportion being less than or equal to 1;
    drawing a predetermined number of samples from the training set corresponding to each predetermined time step as a validation set, using the validation set to test the parameters of the recurrent neural network model during training, and ending training when the test error is greater than or equal to a predetermined error threshold, to obtain a trained recurrent neural network model;
    testing the accuracy of the trained recurrent neural network model with the test set;
    if the accuracy is greater than or equal to a predetermined accuracy threshold, taking the trained recurrent neural network model as the prediction model;
    if the accuracy is less than the predetermined accuracy threshold, modifying the hidden layer structure of the recurrent neural network model and retraining, to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
  10. The method for processing financial time series data according to claim 8, characterized in that the step of extracting, according to the position of the missing value and the number of data points it spans, financial time series data preceding the position of the missing value, and using the extracted data as data to be input, specifically comprises:
    determining the number of data points to extract according to the number of data points spanned by the missing value, extracting that number of data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input.
  11. The method for processing financial time series data according to claim 9, characterized in that the step of extracting, according to the position of the missing value and the number of data points it spans, financial time series data preceding the position of the missing value, and using the extracted data as data to be input, specifically comprises:
    determining the number of data points to extract according to the number of data points spanned by the missing value, extracting that number of data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input.
  12. The method for processing financial time series data according to claim 10, characterized in that the step of extracting, according to the position of the missing value and the number of data points it spans, financial time series data preceding the position of the missing value, and using the extracted data as data to be input, further comprises:
    if the missing value spans 1 data point, setting the number of data points to extract to 5, 6, or 7, extracting the 5, 6, or 7 data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input;
    if the missing value spans 2 data points, setting the number of data points to extract to 6 or 7, extracting the 6 or 7 data points of financial time series data preceding the position of the missing value, and using the extracted data as the data to be input.
  13. The method for processing financial time series data according to claim 11, wherein the step of extracting the financial time series data preceding the position of the missing value according to the position of the missing value and the number of missing values, and using the extracted data as the data to be input, further comprises:
    if the number of missing values is 1, determining that the number of data points to extract is 5, 6, or 7, extracting the 5, 6, or 7 financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input;
    if the number of missing values is 2, determining that the number of data points to extract is 6 or 7, extracting the 6 or 7 financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input.
  14. The method for processing financial time series data according to claim 8 or 9, wherein the predetermined time steps are 6 time units, 11 time units, and 16 time units, and the predetermined recurrent neural network model is a hybrid model composed of a long short-term memory (LSTM) network model and a gated recurrent unit (GRU) model.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a processing system which, when executed by a processor, implements the following steps:
    setting sliding windows with different predetermined time steps, sliding the set windows over financial time series data that contains no missing values to obtain a plurality of window data, and sampling each window data to obtain sample data corresponding to each predetermined time step;
    training a predetermined recurrent neural network model separately with the sample data corresponding to each predetermined time step, and taking the trained model corresponding to each predetermined time step as a prediction model;
    acquiring financial time series data containing a missing value, determining the position of the missing value and the number of missing values in that data, extracting the financial time series data preceding the position of the missing value according to that position and number, and using the extracted data as the data to be input;
    inputting the data to be input into each prediction model, obtaining the predicted value output by each prediction model, and taking the average of the predicted values as the fill value for the missing value.
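
The steps of claim 15 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `window_samples`, `fill_first_gap`, and `Model` are hypothetical, the trained recurrent models are replaced by plain callables, and the split of each window into "inputs plus one target" is an assumption that the claims do not spell out.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a trained recurrent model: context in, one value out.
Model = Callable[[Sequence[float]], float]


def window_samples(series: Sequence[float], step: int) -> List[Tuple[List[float], float]]:
    """Slide a window of length `step` over a gap-free series.

    Assumption (not stated in the claims): the first step - 1 values of each
    window form the model input and the last value is the training target.
    """
    if step < 2:
        raise ValueError("step must be at least 2")
    return [
        (list(series[i:i + step - 1]), series[i + step - 1])
        for i in range(len(series) - step + 1)
    ]


def fill_first_gap(
    series: Sequence[Optional[float]], n_context: int, models: Sequence[Model]
) -> List[Optional[float]]:
    """Fill the first missing value (None) with the mean of all model predictions."""
    filled = list(series)
    gap = next((i for i, value in enumerate(filled) if value is None), None)
    if gap is None:
        return filled  # nothing to fill
    if gap < n_context:
        raise ValueError("not enough data before the missing value")
    # Extract the n_context values that precede the missing position.
    context = filled[gap - n_context:gap]
    # Every model sees the same context; the fill value is the average prediction.
    filled[gap] = mean(model(context) for model in models)
    return filled
```

In this sketch one sample set per time step would be built with something like `{step: window_samples(history, step) for step in (6, 11, 16)}`, and each resulting model would be passed to `fill_first_gap` in the `models` list.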
  16. The computer-readable storage medium according to claim 15, wherein the step of training a predetermined recurrent neural network model separately with the sample data corresponding to each predetermined time step, and taking the trained model corresponding to each predetermined time step as a prediction model, specifically comprises:
    dividing the sample data corresponding to each predetermined time step into a training set of a first proportion and a test set of a second proportion, and training the predetermined recurrent neural network model separately with the training set corresponding to each predetermined time step, the sum of the first proportion and the second proportion being less than or equal to 1;
    drawing a predetermined number of samples from the training set corresponding to each predetermined time step as a validation set, testing the parameters of the recurrent neural network model under training with the validation set, and ending the training when the test error is greater than or equal to a predetermined error threshold, to obtain the trained recurrent neural network model;
    testing the accuracy of the trained recurrent neural network model with the test set;
    if the accuracy is greater than or equal to a predetermined accuracy threshold, taking the trained recurrent neural network model as the prediction model;
    if the accuracy is less than the predetermined accuracy threshold, modifying the hidden layer structure of the recurrent neural network model and training again, to obtain a prediction model whose accuracy is greater than or equal to the predetermined accuracy threshold.
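
The control flow of claim 16 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `split_samples` and `build_prediction_model` are hypothetical names, the split is assumed to be chronological, "modifying the hidden layer structure" is reduced to trying a list of hidden sizes, and the actual training (including the validation-error stopping rule) is delegated to a caller-supplied `train_fn`.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def split_samples(
    samples: Sequence[Any], train_ratio: float, test_ratio: float
) -> Tuple[List[Any], List[Any]]:
    """Split samples in order into a training set and a test set.

    The two ratios may sum to at most 1, as required by the claim.
    """
    if train_ratio < 0 or test_ratio < 0 or train_ratio + test_ratio > 1 + 1e-9:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    n_train = round(len(samples) * train_ratio)
    n_test = round(len(samples) * test_ratio)
    return list(samples[:n_train]), list(samples[n_train:n_train + n_test])


def build_prediction_model(
    train: Sequence[Any],
    test: Sequence[Any],
    hidden_sizes: Sequence[int],
    train_fn: Callable[[Sequence[Any], Sequence[Any], int], Any],
    accuracy_fn: Callable[[Any, Sequence[Any]], float],
    accuracy_threshold: float,
    n_validation: int,
    seed: int = 0,
) -> Any:
    """Retrain with a different hidden structure until the test accuracy is acceptable."""
    # Draw a fixed-size validation set from the training set.
    validation = random.Random(seed).sample(list(train), n_validation)
    for hidden_size in hidden_sizes:
        # train_fn (hypothetical) trains one model and applies the stopping rule.
        model = train_fn(train, validation, hidden_size)
        if accuracy_fn(model, test) >= accuracy_threshold:
            return model
    raise RuntimeError("no candidate structure reached the accuracy threshold")
```

The loop would be run once per predetermined time step, giving one accepted prediction model per step.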
  17. The computer-readable storage medium according to claim 15, wherein the step of extracting the financial time series data preceding the position of the missing value according to the position of the missing value and the number of missing values, and using the extracted data as the data to be input, specifically comprises:
    determining the number of data points to extract according to the number of missing values, extracting that number of financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input.
  18. The computer-readable storage medium according to claim 16, wherein the step of extracting the financial time series data preceding the position of the missing value according to the position of the missing value and the number of missing values, and using the extracted data as the data to be input, specifically comprises:
    determining the number of data points to extract according to the number of missing values, extracting that number of financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input.
  19. The computer-readable storage medium according to claim 17, wherein the step of extracting the financial time series data preceding the position of the missing value according to the position of the missing value and the number of missing values, and using the extracted data as the data to be input, further comprises:
    if the number of missing values is 1, determining that the number of data points to extract is 5, 6, or 7, extracting the 5, 6, or 7 financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input;
    if the number of missing values is 2, determining that the number of data points to extract is 6 or 7, extracting the 6 or 7 financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input.
  20. The computer-readable storage medium according to claim 18, wherein the step of extracting the financial time series data preceding the position of the missing value according to the position of the missing value and the number of missing values, and using the extracted data as the data to be input, further comprises:
    if the number of missing values is 1, determining that the number of data points to extract is 5, 6, or 7, extracting the 5, 6, or 7 financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input;
    if the number of missing values is 2, determining that the number of data points to extract is 6 or 7, extracting the 6 or 7 financial time series data points preceding the position of the missing value, and using the extracted data as the data to be input.
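
The rule that ties the number of missing values to the number of preceding data points can be written as a small lookup. The sketch below is only an illustration: `ALLOWED_CONTEXT` and `context_before_gap` are hypothetical names, and defaulting to the shortest permitted length is an assumption, since the claims allow any of the listed lengths.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Permitted context lengths per number of consecutive missing values,
# as listed in the claims (1 missing: 5, 6 or 7; 2 missing: 6 or 7).
ALLOWED_CONTEXT: Dict[int, Tuple[int, ...]] = {1: (5, 6, 7), 2: (6, 7)}


def context_before_gap(
    series: Sequence[Optional[float]],
    gap_start: int,
    n_missing: int,
    length: Optional[int] = None,
) -> List[Optional[float]]:
    """Return the data points preceding a gap that starts at index gap_start."""
    allowed = ALLOWED_CONTEXT.get(n_missing)
    if allowed is None:
        raise ValueError(f"no rule for {n_missing} missing values")
    if length is None:
        length = allowed[0]  # assumption: default to the shortest permitted length
    if length not in allowed:
        raise ValueError(f"length {length} not permitted for {n_missing} missing values")
    if gap_start < length:
        raise ValueError("not enough data before the gap")
    return list(series[gap_start - length:gap_start])
```

For a series whose values at indices 10 and 11 are missing, `context_before_gap(series, 10, 2)` would return the six values at indices 4 to 9.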
PCT/CN2018/107678 2018-05-10 2018-09-26 Server, financial time sequence data processing method and storage medium WO2019214143A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2019556878A JP6812573B2 (en) 2018-05-10 2018-09-26 Servers, financial time series data processing methods and storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810441414.6A CN108615096A (en) 2018-05-10 2018-05-10 Server, the processing method of Financial Time Series and storage medium
CN201810441414.6 2018-05-10

Publications (1)

Publication Number Publication Date
WO2019214143A1 (en) 2019-11-14

Family

ID=63662626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107678 WO2019214143A1 (en) 2018-05-10 2018-09-26 Server, financial time sequence data processing method and storage medium

Country Status (3)

Country Link
JP (1) JP6812573B2 (en)
CN (1) CN108615096A (en)
WO (1) WO2019214143A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635923A (en) * 2018-11-20 2019-04-16 北京字节跳动网络技术有限公司 Method and apparatus for handling data
CN109711665A (en) * 2018-11-20 2019-05-03 深圳壹账通智能科技有限公司 A kind of prediction model construction method and relevant device based on financial air control data
CN109886387B (en) * 2019-01-07 2021-02-26 北京大学 Traffic time sequence prediction method based on gating network and gradient lifting regression
CN111798018A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Behavior prediction method, behavior prediction device, storage medium and electronic equipment
CN110163748B (en) * 2019-05-28 2021-08-17 京东数字科技控股有限公司 Method and equipment for backfilling missing data of fluidity deadline management
CN110309136B (en) * 2019-07-10 2020-08-04 山东大学 Method and system for filling missing data of database abnormal event
CN110688365A (en) * 2019-09-18 2020-01-14 华泰证券股份有限公司 Method and device for synthesizing financial time series and storage medium
CN110851505B (en) * 2019-11-20 2023-12-22 鹏城实验室 Data processing framework, method and system
CN111694830A (en) * 2020-06-12 2020-09-22 复旦大学 Missing data completion method based on deep ensemble learning
CN113486433A (en) * 2020-12-31 2021-10-08 上海东方低碳科技产业股份有限公司 Method for calculating energy consumption shortage number of net zero energy consumption building and filling system
CN113780666B (en) * 2021-09-15 2024-03-22 湖北天天数链技术有限公司 Missing value prediction method and device and readable storage medium
CN113763186B (en) * 2021-10-22 2024-03-15 平安科技(深圳)有限公司 User transfer prediction method, device and equipment based on cyclic neural network
CN116823338B (en) * 2023-08-28 2023-11-17 国网山东省电力公司临沂供电公司 Method for deducing economic attribute missing value of power consumer

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991506A (en) * 2017-05-16 2017-07-28 深圳先进技术研究院 Intelligent terminal and its stock trend forecasting method based on LSTM
CN107563122A (en) * 2017-09-20 2018-01-09 长沙学院 The method of crime prediction of Recognition with Recurrent Neural Network is locally connected based on interleaving time sequence
CN107832897A (en) * 2017-11-30 2018-03-23 浙江工业大学 A kind of Stock Price Forecasting method based on deep learning

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19530646C1 (en) * 1995-08-21 1996-10-17 Siemens Ag Learning method for recurrent neural network
JP4206369B2 (en) * 2004-07-15 2009-01-07 日本放送協会 Time-series data complementing device, method and program thereof
JP5861619B2 (en) * 2012-11-22 2016-02-16 富士通株式会社 Data interpolation apparatus, data interpolation program, and data interpolation method
EP3511871A4 (en) * 2016-09-06 2020-06-24 Nippon Telegraph And Telephone Corporation Time-series-data feature-amount extraction device, time-series-data feature-amount extraction method and time-series-data feature-amount extraction program
CN106886846A (en) * 2017-04-26 2017-06-23 中南大学 A kind of bank outlets' excess reserve Forecasting Methodology that Recognition with Recurrent Neural Network is remembered based on shot and long term
CN107273429B (en) * 2017-05-19 2018-04-13 哈工大大数据产业有限公司 A kind of Missing Data Filling method and system based on deep learning
CN107316108A (en) * 2017-06-19 2017-11-03 华南理工大学 A kind of citizens' activities public bus network chooses sliding window multiple features Forecasting Methodology
CN107730087A (en) * 2017-09-20 2018-02-23 平安科技(深圳)有限公司 Forecast model training method, data monitoring method, device, equipment and medium
CN107577649A (en) * 2017-09-26 2018-01-12 广州供电局有限公司 The interpolation processing method and device of missing data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110911011A (en) * 2019-11-27 2020-03-24 医惠科技有限公司 Sepsis early warning device, equipment and storage medium
CN111260156A (en) * 2020-02-18 2020-06-09 中国农业银行股份有限公司 Construction method of cash flow prediction model and cash flow prediction method and device
CN111260156B (en) * 2020-02-18 2023-07-28 中国农业银行股份有限公司 Cash flow prediction model construction method and cash flow prediction method and device
WO2023185089A1 (en) * 2022-03-29 2023-10-05 深圳先进技术研究院 Prediction method and prediction apparatus for price of financial derivative, and storage medium and device
CN117319312A (en) * 2023-11-29 2023-12-29 凯美瑞德(苏州)信息科技股份有限公司 Data flow control method and device
CN117319312B (en) * 2023-11-29 2024-03-08 凯美瑞德(苏州)信息科技股份有限公司 Data flow control method and device

Also Published As

Publication number Publication date
JP6812573B2 (en) 2021-01-13
JP2020522774A (en) 2020-07-30
CN108615096A (en) 2018-10-02

Similar Documents

Publication Publication Date Title
WO2019214143A1 (en) Server, financial time sequence data processing method and storage medium
US11715029B2 (en) Updating attribute data structures to indicate trends in attribute data provided to automated modeling systems
WO2019019375A1 (en) Method and apparatus for creating underwriting decision tree, and computer device and storage medium
CN108833458B (en) Application recommendation method, device, medium and equipment
WO2021139317A1 (en) Data feature enhancement method and apparatus for corpus data, computer device, and storage medium
US20200042434A1 (en) Analysis of verification parameters for training reduction
CN113723618B (en) SHAP optimization method, equipment and medium
CN110647447A (en) Abnormal instance detection method, apparatus, device and medium for distributed system
WO2023202355A1 (en) Soil body state data calculation method and device based on boundary surface plasticity model
CN116821646A (en) Data processing chain construction method, data reduction method, device, equipment and medium
WO2018120726A1 (en) Data mining based modeling method, system, electronic device and storage medium
CN114742237A (en) Federal learning model aggregation method and device, electronic equipment and readable storage medium
CN109375146B (en) Supplementary collection method and system for electricity consumption data and terminal equipment
WO2019061667A1 (en) Electronic apparatus, data processing method and system, and computer-readable storage medium
CN116452007B (en) Enterprise tax compliance risk assessment method based on capsule network
WO2023246391A1 (en) Extraction of risk feature description
CN105719143A (en) Data verification method and device
CN112035159B (en) Configuration method, device, equipment and storage medium of audit model
US11138537B2 (en) Data volume-based server hardware sizing using edge case analysis
WO2021068253A1 (en) Customized data stream hardware simulation method and apparatus, device, and storage medium
US20160132583A1 (en) Representative sampling of relational data
CN111091420A (en) Method and device for predicting power price
CN111858108B (en) Hard disk fault prediction method and device, electronic equipment and storage medium
US8943177B1 (en) Modifying a computer program configuration based on variable-bin histograms
CN115842875B (en) Method, device, computer equipment and medium for determining similar data frames

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019556878

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18917851

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18917851

Country of ref document: EP

Kind code of ref document: A1