CN107844557A - A prediction method based on high-dimensional data structural relations - Google Patents

Info

Publication number
CN107844557A
Authority
CN
China
Prior art keywords
tensor
stock
news
relation
model
Legal status
Pending
Application number
CN201711049910.9A
Other languages
Chinese (zh)
Inventor
李岳楠
张桐喆
苏育挺
井佩光
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
2017-10-31
Filing date
2017-10-31
Publication date
2018-03-27
Application filed by Tianjin University
Priority to CN201711049910.9A
Publication of CN107844557A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Finance (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a prediction method based on high-dimensional data structural relations, comprising the following steps. Posts and replies from the Sina stock and East Money stock forums are collected with a web crawler, and specific news items are retrieved through Baidu Advanced Search. Natural language processing is applied to the posts and news to extract features, which are combined with technical-indicator features computed from formulas. The three kinds of features are assembled into a tensor, and the tensor is reconstructed by higher-order singular value decomposition (HOSVD) to reduce noise and strengthen the relations among the factors. The tensor information bodies obtained in this way each day are then reconstructed with the proposed reconstruction algorithm, using their correspondence with price rise/fall information, to obtain a new tensor sequence. The reconstruction constraint makes information bodies with similar rise/fall magnitude or the same rise/fall direction more similar to each other. An optimized tensor ridge regression is fitted to the new tensor sequence, and its regression predictions serve as an auxiliary trading system.

Description

A prediction method based on high-dimensional data structural relations
Technical field
The present invention relates to the field of high-dimensional data construction, and in particular to a prediction method based on high-dimensional data structural relations.
Background art
Stock market forecasting has long been a particularly active research field. The efficient market hypothesis (EMH) holds that stock prices are driven mainly by factors such as news, market sentiment, and company performance (e.g., return on assets, leverage); because the market is efficient, the current share price always incorporates and reflects all relevant information. Two instructive examples: United Airlines lost USD 300 million in market value over its "re-accommodation" incident, and the market value of Huishan Dairy recently fell by 90% after a bearish report from Muddy Waters. In both events, the strength of the event itself and the volume of online discussion shaped the final outcome, which illustrates the market value of information. In recent decades, social network data have become easier to access and more important than ever. Many studies have tried to use social sentiment and social network data in various applications. For example, Asur and Huberman demonstrated how public sentiment about films, as expressed on Twitter, can predict box-office receipts, and Bollen et al. showed that mood time series can predict changes in the closing value of the Dow Jones Industrial Average (DJIA).
Traditional methods are simple, and their ability to predict correctly degrades. Sparse social information, if used directly, carries too much noise. Effective prediction models for complex data structures such as tensors are also lacking.
Summary of the invention
The invention provides a prediction method based on high-dimensional data structural relations. It builds a database, handles sparsity in the high-dimensional data structure, and makes predictions based on the structural relations in high-dimensional data, as described below:
A prediction method based on high-dimensional data structural relations, comprising the following steps:
1) Collect posts and replies from the Sina stock and East Money stock forums with a web crawler, and retrieve specific news through Baidu Advanced Search;
2) Apply natural language processing to the posts and news to extract features, and combine them with technical-indicator features computed from formulas;
3) Construct a tensor from the three kinds of features and reconstruct it by higher-order singular value decomposition, so as to reduce noise and strengthen the relations among the factors;
4) Reconstruct the data with the proposed algorithm under a constraint that makes information bodies with similar rise/fall magnitude or the same rise/fall direction more similar;
5) Fit an optimized tensor ridge regression to the new tensor sequence;
6) Use the regression predictions as an auxiliary trading system.
The beneficial effects of the technical solution provided by the invention are:
1. A new reconstruction method is proposed to handle sparsity and strengthen relations in high-dimensional data structures, together with an optimized regression method for tensor data.
2. The data were collected, and the method was verified to perform well.
Brief description of the drawings
Fig. 1 is a flow chart of the prediction method based on high-dimensional data structural relations;
Fig. 2 compares the results obtained with and without sentiment information;
Fig. 3 compares the results of this method with those of other methods.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in further detail below.
The embodiment collects different information sources and investigates their interaction effects for predicting stock prices, such as news expression and author sentiment, investor sentiment, and features. The embodiment uses natural language processing (NLP) to teach a computer how to read and understand the information and sentiment in an article. The work combines machine learning (ML) with financial applications. Machine learning is neither a shortcut to feasibility or better performance nor a magic black box; rather, it provides a powerful, principled framework for optimizing trading from data. The embodiment aims to find the most suitable data structure and an effective prediction model for the stock prediction problem, rather than the most sophisticated one. In summary, the contributions of this work are as follows: a new reconstruction method is proposed to handle sparsity and strengthen relations in high-dimensional data structures; an optimized regression method suited to tensor data is used; and data were collected to test these ideas, with good results.
The main process is: build the database, handle sparsity in the high-dimensional data structure, and predict stock prices, as shown in Fig. 1 and described below:
1) Collect posts and replies from the Sina stock and East Money stock forums with a web crawler, and retrieve specific news through Baidu Advanced Search.
2) Apply natural language processing to the posts and news to extract features, and combine them with technical-indicator features computed from formulas.
3) Construct a tensor from the three kinds of features and reconstruct it by higher-order singular value decomposition, reducing noise and strengthening the relations among the factors.
4) For each day, reconstruct the tensor information body obtained in the above steps with the OAA algorithm, using its correspondence with the price rise/fall information, to obtain a new tensor sequence. The reconstruction constraint makes information bodies with similar rise/fall magnitude or the same rise/fall direction more similar.
5) Fit an optimized tensor ridge regression to the new tensor sequence.
6) Use the regression predictions as an auxiliary trading system.
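Step 3 can be illustrated with a truncated higher-order SVD. The following is a minimal NumPy sketch, not the patent's implementation: it assumes a third-order information tensor and hypothetical target ranks, since the source does not give the tensor layout, the ranks, or how the result feeds the later OAA reconstruction. The helper names `unfold`, `mode_product` and `hosvd_denoise` are illustrative.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows follow `mode`, columns run over the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """n-mode product T x_mode M, with M of shape (new_dim, T.shape[mode])."""
    out = np.tensordot(M, T, axes=(1, mode))  # the new axis comes out first
    return np.moveaxis(out, 0, mode)


def hosvd_denoise(T, ranks):
    """Truncated HOSVD: keep the leading left singular vectors of each unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)    # project onto the kept subspaces
    approx = core
    for mode, U in enumerate(factors):
        approx = mode_product(approx, U, mode)  # map back to the original size
    return core, factors, approx


# Toy example: a multilinear-rank-(2, 2, 2) tensor plus small Gaussian noise.
rng = np.random.default_rng(0)
G = rng.normal(size=(2, 2, 2))
A, B, C = rng.normal(size=(10, 2)), rng.normal(size=(8, 2)), rng.normal(size=(6, 2))
clean = mode_product(mode_product(mode_product(G, A, 0), B, 1), C, 2)
noisy = clean + 0.05 * rng.normal(size=clean.shape)
core, factors, approx = hosvd_denoise(noisy, ranks=(2, 2, 2))
print(np.linalg.norm(noisy - clean), np.linalg.norm(approx - clean))
```

Truncating each mode to a few leading singular vectors discards directions dominated by noise, which is the noise-reduction effect step 3 describes; the chosen ranks trade retained detail against smoothing.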
Accordingly, the following objective function is maximized:
where $\alpha$ is a parameter that adjusts the relative weight of the within-class and between-class scatter matrices, $\beta$ is a parameter that adjusts the weight between the discriminative information and the local relation information, $V$ is the transformation matrix, $W_u$ is the between-class matrix, $D_u$ is the within-class matrix, and $\rho$ is an eigenvector.
In the reconstruction, the new information tensor is formed from the core tensor $C_i$, the first, second and third transformation matrices $V_1$, $V_2$, $V_3$, and the original factor matrices $U_1$, $U_2$, $U_3$. In particular, $S_W$ is the within-class scatter matrix and $S_B$ is the between-class scatter matrix. Three classes are defined, and the scatter matrices are computed as follows,
where $c$ is the number of classes, $y_i$ is the label (i.e., the range of the price fluctuation), $N_i$ is the number of samples in class $i$, and $U_i$ is the $i$-th sample in class $i$; the formulas also use the mean matrix of all samples and the mean matrix of class $i$. To compute $W_u$ and $D_u$, a weighting matrix $W$ that captures the geometric structure is obtained as follows:
where $d_i$ is the sum of the $i$-th column of $W$, and $D_u$ and $W_u$ are defined as follows:
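The scatter and weighting matrices described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the patent's definitions: each sample is flattened to a vector (the patent works with matrices per mode), the three classes are hypothetical labels (for example 0 = fall, 1 = flat, 2 = rise), and the weighting matrix is a same-class heat kernel in the style of locality preserving projections, with $W_u = X^T W X$ and $D_u = X^T D X$. The exact formulas for $W$, $W_u$ and $D_u$ are not recoverable from the text.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Within-class S_W and between-class S_B scatter of row-vector samples X."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mean_all = X.mean(axis=0)
    dim = X.shape[1]
    S_W, S_B = np.zeros((dim, dim)), np.zeros((dim, dim))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        diff = Xc - mean_c
        S_W += diff.T @ diff                    # spread inside class c
        gap = (mean_c - mean_all)[:, None]
        S_B += len(Xc) * (gap @ gap.T)          # N_c-weighted spread of class means
    return S_W, S_B


def locality_matrices(X, labels, t=1.0):
    """Hypothetical same-class heat-kernel graph W, degree matrix D, and W_u, D_u."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / t) * (labels[:, None] == labels[None, :])
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=0))                  # d_i = sum of the i-th column of W
    return W, D, X.T @ W @ X, X.T @ D @ X


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                    # 30 vectorized samples, 4 features
labels = np.arange(30) % 3                      # three hypothetical classes
S_W, S_B = scatter_matrices(X, labels)
W, D, W_u, D_u = locality_matrices(X, labels)
```

A quick sanity check on the scatter part is the classical identity that within-class plus between-class scatter equals the total scatter of the centered data.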
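That is, the variance of the matrix should be maximized in every mode. To optimize $J(V)$, the Lagrangian $L$ is constructed and its partial derivative with respect to $V$ is taken:
$$L(V)=\operatorname{tr}\left(V^{T}\left(\beta\left(S_B-\alpha S_W\right)+(1-\beta)\left(W_u-D_u\right)\right)V\right)-\lambda\left(\operatorname{tr}\left(V^{T}D_uV\right)-1\right)$$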
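Setting the derivative of this Lagrangian to zero gives the stationarity condition. A short derivation, writing $M$ for the bracketed matrix and assuming $S_B$, $S_W$, $W_u$ and $D_u$ are symmetric:

```latex
M = \beta\,(S_B - \alpha S_W) + (1 - \beta)\,(W_u - D_u), \qquad M = M^{T}

\frac{\partial L}{\partial V}
  = (M + M^{T})\,V - \lambda\,(D_u + D_u^{T})\,V
  = 2MV - 2\lambda D_u V = 0
  \;\Longrightarrow\; MV = \lambda D_u V

v^{T} D_u v = 1 \;\Longrightarrow\; v^{T} M v = \lambda\, v^{T} D_u v = \lambda
```

Each column $v$ of $V$ is therefore a generalized eigenvector, and under the normalization $v^T D_u v = 1$ the objective value it contributes equals its eigenvalue, so keeping the eigenvectors with the largest eigenvalues maximizes the objective.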
The projection matrix $V$ that maximizes the objective function is given by the generalized eigenvalue problem
$$\left(\beta\left(S_B-\alpha S_W\right)+(1-\beta)\left(W_u-D_u\right)\right)V=\lambda D_uV$$
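When $D_u$ is symmetric positive definite, this generalized eigenproblem can be reduced to a standard symmetric one with a Cholesky factorization $D_u = LL^T$. The NumPy sketch below assumes that; the function name `discriminant_projection`, its defaults for $\alpha$ and $\beta$, the optional `eps` ridge, and the number of kept components are hypothetical choices, not values from the patent.

```python
import numpy as np


def discriminant_projection(S_B, S_W, W_u, D_u, alpha=1.0, beta=0.5,
                            n_components=2, eps=0.0):
    """Solve M v = lambda D_u v and return the leading eigenpairs (hypothetical helper)."""
    M = beta * (S_B - alpha * S_W) + (1.0 - beta) * (W_u - D_u)
    M = 0.5 * (M + M.T)                          # remove round-off asymmetry
    D = D_u + eps * np.eye(D_u.shape[0])         # optional ridge if D_u is near-singular
    L = np.linalg.cholesky(D)                    # D = L L^T (needs positive definite D)
    L_inv = np.linalg.inv(L)
    C = L_inv @ M @ L_inv.T                      # equivalent symmetric eigenproblem
    evals, Z = np.linalg.eigh(C)                 # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    V = L_inv.T @ Z[:, order]                    # back-transform: v = L^{-T} z
    return evals[order], V


rng = np.random.default_rng(0)
A, B, G = (rng.normal(size=(5, 5)) for _ in range(3))
S_B, S_W, W_u = A @ A.T, B @ B.T, G @ G.T
D_u = W_u + 5.0 * np.eye(5)                      # symmetric positive definite
evals, V = discriminant_projection(S_B, S_W, W_u, D_u, n_components=3)
```

The returned columns satisfy $MV = D_u V \operatorname{diag}(\lambda)$ and are $D_u$-orthonormal, which matches the normalization used in the Lagrangian.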
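The solution procedure is described in detail below, using the optimization method of higher-order tensor ridge regression. A linear regression model has the form
$$y=f(\mathbf{x};\mathbf{w},b)=\langle\mathbf{x},\mathbf{w}\rangle+b$$
where $\mathbf{x}$ is the input data vector, $\mathbf{w}$ is the parameter vector, $b$ is the bias, and $y$ is the scalar output of the regression. The model is extended to tensor space as follows:
$$y=f(\mathcal{X};\mathcal{W},b)=\langle\mathcal{X},\mathcal{W}\rangle+b$$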
where $\mathcal{X}$ is the input data in tensor form, $\mathcal{W}$ is a weight tensor of the same size as $\mathcal{X}$, and the scalar $b$ is the bias. When the input space is very high-dimensional, however, two problems arise: overfitting and high computational complexity. The weight tensor $\mathcal{W}$ is therefore constrained to be a sum of $R$ rank-one tensors.
Substituting this decomposition into the tensor regression equation shows that, for each mode $k$, the input features $\mathcal{X}$ are projected along $R$ directions. Such a projection can be regarded as a supervised dimensionality-reduction or feature-selection scheme.
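Written out in standard CP (CANDECOMP/PARAFAC) notation, the rank-$R$ constraint and its substitution read as follows. The input size $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_M}$ and the vectors $\mathbf{u}_r^{(k)}$ as columns of $U^{(k)}$ are assumed notation, chosen to match the parameter set $\Theta$ used in the training objective:

```latex
\mathcal{W} = \sum_{r=1}^{R} \mathbf{u}_r^{(1)} \circ \mathbf{u}_r^{(2)} \circ \cdots \circ \mathbf{u}_r^{(M)},
\qquad
U^{(k)} = \bigl[\mathbf{u}_1^{(k)}, \ldots, \mathbf{u}_R^{(k)}\bigr] \in \mathbb{R}^{I_k \times R}

f(\mathcal{X}; \Theta, b)
  = \langle \mathcal{X}, \mathcal{W} \rangle + b
  = \sum_{r=1}^{R} \mathcal{X} \times_1 \mathbf{u}_r^{(1)\top} \times_2 \mathbf{u}_r^{(2)\top} \cdots \times_M \mathbf{u}_r^{(M)\top} + b
```

The constraint reduces the number of free weights from $\prod_{k=1}^{M} I_k$ to $R\sum_{k=1}^{M} I_k$, which is how it addresses the overfitting and computational-cost problems of a full weight tensor.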
Given a labeled training set of pairs $(\mathcal{X}_i, y_i)$, where $\mathcal{X}_i$ is an information tensor and $y_i$ is the corresponding scalar response label, the goal is to find the parameters $\Theta=\{U^{(1)},U^{(2)},\ldots,U^{(M)}\}$ that minimize the empirical risk:
where $l(\cdot)$ is a loss function and $\psi(\cdot)$ is a regularizer, introduced to control the model complexity and avoid overfitting. The empirical loss used is the squared loss $l=(y-f)^2/2$, together with a regularization term $\psi$. This requires the rank $R$ of the tensor weight to be chosen a priori. The problem is therefore redefined as:
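One common way to minimize this kind of objective is alternating least squares: with all factor matrices but one fixed, the model is linear in the remaining $U^{(k)}$, so each update is an ordinary ridge regression. The NumPy sketch below illustrates that idea under stated assumptions and is not the patent's solver: third-order inputs only, the regularizer $\psi = \tfrac{1}{2}\sum_k \lVert U^{(k)} \rVert_F^2$ weighted by a hypothetical `lam`, an unpenalized bias, and hypothetical function names.

```python
import numpy as np

# einsum patterns for the design tensor of mode k (third-order inputs only).
_DESIGN = {0: "nijk,jr,kr->nir", 1: "nijk,ir,kr->njr", 2: "nijk,ir,jr->nkr"}


def predict(X, factors, b):
    """f(X) = sum_r X x1 u_r(1) x2 u_r(2) x3 u_r(3) + b for each sample in X."""
    U1, U2, U3 = factors
    return np.einsum("nijk,ir,jr,kr->n", X, U1, U2, U3) + b


def objective(X, y, factors, b, lam):
    """Squared loss (y - f)^2 / 2 summed over samples, plus the assumed ridge penalty."""
    resid = y - predict(X, factors, b)
    penalty = sum(np.sum(U * U) for U in factors)
    return 0.5 * np.sum(resid ** 2) + 0.5 * lam * penalty


def fit_tensor_ridge(X, y, rank=1, lam=1e-3, n_sweeps=20, seed=0):
    """Alternating least squares: each step is a ridge regression for one U^(k)."""
    assert X.ndim == 4, "expects samples of third-order tensors: (n, I1, I2, I3)"
    rng = np.random.default_rng(seed)
    factors = [rng.normal(scale=0.1, size=(dim, rank)) for dim in X.shape[1:]]
    b = float(np.mean(y))
    history = [objective(X, y, factors, b, lam)]
    for _ in range(n_sweeps):
        for k in range(3):
            others = [factors[m] for m in range(3) if m != k]
            phi = np.einsum(_DESIGN[k], X, *others)    # X contracted with fixed factors
            A = phi.reshape(len(y), -1)                # columns follow vec(U^(k))
            A_mean, y_mean = A.mean(axis=0), y.mean()
            Ac, yc = A - A_mean, y - y_mean
            # Ridge solution for U^(k); centering leaves the bias unpenalized.
            u = np.linalg.solve(Ac.T @ Ac + lam * np.eye(A.shape[1]), Ac.T @ yc)
            factors[k] = u.reshape(X.shape[k + 1], rank)
            b = float(y_mean - A_mean @ u)
        history.append(objective(X, y, factors, b, lam))
    return factors, b, history


# Toy data: responses generated by a rank-1 weight tensor plus a bias of 0.5.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4, 3, 2))
w1, w2, w3 = rng.normal(size=4), rng.normal(size=3), rng.normal(size=2)
y = np.einsum("nijk,i,j,k->n", X, w1, w2, w3) + 0.5
factors, b, history = fit_tensor_ridge(X, y, rank=1)
```

Because every block update exactly minimizes the objective over one factor and the bias, the recorded objective never increases from sweep to sweep; the rank $R$ still has to be fixed in advance, as the text notes.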
The database used in this method is built by crawling posts and replies from the Sina stock and East Money stock forums and by retrieving specific news through Baidu Advanced Search. Natural language processing is applied to the posts and news to extract features, which are combined with technical-indicator features computed from formulas.
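The patent does not list which technical indicators are computed. Purely as an illustration, the sketch below computes two common ones from daily closing prices: a simple moving average (SMA) and a relative strength index (RSI) based on simple averages of gains and losses. The choice of indicators, the window lengths, the sample prices, and the function names are all assumptions.

```python
def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def rsi(prices, window=14):
    """Relative strength index from simple (not Wilder-smoothed) averages of gains and losses."""
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - window + 1, i + 1)]
        avg_gain = sum(c for c in changes if c > 0) / window
        avg_loss = sum(-c for c in changes if c < 0) / window
        out[i] = 100.0 if avg_loss == 0 else 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.7]   # hypothetical daily closing prices
features = list(zip(sma(closes, 3), rsi(closes, 3)))
```

Each day then contributes one technical-indicator feature vector, which can sit alongside the text-derived features when the tensor is assembled.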
Evaluation criteria
This method uses two evaluation measures:
Directional accuracy (DA): indicates how closely the observed direction of movement agrees with the true one:
$$DA=\frac{S}{N}$$
Root-mean-square error (RMSE): compares the predicted values with the true values, which makes regression models easy to compare. The correlation coefficient R ranges over $[-1,1]$, where 1 indicates positive correlation and $-1$ negative correlation:
$$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i-R_i\right)^2}$$
where $N$ is the total number of predictions, $S$ is the number of predictions in which the forecast price and the actual share price move in the same direction, $P_i$ is the forecast price of the $i$-th prediction, and $R_i$ is the actual share price of the $i$-th prediction.
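The two measures can be computed as in the sketch below. The text does not say which reference price defines the "direction of movement"; this sketch assumes it is the previous day's actual price, passed in as `previous`, and treats "no change" as its own direction. The function names are hypothetical.

```python
import math


def _sign(x):
    """Return 1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_accuracy(predicted, actual, previous):
    """DA = S / N, with direction measured against an assumed reference price."""
    n = len(actual)
    s = sum(
        1 for p, r, prev in zip(predicted, actual, previous)
        if _sign(p - prev) == _sign(r - prev)
    )
    return s / n


def rmse(predicted, actual):
    """RMSE = sqrt(mean((P_i - R_i)^2)) over the N predictions."""
    n = len(actual)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / n)
```

For example, with predictions `[3, 1, 4]`, actual prices `[2, 3, 5]` and reference prices `[2, 2, 2]`, only the third prediction moves in the same direction as the market, so DA is 1/3.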
Comparison algorithms
In the experiments, this method is compared with the following four methods:
Support vector regression (SVR), k-nearest-neighbor regression (KNR) and SGDRegression are three classical machine-learning methods, used here to test the usefulness of market sentiment. Feature vectors (continuous vectors built from the different information sources) are used as input; the results show that features without market-sentiment information have less predictive power than information features that include market sentiment. As shown in Table 1, both RMSE and DA are better with sentiment features.
Experimental result
Fig. 2 shows that information features including market sentiment have better predictive power, giving better experimental results on both RMSE and DA.
Fig. 3 shows tensor sequences constructed from discriminative information and from local relation information, together with the proposed OAA algorithm; higher-order tensor ridge regression is then applied to each of these three tensor sequences, which demonstrates the advantage of the OAA algorithm.
Those skilled in the art will understand that the drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments above are for description only and do not indicate the relative merit of the embodiments.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (1)

1. A prediction method based on high-dimensional data structural relations, characterized in that the prediction method comprises the following steps:
1) collecting posts and replies from the Sina stock and East Money stock forums with a web crawler, and retrieving specific news through Baidu Advanced Search;
2) applying natural language processing to the posts and news to extract features, and combining them with technical-indicator features computed from formulas;
3) constructing a tensor from the three kinds of features and reconstructing it by higher-order singular value decomposition, so as to reduce noise and strengthen the relations among the factors;
4) reconstructing the data with the proposed algorithm under a constraint that makes information bodies with similar rise/fall magnitude or the same rise/fall direction more similar;
5) fitting an optimized tensor ridge regression to the new tensor sequence;
6) using the regression predictions as an auxiliary trading system.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711049910.9A CN107844557A (en) 2017-10-31 2017-10-31 A kind of Forecasting Methodology based on high dimensional data structural relation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711049910.9A CN107844557A (en) 2017-10-31 2017-10-31 A kind of Forecasting Methodology based on high dimensional data structural relation

Publications (1)

Publication Number Publication Date
CN107844557A true CN107844557A (en) 2018-03-27

Family

ID=61681106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711049910.9A Pending CN107844557A (en) 2017-10-31 2017-10-31 A kind of Forecasting Methodology based on high dimensional data structural relation

Country Status (1)

Country Link
CN (1) CN107844557A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103117059A (en) * 2012-12-27 2013-05-22 北京理工大学 Voice signal characteristics extracting method based on tensor decomposition
US20140198108A1 (en) * 2013-01-16 2014-07-17 Disney Enterprises, Inc. Multi-linear dynamic hair or clothing model with efficient collision handling
CN106548016A (en) * 2016-10-24 2017-03-29 天津大学 Time series analysis method based on tensor relativity of time domain decomposition model

Non-Patent Citations (1)

Title
Qing Li et al.: "A Tensor-Based Information Framework for Predicting the Stock Market", ACM Transactions on Information Systems (TOIS) *

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN110390408A (en) * 2018-04-16 2019-10-29 北京京东尚科信息技术有限公司 Trading object prediction technique and device
CN110390408B (en) * 2018-04-16 2024-03-05 北京京东尚科信息技术有限公司 Transaction object prediction method and device
CN111428000A (en) * 2020-03-20 2020-07-17 华泰证券股份有限公司 Method, system and storage medium for quantizing unstructured text data

Similar Documents

Publication Publication Date Title
Souma et al. Enhanced news sentiment analysis using deep learning methods
Taghian et al. Learning financial asset-specific trading rules via deep reinforcement learning
Nadkarni et al. Combining NeuroEvolution and Principal Component Analysis to trade in the financial markets
Wu et al. The analysis of credit risks in agricultural supply chain finance assessment model based on genetic algorithm and backpropagation neural network
Gao The use of machine learning combined with data mining technology in financial risk prevention
Feng et al. Analyzing the Internet financial market risk management using data mining and deep learning methods
Zhong et al. Effects of cost-benefit analysis under back propagation neural network on financial benefit evaluation of investment projects
Chen Stock movement prediction with financial news using contextualized embedding from bert
Zhao et al. Credit risk assessment of small and medium-sized enterprises in supply chain finance based on SVM and BP neural network
Etemadi et al. Earnings per share forecast using extracted rules from trained neural network by genetic algorithm
Lin Innovative risk early warning model under data mining approach in risk assessment of internet credit finance
Xu et al. Uncertainty in financing interest rates for startups
Shi et al. Method for improving the performance of technical analysis indicators by neural network models
Duan et al. Elliott wave theory and the Fibonacci sequence-gray model and their application in Chinese stock market
Horky et al. Don't miss out on NFTs?! A sentiment-based analysis of the early NFT market
Li et al. Online portfolio management via deep reinforcement learning with high-frequency data
CN107844557A (en) A kind of Forecasting Methodology based on high dimensional data structural relation
Wang et al. Joint loan risk prediction based on deep learning‐optimized stacking model
Wang et al. Collocating Recommendation Method for E‐Commerce Based on Fuzzy C‐Means Clustering Algorithm
Ding et al. Forecasting product sales using text mining: A case study in new energy vehicle
Tong et al. Adaptive trading system of assets for international cooperation in agricultural finance based on neural network
Jia Deep Learning Algorithm‐Based Financial Prediction Models
Chen et al. An Optimized BP Neural Network Model and Its Application in the Credit Evaluation of Venture Loans
Taghian et al. A reinforcement learning based encoder-decoder framework for learning stock trading rules
Lyridis et al. Freight-forward agreement time series modelling based on artificial neural network models/Modeliranje casovnih vrst terminskih pogodb na prevozne stroske z uporabo umetnih nevronskih mrez

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180327