CN107844557A - A prediction method based on high-dimensional data structural relationships - Google Patents
A prediction method based on high-dimensional data structural relationships
- Publication number
- CN107844557A (application CN201711049910.9A)
- Authority
- CN
- China
- Prior art keywords
- tensor
- stock
- news
- relation
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2136—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Abstract
The invention discloses a prediction method based on high-dimensional data structural relationships, comprising the following steps. Posts and replies from the Sina and Eastmoney stock forums are collected with a web crawler, together with specific news items retrieved through Baidu Advanced Search. Natural language processing is applied to the posts and news to extract features, and technical-indicator features are computed by formula. A tensor is constructed from the three kinds of features and reconstructed by higher-order singular value decomposition (HOSVD), which reduces noise and strengthens the relationships among the factors. The information tensor obtained in this way for each day is paired with the corresponding price rise or fall, and the proposed reconstruction algorithm is applied to obtain a new tensor sequence. The reconstruction constraint makes information tensors with a similar magnitude of rise or fall, or with the same direction of movement, similar to one another. An optimized tensor ridge regression is fitted to the new tensor sequence, and the regression prediction serves as an auxiliary trading system.
Description
Technical field
The present invention relates to the field of high-dimensional data structures, and in particular to a prediction method based on high-dimensional data structural relationships.
Background
Stock market prediction has long been an especially active research field. The efficient market hypothesis (EMH) holds that stock prices are driven mainly by factors such as news, market sentiment, and company performance (for example, return on assets and leverage), because an efficient market means the current share price always incorporates and reflects all relevant information. Two examples are instructive. United Airlines lost 300 million US dollars of market value after the "relocation" incident, and more recently the market value of Huishan Dairy fell by 90% following a short-selling report by Muddy Waters. In both cases the strength of the event itself and the force of online discussion together produced the final outcome, which illustrates the market value of information. In recent decades social network data has become easier to access and more important than ever. Many studies have tried to use social sentiment and social network data in various applications. For example, Asur and Huberman demonstrated how public sentiment about films, as expressed on Twitter, can predict box-office revenue. Bollen et al. then showed that mood time series are predictive of changes in the closing value of the Dow Jones Industrial Average (DJIA).
Traditional methods are simple, and their predictive accuracy is correspondingly limited. Sparse social information introduces too much noise. Effective prediction models for complex data structures, such as tensors, are lacking.
Summary of the invention
The invention provides a prediction method based on high-dimensional data structural relationships. The invention collects a database, handles the sparsity in the high-dimensional data structure, and makes predictions from the high-dimensional data structural relationships, as described below:
A prediction method based on high-dimensional data structural relationships, comprising the following steps:
1) collecting posts and replies from the Sina and Eastmoney stock forums with a web crawler, and retrieving specific news items through Baidu Advanced Search;
2) applying natural language processing to the posts and news to extract features, and computing technical-indicator features by formula;
3) constructing a tensor from the three kinds of features and reconstructing it by higher-order singular value decomposition, which reduces noise and strengthens the relationships among the factors;
4) applying the reconstruction algorithm with its constraint, so that information tensors with a similar magnitude of rise or fall, or with the same direction of movement, become similar;
5) fitting an optimized tensor ridge regression to the new tensor sequence;
6) performing regression prediction, which serves as an auxiliary trading system.
The beneficial effects of the technical solution provided by the invention are:
1. A new reconstruction method is proposed to handle sparsity and strengthen relationships in high-dimensional data structures, together with an optimized regression method for tensor data.
2. Data were collected and used to verify the method, with good results.
Brief description of the drawings
Fig. 1 is a flow chart of the prediction method based on high-dimensional data structural relationships;
Fig. 2 compares the results obtained with and without sentiment information;
Fig. 3 compares the results of this method with those of other methods.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in further detail below.
The embodiment collects several information sources and examines how their interaction can be used to predict stock prices: the content and author sentiment expressed in news, investor sentiment, and technical features. The embodiment uses natural language processing (NLP) to teach a computer how to read and understand the information and sentiment in an article. The work combines machine learning (ML) with a financial application. Machine learning does not offer a simple route to feasibility or to better execution, and it is not a magic black box. Rather, it provides a powerful, principled, data-driven framework for optimizing trading. The embodiment aims to find the most suitable data structure and an effective prediction model for the stock prediction problem, rather than the most complicated ones. In summary, the contributions are as follows: a new reconstruction method is proposed to handle sparsity and strengthen relationships in high-dimensional data structures; an optimized regression method suited to tensor data is used; and data are collected to test these ideas, with good results.
The main process is to collect a database, handle the sparsity in the high-dimensional data structure, and predict stock prices. Referring to Fig. 1, it is described below:
1) Posts and replies from the Sina and Eastmoney stock forums are collected with a web crawler, and specific news items are retrieved through Baidu Advanced Search.
2) Natural language processing is applied to the posts and news to extract features, and technical-indicator features are computed by formula.
3) A tensor is constructed from the three kinds of features and reconstructed by higher-order singular value decomposition, which reduces noise and strengthens the relationships among the factors.
4) The information tensor produced by the steps above is obtained for each day and paired with the corresponding rise or fall information, and OAA reconstruction is applied to obtain a new tensor sequence. The reconstruction constraint makes information tensors with a similar magnitude of rise or fall, or with the same direction of movement, similar to one another.
5) An optimized tensor ridge regression is fitted to the new tensor sequence.
6) Regression prediction is performed, serving as an auxiliary trading system.
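Step 3) can be illustrated with a minimal sketch of truncated higher-order singular value decomposition. The sketch assumes a third-order daily tensor whose three modes hold the technical, news, and forum features; the function names, the tensor shape, and the retained ranks are hypothetical and are not taken from the patent.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along `mode` (matrix shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)            # put the target mode last
    return np.moveaxis(moved @ matrix.T, -1, mode)   # contract it, then restore the order

def hosvd_reconstruct(tensor, ranks):
    """Truncated HOSVD: keep the leading ranks[k] left singular vectors of each unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)         # project onto the retained subspace
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)       # map back to the original size
    return core, factors, approx

# Hypothetical daily tensor: 6 technical x 5 news x 4 forum feature slots.
rng = np.random.default_rng(0)
daily = rng.standard_normal((6, 5, 4))
core, factors, denoised = hosvd_reconstruct(daily, ranks=(3, 2, 2))
```

Keeping fewer singular vectors per mode discards the directions with the least energy, which is one common way to obtain the noise reduction that the step describes. With full ranks the reconstruction returns the input unchanged.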
In the reconstruction of step 4), the following objective function is maximized:
$J(V) = \operatorname{trace}\left(V^{T}\left(\beta(S_B - \alpha S_W) + (1-\beta)(W_u - D_u)\right)V\right)$, subject to $\operatorname{trace}(V^{T} D_u V) = 1$
where α is the parameter that adjusts the weight between the within-class and between-class scatter matrices, β is the parameter that adjusts the weight between discriminant information and local correlation information, V is the transformation matrix, $W_u$ is the between-class matrix, $D_u$ is the within-class matrix, and ρ is the characteristic (feature) vector.
The new information tensor is obtained from the core tensor $C_i$, the transformation matrices $V_1$, $V_2$, and $V_3$, and the original factor matrices $U_1$, $U_2$, and $U_3$ of modes 1, 2, and 3. In particular, $S_W$ is the within-class scatter matrix and $S_B$ is the between-class scatter matrix. Three classes are defined, and the scatter matrices are computed from the following quantities: c is the number of classes, $y_i$ is the label (that is, the range of rise or fall), $N_i$ is the number of samples in class i, $U_i$ is the i-th sample in class i, and the mean matrices are taken over all samples and over each class i.
To compute $W_u$ and $D_u$, a weight matrix W that captures the geometric structure of the data is first obtained. $d_i$ is the sum of the i-th column of W, and $D_u$ and $W_u$ are defined from these quantities.
That is, the variance of the matrices should be maximized in each mode. To optimize J(V), the Lagrangian L is constructed and its partial derivative with respect to V is taken:
$L(V) = \operatorname{trace}\left(V^{T}\left(\beta(S_B - \alpha S_W) + (1-\beta)(W_u - D_u)\right)V\right) - \lambda\left(\operatorname{trace}(V^{T} D_u V) - 1\right)$
The projection matrix that maximizes the objective function is given by the generalized eigenvalue problem:
$\left(\beta(S_B - \alpha S_W) + (1-\beta)(W_u - D_u)\right)V = \lambda D_u V$
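The eigenvalue problem above can be solved numerically as sketched below. This is an illustration under stated assumptions, not the patent's implementation: all four matrices are taken as given square symmetric inputs of the same size, $D_u$ is assumed to be positive definite, and the function name and the number of retained components are hypothetical.

```python
import numpy as np

def solve_projection(S_B, S_W, W_u, D_u, alpha, beta, n_components):
    """Return the n_components leading solutions of A V = lambda * D_u V."""
    A = beta * (S_B - alpha * S_W) + (1.0 - beta) * (W_u - D_u)
    A = 0.5 * (A + A.T)                     # symmetrize against round-off
    L = np.linalg.cholesky(D_u)             # D_u = L L^T, requires positive definite D_u
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T                 # equivalent standard symmetric problem
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    V = L_inv.T @ eigvecs[:, order]         # map back: V = L^{-T} Q
    return eigvals[order], V
```

The columns of V returned here satisfy $V^{T} D_u V = I$, so each column has unit $D_u$-norm. Matching the constraint $\operatorname{trace}(V^{T} D_u V) = 1$ exactly would require an additional rescaling, which does not change the directions found.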
The solution procedure is described next, using the optimization method of higher-order tensor ridge regression:
$y = f(x; w, b) = \langle x, w \rangle + b$
where x is the input vector, w is the parameter vector, b is the bias, and y is the scalar output of the regression. Extending this method to tensor space gives:
$y = f(X; W, b) = \langle X, W \rangle + b$
where X is the input tensor, W is a weight tensor of the same size as X, and the scalar b is the bias. If the input space is high-dimensional, however, two problems arise: overfitting and high computational complexity. The weight tensor W is therefore constrained to be a sum of R rank-one tensors. That is,
$W = \sum_{r=1}^{R} u_r^{(1)} \circ u_r^{(2)} \circ \cdots \circ u_r^{(M)}$
where $u_r^{(k)}$ is the r-th column of the mode-k factor matrix $U^{(k)}$. Substituting into the equation gives:
$y = \sum_{r=1}^{R} \langle X, u_r^{(1)} \circ u_r^{(2)} \circ \cdots \circ u_r^{(M)} \rangle + b$
From this equation it can be seen that, for each mode k, the input X is projected along R directions. Such a projection can be regarded as supervised dimensionality reduction or as a feature selection scheme.
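The effect of the rank-R constraint can be checked with a small sketch for a third-order input. It assumes three factor matrices with R columns each; the function names and shapes are hypothetical. The prediction computed from the explicit weight tensor and the prediction computed directly from the factors agree.

```python
import numpy as np

def cp_to_tensor(U1, U2, U3):
    """Sum of R rank-one tensors; column r of each factor gives one outer product."""
    return np.einsum('ir,jr,kr->ijk', U1, U2, U3)

def predict_full(X, W, b):
    """f(X) = <X, W> + b with an explicit weight tensor."""
    return float(np.sum(X * W) + b)

def predict_cp(X, U1, U2, U3, b):
    """Same value, computed by projecting X along the R directions of each mode."""
    return float(np.einsum('ijk,ir,jr,kr->', X, U1, U2, U3) + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
U1, U2, U3 = (rng.standard_normal((d, 2)) for d in (6, 5, 4))
W = cp_to_tensor(U1, U2, U3)
```

For this 6 x 5 x 4 example the full weight tensor has 120 free parameters, while the rank-2 factorization has 2 x (6 + 5 + 4) = 30, which is how the constraint limits overfitting and computation.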
Given a labeled training set of pairs $(X_i, y_i)$, where $X_i$ is the information tensor and $y_i$ is the response label, the aim is to obtain the parameters $\Theta = \{U^{(1)}, U^{(2)}, \ldots, U^{(M)}\}$ by minimizing the empirical risk, which is the sum over the training set of a loss function l(·) plus a regularization term ψ(·). The regularization is introduced to control the complexity of the model and avoid overfitting. The empirical loss used is the squared error loss $l = (y - f)^2 / 2$, combined with a regularization term. This requires the rank R of the weight tensor to be chosen in advance. The objective is then rewritten with this loss and regularizer.
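One way to minimize such an objective is alternating least squares, sketched below for third-order inputs. Several choices here are assumptions rather than details from the patent: the alternating scheme itself, the regularizer (the sum of squared Frobenius norms of the factor matrices, weighted by a hypothetical parameter `lam`), the initialization, and the fixed iteration count. With the other two factors held fixed, the model is linear in the remaining factor, so each update is an ordinary ridge regression.

```python
import numpy as np

def predict(X, U, b):
    """<X_n, sum_r u1_r o u2_r o u3_r> + b for every sample n."""
    return np.einsum('nijk,ir,jr,kr->n', X, U[0], U[1], U[2]) + b

def fit_tensor_ridge(X, y, rank=2, lam=1e-2, n_iter=50, seed=0):
    """Fit factor matrices U[0..2] and bias b by alternating ridge updates."""
    rng = np.random.default_rng(seed)
    dims = X.shape[1:]
    U = [0.1 * rng.standard_normal((d, rank)) for d in dims]
    b = float(y.mean())
    # For each mode, contract X with the factors of the other two modes.
    specs = ['nijk,jr,kr->nir', 'nijk,ir,kr->njr', 'nijk,ir,jr->nkr']
    for _ in range(n_iter):
        for k, spec in enumerate(specs):
            others = [U[m] for m in range(3) if m != k]
            Z = np.einsum(spec, X, *others).reshape(len(y), -1)  # design matrix
            A = Z.T @ Z + lam * np.eye(Z.shape[1])               # ridge normal equations
            theta = np.linalg.solve(A, Z.T @ (y - b))
            U[k] = theta.reshape(dims[k], rank)
        b = float(np.mean(y - predict(X, U, 0.0)))               # refit the bias
    return U, b
```

A call such as `U, b = fit_tensor_ridge(X, y, rank=2)` expects `X` with shape (samples, I1, I2, I3) and `y` with shape (samples,). The rank plays the role of R in the text and must be chosen before fitting.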
The database used in this method consists of posts and replies from the Sina and Eastmoney stock forums, collected with a web crawler, together with specific news items retrieved through Baidu Advanced Search. Natural language processing is applied to the posts and news to extract features, and technical-indicator features are computed by formula.
Evaluation criteria
This method uses two evaluation measures:
Directional accuracy (DA): accuracy expresses how closely the observations agree with the true values:
$DA = S / N$
Root mean square error (RMSE): this makes it easy to compare the predicted scores of regression models with the true scores. For the correlation coefficient between predicted and true scores, the R value lies in [-1, 1], where 1 indicates positive correlation and -1 indicates negative correlation:
$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(P_i - R_i)^2}$
where N is the total number of predictions, S is the number of predictions for which the predicted price and the actual share price move in the same direction, $P_i$ is the predicted price of the i-th prediction, and $R_i$ is the actual share price of the i-th prediction.
Comparison algorithms
In the experiments this method is compared with the following methods:
Support vector regression (SVR), k-nearest-neighbor regression (KNR), and SGDRegression are three classical machine learning methods, used here to test the usefulness of market sentiment. With feature vectors (the concatenated vectors of the different information sources) as input, features without market sentiment information are found to have less predictive power than information features that include market sentiment. As shown in Table 1, the features with sentiment give better RMSE and DA.
Experimental results
Fig. 2 shows that information features with market sentiment have better predictive power, giving the better experimental result on both RMSE and DA.
Fig. 3 compares tensor sequences constructed in three ways: from discriminant information, from local relation information, and by the proposed OAA algorithm. Optimized higher-order tensor ridge regression is applied to each of the three tensor sequences, which shows the advantage of the OAA algorithm.
Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments are for description only and do not indicate the relative merit of the embodiments.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (1)
1. A prediction method based on high-dimensional data structural relationships, characterized in that the prediction method comprises the following steps:
1) collecting posts and replies from the Sina and Eastmoney stock forums with a web crawler, and retrieving specific news items through Baidu Advanced Search;
2) applying natural language processing to the posts and news to extract features, and computing technical-indicator features by formula;
3) constructing a tensor from the three kinds of features and reconstructing it by higher-order singular value decomposition, which reduces noise and strengthens the relationships among the factors;
4) applying the reconstruction algorithm with its constraint, so that information tensors with a similar magnitude of rise or fall, or with the same direction of movement, become similar;
5) fitting an optimized tensor ridge regression to the new tensor sequence;
6) performing regression prediction, which serves as an auxiliary trading system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711049910.9A | 2017-10-31 | 2017-10-31 | A prediction method based on high-dimensional data structural relationships |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711049910.9A | 2017-10-31 | 2017-10-31 | A prediction method based on high-dimensional data structural relationships |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107844557A | 2018-03-27 |
Family
ID=61681106
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201711049910.9A | Pending | 2017-10-31 | 2017-10-31 | A prediction method based on high-dimensional data structural relationships |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107844557A |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103117059A (en) * | 2012-12-27 | 2013-05-22 | 北京理工大学 | Voice signal characteristics extracting method based on tensor decomposition |
US20140198108A1 (en) * | 2013-01-16 | 2014-07-17 | Disney Enterprises, Inc. | Multi-linear dynamic hair or clothing model with efficient collision handling |
CN106548016A (en) * | 2016-10-24 | 2017-03-29 | 天津大学 | Time series analysis method based on tensor relativity of time domain decomposition model |
Non-Patent Citations (1)
Title |
---|
Qing Li et al.: "A Tensor-Based Information Framework for Predicting the Stock Market", ACM Transactions on Information Systems (TOIS) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390408A (en) * | 2018-04-16 | 2019-10-29 | 北京京东尚科信息技术有限公司 | Trading object prediction technique and device |
CN110390408B (en) * | 2018-04-16 | 2024-03-05 | 北京京东尚科信息技术有限公司 | Transaction object prediction method and device |
CN111428000A (en) * | 2020-03-20 | 2020-07-17 | 华泰证券股份有限公司 | Method, system and storage medium for quantizing unstructured text data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180327 |