CN103559642A - Financial data mining method based on cloud computing - Google Patents

Info

Publication number
CN103559642A
Authority
CN
China
Prior art keywords
data
sample
cloud computing
mining method
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310536760.XA
Other languages
Chinese (zh)
Inventor
向阳
罗成
张依杨
张波
袁书寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-11-04
Filing date
2013-11-04
Publication date
2014-02-05
Application filed by Tongji University
Priority to CN201310536760.XA
Publication of CN103559642A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a financial data mining method based on cloud computing. The method comprises the following steps: preprocessing operations such as error correction and format conversion are carried out on the obtained financial data; the required neuron grid is established, laid out as a rectangle, with a number of neurons equal to 1% of the number of samples; adaptive training is carried out using the established grid and the processed data; convergence training is carried out using the established grid and the processed data; the data are discretized based on the weights of the trained neurons, so that each sample corresponds to one neuron; and each discrete point is labeled and visualized. The method makes use of distributed storage and computation, performs dimension reduction and clustering on the data using the properties of a self-organizing neural network, and adopts visualization so that the data are presented more vividly.

Description

Financial data mining method based on cloud computing
Technical field
The present invention relates to a distributed financial data mining method, and in particular to a cloud-computing-based financial data mining method for fast clustering of large data sets.
Background art
With the rapid development of the Internet, the World Wide Web (WWW) has become a huge information space that provides users with valuable information resources. Faced with a large volume of financial data, how to analyze and process it becomes a vital problem. One approach is to reduce the high-dimensional data to two dimensions, visualize the result, and use it to support the analysis of decision makers.
The self-organizing map (SOM) is an important type of neural network based on unsupervised learning. The theory of the self-organizing map was first proposed in 1981 by Kohonen of Helsinki University of Technology, Finland. Since then, with the rapid development of neural networks in the mid-to-late 1980s, both the theory and the applications of self-organizing maps have advanced considerably.
It is an unsupervised clustering method. It simulates the way nerve cells in different regions of the human brain specialize in different functions, with different regions having different response characteristics, and this process completes automatically. A self-organizing map classifies a set of input patterns by finding an optimal set of reference vectors. Each reference vector is the connection weight vector of one output unit. Compared with traditional pattern clustering methods, the cluster centres it forms can be mapped onto a curved surface or a plane while the topological structure is preserved. A self-organizing map can therefore be used to discriminate clusters whose centres are not known in advance.
The self-organizing neural network is one of the most attractive research areas of neural networks. It can detect regularities in its input samples and the relations among them, and adaptively adjust the network according to this information so that its later responses match the input samples. The neurons of a competitive neural network can learn to recognize groups of similar input vectors; a self-organizing map likewise learns to recognize groups of similar input vectors, in such a way that neurons close to each other in the network layer respond to similar input vectors. Unlike a competitive neural network, a self-organizing map learns not only the distribution of the input vectors but also their topology; no single neuron is decisive for pattern classification, which instead relies on the cooperation of multiple neurons.
Learning vector quantization (LVQ) is a supervised learning method for training a competitive layer. A competitive-layer neural network can automatically learn to classify input vector patterns, but the classification it performs depends only on the distance between input vectors: when two input vectors are very close, the competitive layer may assign them to the same class. The design of the competitive layer contains no mechanism for deciding strictly whether any two input vectors belong to the same class or to different classes. In an LVQ network, by contrast, the user specifies the target classification, and the network can achieve an accurate classification of the input vector patterns through supervised learning.
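The supervised correction that LVQ adds to a competitive layer can be illustrated with the textbook LVQ1 rule. This is a minimal sketch, not something taken from the patent: the function name `lvq1_step`, the learning rate `lr` and the toy prototypes are assumptions made for illustration.

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, y, lr=0.1):
    """One LVQ1 update (textbook rule, assumed here): the nearest prototype
    moves toward x when its label matches y and away from x otherwise."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    k = int(np.argmin(dists))                      # winning prototype
    sign = 1.0 if proto_labels[k] == y else -1.0   # attract or repel
    updated = prototypes.copy()
    updated[k] += sign * lr * (x - prototypes[k])
    return updated, k

# Hypothetical toy data: two prototypes with class labels 0 and 1.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
proto_labels = np.array([0, 1])
x = np.array([0.2, 0.0])
attracted, winner = lvq1_step(prototypes, proto_labels, x, y=0, lr=0.5)
repelled, _ = lvq1_step(prototypes, proto_labels, x, y=1, lr=0.5)
```

With a matching label the winner is pulled toward the sample; with a mismatched label it is pushed away, which is the mechanism a purely distance-based competitive layer lacks.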
Summary of the invention
The technical problem to be solved by the present invention is to provide a cloud-computing-based financial data mining method that uses the properties of a self-organizing neural network to reduce the dimensionality of the data, cluster it, and visualize it.
To solve the above technical problem, the invention provides a financial data mining method based on cloud computing, comprising the following steps:
1) carry out preprocessing operations such as data migration and cleaning on the raw data;
2) determine the structure of the neuron grid according to the volume and dimensionality of the raw data;
3) carry out adaptive training using the processed data and the neuron grid;
4) carry out convergence training using the above data and the result of the adaptive training;
5) use the above training result to discretize and visualize the data.
The data preprocessing operation of step 1) comprises the following steps:
11) convert all raw data to CSV files;
12) fill in the missing data in these files, replacing each missing value with the mean of its attribute.
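Steps 11) and 12) can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions: the CSV has a header row, every column is numeric, and a missing value is an empty cell; the function name `fill_missing_with_mean` and the column names are hypothetical.

```python
import csv
import io

def fill_missing_with_mean(csv_text):
    """Parse CSV text with a header row and replace every empty cell
    with the mean of the non-empty values in its column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    filled = [list(r) for r in body]
    for col in range(len(header)):
        values = [float(r[col]) for r in body if r[col].strip() != ""]
        if not values:
            continue  # no observed value to average; leave the column as is
        mean = sum(values) / len(values)
        for r in filled:
            if r[col].strip() == "":
                r[col] = str(mean)
    return header, filled

# Hypothetical two-attribute sample with one gap in each column.
sample = "roe,eps\n0.10,1.0\n,2.0\n0.30,\n"
header, filled = fill_missing_with_mean(sample)
```

A non-numeric column would raise `ValueError` in this sketch; real data would need such columns encoded or excluded first.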
The neuron grid in step 2) is a two-dimensional rectangular lattice whose number of neurons is 1% of the number of samples; the distance between neurons in the lattice is the Euclidean distance.
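As a rough illustration of the sizing rule, the sketch below picks a near-square rectangle holding about 1% of the sample count and measures Euclidean distance between lattice coordinates. The patent does not say how the rectangle's sides are chosen, so the near-square choice and the names `grid_shape` and `grid_distance` are assumptions.

```python
import math

def grid_shape(n_samples, ratio=0.01):
    """Pick rows x cols for a rectangular lattice with about
    ratio * n_samples neurons (near-square layout is an assumption)."""
    n_neurons = max(1, round(n_samples * ratio))
    rows = max(1, math.isqrt(n_neurons))
    cols = math.ceil(n_neurons / rows)
    return rows, cols

def grid_distance(a, b):
    """Euclidean distance between two lattice coordinates (row, col)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

rows, cols = grid_shape(10_000)          # 1% of 10,000 samples is 100 neurons
dist = grid_distance((0, 0), (3, 4))
```

When 1% of the sample count is not a product of two close integers, this sketch rounds the column count up, so the lattice may hold slightly more neurons than the target.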
Step 3) comprises the following steps:
31) set the initial neighborhood range to the radius of the grid from step 2);
32) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant (formula image not reproduced), which relates the initial neighborhood range to the contraction coefficient;
33) set the initial value of the learning step and the step contraction constant;
34) at each iteration, compute the learning step and the neighborhood function (formula image not reproduced); the neighborhood function depends on the distance between two nodes of the grid;
35) input the samples in turn and, for each input sample, compute the winning unit, i.e. the neuron with the smallest Euclidean distance to that sample;
36) update the weights of each neuron according to the weight update formula (formula image not reproduced);
37) each sample is presented cyclically, at least 1000 times.
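Because the formula images are missing from this text, the loop below is a sketch that substitutes the standard Kohonen schedule: exponentially decaying neighborhood width and learning step, a Gaussian neighborhood over lattice distance, and the usual weight update. The function name `train_som`, the defaults `eta0`, `tau2` and `seed`, and the choice of half the longer grid side as the "radius" are all assumptions, not values from the patent. For brevity `n_iter` counts single-sample presentations.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=1000, eta0=0.1, tau2=1000.0, seed=0):
    """Adaptive (ordering) phase of a SOM using textbook formulas (assumed)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, data.shape[1]))
    # Lattice coordinate (row, col) of every neuron, row-major order.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0               # initial neighborhood: grid "radius"
    tau1 = n_iter / np.log(sigma0) if sigma0 > 1 else float(n_iter)
    for n in range(n_iter):
        sigma = sigma0 * np.exp(-n / tau1)       # shrinking neighborhood width
        eta = eta0 * np.exp(-n / tau2)           # shrinking learning step
        x = data[n % len(data)]                  # samples are presented in turn
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighborhood function
        weights += eta * h[:, None] * (x - weights)
    return weights, coords

data = np.random.default_rng(1).random((200, 2))   # hypothetical 2-attribute samples
weights, coords = train_som(data, rows=3, cols=3)
```

Every update is a convex step toward a sample, so with inputs scaled to [0, 1] the weights stay in that range.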
Step 4) comprises the following steps:
41) set the initial neighborhood range to the radius of the grid from step 2);
42) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant (formula image not reproduced), which relates the initial neighborhood range to the contraction coefficient;
43) set the initial value of the learning step and the step contraction constant;
44) at each iteration, compute the learning step and the neighborhood function (formula image not reproduced); the neighborhood function depends on the distance between two nodes of the grid;
45) input the samples in turn and, for each input sample, compute the winning unit, i.e. the neuron with the smallest Euclidean distance to that sample;
46) update the weights of each neuron according to the weight update formula (formula image not reproduced);
47) each sample is presented cyclically, at least 4000 times;
after step 47) finishes, two quantities (symbols not reproduced) are held constant and training continues.
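The convergence phase can be sketched as a fine-tuning loop whose schedule is frozen after the scheduled iterations. The symbols of the two quantities held constant are missing from this text; this sketch assumes they are the learning step and the neighborhood width. The function name `convergence_phase` and the values `eta0`, `sigma0`, `tau` and `extra_iter` are illustrative assumptions, and the sketch starts from a narrow neighborhood rather than the grid radius named in step 41).

```python
import numpy as np

def convergence_phase(weights, coords, data, n_iter=4000, extra_iter=1000,
                      eta0=0.05, sigma0=1.0, tau=4000.0):
    """Fine-tune SOM weights with small decaying parameters, then keep
    training with the parameters held at their final values."""
    w = weights.copy()
    for n in range(n_iter + extra_iter):
        m = min(n, n_iter)                 # schedule is frozen once n reaches n_iter
        eta = eta0 * np.exp(-m / tau)      # learning step
        sigma = sigma0 * np.exp(-m / tau)  # neighborhood width
        x = data[n % len(data)]
        winner = np.argmin(np.linalg.norm(w - x, axis=1))
        d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
        w += eta * np.exp(-d2 / (2.0 * sigma ** 2))[:, None] * (x - w)
    return w

rng = np.random.default_rng(0)
coords = np.array([(r, c) for r in range(3) for c in range(3)], dtype=float)
start = rng.random((9, 2))     # stand-in for the weights left by the adaptive phase
data = rng.random((300, 2))    # hypothetical samples scaled to [0, 1]
tuned = convergence_phase(start, coords, data)
```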
In step 5), for each sample, the grid coordinate of the neuron whose weight vector has the largest inner product with that sample is taken as the discretization result.
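A minimal sketch of this inner-product rule follows, assuming the neurons are stored in row-major order so that a flat index converts to a (row, column) coordinate; the function name `discretize` and the toy weights are hypothetical.

```python
import numpy as np

def discretize(samples, weights, cols):
    """Map each sample to the (row, col) of the neuron whose weight
    vector has the largest inner product with the sample."""
    scores = samples @ weights.T              # shape (n_samples, n_neurons)
    best = np.argmax(scores, axis=1)
    return np.stack([best // cols, best % cols], axis=1)

# Hypothetical 2 x 2 grid, neurons listed in row-major order.
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.1, 0.1]])
samples = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
cells = discretize(samples, weights, cols=2)
```

Note that the largest inner product picks the same neuron as the smallest Euclidean distance only when the weight vectors have equal norms, so in practice the weights or the data may need normalizing first.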
Compared with the prior art, the present invention has the following advantages:
1. it makes good use of distributed storage and computation;
2. it uses the properties of a self-organizing neural network to reduce the dimensionality of the data and cluster it;
3. it adopts visualization, which makes the result more vivid.
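The text does not describe how the computation is distributed. One common way to parallelize SOM training, shown here purely as an assumption, is the batch SOM update: each data partition computes partial sums (a map step) and the sums are combined to recompute all weights (a reduce step). The names `partial_sums` and `batch_update` are hypothetical, and the list comprehension stands in for whatever distributed map a cluster framework would provide.

```python
import numpy as np

def partial_sums(chunk, weights, coords, sigma):
    """Map step: numerator and denominator of the batch SOM update
    contributed by one data partition."""
    d = np.linalg.norm(chunk[:, None, :] - weights[None, :, :], axis=2)
    winners = np.argmin(d, axis=1)
    grid_d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=2)
    h = np.exp(-grid_d2[winners] / (2.0 * sigma ** 2))   # (n_chunk, n_neurons)
    return h.T @ chunk, h.sum(axis=0)

def batch_update(chunks, weights, coords, sigma):
    """Reduce step: add up the partial results and recompute every weight."""
    parts = [partial_sums(c, weights, coords, sigma) for c in chunks]
    num = sum(p[0] for p in parts)
    den = sum(p[1] for p in parts)
    return num / den[:, None]

rng = np.random.default_rng(0)
data = rng.random((100, 3))
coords = np.array([(r, c) for r in range(2) for c in range(2)], dtype=float)
weights = rng.random((4, 3))
whole = batch_update([data], weights, coords, sigma=1.0)
split = batch_update(np.array_split(data, 4), weights, coords, sigma=1.0)
```

Because the partial sums are additive, the result does not depend on how the samples are partitioned, which is what makes this form suitable for distributed storage.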
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is the network topology map of the data.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the invention provides a financial data mining method based on cloud computing, comprising the following steps:
1) carry out preprocessing operations such as error correction and format conversion on the obtained financial data;
2) establish the required neuron grid; the grid is laid out as a rectangle, and its number of neurons is 1% of the number of samples;
3) carry out adaptive training using the established grid and the processed data;
4) carry out convergence training using the established grid and the processed data;
5) use the weights of the trained neurons to discretize the data, so that each sample corresponds to one neuron;
6) label and visualize each discrete point.
The data preprocessing operation of step 1) comprises the following steps:
11) convert all raw data to CSV files;
12) fill in the missing data in these files, replacing each missing value with the mean of its attribute.
The neuron grid in step 2) is a two-dimensional rectangular lattice whose number of neurons is 1% of the number of samples. The distance between neurons in the lattice is the Euclidean distance.
Step 3) comprises the following steps:
31) set the initial neighborhood range to the radius of the grid from step 2);
32) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant (formula image not reproduced), which relates the initial neighborhood range to the contraction coefficient;
33) set the initial value of the learning step and the step contraction constant;
34) at each iteration, compute the learning step and the neighborhood function (formula image not reproduced); the neighborhood function depends on the distance between two nodes of the grid;
35) input the samples in turn and, for each input sample, compute the winning unit, i.e. the neuron with the smallest Euclidean distance to that sample;
36) update the weights of each neuron according to the weight update formula (formula image not reproduced);
37) each sample is presented cyclically, at least 1000 times.
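The formula images for steps 32) to 36) are not available in this text. For orientation only, the textbook self-organizing map expressions that match the quantities named in those steps are given below. All symbols are assumptions introduced here: $\sigma_0$ for the initial neighborhood range, $\tau_1$ for the neighborhood contraction coefficient, $\eta_0$ for the initial learning step, $\tau_2$ for the step contraction constant, $d_{j,i}$ for the lattice distance between nodes $j$ and $i$, and $i(\mathbf{x})$ for the winning unit. The patent's own formulas may differ.

```latex
\begin{align}
  \tau_1 &= \frac{N}{\ln \sigma_0}
    && \text{($N$: number of iterations in the phase)} \\
  \sigma(n) &= \sigma_0 \exp\!\left(-\frac{n}{\tau_1}\right),
  \qquad
  \eta(n) = \eta_0 \exp\!\left(-\frac{n}{\tau_2}\right) \\
  h_{j,i}(n) &= \exp\!\left(-\frac{d_{j,i}^{2}}{2\,\sigma^{2}(n)}\right) \\
  i(\mathbf{x}) &= \arg\min_{j} \lVert \mathbf{x} - \mathbf{w}_j(n) \rVert \\
  \mathbf{w}_j(n+1) &= \mathbf{w}_j(n)
    + \eta(n)\, h_{j,i(\mathbf{x})}(n)\, \bigl(\mathbf{x} - \mathbf{w}_j(n)\bigr)
\end{align}
```

With this choice of $\tau_1$, the neighborhood width shrinks from $\sigma_0$ to 1 over the $N$ iterations, so by the end of the phase the update mostly affects the winner and its immediate lattice neighbors.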
Step 4) comprises the following steps:
41) set the initial neighborhood range to the radius of the grid from step 2);
42) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant (formula image not reproduced), which relates the initial neighborhood range to the contraction coefficient;
43) set the initial value of the learning step and the step contraction constant;
44) at each iteration, compute the learning step and the neighborhood function (formula image not reproduced); the neighborhood function depends on the distance between two nodes of the grid;
45) input the samples in turn and, for each input sample, compute the winning unit, i.e. the neuron with the smallest Euclidean distance to that sample;
46) update the weights of each neuron according to the weight update formula (formula image not reproduced);
47) each sample is presented cyclically, at least 4000 times; after step 47) finishes, two quantities (symbols not reproduced) are held constant and training continues.
In step 5), for each sample, the grid coordinate of the neuron whose weight vector has the largest inner product with that sample is taken as the discretization result.
The technical solution of the present invention is further set forth below with a specific example.
Financial data is very complex and comprises many indices. The tables below show the financial indices of Xingrong Investment over a certain period. The common practice is to analyze the trend of each index separately, but this kind of analysis ignores the correlations between indices, and it is also difficult to analyze the financial data of all securities on the market in a unified and comprehensive way so as to determine the relations between the securities. The present method processes the financial data of all listed companies together and compresses the data onto a two-dimensional grid for display, vividly showing the topological relations among the listed companies.
[Tables of financial indices: five images not reproduced]
One broad category of the above indices, such as profitability, is selected for analysis. The corresponding data of all listed companies are collected and processed by the algorithm, yielding the two-dimensional grid shown in Fig. 2, in which the different colors make the network topology in the data clearly visible.
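The labeling step can be illustrated without a plotting library by grouping company labels per grid cell and rendering the grid as text. This is a sketch only: the function name `label_map`, the cell coordinates and the labels are hypothetical, and the patent's Fig. 2 uses colors rather than text.

```python
from collections import defaultdict

def label_map(cells, labels, rows, cols):
    """Group sample labels by grid cell and render the grid as text,
    one line per grid row, with '.' marking an empty cell."""
    by_cell = defaultdict(list)
    for (r, c), name in zip(cells, labels):
        by_cell[(r, c)].append(name)
    lines = []
    for r in range(rows):
        lines.append(" | ".join(",".join(by_cell.get((r, c), ["."]))
                                for c in range(cols)))
    return by_cell, "\n".join(lines)

cells = [(0, 0), (0, 0), (1, 1)]   # discretized coordinates of three samples
labels = ["A", "B", "C"]           # hypothetical company labels
by_cell, text = label_map(cells, labels, rows=2, cols=2)
```

Companies that land in the same or adjacent cells have similar index profiles under the trained map, which is the relation the visualization is meant to expose.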

Claims (7)

1. A financial data mining method based on cloud computing, comprising the following steps:
1) carry out preprocessing operations such as data migration and cleaning on the raw data;
2) determine the structure of the neuron grid according to the volume and dimensionality of the raw data;
3) carry out adaptive training using the processed data and the neuron grid;
4) carry out convergence training using the above data and the result of the adaptive training;
5) use the above training result to discretize and visualize the data.
2. The financial data mining method based on cloud computing according to claim 1, characterized in that the data preprocessing operation of step 1) comprises the following steps:
11) convert all raw data to CSV files;
12) fill in the missing data in these files, replacing each missing value with the mean of its attribute.
3. The financial data mining method based on cloud computing according to claim 1, characterized in that the neuron grid in step 2) is a two-dimensional rectangular lattice whose number of neurons is 1% of the number of samples.
4. The financial data mining method based on cloud computing according to claim 3, characterized in that the distance between neurons in the two-dimensional rectangular lattice is the Euclidean distance.
5. The financial data mining method based on cloud computing according to claim 1, characterized in that step 3) comprises the following steps:
31) set the initial neighborhood range to the radius of the grid from step 2);
32) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant (formula image not reproduced), which relates the initial neighborhood range to the contraction coefficient;
33) set the initial value of the learning step and the step contraction constant;
34) at each iteration, compute the learning step and the neighborhood function (formula image not reproduced); the neighborhood function depends on the distance between two nodes of the grid;
35) input the samples in turn and, for each input sample, compute the winning unit, i.e. the neuron with the smallest Euclidean distance to that sample;
36) update the weights of each neuron according to the weight update formula (formula image not reproduced);
37) each sample is presented cyclically, at least 1000 times.
6. The financial data mining method based on cloud computing according to claim 1, characterized in that step 4) comprises the following steps:
41) set the initial neighborhood range to the radius of the grid from step 2);
42) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant (formula image not reproduced), which relates the initial neighborhood range to the contraction coefficient;
43) set the initial value of the learning step and the step contraction constant;
44) at each iteration, compute the learning step and the neighborhood function (formula image not reproduced); the neighborhood function depends on the distance between two nodes of the grid;
45) input the samples in turn and, for each input sample, compute the winning unit, i.e. the neuron with the smallest Euclidean distance to that sample;
46) update the weights of each neuron according to the weight update formula (formula image not reproduced);
47) each sample is presented cyclically, at least 4000 times;
after step 47) finishes, two quantities (symbols not reproduced) are held constant and training continues.
7. The financial data mining method based on cloud computing according to claim 1, characterized in that in step 5), for each sample, the grid coordinate of the neuron whose weight vector has the largest inner product with that sample is taken as the discretization result.
CN201310536760.XA 2013-11-04 2013-11-04 Financial data mining method based on cloud computing Pending CN103559642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310536760.XA CN103559642A (en) 2013-11-04 2013-11-04 Financial data mining method based on cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310536760.XA CN103559642A (en) 2013-11-04 2013-11-04 Financial data mining method based on cloud computing

Publications (1)

Publication Number Publication Date
CN103559642A true CN103559642A (en) 2014-02-05

Family

ID=50013882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310536760.XA Pending CN103559642A (en) 2013-11-04 2013-11-04 Financial data mining method based on cloud computing

Country Status (1)

Country Link
CN (1) CN103559642A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935673A (en) * 2015-07-08 2015-09-23 成都梦工厂网络信息有限公司 Meshing big data mining system based on cloud computing
CN105718600A (en) * 2016-03-08 2016-06-29 上海晶赞科技发展有限公司 Heterogeneous data set feature quality visualization method
CN105989179A (en) * 2015-03-06 2016-10-05 北京邮电大学 Financial data processing method and system
CN106227776A (en) * 2016-07-18 2016-12-14 四川君逸数码科技股份有限公司 A kind of data preprocessing method supporting wisdom finance and device
CN111552867A (en) * 2020-03-31 2020-08-18 北京城市网邻信息技术有限公司 Service information recommendation method and device
CN111652734A (en) * 2020-07-13 2020-09-11 李国安 Financial information management system based on block chain and big data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
乔磊: "基于神经网络方法的数据挖掘平台设计和实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
李岚: "基于划分的双向选择自组织聚类算法的研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
白耀辉 等: "利用自组织特征映射神经网络进行可视化聚类", 《计算机仿真》 *
齐志: "基于SOM神经网络的聚类可视化方法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Similar Documents

Publication Publication Date Title
Bandaru et al. Data mining methods for knowledge discovery in multi-objective optimization: Part A-Survey
CN103559642A (en) Financial data mining method based on cloud computing
Stefanovič et al. Visual analysis of self-organizing maps
CN106778832B (en) The semi-supervised Ensemble classifier method of high dimensional data based on multiple-objection optimization
CN105825511A (en) Image background definition detection method based on deep learning
CN112149873B (en) Low-voltage station line loss reasonable interval prediction method based on deep learning
Bergmann et al. Approximation of dispatching rules for manufacturing simulation using data mining methods
Wang et al. The load characteristics classification and synthesis of substations in large area power grid
CN102722578B (en) Unsupervised cluster characteristic selection method based on Laplace regularization
CN111160483B (en) Network relation type prediction method based on multi-classifier fusion model
CN105046323A (en) Regularization-based RBF network multi-label classification method
Rastogi et al. GA based clustering of mixed data type of attributes (numeric, categorical, ordinal, binary and ratio-scaled)
CN110738245A (en) automatic clustering algorithm selection system and method for scientific data analysis
CN114169998A (en) Financial big data analysis and mining algorithm
CN110175631A (en) A kind of multiple view clustering method based on common Learning Subspaces structure and cluster oriental matrix
Zhang et al. Prediction of dairy product quality risk based on extreme learning machine
CN104050451A (en) Robust target tracking method based on multi-channel Haar-like characteristics
Zhang et al. Detection of coronal mass ejections using multiple features and space–time continuity
Li et al. Simulation of multivariate scheduling optimization for open production line based on improved genetic algorithm
Clement et al. Beyond explaining: XAI-based Adaptive Learning with SHAP Clustering for Energy Consumption Prediction
Li et al. PointSmile: Point self-supervised learning via curriculum mutual information
Cabanes et al. On the use of Wasserstein metric in topological clustering of distributional data
CN113989671A (en) Remote sensing scene classification method and system based on semantic perception and dynamic graph convolution
CN111680576A (en) LULC prediction method based on self-adaptive cellular algorithm
Hu et al. Application of som neural network in lithology recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140205