CN103559642A - Financial data mining method based on cloud computing - Google Patents
- Publication number
- CN103559642A (application CN201310536760.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- sample
- cloud computing
- mining method
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a financial data mining method based on cloud computing. The method comprises the following steps: preprocessing the acquired financial data, including error correction and format conversion; building the required neuron grid, which is laid out as a rectangle and whose number of neurons is 1% of the number of samples; carrying out adaptive training with the grid and the processed data; carrying out convergence training with the grid and the processed data; discretizing the data using the trained neuron weights so that each sample corresponds to one neuron; and labeling and visualizing each discrete point. The method exploits distributed storage and computing, reduces the dimensionality of the data and clusters it using the properties of a self-organizing neural network, and uses visualization to present the data more vividly.
Description
Technical field
The present invention relates to a distributed financial data mining method, and in particular to a cloud-computing-based financial data mining method for fast clustering of big data.
Background art
With the rapid development of the Internet, the World Wide Web (WWW) has become a huge information space that provides users with valuable information resources. Faced with a large volume of financial data, how to analyze and process it becomes a vital problem. One approach is to reduce high-dimensional data to two dimensions, visualize it, and use the result to support decision makers in their analysis.
The self-organizing map (SOM) is an important type of neural network based on unsupervised learning. Self-organizing map theory was first proposed in 1981 by Kohonen of Helsinki University of Technology, Finland. Since then, with the rapid development of neural networks in the mid-to-late 1980s, the theory and application of self-organizing maps have also made significant progress.
It is an unsupervised clustering method. It simulates the way nerve cells in different regions of the human brain specialize in different features, so that different regions have different response characteristics, and this process is completed automatically. A self-organizing map classifies a set of input patterns by finding an optimal set of reference vectors. Each reference vector is the connection weight vector of one output unit. Compared with traditional pattern clustering methods, the cluster centers it forms can be mapped onto a curved surface or a plane while preserving the topological structure. Self-organizing maps can also be used to identify unknown cluster centers.
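The idea of classifying an input pattern by its closest reference vector can be sketched as follows. This is a minimal illustration only: the function names and the three sample reference vectors are hypothetical and do not come from the patent.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_reference(sample, reference_vectors):
    """Return the index of the reference vector closest to the sample."""
    return min(range(len(reference_vectors)),
               key=lambda i: euclidean(sample, reference_vectors[i]))

# Three hypothetical output units, each with a 2-D connection weight vector.
refs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
winner = nearest_reference([0.9, 1.2], refs)
```

Here the sample is assigned to the second output unit, because its weight vector is the closest of the three.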
The self-organizing neural network is one of the most attractive research areas in neural networks. It learns from input samples to detect their regularities and the relationships among them, and adaptively adjusts the network according to this information so that later responses of the network fit the input samples. The neurons of a competitive neural network can learn to recognize groups of similar input vectors; a self-organizing map can likewise learn to recognize groups of similar input vectors, such that neurons close to each other in the network layer respond to similar input vectors. Unlike a competitive neural network, a self-organizing map learns not only the distribution of the input vectors but also their topological structure. No single neuron is decisive for pattern classification; classification is achieved through the joint action of multiple neurons.
Learning vector quantization (LVQ) is a supervised learning method for training a competitive layer. A competitive-layer network can automatically learn to classify input vector patterns, but the classification depends only on the distance between input vectors: when two input vectors are very close, the competitive layer may put them in the same class. The design of the competitive layer has no mechanism for strictly deciding whether any two input vectors belong to the same class or to different classes. In an LVQ network, by contrast, the user specifies the target classification, and through supervised learning the network can classify input vector patterns accurately.
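The supervised correction that distinguishes LVQ from a plain competitive layer can be sketched with the standard LVQ1 rule, which the patent text does not spell out: the winning prototype moves toward the sample when its class matches the target and away from it otherwise. The function name, learning rate and data below are illustrative assumptions.

```python
def lvq1_step(weights, labels, sample, target, lr=0.1):
    """Apply one LVQ1 update in place and return the winning unit's index."""
    # Winner: prototype with the smallest squared Euclidean distance to the sample.
    win = min(range(len(weights)),
              key=lambda i: sum((w - x) ** 2 for w, x in zip(weights[i], sample)))
    # Attract the winner if its class matches the target, repel it otherwise.
    sign = 1.0 if labels[win] == target else -1.0
    weights[win] = [w + sign * lr * (x - w) for w, x in zip(weights[win], sample)]
    return win

# Two hypothetical prototypes with class labels "A" and "B".
protos = [[0.0, 0.0], [1.0, 1.0]]
proto_labels = ["A", "B"]
lvq1_step(protos, proto_labels, [0.2, 0.0], "A", lr=0.5)  # correct class: attract
lvq1_step(protos, proto_labels, [0.8, 1.0], "A", lr=0.5)  # wrong class: repel
```

After these two steps the first prototype has moved toward its sample, while the second has been pushed away from a sample it would otherwise have claimed for the wrong class.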
Summary of the invention
The technical problem to be solved by the present invention is to provide a cloud-computing-based financial data mining method that uses the properties of a self-organizing neural network to reduce the dimensionality of the data, cluster it, and visualize it.
To solve the above technical problem, the invention provides a financial data mining method based on cloud computing, comprising the following steps:
1) preprocess the raw data, including data migration and cleaning;
2) determine the structure of the neuron grid according to the volume and dimensionality of the raw data;
3) carry out adaptive training using the processed data and the neuron grid;
4) carry out convergence training using the above data and the result of the adaptive training;
5) discretize and visualize the data using the above training result.
The data preprocessing of step 1) comprises the following steps:
11) convert all raw data to CSV files;
12) fill in missing data in these files, replacing each missing value with the mean of its attribute.
The neuron grid in step 2) is a two-dimensional rectangular lattice whose number of neurons is 1% of the number of samples; the distance between neurons in the lattice is the Euclidean distance.
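A minimal sketch of sizing such a lattice is given below. The text fixes only the neuron count (1% of the samples) and the rectangular layout; choosing a near-square shape is an assumption made here for illustration, and `grid_shape` and `grid_distance` are hypothetical helper names.

```python
import math

def grid_shape(n_samples, fraction=0.01):
    """Return (rows, cols) of a near-square lattice with about fraction * n_samples neurons."""
    n_neurons = max(1, round(n_samples * fraction))
    rows = max(1, math.isqrt(n_neurons))   # assumed: rows close to the square root
    cols = math.ceil(n_neurons / rows)     # enough columns to hold all neurons
    return rows, cols

def grid_distance(p, q):
    """Euclidean distance between two lattice coordinates (row, col)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

rows, cols = grid_shape(10_000)  # 1% of 10,000 samples is 100 neurons
```

When the target count is not a perfect square, this choice rounds the lattice up slightly; for example 5,000 samples give a 7 x 8 lattice of 56 neurons rather than exactly 50.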
Step 3) comprises the following steps:
31) set the initial neighborhood range to the radius of the grid from step 2);
32) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant as [formula not preserved in this text], defined in terms of the initial neighborhood range and the contraction coefficient;
33) set the initial value of the learning step and the step-size contraction constant [symbols not preserved in this text];
34) in each cycle, compute the learning step and the neighborhood function [formulas not preserved in this text];
35) input the samples in turn and, for each input sample, compute the winning unit, that is, the neuron with the minimum Euclidean distance to the sample;
37) input each of the above samples cyclically at least 1000 times.
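The adaptive training phase can be sketched as below. Because the patent's formulas for the learning step, the neighborhood function and the weight update are not preserved in this text, the sketch substitutes common textbook choices: exponential decay, a Gaussian neighborhood, and the usual Kohonen update. These choices, the parameter names and the default values are all assumptions, not the patent's own definitions. Only the initial neighborhood range (the grid radius), the sequential input of samples and the Euclidean winner selection follow the steps above.

```python
import math
import random

def train_som(samples, rows, cols, cycles, lr0=0.5, lr_decay=0.01, sigma_decay=0.01, seed=0):
    """Train a rows x cols map; return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0  # initial neighborhood range: grid radius
    for t in range(cycles):
        lr = lr0 * math.exp(-lr_decay * t)  # assumed decay law for the learning step
        sigma = max(sigma0 * math.exp(-sigma_decay * t), 0.5)  # assumed, with a floor
        for x in samples:  # samples are input in turn
            # Winning unit: neuron with minimum Euclidean distance to x.
            win = min(weights, key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))
            for k in weights:
                d2 = (k[0] - win[0]) ** 2 + (k[1] - win[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # assumed Gaussian neighborhood
                weights[k] = [w + lr * h * (v - w) for w, v in zip(weights[k], x)]
    return weights

# Two hypothetical clusters of 2-D samples, scaled to [0, 1].
data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.95], [0.85, 0.9]]
som = train_som(data, rows=2, cols=2, cycles=20)
```

The demonstration uses 20 cycles to keep it quick; the method itself calls for at least 1000 cycles in this phase.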
Step 4) comprises the following steps:
41) set the initial neighborhood range to the radius of the grid from step 2);
42) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant as [formula not preserved in this text], defined in terms of the initial neighborhood range and the contraction coefficient;
44) in each cycle, compute the learning step and the neighborhood function [formulas not preserved in this text];
45) input the samples in turn and, for each input sample, compute the winning unit, that is, the neuron with the minimum Euclidean distance to the sample;
47) input each of the above samples cyclically at least 4000 times.
Step 5) takes, for each sample, the grid coordinates of the neuron whose weight vector has the largest inner product with the sample as the discretization result.
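The discretization rule of step 5) can be sketched directly: each sample is mapped to the grid coordinate of the neuron whose weight vector gives the largest inner product. The function name and the two hand-written weight vectors are hypothetical.

```python
def discretize(samples, weights):
    """Map each sample to the grid coordinate whose weight has the largest inner product with it."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [max(weights, key=lambda k: dot(weights[k], s)) for s in samples]

# Hypothetical trained weights on a 1 x 2 grid.
trained = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
coords = discretize([[0.9, 0.1], [0.2, 0.7]], trained)
```

Note that the largest inner product and the smallest Euclidean distance select the same neuron only when all weight vectors have the same norm, so this criterion can differ from the winner selection used during training.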
Compared with the prior art, the present invention has the following advantages:
1. it makes good use of distributed storage and computing;
2. it uses the properties of a self-organizing neural network to reduce the dimensionality of the data and cluster it;
3. it adopts visualization, which presents the data more vividly.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is the network topology diagram of the data.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the invention provides a financial data mining method based on cloud computing, comprising the following steps:
1) preprocess the acquired financial data, including error correction and format conversion;
2) build the required neuron grid; the grid is rectangular and its number of neurons is 1% of the number of samples;
3) carry out adaptive training using the grid and the processed data;
4) carry out convergence training using the grid and the processed data;
5) discretize the data using the trained neuron weights, so that each sample corresponds to one neuron;
6) label and visualize each discrete point.
The data preprocessing of step 1) comprises the following steps:
11) convert all raw data to CSV files;
12) fill in missing data in these files, replacing each missing value with the mean of its attribute.
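The mean-substitution step can be sketched with the standard `csv` module. The column names `roe` and `eps` and the sample values are hypothetical, and the sketch assumes every column is numeric and has at least one value present.

```python
import csv
import io

def fill_missing_with_mean(rows):
    """Replace empty cells in each numeric column with that column's mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [float(r[j]) for r in rows if r[j].strip() != ""]
        means.append(sum(present) / len(present))
    return [[float(r[j]) if r[j].strip() != "" else means[j] for j in range(n_cols)]
            for r in rows]

# Hypothetical two-indicator CSV with one missing value in each column.
raw = "roe,eps\n0.10,1.5\n,2.5\n0.30,\n"
reader = csv.reader(io.StringIO(raw))
header = next(reader)
filled = fill_missing_with_mean(list(reader))
```

A column with no values at all would raise a division error here; real data would need a policy for that case, which the text does not specify.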
The neuron grid in step 2) is a two-dimensional rectangular lattice whose number of neurons is 1% of the number of samples. The distance between neurons in the lattice is the Euclidean distance.
Step 3) comprises the following steps:
31) set the initial neighborhood range to the radius of the grid from step 2);
32) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant as [formula not preserved in this text], defined in terms of the initial neighborhood range and the contraction coefficient;
34) in each cycle, compute the learning step and the neighborhood function [formulas not preserved in this text];
35) input the samples in turn and, for each input sample, compute the winning unit, that is, the neuron with the minimum Euclidean distance to the sample;
37) input each of the above samples cyclically at least 1000 times.
Step 4) comprises the following steps:
41) set the initial neighborhood range to the radius of the grid from step 2);
42) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant as [formula not preserved in this text], defined in terms of the initial neighborhood range and the contraction coefficient;
44) in each cycle, compute the learning step and the neighborhood function [formulas not preserved in this text];
45) input the samples in turn and, for each input sample, compute the winning unit, that is, the neuron with the minimum Euclidean distance to the sample;
47) input each of the above samples cyclically at least 4000 times; after step 47) finishes, hold the two quantities named at this point (their symbols are not preserved in this text) constant and continue training.
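The "decay, then hold constant" behavior at the end of convergence training can be sketched as a schedule function. The sketch assumes that the two quantities held fixed are the learning step and the neighborhood range, and that both decay exponentially beforehand; neither assumption is confirmed by the surviving text, and all parameter names and defaults are illustrative.

```python
import math

def schedule(t, hold_after, lr0=0.1, lr_decay=0.001, sigma0=2.0, sigma_decay=0.001):
    """Return (learning step, neighborhood range) for cycle t.

    Both values decay until cycle `hold_after`, then stay fixed.
    """
    t_eff = min(t, hold_after)                       # freeze the clock after hold_after
    lr = lr0 * math.exp(-lr_decay * t_eff)           # assumed exponential decay
    sigma = sigma0 * math.exp(-sigma_decay * t_eff)  # assumed exponential decay
    return lr, sigma

before = schedule(1000, hold_after=4000)
at_end = schedule(4000, hold_after=4000)
later = schedule(9000, hold_after=4000)
```

With `hold_after=4000`, any cycle beyond the 4000th reuses the values reached at cycle 4000, so further training only fine-tunes the weights.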
Step 5) takes, for each sample, the grid coordinates of the neuron whose weight vector has the largest inner product with the sample as the discretization result.
The technical solution of the present invention is further illustrated below with a specific example.
Financial data is very complex and comprises many indicators; an example is the set of financial indicators of Xingrong Investment for a given period [table not preserved in this text]. The common practice is to analyze the trend of each indicator separately, but this ignores the correlations between indicators, and it also makes it difficult to analyze the financial data of all securities on the market in a unified, comprehensive way so as to determine the relationships among them. This method processes the financial data of all listed companies together and compresses it onto a two-dimensional grid for display, vividly showing the topological relationships among the listed companies.
One broad category of the above indicators, such as profitability, is selected for analysis. The data of all listed companies is collected and processed by the algorithm to obtain the two-dimensional grid shown in Fig. 2, in which the network topology of the data can be clearly seen from the different colors.
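The labeling and display of discrete points can be sketched as a plain-text grid. The patent's Fig. 2 uses colors, so the text rendering here is a stand-in, and the company labels and coordinates are hypothetical.

```python
from collections import defaultdict

def render_grid(coords, labels, rows, cols):
    """Return a text rendering of the grid with the labels that landed in each cell."""
    cells = defaultdict(list)
    for (r, c), label in zip(coords, labels):
        cells[(r, c)].append(label)
    lines = []
    for r in range(rows):
        # Empty cells are shown as "." so the lattice shape stays visible.
        lines.append(" | ".join(",".join(cells[(r, c)]) or "." for c in range(cols)))
    return "\n".join(lines)

# Hypothetical discretized coordinates for four companies on a 2 x 2 grid.
text = render_grid([(0, 0), (0, 0), (1, 1), (0, 1)], ["A", "B", "C", "D"], 2, 2)
print(text)
```

Companies that share a cell (A and B here) were mapped to the same neuron, which is how similar financial profiles show up as neighbors on the map.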
Claims (7)
1. A financial data mining method based on cloud computing, comprising the following steps:
1) preprocess the raw data, including data migration and cleaning;
2) determine the structure of the neuron grid according to the volume and dimensionality of the raw data;
3) carry out adaptive training using the processed data and the neuron grid;
4) carry out convergence training using the above data and the result of the adaptive training;
5) discretize and visualize the data using the above training result.
2. The financial data mining method based on cloud computing according to claim 1, characterized in that the data preprocessing of step 1) comprises the following steps:
11) convert all raw data to CSV files;
12) fill in missing data in these files, replacing each missing value with the mean of its attribute.
3. The financial data mining method based on cloud computing according to claim 1, characterized in that the neuron grid in step 2) is a two-dimensional rectangular lattice whose number of neurons is 1% of the number of samples.
4. The financial data mining method based on cloud computing according to claim 3, characterized in that the distance between neurons in the two-dimensional rectangular lattice is the Euclidean distance.
5. The financial data mining method based on cloud computing according to claim 1, characterized in that step 3) comprises the following steps:
31) set the initial neighborhood range to the radius of the grid from step 2);
32) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant as [formula not preserved in this text], defined in terms of the initial neighborhood range and the contraction coefficient;
34) in each cycle, compute the learning step and the neighborhood function [formulas not preserved in this text];
35) input the samples in turn and, for each input sample, compute the winning unit, that is, the neuron with the minimum Euclidean distance to the sample;
37) input each of the above samples cyclically at least 1000 times.
6. The financial data mining method based on cloud computing according to claim 1, characterized in that step 4) comprises the following steps:
41) set the initial neighborhood range to the radius of the grid from step 2);
42) set the relation between the neighborhood contraction coefficient and the initial neighborhood constant as [formula not preserved in this text], defined in terms of the initial neighborhood range and the contraction coefficient;
44) in each cycle, compute the learning step and the neighborhood function [formulas not preserved in this text];
45) input the samples in turn and, for each input sample, compute the winning unit, that is, the neuron with the minimum Euclidean distance to the sample;
46) update the weights; the weight update formula for each neuron is [formula and neuron index not preserved in this text];
47) input each of the above samples cyclically at least 4000 times.
7. The financial data mining method based on cloud computing according to claim 1, characterized in that step 5) takes, for each sample, the grid coordinates of the neuron whose weight vector has the largest inner product with the sample as the discretization result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310536760.XA CN103559642A (en) | 2013-11-04 | 2013-11-04 | Financial data mining method based on cloud computing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310536760.XA CN103559642A (en) | 2013-11-04 | 2013-11-04 | Financial data mining method based on cloud computing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103559642A (en) | 2014-02-05 |
Family
ID=50013882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310536760.XA Pending CN103559642A (en) | 2013-11-04 | 2013-11-04 | Financial data mining method based on cloud computing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103559642A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104935673A (en) * | 2015-07-08 | 2015-09-23 | 成都梦工厂网络信息有限公司 | Meshing big data mining system based on cloud computing |
CN105718600A (en) * | 2016-03-08 | 2016-06-29 | 上海晶赞科技发展有限公司 | Heterogeneous data set feature quality visualization method |
CN105989179A (en) * | 2015-03-06 | 2016-10-05 | 北京邮电大学 | Financial data processing method and system |
CN106227776A (en) * | 2016-07-18 | 2016-12-14 | 四川君逸数码科技股份有限公司 | A kind of data preprocessing method supporting wisdom finance and device |
CN111552867A (en) * | 2020-03-31 | 2020-08-18 | 北京城市网邻信息技术有限公司 | Service information recommendation method and device |
CN111652734A (en) * | 2020-07-13 | 2020-09-11 | 李国安 | Financial information management system based on block chain and big data |
Non-Patent Citations (4)
Title |
---|
乔磊: "基于神经网络方法的数据挖掘平台设计和实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
李岚: "基于划分的双向选择自组织聚类算法的研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
白耀辉 等: "利用自组织特征映射神经网络进行可视化聚类", 《计算机仿真》 * |
齐志: "基于SOM神经网络的聚类可视化方法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bandaru et al. | Data mining methods for knowledge discovery in multi-objective optimization: Part A-Survey | |
CN103559642A (en) | Financial data mining method based on cloud computing | |
Stefanovič et al. | Visual analysis of self-organizing maps | |
CN106778832B (en) | The semi-supervised Ensemble classifier method of high dimensional data based on multiple-objection optimization | |
CN105825511A (en) | Image background definition detection method based on deep learning | |
CN112149873B (en) | Low-voltage station line loss reasonable interval prediction method based on deep learning | |
Bergmann et al. | Approximation of dispatching rules for manufacturing simulation using data mining methods | |
Wang et al. | The load characteristics classification and synthesis of substations in large area power grid | |
CN102722578B (en) | Unsupervised cluster characteristic selection method based on Laplace regularization | |
CN111160483B (en) | Network relation type prediction method based on multi-classifier fusion model | |
CN105046323A (en) | Regularization-based RBF network multi-label classification method | |
Rastogi et al. | GA based clustering of mixed data type of attributes (numeric, categorical, ordinal, binary and ratio-scaled) | |
CN110738245A (en) | automatic clustering algorithm selection system and method for scientific data analysis | |
CN114169998A (en) | Financial big data analysis and mining algorithm | |
CN110175631A (en) | A kind of multiple view clustering method based on common Learning Subspaces structure and cluster oriental matrix | |
Zhang et al. | Prediction of dairy product quality risk based on extreme learning machine | |
CN104050451A (en) | Robust target tracking method based on multi-channel Haar-like characteristics | |
Zhang et al. | Detection of coronal mass ejections using multiple features and space–time continuity | |
Li et al. | Simulation of multivariate scheduling optimization for open production line based on improved genetic algorithm | |
Clement et al. | Beyond explaining: XAI-based Adaptive Learning with SHAP Clustering for Energy Consumption Prediction | |
Li et al. | PointSmile: Point self-supervised learning via curriculum mutual information | |
Cabanes et al. | On the use of Wasserstein metric in topological clustering of distributional data | |
CN113989671A (en) | Remote sensing scene classification method and system based on semantic perception and dynamic graph convolution | |
CN111680576A (en) | LULC prediction method based on self-adaptive cellular algorithm | |
Hu et al. | Application of som neural network in lithology recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20140205 |