CN109271371A - Spark-based distributed multi-layer big data analysis processing model - Google Patents

Info

Publication number
CN109271371A
Authority
CN
China
Prior art keywords
layer
distributed
tier
big data
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810956427.7A
Other languages
Chinese (zh)
Other versions
CN109271371B (en)
Inventor
宋泊东
张立臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201810956427.7A
Publication of CN109271371A
Application granted
Publication of CN109271371B
Legal status: Expired - Fee Related

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a Spark-based distributed multi-layer big data analysis processing model, comprising a presentation layer (PT), a front-end switching layer (FST), a back-end switching layer (BST), a real-time business logic layer (RBLT), a non-real-time business logic layer (NRBLT), and a data access layer (DAT). The proposed model can effectively reduce the time needed to analyze massive data, and it supports heterogeneous information exchange and data storage between the subsystems of a system. It is sufficient to meet the short-term trend forecasting needs of high-frequency trading markets, and it has high application value in high-frequency, big data processing systems.

Description

Spark-based distributed multi-layer big data analysis processing model
Technical field
The present invention relates to the field of big data analysis and processing, and more particularly to a Spark-based distributed multi-layer big data analysis processing model.
Background
Big data can help users improve their insight and make better decisions at a higher level, from a wider perspective, and over a larger scope. However, the value in big data is often hidden: value density is extremely low, the distribution is highly irregular, the information is deeply buried, and finding useful value is extremely difficult. In high-frequency trading (HFT) on the stock market, for example, short-term market trends and rapidly changing quotes make it hard for people to decide in time when to buy or sell, which places high demands on the accuracy and speed of big data analysis.
Summary of the invention
The object of the present invention is to address one or more of the above defects by proposing a Spark-based distributed multi-layer big data analysis processing model.
To achieve the above object, the technical solution adopted is as follows:
A Spark-based distributed multi-layer big data analysis processing model, comprising a presentation layer (PT), a front-end switching layer (FST), a back-end switching layer (BST), a real-time business logic layer (RBLT), a non-real-time business logic layer (NRBLT), and a data access layer (DAT); wherein the presentation layer (PT) exchanges data with the front-end switching layer (FST), and the output of the front-end switching layer (FST) is connected to the input of the medium; the medium exchanges data with the back-end switching layer (BST); the output of the back-end switching layer (BST) is connected to the input of the real-time business logic layer (RBLT) and to the input of the non-real-time business logic layer (NRBLT); and the output of the real-time business logic layer (RBLT) and the output of the non-real-time business logic layer (NRBLT) are both connected to the input of the data access layer (DAT).
Preferably, the presentation layer (PT) obtains data from the business logic layers (BLT) and uses a Facade service to handle all requests from users to the back-end cluster.
Preferably, the front-end switching layer (FST) further includes front-end servers deployed on nodes; the front-end switching layer (FST) is responsible for receiving web requests and forwarding them to the Facade through the Kafka messaging system.
Preferably, the front-end server is a front-end server on which MongoDB is deployed, and the MongoDB sends data to Kafka through the front-end switching layer (FST) so as to avoid entering the back-end cluster.
Preferably, the back-end switching layer (BST) obtains messages from Kafka, and information is transmitted between the front-end server and the back-end switching layer through the BST ingress interface.
Preferably, the real-time business logic layer (RBLT) further includes a presentation node and a docking center; the presentation node exchanges data with the medium through a spout, and the docking center exchanges data with the medium through a bolt.
Preferably, the non-real-time business logic layer (NRBLT) is used to store decision strategies; the decision strategies are stored in MongoDB, and an interface for fast access to large data sets is obtained using R programs and Spark RDDs.
Preferably, the data access layer (DAT) includes a real-time data repository, a switching center, a baseline, and a data warehouse; the real-time data repository performs real-time data access to the switching center.
Compared with the prior art, the beneficial effects of the present invention are:
The proposed Spark-based distributed multi-layer big data analysis processing model can effectively reduce the time needed to analyze massive data, and it supports heterogeneous information exchange and data storage between the subsystems of a system. It is sufficient to meet the short-term trend forecasting needs of high-frequency trading markets, and it has high application value in high-frequency, big data processing systems.
Brief description of the drawings
Fig. 1 is the distributed architecture diagram of the system;
Fig. 2 is the structure diagram of the real-time business logic layer;
Fig. 3 is the state center topology diagram;
Fig. 4 is the original design diagram;
Fig. 5 is the HFT topology diagram;
Fig. 6 shows the average computation time of the market state calculation;
Fig. 7 shows the number of market states computed per second;
Detailed description of the embodiments
The drawings are for illustrative purposes only and shall not be construed as limiting this patent;
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
A Spark-based distributed multi-layer big data analysis processing model, as shown in Fig. 1, comprises a presentation layer (PT), a front-end switching layer (FST), a back-end switching layer (BST), a real-time business logic layer (RBLT), a non-real-time business logic layer (NRBLT), and a data access layer (DAT). The presentation layer (PT) exchanges data with the front-end switching layer (FST), and the output of the front-end switching layer (FST) is connected to the input of the medium; the medium exchanges data with the back-end switching layer (BST); the output of the back-end switching layer (BST) is connected to the input of the real-time business logic layer (RBLT) and to the input of the non-real-time business logic layer (NRBLT); and the output of the real-time business logic layer (RBLT) and the output of the non-real-time business logic layer (NRBLT) are both connected to the input of the data access layer (DAT). This architecture evolved from the three-tier distributed architecture. We ultimately split the business logic layer into a real-time business logic layer and a non-real-time business logic layer. In addition, we use two levels of message-passing middleware to meet the high-frequency requirements of the whole system.
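The data flow between the layers can be illustrated with a minimal, purely in-memory sketch. It is not the patented implementation: the `Medium` class is a stand-in for the message-passing middleware, and the `realtime` flag used for routing, the function names, and the message fields are all assumptions introduced for illustration.

```python
from collections import deque


class Medium:
    """In-memory stand-in for the middleware between FST and BST (assumption)."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        self._queue.append(message)

    def poll(self):
        return self._queue.popleft() if self._queue else None


def data_access_layer(record, store):
    """DAT: the single place where results are persisted."""
    store.append(record)
    return record


def real_time_logic(message, store):
    """RBLT: handles latency-sensitive messages such as live quotes."""
    return data_access_layer({"layer": "RBLT", "payload": message["payload"]}, store)


def non_real_time_logic(message, store):
    """NRBLT: handles batch-style work such as strategy computation."""
    return data_access_layer({"layer": "NRBLT", "payload": message["payload"]}, store)


def front_end_switching(request, medium):
    """FST: accepts a request from the presentation layer and publishes it."""
    medium.publish(request)


def back_end_switching(medium, store):
    """BST: drains the medium and routes each message to RBLT or NRBLT."""
    results = []
    while (message := medium.poll()) is not None:
        # The "realtime" flag is a hypothetical routing key, not from the patent.
        handler = real_time_logic if message["realtime"] else non_real_time_logic
        results.append(handler(message, store))
    return results


medium, store = Medium(), []
front_end_switching({"realtime": True, "payload": "quote"}, medium)
front_end_switching({"realtime": False, "payload": "train-strategy"}, medium)
routed = back_end_switching(medium, store)
print([r["layer"] for r in routed])
```

Both business logic layers write through the same `data_access_layer` function, which mirrors the statement that the outputs of the RBLT and the NRBLT are both connected to the input of the DAT.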
Presentation layer (PT): this layer obtains data from the BLT and prepares the web pages presented to users browsing online. To speed up loading and reduce access latency, this embodiment uses a Facade service to handle all requests from users to the back-end cluster, which makes the architecture more loosely coupled.
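As a rough illustration of why a Facade loosens the coupling, the sketch below puts one entry point in front of two back-end services. The service classes, the request shape (`kind`, `symbol`), and the returned values are hypothetical; the patent does not specify the Facade's interface.

```python
class QuoteService:
    """Hypothetical back-end service returning the latest quote."""

    def latest(self, symbol):
        return {"symbol": symbol, "price": 10.5}


class StrategyService:
    """Hypothetical back-end service returning a trading recommendation."""

    def recommend(self, symbol):
        return {"symbol": symbol, "action": "hold"}


class Facade:
    """Single entry point that hides the back-end services from the web layer."""

    def __init__(self, quotes, strategies):
        self._quotes = quotes
        self._strategies = strategies

    def handle(self, request):
        kind, symbol = request["kind"], request["symbol"]
        if kind == "quote":
            return self._quotes.latest(symbol)
        if kind == "strategy":
            return self._strategies.recommend(symbol)
        raise ValueError(f"unsupported request kind: {kind}")


facade = Facade(QuoteService(), StrategyService())
page_data = facade.handle({"kind": "quote", "symbol": "ABC"})
print(page_data)
```

The web layer only depends on `Facade.handle`, so back-end services can be replaced or moved without changing the pages that call it.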
Front-end switching layer (FST): this layer is responsible for receiving web requests and passing them to the Facade through the Kafka messaging system. It includes the front-end servers deployed on nodes. For efficiency, this embodiment deploys MongoDB on the front-end servers; its data is sent to Kafka through the front-end switching layer, so data processing does not need to enter the back-end cluster.
In this embodiment, the front-end server is a front-end server on which MongoDB is deployed, and the MongoDB sends data to Kafka through the front-end switching layer (FST) so as to avoid entering the back-end cluster.
In this embodiment, the back-end switching layer (BST) obtains messages from Kafka, and information is transmitted between the front-end server and the back-end switching layer through the BST ingress interface.
In this embodiment, the real-time business logic layer (RBLT) further includes a presentation node and a docking center; the presentation node exchanges data with the medium through a spout, and the docking center exchanges data with the medium through a bolt. The real-time business logic layer (RBLT) is the key component of a high-frequency system and is mainly responsible for processing and computing real-time data. It contains two important services: data analysis and decision making. A stock trading price prediction system, for example, needs a Storm topology to process the real-time quote stream quickly, store it in HBase and HDFS, and compute the market state, that is, the signal to buy or sell. As shown in Fig. 2, the user terminal and the trading platform were originally divided into two topologies, with the market state computed and passed to the user through the Kafka messaging system. To improve efficiency and speed, we merged the two topologies and replaced the Kafka messaging middleware with Netty, achieving high-frequency processing and transmission of information. Within a Storm topology, Netty is about 10 times as fast as Kafka.
In this embodiment, the function of the non-real-time business logic layer (NRBLT) is to derive decision strategies for the user from the results of big data computation. The decision strategies are stored in MongoDB so that users can access them quickly from the front-end nodes. Using R programs and Spark RDDs, an interface for fast access to large data sets is obtained.
In this embodiment, the data access layer (DAT) includes a real-time data repository, a switching center, a baseline, and a data warehouse; the real-time data repository performs real-time data access to the switching center. The data access layer (DAT) is used to access all data coming from databases or external data sources. For example, the DAT provides an order interface, can combine big data information into K-bars, and feeds external data back to the user immediately through the unified Kafka message-passing middleware.
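The patent does not define how a K-bar is built. Assuming it means the usual open/high/low/close bar over a fixed time window, a minimal sketch of the aggregation could look like this; the function name, the tick format `(timestamp_seconds, price)`, and the 60-second window are all illustrative assumptions.

```python
def build_k_bars(ticks, bar_seconds=60):
    """Group (timestamp_seconds, price) ticks into fixed-width OHLC bars."""
    bars = {}
    for ts, price in sorted(ticks):
        start = ts - ts % bar_seconds  # start of the window this tick falls into
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"start": start, "open": price, "high": price,
                           "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick seen in the window so far
    return [bars[key] for key in sorted(bars)]


# Hypothetical ticks: three in the first minute, two in the second.
ticks = [(0, 10.0), (20, 10.4), (59, 9.8), (60, 10.1), (90, 10.6)]
k_bars = build_k_bars(ticks)
for bar in k_bars:
    print(bar)
```

Aggregating ticks into bars before sending them on reduces the number of messages the user side has to consume, which is one plausible reason for doing it inside the DAT.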
Using the above model framework, we developed a simulated stock trading big data analysis and decision example to analyze the algorithm flow. For stock trading prices, the online trading platform provides real-time quotes and market trading states. Because the requirements of machine learning and fast computation must be met, we first treat the online trading center as a single topology to achieve low algorithm latency. See Fig. 3 for the entire state center topology.
In the topology, KafkaSpout receives real-time quotes from the external RealtimeDataPublisher service, which continuously sends real-time market trading prices through the distributed messaging system Kafka. KafkaSpout then passes the prices to the 18 ComputeStateBolt instances that follow. Each ComputeStateBolt has its own computation logic and uses it to compute the market state defined by a specific TA logic. The 18 ComputeStateBolt instances then send the resulting market states to the WriteDataBolt for the corresponding TA. Each WriteDataBolt writes the corresponding TA data to HBase. For example, a ComputeStateBolt sends its market state to MAWriteDataBolt, which is dedicated to storing MA market states. Outside the topology, all black lines denote transmission over the Kafka and Netty distributed messaging systems.
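The fan-out from the spout to per-indicator compute and write bolts can be sketched without Storm itself. In the sketch below, a plain list plays the role of KafkaSpout, a list of rows stands in for the HBase table, and only one indicator is wired up. The MA state rule (price above, below, or equal to a short moving average), the window size, and the class and function names other than MAWriteDataBolt are assumptions; the patent does not give the actual TA logic.

```python
from collections import deque


class MAComputeStateBolt:
    """Computes a moving-average (MA) market state for each incoming price."""

    def __init__(self, window=3):
        self._prices = deque(maxlen=window)

    def process(self, price):
        self._prices.append(price)
        average = sum(self._prices) / len(self._prices)
        if price > average:
            state = "above"
        elif price < average:
            state = "below"
        else:
            state = "equal"
        return {"indicator": "MA", "price": price, "average": average, "state": state}


class MAWriteDataBolt:
    """Persists MA states; a list stands in for the HBase table (assumption)."""

    def __init__(self):
        self.rows = []

    def process(self, market_state):
        self.rows.append(market_state)


def run_topology(quotes, compute_bolts, write_bolts):
    """Fan each quote out to every compute bolt, then to its matching writer."""
    for price in quotes:  # the iterable plays the role of KafkaSpout
        for name, bolt in compute_bolts.items():
            write_bolts[name].process(bolt.process(price))


writer = MAWriteDataBolt()
run_topology([10.0, 11.0, 12.0, 9.0], {"MA": MAComputeStateBolt()}, {"MA": writer})
states = [row["state"] for row in writer.rows]
print(states)
```

Adding another indicator only requires registering one more compute bolt and its writer under a new key, which is the property that lets the described topology scale to 18 independent compute bolts.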
The purpose of high-frequency trading market data analysis is to obtain market trading and price states for the user. We therefore need machine learning algorithms that learn from historical market data and then, based on the patterns of historical trend changes, help construct an investment strategy. To solve the problem of learning quickly from large data sets, the Plan Center runs machine learning algorithms on the Apache Spark framework: it loads large historical data sets from HBase, then learns from and analyzes them in a short time. The Plan Center supports support vector machines (SVM), logistic regression (LR), and classification. Through Spark RDDs, the Plan Center can load hundreds of GB of market state data into memory and compute and analyze it across multiple nodes in the cluster, helping the user arrive at trading strategies. After the trading strategies are generated, the user can choose on the web page which investment decision to use. The large data set processing model framework is shown in Fig. 4.
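To make the learning step concrete, here is a small logistic regression trained with batch gradient descent in plain Python. It is a single-machine stand-in and not the Spark-based Plan Center: the feature (previous price change), the labels, the toy history, and the hyperparameters are all invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (next move up) or 0 (next move down)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical history: feature = previous price change, label = 1 if the next move was up.
history = [([-0.8], 0), ([-0.5], 0), ([-0.2], 0), ([0.3], 1), ([0.6], 1), ([0.9], 1)]
weights, bias = train_logistic_regression([x for x, _ in history],
                                          [y for _, y in history])
print(predict(weights, bias, [0.7]), predict(weights, bias, [-0.7]))
```

In the architecture described above the same kind of model would be fitted over data held in Spark RDDs across the cluster rather than in a Python list; the sketch only shows what "learning from historical market states" amounts to.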
To reduce the time overhead of big data analysis, processing, and transmission, we merge the state center topology and the trading topology described above into one large topology, forming the large HFT system shown in Fig. 5.
The integrated HFT system extracts data from the Kafka queue and writes it to HBase and MongoDB. Fig. 5 shows the entire HFT topology. By integrating the online trading center and the user terminal, the time spent on information transmission can be shortened to a few milliseconds. However, because of the complexity of a large trading system architecture, efficient cluster resource management is required before the computation speed of the algorithm can be improved effectively. We therefore use YARN to start most services and to manage all resources on the cluster. A customized Hadoop service configuration on Cloudera monitors the status of each node and of the Hadoop services, providing cluster services for large data sets.
Because of the high-frequency and real-time data processing requirements, the trading center needs to compute millions of market states per second. We therefore built a simulated experimental environment to compare the algorithm's processing performance for different numbers of futures trades. We prepared 8 computers as a cluster, 6 of which run the Storm topology as managers. The experimental environment is shown in Table 1.
Table 1: Cluster details
To test the limit efficiency of the architecture and find the most suitable cluster configuration, we compared the average computation time of all market states across the experiments.
To check algorithm performance, we added a new bolt named ExpStateReceiverBolt to the original topology to quickly collect all computation metrics of the market states, and we evaluate algorithm performance by computing their average. Fig. 6 shows the performance results, and Fig. 7 shows the number of market states for N stocks.
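The measurement bolt can be pictured as a simple collector. The sketch below assumes each market state arrives with a start and finish timestamp in milliseconds; that record format, the method names, and the sample values are hypothetical, since the patent only names the bolt and says that averages are computed.

```python
class ExpStateReceiverBolt:
    """Collects per-state computation metrics and summarises them."""

    def __init__(self):
        self._durations_ms = []

    def process(self, started_ms, finished_ms):
        # One call per computed market state.
        self._durations_ms.append(finished_ms - started_ms)

    def average_ms(self):
        return sum(self._durations_ms) / len(self._durations_ms)

    def states_per_second(self, elapsed_seconds):
        return len(self._durations_ms) / elapsed_seconds


receiver = ExpStateReceiverBolt()
# Hypothetical (start, finish) timestamps in milliseconds.
for started, finished in [(0, 4), (10, 12), (20, 26), (30, 34)]:
    receiver.process(started, finished)
print(receiver.average_ms(), receiver.states_per_second(2))
```

The two summaries correspond to the two quantities plotted in the experiments: average computation time per market state and the number of market states computed per second.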
Obviously, the above embodiments are merely examples given to clearly illustrate the present invention and do not limit its implementation. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (8)

1. A Spark-based distributed multi-layer big data analysis processing model, characterized by comprising a presentation layer (PT), a front-end switching layer (FST), a back-end switching layer (BST), a real-time business logic layer (RBLT), a non-real-time business logic layer (NRBLT), and a data access layer (DAT); wherein the presentation layer (PT) exchanges data with the front-end switching layer (FST), and the output of the front-end switching layer (FST) is connected to the input of the medium; the medium exchanges data with the back-end switching layer (BST); the output of the back-end switching layer (BST) is connected to the input of the real-time business logic layer (RBLT) and to the input of the non-real-time business logic layer (NRBLT); and the output of the real-time business logic layer (RBLT) and the output of the non-real-time business logic layer (NRBLT) are both connected to the input of the data access layer (DAT).
2. The Spark-based distributed multi-layer big data analysis processing model according to claim 1, characterized in that the presentation layer (PT) obtains data from the BLT and uses a Facade service to handle all requests from users to the back-end cluster.
3. The Spark-based distributed multi-layer big data analysis processing model according to claim 1, characterized in that the front-end switching layer (FST) further includes front-end servers deployed on nodes, and the front-end switching layer (FST) is responsible for receiving web requests and forwarding them to the Facade through the Kafka messaging system.
4. The Spark-based distributed multi-layer big data analysis processing model according to claim 3, characterized in that the front-end server is a front-end server on which MongoDB is deployed, and the MongoDB sends data to Kafka through the front-end switching layer (FST) so as to avoid entering the back-end cluster.
5. The Spark-based distributed multi-layer big data analysis processing model according to claim 1, characterized in that the back-end switching layer (BST) obtains messages from Kafka, and information is transmitted between the front-end server and the back-end switching layer through the BST ingress interface.
6. The Spark-based distributed multi-layer big data analysis processing model according to claim 1, characterized in that the real-time business logic layer (RBLT) further includes a presentation node and a docking center; the presentation node exchanges data with the medium through a spout, and the docking center exchanges data with the medium through a bolt.
7. The Spark-based distributed multi-layer big data analysis processing model according to claim 1, characterized in that the non-real-time business logic layer (NRBLT) is used to store decision strategies; the decision strategies are stored in MongoDB, and an interface for fast access to large data sets is obtained using R programs and Spark RDDs.
8. The Spark-based distributed multi-layer big data analysis processing model according to claim 1, characterized in that the data access layer (DAT) includes a real-time data repository, a switching center, a baseline, and a data warehouse; the real-time data repository performs real-time data access to the switching center.
CN201810956427.7A 2018-08-21 2018-08-21 Spark-based distributed multi-layer big data analysis processing model Expired - Fee Related CN109271371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810956427.7A CN109271371B (en) 2018-08-21 2018-08-21 Spark-based distributed multi-layer big data analysis processing model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810956427.7A CN109271371B (en) 2018-08-21 2018-08-21 Spark-based distributed multi-layer big data analysis processing model

Publications (2)

Publication Number Publication Date
CN109271371A true CN109271371A (en) 2019-01-25
CN109271371B CN109271371B (en) 2022-02-11

Family

ID=65154176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810956427.7A Expired - Fee Related CN109271371B (en) 2018-08-21 2018-08-21 Spark-based distributed multi-layer big data analysis processing model

Country Status (1)

Country Link
CN (1) CN109271371B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502509A (en) * 2019-08-27 2019-11-26 广东工业大学 A kind of traffic big data cleaning method and relevant apparatus based on Hadoop Yu Spark frame
CN111177765A (en) * 2020-01-06 2020-05-19 广州知弘科技有限公司 Financial big data processing method, storage medium and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997020258A1 (en) * 1995-11-29 1997-06-05 Hybrithms Corp. Multiple-agent hybrid control architecture
US6112183A (en) * 1997-02-11 2000-08-29 United Healthcare Corporation Method and apparatus for processing health care transactions through a common interface in a distributed computing environment
US20040143602A1 (en) * 2002-10-18 2004-07-22 Antonio Ruiz Apparatus, system and method for automated and adaptive digital image/video surveillance for events and configurations using a rich multimedia relational database
CN102063306A (en) * 2011-01-06 2011-05-18 夏春秋 Technical implementation method for application development through electronic form
CN102364523A (en) * 2011-05-11 2012-02-29 武汉理工大学 Method for realizing three-dimensional virtual city system based on RIA (rich Internet application) architecture
CN102385739A (en) * 2011-11-15 2012-03-21 中国电力科学研究院 Integrated information management platform for county-level power supply enterprises
CN102523246A (en) * 2011-11-23 2012-06-27 陈刚 Cloud computation treating system and method
CN105162826A (en) * 2015-07-15 2015-12-16 中山大学 Cloud computing multilayer cloud architecture
CN107274062A (en) * 2017-05-11 2017-10-20 王嫣然 Share books management system and the sharing method using the system in a kind of campus based on school's LAN
CN107292473A (en) * 2016-04-10 2017-10-24 国网山东省电力公司经济技术研究院 The online estimating and examining system of planning feasibility study business and method based on process optimization
CN107657569A (en) * 2016-07-25 2018-02-02 湖南移商动力网络技术有限公司 J2EE and cloud computing design a kind of intelligence community system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴冕冠: "基于Spark的大数据应用开发支持环境研究开发", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
王宇轲: "基于BA-BP算法的汽车配件需求预测系统研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502509A (en) * 2019-08-27 2019-11-26 广东工业大学 A kind of traffic big data cleaning method and relevant apparatus based on Hadoop Yu Spark frame
CN110502509B (en) * 2019-08-27 2023-04-18 广东工业大学 Traffic big data cleaning method based on Hadoop and Spark framework and related device
CN111177765A (en) * 2020-01-06 2020-05-19 广州知弘科技有限公司 Financial big data processing method, storage medium and system

Also Published As

Publication number Publication date
CN109271371B (en) 2022-02-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220211