CN104881476A - Cloud computing based mass data processing system - Google Patents

Publication number
CN104881476A
Authority
CN
China
Prior art keywords
cloud computing
distributed
hadoop
node
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510296226.5A
Other languages
Chinese (zh)
Inventor
陈勇
胡中骥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Science And Technology Co Ltd Is Swum In Jiangsu At Once
Original Assignee
Science And Technology Co Ltd Is Swum In Jiangsu At Once
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Science And Technology Co Ltd Is Swum In Jiangsu At Once
Priority to CN201510296226.5A
Publication of CN104881476A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a cloud computing based mass data processing system comprising a Hadoop system, distributed regional small clusters, a master node, and a distributed file system. Each distributed regional small cluster is treated as a node in a larger shared-nothing cluster managed by the Hadoop system, the master node is the coordinator of the Hadoop system, and data are stored in the distributed file system. The system provides excellent load balancing and fault tolerance, meets the requirements of distributed and parallel processing, and can greatly reduce communication overhead.

Description

A mass data processing system based on cloud computing
Technical field
The present invention relates to data processing systems, and more specifically to a mass data processing system based on cloud computing.
Background
A major issue in a cloud computing architecture is how to design an efficient storage layer for processing the massive data on a cloud computing platform. In the design of the applicant's cloud platform, data are naturally managed and stored in a distributed fashion; that is, all data are connected into one data group by a high-speed local area network. Massive data are generated by the various applications on the cloud platform. One possible approach to data storage and querying is to use a centralized relational database management system (DBMS) as the underlying data storage layer. However, this approach has several limitations, especially in a distributed system.
First, a central database server makes it difficult to balance load across the multiple nodes in the system.
Second, a single point of failure arises easily; that is, fault-tolerance problems may threaten the functioning of the system.
Third, it produces a very heavy communication load, because the data distributed across the nodes must be delivered to the central server over the underlying network.
Finally, this model makes parallel processing difficult to achieve, so the computing advantages of the cloud platform architecture cannot be exploited.
Summary of the invention
The object of the present invention is to remedy the above defects of the prior art by proposing a mass data processing system based on cloud computing.
The technical solution adopted by the present invention is as follows:
An extensible distributed storage layer is provided using the Hadoop system. Distributed regional small clusters are retained, and each such cluster is treated as a node in a larger shared-nothing cluster managed by the Hadoop system. Each small cluster node is treated as a slave node in the Hadoop system, and two master nodes are designated as the coordinators of the Hadoop system. We call this design a distributed data warehouse using Hadoop. We store data in the Hadoop Distributed File System (HDFS) and design the Map and Reduce functions that the applications need, so as to accommodate and reduce the computation and communication volume of user applications in the cloud computing system.
This distributed data warehouse is designed specifically for the cloud computing architecture, because it naturally provides excellent load balancing and fault tolerance and meets the requirements of distributed and parallel processing. For example, our system can automatically assign computation requests to lightly loaded nodes. It uses a data reloading technique, so the task that a failed node was executing can be migrated to other healthy nodes, which continue the computation. Another attractive feature of our system is that it can greatly reduce the system's communication overhead. Our main challenge is to design and implement customized Map and Reduce functions that reduce communication cost and overall computation cost (for example, by pruning unnecessary node accesses and data transfers). We also integrate a traditional relational database management system into our Hadoop distributed data warehouse, especially for processing structured data. To this end, we make use of HadoopDB technology as a useful extension. Each slave node uses a local relational database management system instance as its storage layer, instead of relying only on HDFS. It can therefore process structured data more efficiently (for example, by using a DBMS index structure to speed up access to local data).
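The load balancing and failover behavior described above can be illustrated with a small sketch. It is plain Python rather than Hadoop code, and the `Node` and `Coordinator` classes, the node names, and the least-loaded policy are all assumptions made for illustration; the application does not specify how the master nodes choose a target node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A slave node (one regional small cluster); fields are illustrative."""
    name: str
    alive: bool = True
    tasks: list = field(default_factory=list)

    @property
    def load(self) -> int:
        # Load is modeled simply as the number of assigned tasks.
        return len(self.tasks)


class Coordinator:
    """Hypothetical master-node logic: least-loaded assignment plus failover."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def _least_loaded(self) -> Node:
        healthy = [n for n in self.nodes if n.alive]
        if not healthy:
            raise RuntimeError("no healthy node available")
        # min() returns the first minimum, so ties go to the earlier node.
        return min(healthy, key=lambda n: n.load)

    def submit(self, task: str) -> str:
        """Assign a task to the lightest-loaded healthy node."""
        node = self._least_loaded()
        node.tasks.append(task)
        return node.name

    def mark_failed(self, name: str) -> dict:
        """Mark a node as failed and move its tasks to healthy nodes."""
        failed = next(n for n in self.nodes if n.name == name)
        failed.alive = False
        orphaned, failed.tasks = failed.tasks, []
        return {task: self.submit(task) for task in orphaned}


coordinator = Coordinator([Node("cluster-a"), Node("cluster-b"), Node("cluster-c")])
placements = [coordinator.submit(f"task-{i}") for i in range(4)]
moved = coordinator.mark_failed("cluster-a")
```

After `cluster-a` is marked as failed, its two tasks are resubmitted through the same least-loaded rule, so the surviving nodes end up with equal load. A real deployment would also need to reload the input data for the migrated tasks, which this sketch leaves out.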
HBase is adopted as our data storage and computing system. HBase is an open-source project that supports random, real-time read/write access to big data. Its goal is to handle very large tables, with billions of rows and millions of columns, on clusters of commodity hardware.
The beneficial effects of the present invention are as follows.
The mass data processing system based on cloud computing of the present invention:
1. provides excellent load balancing and fault tolerance, and meets the requirements of distributed and parallel processing;
2. can greatly reduce the communication overhead of the system.
The present invention is described in further detail below with reference to the accompanying drawing.
Brief description of the drawing
Fig. 1 shows the data storage and processing procedure of the mass data processing system based on cloud computing of the present invention.
Embodiment
To deepen understanding of the present invention, it is explained in further detail below with reference to the drawing and an embodiment. The following embodiment serves only to illustrate the technical solution of the present invention more clearly and does not limit the scope of the invention.
A specific embodiment of the present invention is as follows.
As shown in Fig. 1, an extensible distributed storage layer is provided using the Hadoop system. Distributed regional small clusters are retained, and each such cluster is treated as a node in a larger shared-nothing cluster managed by the Hadoop system. Each small cluster node is treated as a slave node in the Hadoop system, and two master nodes are designated as the coordinators of the Hadoop system. We call this design a distributed data warehouse using Hadoop. We store data in the Hadoop Distributed File System (HDFS) and design the Map and Reduce functions that the applications need, so as to accommodate and reduce the computation and communication volume of user applications in the cloud computing system.
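One common way for Map and Reduce functions to cut communication volume is to pre-aggregate map output on each node before it is sent to the reducer (a combiner step). The application does not say which technique it uses, so the following is only a sketch of that general idea in plain Python, not Hadoop API code; the function names and the sample event data are hypothetical.

```python
from collections import Counter, defaultdict


def map_phase(records):
    """Map: emit one (key, 1) pair per record on a node."""
    return [(record, 1) for record in records]


def combine(pairs):
    """Combine: pre-aggregate on the node so fewer pairs cross the network."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_phase(shuffled_pairs):
    """Reduce: sum the partial counts received from all nodes."""
    totals = defaultdict(int)
    for key, value in shuffled_pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical per-node inputs (e.g. event types logged by each small cluster).
node_data = {
    "cluster-a": ["login", "login", "query", "login"],
    "cluster-b": ["query", "query", "login"],
}

# Pairs that would be transferred without and with node-local combining.
raw_pairs = [p for recs in node_data.values() for p in map_phase(recs)]
combined_pairs = [p for recs in node_data.values() for p in combine(map_phase(recs))]

result = reduce_phase(combined_pairs)
```

Here the reducer receives `len(combined_pairs)` pairs instead of `len(raw_pairs)`, and `reduce_phase` gives the same totals for either input, because addition is associative and commutative. That property is what makes node-local combining safe for this kind of aggregation.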
This distributed data warehouse is designed specifically for the cloud computing architecture, because it naturally provides excellent load balancing and fault tolerance and meets the requirements of distributed and parallel processing. For example, our system can automatically assign computation requests to lightly loaded nodes. It uses a data reloading technique, so the task that a failed node was executing can be migrated to other healthy nodes, which continue the computation. Another attractive feature of our system is that it can greatly reduce the system's communication overhead. Our main challenge is to design and implement customized Map and Reduce functions that reduce communication cost and overall computation cost (for example, by pruning unnecessary node accesses and data transfers). We also integrate a traditional relational database management system into our Hadoop distributed data warehouse, especially for processing structured data. To this end, we make use of HadoopDB technology as a useful extension. Each slave node uses a local relational database management system instance as its storage layer, instead of relying only on HDFS. It can therefore process structured data more efficiently (for example, by using a DBMS index structure to speed up access to local data).
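The benefit of a node-local relational store with an index can be sketched with Python's built-in `sqlite3` module. SQLite stands in here for whichever DBMS a slave node would actually run (the application does not name one), and the `events` table, its columns, and the index name are made up for illustration.

```python
import sqlite3

# In-memory database standing in for one slave node's local RDBMS instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 100, "login" if i % 2 else "query") for i in range(1000)],
)

query = "SELECT COUNT(*) FROM events WHERE user_id = 42"


def plan(sql):
    """Return SQLite's textual query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


plan_before = plan(query)  # without an index, a scan of the table is expected
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(query)   # with the index, the plan should reference it

count = conn.execute(query).fetchone()[0]
```

Comparing `plan_before` with `plan_after` shows the local lookup switching from a table scan to an index search, which is the kind of speed-up for structured data that the paragraph above attributes to using a DBMS on each slave node. The exact plan wording varies between SQLite versions.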
HBase is adopted as our data storage and computing system. HBase is an open-source project that supports random, real-time read/write access to big data. Its goal is to handle very large tables, with billions of rows and millions of columns, on clusters of commodity hardware.
It should be noted that the above embodiment illustrates, and does not limit, the technical solution of the present invention. Equivalent replacements by those of ordinary skill in the art, or other modifications made according to the prior art, shall fall within the scope of the claims of the present invention as long as they do not depart from the idea and scope of the technical solution of the present invention.

Claims (2)

1. A mass data processing system based on cloud computing, characterized in that it comprises a Hadoop system, distributed regional small clusters, a master node, and a distributed file system, wherein the distributed regional small clusters are each treated as a node in a larger shared-nothing cluster and are managed by the Hadoop system, the master node is the coordinator of the Hadoop system, and data are stored in the distributed file system.
2. The mass data processing system based on cloud computing according to claim 1, characterized in that the Hadoop system further comprises a MapReduce node, so as to accommodate and reduce the computation and communication volume of user applications in the cloud computing system.
CN201510296226.5A 2015-06-03 2015-06-03 Cloud computing based mass data processing system Pending CN104881476A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510296226.5A CN104881476A (en) 2015-06-03 2015-06-03 Cloud computing based mass data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510296226.5A CN104881476A (en) 2015-06-03 2015-06-03 Cloud computing based mass data processing system

Publications (1)

Publication Number Publication Date
CN104881476A true CN104881476A (en) 2015-09-02

Family

ID=53948969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510296226.5A Pending CN104881476A (en) 2015-06-03 2015-06-03 Cloud computing based mass data processing system

Country Status (1)

Country Link
CN (1) CN104881476A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169099A (en) * 2017-05-16 2017-09-15 成都四象联创科技有限公司 Data processing method based on HADOOP
CN109637278A (en) * 2019-01-03 2019-04-16 青岛萨纳斯智能科技股份有限公司 Big data teaching experiment training platform

Similar Documents

Publication Publication Date Title
CN102663117B (en) OLAP (On Line Analytical Processing) inquiry processing method facing database and Hadoop mixing platform
US9996552B2 (en) Method for generating a dataset structure for location-based services and method and system for providing location-based services to a mobile device
Padhy et al. RDBMS to NoSQL: reviewing some next-generation non-relational database’s
CN102567495B (en) Mass information storage system and implementation method
CN103593243B (en) Dynamic extensible trunked system for increasing virtual machine resources
CN104461740A (en) Cross-domain colony computing resource gathering and distributing method
CN103595799B (en) A kind of method realizing distributed shared data storehouse
CN103150304A (en) Cloud database system
CN103455512A (en) Multi-tenant data management model for SAAS (software as a service) platform
CN103793534A (en) Distributed file system and implementation method for balancing storage loads and access loads of metadata
CN103106249A (en) Data parallel processing system based on Cassandra
CN103399945A (en) Data structure based on cloud computing database system
CN103617162A (en) Method of constructing Hilbert R-tree index on equivalent cloud platform
US11080207B2 (en) Caching framework for big-data engines in the cloud
CN105354250A (en) Data storage method and device for cloud storage
CN103441918A (en) Self-organizing cluster server system and self-organizing method thereof
CN103034650B (en) A kind of data handling system and method
CN103823846A (en) Method for storing and querying big data on basis of graph theories
CN104182487A (en) Unified storage method supporting various storage modes
CN105677761A (en) Data sharding method and system
CN104539583A (en) Real-time database subscription system and method
CN105975345A (en) Video frame data dynamic equilibrium memory management method based on distributed memory
CN108153759B (en) Data transmission method of distributed database, intermediate layer server and system
CN103473848A (en) Network invoice checking frame and method based on high concurrency
CN103365987A (en) Clustered database system and data processing method based on shared-disk framework

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150902
