CN104881476A - Cloud computing based mass data processing system - Google Patents
- Publication number
- CN104881476A
- Authority
- CN
- China
- Prior art keywords
- cloud computing
- distributed
- hadoop
- node
- data processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a mass data processing system based on cloud computing, comprising a Hadoop system, distributed regional small clusters, a master node and a distributed file system. Each distributed regional small cluster is treated as a node in a larger shared-nothing cluster and is managed by the Hadoop system; the master node is the coordinator of the Hadoop system; and data are stored in the distributed file system. The system provides good load balancing and fault tolerance, meets the requirements of distributed and parallel processing, and can greatly reduce communication overhead.
Description
Technical field
The present invention relates to data processing systems and, more specifically, to a mass data processing system based on cloud computing.
Background art
A major issue in a cloud computing architecture is how to design an efficient storage layer for processing the mass data on the cloud computing platform. Under the current cloud platform design, data are naturally managed and stored in a distributed manner; that is, all data are connected into one data cluster by a high-speed local area network. Mass data are generated by the various applications on the cloud platform system. One possible storage and query approach is to use a centralized relational database management system (RDBMS) as the underlying data storage layer. This approach has several limitations, especially in a distributed system.
First, a central database server makes it difficult to balance load across the multiple nodes in the system.
Second, a single point of failure arises easily; that is, fault-tolerance problems may threaten the functioning of the system.
Third, it produces a very heavy communication load, because the data distributed across the nodes must be delivered to the central server over the underlying network.
Finally, this model makes parallel processing difficult to achieve, so the computing advantages of the cloud platform architecture go unused.
Summary of the invention
The object of the present invention is to remedy the above defects of the prior art by providing a mass data processing system based on cloud computing.
The technical solution adopted by the present invention is as follows.
An extensible distributed storage layer is provided using the Hadoop system. The distributed regional small clusters are retained, and each of them is treated as a node in a larger shared-nothing cluster that is managed by the Hadoop system. Each small-cluster node is treated as a slave node in the Hadoop system, and two master nodes are designated as the coordinators of the Hadoop system. We call this design a distributed data warehouse using Hadoop. Data are stored in a distributed file system, the Hadoop Distributed File System (HDFS), and the Map and Reduce functions required by each application are designed so as to fit, and to reduce, the computation and communication volume of user applications in the cloud computing system.
This distributed data warehouse is designed specifically for the cloud computing architecture, because it naturally provides good load balancing and fault tolerance and meets the requirements of distributed and parallel processing. For example, the system can automatically assign computation requests to lightly loaded nodes. It uses a data reloading technique, so a task that a failed node was executing can be migrated to another healthy node, which continues the computation. Another attractive feature of the system is that it can greatly reduce communication overhead. The main challenge is to design and implement customized Map and Reduce functions that reduce communication cost and overall computation cost, for example by pruning unnecessary node accesses and data transfers. Traditional relational database management systems are also integrated into the Hadoop distributed data warehouse, especially for processing structured data. For this purpose, HadoopDB technology is used as an extension: each slave node uses a local relational database management system instance as its storage layer, instead of relying only on HDFS. This gives better efficiency when processing structured data, for example by using an index structure in the database management system to speed up access to local data.
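The idea of giving each slave node a local relational store with an index can be illustrated with a minimal sketch. It is not the patented implementation: Python's built-in `sqlite3` stands in for the per-node relational database, and the table `readings`, the index `idx_sensor`, the sample rows and the helper names are all hypothetical. Each node answers the query from its own indexed table and ships back only a partial aggregate.

```python
import sqlite3


def make_node_db(rows):
    """Create an in-memory relational store for one slave node (stand-in only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
    # The local index lets the node find matching rows without a full scan.
    conn.execute("CREATE INDEX idx_sensor ON readings (sensor_id)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def local_query(conn, sensor_id):
    """Run the query on one node and return only a (count, sum) partial result."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(value), 0.0) "
        "FROM readings WHERE sensor_id = ?",
        (sensor_id,),
    ).fetchone()
    return count, total


def global_average(node_dbs, sensor_id):
    """Merge the per-node partial results on the coordinator side."""
    count = 0
    total = 0.0
    for conn in node_dbs:
        node_count, node_total = local_query(conn, sensor_id)
        count += node_count
        total += node_total
    return total / count if count else None


# Hypothetical sample data held by two slave nodes.
NODES = [
    make_node_db([(1, 10.0), (2, 5.0), (1, 20.0)]),
    make_node_db([(1, 30.0), (3, 7.0)]),
]

average_sensor_1 = global_average(NODES, 1)
average_missing = global_average(NODES, 9)
```

Only two numbers per node cross the network here, regardless of how many rows the node stores, which is the kind of saving the description attributes to pushing work down to local storage.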
HBase is adopted as the data storage and computing system. HBase is an open-source project that supports random, real-time read/write access to big data. Its goal is to host very large tables, with billions of rows and millions of columns, on clusters of commodity hardware.
The beneficial effects of the invention are as follows.
The mass data processing system based on cloud computing of the present invention:
1. provides good load balancing and fault tolerance, and meets the requirements of distributed and parallel processing;
2. can greatly reduce the communication overhead of the system.
The present invention is described in further detail below with reference to the accompanying drawing.
Brief description of the drawing
Fig. 1 shows the data storage and processing flow of the mass data processing system based on cloud computing of the present invention.
Embodiment
To deepen understanding of the present invention, it is explained in further detail below with reference to the drawing and an embodiment. The following embodiment serves only to illustrate the technical solution of the invention clearly and does not limit its scope of protection.
A specific embodiment of the invention is as follows.
As shown in Fig. 1, an extensible distributed storage layer is provided using the Hadoop system. The distributed regional small clusters are retained, and each of them is treated as a node in a larger shared-nothing cluster that is managed by the Hadoop system. Each small-cluster node is treated as a slave node in the Hadoop system, and two master nodes are designated as the coordinators of the Hadoop system. We call this design a distributed data warehouse using Hadoop. Data are stored in a distributed file system, the Hadoop Distributed File System (HDFS), and the Map and Reduce functions required by each application are designed so as to fit, and to reduce, the computation and communication volume of user applications in the cloud computing system.
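How application-specific Map and Reduce functions can cut communication volume may be sketched as follows. This is an illustrative single-process model, not Hadoop code and not the applicant's implementation: the cluster names, the event data and the function names are hypothetical. It compares the number of key/value pairs shipped to the reducer with and without local pre-aggregation (a combiner step) on each small cluster.

```python
from collections import Counter, defaultdict


def map_phase(records):
    """Emit one (key, 1) pair for every record held on a node."""
    return [(record, 1) for record in records]


def combine(pairs):
    """Pre-aggregate pairs locally so that fewer pairs cross the network."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_phase(shipped_pairs):
    """Sum all partial counts per key on the reducer side."""
    totals = defaultdict(int)
    for key, value in shipped_pairs:
        totals[key] += value
    return dict(totals)


def run_job(node_data, use_combiner):
    """Run the job over every node; return the result and the pairs shipped."""
    shipped = []
    for records in node_data.values():
        pairs = map_phase(records)
        if use_combiner:
            pairs = combine(pairs)
        shipped.extend(pairs)
    return reduce_phase(shipped), len(shipped)


# Hypothetical event logs held by three regional small clusters.
NODE_DATA = {
    "cluster-a": ["login", "login", "query", "login"],
    "cluster-b": ["query", "query", "upload"],
    "cluster-c": ["login", "upload", "upload", "upload"],
}

plain_result, plain_shipped = run_job(NODE_DATA, use_combiner=False)
combined_result, combined_shipped = run_job(NODE_DATA, use_combiner=True)
```

Both runs produce the same counts, but the combined run ships 6 pairs instead of 11. On real data with many repeated keys per node the gap would be much larger.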
This distributed data warehouse is designed specifically for the cloud computing architecture, because it naturally provides good load balancing and fault tolerance and meets the requirements of distributed and parallel processing. For example, the system can automatically assign computation requests to lightly loaded nodes. It uses a data reloading technique, so a task that a failed node was executing can be migrated to another healthy node, which continues the computation. Another attractive feature of the system is that it can greatly reduce communication overhead. The main challenge is to design and implement customized Map and Reduce functions that reduce communication cost and overall computation cost, for example by pruning unnecessary node accesses and data transfers. Traditional relational database management systems are also integrated into the Hadoop distributed data warehouse, especially for processing structured data. For this purpose, HadoopDB technology is used as an extension: each slave node uses a local relational database management system instance as its storage layer, instead of relying only on HDFS. This gives better efficiency when processing structured data, for example by using an index structure in the database management system to speed up access to local data.
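The load-balancing and failover behaviour described above can be sketched with a toy scheduler. It is a simplified assumption about how such a coordinator might behave, not Hadoop's actual scheduler: the class `Scheduler`, its methods and the node and task names are hypothetical. New tasks go to the least loaded node, and when a node fails its tasks are reassigned to the remaining nodes.

```python
class Scheduler:
    """Toy coordinator: least-loaded placement plus reassignment on failure."""

    def __init__(self, nodes):
        self.load = {node: 0 for node in nodes}  # node -> current load
        self.assignments = {}                    # task -> (node, cost)

    def assign(self, task, cost=1):
        """Place a task on the least loaded live node (ties broken by name)."""
        if not self.load:
            raise RuntimeError("no live nodes available")
        node = min(self.load, key=lambda name: (self.load[name], name))
        self.load[node] += cost
        self.assignments[task] = (node, cost)
        return node

    def fail(self, node):
        """Remove a failed node and move its tasks to the surviving nodes."""
        orphaned = [
            (task, cost)
            for task, (owner, cost) in self.assignments.items()
            if owner == node
        ]
        del self.load[node]
        return {task: self.assign(task, cost) for task, cost in orphaned}


# Hypothetical run: five equal-cost tasks on three slave nodes, then a failure.
sched = Scheduler(["node-1", "node-2", "node-3"])
placements = [sched.assign(f"task-{i}") for i in range(5)]
moved = sched.fail("node-1")
```

After the failure, the two tasks that were on `node-1` end up on `node-3` and `node-2`, so no work is lost and the load stays close to even.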
HBase is adopted as the data storage and computing system. HBase is an open-source project that supports random, real-time read/write access to big data. Its goal is to host very large tables, with billions of rows and millions of columns, on clusters of commodity hardware.
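The kind of table HBase targets, sparse, keyed by row and kept in row-key order, can be pictured with a small in-memory model. This sketch does not use the HBase API: the class `SparseTable`, its methods, and the row keys and column names are hypothetical and serve only to show random reads and writes by row key plus ordered range scans.

```python
import bisect


class SparseTable:
    """Toy model of a sparse table whose rows are kept sorted by row key."""

    def __init__(self):
        self._row_keys = []  # sorted list of row keys
        self._rows = {}      # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Random write: only the cells that are set take up space."""
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        """Random read of a single cell; returns None for a missing cell."""
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start, stop):
        """Return rows with start <= row key < stop, in key order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        return [(key, dict(self._rows[key])) for key in self._row_keys[lo:hi]]


# Hypothetical rows; different rows populate different columns.
table = SparseTable()
table.put("user#0002", "info:name", "Li")
table.put("user#0001", "info:name", "Wang")
table.put("user#0001", "stats:logins", 7)
table.put("video#0001", "info:title", "demo")

user_rows = table.scan("user#", "user$")
```

Because rows are ordered by key, all keys sharing the `user#` prefix are adjacent and can be read with one range scan, while cells that were never written cost nothing.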
It should be noted that the above embodiment illustrates, and does not limit, the technical solution of the invention. Equivalent replacements made by a person of ordinary skill in the art, or other modifications made according to the prior art, fall within the scope of the claims of the present invention as long as they do not depart from the idea and scope of its technical solution.
Claims (2)
1. A mass data processing system based on cloud computing, characterized in that it comprises a Hadoop system, distributed regional small clusters, a master node and a distributed file system, wherein each distributed regional small cluster is treated as a node in a larger shared-nothing cluster and is managed by the Hadoop system, the master node is the coordinator of the Hadoop system, and data are stored in the distributed file system.
2. The mass data processing system based on cloud computing according to claim 1, characterized in that the Hadoop system further comprises MapReduce nodes, so as to fit, and to reduce, the computation and communication volume of user applications in the cloud computing system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510296226.5A CN104881476A (en) | 2015-06-03 | 2015-06-03 | Cloud computing based mass data processing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510296226.5A CN104881476A (en) | 2015-06-03 | 2015-06-03 | Cloud computing based mass data processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104881476A (en) | 2015-09-02 |
Family
ID=53948969
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510296226.5A Pending CN104881476A (en) | 2015-06-03 | 2015-06-03 | Cloud computing based mass data processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104881476A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169099A (en) * | 2017-05-16 | 2017-09-15 | 成都四象联创科技有限公司 | Data processing method based on HADOOP |
CN109637278A (en) * | 2019-01-03 | 2019-04-16 | 青岛萨纳斯智能科技股份有限公司 | Big data teaching experiment training platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102663117B (en) | OLAP (On Line Analytical Processing) inquiry processing method facing database and Hadoop mixing platform | |
US9996552B2 (en) | Method for generating a dataset structure for location-based services and method and system for providing location-based services to a mobile device | |
Padhy et al. | RDBMS to NoSQL: reviewing some next-generation non-relational database’s | |
CN102567495B (en) | Mass information storage system and implementation method | |
CN103593243B (en) | Dynamic extensible trunked system for increasing virtual machine resources | |
CN104461740A (en) | Cross-domain colony computing resource gathering and distributing method | |
CN103595799B (en) | A kind of method realizing distributed shared data storehouse | |
CN103150304A (en) | Cloud database system | |
CN103455512A (en) | Multi-tenant data management model for SAAS (software as a service) platform | |
CN103793534A (en) | Distributed file system and implementation method for balancing storage loads and access loads of metadata | |
CN103106249A (en) | Data parallel processing system based on Cassandra | |
CN103399945A (en) | Data structure based on cloud computing database system | |
CN103617162A (en) | Method of constructing Hilbert R-tree index on equivalent cloud platform | |
US11080207B2 (en) | Caching framework for big-data engines in the cloud | |
CN105354250A (en) | Data storage method and device for cloud storage | |
CN103441918A (en) | Self-organizing cluster server system and self-organizing method thereof | |
CN103034650B (en) | A kind of data handling system and method | |
CN103823846A (en) | Method for storing and querying big data on basis of graph theories | |
CN104182487A (en) | Unified storage method supporting various storage modes | |
CN105677761A (en) | Data sharding method and system | |
CN104539583A (en) | Real-time database subscription system and method | |
CN105975345A (en) | Video frame data dynamic equilibrium memory management method based on distributed memory | |
CN108153759B (en) | Data transmission method of distributed database, intermediate layer server and system | |
CN103473848A (en) | Network invoice checking frame and method based on high concurrency | |
CN103365987A (en) | Clustered database system and data processing method based on shared-disk framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150902 |