CN105045929A - MPP architecture based distributed relational database - Google Patents
- Publication number: CN105045929A (application number CN201510547427.8A)
- Authority: CN (China)
- Prior art keywords
- cluster
- distributed
- framework according
- global transaction
- mpp
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a distributed relational database based on an MPP (massively parallel processing) architecture and relates to the fields of databases, big data, and distributed computing. The database comprises four modules: a global transaction manager responsible for global transaction processing; a load balancing system responsible for load balancing management of the cluster; a cluster coordination manager that coordinates work among the data nodes; and data nodes, which are relational databases deployed on PowerDB. For the PowerDB relational database, the invention builds a distributed cluster environment. Because it adopts an MPP architecture with a shared-nothing design between nodes, a single-point failure does not affect cluster operation; the cluster can scale out horizontally, achieving petabyte-scale data storage and supporting massively parallel data writes.
Description
Technical field
The present invention relates to the fields of databases, big data, and distributed computing. It provides a relational-database-based mass data storage solution for big data processing that supports both OLTP and OLAP application scenarios.
Background technology
Information technology has developed rapidly in recent years, and the volume of information in enterprises and society is growing geometrically. This poses a huge challenge to storing and processing big data; how to build big data clusters that can store massive amounts of data is the foremost problem facing the big data field.
Open-source big data frameworks based on Hadoop are currently the widely adopted solution. The distributed file system provided by Hadoop can solve petabyte-scale data storage, and products in the Hadoop ecosystem provide columnar storage and data-warehouse-level data management, partially solving the problem of accessing massive data. However, the storage solution Hadoop provides has the following shortcomings. First, it does not support SQL workloads: a large number of current business systems are developed on SQL and cannot be migrated smoothly to the Hadoop platform. Second, its single-process parallel write capability is insufficient: a single HDFS or HBase process cannot write in parallel, so it cannot meet the needs of massive information collection.
Summary of the invention
PowerDB is a database system developed for a single server. The technical problem the present invention solves is to build a distributed cluster for the PowerDB relational database: by adopting an MPP architecture with a shared-nothing design between nodes, a single-point failure does not affect cluster operation, and the cluster can scale out horizontally, achieving petabyte-scale data storage and supporting massively parallel data writes.
To meet SQL-based mass data storage needs, the invention provides a distributed relational database based on an MPP architecture. The database uses a distributed architecture comprising four modules: a global transaction manager (Power-GTM), a load balancing system (Power-Proxy), a cluster coordination manager (Power-COORD), and data nodes (Power-DataNode), wherein:
The global transaction manager is responsible for global transaction processing, the load balancing system is responsible for load balancing management of the cluster, the cluster coordination manager coordinates the work among the data nodes, and the data nodes are relational databases deployed on PowerDB.
As a further improvement of the present invention, a cluster generally has only one global transaction manager.
As a further improvement of the present invention, the single global transaction manager can be configured with a standby (StandBy) node.
As a further improvement of the present invention, a cluster can have multiple coordination managers.
As a further improvement of the present invention, the physical and logical structure of each data node is identical to that of PowerDB; that is, a single relational database system runs on one server.
Optionally, to improve cluster reliability, the GTM provides an active/standby mode. By configuring a GTM-Standby, multiple GTM nodes can be set up in the cluster, only one of which works at a time. Data is synchronized from the GTM master to the GTM-Standby by streaming replication; when the GTM master suffers a single-point failure, the GTM-Standby automatically becomes the GTM master and takes over global transaction management.
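The active/standby behavior can be illustrated with a minimal in-memory sketch. It assumes, for illustration only, that the GTM's core job is handing out monotonically increasing global transaction IDs (GXIDs, as the GTM in Postgres-XC does); the classes `GtmNode` and `GtmPair` are hypothetical, and a plain field assignment stands in for streaming replication.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GtmNode:
    """Hypothetical GTM instance that issues global transaction IDs (GXIDs)."""
    name: str
    next_gxid: int = 1
    alive: bool = True

    def issue_gxid(self) -> int:
        gxid = self.next_gxid
        self.next_gxid += 1
        return gxid


@dataclass
class GtmPair:
    """One active GTM plus standbys; only the active node serves requests."""
    active: GtmNode
    standbys: List[GtmNode] = field(default_factory=list)

    def begin_transaction(self) -> int:
        if not self.active.alive:
            self.fail_over()
        gxid = self.active.issue_gxid()
        # Stand-in for streaming replication: copy the counter to every standby
        # so a promoted standby never reissues an ID that was already handed out.
        for standby in self.standbys:
            standby.next_gxid = self.active.next_gxid
        return gxid

    def fail_over(self) -> None:
        candidates = [s for s in self.standbys if s.alive]
        if not candidates:
            raise RuntimeError("no live GTM standby available")
        promoted = candidates[0]
        self.standbys = [s for s in self.standbys if s is not promoted]
        self.active = promoted  # the standby becomes the new GTM master


pair = GtmPair(GtmNode("gtm-master"), [GtmNode("gtm-standby")])
print(pair.begin_transaction(), pair.begin_transaction())
pair.active.alive = False  # simulate a single-point failure of the master
print(pair.begin_transaction(), pair.active.name)
```

Because the counter is replicated before the master fails, the promoted standby continues the ID sequence instead of restarting it, which is the property global transaction management depends on.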
Optionally, data node reliability is likewise achieved through multi-node redundancy: each data node has one or more standby nodes, data being written is synchronized to the standby nodes by streaming replication, and when a single-point failure occurs the system switches over automatically, keeping the cluster working 24/7.
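Data node redundancy can be sketched the same way: every write is appended to the primary's log and shipped to each live standby before returning, and when the primary fails, the standby that has received the most log is promoted. This is a simplified, synchronous stand-in for PostgreSQL streaming replication, not its actual protocol; `DataNodeReplica` and `ReplicatedDataNode` are hypothetical names.

```python
from typing import List


class DataNodeReplica:
    """Hypothetical copy of a data node holding an append-only write log."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []
        self.alive = True


class ReplicatedDataNode:
    """A primary data node plus one or more standbys kept in sync by log shipping."""

    def __init__(self, primary: DataNodeReplica, standbys: List[DataNodeReplica]) -> None:
        self.primary = primary
        self.standbys = list(standbys)

    def write(self, record: str) -> None:
        if not self.primary.alive:
            self.switch_over()
        self.primary.log.append(record)
        # Ship the record to every live standby before the write is acknowledged,
        # a synchronous simplification of streaming replication.
        for standby in self.standbys:
            if standby.alive:
                standby.log.append(record)

    def switch_over(self) -> None:
        live = [s for s in self.standbys if s.alive]
        if not live:
            raise RuntimeError(f"no live standby for {self.primary.name}")
        # Promote the standby that has received the most log records.
        new_primary = max(live, key=lambda s: len(s.log))
        self.standbys = [s for s in self.standbys if s is not new_primary]
        self.primary = new_primary


dn1 = ReplicatedDataNode(DataNodeReplica("dn1"), [DataNodeReplica("dn1-standby")])
dn1.write("INSERT id=1")
dn1.write("INSERT id=2")
dn1.primary.alive = False  # simulate a single-point failure
dn1.write("INSERT id=3")   # triggers the automatic switchover
print(dn1.primary.name, dn1.primary.log)
```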
Optionally, tables are created with the round-robin distribution algorithm so that the distributed cluster can respond to data write operations most efficiently. Allowing for the cluster's own overhead, a cluster of N data nodes achieves roughly 0.7 × N times the performance of a single-server deployment, meeting the needs of high-concurrency writes of massive data.
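Round-robin placement can be sketched as cycling through the data nodes so that consecutive rows land on consecutive nodes. The helper `round_robin_distribute`, the node names, and the `(id, age)` rows are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int]  # hypothetical (id, age) rows


def round_robin_distribute(rows: Iterable[Row], nodes: List[str]) -> Dict[str, List[Row]]:
    """Assign each incoming row to the next data node in a fixed rotation."""
    if not nodes:
        raise ValueError("at least one data node is required")
    placement: Dict[str, List[Row]] = {node: [] for node in nodes}
    rotation = cycle(nodes)
    for row in rows:
        placement[next(rotation)].append(row)
    return placement


nodes = ["dn1", "dn2", "dn3"]
placement = round_robin_distribute([(i, 20 + i) for i in range(7)], nodes)
print({node: len(rows) for node, rows in placement.items()})
```

Because no column value is inspected, writes stay balanced across nodes even when the data is skewed; the trade-off is that a lookup by key must be sent to every node.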
Optionally, an automatic installation and deployment tool for Windows and Linux is developed in Java.
Optionally, the distributed cluster management tools mainly include remote login and management, a query analysis manager, cluster monitoring tools, and the like.
Description of the drawings
Fig. 1 is an architecture diagram of the autonomous, controllable distributed relational database of the present invention;
Fig. 2 shows a simple and efficient distributed relational database cluster of the present invention.
Embodiment
The present invention is described in more detail below with reference to the accompanying drawings. It should be understood that the embodiments described here are only intended to explain the present invention and not to limit it.
PowerDB is a relational database system developed on the PostgreSQL database. PostgreSQL is a database system that supports concurrent operation and has good compatibility with Oracle. Its MVCC (multi-version concurrency control) mechanism improves data write capability, and dividing table space into blocks supports concurrency on a single machine. PowerDB keeps the PostgreSQL kernel unchanged and adds external tools to meet the needs of business system development and DBA work, and it supports the SQL:2008 standard.
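The MVCC point can be made concrete with a simplified visibility check modeled loosely on PostgreSQL's xmin/xmax scheme: each reader's snapshot decides which row versions it sees, so an update creates a new version instead of blocking readers. The sketch assumes every transaction ID below the snapshot horizon that is not listed as in progress has committed (it ignores aborted transactions and a transaction seeing its own changes); all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RowVersion:
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that replaced or deleted it, if any


@dataclass(frozen=True)
class Snapshot:
    horizon: int                 # first transaction ID not yet assigned when taken
    in_progress: FrozenSet[int]  # transactions still running when taken

    def treats_as_committed(self, xid: int) -> bool:
        # Simplification: anything older than the horizon and not running is committed.
        return xid < self.horizon and xid not in self.in_progress


def is_visible(version: RowVersion, snap: Snapshot) -> bool:
    if not snap.treats_as_committed(version.xmin):
        return False  # creator not committed from this reader's point of view
    if version.xmax is not None and snap.treats_as_committed(version.xmax):
        return False  # already superseded for this reader
    return True


# Transaction 5 wrote "age=30"; transaction 8 updates it to "age=31".
old = RowVersion("age=30", xmin=5, xmax=8)
new = RowVersion("age=31", xmin=8)
during_update = Snapshot(horizon=9, in_progress=frozenset({8}))
after_commit = Snapshot(horizon=10, in_progress=frozenset())
print([v.value for v in (old, new) if is_visible(v, during_update)])
print([v.value for v in (old, new) if is_visible(v, after_commit)])
```

A reader that started while transaction 8 was running still sees the old version, and neither side waits on a lock to get a consistent answer.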
(1) Architecture design of the autonomous, controllable distributed relational database.
The system supports mainstream Linux and Windows environments; x86 hardware is recommended to reduce cluster construction cost.
Considering performance and manageability, installation on Linux (e.g., Ubuntu, Red Hat, or CentOS) is recommended.
The GTM demands high server reliability, so a commercial-grade server is recommended. Because the GTM stores no data, it has no particular disk requirements and only modest memory requirements; 16 GB is generally sufficient.
Each data node generally runs the Power-Proxy, Power-COORD, and Power-DataNode functions together to make the fullest use of server resources, so it needs more memory and disk; a typical configuration is a Xeon E5 CPU, 64 GB of memory, and 6 TB of disk.
(2) Architecture planning of the autonomous, controllable distributed relational database.
Taking 1 GTM node and 5 data nodes as an example, the plan is as follows:
Each component of the database cluster should be assigned a machine name and port number, as listed in the planning table below.
(3) Distributed database management mechanism.
Global transactions are managed by the GTM. Each database table is partitioned horizontally into multiple data blocks, which are stored on the corresponding data nodes (Power-DataNode). A data node operates no differently from a standalone database and provides services such as inserting, querying, and modifying data.
(1) Data insertion. The GTM uses a distribution algorithm to determine which data node each row should be stored on. Algorithms include hash distribution (a hash function over the distribution field determines the storage node) and round-robin distribution (rows are spread across the data nodes in turn).
(2) Single-table query. The GTM dispatches the query to every data node; each node returns its results to the GTM, which assembles the result set and returns it to the user.
(3) Multi-table join query. This is organized by the Power-COORD coordination manager, which is generally deployed on every data node. It fetches the related data from the other nodes, joins it with the local node's data, and produces that node's query result.
(4) Indexes. Indexes use a two-level scheme: each data node maintains its own index, and the GTM also maintains a simple index and sends commands to each data node to handle index operations.
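Hash-based insertion and the single-table scatter/gather query can be sketched together. The sketch assumes `id` is the distribution column and uses `zlib.crc32` as the hash function, since the patent does not specify one; although the text assigns routing and result assembly to the GTM, a single hypothetical `Cluster` object plays that role here.

```python
import zlib
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int]  # (id, age), mirroring the t1 example table


def hash_node(key: int, nodes: List[str]) -> str:
    """Pick a data node from the distribution column value."""
    # crc32 is stable across processes, unlike Python's salted hash() for strings.
    return nodes[zlib.crc32(str(key).encode("utf-8")) % len(nodes)]


class Cluster:
    """Hypothetical coordinator view of N shared-nothing data nodes."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self.storage: Dict[str, List[Row]] = {n: [] for n in self.nodes}

    def insert(self, row: Row) -> str:
        node = hash_node(row[0], self.nodes)  # route on the id column
        self.storage[node].append(row)
        return node

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Scatter: every data node filters only its own rows.
        partials = [[r for r in self.storage[n] if predicate(r)] for n in self.nodes]
        # Gather: merge the partial results into one result set for the user.
        return sorted(r for part in partials for r in part)


cluster = Cluster([f"dn{i}" for i in range(1, 6)])
for i in range(100):
    cluster.insert((i, i % 50))
young = cluster.query(lambda row: row[1] < 10)
print(len(young), {n: len(rows) for n, rows in cluster.storage.items()})
```

Unlike round-robin placement, hashing sends every row with the same `id` to the same node, so a point lookup on `id` could be routed to one node instead of all of them.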
(4) Distributed database initialization.
Deploy the application package (APD) to the corresponding directory on each node, such as /usr/local/PowerDB.
Install and configure SSH so that the cluster nodes can log in to one another without a password.
Run tools such as initgtm and initdb to initialize Power-GTM, Power-Proxy, Power-COORD, and Power-DataNode.
The above process can also be carried out with the installation and deployment tool.
(5) Cluster startup.
Step 1: start the GTM service so that the global transaction manager is running normally.
Step 2: start the Power-Proxy on each data node, which connects the node to the GTM.
Step 3: start the Power-DataNode on each data node.
Step 4: start the Power-COORD on each data node.
Step 5: on each data node, create the mapping table for the other nodes.
The above process can be carried out with the cluster installation and deployment tool.
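The startup sequence is a dependency chain, which can be encoded and checked with Python's standard `graphlib` module (Python 3.9+). The service labels mirror the five steps above; `node-registration` is a hypothetical label for step 5.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each entry lists the services that must already be running before it starts.
STARTUP_DEPENDENCIES: Dict[str, Set[str]] = {
    "Power-GTM": set(),
    "Power-Proxy": {"Power-GTM"},           # the proxy connects the node to the GTM
    "Power-DataNode": {"Power-Proxy"},
    "Power-COORD": {"Power-DataNode"},
    "node-registration": {"Power-COORD"},   # step 5: map the other nodes
}


def startup_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return a start order in which every service follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


print(startup_order(STARTUP_DEPENDENCIES))
```

Expressing the order as dependencies rather than a hard-coded list means a deployment tool would get a `graphlib.CycleError` if a misconfiguration ever introduced a circular dependency.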
(6) Database testing and application.
Create the distributed database through the SQL query manager; the procedure is the same as for an ordinary relational database. Create database users and grant them different privileges.
Create a node group (group) to be used when creating tables.
Create database tables. A typical command is:
CREATE TABLE t1 (id int, age int) DISTRIBUTE BY ROUNDROBIN TO GROUP gp1;
The command above creates a table named t1 that uses the round-robin distribution algorithm and the node group gp1.
Read/write tests on the table show that a single node can reach a write speed of roughly 100,000 rows per second; in the distributed environment, the write speed is estimated to reach 0.7 × N × 100,000 rows per second.
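The throughput estimate is simple arithmetic: about 100,000 rows/s on one node, scaled by N data nodes and an efficiency factor of roughly 0.7 for cluster overhead. The helper names are hypothetical, and the 0.7 factor and single-node rate are the patent's own estimates rather than measured constants.

```python
import math

SINGLE_NODE_ROWS_PER_SEC = 100_000  # single-node figure reported above
CLUSTER_EFFICIENCY = 0.7            # estimated share of linear scaling kept after overhead


def estimated_write_rate(data_nodes: int) -> float:
    """Estimated cluster write rate in rows per second: 0.7 * N * 100,000."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    return CLUSTER_EFFICIENCY * data_nodes * SINGLE_NODE_ROWS_PER_SEC


def data_nodes_needed(target_rows_per_sec: float) -> int:
    """Smallest N whose estimated rate meets the target."""
    per_node = CLUSTER_EFFICIENCY * SINGLE_NODE_ROWS_PER_SEC
    return max(1, math.ceil(target_rows_per_sec / per_node))


for n in (1, 5, 10):
    print(n, round(estimated_write_rate(n)))
```

Run in reverse, the same estimate sizes a cluster: a target of 1,000,000 rows/s gives ceil(1,000,000 / 70,000) = 15 data nodes.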
Data access environments are set up through JDBC, ODBC, and OLE DB for application system development.
(7) Building a highly reliable distributed database system.
A master/slave node mode can be configured to make the cluster highly reliable: when a single node fails, neither data loss nor cluster downtime results, which improves cluster reliability.
To set up a GTM master/slave pair, set standby to on in the configuration file and configure the hot standby's machine name or IP address and its port number.
Master/slave setup for data nodes is also done through the configuration file, in essentially the same way.
The above process can be carried out with the installation and deployment tool.
(8) Maintenance of the autonomous, controllable distributed database.
When the storage capacity of the distributed database no longer meets requirements, it can be scaled out horizontally by adding data nodes:
Configure the three modules Power-Proxy, Power-COORD, and Power-DataNode on the new node.
Configure a standby node as required.
Start the services of the three modules.
Add the new node to the cluster's group; the new node then takes effect, and part of the newly written data will be stored on it.
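Scaling out can be sketched with a node group that places new rows round-robin over its current members. The sketch assumes, as the text implies, that existing rows are not redistributed when a node joins; `NodeGroup` and the node names are hypothetical.

```python
from typing import Dict, List


class NodeGroup:
    """Hypothetical node group that places new rows round-robin over its members."""

    def __init__(self, members: List[str]) -> None:
        self.members: List[str] = list(members)
        self.rows: Dict[str, List[int]] = {m: [] for m in self.members}
        self._writes = 0  # total writes so far; drives the rotation

    def add_node(self, name: str) -> None:
        # Existing rows stay where they are; the new node only receives future writes.
        if name in self.rows:
            raise ValueError(f"{name} is already in the group")
        self.members.append(name)
        self.rows[name] = []

    def write(self, row_id: int) -> str:
        node = self.members[self._writes % len(self.members)]
        self._writes += 1
        self.rows[node].append(row_id)
        return node


group = NodeGroup(["dn1", "dn2", "dn3", "dn4", "dn5"])
for row_id in range(10):
    group.write(row_id)
group.add_node("dn6")
for row_id in range(10, 22):
    group.write(row_id)
print({node: len(ids) for node, ids in group.rows.items()})
```

Only writes made after the join reach dn6, so older data stays on the original five nodes; this sketch performs no rebalancing.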
When a single-point failure occurs, remove the failed node from the cluster, repair it offline, and add it back; once its data has been synchronized, it resumes work.
The above processes can be completed with the installation and deployment tool.
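The failure-handling procedure amounts to a small state machine for each node. The states and allowed transitions below are an illustrative reading of the text (fail, remove and repair offline, rejoin and synchronize, resume); `NodeState` and `ClusterNode` are hypothetical names, and a real tool would also need to handle failures during synchronization, which the sketch only allows as a return to `FAILED`.

```python
from enum import Enum
from typing import Dict, Set


class NodeState(Enum):
    ACTIVE = "active"
    FAILED = "failed"
    REMOVED = "removed"   # unloaded from the cluster and repaired offline
    SYNCING = "syncing"   # re-added and catching up on data it missed


ALLOWED_TRANSITIONS: Dict[NodeState, Set[NodeState]] = {
    NodeState.ACTIVE: {NodeState.FAILED},
    NodeState.FAILED: {NodeState.REMOVED},
    NodeState.REMOVED: {NodeState.SYNCING},
    NodeState.SYNCING: {NodeState.ACTIVE, NodeState.FAILED},
}


class ClusterNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = NodeState.ACTIVE

    def transition(self, new_state: NodeState) -> None:
        # Reject shortcuts such as serving traffic again before data is synchronized.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: cannot move from {self.state.value} to {new_state.value}"
            )
        self.state = new_state


node = ClusterNode("dn3")
for step in (NodeState.FAILED, NodeState.REMOVED, NodeState.SYNCING, NodeState.ACTIVE):
    node.transition(step)
print(node.name, node.state.value)
```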
(9) Shutting down the database cluster.
Step 1: send a message to users announcing that the cluster will shut down, then wait for user jobs to finish or delay for a set period of time.
Step 2: shut down the data node services.
Step 3: shut down the coordinator node services.
Step 4: shut down the load balancing node services.
Step 5: shut down the global transaction node service.
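The shutdown procedure can be sketched as "notify, drain with a deadline, then stop services in a fixed order". The callbacks `active_jobs`, `stop_service`, and `notify_users` are hypothetical stand-ins for whatever the deployment tool actually uses, and the 60-second grace period is an arbitrary default.

```python
import time
from typing import Callable, List

# Order taken from steps 2 to 5 above.
SHUTDOWN_ORDER: List[str] = [
    "data node",
    "coordinator",
    "load balancer",
    "global transaction manager",
]


def shut_down_cluster(
    active_jobs: Callable[[], int],
    stop_service: Callable[[str], None],
    notify_users: Callable[[str], None],
    grace_seconds: float = 60.0,
    poll_seconds: float = 1.0,
) -> bool:
    """Warn users, wait up to grace_seconds for jobs to drain, then stop services in order.

    Returns True if all user jobs finished before the services were stopped.
    """
    notify_users("The database cluster is about to shut down.")
    deadline = time.monotonic() + grace_seconds
    while active_jobs() > 0 and time.monotonic() < deadline:
        time.sleep(poll_seconds)
    drained = active_jobs() == 0
    for service in SHUTDOWN_ORDER:
        stop_service(service)
    return drained


remaining = [3]


def fake_active_jobs() -> int:
    # Each poll, one more user job finishes.
    remaining[0] = max(0, remaining[0] - 1)
    return remaining[0]


stopped: List[str] = []
print(shut_down_cluster(fake_active_jobs, stopped.append, print, grace_seconds=5.0, poll_seconds=0.0))
print(stopped)
```

Stopping the GTM last keeps global transaction management available until every node that might still need it has gone down.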
For a person of ordinary skill in the technical field of the invention, any simple deductions or substitutions made without departing from the concept and spirit of the invention shall be deemed to fall within the protection scope of the invention.
Claims (12)
1. A distributed relational database based on an MPP architecture, characterized in that the database comprises four modules, namely a global transaction manager (Power-GTM), a load balancing system (Power-Proxy), a cluster coordination manager (Power-COORD), and data nodes (Power-DataNode), wherein:
the global transaction manager is responsible for global transaction processing, the load balancing system is responsible for load balancing management of the cluster, the cluster coordination manager coordinates the work among the data nodes, and the data nodes are relational databases deployed on PowerDB.
2. The distributed relational database based on an MPP architecture according to claim 1, characterized in that a cluster generally has only one global transaction manager.
3. The distributed relational database based on an MPP architecture according to claim 2, characterized in that the single global transaction manager can be configured with a standby node.
4. The distributed relational database based on an MPP architecture according to claim 1, characterized in that a cluster has multiple coordination managers.
5. The distributed relational database based on an MPP architecture according to claim 1, characterized in that a single relational database system is maintained on one server.
6. The distributed relational database based on an MPP architecture according to claim 1, characterized in that the global transaction manager provides an active/standby mode.
7. The distributed relational database based on an MPP architecture according to claim 6, characterized in that multiple global transaction manager nodes are set up in the cluster.
8. The distributed relational database based on an MPP architecture according to claim 7, characterized in that only one node works at a time.
9. The distributed relational database based on an MPP architecture according to claim 1, characterized in that each data node has one or more standby nodes; data being written is synchronized to the standby nodes by streaming replication, and when a single-point failure occurs, the system switches over automatically.
10. The distributed relational database based on an MPP architecture according to claim 1, characterized in that tables are created with a round-robin algorithm so that the distributed cluster can respond to data write operations most efficiently.
11. The distributed relational database based on an MPP architecture according to claim 1, characterized in that an automatic installation and deployment tool for Windows and Linux is developed in Java.
12. The distributed relational database based on an MPP architecture according to claim 1, characterized in that the distributed cluster management tools mainly include remote login and management, a query analysis manager, cluster monitoring tools, and the like.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510547427.8A CN105045929A (en) | 2015-08-31 | 2015-08-31 | MPP architecture based distributed relational database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510547427.8A CN105045929A (en) | 2015-08-31 | 2015-08-31 | MPP architecture based distributed relational database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105045929A true CN105045929A (en) | 2015-11-11 |
Family
ID=54452475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510547427.8A Pending CN105045929A (en) | 2015-08-31 | 2015-08-31 | MPP architecture based distributed relational database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105045929A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2362600A1 (en) * | 2009-11-22 | 2011-08-31 | Avaya Inc. | Sending a user associated telecommunication address |
CN104463465A (en) * | 2014-12-05 | 2015-03-25 | 国家电网公司 | Real-time monitoring cluster processing method based on distributed models |
CN104657483A (en) * | 2015-02-28 | 2015-05-27 | 华为技术有限公司 | Business processing method, processing node, center node and cluster |
Non-Patent Citations (1)
Title |
---|
ODE: "Postgres-XC cluster notes: concepts and environment setup" (Postgres-XC集群笔记-概念与环境搭建), 《HTTP://WWW.CNBLOGS.COM/ODE/P/POSTGRES_XC_CLUSTER_NOTES_INSTALL_AND_CONFIGURATION.HTML》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550309A (en) * | 2015-12-12 | 2016-05-04 | 天津南大通用数据技术股份有限公司 | MPP framework database cluster sequence system and sequence management method |
CN106250566A (en) * | 2016-08-31 | 2016-12-21 | 天津南大通用数据技术股份有限公司 | A kind of distributed data base and the management method of data operation thereof |
CN107480251A (en) * | 2017-08-14 | 2017-12-15 | 福建新大陆软件工程有限公司 | A kind of system for managing data access |
CN110019523A (en) * | 2017-12-01 | 2019-07-16 | 江苏奥博洋信息技术有限公司 | A kind of storage method of big data |
CN110019343A (en) * | 2017-12-15 | 2019-07-16 | 中国电力科学研究院有限公司 | A kind of new energy meteorological data management method and system |
CN108446145A (en) * | 2018-03-21 | 2018-08-24 | 苏州提点信息科技有限公司 | A kind of distributed document loads MPP data base methods automatically |
CN109189561A (en) * | 2018-08-08 | 2019-01-11 | 广东亿迅科技有限公司 | A kind of transacter and its method based on MPP framework |
CN109344192A (en) * | 2018-10-24 | 2019-02-15 | 四川省气象探测数据中心 | A kind of optimization CIMISS Database Systems and its adaptation method |
CN109522098A (en) * | 2018-11-28 | 2019-03-26 | 星环信息科技(上海)有限公司 | Transaction methods, device, system and storage medium in distributed data base |
CN111984696A (en) * | 2020-07-23 | 2020-11-24 | 深圳市赢时胜信息技术股份有限公司 | Novel database and method |
CN111984696B (en) * | 2020-07-23 | 2023-11-10 | 深圳市赢时胜信息技术股份有限公司 | Novel database and method |
CN112488506A (en) * | 2020-11-30 | 2021-03-12 | 哈尔滨工程大学 | Extensible distributed architecture and self-organizing method of intelligent unmanned system cluster |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105045929A (en) | MPP architecture based distributed relational database | |
US8140498B2 (en) | Distributed database system by sharing or replicating the meta information on memory caches | |
US8122284B2 (en) | N+1 failover and resynchronization of data storage appliances | |
US20130110873A1 (en) | Method and system for data storage and management | |
US10176184B2 (en) | System and method for supporting persistent store versioning and integrity in a distributed data grid | |
CN103034739A (en) | Distributed memory system and updating and querying method thereof | |
CN107423390B (en) | Real-time data synchronization method based on OLTP-OLAP mixed relational database system | |
CN103150304A (en) | Cloud database system | |
US10650024B2 (en) | System and method of replicating data in a distributed system | |
Moiz et al. | Database replication: A survey of open source and commercial tools | |
CN103593420A (en) | Method for constructing heterogeneous database clusters on same platform by sharing online logs | |
US11003550B2 (en) | Methods and systems of operating a database management system DBMS in a strong consistency mode | |
Qi | Digital forensics and NoSQL databases | |
CN105956041A (en) | Data model processing method based on Spring Data for MongoDB cluster | |
KR20130038517A (en) | System and method for managing data using distributed containers | |
US10970177B2 (en) | Methods and systems of managing consistency and availability tradeoffs in a real-time operational DBMS | |
Chen et al. | A performance evaluation of distributed database architectures | |
CN105975546A (en) | Novel computer supervision system | |
Azim et al. | Offsite 2-Way Data Replication toward Improving Data Refresh Performance | |
CN109753245A (en) | A kind of multiple disks load balancing asynchronous read and write dispatching method and device | |
Feuerlicht et al. | Can relational DBMS scale up to the cloud? | |
Herrmann et al. | Cinderella—Adaptive online partitioning of irregularly structured data | |
CN114385577A (en) | Distributed file system | |
Faiz et al. | Database replica management strategies in multidatabase systems with mobile hosts | |
KR101566884B1 (en) | Distribution store system for managing unstructured data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20151111 |
|
RJ01 | Rejection of invention patent application after publication |