CN105045929A - MPP architecture based distributed relational database - Google Patents


Info

Publication number
CN105045929A
Authority
CN
China
Prior art keywords
cluster
distributed
framework according
global transaction
mpp
Legal status
Pending
Application number
CN201510547427.8A
Other languages
Chinese (zh)
Inventor
张宇
杨利兵
缪燕
李海
吕志来
张学深
Current Assignee
State Grid Corp of China SGCC
Beijing Xuji Electric Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing Xuji Electric Co Ltd
Priority date
2015-08-31
Filing date
2015-08-31
Publication date
2015-11-11
Application filed by State Grid Corp of China SGCC and Beijing Xuji Electric Co Ltd
Priority to CN201510547427.8A
Publication of CN105045929A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a distributed relational database based on an MPP architecture and relates to the fields of databases, big data and distributed computing. The database comprises four modules: a global transaction manager responsible for global transaction processing; a load balancing system responsible for load balancing management of the cluster; a cluster coordination manager that coordinates work among the data nodes; and data nodes, each of which is a relational database deployed on PowerDB. The invention builds a distributed cluster around the PowerDB relational database. Because it adopts an MPP architecture with a shared-nothing design between nodes, the failure of any single node does not affect cluster operation; the cluster can scale out horizontally, stores PB-scale data, and supports massively parallel data writes.

Description

A distributed relational database based on an MPP architecture
Technical field
The present invention relates to the fields of databases, big data and distributed computing. It provides a mass data storage solution for big data processing that is built on a relational database and supports both OLTP and OLAP application scenarios.
Background art
Information technology has developed rapidly in recent years, and the amount of information held by enterprises and society is growing geometrically. This poses a huge challenge for storing and processing big data: how to build big data clusters and store massive amounts of data is the foremost problem facing the big data field.
The open-source big data framework Hadoop is currently the most widely adopted solution. The distributed file system provided by Hadoop solves PB-scale data storage, and products in the Hadoop ecosystem provide column-oriented storage and data-warehouse-level data management, which partly solves the problem of accessing massive data. However, the storage solution provided by Hadoop has the following shortcomings. First, it does not support SQL-based business applications: a large number of current business systems are developed on SQL and cannot be migrated smoothly to the Hadoop platform. Second, its single-process parallel write capability is insufficient: a single HDFS or HBase process cannot write in parallel, so it cannot meet the needs of massive information acquisition.
Summary of the invention
PowerDB is a database system developed for a single server. The technical problem solved by the present invention is to build a distributed cluster for the PowerDB relational database. By adopting MPP architecture technology with a shared-nothing design between nodes, the failure of a single node does not affect cluster operation, the cluster can scale out horizontally, PB-scale data storage is achieved, and massively parallel data writes are supported.
To meet SQL-based mass data storage needs, the invention provides a distributed relational database based on an MPP architecture. The database has a distributed architecture comprising four modules: the global transaction manager (Power-GTM), the load balancing system (Power-Proxy), the cluster coordination manager (Power-COORD) and the data nodes (Power-DataNode), wherein:
The global transaction manager is responsible for global transaction processing, the load balancing system is responsible for load balancing management of the cluster, the cluster coordination manager coordinates work among the data nodes, and each data node is a relational database deployed on PowerDB.
As a further improvement of the present invention, a cluster generally has only one global transaction manager.
As a further improvement of the present invention, the single global transaction manager can be configured with a standby (StandBy) node.
As a further improvement of the present invention, a cluster can have multiple coordination managers.
As a further improvement of the present invention, the physical and logical structure of a data node are exactly the same as those of PowerDB, that is, each server runs a single relational database system.
Optionally, to improve cluster reliability, the GTM provides an active/standby mode. By configuring a GTM-Standby, multiple GTM nodes can be set up in the cluster, but only one node works at any given time. Data are synchronized from the GTM host to the GTM-Standby by streaming replication; when the GTM host suffers a single-point failure, the GTM-Standby automatically becomes the GTM host and takes over global transaction management.
Optionally, the reliability of the data nodes is likewise achieved through multi-node redundancy: each data node is given one or more standby nodes. During writes, data are synchronized to the standby nodes by streaming replication, and when a single-point failure occurs the system switches over automatically, so that the cluster can run 24/7.
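The active/standby switchover described for the GTM and the data nodes can be illustrated with a small in-memory model. This is a hypothetical sketch, not PowerDB code: the class names, the `applied_lsn` counter and the rule of promoting the most up-to-date healthy standby are assumptions, and replication is modeled as synchronous for simplicity (real streaming replication may lag behind the primary).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """One server in an active/standby group (a GTM or a data node)."""
    name: str
    healthy: bool = True
    applied_lsn: int = 0  # hypothetical: how much of the replicated log this node has applied


@dataclass
class ReplicatedGroup:
    """A primary plus standbys kept in sync by (simulated) streaming replication."""
    primary: Node
    standbys: list[Node] = field(default_factory=list)

    def write(self, records: int) -> None:
        if not self.primary.healthy:
            raise RuntimeError("primary is down; fail over first")
        self.primary.applied_lsn += records
        # Simplification: every healthy standby receives the change immediately.
        for standby in self.standbys:
            if standby.healthy:
                standby.applied_lsn = self.primary.applied_lsn

    def failover(self) -> Node:
        # Promote the healthy standby that has applied the most of the log.
        candidates = [s for s in self.standbys if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy standby to promote")
        promoted = max(candidates, key=lambda s: s.applied_lsn)
        self.standbys.remove(promoted)
        # The failed primary stays in the group and can rejoin as a standby once repaired.
        self.standbys.append(self.primary)
        self.primary = promoted
        return promoted


if __name__ == "__main__":
    gtm = ReplicatedGroup(Node("gtm-host"), [Node("gtm-standby")])
    gtm.write(10)
    gtm.primary.healthy = False   # simulate a single-point failure
    print(gtm.failover().name)    # the standby takes over
```

Promoting the standby with the highest applied position is one common policy for minimizing lost writes; the patent does not say which standby is chosen when several exist.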
Optionally, tables are created with the round-robin distribution algorithm so that the distributed cluster can respond to data writes as efficiently as possible. Taking the cluster's own overhead into account, a cluster of N data nodes can achieve roughly 0.7*N times the performance of a single server, which meets the needs of highly concurrent writes of massive data.
Optionally, an automatic installation and deployment tool for Windows and Linux is developed in Java.
Optionally, the distributed cluster management tools mainly include remote login and management, a query analysis manager, and a cluster monitoring tool.
Description of the drawings
Fig. 1 is the architecture diagram of the autonomous, controllable distributed relational database of the present invention;
Fig. 2 shows a simple and efficient distributed relational database cluster of the present invention.
Embodiment
The present invention is described in more detail below with reference to the drawings. It should be understood that the embodiments described here only explain the present invention and do not limit it.
PowerDB is a relational database system developed on the open-source database PostgreSQL. PostgreSQL supports concurrent operation and is comparatively compatible with Oracle. It improves write capability through its MVCC (multi-version concurrency control) mechanism, and by dividing table data space into blocks it supports concurrency on a single machine. While keeping the PostgreSQL kernel unchanged, PowerDB adds external tools that meet the needs of business system development and DBA work, and it supports the SQL:2008 standard.
(1) Architecture design of the autonomous, controllable distributed relational database
The system supports mainstream Linux and Windows environments; x86 hardware is recommended to keep cluster construction costs down.
For performance and manageability, installation on Linux, such as Ubuntu, Red Hat or CentOS, is recommended.
The GTM places high demands on server reliability, so a commercial server is recommended. Because the GTM stores no data, it has no particular disk requirements and modest memory requirements; 16 GB is generally sufficient.
Each data node generally hosts the Power-Proxy, Power-COORD and Power-DataNode functions at the same time to make the most of server resources, so it needs more memory and disk; a typical configuration is a Xeon E5 CPU, 64 GB of memory and 6 TB of disk.
(2) Architecture planning of the autonomous, controllable distributed relational database
Taking 1 GTM node and 5 data nodes as an example, the plan is as follows:
Each component of the database cluster should be assigned a host name and port number according to a planning table.
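As a purely hypothetical illustration of that planning rule (one GTM host, with Power-Proxy, Power-COORD and Power-DataNode co-located on each of 5 data nodes), the sketch below generates host/port assignments and rejects any plan in which two services share a host and port. It is not the patent's planning table: every host name and port number here is an invented placeholder.

```python
from dataclasses import dataclass

# Hypothetical ports for the services co-located on every data node.
NODE_SERVICE_PORTS = {"Power-Proxy": 6667, "Power-COORD": 5432, "Power-DataNode": 15432}
GTM_PORT = 6666  # hypothetical


@dataclass(frozen=True)
class Assignment:
    component: str
    host: str
    port: int


def plan_cluster(num_data_nodes: int) -> list[Assignment]:
    plan = [Assignment("Power-GTM", "gtm01", GTM_PORT)]
    for i in range(1, num_data_nodes + 1):
        host = f"node{i:02d}"  # hypothetical machine name
        for component, port in NODE_SERVICE_PORTS.items():
            plan.append(Assignment(component, host, port))
    # Two services on the same host must never listen on the same port.
    endpoints = [(a.host, a.port) for a in plan]
    if len(endpoints) != len(set(endpoints)):
        raise ValueError("port conflict in cluster plan")
    return plan


if __name__ == "__main__":
    for a in plan_cluster(5):
        print(f"{a.component:<15} {a.host}:{a.port}")
```

For 1 GTM and 5 data nodes this yields 16 assignments: one GTM entry plus three services on each of the five nodes.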
(3) Distributed database management mechanism
Global transactions are managed by the GTM. Each database table is partitioned horizontally into multiple data blocks, which are stored on the corresponding data nodes (Power-DataNode). A data node operates exactly like a standalone database and handles services such as inserting, querying and modifying data.
(1) Data insertion. The GTM uses a distribution algorithm to decide which data node each row should go to. The algorithms include the hash algorithm (a hash function built over the field's value range determines the storage node) and the round-robin algorithm (rows are spread across the data nodes in turn).
(2) Single-table queries. The GTM dispatches the query command to every data node; each node returns its query result to the GTM, which assembles the result set and returns it to the user.
(3) Multi-table join queries. These are organized by the Power-COORD coordination manager; generally every data node has a coordination manager deployed. The coordination manager fetches the related data from the other nodes, joins it with the local data, and produces the query result for that node.
(4) Indexes. Indexes use a two-tier structure: each data node maintains its own index, and the GTM also maintains a simple index and sends commands to each data node to process index operations.
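The insertion and single-table query steps can be modeled in a few lines: a hash of the distribution column picks the data node on insert, and a query is sent to every node, evaluated locally, and merged by the component that dispatched it (the GTM in the description above). This is a toy sketch; CRC32 is used only as an example of a deterministic hash function and is an assumption, not necessarily the hash PowerDB uses. Joins and the two-tier index are not modeled.

```python
import zlib
from collections import defaultdict
from typing import Callable


def hash_node(key: object, nodes: list[str]) -> str:
    # CRC32 is deterministic across runs, unlike Python's salted built-in hash() for str.
    return nodes[zlib.crc32(str(key).encode("utf-8")) % len(nodes)]


class HashDistributedTable:
    """Each data node keeps only its own slice (shard) of the table."""

    def __init__(self, nodes: list[str], key_index: int = 0) -> None:
        self.nodes = nodes
        self.key_index = key_index  # position of the distribution column in each row
        self.shards: dict[str, list[tuple]] = defaultdict(list)

    def insert(self, row: tuple) -> str:
        node = hash_node(row[self.key_index], self.nodes)
        self.shards[node].append(row)
        return node

    def select(self, predicate: Callable[[tuple], bool]) -> list[tuple]:
        # Scatter: every data node filters its local rows independently.
        partials = [[r for r in self.shards[n] if predicate(r)] for n in self.nodes]
        # Gather: the dispatching component merges the partial results.
        return sorted(r for part in partials for r in part)


if __name__ == "__main__":
    t1 = HashDistributedTable(["dn1", "dn2", "dn3"])
    for i in range(10):
        t1.insert((i, 20 + i))  # (id, age)
    print(t1.select(lambda row: row[1] >= 25))  # rows with ids 5 to 9
```

Unlike round-robin, hash placement sends equal keys to the same node, so a lookup on the distribution column could in principle be routed to one node instead of all of them.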
(4) Distributed database initialization
Deploy the APD to the corresponding directory on each node, such as /usr/local/PowerDB.
Install and configure SSH so that the cluster nodes can log in to one another without a password.
Run tools such as initgtm and initdb to initialize Power-GTM, Power-Proxy, Power-COORD and Power-DataNode.
The above process can also be carried out with the installation and deployment tool.
(5) Cluster startup
Step 1: start the GTM service so that the global transaction manager is running normally.
Step 2: start Power-Proxy on each data node, which connects the node to the GTM.
Step 3: start Power-DataNode on each data node.
Step 4: start Power-COORD on each data node.
Step 5: on each data node, create the mapping entries for the other nodes.
The above process can be carried out with the cluster installation and deployment tool.
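The startup order matters because each layer depends on the one before it: the proxies connect to an already running GTM, and the coordinators work against running data nodes. The hypothetical sketch below only produces the ordered step list that a deployment tool might execute; the step strings are illustrative labels, not real commands.

```python
def startup_plan(data_nodes: list[str]) -> list[str]:
    """Return the cluster startup steps in dependency order."""
    steps = ["start Power-GTM"]  # step 1: global transaction manager first
    # Steps 2-4: each service type is started on every data node before the next type.
    for component in ("Power-Proxy", "Power-DataNode", "Power-COORD"):
        steps += [f"start {component} on {node}" for node in data_nodes]
    # Step 5: each node records the other nodes of the cluster.
    steps += [f"register peer nodes on {node}" for node in data_nodes]
    return steps


if __name__ == "__main__":
    for step in startup_plan(["dn1", "dn2"]):
        print(step)
```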
(6) Database testing and application
Create the distributed database through the SQL query manager; the procedure is the same as for accessing a relational database. Create database users and grant them different privileges.
Create a node group (group) to be used when creating tables.
Create a database table. A typical command is:
CREATE TABLE t1 (id int, age int) DISTRIBUTE BY ROUNDROBIN TO GROUP gp1;
The above command creates a table named t1 whose rows are distributed round-robin across the nodes of group gp1.
In read/write tests on the table, a single server basically reaches a write speed of 100,000 rows per second; in the distributed environment, the write speed is estimated to reach 0.7*N*100,000 rows per second.
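The 0.7*N*100,000 estimate is easy to apply in both directions: predicting throughput for a given number of data nodes, or sizing a cluster for a target write rate. A small sketch of that arithmetic; the function names are hypothetical, and the defaults are the figures quoted above.

```python
import math


def estimated_write_rate(n_nodes: int, single_node_rate: float = 100_000,
                         efficiency: float = 0.7) -> float:
    """Rows/second for n_nodes data nodes under the 0.7*N*100000 rule of thumb."""
    return efficiency * n_nodes * single_node_rate


def nodes_needed(target_rate: float, single_node_rate: float = 100_000,
                 efficiency: float = 0.7) -> int:
    """Smallest N whose estimated write rate reaches target_rate."""
    return math.ceil(target_rate / (efficiency * single_node_rate))
```

For the five-data-node plan, the estimate is 0.7 × 5 × 100,000 = 350,000 rows/s; reaching 1,000,000 rows/s would take 15 data nodes, since 14 nodes give only 980,000 rows/s.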
Data access for application development is provided through JDBC, ODBC and OLE DB.
(7) Building a highly reliable distributed database system
Master/standby node mode can be configured to make the cluster highly reliable, so that a single-node failure causes neither data loss nor cluster shutdown.
To set up GTM master/standby, set standby to on in the configuration file and configure the host name or IP address and the port number of the hot standby machine.
Data node master/standby is also configured through the configuration file, in essentially the same way.
The above process can be carried out with the installation and deployment tool.
(8) Maintenance of the autonomous, controllable distributed database
When the distributed storage capacity no longer meets requirements, the cluster can be scaled out horizontally by adding data nodes.
A new node must be configured with the three modules Power-Proxy, Power-COORD and Power-DataNode.
Configure standby nodes as required.
Start the services of the three modules.
Add the new node to the cluster's group; the new node then takes effect, and part of the newly written data will be stored on it.
When a single-point failure occurs, the failed node must be removed from the cluster, repaired offline, added back to the cluster, and returned to service after data synchronization.
The above process can be carried out with the installation and deployment tool.
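Scale-out works through group membership: once a node joins the group, round-robin placement starts sending new rows to it, while rows already stored stay where they are. A failed node is likewise taken out of the group until it has been repaired and resynchronized. A hypothetical sketch; the `NodeGroup` class is an illustration and does not model data synchronization or rebalancing.

```python
from collections import Counter


class NodeGroup:
    """Hypothetical node group that places new rows round-robin over its members."""

    def __init__(self, name: str, members: list[str]) -> None:
        self.name = name
        self.members = list(members)
        self._writes = 0

    def add_node(self, node: str) -> None:
        # Only rows written from now on can land on the new node; nothing is moved.
        self.members.append(node)

    def remove_node(self, node: str) -> None:
        # Take a failed node out of service; re-add it after repair and resync.
        self.members.remove(node)

    def route(self) -> str:
        if not self.members:
            raise RuntimeError(f"group {self.name} has no active nodes")
        node = self.members[self._writes % len(self.members)]
        self._writes += 1
        return node


if __name__ == "__main__":
    gp1 = NodeGroup("gp1", ["dn1", "dn2"])
    before = Counter(gp1.route() for _ in range(4))
    gp1.add_node("dn3")
    after = Counter(gp1.route() for _ in range(6))
    print(before, after)  # dn3 only appears in the writes made after it joined
```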
(9) Shutting down the database cluster
Step 1: send a message to users announcing that the cluster will shut down, and either wait for user work to finish or delay for a certain time.
Step 2: stop the data node services.
Step 3: stop the coordinator node services.
Step 4: stop the load balancing node services.
Step 5: stop the global transaction node service.
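The shutdown sequence can be sketched the same way as an ordered step list; the GTM goes last because the other services depend on it. The step strings and the `grace_seconds` parameter are hypothetical illustrations, not PowerDB commands or options.

```python
def shutdown_plan(data_nodes: list[str], grace_seconds: int = 300) -> list[str]:
    """Return the cluster shutdown steps in the order given above."""
    steps = [f"notify users: cluster shuts down in {grace_seconds} s"]  # step 1
    # Steps 2-4: data nodes, then coordinators, then load balancers, on every node.
    for component in ("Power-DataNode", "Power-COORD", "Power-Proxy"):
        steps += [f"stop {component} on {node}" for node in data_nodes]
    steps.append("stop Power-GTM")  # step 5: global transaction manager last
    return steps


if __name__ == "__main__":
    for step in shutdown_plan(["dn1", "dn2"]):
        print(step)
```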
For those of ordinary skill in the technical field of the invention, simple deductions or substitutions made without departing from the concept and spirit of the present invention shall all be regarded as falling within its scope of protection.

Claims (12)

1. A distributed relational database based on an MPP architecture, characterized in that the database comprises four modules: a global transaction manager (Power-GTM), a load balancing system (Power-Proxy), a cluster coordination manager (Power-COORD) and data nodes (Power-DataNode), wherein:
The global transaction manager is responsible for global transaction processing, the load balancing system is responsible for load balancing management of the cluster, the cluster coordination manager coordinates work among the data nodes, and each data node is a relational database deployed on PowerDB.
2. The distributed relational database based on an MPP architecture according to claim 1, characterized in that a cluster generally has only one global transaction manager.
3. The distributed relational database based on an MPP architecture according to claim 2, characterized in that the single global transaction manager can be configured with a standby node.
4. The distributed relational database based on an MPP architecture according to claim 1, characterized in that a cluster has multiple coordination managers.
5. The distributed relational database based on an MPP architecture according to claim 1, characterized in that each server runs a single relational database system.
6. The distributed relational database based on an MPP architecture according to claim 1, characterized in that the global transaction manager provides an active/standby mode.
7. The distributed relational database based on an MPP architecture according to claim 6, characterized in that multiple global transaction manager nodes are set up in the cluster.
8. The distributed relational database based on an MPP architecture according to claim 7, characterized in that only one of these nodes works at any given time.
9. The distributed relational database based on an MPP architecture according to claim 1, characterized in that each data node is given one or more standby nodes; during writes, data are synchronized to the standby nodes by streaming replication, and when a single-point failure occurs the system switches over automatically.
10. The distributed relational database based on an MPP architecture according to claim 1, characterized in that tables are created with the round-robin algorithm so that the distributed cluster can respond to data writes as efficiently as possible.
11. The distributed relational database based on an MPP architecture according to claim 1, characterized in that an automatic installation and deployment tool for Windows and Linux is developed in Java.
12. The distributed relational database based on an MPP architecture according to claim 1, characterized in that the distributed cluster management tools mainly include remote login and management, a query analysis manager, and a cluster monitoring tool.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510547427.8A CN105045929A (en) 2015-08-31 2015-08-31 MPP architecture based distributed relational database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510547427.8A CN105045929A (en) 2015-08-31 2015-08-31 MPP architecture based distributed relational database

Publications (1)

Publication Number Publication Date
CN105045929A true CN105045929A (en) 2015-11-11

Family

ID=54452475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510547427.8A Pending CN105045929A (en) 2015-08-31 2015-08-31 MPP architecture based distributed relational database

Country Status (1)

Country Link
CN (1) CN105045929A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2362600A1 (en) * 2009-11-22 2011-08-31 Avaya Inc. Sending a user associated telecommunication address
CN104463465A (en) * 2014-12-05 2015-03-25 国家电网公司 Real-time monitoring cluster processing method based on distributed models
CN104657483A (en) * 2015-02-28 2015-05-27 华为技术有限公司 Business processing method, processing node, center node and cluster

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ODE: "Postgres-XC cluster notes: concepts and environment setup" (Postgres-XC集群笔记-概念与环境搭建), HTTP://WWW.CNBLOGS.COM/ODE/P/POSTGRES_XC_CLUSTER_NOTES_INSTALL_AND_CONFIGURATION.HTML *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550309A (en) * 2015-12-12 2016-05-04 天津南大通用数据技术股份有限公司 MPP architecture database cluster sequence system and sequence management method
CN106250566A (en) * 2016-08-31 2016-12-21 天津南大通用数据技术股份有限公司 Distributed database and management method for its data operations
CN107480251A (en) * 2017-08-14 2017-12-15 福建新大陆软件工程有限公司 System for managing data access
CN110019523A (en) * 2017-12-01 2019-07-16 江苏奥博洋信息技术有限公司 Big data storage method
CN110019343A (en) * 2017-12-15 2019-07-16 中国电力科学研究院有限公司 New energy meteorological data management method and system
CN108446145A (en) * 2018-03-21 2018-08-24 苏州提点信息科技有限公司 Method for automatically loading distributed files into an MPP database
CN109189561A (en) * 2018-08-08 2019-01-11 广东亿迅科技有限公司 Transacter and method thereof based on MPP architecture
CN109344192A (en) * 2018-10-24 2019-02-15 四川省气象探测数据中心 Optimized CIMISS database system and adaptation method thereof
CN109522098A (en) * 2018-11-28 2019-03-26 星环信息科技(上海)有限公司 Transaction processing method, device, system and storage medium in a distributed database
CN111984696A (en) * 2020-07-23 2020-11-24 深圳市赢时胜信息技术股份有限公司 Novel database and method
CN111984696B (en) * 2020-07-23 2023-11-10 深圳市赢时胜信息技术股份有限公司 Novel database and method
CN112488506A (en) * 2020-11-30 2021-03-12 哈尔滨工程大学 Extensible distributed architecture and self-organizing method of intelligent unmanned system cluster

Similar Documents

Publication Publication Date Title
CN105045929A (en) MPP architecture based distributed relational database
US8140498B2 (en) Distributed database system by sharing or replicating the meta information on memory caches
US8122284B2 (en) N+1 failover and resynchronization of data storage appliances
US20130110873A1 (en) Method and system for data storage and management
US10176184B2 (en) System and method for supporting persistent store versioning and integrity in a distributed data grid
CN103034739A (en) Distributed memory system and updating and querying method thereof
CN107423390B (en) Real-time data synchronization method based on OLTP-OLAP mixed relational database system
CN103150304A (en) Cloud database system
US10650024B2 (en) System and method of replicating data in a distributed system
Moiz et al. Database replication: A survey of open source and commercial tools
CN103593420A (en) Method for constructing heterogeneous database clusters on same platform by sharing online logs
US11003550B2 (en) Methods and systems of operating a database management system DBMS in a strong consistency mode
Qi Digital forensics and NoSQL databases
CN105956041A (en) Data model processing method based on Spring Data for MongoDB cluster
KR20130038517A (en) System and method for managing data using distributed containers
US10970177B2 (en) Methods and systems of managing consistency and availability tradeoffs in a real-time operational DBMS
Chen et al. A performance evaluation of distributed database architectures
CN105975546A (en) Novel computer supervision system
Azim et al. Offsite 2-Way Data Replication toward Improving Data Refresh Performance
CN109753245A (en) A kind of multiple disks load balancing asynchronous read and write dispatching method and device
Feuerlicht et al. Can relational DBMS scale up to the cloud?
Herrmann et al. Cinderella—Adaptive online partitioning of irregularly structured data
CN114385577A (en) Distributed file system
Faiz et al. Database replica management strategies in multidatabase systems with mobile hosts
KR101566884B1 (en) Distribution store system for managing unstructured data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151111