CN112860799A - Management method for data synchronization of distributed database - Google Patents
- Publication number
- CN112860799A (application CN202110195939.8A)
- Authority
- CN
- China
- Prior art keywords
- node
- nodes
- data synchronization
- connection
- distributed database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The invention discloses a management method for data synchronization of a distributed database, and belongs to the technical field of distributed database systems. The method is based on the Gossip protocol: all nodes in the cluster establish network connections and synchronize metadata over the Gossip protocol, and connections that degrade network performance are adjusted by periodically checking the nodes' network connections. The method can improve communication efficiency between nodes under a fully peer-to-peer architecture, helps preserve the high concurrency and high availability of the distributed database, and has good value for popularization and application.
Description
Technical Field
The invention relates to the technical field of distributed database systems, and particularly provides a management method for data synchronization of a distributed database.
Background
A distributed database system is usually a logically unified whole whose data is physically stored on different nodes; its main design goals are scalability, strong consistency and high reliability. The data is stored in different local databases, managed by different database management systems, run on different machines, supported by different operating systems, and connected by different communication networks. Shared nothing is a common distributed database architecture. Each node in this architecture is independent and self-sufficient, and the system as a whole has no single point of contention. Each node has its own CPU, memory, hard disk and so on; no resources are shared, the processing units communicate through a protocol, and parallel processing and scalability are better as a result. The nodes are independent of one another, each processes its own data, and the results can be aggregated at an upper layer or passed between nodes.
To achieve these characteristics, each node needs to know the operating information of the whole cluster, including the overall cluster configuration, node configuration, node health, storage usage, data describing where stored data is located, node network connections, and so on. This data is distinct from user data and is called metadata. To implement this, a current Shared nothing distributed database system usually maintains a key-value map (KV map) in the cluster to store this data and continuously synchronizes the latest metadata to directly connected nodes through the Gossip protocol. Specifically, when a node starts it initiates a network connection to a specified node in the cluster; the node that accepts the connection is called the server and the node that initiates it is called the client. Because too many network connections on one node hurt performance, the number of clients a server can accept is usually limited. When a new client initiates a connection to a server that has already reached its maximum number of connections, the server randomly selects one node from its list of currently connected clients to act as the new server, and the new client establishes its network connection with that node instead. Through the continuous interaction between each server and its clients, the metadata is eventually synchronized to all nodes in the cluster.
During node synchronization, when the cluster is large, the servers' network connections are saturated all the time and new clients keep being attached to older clients. As a result, data synchronization between the two nodes farthest apart in the cluster has to pass through many intermediate nodes, which delays synchronization and can even lose data when an intermediate node's network is poor. This problem undermines the high availability and high concurrency of the distributed database system.
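The connection behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Node` class, the `MAX_CLIENTS` value and the helper names are hypothetical, and the redirect is assumed to repeat until a node with a free slot is found, which the text does not state explicitly.

```python
import random

MAX_CLIENTS = 3  # hypothetical per-server connection limit


class Node:
    def __init__(self, name):
        self.name = name
        self.clients = []  # nodes that connected to this node as clients

    def accept(self, newcomer, rng=random):
        """Attach newcomer here, or redirect it to a random client while full."""
        server = self
        while len(server.clients) >= MAX_CLIENTS:
            # Saturated: a randomly chosen client becomes the new server.
            server = rng.choice(server.clients)
        server.clients.append(newcomer)
        return server


def depth(node):
    """Number of edges on the longest path from node down to a leaf."""
    if not node.clients:
        return 0
    return 1 + max(depth(child) for child in node.clients)


def build_cluster(size, seed=0):
    """Start `size` nodes one by one, each connecting to the first node."""
    rng = random.Random(seed)
    root = Node("n0")
    for i in range(1, size):
        root.accept(Node(f"n{i}"), rng)
    return root
```

With a limit of 3 clients, a tree of depth 2 can hold at most 13 nodes, so `depth(build_cluster(40))` is at least 3; the depth keeps growing as more nodes join.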
Disclosure of Invention
The technical task of the present invention is to provide a management method for data synchronization of a distributed database, which can improve the communication efficiency between nodes under a full peer-to-peer architecture and ensure the characteristics of high concurrency, high availability, etc. of the distributed database.
In order to achieve the purpose, the invention provides the following technical scheme:
A management method for data synchronization of a distributed database is based on the Gossip protocol: all nodes in the cluster establish network connections and synchronize metadata based on the Gossip protocol, and connections that affect network performance are adjusted by periodically checking the nodes' network connections.
Preferably, the management method for data synchronization of the distributed database specifically includes the following steps:
S1, calculating the maximum depth of the network topology of the cluster;
S2, judging whether the maximum depth exceeds the maximum value which can maintain the current situation, if yes, executing the step S3, otherwise, maintaining the current situation;
S3, calculating the node with the farthest hop count;
S4, calculating the node with the minimum contribution, and disconnecting the network connection;
S5, initiating a new connection, and returning to step S1.
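Steps S1 to S5 form a loop, which can be outlined as follows. This is only an illustrative sketch: the callback names (`measure_depth`, `find_farthest`, `drop_least`, `connect`) and the `max_rounds` bound are assumptions introduced here, not part of the described method.

```python
def adjust_topology(measure_depth, max_depth, find_farthest,
                    drop_least, connect, max_rounds=10):
    """Run the S1-S5 loop until the depth is acceptable; return rounds used.

    All callbacks are hypothetical placeholders for the real operations.
    """
    rounds = 0
    while rounds < max_rounds:
        if measure_depth() <= max_depth:  # S1 + S2: depth is acceptable
            break
        target = find_farthest()          # S3: node with the farthest hop count
        if target is None:
            break
        drop_least()                      # S4: free a slot if necessary
        connect(target)                   # S5: new direct connection
        rounds += 1
    return rounds
```

The `max_rounds` cap is a design choice of this sketch so that one invocation always terminates; the text itself simply returns to step S1.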
Preferably, in step S1, a hop count attribute is introduced for each piece of data stored in the KV map. The hop count of a piece of data is set to 0 when the data is generated; each time the data is synchronized to a node, the hop count is incremented by 1 and stored on that node, and the node then continues to synchronize the data to its directly connected nodes.
Preferably, each node in the cluster stores all the metadata, together with the hop distance from the node where each piece of metadata originated to this node.
Preferably, in step S3, a periodic timer is started in each node to handle the connection to that node's farthest node. When the timer expires, the node traverses its stored KV map to find the entry with the largest hop count; the node corresponding to that entry is the node farthest from this node.
Preferably, a direct network connection is initiated to the node corresponding to that entry, and once the connection is established the two nodes synchronize data directly.
Preferably, in step S4, the KV map is traversed and the entries belonging to the different client nodes are accumulated and sorted; the node with the smallest accumulated total is the node with the smallest contribution.
Preferably, when the periodic timer expires and the number of client connections has reached the upper limit, the node with the smallest contribution is disconnected.
In the management method for data synchronization of the distributed database, the various kinds of cluster metadata in a distributed database cluster are stored in a unified key-value map (KV map).
Compared with the prior art, the management method for data synchronization of the distributed database has the following outstanding beneficial effects: it periodically checks the nodes' network connections and adjusts connections that affect network performance. By periodically computing the node with the farthest hop count and initiating a direct connection to it, the method addresses network communication delay; by periodically disconnecting the node with the smallest contribution, it reduces resource usage. Together these reduce network delay in the distributed database system and improve the system's concurrency, and the method has good value for popularization and application.
Drawings
Fig. 1 is a flowchart of a management method for data synchronization of a distributed database according to the present invention.
Detailed Description
The management method for data synchronization of distributed databases according to the present invention will be described in further detail with reference to the accompanying drawings and embodiments.
Examples
The management method for the data synchronization of the distributed database is based on the Gossip protocol, all nodes in the cluster establish network connection and metadata synchronization based on the Gossip protocol, and the connection influencing the network performance is adjusted by periodically checking the network connection condition of the nodes.
As shown in fig. 1, the management method for data synchronization of a distributed database specifically includes the following steps:
S1, calculating the maximum depth of the network topology of the cluster.
A hop count attribute is introduced for each piece of data stored in the KV map. The hop count is set to 0 when the data is generated; each time the data is synchronized to a node, the hop count is incremented by 1 and stored on that node, and the node then continues to synchronize the data to its directly connected nodes. Each node in the cluster stores all the metadata, together with the hop distance from the node where each piece of metadata originated to this node.
When the cluster is large and nodes keep coming online, newly added nodes initiate network connections to nodes that are already stable, so it easily happens that every server has reached its upper limit of connected clients. The cluster's network topology then takes the form of a deep tree: metadata synchronized from the root node to the leaf nodes has to be forwarded several times by intermediate nodes, and the resulting network delay greatly affects the high availability of the cluster. With the hop count attribute described above, every node handles each piece of data in the same way, so each node in the cluster stores not only all the metadata but also the hop distance from the node where the metadata originated to itself.
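The hop count bookkeeping can be sketched as below. This is a minimal illustration, assuming each node keeps the first copy of an entry it receives (the text does not say how duplicates are resolved); the function name `propagate` and the adjacency-dict representation are hypothetical.

```python
from collections import deque


def propagate(adjacency, origin, key, value):
    """Spread one metadata entry from `origin` across the cluster.

    adjacency maps each node to its directly connected nodes.
    Returns one KV map per node: key -> (value, hops).
    """
    stores = {node: {} for node in adjacency}
    stores[origin][key] = (value, 0)  # hop count is 0 where the data is generated
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        _, hops = stores[sender][key]
        for peer in adjacency[sender]:
            if key not in stores[peer]:  # assumption: keep the first copy only
                stores[peer][key] = (value, hops + 1)
                queue.append(peer)
    return stores
```

For a chain `a - b - c - d`, an entry generated on `a` is stored on `d` with a hop count of 3.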
S2, judging whether the maximum depth exceeds the maximum value at which the current state can be maintained; if so, executing step S3, otherwise maintaining the current state.
S3, calculating the node with the farthest hop count.
A periodic timer is started in each node to handle the connection to that node's farthest node. When the timer expires, the node traverses its stored KV map to find the entry with the largest hop count; the node corresponding to that entry is the node farthest from this node. A direct network connection is initiated to that node, and once the connection is established the two nodes synchronize data directly.
The method needs to adjust and optimize the cluster's network topology periodically, and the period is configurable. When cluster network conditions are good, the period can be made longer to reduce the resource consumption caused by network changes; when cluster network delay is high, the period can be made shorter so that the topology is optimized continuously. Connecting directly to the farthest node in this way reduces the transmission delay between the node and its farthest node.
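The timer handler for the farthest node might look like the following sketch. The KV map layout (`key -> (origin_node, hops)`), the function names and the `connect` callback are assumptions made for illustration; the real timer and connection code are not shown in the text.

```python
def farthest_origin(kv_map):
    """Return the origin node of the entry with the largest hop count.

    kv_map maps key -> (origin_node, hops). Returns None when no entry
    is more than one hop away (nothing to shortcut).
    """
    best = max(kv_map.values(), key=lambda entry: entry[1], default=None)
    if best is None or best[1] <= 1:
        return None
    return best[0]


def on_timer(kv_map, direct_peers, connect):
    """Called when the periodic timer expires; connect is a hypothetical callback."""
    target = farthest_origin(kv_map)
    if target is not None and target not in direct_peers:
        connect(target)  # open a direct connection to the farthest node
        return target
    return None
```

In a real node this handler would be re-armed with the configured period, longer when the network is healthy and shorter when delay is high.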
S4, calculating the node with the smallest contribution and disconnecting its network connection.
The KV map is traversed and the entries belonging to the different client nodes are accumulated and sorted; the node with the smallest accumulated total is the node with the smallest contribution. When the periodic timer expires and the number of client connections has reached the upper limit, the node with the smallest contribution is disconnected.
To reduce the pressure on any single node, the number of clients per server is limited, and when the cluster is large the number of clients connected to each node acting as a server is generally saturated. In that case a direct connection to the farthest node may not succeed, so the node starts a periodic timer to regularly remove the least-contributing node among its connected clients and thereby free a connection for the farthest node. If the number of client connections has not reached the upper limit when the timer expires, no node is disconnected.
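One possible reading of the contribution rule is sketched below. It assumes each KV entry records the directly connected client that delivered it (`via`), and that a client's contribution is the number of entries it delivered; both the field and this interpretation are assumptions, since the text only says entries are accumulated per client and sorted.

```python
def least_contributing(kv_map, clients):
    """Return the client that delivered the fewest KV entries.

    kv_map maps key -> (value, via), where `via` is the client that
    delivered the entry (hypothetical field). clients is a list.
    """
    if not clients:
        return None
    counts = {client: 0 for client in clients}
    for _value, via in kv_map.values():
        if via in counts:
            counts[via] += 1
    return min(clients, key=lambda client: counts[client])


def on_timer(kv_map, clients, max_clients, disconnect):
    """Drop the least-contributing client only when the limit is reached."""
    if not clients or len(clients) < max_clients:
        return None  # below the upper limit: do nothing
    victim = least_contributing(kv_map, clients)
    disconnect(victim)  # hypothetical callback that closes the connection
    return victim
```

Ties are resolved in favour of the client listed first, which is an arbitrary choice of this sketch.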
S5, initiating a new connection and returning to step S1.
If the cluster is not large, the farthest node in the network topology may be only 3 to 5 hops away and the network pressure on the cluster is low, so the speed of data synchronization is not affected. Adjusting the network topology in this case would adversely affect normal operation of the cluster and could cause abnormalities. The number of clients each server can accept is a fixed value, and each node can compute the total number of nodes in the cluster from the synchronized cluster data; the depth of the network topology can be calculated from these two values and used as the basis for deciding whether topology adjustment is needed. If the depth is less than the maximum value at which the current state can be maintained, no adjustment is needed; if the depth is greater than that value, the network topology is optimized according to the scheme above. In this way the network topology is adjusted periodically when the cluster is large, and runs as stably as possible when the cluster is small.
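The text does not give the formula that derives the depth from the two values. One plausible estimate, assumed here, treats the topology as a completely filled tree in which every server holds the maximum number of clients; the function names and the threshold parameter are hypothetical.

```python
def estimated_depth(total_nodes, max_clients):
    """Smallest depth d such that 1 + m + m^2 + ... + m^d >= total_nodes.

    Assumes a fully saturated tree with fan-out m = max_clients; a real
    topology built by random redirects can be deeper than this estimate.
    """
    if total_nodes < 1 or max_clients < 1:
        raise ValueError("total_nodes and max_clients must be positive")
    depth, level_size, capacity = 0, 1, 1
    while capacity < total_nodes:
        level_size *= max_clients  # nodes that fit on the next level
        capacity += level_size
        depth += 1
    return depth


def needs_adjustment(total_nodes, max_clients, max_depth):
    """True when the estimated depth exceeds the configured threshold."""
    return estimated_depth(total_nodes, max_clients) > max_depth
```

For example, with 3 clients per server, 40 nodes give an estimated depth of 3 and 1000 nodes give 6, so a threshold of 5 would trigger adjustment only in the second case.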
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.
Claims (8)
1. A management method for data synchronization of a distributed database is characterized in that: the method is based on the Gossip protocol, all nodes in the cluster establish network connection and metadata synchronization based on the Gossip protocol, and the connection affecting the network performance is adjusted by periodically checking the network connection condition of the nodes.
2. The method for managing data synchronization of distributed databases according to claim 1, wherein: the method specifically comprises the following steps:
S1, calculating the maximum depth of the network topology of the cluster;
S2, judging whether the maximum depth exceeds the maximum value which can maintain the current situation, if yes, executing the step S3, otherwise, maintaining the current situation;
S3, calculating the node with the farthest hop count;
S4, calculating the node with the minimum contribution, and disconnecting the network connection;
S5, initiating a new connection, and returning to step S1.
3. The method for managing data synchronization of distributed databases according to claim 2, wherein: in step S1, a hop count attribute is introduced for each piece of data stored in the KV map; the hop count of each piece of data is set to 0 when it is generated and is incremented by 1 and stored on the node each time the data is synchronized to a node, and the data continues to be synchronized to directly connected nodes.
4. The method for managing data synchronization of distributed databases as claimed in claim 3, wherein: each node in the cluster stores all the metadata, and stores the hop distance from the node where each piece of metadata originated to this node.
5. The method for managing data synchronization of distributed databases as claimed in claim 4, wherein: in step S3, a periodic timer is started in each node to handle the connection to that node's farthest node, and when the timer expires, the KV map stored in the node is traversed to find the entry with the largest hop count, where the node corresponding to the entry is the node farthest from this node.
6. The method for managing data synchronization of distributed databases as claimed in claim 5, wherein: a direct network connection is initiated to the node corresponding to the entry, and the two nodes synchronize data directly after the connection is established.
7. The method for managing data synchronization of distributed databases as claimed in claim 6, wherein: in step S4, the KV map is traversed and the entries of the different client nodes are accumulated and sorted, where the node with the smallest accumulated total is the node with the smallest contribution.
8. The method for managing data synchronization of distributed databases as claimed in claim 7, wherein: when the periodic timer expires and the number of client connections has reached the upper limit, the node with the smallest contribution is disconnected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110195939.8A CN112860799A (en) | 2021-02-22 | 2021-02-22 | Management method for data synchronization of distributed database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112860799A (en) | 2021-05-28 |
Family
ID=75989825
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110195939.8A (Pending; CN112860799A (en)) | Management method for data synchronization of distributed database | 2021-02-22 | 2021-02-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112860799A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114143226A (en) * | 2021-12-06 | 2022-03-04 | 浪潮云信息技术股份公司 | Dynamic cost calibration method and system for network delay of distributed database |
CN114363357A (en) * | 2021-12-28 | 2022-04-15 | 山东浪潮科学研究院有限公司 | Distributed database network connection management method based on Gossip |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101488898A (en) * | 2009-03-04 | 2009-07-22 | 北京邮电大学 | Tree shaped fast connection establishing method based on multi-Agent cooperation |
CN103995901A (en) * | 2014-06-10 | 2014-08-20 | 北京京东尚科信息技术有限公司 | Method for determining data node failure |
CN107004024A (en) * | 2014-12-12 | 2017-08-01 | 微软技术许可有限责任公司 | The multi-user communication of context driving |
US20170308547A1 (en) * | 2016-04-25 | 2017-10-26 | Sap Se | Metadata synchronization in a distrubuted database |
CN111046065A (en) * | 2019-10-28 | 2020-04-21 | 北京大学 | Extensible high-performance distributed query processing method and device |
CN111352943A (en) * | 2018-12-24 | 2020-06-30 | 华为技术有限公司 | Method and device for realizing data consistency, server and terminal |
CN112039884A (en) * | 2020-08-31 | 2020-12-04 | 浪潮云信息技术股份公司 | Application of quick interconnection protocol QUIC in distributed database system |
CN112328685A (en) * | 2020-11-05 | 2021-02-05 | 浪潮云信息技术股份公司 | Full-peer distributed database data synchronization method |
- 2021-02-22: CN application CN202110195939.8A (patent/CN112860799A/en), status Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114143226A (en) * | 2021-12-06 | 2022-03-04 | 浪潮云信息技术股份公司 | Dynamic cost calibration method and system for network delay of distributed database |
CN114143226B (en) * | 2021-12-06 | 2024-01-19 | 上海沄熹科技有限公司 | Dynamic cost calibration method and system for distributed database network delay |
CN114363357A (en) * | 2021-12-28 | 2022-04-15 | 山东浪潮科学研究院有限公司 | Distributed database network connection management method based on Gossip |
CN114363357B (en) * | 2021-12-28 | 2024-01-19 | 上海沄熹科技有限公司 | Distributed database network connection management method based on Gossip |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4652435B2 (en) | Optimal operation of hierarchical peer-to-peer networks | |
US7457257B2 (en) | Apparatus, system, and method for reliable, fast, and scalable multicast message delivery in service overlay networks | |
US9300534B2 (en) | Method for optimally utilizing a peer to peer network | |
CN109324757B (en) | Block chain data capacity reduction method and device and storage medium | |
US11018980B2 (en) | Data-interoperability-oriented trusted processing method and system | |
CN101102250B (en) | Distributed hashing mechanism for self-organizing networks | |
CN111046065B (en) | Extensible high-performance distributed query processing method and device | |
WO2010069198A1 (en) | Distributed network construction method and system and job processing method | |
WO2008034353A1 (en) | A method, system and device for establishing a peer to peer connection in a p2p network | |
WO2010127618A1 (en) | System and method for implementing streaming media content service | |
US7773609B2 (en) | Overlay network system which constructs and maintains an overlay network | |
CN112860799A (en) | Management method for data synchronization of distributed database | |
WO2011069387A1 (en) | Network node, method for data query and method for index update thereof | |
KR20100060304A (en) | Distributed content delivery system based on network awareness and method thereof | |
CN110866046A (en) | Extensible distributed query method and device | |
EP1719325A1 (en) | Method for optimally utilizing a peer to peer network | |
CN110990448B (en) | Distributed query method and device supporting fault tolerance | |
CN112328685A (en) | Full-peer distributed database data synchronization method | |
CN111800516B (en) | Internet of things equipment management method and device based on P2P | |
Aberer et al. | The quest for balancing peer load in structured peer-to-peer systems | |
WO2023124743A1 (en) | Block synchronization | |
CN101605094B (en) | Ring model based on point-to-point network and routing algorithm thereof | |
JP2008140388A (en) | Superpeer having load balancing function in hierarchical peer-to-peer system, and method for operating superpeer | |
CN106657333B (en) | Centralized directory data exchange system and method based on cloud service mode | |
Al Ridhawi et al. | A dynamic hybrid service overlay network for service compositions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210528 |