CN112860799A - Management method for data synchronization of distributed database - Google Patents

Publication number
CN112860799A
Authority
CN
China
Prior art keywords
node
nodes
data synchronization
connection
distributed database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110195939.8A
Other languages
Chinese (zh)
Inventor
任宏晖
王瀚墨
周恒�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202110195939.8A
Publication of CN112860799A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention discloses a management method for data synchronization of a distributed database and belongs to the technical field of distributed database systems. The method is based on the Gossip protocol: all nodes in the cluster establish network connections and synchronize metadata over the Gossip protocol, and connections that degrade network performance are adjusted by periodically checking each node's network connections. The method can improve communication efficiency between nodes under a fully peer-to-peer architecture, helps preserve characteristics of the distributed database such as high concurrency and high availability, and has good value for popularization and application.

Description

Management method for data synchronization of distributed database
Technical Field
The invention relates to the technical field of distributed database systems, and particularly provides a management method for data synchronization of a distributed database.
Background
A distributed database system is usually a logically unified whole whose data is physically stored on different nodes; its main design goals are scalability, strong consistency and high reliability. The data is stored in different local databases, managed by different database management systems, run on different machines, supported by different operating systems, and connected by different communication networks. The shared-nothing architecture is a common distributed database architecture. Each node in this architecture is independent and self-sufficient, and the system as a whole has no single point of contention. Each node has its own CPU, memory, hard disk and so on; no resources are shared, all processing units communicate through a protocol, and parallel processing and scalability are better as a result. The nodes are independent of one another, each processes its own data, and the results can be aggregated at an upper layer or passed between nodes.
To achieve these characteristics, each node needs to know the operating information of the whole cluster, including the overall cluster configuration, node configuration, node health, storage usage, information describing where data is stored, node network connection status, and so on. This information is distinct from user data and is called metadata. To provide it, current shared-nothing distributed database systems usually maintain a key-value map (KV map) in the cluster to store this data and continuously synchronize the latest metadata to directly connected nodes through the Gossip protocol. Specifically, when a node starts, it initiates a network connection to a specified node in the cluster; the node that accepts the connection is called the server and the node that initiates it is called the client. Because too many network connections on one node hurt performance, the number of clients a server can accept is usually limited. When a server has reached its maximum number of connections and a new client tries to connect, the server randomly selects one node from its list of connected clients to act as the new server, and the new client establishes its network connection with that node instead. Through continuous interaction between each server and its clients, the metadata is eventually synchronized to all nodes in the cluster.
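The connection rule described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patent's implementation: `Node`, `MAX_CLIENTS`, `join` and `depth` are hypothetical names, and the client limit of 2 is chosen purely to keep the example small.

```python
import random

MAX_CLIENTS = 2  # assumed per-server client limit (illustrative value only)


class Node:
    """A cluster node that can serve a limited number of clients."""

    def __init__(self, name):
        self.name = name
        self.clients = []  # nodes that connected to this node as clients


def join(server, newcomer, rng=random):
    """Connect newcomer to server, redirecting to a random client while full.

    Returns the number of redirects that were needed.
    """
    redirects = 0
    while len(server.clients) >= MAX_CLIENTS:
        # Saturated server: hand the newcomer to one of its current clients.
        server = rng.choice(server.clients)
        redirects += 1
    server.clients.append(newcomer)
    return redirects


def depth(node):
    """Depth of the connection tree below node (a node without clients has depth 0)."""
    return 0 if not node.clients else 1 + max(depth(c) for c in node.clients)


root = Node("n0")
rng = random.Random(0)  # fixed seed so the run is repeatable
for i in range(1, 16):
    join(root, Node(f"n{i}"), rng)
```

With 16 nodes and at most 2 clients per server, the resulting tree cannot have a depth below 4, whatever the random choices are, which shows how redirects push new nodes further from the root as the cluster grows.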
During node synchronization in a large cluster, the servers' connections are always saturated and new clients keep being redirected to older clients. As a result, data synchronized between the two nodes farthest apart in the cluster must pass through many intermediate nodes, which delays synchronization and can even lose data when an intermediate node's network is poor. This problem undermines the high availability and high concurrency of the distributed database system.
Disclosure of Invention
The technical task of the present invention is to provide a management method for data synchronization of a distributed database that improves communication efficiency between nodes under a fully peer-to-peer architecture and helps preserve characteristics of the distributed database such as high concurrency and high availability.
To achieve this purpose, the invention provides the following technical solution:
A management method for data synchronization of a distributed database is based on the Gossip protocol: all nodes in the cluster establish network connections and synchronize metadata over the Gossip protocol, and connections that degrade network performance are adjusted by periodically checking each node's network connections.
Preferably, the management method for data synchronization of the distributed database specifically includes the following steps:
S1. Calculate the maximum depth of the cluster's network topology;
S2. Determine whether the maximum depth exceeds the threshold up to which the status quo can be maintained; if so, execute step S3, otherwise keep the current topology;
S3. Find the node with the farthest hop count;
S4. Find the node with the least contribution and disconnect its network connection;
S5. Initiate a new connection and return to step S1.
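The control flow of steps S1 to S5 can be sketched as a single adjustment round. This is a hedged illustration only: the `state` dictionary and its keys (`depth`, `hops`, `contribution`, `clients`) are hypothetical, and the depth is assumed to have been computed already rather than derived here.

```python
def adjustment_round(state, max_depth, max_clients):
    """Run steps S1 to S5 once and return the list of actions taken.

    state is a dict with hypothetical keys:
      "depth"        -> current maximum depth of the topology (result of S1)
      "hops"         -> {node name: hop count from this node}
      "contribution" -> {client name: number of KV map entries credited to it}
      "clients"      -> set of currently connected client names
    """
    actions = []
    # S2: within the threshold, keep the current topology.
    if state["depth"] <= max_depth:
        return actions
    # S3: the node with the largest hop count is the farthest node.
    farthest = max(state["hops"], key=state["hops"].get)
    # S4: if the client limit is reached, drop the least-contributing client.
    if len(state["clients"]) >= max_clients:
        victim = min(state["clients"],
                     key=lambda c: state["contribution"].get(c, 0))
        state["clients"].discard(victim)
        actions.append(("disconnect", victim))
    # S5: connect directly to the farthest node.
    state["clients"].add(farthest)
    actions.append(("connect", farthest))
    return actions


state = {
    "depth": 6,
    "hops": {"n9": 6, "n4": 2},
    "contribution": {"n2": 5, "n3": 1},
    "clients": {"n2", "n3"},
}
actions = adjustment_round(state, max_depth=4, max_clients=2)
```

A caller would repeat this round on every timer expiry, which corresponds to returning to step S1.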
Preferably, in step S1, a hop count attribute is added to each entry stored in the KV map. The hop count of an entry is set to 0 when the entry is generated; each time the entry is synchronized to a node, the hop count is incremented by 1 and stored on that node, and the node then continues to synchronize the entry to its directly connected nodes.
Preferably, each node in the cluster stores all the metadata, together with the hop distance from the node where each piece of metadata originated to the node itself.
Preferably, in step S3, a periodic timer is started on each node to handle the connection to that node's farthest node. When the timer expires, the node traverses its stored KV map to find the entry with the largest hop count; the node corresponding to that entry is the node farthest from it.
Preferably, a direct network connection is initiated to the node corresponding to that entry, and once the connection is established the two nodes synchronize data directly.
Preferably, in step S4, the KV map is traversed, the entries belonging to each client node are accumulated and the totals are sorted; the node with the smallest total is the node with the least contribution.
Preferably, when the periodic timer expires and the number of client connections has reached the upper limit, the node with the least contribution is disconnected.
In this management method, the various kinds of cluster metadata in a distributed database cluster are stored in a single unified key-value map (KV map).
Compared with the prior art, the management method for data synchronization of the distributed database has the following notable beneficial effects: it periodically checks each node's network connections and adjusts the connections that degrade network performance. By periodically finding the node with the farthest hop count and initiating a direct connection to it, the method addresses network communication delay; by periodically disconnecting the node with the least contribution, it reduces resource usage. Together these measures reduce network delay in the distributed database system and improve the system's concurrency, giving the method good value for popularization and application.
Drawings
Fig. 1 is a flowchart of a management method for data synchronization of a distributed database according to the present invention.
Detailed Description
The management method for data synchronization of distributed databases according to the present invention will be described in further detail with reference to the accompanying drawings and embodiments.
Examples
The management method for data synchronization of the distributed database is based on the Gossip protocol: all nodes in the cluster establish network connections and synchronize metadata over the Gossip protocol, and connections that degrade network performance are adjusted by periodically checking each node's network connections.
As shown in Fig. 1, the management method for data synchronization of a distributed database specifically includes the following steps:
S1. Calculate the maximum depth of the cluster's network topology.
A hop count attribute is added to each entry stored in the KV map. The hop count of an entry is set to 0 when the entry is generated; each time the entry is synchronized to a node, the hop count is incremented by 1 and stored on that node, and the node then continues to synchronize the entry to its directly connected nodes. Each node in the cluster stores all the metadata, together with the hop distance from the node where each piece of metadata originated to the node itself.
When the cluster is large, nodes keep coming online and each newly added node initiates a network connection to nodes that are already stable, so it easily happens that the clients connected to every server reach the upper limit. The cluster's network topology then takes the form of a deep tree: metadata synchronized from the root node to the leaf nodes must be forwarded several times by intermediate nodes, and the resulting network delay seriously affects the cluster's high availability. For this reason a hop count attribute is added to each entry stored in the KV map. The hop count is set to 0 when the entry is generated; each time the entry is synchronized to a node, the hop count is incremented by 1 and stored on that node, and that node then continues to synchronize the entry to the other nodes directly connected to it. Every node processes entries in the same way, so each node in the cluster stores not only all the metadata but also the hop distance from the node where each piece of metadata originated to itself.
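The hop count bookkeeping can be sketched as follows. This is a minimal illustration, not the patent's implementation: `propagate` and the `(value, origin, hops)` entry layout are hypothetical, the breadth-first order stands in for synchronous gossip rounds, and a node is assumed to keep the first copy of an entry it receives.

```python
from collections import deque


def propagate(adjacency, origin, key, value):
    """Spread one metadata entry from origin over direct connections.

    adjacency maps a node name to the names of its directly connected nodes.
    Returns {node: {key: (value, origin, hops)}}, i.e. one KV map per node.
    """
    kv_maps = {node: {} for node in adjacency}
    # The hop count is 0 on the node where the entry is generated.
    kv_maps[origin][key] = (value, origin, 0)
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        _, _, hops = kv_maps[sender][key]
        for receiver in adjacency[sender]:
            if key in kv_maps[receiver]:
                continue  # assumed rule: keep the first copy received
            # Each synchronization to a node adds 1 to the hop count.
            kv_maps[receiver][key] = (value, origin, hops + 1)
            queue.append(receiver)
    return kv_maps


# A chain a - b - c - d: the entry created on "a" reaches "d" after 3 hops.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
maps = propagate(chain, "a", "health:a", "ok")
```

After the run, every node holds the entry together with its distance from `a`, which is the per-node information the later steps rely on.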
S2. Determine whether the maximum depth exceeds the threshold up to which the status quo can be maintained; if so, execute step S3, otherwise keep the current topology.
S3. Find the node with the farthest hop count.
A periodic timer is started on each node to handle the connection to that node's farthest node. When the timer expires, the node traverses its stored KV map to find the entry with the largest hop count; the node corresponding to that entry is the node farthest from it. A direct network connection is initiated to the node corresponding to that entry, and once the connection is established the two nodes synchronize data directly.
The method needs to adjust and optimize the cluster's network topology periodically, and the period is configurable. When cluster network conditions are good, the period can be set longer to reduce the resource consumption caused by network changes; when cluster network delay is high, the period can be set shorter so that the topology is optimized continuously. A periodic timer is started on each node to handle the connection to that node's farthest node. When the timer expires, the node traverses its stored KV map to find the entry with the largest hop count; the node corresponding to that entry is the node farthest from it. The node initiates a direct network connection to that node, and once the connection is established the two nodes can synchronize data directly, which reduces the transmission delay between the node and its farthest node.
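The traversal performed when the timer expires can be sketched as below. The timer itself is omitted, and the function name `farthest_node` and the `(value, origin, hops)` entry layout are assumptions made for illustration rather than details from the source.

```python
def farthest_node(kv_map, self_name):
    """Return (origin, hops) for the entry with the largest hop count.

    kv_map maps a key to (value, origin, hops). Entries created by this node
    itself are skipped. Returns None when no remote entry exists.
    """
    remote = [(origin, hops)
              for _, origin, hops in kv_map.values()
              if origin != self_name]
    if not remote:
        return None
    return max(remote, key=lambda item: item[1])


kv_map = {
    "health:n1": ("ok", "n1", 0),   # this node's own entry
    "health:n2": ("ok", "n2", 1),
    "health:n5": ("ok", "n5", 2),
    "health:n7": ("ok", "n7", 4),
    "disk:n7": ("71%", "n7", 4),
}
target = farthest_node(kv_map, "n1")  # ("n7", 4)
```

The node would then open a direct connection to the returned origin; how ties between equally distant nodes should be broken is not specified in the source, so this sketch simply takes the first one found.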
S4. Find the node with the least contribution and disconnect its network connection.
The KV map is traversed, the entries belonging to each client node are accumulated and the totals are sorted; the node with the smallest total is the node with the least contribution. When the periodic timer expires and the number of client connections has reached the upper limit, the node with the least contribution is disconnected.
To reduce the pressure on any single node, the number of clients per server is limited, and in a large cluster the number of clients connected to each node acting as a server is generally saturated. In that case a direct connection to the farthest node may not succeed, so the node needs to start a periodic timer and periodically remove the least-contributing of its connected clients to free a connection for the farthest node. Least contribution is judged by traversing the KV map, accumulating the entries belonging to each client node and sorting the totals; the node with the smallest total is the node with the least contribution. When the timer expires, the node with the least contribution is disconnected if the number of client connections has reached the upper limit; if the limit has not been reached, nothing is done.
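The eviction rule can be sketched as follows. The source does not say how an entry is attributed to a client, so this sketch assumes that each KV map entry carries a hypothetical `via` field naming the directly connected client that delivered it; the function names are likewise illustrative.

```python
def least_contributing_client(kv_map, clients):
    """Return the connected client credited with the fewest KV map entries.

    kv_map maps a key to a dict with a "via" field (assumed attribution rule).
    """
    counts = {client: 0 for client in clients}  # clients without entries count as 0
    for entry in kv_map.values():
        if entry["via"] in counts:
            counts[entry["via"]] += 1
    # Smallest total wins; ties are broken by name to keep the result stable.
    return min(counts, key=lambda client: (counts[client], client))


def on_timer(kv_map, clients, max_clients):
    """Timer callback: disconnect a client only when the limit is reached."""
    if len(clients) < max_clients:
        return None  # below the limit, so nothing is done
    victim = least_contributing_client(kv_map, clients)
    clients.remove(victim)
    return victim


kv_map = {
    "k1": {"via": "c1"}, "k2": {"via": "c1"},
    "k3": {"via": "c2"},
    "k4": {"via": "c3"}, "k5": {"via": "c3"},
}
clients = ["c1", "c2", "c3"]
evicted = on_timer(kv_map, clients, max_clients=3)  # "c2"
```

Once a slot has been freed, a second timer expiry with the same limit leaves the remaining clients untouched, matching the rule that nothing happens below the upper limit.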
S5. Initiate a new connection and return to step S1.
If the cluster is not large, the farthest node in the network topology may be only 3 to 5 hops away, and the network pressure on the cluster is low, so the speed of data synchronization is not affected. Adjusting the network topology in this case would instead disturb normal cluster operation and could cause anomalies. The number of clients that can connect to each server is a fixed value, and each node can work out the total number of nodes in the cluster from the synchronized cluster data. The depth of the network topology can be calculated from these two values and used as the basis for deciding whether the topology needs adjusting. If the depth is less than the threshold up to which the status quo can be maintained, no adjustment is needed; if it is greater, the network topology is optimized according to this scheme. In this way the topology is adjusted periodically when the cluster is large and is left to run as stably as possible when the cluster is small.
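The source states that the depth can be calculated from the client limit and the total node count but gives no formula. One possible reading, offered only as an assumption, is the depth of a fully saturated tree in which every server has exactly the maximum number of clients; this is a lower bound, and a real topology may be deeper. The names `estimated_depth` and `needs_adjustment` are hypothetical.

```python
def estimated_depth(total_nodes, max_clients):
    """Smallest depth d at which a tree with fan-out max_clients holds total_nodes.

    Assumes a fully saturated, balanced tree: level i holds max_clients**i nodes.
    """
    if total_nodes < 1 or max_clients < 1:
        raise ValueError("total_nodes and max_clients must be positive")
    depth, capacity, level_size = 0, 1, 1
    while capacity < total_nodes:
        level_size *= max_clients   # nodes on the next level
        capacity += level_size      # nodes the tree can hold up to this level
        depth += 1
    return depth


def needs_adjustment(total_nodes, max_clients, max_depth):
    """True when the estimated depth exceeds the threshold for keeping the status quo."""
    return estimated_depth(total_nodes, max_clients) > max_depth


small = needs_adjustment(total_nodes=15, max_clients=2, max_depth=3)   # depth 3
large = needs_adjustment(total_nodes=16, max_clients=2, max_depth=3)   # depth 4
```

Under this reading, adding a single node to a full tree is enough to push the estimate past the threshold, so a deployment would probably choose the threshold with some margin.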
The embodiments described above are merely preferred embodiments of the present invention; ordinary changes and substitutions made by those skilled in the art within the technical scope of the invention fall within its scope of protection.

Claims (8)

1. A management method for data synchronization of a distributed database, characterized in that: the method is based on the Gossip protocol; all nodes in the cluster establish network connections and synchronize metadata over the Gossip protocol; and connections that degrade network performance are adjusted by periodically checking each node's network connections.
2. The management method for data synchronization of a distributed database according to claim 1, characterized in that the method specifically comprises the following steps:
S1. calculating the maximum depth of the cluster's network topology;
S2. determining whether the maximum depth exceeds the threshold up to which the status quo can be maintained; if so, executing step S3, otherwise keeping the current topology;
S3. finding the node with the farthest hop count;
S4. finding the node with the least contribution and disconnecting its network connection;
S5. initiating a new connection and returning to step S1.
3. The management method for data synchronization of a distributed database according to claim 2, characterized in that: in step S1, a hop count attribute is added to each entry stored in the KV map; the hop count of an entry is set to 0 when the entry is generated; each time the entry is synchronized to a node, the hop count is incremented by 1 and stored on that node; and the node continues to synchronize the entry to its directly connected nodes.
4. The management method for data synchronization of a distributed database according to claim 3, characterized in that: each node in the cluster stores all the metadata, together with the hop distance from the node where each piece of metadata originated to the node itself.
5. The management method for data synchronization of a distributed database according to claim 4, characterized in that: in step S3, a periodic timer is started on each node to handle the connection to that node's farthest node; when the timer expires, the node traverses its stored KV map to find the entry with the largest hop count, and the node corresponding to that entry is the node farthest from it.
6. The management method for data synchronization of a distributed database according to claim 5, characterized in that: a direct network connection is initiated to the node corresponding to the entry, and once the connection is established the two nodes synchronize data directly.
7. The management method for data synchronization of a distributed database according to claim 6, characterized in that: in step S4, the KV map is traversed, the entries belonging to each client node are accumulated and the totals are sorted, and the node with the smallest total is the node with the least contribution.
8. The management method for data synchronization of a distributed database according to claim 7, characterized in that: when the periodic timer expires and the number of client connections has reached the upper limit, the node with the least contribution is disconnected.
CN202110195939.8A 2021-02-22 2021-02-22 Management method for data synchronization of distributed database Pending CN112860799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110195939.8A CN112860799A (en) 2021-02-22 2021-02-22 Management method for data synchronization of distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110195939.8A CN112860799A (en) 2021-02-22 2021-02-22 Management method for data synchronization of distributed database

Publications (1)

Publication Number Publication Date
CN112860799A true CN112860799A (en) 2021-05-28

Family

ID=75989825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110195939.8A Pending CN112860799A (en) 2021-02-22 2021-02-22 Management method for data synchronization of distributed database

Country Status (1)

Country Link
CN (1) CN112860799A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114143226A (en) * 2021-12-06 2022-03-04 浪潮云信息技术股份公司 Dynamic cost calibration method and system for network delay of distributed database
CN114363357A (en) * 2021-12-28 2022-04-15 山东浪潮科学研究院有限公司 Distributed database network connection management method based on Gossip

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101488898A (en) * 2009-03-04 2009-07-22 北京邮电大学 Tree shaped fast connection establishing method based on multi-Agent cooperation
CN103995901A (en) * 2014-06-10 2014-08-20 北京京东尚科信息技术有限公司 Method for determining data node failure
CN107004024A (en) * 2014-12-12 2017-08-01 微软技术许可有限责任公司 The multi-user communication of context driving
US20170308547A1 (en) * 2016-04-25 2017-10-26 Sap Se Metadata synchronization in a distrubuted database
CN111046065A (en) * 2019-10-28 2020-04-21 北京大学 Extensible high-performance distributed query processing method and device
CN111352943A (en) * 2018-12-24 2020-06-30 华为技术有限公司 Method and device for realizing data consistency, server and terminal
CN112039884A (en) * 2020-08-31 2020-12-04 浪潮云信息技术股份公司 Application of quick interconnection protocol QUIC in distributed database system
CN112328685A (en) * 2020-11-05 2021-02-05 浪潮云信息技术股份公司 Full-peer distributed database data synchronization method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114143226A (en) * 2021-12-06 2022-03-04 浪潮云信息技术股份公司 Dynamic cost calibration method and system for network delay of distributed database
CN114143226B (en) * 2021-12-06 2024-01-19 上海沄熹科技有限公司 Dynamic cost calibration method and system for distributed database network delay
CN114363357A (en) * 2021-12-28 2022-04-15 山东浪潮科学研究院有限公司 Distributed database network connection management method based on Gossip
CN114363357B (en) * 2021-12-28 2024-01-19 上海沄熹科技有限公司 Distributed database network connection management method based on Gossip

Similar Documents

Publication Publication Date Title
JP4652435B2 (en) Optimal operation of hierarchical peer-to-peer networks
US7457257B2 (en) Apparatus, system, and method for reliable, fast, and scalable multicast message delivery in service overlay networks
US9300534B2 (en) Method for optimally utilizing a peer to peer network
CN109324757B (en) Block chain data capacity reduction method and device and storage medium
US11018980B2 (en) Data-interoperability-oriented trusted processing method and system
CN101102250B (en) Distributed hashing mechanism for self-organizing networks
CN111046065B (en) Extensible high-performance distributed query processing method and device
WO2010069198A1 (en) Distributed network construction method and system and job processing method
WO2008034353A1 (en) A method, system and device for establishing a peer to peer connection in a p2p network
WO2010127618A1 (en) System and method for implementing streaming media content service
US7773609B2 (en) Overlay network system which constructs and maintains an overlay network
CN112860799A (en) Management method for data synchronization of distributed database
WO2011069387A1 (en) Network node, method for data query and method for index update thereof
KR20100060304A (en) Distributed content delivery system based on network awareness and method thereof
CN110866046A (en) Extensible distributed query method and device
EP1719325A1 (en) Method for optimally utilizing a peer to peer network
CN110990448B (en) Distributed query method and device supporting fault tolerance
CN112328685A (en) Full-peer distributed database data synchronization method
CN111800516B (en) Internet of things equipment management method and device based on P2P
Aberer et al. The quest for balancing peer load in structured peer-to-peer systems
WO2023124743A1 (en) Block synchronization
CN101605094B (en) Ring model based on point-to-point network and routing algorithm thereof
JP2008140388A (en) Superpeer having load balancing function in hierarchical peer-to-peer system, and method for operating superpeer
CN106657333B (en) Centralized directory data exchange system and method based on cloud service mode
Al Ridhawi et al. A dynamic hybrid service overlay network for service compositions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210528