CN112346908A

CN112346908A - Data backup method in peer-to-peer distributed system

Info

Publication number: CN112346908A
Application number: CN201910736515.0A
Authority: CN
Inventors: 许长桥; 杨树杰; 郝昊; 皮文超; 赵楠; 熊永平
Original assignee: Nanjing Functional Intelligent Technology Research Institute Co ltd
Current assignee: Nanjing Functional Intelligent Technology Research Institute Co ltd
Priority date: 2019-08-09
Filing date: 2019-08-09
Publication date: 2021-02-09

Abstract

The invention provides a data backup method for a peer-to-peer distributed system that improves data redundancy and keeps data reliable in harsh environments. A user access node scans the nodes in the system to obtain an online node list. The user access node requests the file storage status of each node, thereby acquiring the current global file storage status, and constructs a global storage list. From the file storage status of each node and the global storage list, the user access node constructs a global data storage capacity table. The nodes are sorted according to a priority rule set by the user to build a candidate node pool. The number of backups is calculated from the backup ratio specified by the user, and that number of candidate nodes is taken in order from the candidate node pool as the backup nodes. Finally, the user data uploading node sends a data backup instruction and the data to be backed up to the selected candidate nodes, which receive and store the data to complete the backup.

Description

Data backup method in peer-to-peer distributed system

Technical Field

The invention relates to the field of computer data storage, in particular to a data backup method in a peer-to-peer distributed system.

Background

With the rapid development of computer and information technology, data storage technology is applied ever more widely, and the demand for data storage in different environments grows daily. Most current mainstream data storage schemes rely on a centralized system. In a centralized system, one or more host computers form a central node; data is stored on the central node, all service units of the system are deployed there, and all system functions are processed there. Each terminal or client machine is responsible only for data entry and output, while data storage and control are handled entirely by the central node. The main characteristic of a centralized system is its simple deployment structure: it is usually built on large, high-performance nodes, so there is no need to consider how to deploy a service across multiple nodes or how nodes cooperate in a distributed fashion. However, in scenarios that need data storage, such as military operations, emergency disaster relief, field exploration, and resource exploration and collection, a centralized system adapts poorly to the requirements of rapid deployment, poor infrastructure, high mobility, random access locations, and a high risk of equipment destruction.

A distributed system, as opposed to a centralized system, is a software system in which hardware or software components are distributed across different networked computers and communicate and coordinate with each other solely through message passing. Distributed systems allow a large number of applications to access data stored in local or remote databases; in this case, data distribution is achieved through replication. A standard distributed system is distributed, peer-based, and concurrent, without any specific business logic constraints. These characteristics make a distributed system suitable for scenarios where centralized systems are not, such as military operations, emergency disaster relief, field exploration, and resource exploration and collection.

Distributed systems are divided into centrally managed distributed systems, whose nodes are not peers, and peer-to-peer distributed systems, whose nodes are peers. A centrally managed distributed system needs one or more core nodes to control the whole system globally, so the access interfaces it exposes are necessarily limited. Because core nodes make up only a small fraction of all nodes, the stability of a system that depends on them for centralized management is strongly affected by the stability of those few core nodes. In a peer-to-peer distributed system there is no dependence on core nodes, and all nodes are topologically equal and functionally identical. The overall stability of the system therefore does not depend on the stability of key nodes and is not limited by a small number of specific nodes.

With the advent of the big data application era, data storage backup becomes more and more important, and a distributed system also needs to perform data backup to ensure the redundancy and the availability of data. The data backup method of the distributed system needs to have the characteristics of safety, reliability, simplicity, convenience and the like. The backup content is guaranteed to be complete and effective. The backup and restore do not require complicated manual operations. The data storage and backup system is oriented to an application program database, a service system and a core server, and realizes the functions of data storage, data backup and recovery, system backup and recovery, application program backup and recovery and the like.

In summary, to implement data backup in a peer-to-peer distributed system, a data backup method based on the peer-to-peer distributed system needs to be designed. In this method, the user's data uploading node scans the node list of the whole system to obtain the online nodes and requests their file storage status. The data uploading node then constructs a global data storage capacity table from that status and sorts the nodes in the table according to the priority set by the user to build a candidate node pool. Next, it calculates the number of backup nodes from the backup ratio specified by the user and selects the backup nodes in order from the candidate node pool. Finally, the data uploading node sends the data to be backed up to the backup nodes, which receive it to complete the backup. The redundancy and availability of the data are thereby ensured.

Disclosure of Invention

In view of this, the present invention provides a data backup method in a peer-to-peer distributed system, where the method includes:

a user access node scans a node list in a system to obtain an online node list;

the user access node requests the file storage status of each node in the network, completes the acquisition of the current global file storage status and constructs a global storage list;

the user access node completes the construction of the global data storage capacity table according to the file storage condition of each node and the global storage list;

sequencing the nodes according to a priority rule set by a user to realize the construction of a candidate node pool;

calculating the backup quantity of data according to the backup proportion specified by a user, and sequentially acquiring candidate nodes from the candidate node pool according to the quantity to realize the selection of the backup nodes;

and the user data uploading node sends a data backup instruction and the data to be backed up to the candidate node, and the candidate node receives and stores the data to complete data backup.

The method for acquiring the online node list comprises the following steps:

defining a user access node as AN, a node list in a distributed system as LN, an online node list as LON, a node as Ni and an online node as ONj; F(Ni) -> ONj is defined as the direct mapping relation between the nodes and the online nodes.

The user access node AN sends handshake data packets to each node Ni in turn according to a pre-stored node list LN, determines the online nodes ONj from the responses, and obtains the online node list LON according to F(Ni) -> ONj, where LON is a subset of LN.

The method for acquiring the storage condition of the global file comprises the following steps:

the file storage information of each online node ONj is defined as FONj, the file storage information list, i.e., the global storage list, of all online nodes ONj is defined as FLON, and an algorithm for acquiring the global storage list FLON through the node storage information FON is defined.

The user access node AN sends a file storage information request packet to each node ONj in the online node list LON in turn, and each node that receives the request packet sends its file storage information FONj back to the requester. Once the user access node has requested the file storage information of all online nodes, the global storage list is constructed according to the algorithm.

The construction method of the global storage capability table comprises the following steps:

defining a global storage capability table as SLON and the storage space of each online node as SONj; an algorithm for acquiring the global storage capability table SLON from the global storage list FLON is defined.

After the construction of FLON is completed, the global storage capability table SLON is obtained from the global storage list FLON according to the algorithm.

The construction method of the candidate node pool comprises the following steps:

the candidate node pool is defined as CNP, and the user-defined node priority rule is R.

The nodes are sorted according to the node priority rule R set by the user to construct the candidate node pool CNP.

The backup node selection comprises the following steps:

defining the backup ratio as η, the number of nodes in the candidate node pool as SCNP, and the candidate node list as CNL.

The number of candidate nodes is calculated from the backup ratio, and that number of nodes is selected in order from the candidate node pool to construct the candidate node list CNL.

The method for sending the backup data comprises the following steps:

the user access node AN sends a cache command packet and a cache data packet in sequence according to the candidate node list CNL to realize data backup.

The invention has the following technical effects: the user's data uploading node scans the system to obtain the online node list, requests the file storage status of the online nodes, constructs the global data storage capacity table, builds the candidate node pool, calculates the number of backup nodes, selects the backup nodes, and sends the data to be backed up. These steps complete the backup of data in the peer-to-peer distributed system, ensure the redundancy and availability of that data, and allow the system to meet the requirements of different application scenarios.

Drawings

Fig. 1 is a flowchart of a data backup method in a peer-to-peer distributed system according to the present invention.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described in detail below with reference to the accompanying drawings.

The method backs up data in a peer-to-peer distributed system through the following steps: the user access node scans the system to obtain the online node list, requests the file storage status of the online nodes, constructs the global data storage capacity table, builds the candidate node pool, calculates the number of backup nodes, selects the backup nodes, and sends the data to be backed up. Data in the peer-to-peer distributed system is thereby backed up with redundancy and availability.

Example one

The embodiment of the invention provides a method for acquiring an online node list, which comprises the following steps:

the invention defines the user access node as AN, the node list in the distributed system as LN, the online node list as LON, a node as Ni, and an online node as ONj; F(Ni) -> ONj is defined as the direct mapping relation between the nodes and the online nodes.

The mapping function F is implemented as follows:

Let the variable a be the number of elements currently in LON.

Then F is defined for an input node x ∈ LON as follows:

F(x) = ON_a

Thus, the online node list LON is obtained according to F, and LON is a subset of LN.

Therefore, the online node list is obtained, and preparation is made for obtaining the global file storage condition in the next step.
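
The scan described in this embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not specify the handshake packet format, so the hypothetical `tcp_handshake` helper simply treats a successful TCP connection as a positive response, and the `(host, port)` addressing of nodes is likewise assumed.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Node = Tuple[str, int]  # (host, port); this addressing scheme is an assumption


def tcp_handshake(node: Node, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the node can be opened.

    Hypothetical stand-in for the handshake data packet, whose format
    the text does not describe.
    """
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        return False


def scan_online_nodes(
    ln: Iterable[Node],
    handshake: Callable[[Node], bool] = tcp_handshake,
) -> List[Node]:
    """Build the online node list LON by probing each node of LN in order."""
    lon: List[Node] = []
    for node in ln:
        if handshake(node):
            # F(Ni) -> ON_a: the node takes index a, the current length of LON.
            lon.append(node)
    return lon
```

Passing the handshake as a parameter keeps the scan independent of the transport, so the same loop can be exercised with a stub instead of real sockets.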

Example two

With reference to the content of the first embodiment, this embodiment of the present invention provides a method for obtaining the global file storage status, as follows:

the invention defines the file storage information of each online node ONj as FONj, defines the file storage information list of all online nodes ONj, namely the global storage list, as FLON, and defines an algorithm for acquiring the global storage list FLON from the online node list LON.

Algorithm 1: obtaining the global storage list FLON from the online node list LON

Input: online node list LON

Output: global storage list FLON

Foreach ONj in LON
    create FONj request Rj
    send Rj to ONj
End For
Foreach ONj in LON
    listen for ack Aj
End For
Foreach ONj in LON
    select FONj from Aj
    add FONj to FLON
End For
Return FLON

Therefore, the acquisition of the global file storage list is realized, and preparation is made for the construction of the next global storage capacity table.
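
A possible Python rendering of Algorithm 1 is sketched below. Several details are assumptions made for illustration: FONj is modelled as a mapping from file name to size in bytes, nodes are identified by strings, the hypothetical `request_info` callable stands in for sending Rj and receiving Aj, and a node that does not answer is simply skipped. The sketch also merges the three loops of the algorithm into one sequential request-and-reply loop.

```python
from typing import Callable, Dict, Iterable, Optional

FileInfo = Dict[str, int]  # file name -> size in bytes (assumed shape of FONj)


def build_flon(
    lon: Iterable[str],
    request_info: Callable[[str], Optional[FileInfo]],
) -> Dict[str, FileInfo]:
    """Collect the file storage information of every online node into FLON."""
    flon: Dict[str, FileInfo] = {}
    for node in lon:
        answer = request_info(node)  # send request Rj and wait for reply Aj
        if answer is None:
            continue  # no reply: the node is skipped (an assumption)
        flon[node] = dict(answer)  # extract FONj from Aj and add it to FLON
    return flon
```

A real deployment would likely send the requests concurrently, as the separate send and listen loops of the algorithm suggest; the sequential form is used here only to keep the example short.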

Example three

With reference to the content of the second embodiment, this embodiment provides a method for constructing the global storage capability table, and a specific method flow of this embodiment includes:

The global storage capability table is defined as SLON, and an algorithm is defined for acquiring the global storage capability table SLON from the global storage list FLON.

Algorithm 2: obtaining the global storage capability table SLON from the global storage list FLON

Input: global storage list FLON

Output: global storage capability table SLON

Foreach FONi in FLON
    Var RONi = SONi - FONi
    Add RONi to SLON
End For
Return SLON

Therefore, the global storage capacity table is constructed, and preparation is made for the construction of a candidate node pool in the next step.
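
Algorithm 2 can be sketched in Python as follows, reading RONi = SONi - FONi as the remaining space of node i. This reading, the modelling of FONi as a mapping from file name to size in bytes, and the separate `son` table of total storage space per node are assumptions; the text does not state how SONi is obtained.

```python
from typing import Dict


def build_slon(
    flon: Dict[str, Dict[str, int]],
    son: Dict[str, int],
) -> Dict[str, int]:
    """Compute the remaining space RONi = SONi - FONi for every node in FLON."""
    slon: Dict[str, int] = {}
    for node, files in flon.items():
        used = sum(files.values())     # bytes already occupied on the node
        slon[node] = son[node] - used  # remaining capacity of the node
    return slon
```

For example, a node with 1000 bytes of storage holding files of 100 and 50 bytes is entered in SLON with 850 bytes remaining.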

Example four

With reference to the content of the third embodiment, this embodiment provides a method for constructing a candidate node pool, and a specific method flow of this embodiment includes:

the invention defines a global storage capacity table SLON, a candidate node pool CNP and a node priority rule R defined by a user.

Algorithm 3: construction of the candidate node pool CNP

Input: global storage capability table SLON

Output: candidate node pool CNP

Var flag = true
Copy SLON to CNP
While flag == true
    flag = false
    Foreach adjacent pair (SONi, SONi+1) in CNP
        If R(SONi, SONi+1) == false
            swap(SONi, SONi+1)
            flag = true
        End If
    End For
End While
Return CNP

Therefore, the construction of the candidate node pool is realized, and preparation is made for the selection of the next backup node.
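
Algorithm 3 is a bubble sort driven by the user rule R, and can be sketched in Python as below. The entry shape `(node id, remaining bytes)` and the example rule `most_free_first` are assumptions introduced for illustration. Note that R must return true for ties (here `>=` rather than `>`); otherwise two equal neighbours would be swapped forever and the loop would not terminate.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, int]  # (node id, remaining bytes); assumed entry shape


def build_cnp(slon: List[Entry], r: Callable[[Entry, Entry], bool]) -> List[Entry]:
    """Order the nodes of SLON by rule R using bubble sort; SLON is not modified."""
    cnp = list(slon)  # copy SLON to CNP
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(cnp) - 1):
            # R(a, b) is True when a may stay ahead of b.
            if not r(cnp[i], cnp[i + 1]):
                cnp[i], cnp[i + 1] = cnp[i + 1], cnp[i]
                swapped = True
    return cnp


def most_free_first(a: Entry, b: Entry) -> bool:
    """Hypothetical priority rule: prefer the node with more remaining space."""
    return a[1] >= b[1]
```

Bubble sort is kept to mirror the algorithm as written; for large node lists a library sort with a key derived from R would be the more usual choice.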

Example five

With reference to the content of the fourth embodiment, this embodiment provides a backup node selection method, and the specific method flow of this embodiment includes:

Algorithm 4: construction of the candidate node list CNL

Input: backup ratio η, candidate node pool CNP

Output: candidate node list CNL

Foreach ONi in CNP and (i < SCNP * η)
    Add ONi to CNL
End For
Return CNL

Thus, the construction of the backup node list is realized, and the backup node list is prepared for the next step.
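
Algorithm 4 can be sketched in Python as follows. The sketch assumes that the index i starts at 0, in which case the condition i < SCNP * η selects the first ⌈SCNP * η⌉ nodes of the pool; the text does not state the index base. It also assumes η lies between 0 and 1, and it converts η to an exact fraction so that binary floating-point rounding (for example 10 * 0.3 evaluating to slightly more than 3) does not select one node too many.

```python
import math
from fractions import Fraction
from typing import List, Sequence


def select_backup_nodes(cnp: Sequence[str], eta: float) -> List[str]:
    """Take the first ceil(SCNP * eta) nodes of the candidate pool as CNL."""
    if not 0 <= eta <= 1:
        raise ValueError("backup ratio eta must lie in [0, 1]")
    # Approximate eta by a nearby simple fraction to avoid float round-off.
    ratio = Fraction(eta).limit_denominator(1000)
    count = math.ceil(ratio * len(cnp))  # number of i >= 0 with i < SCNP * eta
    return list(cnp[:count])
```

With a pool of 10 nodes, a ratio of 0.25 gives 10 * 0.25 = 2.5, so three nodes are selected under the 0-based assumption.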

Example six

With reference to the content of the fifth embodiment, this embodiment provides a method for sending backup data, and the specific flow of the method of this embodiment includes:

the user access node AN first converts the data to be sent into a JSON-based data stream, and then uses a TCP connection to send a cache command packet and a cache data packet in sequence according to the candidate node list CNL to realize data backup.
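
The sending step can be sketched in Python as below. The packet layout is entirely hypothetical, since the text names only a cache command packet and a cache data packet: the field names `type`, `file`, `size` and `data`, the Base64 encoding of the payload, and the newline-delimited framing are assumptions chosen so that binary data survives JSON serialization.

```python
import base64
import json
import socket
from typing import Callable, Iterable, List, Tuple

Node = Tuple[str, int]  # (host, port); assumed addressing scheme


def encode_packets(file_name: str, data: bytes) -> List[bytes]:
    """Build the command packet and the data packet as JSON lines."""
    command = {"type": "cache_command", "file": file_name, "size": len(data)}
    payload = {
        "type": "cache_data",
        "file": file_name,
        "data": base64.b64encode(data).decode("ascii"),  # bytes -> JSON-safe text
    }
    return [json.dumps(p).encode("utf-8") + b"\n" for p in (command, payload)]


def send_backup(
    cnl: Iterable[Node],
    file_name: str,
    data: bytes,
    connect: Callable[[Node], socket.socket] = socket.create_connection,
) -> None:
    """Send the command packet, then the data packet, to every node in CNL."""
    packets = encode_packets(file_name, data)
    for node in cnl:
        with connect(node) as conn:  # one TCP connection per backup node
            for packet in packets:
                conn.sendall(packet)
```

The `connect` parameter defaults to `socket.create_connection` and exists only so that the function can be tried with a stub connection; error handling and acknowledgements from the backup nodes are left out of this sketch.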

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of data backup in a peer-to-peer distributed system, the method comprising:

a user access node scans nodes in a system to obtain an online node list;

2. The method of claim 1, wherein the method for obtaining the online node list comprises:

defining a user access node as AN, a node list in a distributed system as LN, an online node list as LON, a node as Ni and an online node as ONj; defining F(Ni) -> ONj as the direct mapping relation between the nodes and the online nodes;

3. The method according to claim 1, wherein the method for obtaining the storage status of the global file comprises:

defining file storage information of each online node ONj as FONj, a global storage list which is a file storage information list of all online nodes ONj as FLON, and defining an algorithm for acquiring the global storage list FLON through the node storage information FON;

4. The method according to claim 1, wherein the method for constructing the global storage capability table comprises:

defining a global storage capability table as SLON and the storage space of each online node as SONj; defining an algorithm for acquiring the global storage capability table SLON from the global storage list FLON;

5. The method of claim 1, wherein the method for constructing the candidate node pool comprises:

defining a candidate node pool as CNP, and defining a node priority rule defined by a user as R;

6. The method of claim 1, wherein the backup node selection method comprises:

defining backup ratio as eta, the number of candidate node pool nodes as SCNP and a candidate node list CNL;

7. The method according to any one of claims 1 to 6, wherein the backup data sending method comprises: