CN112732484A - Data backup system based on end-to-end node - Google Patents

Info

Publication number
CN112732484A
Authority
CN
China
Prior art keywords
data
peer
backup
message
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011631046.5A
Other languages
Chinese (zh)
Inventor
翟红鹰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puhua Yunchuang Technology Beijing Co ltd
Original Assignee
Puhua Yunchuang Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Puhua Yunchuang Technology Beijing Co ltd
Priority to CN202011631046.5A
Publication of CN112732484A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Abstract

The invention discloses a data backup system based on end-to-end nodes, comprising: a message encoding module, which encodes messages using a preset data serialization mechanism; a network module defining a message flow component, a provider component, an announce keys component, a session management component, and routing; a decision engine, which is a credit management module; and a demand management module, which manages WantList requests. The technical scheme constructs a data backup system using P2P technology, provides a distributed backup service, and enhances the reliability of backup data. The invention uses an optimal data distribution strategy to upload block data to the responding nodes, and improves the reliability of backup data through grouped peer-to-peer mutual-backup redundancy. When a file is lost, the system uses several security methods to ensure the integrity and authenticity of the restored file.

Description

Data backup system based on end-to-end node
Technical Field
The invention relates to the technical field of blockchain, and in particular to a data backup system based on end-to-end nodes.
Background
The growing reach of networks has made information exchange simpler than ever, and data has become indispensable to people's lives and work. However, uncertain factors such as terrorist incidents, natural disasters, and system failures threaten the safety of data at every moment. As a data security policy, backup is the most basic, and the last, means of avoiding data loss.
The inventor proposes a data backup method based on end-to-end nodes. Constructing the backup system with end-to-end (P2P) technology provides a distributed backup service and enhances the reliability of backup data, whereas a traditional backup system based on the client/server (C/S) mode relies on a backup server and easily forms a single point of failure. The invention uses an optimal data distribution strategy to upload block data to the responding nodes, and improves the reliability of backup data through grouped peer-to-peer mutual-backup redundancy. When a file is lost, the system uses several security methods to ensure the integrity and authenticity of the restored file.
Therefore, there is a need to provide a new data backup system based on an end-to-end node to solve the above technical problems.
Disclosure of Invention
The main object of the invention is to provide a data backup system based on end-to-end nodes, so as to solve the technical problem in the related art that a traditional backup system based on the C/S mode depends on a backup server and easily forms a single point of failure.
To achieve the above object, the data backup system based on end-to-end nodes provided by the invention comprises:
a message encoding module, which encodes messages using a preset data serialization mechanism;
a network module defining the following components: a message flow component, a provider component, an announce keys component, a session management component, and routing;
a decision engine, the decision engine being a credit management module;
a demand management module, which manages WantList requests.
Preferably, the messages communicated between nodes are of two types: a request message, which describes a request, and a block message, which carries the block data being transferred.
Preferably, the message flow component is used for sending and processing the messages.
Preferably, the message flow component defines four interfaces for processing requests and sending data blocks: an open interface, a send-demand interface, a send-block interface, and an end interface.
Preferably, the decision engine manages a request queue, uses a ledger to record basic information between nodes and the backup transfer records, and on that basis determines whether to respond to a peer's backup download request.
Preferably, the demand management module is an implementation module. The WantList is its core data structure: the module manages a message queue, and whenever a new WantList Entry is added, a worker thread of the message queue is triggered so that a data block is sent to the specified Peer.
The data backup system based on end-to-end nodes provided by the invention is constructed using P2P technology; it provides a distributed backup service and enhances the reliability of backup data, whereas a traditional backup system based on the C/S mode depends on a backup server and easily forms a single point of failure.
The invention uses an optimal data distribution strategy to upload block data to the responding nodes, and improves the reliability of backup data through grouped peer-to-peer mutual-backup redundancy. When a file is lost, the system uses several security methods to ensure the integrity and authenticity of the restored file.
Drawings
Fig. 1 is an architecture diagram of a preferred embodiment of a data backup system based on end-to-end nodes according to the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
To facilitate understanding of the technical solution of the present invention, the following concepts are explained:
1. Data backup.
Data backup is the process of copying all or part of a data set from the hard disk or disk array of an application host to another storage medium, in order to prevent data loss caused by operator error or system failure. Traditionally this is done mainly as cold backup using a built-in or external tape drive. However, this approach only protects against human errors such as misoperation, and recovery takes a long time.
As technology has developed and data volumes have grown, many enterprises have begun to adopt network backup. Network backup is typically implemented by specialized data storage management software together with corresponding hardware and storage devices.
2. P2P (end-to-end).
P2P stands for Peer to Peer, which translates literally as "between nodes of equal standing". In English, a peer is a person of equal ability and status; for example, "my peer" may refer to my colleague, and "peer review" means having people of an equivalent level of ability evaluate my work. A P2P network is a network made up of many computers. Each computer runs the same client software and performs the same tasks, so each computer is a peer, or node. All nodes are equal and none has special privileges; each is simultaneously a provider and a consumer of resources. This differs markedly from the client/server model common on the internet today: in P2P there is no distinction of roles, and every node is both client and server.
3. The BitSwap protocol.
Building on BitTorrent, IPFS implements a P2P data exchange protocol: the BitSwap protocol.
Each node maintains two lists: the data blocks it already has (have_list) and the data blocks it wants (want_list).
BitSwap is the protocol that defines how data blocks are exchanged in an IPFS network. It is a peer-to-peer message protocol with a uniform message format, unlike a request/response model. Put simply, in IPFS the same message format is used for both requests and responses. In an IPFS network all nodes are peers, and there is no Tracker server like the one in BitTorrent, so communication is simpler.
The BitSwap protocol also defines strategies for how to request data, how to send data, and to whom data may be sent; each node can have its own strategy. As the core module of data exchange, BitSwap uses predefined incentive mechanisms to promote the flow of data in the network, and achieves reciprocity through a ledger that records transfers between peers.
The invention provides a data backup system based on an end-to-end node.
Referring to fig. 1, to achieve the above object, in an embodiment of the present invention, a data backup system based on an end-to-end node includes:
1. a Message encoding module (Message Protocol module); the module encodes the message using a preset data serialization mechanism;
In this embodiment, the preset data serialization mechanism is Protocol Buffers, an extensible data serialization mechanism that supports multiple platforms and multiple languages.
It is understood that the messages communicated between nodes are divided into two types, one is a request message (WantList) which is used to describe the request; the second is a Block message (Block) which is used to represent the Block data that is transferred.
Specifically, the request message includes the following fields:
block index (blockCid), priority (priority), whether the request is cancelled (cancel), and the like.
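The two message types can be sketched as plain data structures. This is a minimal illustration, not the patent's wire format: the fields `block_cid`, `priority`, and `cancel` follow the list above, while the class names, the default values, and the `Message` envelope are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class WantEntry:
    """One entry of a request message (WantList)."""
    block_cid: str         # index (CID) of the requested block
    priority: int = 1      # assumed convention: larger means more urgent
    cancel: bool = False   # True withdraws an earlier request for this block


@dataclass(frozen=True)
class Block:
    """A block message: the CID plus the raw block data being transferred."""
    cid: str
    data: bytes


@dataclass
class Message:
    """Hypothetical envelope that can carry requests, blocks, or both."""
    wantlist: List[WantEntry] = field(default_factory=list)
    blocks: List[Block] = field(default_factory=list)


msg = Message(wantlist=[WantEntry("cid-a", priority=5),
                        WantEntry("cid-b", cancel=True)])
# Entries that are still wanted (not cancelled).
pending = [e.block_cid for e in msg.wantlist if not e.cancel]
print(pending)  # ['cid-a']
```

Using one envelope for both directions mirrors the uniform message format described for BitSwap; in the embodiment the envelope would be serialized with the preset serialization mechanism rather than kept as Python objects.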
In the present system, a file is divided into several data Blocks (Chunks, also called Blocks); the data block is the most basic unit of data manipulation in the p2p network.
Each Block is indexed by a CID identifier. The CID is a self-describing index structure that combines information such as the codec used to encode the Block, the length, and the hash, so that it uniquely identifies a Block.
Therefore, to download a Block from a Peer, we only need to tell it the CID, and we can also easily verify the Block when it is received.
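The chunk-then-verify idea can be illustrated with a short sketch. It is a simplification under stated assumptions: a real CID also encodes the codec and hash type, whereas here a bare SHA-256 hex digest stands in for the CID, and the function names and the fixed chunk size are hypothetical.

```python
import hashlib
from typing import List


def chunk_file(content: bytes, chunk_size: int) -> List[bytes]:
    """Split file content into fixed-size blocks (the last one may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]


def make_cid(data: bytes) -> str:
    """Stand-in for a CID: the SHA-256 digest of the block, as hex."""
    return hashlib.sha256(data).hexdigest()


def verify_block(expected_cid: str, data: bytes) -> bool:
    """A received block is valid only if its digest matches the requested CID."""
    return make_cid(data) == expected_cid


chunks = chunk_file(b"abcdefghij", 4)
cids = [make_cid(c) for c in chunks]
print(chunks)                              # [b'abcd', b'efgh', b'ij']
print(verify_block(cids[0], chunks[0]))    # True
print(verify_block(cids[0], b"tampered"))  # False
```

Because the index is derived from the content, the requester needs no trusted third party to check what a peer returns.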
In fact, because the system uses the Multicodec self-describing encoding protocol, a Multicodec prefix is added before a message is sent. This greatly widens the range of message formats that can be supported: not only Protobuf but also JSON, CBOR, and others.
2. A network module (Networking module) defining the following components:
a message flow component (Message Stream), a provider component (Provider), an announce keys component (Announce Keys), a session management component (Session Management), and routing;
2.1 The message flow component is used for sending and processing messages. In the P2P network, messages are packaged into a Multistream for transmission; after receiving a Stream, a Peer decodes it into the corresponding message format and then, depending on the message content, decides whether to respond to a request or to receive block data. The two are not mutually exclusive alternatives: responding to a WantList request is not necessarily followed by receiving a data block, because other types of responses also exist.
The message flow component defines four interfaces for processing requests and sending data blocks: the open interface (open), the send-demand interface (send_want_list), the send-block interface (send_block), and the end interface (close).
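As a rough sketch, the four interfaces could be expressed as an abstract base class. The method names follow the interface names in the text; the parameter lists, the return types, and the `RecordingStream` class are assumptions introduced only to make the example runnable.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class MessageStream(ABC):
    """The four interfaces of the message flow component (signatures assumed)."""

    @abstractmethod
    def open(self, ledger: Dict[str, int]) -> bool:
        """Announce a connection, carrying the sender's ledger."""

    @abstractmethod
    def send_want_list(self, want_list: List[str]) -> None:
        """Send the CIDs the sender wants."""

    @abstractmethod
    def send_block(self, cid: str, data: bytes) -> None:
        """Send one data block."""

    @abstractmethod
    def close(self) -> None:
        """End the connection."""


class RecordingStream(MessageStream):
    """Toy implementation that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, object]] = []
        self.is_open = False

    def open(self, ledger: Dict[str, int]) -> bool:
        self.is_open = True
        self.calls.append(("open", dict(ledger)))
        return True

    def send_want_list(self, want_list: List[str]) -> None:
        if not self.is_open:
            raise RuntimeError("stream is not open")
        self.calls.append(("send_want_list", list(want_list)))

    def send_block(self, cid: str, data: bytes) -> None:
        if not self.is_open:
            raise RuntimeError("stream is not open")
        self.calls.append(("send_block", (cid, data)))

    def close(self) -> None:
        self.is_open = False
        self.calls.append(("close", None))


stream = RecordingStream()
stream.open({"bytes_sent": 0, "bytes_recv": 0})
stream.send_want_list(["cid-a"])
stream.send_block("cid-b", b"payload")
stream.close()
print([name for name, _ in stream.calls])
```

Keeping the interface abstract lets a real transport and a test double share the same call sequence.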
2.11 Open interface. When nodes establish a connection, the sender node initializes its local ledger for the peer: it may reuse a ledger saved for that peer, or create a new, cleared ledger, depending on ledger consistency between the nodes. The sender node then sends an open message carrying the ledger to notify the receiver node, which can choose whether or not to accept the connection request.
2.12 Send-demand interface. Once the connection is open, the sender node broadcasts its want_list to all connected receiver nodes. On receiving a want_list, a receiver node checks whether it holds any of the data blocks the sender wants; if so, it sends those blocks according to its strategy.
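The receiver-side check can be sketched in a few lines. The text does not specify the strategy, so this example assumes the simplest one: serve every wanted block that is held locally, highest priority first. The function name and the dictionary shapes are hypothetical.

```python
from typing import Dict, List


def blocks_to_send(want_list: Dict[str, int], have_list: Dict[str, bytes]) -> List[str]:
    """Return the wanted CIDs that are held locally, highest priority first.

    want_list maps CID -> priority; have_list maps CID -> block data.
    Ties are broken by CID so the order is deterministic.
    """
    available = [cid for cid in want_list if cid in have_list]
    return sorted(available, key=lambda cid: (-want_list[cid], cid))


wants = {"cid-a": 1, "cid-b": 9, "cid-c": 5}
have = {"cid-a": b"block a", "cid-b": b"block b"}
print(blocks_to_send(wants, have))  # ['cid-b', 'cid-a']
```

A production strategy would also consult the ledger before sending, as the decision engine section describes.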
2.13 Send-block interface. By default, the sender node transmits only the data block. After receiving all the data, the receiver node computes the Multihash, checks that it matches the expected value, and then returns an acknowledgement.
After the block transfer completes, the receiver node moves the block from its want_list to its have_list, and the receiver and the sender both update their ledgers. If the transfer fails verification, the sender is either malfunctioning or deliberately attacking the receiver, and the receiver may refuse further transactions.
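The receive, verify, and bookkeeping steps can be sketched as follows. This is an in-memory illustration under assumptions: SHA-256 stands in for the Multihash, the `Node` and `Ledger` classes and the `rejected` set are hypothetical, and no network transport is modelled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Ledger:
    bytes_sent: int = 0   # bytes this node has sent to the peer
    bytes_recv: int = 0   # bytes this node has received from the peer


@dataclass
class Node:
    name: str
    want_list: Set[str] = field(default_factory=set)
    have_list: Dict[str, bytes] = field(default_factory=dict)
    ledgers: Dict[str, Ledger] = field(default_factory=dict)
    rejected: Set[str] = field(default_factory=set)  # peers we refuse to deal with

    def ledger_for(self, peer: str) -> Ledger:
        return self.ledgers.setdefault(peer, Ledger())


def digest(data: bytes) -> str:
    """Stand-in for the Multihash check: SHA-256 of the block, as hex."""
    return hashlib.sha256(data).hexdigest()


def receive_block(receiver: Node, sender: Node, cid: str, data: bytes) -> bool:
    """Verify a block, then update both lists and both ledgers."""
    if sender.name in receiver.rejected:
        return False
    if digest(data) != cid:
        # Failed verification: refuse further transactions with this sender.
        receiver.rejected.add(sender.name)
        return False
    receiver.want_list.discard(cid)
    receiver.have_list[cid] = data
    sender.ledger_for(receiver.name).bytes_sent += len(data)
    receiver.ledger_for(sender.name).bytes_recv += len(data)
    return True


payload = b"backup block"
cid = digest(payload)
a = Node("A", have_list={cid: payload})
b = Node("B", want_list={cid})
ok = receive_block(b, a, cid, payload)
tampered = receive_block(b, a, digest(b"other"), b"not the same bytes")
print(ok, tampered)  # True False
```

After the tampered transfer, node B keeps A in its `rejected` set, so later blocks from A are refused without being inspected.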
2.14 End interface. The peer-to-peer connection should be closed in two cases:
The silent_wait timeout expires without any message being received from the other party; in this case the node issues peer.
The node is exiting and BitSwap is closing; in this case the node issues peer.
2.2 Provider component. When the current node wants to download a data block that is not found in its local block database, it calls BitSwap's network module (DHT) to search for Providers. Once Providers are found, the current node connects to them and sends them a WantList request. If block data is received, it is stored in the local block database; the node then updates its local WantList, the transfer ledger, the Session, and so on.
2.3 Announce keys component. The Announce operation is performed by the local node acting as a Key Provider. Throughout the P2P network, a Block is identified by its Key (i.e. Cid Prefix). Therefore, whenever new Block data is added locally, the provider worker asynchronously performs an Announce operation to the network to declare that the node owns that Block. When other peers want to download the Block, they can then easily find the Providers to connect to through the DHT table.
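The announce and provider-lookup steps of sections 2.2 and 2.3 can be sketched together. A real DHT is distributed across nodes; here a single in-memory table stands in for it, and `ProviderTable`, `fetch_block`, and the dictionaries of remote stores are hypothetical names used only for illustration.

```python
from typing import Dict, Optional, Set


class ProviderTable:
    """In-memory stand-in for the DHT: key -> set of peers that hold the block."""

    def __init__(self) -> None:
        self._providers: Dict[str, Set[str]] = {}

    def announce(self, key: str, peer_id: str) -> None:
        """Declare that peer_id holds the block identified by key."""
        self._providers.setdefault(key, set()).add(peer_id)

    def find_providers(self, key: str) -> Set[str]:
        return set(self._providers.get(key, set()))


def fetch_block(key: str,
                local_store: Dict[str, bytes],
                table: ProviderTable,
                remote_stores: Dict[str, Dict[str, bytes]]) -> Optional[bytes]:
    """Return the block from the local store, or fetch it from a provider."""
    if key in local_store:
        return local_store[key]
    for peer_id in sorted(table.find_providers(key)):
        data = remote_stores.get(peer_id, {}).get(key)
        if data is not None:
            local_store[key] = data   # keep a copy in the local block database
            return data
    return None


table = ProviderTable()
remote = {"B": {"key-1": b"data"}}
table.announce("key-1", "B")          # node B declares that it owns key-1
local: Dict[str, bytes] = {}
got = fetch_block("key-1", local, table, remote)
print(got, "key-1" in local)          # b'data' True
```

The lookup only succeeds for keys that some node announced earlier, which is why the announce step runs whenever a block is added locally.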
2.4 The session management component manages the connections among Peers, including information such as each Peer's request state (priority, whether cancelled) and each Peer's liveness.
3. Decision Engine (Decision Engine);
The Decision Engine is the credit management module of the backup module. It manages a request queue, uses a ledger to record basic information between nodes and the backup transfer records, and determines whether to respond to the peer's backup download request. The credit ledger serves the following main purposes:
First: to improve the efficiency of data exchange between nodes;
Second: to prevent free-riders;
Third: to prevent certain attacks (e.g., Sybil attacks);
Fourth: to establish a lenient mechanism for trusted nodes.
The credit record is kept between two nodes and has two parts: Credit and Debt.
For example, if node A sends data to node B, then A holds Credit with B, and B owes Debt to A.
If A's Credit with B exceeds its Debt, B returns the data immediately the next time A sends B a WantList request for block data.
In general, A's Credit minus its Debt gives a net value; the backup module uses the debt ratio (r).
The debt ratio is defined as:
$$r = \frac{\text{bytes\_sent}}{\text{bytes\_recv} + 1}$$
where bytes_sent is the number of bytes the node has sent to the peer and bytes_recv is the number of bytes it has received from that peer.
From the debt ratio, the node calculates its probability of sending data to the peer:
$$P(\text{send} \mid r) = 1 - \frac{1}{1 + \exp(6 - 3r)}$$
These two functions show that the sending probability drops sharply once the debt ratio exceeds a certain value.
The meaning of this model is as follows: if a node only receives data and never shares it, the probability that other nodes send data to it becomes lower and lower (falling close to 0 beyond a certain value); if a node keeps sharing data, the probability that other nodes send data to it becomes higher and higher.
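The two formulas are easy to evaluate numerically, which makes the sharp drop visible. The sketch below implements them directly; the function names are illustrative, and the sample ratios are chosen only to show the shape of the curve.

```python
import math


def debt_ratio(bytes_sent: int, bytes_recv: int) -> float:
    """r = bytes_sent / (bytes_recv + 1), from the sending node's point of view."""
    return bytes_sent / (bytes_recv + 1)


def send_probability(r: float) -> float:
    """P(send | r) = 1 - 1 / (1 + exp(6 - 3r))."""
    return 1.0 - 1.0 / (1.0 + math.exp(6.0 - 3.0 * r))


for r in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"r = {r:.1f}  P(send | r) = {send_probability(r):.4f}")
```

At r = 2 the exponent is zero, so the probability is exactly 0.5; below that value the node almost always sends, and above it the probability falls quickly toward 0. The `+ 1` in the denominator keeps the ratio defined for a peer that has not yet sent anything back.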
4. A demand management module (WantManager);
The WantManager module mainly manages WantList requests and is an implementation module. The WantList is its core data structure: the module manages a message queue, and whenever a new WantList Entry is added, a worker thread of the message queue is triggered so that a data block is sent to the specified Peer.
The WantManager provides several mechanisms to ensure the distribution of data blocks: for example, rebroadcasting after a set interval when a transmission fails, monitoring the transmission process through the WantList Gauge, and cancelling requests once the data download completes.
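The queue-plus-worker pattern with retry can be sketched with the standard library. This is an illustration rather than the patented implementation: the `send` callback, the retry interval, and the retry limit are assumptions, and the gauge and cancellation mechanisms are left out for brevity.

```python
import queue
import threading
import time
from typing import Callable, List, Tuple


class WantManager:
    """Message queue with one worker thread; failed sends are retried."""

    def __init__(self, send: Callable[[str, str], bool],
                 retry_interval: float = 0.01, max_attempts: int = 3) -> None:
        self._send = send                      # hypothetical transport callback
        self._retry_interval = retry_interval
        self._max_attempts = max_attempts
        self._queue: "queue.Queue[Tuple[str, str, int]]" = queue.Queue()
        self.delivered: List[Tuple[str, str]] = []
        self.failed: List[Tuple[str, str]] = []
        threading.Thread(target=self._run, daemon=True).start()

    def add_entry(self, peer: str, cid: str) -> None:
        """Adding an entry wakes the worker, which is blocked on the queue."""
        self._queue.put((peer, cid, 1))

    def wait(self) -> None:
        """Block until every queued entry has been delivered or given up on."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            peer, cid, attempt = self._queue.get()
            try:
                if self._send(peer, cid):
                    self.delivered.append((peer, cid))
                elif attempt < self._max_attempts:
                    time.sleep(self._retry_interval)       # wait, then resend
                    self._queue.put((peer, cid, attempt + 1))
                else:
                    self.failed.append((peer, cid))
            finally:
                self._queue.task_done()


attempts = {}

def flaky_send(peer: str, cid: str) -> bool:
    """Test double: the first attempt for each CID fails, the second succeeds."""
    attempts[cid] = attempts.get(cid, 0) + 1
    return attempts[cid] >= 2

manager = WantManager(flaky_send)
manager.add_entry("peer-B", "cid-1")
manager.wait()
print(manager.delivered)  # [('peer-B', 'cid-1')]
```

A retried entry is put back on the queue before the current one is marked done, so `wait()` cannot return while a retry is still outstanding.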
The invention provides a system for data backup between end-to-end nodes, which utilizes the p2p technology to construct a data backup system, provides distributed backup service and enhances the reliability of backup data.
A backup module is designed into the P2P program, and backup requests are handled by this backup module; after receiving a backup request, the other P2P peer nodes call the get command in the program to carry out the backup operation.
A specific implementation is as follows:
Assume there are five nodes, A, B, C, D, and E. A user on node A adds a new file; the file is stored on the storage device through the add command, and the root hash of the file is returned to the user.
The system then checks whether the file has been backed up on peer nodes of the P2P network. If not, a backup request is sent to the backup module, which performs the following steps after receiving the request:
1. Obtain all nodes connected to the current node and broadcast the root hash to those peer nodes: root hash -> peers (B, C, D, E);
2. The connected nodes return their backup condition information (duration, hardware information, IP information, and storage information) to the node running the backup module;
3. A strategy is used to determine which nodes qualify for the backup, for example (B, C), and those nodes are notified to perform the backup;
4. Nodes B and C receive the backup notification and perform the get operation on the root hash.
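The four steps above can be sketched end to end. The selection strategy is not specified in the text, so this example assumes one: keep peers with enough free storage, prefer the longest-running ones, and pick two replicas. `PeerInfo`, its fields, and the in-memory `stores` dictionary that stands in for the get operation are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PeerInfo:
    """Backup condition information returned by a connected peer (step 2)."""
    peer_id: str
    uptime_hours: float
    free_bytes: int


def select_backup_peers(infos: List[PeerInfo], file_size: int,
                        replicas: int = 2) -> List[str]:
    """Step 3 (assumed strategy): enough free space, then longest uptime first."""
    eligible = [p for p in infos if p.free_bytes >= file_size]
    eligible.sort(key=lambda p: (-p.uptime_hours, p.peer_id))
    return [p.peer_id for p in eligible[:replicas]]


def run_backup(root_hash: str, data: bytes, infos: List[PeerInfo],
               stores: Dict[str, Dict[str, bytes]], replicas: int = 2) -> List[str]:
    """Steps 3 and 4: choose peers, then let each chosen peer fetch the file."""
    chosen = select_backup_peers(infos, len(data), replicas)
    for peer_id in chosen:
        # Stand-in for the peer running "get" on the root hash.
        stores.setdefault(peer_id, {})[root_hash] = data
    return chosen


data = b"contents of the new file on node A"
root_hash = hashlib.sha256(data).hexdigest()   # stand-in for the file root hash
infos = [
    PeerInfo("B", uptime_hours=500, free_bytes=10_000),
    PeerInfo("C", uptime_hours=300, free_bytes=10_000),
    PeerInfo("D", uptime_hours=900, free_bytes=0),      # no space left
    PeerInfo("E", uptime_hours=100, free_bytes=10_000),
]
stores: Dict[str, Dict[str, bytes]] = {}
print(run_backup(root_hash, data, infos, stores))  # ['B', 'C']
```

With these sample values node D is excluded despite its long uptime because it has no free storage, and B and C are selected, matching the (B, C) example in step 3.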
From the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for causing a terminal device to execute the method according to the embodiments of the present invention.
In the description herein, references to the description of the term "one embodiment," "another embodiment," or "first through xth embodiments," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, method steps, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. A data backup system based on end-to-end nodes, comprising:
a message encoding module, which encodes messages using a preset data serialization mechanism;
a network module defining the following components: a message flow component, a provider component, an announce keys component, a session management component, and routing;
a decision engine, the decision engine being a credit management module;
a demand management module, which manages WantList requests.
2. The data backup system based on end-to-end nodes of claim 1, wherein the messages communicated between nodes are of two types: a request message, which describes a request, and a block message, which carries the block data being transferred.
3. The data backup system based on end-to-end nodes of claim 2, wherein the message flow component is used for sending and processing the messages.
4. The data backup system based on end-to-end nodes of claim 3, wherein the message flow component defines four interfaces for processing requests and sending data blocks, the four interfaces being an open interface, a send-demand interface, a send-block interface, and an end interface.
5. The data backup system based on end-to-end nodes of claim 1, wherein the decision engine manages a request queue, uses a ledger to record basic information between nodes and the backup transfer records, and on that basis determines whether to respond to a peer's backup download request.
6. The data backup system based on end-to-end nodes of claim 1, wherein the demand management module is an implementation module; the WantList is its core data structure, and by managing a message queue, whenever a new WantList Entry is added, a worker thread of the message queue is triggered so that a data block is sent to the specified Peer.
CN202011631046.5A 2020-12-30 2020-12-30 Data backup system based on end-to-end node Pending CN112732484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011631046.5A CN112732484A (en) 2020-12-30 2020-12-30 Data backup system based on end-to-end node

Publications (1)

Publication Number Publication Date
CN112732484A true CN112732484A (en) 2021-04-30

Family

ID=75608280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011631046.5A Pending CN112732484A (en) 2020-12-30 2020-12-30 Data backup system based on end-to-end node

Country Status (1)

Country Link
CN (1) CN112732484A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100094921A1 (en) * 2008-10-13 2010-04-15 Subhash Chandra Roy Peer-To-Peer Distributed Storage
CN108805570A (en) * 2018-06-01 2018-11-13 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN109271283A (en) * 2018-09-06 2019-01-25 北京云测信息技术有限公司 A kind of data back up method based on block chain

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUNIWAY: "IPFS中的BitSwap协议", pages 1 - 4, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/39005672> *
LUZHOU: "IPFS中文白皮书", pages 1 - 12, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/34757382> *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination