CN113572812A

CN113572812A - File block synchronization method based on distributed cloud platform

Info

Publication number: CN113572812A
Application number: CN202110684525.1A
Authority: CN
Inventors: 李帅
Original assignee: Zhongdun Innovation Archives Management Beijing Co ltd
Current assignee: Zhongdun Innovation Archives Management Beijing Co ltd
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2021-10-29

Abstract

A file block synchronization method based on a distributed cloud platform, in which an incremental synchronization and backup tool is designed and implemented in a layered, modular fashion. The bottom layer is responsible for data storage, communication and data transmission. To meet real-time synchronization requirements, files are monitored with the Inotify mechanism. The core data synchronization achieves efficient incremental synchronization by means of the RAMM algorithm and a partition-free single-hash bloom filter. A control module is responsible for overall control and scheduling, logging, exception handling and the like. Increments are located by partitioning files into blocks, and the file blocks of the incremental part are synchronized through a file block layer. In addition, the invention draws on current distributed blockchain architecture: the consensus layer uses a consensus mechanism to deal with the lack of trust between nodes, so that the two parties to a transaction can reach agreement without the participation of a third party, and the information is maintained jointly by all the file blocks.

Description

File block synchronization method based on distributed cloud platform

Technical Field

The invention belongs to the technical field of network file block synchronization and relates to a file block synchronization method based on a distributed cloud platform. In particular, it relates to an image file block synchronization method based on a special file block partitioning scheme, which compares the differences between the new and old versions of an image file block when image file blocks are distributed in a cloud data center.

Background

Large file blocks may be shared across a network system, and every host must hold the same version of them. For example, when a new version of an operating system image is distributed to the hosts of a large data center, the new image file block must be delivered to every host so that the image file blocks stored on the hosts stay synchronized. Transmitting the complete image file block, however, is time-consuming and places excessive load on the network. Because the new version of an image file block usually differs only slightly from the old version, the differences are small compared with the complete image file block. How to obtain these differences, so that the difference file blocks can be merged with the old version, is therefore the key to solving the above problems.

In recent years, research into finding the differences between two file blocks has covered operating systems, remote desktops, P2P deployment of image file blocks and other areas, with the following main results:

(1) File block partitioning method:

The file to be transmitted is divided into a number of blocks, the new and old files are compared block by block, and the data is verified block by block. The Linux diff command follows a similar idea, comparing differences line by line. Partitioning reduces a large file to a sequence of blocks, so that the volume of data is abstracted into a number of blocks, and the calculated difference is likewise expressed in whole blocks. The smaller the calculated difference, the more accurate the method. If the partitioning is fine enough, that is, if each block is small enough, the difference calculated by partitioning comes very close to the actual difference.

(2) Fixed-length file block partitioning method:

For fixed-length chunking, the most important method at present is Fixed-Sized Partitioning (FSP). The method is simple and fast, and it is widely used to check the correctness of downloaded software. Its inputs are the total length of the file and the required number of blocks. The file is divided into blocks of fixed length, where block length = total file length / required number of blocks, and any remainder at the end of the file that is shorter than one block is treated as a block of its own. With this method, the difference calculation can be carried out entirely by one-to-one correspondence: the two files to be compared are divided with the same block length, the blocks are numbered in order, and the contents of the blocks with the same number in the two files are then compared.
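As a rough illustration of the fixed-length scheme, the following Python sketch splits two byte strings with the same block length and compares the blocks that share a block number. The function names `split_fixed` and `differing_block_numbers`, and the use of MD5 as the per-block check, are assumptions made for this example and are not taken from the described method.

```python
import hashlib


def split_fixed(data: bytes, block_len: int) -> list[bytes]:
    """Cut data into blocks of block_len bytes; a short tail becomes its own block."""
    return [data[i:i + block_len] for i in range(0, len(data), block_len)]


def differing_block_numbers(old: bytes, new: bytes, block_len: int) -> list[int]:
    """Compare blocks with the same number and return the numbers that differ."""
    old_blocks = split_fixed(old, block_len)
    new_blocks = split_fixed(new, block_len)
    differing = []
    for number in range(max(len(old_blocks), len(new_blocks))):
        if number >= len(old_blocks) or number >= len(new_blocks):
            differing.append(number)  # block exists in only one of the files
        elif (hashlib.md5(old_blocks[number]).digest()
              != hashlib.md5(new_blocks[number]).digest()):
            differing.append(number)
    return differing


old = bytes(range(100))
edited = old[:55] + b"\xff" + old[56:]   # one byte overwritten in place
shifted = b"\xff" + old                  # one byte inserted at the front

print(differing_block_numbers(old, edited, 10))   # [5]
print(differing_block_numbers(old, shifted, 10))  # every block number, 0 to 10
```

An in-place change touches a single block, whereas a one-byte insertion shifts every later byte and makes all same-numbered blocks differ.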

(3) Variable-length file block partitioning method:

Variable-length chunking was proposed to overcome the misalignment problem that fixed-length chunking suffers from when differences are compared: with fixed-length blocks, inserting or deleting data shifts all the data that follows it, so blocks with the same number no longer correspond. The most effective existing method is Content-Defined Chunking (CDC). It takes the content of the file into account, locates marker points in the content and divides the file at those points, records the block markers of each block in a block description of the file, and uses that description to guide the partitioning of the other file.

From the above analysis, the inaccuracy of file block difference comparison is caused by this misalignment problem, and how to solve it has become a major issue in the field. In particular, with the adoption of distributed blockchain technology, techniques for synchronizing file blocks in combination with a blockchain architecture need further improvement.

Disclosure of Invention

To solve the problems currently faced in file block synchronization, the present application claims a file block synchronization method based on a distributed cloud platform, which uses a distributed blockchain network to complete file block synchronization and is characterized by comprising:

the client and the server communicate with each other to ensure the transmission of file blocks, using TCP/IP connections and multithreading to improve transmission efficiency;

the data layer controls and manages all the blocks and performs scheduling; it feeds back and processes the fault information produced by the blocks and, as soon as a fault occurs, notifies the fault block to handle it, so that the client and the server keep running normally and continuously;

changes to a designated directory or file block are monitored in real time; when a monitored file block changes, the changes are placed in a queue in ledger form to await processing by the control module; this comprises at least directory or file block monitoring and ledger queue processing;

the application layer locates the increment by partitioning files into blocks and synchronizes the file blocks of the incremental part through a file block layer, which comprises at least file block control, file block partitioning and partition verification;

the consensus layer uses a consensus mechanism to deal with the lack of trust between nodes, so that the two parties to a transaction can reach consensus without the participation of a third party, and the information is maintained jointly by all the file blocks.

Further, the client and the server communicating with each other to ensure the transmission of file blocks, using TCP/IP connections and multithreading to improve transmission efficiency, further comprises:

at the start of initialization, the configuration file is read to obtain at least the IP address and network port information;

the main thread, which is mainly responsible for the network communication link, is created through the createMainThread() function; the main thread runs, listens on the obtained network port and waits for a response message from the server;

when the server responds to a request from the client, the main thread creates a sub-thread in response, which is mainly responsible for transferring file blocks between the two ends;

the sub-thread is closed after the transfer is finished, while the main thread keeps listening on the port and waits for the next connection.
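The thread structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `create_main_thread` is a hypothetical Python analogue of the createMainThread() function named in the text, `transfer_worker` and the `received` queue are invented for the example, and a real implementation would define its own framing for file blocks rather than reading until the peer closes the connection.

```python
import queue
import socket
import threading


def transfer_worker(conn: socket.socket, received: queue.Queue) -> None:
    """Sub-thread: read one file block until the peer closes the link, then exit."""
    with conn:
        conn.settimeout(None)  # block until the sender has finished
        parts = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            parts.append(data)
    received.put(b"".join(parts))


def create_main_thread(host: str, port: int, received: queue.Queue,
                       stop: threading.Event):
    """Main thread: listen on the configured port, one sub-thread per link."""
    listener = socket.create_server((host, port))
    listener.settimeout(0.2)  # wake up regularly to check the stop flag
    bound_port = listener.getsockname()[1]

    def serve() -> None:
        with listener:
            while not stop.is_set():
                try:
                    conn, _addr = listener.accept()
                except socket.timeout:
                    continue  # nothing arrived; keep listening
                threading.Thread(target=transfer_worker,
                                 args=(conn, received), daemon=True).start()

    main_thread = threading.Thread(target=serve, daemon=True)
    main_thread.start()
    return main_thread, bound_port
```

Each sub-thread ends when its transfer is complete, while the listening thread keeps accepting new connections until the stop flag is set.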

Further, the data layer controlling and managing all the blocks, performing scheduling, feeding back and processing the fault information produced by the blocks and, as soon as a fault occurs, notifying the fault block to handle it so that the client and the server keep running normally and continuously, further comprises:

the data layer is responsible for coordination and cooperation among all the blocks during operation, for logging and for fault handling;

log messages are generated, the logs recording conditions that occur during operation and comprising at least a system operation log and a file block change log; fault messages are processed; a message queue is defined, and the API is used to manage and maintain it;

initialization is performed with the initControl() function, after which the message queue is created and the necessary global variables are set and initialized;

a specified message is received with the msgrcv function and then responded to;

if a message has been received, its type is determined; if not, the module continues to wait;

the type of the received message is determined and the next operation is chosen accordingly: a message of the log type is passed to the log sub-block for processing, and a message of the fault type is passed to the fault sub-block for processing;

messages are received repeatedly until the process is actively terminated;

the generating of log messages, the logs recording conditions that occur during operation and comprising at least a system operation log and a file block change log, further comprises, for the processing of log messages:

the block is initialized, including the initialization of its parameters and the reading of the file paths of the system log and the file block change log;

the type of the log is determined from the information read, and the detailed content of the log is parsed;

the corresponding log file is written according to the type: a log of the file block change type is recorded in the file block change log; any other log is recorded in the system log;

the content is recorded in the corresponding log in the format that corresponds to its log type;

the processing of fault messages further comprises:

first, the block is initialized, including the required parameters and file paths;

after a fault message is received, its fault type and detailed fault description are obtained, and different processing is performed for different fault types;

the fault types and contents are recorded in the system log in the prescribed format, so that the source of a fault can be found and repaired more quickly during inspection;

if a fault is judged to be serious and cannot be resolved by the existing handling methods, the administrator is notified to carry out inspection and maintenance.
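The receive, classify and dispatch loop described above can be sketched as follows. The text uses the msgrcv function of a System V message queue; the Python standard library has no binding for it, so this sketch substitutes an in-process `queue.Queue`. The message layout (dictionaries with `type`, `log_type`, `fault_type`, `detail` and `serious` keys), the `STOP` message and all function names are assumptions made for illustration.

```python
import queue

LOG, FAULT, STOP = "log", "fault", "stop"  # hypothetical message types


def handle_log(msg: dict, system_log: list, change_log: list) -> None:
    """Log sub-block: file block changes go to the change log, the rest to the system log."""
    target = change_log if msg["log_type"] == "file_change" else system_log
    target.append(f'{msg["log_type"]}: {msg["detail"]}')


def handle_fault(msg: dict, system_log: list, alerts: list) -> None:
    """Fault sub-block: record the fault and escalate it if it is serious."""
    system_log.append(f'fault[{msg["fault_type"]}]: {msg["detail"]}')
    if msg.get("serious"):
        alerts.append(msg["detail"])  # stands in for notifying the administrator


def control_loop(msg_queue: queue.Queue, system_log: list,
                 change_log: list, alerts: list) -> None:
    """Receive messages until a STOP message actively ends the loop."""
    while True:
        msg = msg_queue.get()  # blocks while no message is available
        if msg["type"] == STOP:
            break
        if msg["type"] == LOG:
            handle_log(msg, system_log, change_log)
        elif msg["type"] == FAULT:
            handle_fault(msg, system_log, alerts)
```

Keeping the two handlers separate mirrors the log sub-block and fault sub-block of the description, so either one can be replaced without touching the loop.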

Further, the real-time monitoring of changes to a designated directory or file block, in which changes to a monitored file block are placed in a queue in ledger form to await processing by the control module, and which comprises at least directory or file block monitoring and ledger queue processing, further comprises:

the objects to be monitored are read from the configuration file; if an object is a file block, it is monitored directly; if an object is a directory, all of its subdirectories are traversed recursively and a monitoring instance is established for each of them; epoll is used in ledger-driven (edge-triggered) mode, so that only file descriptors on which a ledger event has been triggered are processed, and when a readable or writable ledger event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write;

if the data is not completely read or written on that occasion, the next call to epoll_wait() does not notify the handler again; that is, the handler is notified only once, until a second readable or writable ledger event occurs on the file descriptor.
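The recursive set-up of monitoring targets and the queueing of changes can be sketched portably as follows. Note the substitution: instead of Inotify and epoll, which are Linux-specific, this example compares two snapshots of modification time and size, so it illustrates only the traversal and the change queue, not the notification mechanism. All function names and the `(kind, path)` record format are assumptions for illustration.

```python
import os
from collections import deque


def collect_watch_targets(path: str) -> list[str]:
    """A file is watched directly; a directory yields itself and every subdirectory."""
    if os.path.isfile(path):
        return [path]
    return [root for root, _dirs, _files in os.walk(path)]


def snapshot(targets: list[str]) -> dict:
    """Record (mtime, size) for every file directly covered by the targets."""
    state = {}
    for target in targets:
        if os.path.isfile(target):
            names = [target]
        else:
            names = [os.path.join(target, n) for n in os.listdir(target)]
        for name in names:
            if os.path.isfile(name):
                info = os.stat(name)
                state[name] = (info.st_mtime_ns, info.st_size)
    return state


def enqueue_changes(before: dict, after: dict, events: deque) -> None:
    """Append one (kind, path) record per detected change to the queue."""
    for name in sorted(before.keys() | after.keys()):
        if name not in before:
            events.append(("created", name))
        elif name not in after:
            events.append(("deleted", name))
        elif before[name] != after[name]:
            events.append(("modified", name))
```

A control module would then take records from the queue one by one; in a Linux deployment the snapshot comparison would be replaced by Inotify watches on the same list of targets.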

Further, the application layer locating the increment by partitioning files into blocks and synchronizing the file blocks of the incremental part through the file block layer, which comprises at least file block control, file block partitioning and partition verification, further comprises:

data is read starting from the first byte of the file;

the data read is appended to the tail of a check interval, the check interval being defined as a window of length len;

it is determined whether the amount of data in the check interval has reached len; if not, the previous step is repeated; if so, the next step is entered;

the total number n of 1 bits in the bytes of data in the check interval is calculated, and it is determined whether n is greater than or equal to a set threshold; if so, the tail of the check interval is taken as a block boundary, the partitioning using the RAMM variable-length chunking algorithm;

the check interval is slid backwards by one byte, the total number n of 1 bits in the bytes of data in the check interval is calculated again, and it is determined whether n is greater than or equal to the threshold; if so, the tail of the check interval is taken as a block boundary; if not, the next step is entered;

it is determined whether the distance between the tail of the current check interval and the right boundary of the previous block is greater than or equal to a threshold change interval d; if not, the sliding step is re-entered; if so, the current threshold is reduced by 1 and the sliding step is then re-entered, until a block boundary is found;

when a block boundary is found, the current threshold is reset to its initial value; a strong verification sequence code is calculated for each block using MD5, giving G1, G2, G3, G4 and G5, and the client sends all the verification sequence codes to the server;

the above is repeated until the tail of the check interval reaches the last byte of the file;

the tail data at the end of the file that does not satisfy the blocking condition is treated as a single block, where not satisfying the blocking condition means that the data at the end of the file contains no interval of length len in which the total number n of 1 bits is greater than or equal to the current threshold;

the server partitions the old file in the same way with the RAMM algorithm, giving blocks Grid1', Grid2', Grid3', Grid4' and Grid5', and calculates in the same way the strong verification sequence codes G1', G2', G3', G4' and G5' of these blocks;

the server generates a bloom filter from the received verification sequence codes G1, G2, G3, G4 and G5 sent by the client, and then uses the bloom filter to determine whether G1', G2', G3', G4' and G5' differ from them;

the difference information is sent to the client, requesting the client to send the difference file blocks;

the client receives the request and sends the corresponding difference file blocks to the server according to the difference information;

after receiving the difference file blocks, the server recombines them with the old file blocks to complete the synchronization of the file blocks;

for deletion, if k units of data are deleted within the i-th file block and the deletion does not extend into the next file block, the block information of the i-th file block comprises the check value and the length of the i-th file block, and the blocking information of the file comprises an offset value; if k units of data are deleted starting in the i-th file block and the deletion extends into the next file block, or even as far as the j-th file block, the block information of the i-th file block comprises the check value and the length of the i-th file block, the block information of the (i+1)-th file block comprises the check value, length, offset value and index number of the (i+1)-th file block, and the blocking information of the file comprises an offset value and an index number.
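The chunking and difference steps above can be sketched end to end in Python. This is one possible reading rather than the patented implementation: the parameter values (`win_len=16`, `threshold=72`, `d=64`), the literal interpretation that the threshold drops by 1 on every slide once the distance reaches d, and the `SingleHashBloom` construction (one MD5-derived integer reduced modulo several hypothetical primes over a single unpartitioned bit array) are all assumptions. Both ends run in one process here, so "sending" is simulated by plain variables.

```python
import hashlib


def ramm_chunks(data: bytes, win_len: int = 16, threshold: int = 72,
                d: int = 64) -> list[bytes]:
    """Cut data where the count of 1 bits in the sliding window reaches the threshold."""
    chunks, start, current = [], 0, threshold
    end = start + win_len                      # exclusive tail of the check window
    while end <= len(data):
        ones = sum(bin(b).count("1") for b in data[end - win_len:end])
        if ones >= current:                    # tail of the window is a boundary
            chunks.append(data[start:end])
            start, current = end, threshold    # reset threshold to its initial value
            end = start + win_len
            continue
        if end - start >= d:                   # far from the last boundary: relax
            current -= 1
        end += 1                               # slide the window by one byte
    if start < len(data):
        chunks.append(data[start:])            # tail data forms a single block
    return chunks


class SingleHashBloom:
    PRIMES = (4093, 4099, 4111)                # hypothetical moduli

    def __init__(self) -> None:
        self.bits = bytearray(max(self.PRIMES))

    def _positions(self, code: str) -> list[int]:
        value = int(code, 16)                  # the single hash value
        return [value % p for p in self.PRIMES]

    def add(self, code: str) -> None:
        for pos in self._positions(code):
            self.bits[pos] = 1

    def __contains__(self, code: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(code))


def synchronize(old: bytes, new: bytes):
    """Rebuild new on the server from old blocks plus the blocks it lacks."""
    new_blocks = ramm_chunks(new)                            # client side
    new_codes = [hashlib.md5(b).hexdigest() for b in new_blocks]
    bloom = SingleHashBloom()                                # server side
    for code in new_codes:
        bloom.add(code)
    reusable = {}
    for block in ramm_chunks(old):
        code = hashlib.md5(block).hexdigest()
        if code in bloom:                                    # may be a false positive
            reusable[code] = block
    missing = [i for i, c in enumerate(new_codes) if c not in reusable]
    sent = {i: new_blocks[i] for i in missing}               # client sends differences
    rebuilt = b"".join(reusable[c] if c in reusable else sent[i]
                       for i, c in enumerate(new_codes))
    return rebuilt, missing
```

A bloom filter can report false positives, so the sketch keeps an old block only as a candidate and still matches it by its exact MD5 code; a false positive therefore costs some memory but cannot corrupt the rebuilt file.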

Further, the consensus layer using a consensus mechanism to deal with the lack of trust between nodes, so that the two parties to a transaction can reach consensus without the participation of a third party and the information is maintained jointly by all the file blocks, further comprises:

a subset of trusted nodes is selected to form a verification node list L, $P_L = \{P_1, P_2, P_3, \ldots, P_N\}$, and each node is given an initial score $IV_i$; each node must serve the other nodes to maintain its score; in each round of consensus, the verification node that packs the best file block is selected, and the score of the verification node that packs the worst file block is multiplied by a coefficient $\gamma$, that is, $IV_i = \gamma IV_i$, $\gamma \in (0, 1)$; when the score of a node in the verification node list L falls below a specified value, the node is removed from the list; when fewer than 2/3 of the nodes remain in the list, the list is dissolved and a new verification node list L is generated.
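The effect of repeated penalties follows directly from the update rule. Assuming, purely for illustration, that a node starts from a score $IV_i^{(0)}$, is penalized $k$ times with no recovery in between, and that the specified value is written $\varepsilon$ with $\varepsilon < IV_i^{(0)}$:

```latex
IV_i^{(k)} = \gamma^{k}\, IV_i^{(0)}, \qquad \gamma \in (0, 1)

IV_i^{(k)} < \varepsilon
\;\Longleftrightarrow\;
k > \frac{\ln\!\left(\varepsilon / IV_i^{(0)}\right)}{\ln \gamma}
```

The inequality flips because $\ln \gamma < 0$. With the hypothetical values $IV_i^{(0)} = 100$, $\gamma = 0.8$ and $\varepsilon = 50$, the bound is about 3.11, so the node stays in the list after three penalties (score 51.2) and is removed after the fourth (score 40.96).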

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flowchart of a distributed cloud platform-based archive block synchronization method according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.

Referring to Fig. 1, the present application claims a file block synchronization method based on a distributed cloud platform, which uses a distributed blockchain network to complete file block synchronization and is characterized by comprising:

a specified message is received with the msgrcv function and then responded to;

messages are received repeatedly until the process is actively terminated;

fault messages are processed;

the objects to be monitored are read from the configuration file; if an object is a file block, it is monitored directly; if an object is a directory, all of its subdirectories are traversed recursively and a monitoring instance is established for each of them;

epoll is used in ledger-driven (edge-triggered) mode, so that only file descriptors on which a ledger event has been triggered are processed, and when a readable or writable ledger event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write;

data is read starting from the first byte of the file;

a subset of trusted nodes is selected to form a verification node list L, $P_L = \{P_1, P_2, P_3, \ldots, P_N\}$, and each node is given an initial score $IV_i$; each node must serve the other nodes to maintain its score; in each round of consensus, the verification node that packs the best file block is selected, and the score of the verification node that packs the worst file block is multiplied by a coefficient $\gamma$, that is, $IV_i = \gamma IV_i$, $\gamma \in (0, 1)$; when the score of a node in the verification node list L falls below a specified value, the node is removed from the list; when fewer than 2/3 of the nodes remain in the list, the list is dissolved and a new verification node list L is generated.

Further, the consensus process runs in a dynamic data storage system, where $P_L = \{P_1, P_2, P_3, \ldots, P_N\}$ is the set of verification nodes and $C(P_N)$ is the combined candidate set to be verified for $P_N$.

The packed file block submitted by a terminal is $B_i = \{px_1, px_2, \ldots, px_m\}$ with $px_m \in C(P_L)$, and the set of verification combinations and revenue obtained from the other nodes is denoted $O_i = \{\phi_{i1}, \phi_{i2}, \ldots, \phi_{im} : \mu_i\}$. In the verification combination $(\phi_{i1}, \phi_{i2}, \ldots, \phi_{im})$ formed by the terminals for the file block $B_i$ packed by a terminal $P_i$, the verification by any participating node $P_k$ of the file block $B_i$ submitted by $P_i$ is denoted $\phi_{ik}$.

The specific steps of consensus include:

(1) Generate the verification node list L. If, for the earliest-arriving $P_i \in P_L$, $\mu_i(\phi_{i1}, \phi_{i2}, \ldots, \phi_{im}) = n$, select the file block packed by $P_i$ as the optimal file block and go to step (4); otherwise, execute step (2).

(2) If some $P_i$ satisfies $n > \mu_i(\phi_{i1}, \phi_{i2}, \ldots, \phi_{im}) > \mu_j(\phi_{j1}, \phi_{j2}, \ldots, \phi_{jm})$, select the file block packed by $P_i$ as the optimal file block and go to step (4); otherwise, execute step (3).

(3) If $n > \mu_i(\phi_{i1}, \phi_{i2}, \ldots, \phi_{im}) = \mu_j(\phi_{j1}, \phi_{j2}, \ldots, \phi_{jm}) > \mu_k(\phi_{k1}, \phi_{k2}, \ldots, \phi_{km})$, then from $P_i$ and $P_j$ select the verification node whose value $\mu$ arrived earliest, and take the file block it packed as the optimal file block.

(4) If $\mu_i(\phi_{i1}, \phi_{i2}, \ldots, \phi_{im}) < \mu_j(\phi_{j1}, \phi_{j2}, \ldots, \phi_{jm})$, select the file block packed by $P_i$ as the worst file block and apply $IV_i = \gamma IV_i$, $\gamma \in (0, 1)$, to reduce the node's score. Let the specified score value be $\varepsilon$; if $IV_i < \varepsilon$, $P_i$ is judged to be a failed node and is removed from the list L.
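One round of the selection and penalty steps can be sketched in Python as follows. This is a simplified reading under stated assumptions: each submission carries a single revenue value `mu` and an arrival order, the best block is the one with the highest `mu` (earliest arrival breaks ties), the worst is the one with the lowest `mu` (the latest arrival is chosen on ties, which the source does not specify), and the names `Submission`, `run_round` and `list_must_be_rebuilt` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    node: str      # identifier of the packing node P_i
    mu: int        # revenue value mu_i obtained for the packed file block
    arrival: int   # arrival order; a smaller number means earlier


def run_round(scores: dict, submissions: list,
              gamma: float = 0.5, epsilon: float = 30.0):
    """Pick the best and worst packers, penalize the worst, evict it if needed."""
    best = min(submissions, key=lambda s: (-s.mu, s.arrival))
    worst = min(submissions, key=lambda s: (s.mu, -s.arrival))
    if worst.mu < best.mu:                     # no penalty when all values are equal
        scores[worst.node] *= gamma            # IV_i = gamma * IV_i
        if scores[worst.node] < epsilon:
            del scores[worst.node]             # node judged failed, removed from L
    return best.node, worst.node


def list_must_be_rebuilt(scores: dict, original_size: int) -> bool:
    """True when fewer than 2/3 of the original verification nodes remain."""
    return 3 * len(scores) < 2 * original_size
```

The integer comparison in `list_must_be_rebuilt` avoids floating-point rounding when the remaining share is exactly two thirds.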

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A distributed cloud platform-based file block synchronization method adopts a distributed block chain network to complete file block synchronization, and is characterized by comprising the following steps:

2. The distributed cloud platform-based archive block synchronization method of claim 1, wherein the client and the server perform communication interaction to ensure the transmission of archive blocks, and a TCP/IP link is used to improve transmission efficiency in a multi-thread manner, and further comprising:

creating a main thread through the createMainThread() function, the main thread being mainly responsible for the network communication link; running the main thread, listening on the obtained network port, and waiting for a response message from the server;

3. The file block synchronization method based on the distributed cloud platform as claimed in claim 1, wherein the data layer controls and manages all the blocks and performs scheduling, feeds back and processes the fault information produced by the blocks and, as soon as a fault occurs, notifies the fault block to handle it, so that the client and the server keep running normally and continuously, further comprising:

receiving a specified message with the msgrcv function and then responding to it;

receiving messages repeatedly until the process is actively terminated;

the processing of the fault message further comprises:

4. The file block synchronization method based on the distributed cloud platform as claimed in claim 1, wherein changes to a designated directory or file block are monitored in real time and, when a monitored file block changes, the changes are placed in a queue in ledger form to await processing by the control module, comprising at least directory or file block monitoring and ledger queue processing, and further comprising:

5. The distributed cloud platform-based archive block synchronization method according to claim 1, wherein the application layer searches for the increment in an archive block partitioning manner, and synchronizes the archive blocks of the increment portion through an archive block layer, which at least includes archive block control, archive block partitioning, and partition checking, and further includes:

reading data from the initial byte of the file;

appending the data read to the tail of a check interval, the check interval being defined as a window of length len; determining whether the amount of data in the check interval has reached len; if not, repeating the previous step; if so, entering the next step;

generating a bloom filter from the received verification sequence codes G1, G2, G3, G4 and G5 sent by the client, and then using the bloom filter to determine whether G1', G2', G3', G4' and G5' differ from them;

6. The distributed cloud platform-based archive block synchronization method according to claim 1, wherein the consensus layer identifies an untrusted situation between nodes by using a consensus mechanism, so that both trading parties agree without participation of a third party, and information is maintained by all archive blocks together, further comprising:

selecting a subset of trusted nodes to form a verification node list L, $P_L = \{P_1, P_2, P_3, \ldots, P_N\}$, and giving each node an initial score $IV_i$; each node must serve the other nodes to maintain its score; in each round of consensus, the verification node that packs the best file block is selected, and the score of the verification node that packs the worst file block is multiplied by a coefficient $\gamma$, that is, $IV_i = \gamma IV_i$, $\gamma \in (0, 1)$; when the score of a node in the verification node list L falls below a specified value, the node is removed from the list; when fewer than 2/3 of the nodes remain in the list, the list is dissolved and a new verification node list L is generated.