CN113572812A - File block synchronization method based on distributed cloud platform - Google Patents

Publication number
CN113572812A
Authority
CN
China
Prior art keywords
block
file
file block
log
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110684525.1A
Other languages
Chinese (zh)
Inventor
李帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongdun Innovation Archives Management Beijing Co ltd
Original Assignee
Zhongdun Innovation Archives Management Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongdun Innovation Archives Management Beijing Co ltd filed Critical Zhongdun Innovation Archives Management Beijing Co ltd
Priority to CN202110684525.1A priority Critical patent/CN113572812A/en
Publication of CN113572812A publication Critical patent/CN113572812A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/172: Caching, prefetching or hoarding of files
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/14: Multichannel or multilink protocols
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16: Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]

Abstract

A file block synchronization method based on a distributed cloud platform, in which an incremental synchronization backup tool is designed and implemented in a layered, modular manner. The bottom layer is responsible for data storage, communication, and data transmission. To meet real-time synchronization requirements, files are monitored with the inotify mechanism. Core data synchronization achieves efficient incremental synchronization by means of the RAMM algorithm and a partition-free single-hash Bloom filter. A control module is responsible for overall control and scheduling, logging, exception handling, and the like. Increments are located by partitioning files into blocks, and the file blocks of the incremental part are synchronized through a file block layer. In addition, the invention incorporates the current distributed blockchain architecture: the consensus layer uses a consensus mechanism to address the lack of trust between nodes, so that the two parties to a transaction can reach agreement without the participation of a third party, and the information is maintained jointly by all the file blocks.

Description

File block synchronization method based on distributed cloud platform
Technical Field
The invention belongs to the technical field of network file block synchronization and relates to a file block synchronization method based on a distributed cloud platform, in particular to an image file block synchronization method, based on a special file block partitioning scheme, that compares the differences between new and old versions of image file blocks when they are distributed in a cloud data center.
Background
Large file blocks may be shared across a networked system, and their versions must be kept consistent on every host. For example, when a new version of an operating system image is distributed to the hosts of a large data center, the new image file blocks must be delivered to every host so that the image file blocks stored on the hosts stay synchronized. However, transmitting an entire image file block is not only time-consuming but also puts excessive load on the network. Because a new version of an image file block usually differs little from the old version, the differences are only a small fraction of the complete image. How to obtain these differences, so that they can be merged with the old version, therefore becomes the bottleneck in solving the above problem.
In recent years, to explore the differences between two file blocks, research has been carried out on operating systems, remote desktops, P2P deployment of image file blocks, and other areas, with the following main results:
(1) File block partitioning: the file to be transmitted is divided into a number of blocks, the differences between the new and old files are compared block by block, and the data is verified block by block. The Linux diff command follows a similar idea, comparing differences line by line. Partitioning reduces the problem size for a large file: the amount of data is abstracted into a number of blocks. The computed difference therefore also has the block as its smallest unit. The smaller the computed difference, the more accurate the method. If the file is partitioned finely enough, that is, if each block is small enough, the difference computed by partitioning comes very close to the actual difference.
(2) Fixed-length file block partitioning:
For fixed-length partitioning of files, the most important method at present is Fixed-Sized Partitioning (FSP). The method is very simple and fast, and is widely used to verify the correctness of downloaded software files. Its inputs are the total length of the file and the required number of blocks; the file is divided into fixed-length blocks according to "block length = total file length / required number of blocks", and if the data at the tail is shorter than one block, it simply becomes a block of its own. With this method, the difference computation can proceed entirely one-to-one: the two files to be compared are both partitioned with the same block length and their blocks are numbered sequentially, and the contents of the blocks with the same number in the two files are then compared for equality.
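The fixed-length scheme described above can be illustrated with a minimal Python sketch. The function names (`block_length`, `split_fixed`, `differing_block_numbers`) are hypothetical and chosen only for this illustration; the text does not prescribe an implementation.

```python
def block_length(total_len: int, num_blocks: int) -> int:
    """Block length = total file length / required number of blocks."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    return max(1, total_len // num_blocks)


def split_fixed(data: bytes, block_len: int) -> list:
    """Cut data into blocks of block_len bytes; a short tail becomes its own block."""
    return [data[i:i + block_len] for i in range(0, len(data), block_len)]


def differing_block_numbers(old: bytes, new: bytes, block_len: int) -> list:
    """Compare blocks with the same number and return the numbers that differ."""
    old_blocks = split_fixed(old, block_len)
    new_blocks = split_fixed(new, block_len)
    diffs = []
    for number in range(max(len(old_blocks), len(new_blocks))):
        a = old_blocks[number] if number < len(old_blocks) else None
        b = new_blocks[number] if number < len(new_blocks) else None
        if a != b:
            diffs.append(number)
    return diffs
```

Overwriting one byte in place changes only one block, whereas inserting a single byte at the front shifts the contents of every later block, so every block number is reported as different even though almost all of the data is unchanged.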
(3) Variable-length file block partitioning:
Variable-length partitioning was proposed to overcome the misalignment that fixed-length partitioning suffers from when comparing differences, that is, a single inserted or deleted byte shifts the contents of every subsequent block. The most effective existing method is Content-Defined Chunking (CDC). It takes the content of the file into account: it locates marker points in the content and partitions the file at them, records the block marker of each block in the block description of the file, and uses that description to guide the partitioning of the other file.
From the above analysis, the inaccuracy in comparing file block differences stems from the "misalignment" problem described above. How to solve it has become a major problem in the field. In particular, with the adoption of distributed blockchain technology, techniques for synchronizing file blocks in combination with a blockchain architecture need further improvement.
Disclosure of Invention
To solve the various problems currently faced in file block synchronization, the present application claims a file block synchronization method based on a distributed cloud platform, which uses a distributed blockchain network to complete file block synchronization and is characterized by comprising:
the client and the server communicate with each other to guarantee the transmission of file blocks, using TCP/IP connections and multithreading to improve transmission efficiency;
the data layer controls and manages all the modules and performs scheduling; it feeds back and processes fault information generated by the modules and, once a fault occurs, notifies the fault module to handle it, so as to keep the client and the server running normally and continuously;
the changes to a specified directory or file block are monitored in real time; when a monitored file block changes, the changes are placed in a queue in the form of ledger entries to await control processing; this comprises at least directory or file block monitoring and ledger queue processing;
the application layer locates increments by partitioning files into blocks and synchronizes the file blocks of the incremental part through a file block layer, which comprises at least file block control, file block partitioning, and partition verification;
the consensus layer uses a consensus mechanism to address the lack of trust between nodes, so that the two parties to a transaction can reach agreement without the participation of a third party, and the information is maintained jointly by all the file blocks.
Further, the client and the server communicating with each other to guarantee the transmission of file blocks, using TCP/IP connections and multithreading to improve transmission efficiency, further comprises:
initialization starts by reading the configuration file to obtain at least the IP address and network port;
a main thread, mainly responsible for the network communication connection, is created by the createMainThread() function; the main thread runs, listens on the obtained network port, and waits for a response message from the server;
when the server responds to a request from the client, the main thread responds to the request by creating a sub-thread, which is mainly responsible for transferring file blocks between the two ends;
the sub-thread is closed after the transfer finishes, while the main thread keeps listening on the port and waits for the next connection.
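The main-thread and sub-thread arrangement described above can be sketched with Python's standard `socket` and `threading` modules. This is only an illustration under stated assumptions: `create_main_thread` is a hypothetical analogue of the createMainThread() function named in the text, and the "transfer" simply counts the bytes received and replies with the total.

```python
import socket
import threading


def handle_transfer(conn: socket.socket) -> None:
    """Sub-thread: receive one payload, reply with its size, then close."""
    with conn:
        received = 0
        while True:
            data = conn.recv(4096)
            if not data:          # peer finished sending
                break
            received += len(data)
        conn.sendall(str(received).encode())


def listen(server_sock: socket.socket, stop: threading.Event) -> None:
    """Main thread: keep listening and hand each connection to a sub-thread."""
    while not stop.is_set():
        try:
            conn, _addr = server_sock.accept()
        except socket.timeout:    # wake up periodically to check the stop flag
            continue
        except OSError:           # listening socket was closed
            break
        threading.Thread(target=handle_transfer, args=(conn,), daemon=True).start()


def create_main_thread(host: str = "127.0.0.1", port: int = 0):
    """Open the listening port and start the main thread (port 0 = any free port)."""
    server_sock = socket.create_server((host, port))
    server_sock.settimeout(0.2)
    stop = threading.Event()
    main = threading.Thread(target=listen, args=(server_sock, stop), daemon=True)
    main.start()
    return server_sock, stop, main
```

Each sub-thread ends when its connection is done, while the main thread returns to `accept()` and waits for the next connection, which mirrors the last step of the list above.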
Furthermore, the data layer controlling and managing all the modules, performing scheduling, feeding back and processing fault information generated by the modules, and notifying the fault module to handle a fault once it occurs, so as to keep the client and the server running normally and continuously, further comprises:
the data layer is responsible for coordination and cooperation among all the modules during operation, for logging, and for fault handling;
log messages are generated, the log recording conditions that occur during operation and including at least the system run log and the file block change log; fault messages are processed; a message queue is defined, and the API is used to manage and maintain it;
initialization is performed with the initControl() function, after which the message queue is created and the necessary global variables are set and initialized;
a specified message is received with the msgrcv function and then responded to;
if a message has been received, its type is determined next; if not, waiting continues;
the type of the received message is determined and processing proceeds accordingly: a log-type message is passed to the log sub-module, and a fault-type message is passed to the fault sub-module;
messages are received repeatedly until the process is actively terminated;
generating the log messages, the log recording conditions that occur during operation and including at least the system run log and the file block change log, further comprises the following log message processing:
the module is initialized, which includes initializing the various parameters and reading the file paths of the system log and the file block change log;
the log type is determined from the information read, and the detailed content of the log is parsed;
the corresponding log file is written according to the type: if the log type is file block change, the entry is recorded in the file block change log; otherwise it is recorded in the system log;
the content is recorded in the corresponding log in the format that matches the log type;
the processing of fault messages further comprises:
first, the module is initialized, including the required parameters and file paths;
after a fault message is received, its fault type and detailed fault description are obtained, and different handling is performed for different fault types;
the fault types and contents are recorded in the system log in the prescribed format, so that the source of a fault can be found and repaired more quickly during inspection;
if the fault is judged to be serious and cannot be resolved by the existing handling methods, the administrator is notified to carry out inspection and maintenance.
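The receive-and-dispatch loop described above can be sketched as follows. The text names the System V `msgrcv` call; Python's standard library has no binding for it, so this sketch substitutes an in-process `queue.Queue`. The message layout, the `"stop"` message, the `"serious"` fault label, and the in-memory log lists are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass

LOG, FAULT, STOP = "log", "fault", "stop"


@dataclass
class Message:
    kind: str          # LOG, FAULT or STOP
    subtype: str = ""  # e.g. "file_change" for logs, "serious" for faults
    detail: str = ""


def control_loop(inbox, system_log, change_log, notify_admin):
    """Receive messages until a STOP message arrives, routing each by type."""
    while True:
        msg = inbox.get()                      # blocks until a message arrives
        if msg.kind == STOP:                   # active termination
            break
        if msg.kind == LOG:
            # File block changes go to the change log, everything else to the system log.
            target = change_log if msg.subtype == "file_change" else system_log
            target.append(f"{msg.subtype}: {msg.detail}")
        elif msg.kind == FAULT:
            # Faults are always recorded in the system log.
            system_log.append(f"FAULT {msg.subtype}: {msg.detail}")
            if msg.subtype == "serious":       # cannot be handled automatically
                notify_admin(msg)
```

Passing the logs and the notification callback as parameters keeps the loop testable; a real deployment would presumably write to the log files whose paths are read during initialization.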
Further, monitoring the changes to a specified directory or file block in real time, placing the changes in a queue in the form of ledger entries when a monitored file block changes, and awaiting control processing, which comprises at least directory or file block monitoring and ledger queue processing, further comprises:
the objects to be monitored are read from the configuration file; if an object is a file block, it is monitored directly; if it is a directory, all of its subdirectories are traversed recursively and a monitoring instance is created for each of them;
epoll is used with ledger-event-driven response: only file descriptors on which a ledger event has been triggered are processed, and when a readable or writable event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write;
if the read or write is not completed this time, the next call to epoll_wait() does not notify the handler again; that is, it notifies only once, until a second readable or writable event occurs on the file descriptor (epoll's edge-triggered behavior).
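The notify-once behavior described above corresponds to epoll's edge-triggered mode (`EPOLLET`). The following Linux-only sketch uses Python's `select.epoll` on an ordinary pipe instead of an inotify descriptor, purely to show the notification pattern; the function name is hypothetical.

```python
import os
import select


def edge_triggered_demo():
    """Return how many events epoll reports after a write, a partial read, and a second write."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    ep = select.epoll()
    # EPOLLET requests edge-triggered notification for readable events.
    ep.register(read_fd, select.EPOLLIN | select.EPOLLET)
    counts = []
    try:
        os.write(write_fd, b"0123456789")
        counts.append(len(ep.poll(0.1)))   # new data arrived: notified
        os.read(read_fd, 4)                # partial read, 6 bytes remain unread
        counts.append(len(ep.poll(0.1)))   # no new event: not notified again
        os.write(write_fd, b"more")
        counts.append(len(ep.poll(0.1)))   # a second event: notified again
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)
    return counts
```

Because unread data does not produce a further notification in this mode, a handler normally keeps reading until the descriptor would block before it calls `poll()` again.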
Furthermore, the application layer locating increments by partitioning files into blocks and synchronizing the file blocks of the incremental part through the file block layer, which comprises at least file block control, file block partitioning, and partition verification, further comprises:
data is read starting from the first byte of the file;
the data read is appended to the tail of a check interval, the check interval being defined as a window of length len;
it is determined whether the amount of data in the check interval has reached len; if not, the previous step is repeated; if so, the next step is entered;
the total number n of 1 bits across all bytes of data in the check interval is calculated, and it is determined whether n is greater than or equal to a set threshold; if so, the tail of the check interval is determined to be a block boundary, the partitioning using the RAMM variable-length algorithm;
the check interval is slid backward by one byte, the total number n of 1 bits across all bytes of data in the check interval is calculated again, and it is determined whether n is greater than or equal to the threshold; if so, the tail of the check interval is determined to be a block boundary; if not, the next step is entered;
it is determined whether the distance between the tail of the current check interval and the right boundary of the previous block is greater than or equal to the threshold change interval d; if not, the previous step is re-entered; if so, the current threshold is reduced by 1 and the previous step is then re-entered, until a block boundary is found;
the client calculates a strong verification sequence code for each block using MD5, giving G1, G2, G3, G4 and G5, and sends all the verification sequence codes to the server; whenever a block boundary is found, the current threshold is reset to its initial value;
the above steps are repeated until the tail of the check interval reaches the last byte of the file;
the tail data at the end of the file that does not satisfy the blocking condition is treated as a single block, where not satisfying the blocking condition means that the data at the end of the file contains no interval of length len in which the total number n of 1 bits across all bytes is greater than or equal to the current threshold;
the server partitions the old file in the same way, using the RAMM algorithm, into Grid1', Grid2', Grid3', Grid4' and Grid5', and likewise calculates the strong verification sequence codes G1', G2', G3', G4' and G5' of the blocks;
a Bloom filter is generated from the received verification sequence codes G1, G2, G3, G4 and G5 sent by the client, and the Bloom filter is then used to determine whether G1', G2', G3', G4' and G5' differ;
the difference information is sent to the client, requesting the client to send the differing file blocks;
the client receives the request and sends the corresponding differing file blocks to the server according to the difference information;
after receiving the differing file blocks, the server reassembles them with the old file blocks to complete the synchronization;
for deletion: if k file blocks are deleted within the ith file block and the deletion does not extend into the next file block, the blocking information of the ith file block comprises the check value and length of the ith file block, and the blocking information of the file block comprises the offset value of the file block; if k file blocks are deleted starting in the ith file block and the deletion extends into the next file block, or even as far as the jth file block, the blocking information of the ith file block comprises the check value and length of the ith file block, the blocking information of the (i+1)th file block comprises the check value, length, offset value, and index number of the (i+1)th file block, and the blocking information of the file block comprises the offset value and index number of the file block.
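The boundary search in the partitioning steps above can be sketched in Python as follows. This is one possible reading of the text, not the RAMM algorithm as the inventor implemented it: the function name and parameter names are hypothetical, and the threshold decay follows a literal interpretation in which the threshold drops by 1 on every slide once the window tail is at least `d` bytes past the previous boundary.

```python
def ones(byte: int) -> int:
    """Number of 1 bits in one byte."""
    return bin(byte).count("1")


def popcount_chunks(data: bytes, win_len: int, threshold: int, d: int) -> list:
    """Split data at window tails whose total 1-bit count reaches the current threshold."""
    chunks = []
    start = 0                                  # right boundary of the previous block
    while start < len(data):
        current = threshold                    # reset to the initial value per block
        end = start + win_len                  # exclusive tail of the check interval
        cut = None
        if end <= len(data):
            n = sum(ones(b) for b in data[start:end])
            while True:
                if n >= current:               # block boundary found at the window tail
                    cut = end
                    break
                if end >= len(data):           # reached the last byte of the file
                    break
                # Slide the window by one byte, updating the 1-bit count incrementally.
                n += ones(data[end]) - ones(data[end - win_len])
                end += 1
                if end - start >= d:           # far from the last boundary: relax threshold
                    current -= 1
        if cut is None:                        # tail that never met the condition
            cut = len(data)
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because a boundary depends only on the bytes inside the window, an insertion or deletion tends to disturb only nearby boundaries, which is the property that variable-length partitioning relies on. Lowering the threshold as a block grows guarantees that a boundary is eventually found.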
Further, the consensus layer using a consensus mechanism to address the lack of trust between nodes, so that the two parties to a transaction can reach agreement without the participation of a third party and the information is maintained jointly by all the file blocks, further comprises:
a subset of trusted nodes is selected to form a verification node list L, $P_L = \{P_1, P_2, P_3, \ldots, P_N\}$, and each node is given an initial score $IV_i$. Each node must serve the other nodes to maintain its score. In each round of consensus, the best-performing verification node is selected to package the file block, and the score of the worst-performing verification node is multiplied by a coefficient $\gamma$, that is, $IV_i = \gamma IV_i$, $\gamma \in (0, 1)$. When the score of a node in the verification node list L falls below a specified value, the node is removed from the list; when fewer than 2/3 of the nodes remain in the list, the list is dissolved and a new verification node list L is generated.
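The score update above can be illustrated with a small Python sketch. The text does not say how node performance is measured, so the per-round `performance` values are an assumption, as are the function and parameter names; the 2/3 test is taken relative to the original list size.

```python
def consensus_round(scores, performance, gamma, min_score, original_size):
    """Run one round: pick the packer, penalize the worst node, prune the list.

    scores:      node -> current score IV_i
    performance: node -> performance measure for this round (assumed input)
    Returns (packer, remaining_scores, dissolved).
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    packer = max(performance, key=performance.get)   # best node packages the block
    worst = min(performance, key=performance.get)
    updated = dict(scores)
    updated[worst] = gamma * updated[worst]          # IV_i = gamma * IV_i
    # Nodes whose score fell below the specified value leave the list.
    remaining = {node: s for node, s in updated.items() if s >= min_score}
    # The list is dissolved when fewer than 2/3 of the original nodes remain.
    dissolved = 3 * len(remaining) < 2 * original_size
    return packer, remaining, dissolved
```

With three nodes, losing one node leaves exactly 2/3 of the list, so the list survives; losing a second node dissolves it and a new list would have to be generated.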
In the file block synchronization method based on a distributed cloud platform, an incremental synchronization backup tool is designed and implemented in a layered, modular manner. The bottom layer is responsible for data storage, communication, and data transmission. To meet real-time synchronization requirements, files are monitored with the inotify mechanism. Core data synchronization achieves efficient incremental synchronization by means of the RAMM algorithm and a partition-free single-hash Bloom filter. A control module is responsible for overall control and scheduling, logging, exception handling, and the like. Increments are located by partitioning files into blocks, and the file blocks of the incremental part are synchronized through a file block layer. In addition, the invention incorporates the current distributed blockchain architecture: the consensus layer uses a consensus mechanism to address the lack of trust between nodes, so that the two parties to a transaction can reach agreement without the participation of a third party, and the information is maintained jointly by all the file blocks.
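The Bloom-filter comparison of verification codes mentioned above can be sketched as follows. This is a simplified illustration, not the specific partition-free single-hash variant named in the text: it derives up to four bit positions from a single MD5 digest, and the class name, sizes, and helper names are assumptions.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter whose bit positions all come from one MD5 digest."""

    def __init__(self, num_bits: int = 1024, num_probes: int = 4):
        if not 1 <= num_probes <= 4:
            raise ValueError("an MD5 digest yields at most four 32-bit words")
        self.num_bits = num_bits
        self.num_probes = num_probes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        digest = hashlib.md5(item).digest()          # 16 bytes = four 32-bit words
        for i in range(self.num_probes):
            word = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield word % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def strong_checksum(block: bytes) -> bytes:
    """MD5 verification code of one block."""
    return hashlib.md5(block).digest()


def differing_indexes(client_checksums, server_blocks, num_bits: int = 1024):
    """Server side: indexes of old blocks whose checksum is absent from the client's set."""
    bloom = BloomFilter(num_bits)
    for checksum in client_checksums:
        bloom.add(checksum)
    return [i for i, block in enumerate(server_blocks)
            if strong_checksum(block) not in bloom]
```

A Bloom filter never reports a false negative, so an old block that matches a client block is never flagged as different; it can report a false positive, so a changed block may occasionally pass as unchanged unless the filter is sized generously or a second check is applied.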
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a distributed cloud platform-based archive block synchronization method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
Referring to Fig. 1, the present application claims a file block synchronization method based on a distributed cloud platform, which uses a distributed blockchain network to complete file block synchronization and is characterized by comprising:
the client and the server communicate with each other to guarantee the transmission of file blocks, using TCP/IP connections and multithreading to improve transmission efficiency;
the data layer controls and manages all the modules and performs scheduling; it feeds back and processes fault information generated by the modules and, once a fault occurs, notifies the fault module to handle it, so as to keep the client and the server running normally and continuously;
the changes to a specified directory or file block are monitored in real time; when a monitored file block changes, the changes are placed in a queue in the form of ledger entries to await control processing; this comprises at least directory or file block monitoring and ledger queue processing;
the application layer locates increments by partitioning files into blocks and synchronizes the file blocks of the incremental part through a file block layer, which comprises at least file block control, file block partitioning, and partition verification;
the consensus layer uses a consensus mechanism to address the lack of trust between nodes, so that the two parties to a transaction can reach agreement without the participation of a third party, and the information is maintained jointly by all the file blocks.
Further, the client and the server communicating with each other to guarantee the transmission of file blocks, using TCP/IP connections and multithreading to improve transmission efficiency, further comprises:
initialization starts by reading the configuration file to obtain at least the IP address and network port;
a main thread, mainly responsible for the network communication connection, is created by the createMainThread() function; the main thread runs, listens on the obtained network port, and waits for a response message from the server;
when the server responds to a request from the client, the main thread responds to the request by creating a sub-thread, which is mainly responsible for transferring file blocks between the two ends;
the sub-thread is closed after the transfer finishes, while the main thread keeps listening on the port and waits for the next connection.
Furthermore, the data layer controlling and managing all the modules, performing scheduling, feeding back and processing fault information generated by the modules, and notifying the fault module to handle a fault once it occurs, so as to keep the client and the server running normally and continuously, further comprises:
the data layer is responsible for coordination and cooperation among all the modules during operation, for logging, and for fault handling;
log messages are generated, the log recording conditions that occur during operation and including at least the system run log and the file block change log; fault messages are processed; a message queue is defined, and the API is used to manage and maintain it;
initialization is performed with the initControl() function, after which the message queue is created and the necessary global variables are set and initialized;
a specified message is received with the msgrcv function and then responded to;
if a message has been received, its type is determined next; if not, waiting continues;
the type of the received message is determined and processing proceeds accordingly: a log-type message is passed to the log sub-module, and a fault-type message is passed to the fault sub-module;
messages are received repeatedly until the process is actively terminated;
generating the log messages, the log recording conditions that occur during operation and including at least the system run log and the file block change log, further comprises the following log message processing:
the module is initialized, which includes initializing the various parameters and reading the file paths of the system log and the file block change log;
the log type is determined from the information read, and the detailed content of the log is parsed;
the corresponding log file is written according to the type: if the log type is file block change, the entry is recorded in the file block change log; otherwise it is recorded in the system log;
the content is recorded in the corresponding log in the format that matches the log type;
the processing of fault messages further comprises:
first, the module is initialized, including the required parameters and file paths;
after a fault message is received, its fault type and detailed fault description are obtained, and different handling is performed for different fault types;
the fault types and contents are recorded in the system log in the prescribed format, so that the source of a fault can be found and repaired more quickly during inspection;
if the fault is judged to be serious and cannot be resolved by the existing handling methods, the administrator is notified to carry out inspection and maintenance.
Further, monitoring the changes to a specified directory or file block in real time, placing the changes in a queue in the form of ledger entries when a monitored file block changes, and awaiting control processing, which comprises at least directory or file block monitoring and ledger queue processing, further comprises:
the objects to be monitored are read from the configuration file; if an object is a file block, it is monitored directly; if it is a directory, all of its subdirectories are traversed recursively and a monitoring instance is created for each of them;
epoll is used with ledger-event-driven response: only file descriptors on which a ledger event has been triggered are processed, and when a readable or writable event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write;
if the read or write is not completed this time, the next call to epoll_wait() does not notify the handler again; that is, it notifies only once, until a second readable or writable event occurs on the file descriptor (epoll's edge-triggered behavior).
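The first monitoring step, deciding what to watch, can be sketched with Python's standard library. Python has no built-in inotify binding, so this sketch only collects the paths that would each need a monitoring instance; the function name `watch_targets` is hypothetical, and the call that actually registers a watch is not shown.

```python
import os


def watch_targets(path: str) -> list:
    """Return the paths to monitor: the file itself, or a directory and all its subdirectories."""
    if os.path.isfile(path):
        return [path]                     # a file is monitored directly
    targets = []
    # os.walk visits the directory and, recursively, every subdirectory.
    for dirpath, _dirnames, _filenames in os.walk(path):
        targets.append(dirpath)
    return targets
```

One watch per directory is collected because an inotify watch on a directory is not recursive: it reports events for the directory's own entries but not for the contents of its subdirectories.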
Furthermore, the application layer locates the increment by partitioning file blocks and synchronizes the incremental file blocks through the file block layer, which comprises at least file block control, file block partitioning and partition checking, and further includes:
reading data from the initial byte of the file;
storing the read data at the tail of a check interval, where the check interval is defined as a window of length len;
judging whether the amount of data in the check interval equals len; if not, continuing to execute the previous step, and if so, entering the next step;
calculating the total number n of 1 bits over all bytes of data in the check interval and judging whether n is greater than or equal to the set threshold; if so, the tail of the check interval is determined to be a block boundary, the partitioning being done with the RAMM variable-length algorithm;
sliding the check interval one byte toward the end of the file, calculating again the total number n of 1 bits over all bytes of data in the check interval, and judging whether n is greater than threshold; if so, the tail of the check interval is determined to be a block boundary, and if not, entering the next step;
judging whether the distance between the tail of the current check interval and the right boundary of the previous block is greater than or equal to the threshold change interval d; if not, re-entering the previous step; if so, subtracting 1 from the current threshold and then re-entering the previous step, until a block boundary is found;
when a block boundary is found, resetting the current threshold to its initial value; calculating a strong checksum for each block using MD5, denoted G1, G2, G3, G4 and G5; the client sends all checksums to the server;
repeating the above steps until the tail of the check interval reaches the last byte of the file;
taking the tail data that lies at the end of the file and does not satisfy the blocking condition as a single block, where not satisfying the blocking condition means that the data at the end of the file contains no interval of length len in which the total number n of 1 bits is greater than or equal to the current threshold;
the server partitions the old file in the same way with the RAMM algorithm into blocks Grid1', Grid2', Grid3', Grid4' and Grid5', and calculates the strong checksums G1', G2', G3', G4' and G5' of the blocks in the same way;
generating a bloom filter from the received checksums G1, G2, G3, G4 and G5 sent by the client, and then using the bloom filter to judge whether G1', G2', G3', G4' and G5' differ;
sending the difference information to the client to request that the client send the differing file blocks;
the client receives the request and sends the corresponding differing file blocks to the server according to the difference information;
after the server receives the differing file blocks, it recombines them with the old file blocks to complete the synchronization of the file blocks;
for a deletion, if k file blocks are deleted within the i-th file block and the deletion does not extend into the next file block, the block information of the i-th file block comprises the check value and length of the i-th file block, and the block information of the remaining file blocks comprises the offset value of each file block; if the deletion of k file blocks in the i-th file block extends into the next file block, or even as far as the j-th file block, the block information of the i-th file block comprises the check value and length of the i-th file block, the block information of the (i+1)-th file block comprises the check value, length, offset value and index number of the (i+1)-th file block, and the block information of the remaining file blocks comprises the offset value and index number of each file block.
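The partitioning and comparison steps above can be sketched as follows. This is an illustrative reading rather than the RAMM algorithm itself: the default values of `win_len`, `threshold` and `d`, restarting the window after each boundary, using `>=` for every boundary test, and the SHA-256 based `BloomFilter` class are all assumptions made for the example, and every function name is hypothetical. Because a bloom filter can return false positives, the sketch also derives the blocks to request by exact comparison.

```python
import hashlib


def chunk_boundaries(data, win_len=4, threshold=20, d=16):
    """Return the exclusive end offset of every block of `data`."""
    boundaries = []
    last = 0                  # right boundary of the previous block
    current = threshold       # current, possibly lowered, threshold
    end = last + win_len      # tail of the check interval (exclusive)
    while end <= len(data):
        # n = total number of 1 bits in the check interval
        n = sum(bin(b).count("1") for b in data[end - win_len:end])
        if n >= current:
            boundaries.append(end)
            last, current = end, threshold   # reset threshold at a boundary
            end = last + win_len             # assumption: refill the window
            continue
        if end - last >= d:
            current -= 1                     # relax threshold beyond distance d
        end += 1                             # slide the window by one byte
    if last < len(data):
        boundaries.append(len(data))         # tail data becomes a single block
    return boundaries


def strong_checksums(data, **params):
    """MD5 hex digest of each block (G1, G2, ... in the text)."""
    sums, start = [], 0
    for end in chunk_boundaries(data, **params):
        sums.append(hashlib.md5(data[start:end]).hexdigest())
        start = end
    return sums


class BloomFilter:
    """Small bloom filter; bit positions are derived from SHA-256."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def stale_server_blocks(client_sums, server_sums):
    """Indices of old server blocks whose checksum is definitely not on the client."""
    bloom = BloomFilter()
    for s in client_sums:
        bloom.add(s)
    return [i for i, s in enumerate(server_sums) if s not in bloom]


def blocks_to_request(client_sums, server_sums):
    """Indices of client blocks the server lacks (exact comparison)."""
    have = set(server_sums)
    return [i for i, s in enumerate(client_sums) if s not in have]
```

A server following this sketch would call `strong_checksums` on its old file, pass both checksum lists to `blocks_to_request`, and send the resulting indices to the client as the difference information.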
Further, the consensus layer uses a consensus mechanism to address the lack of trust between nodes, so that the two transacting parties can reach consensus without the participation of a third party and the information is maintained jointly by all file blocks; the method further comprises the following steps:
selecting a subset of trusted nodes to form a verification node list L, $P_L=\{P_1,P_2,P_3,\dots,P_N\}$, and giving each node $P_i$ an initial score $IV_i$. Each node must serve other nodes to maintain its score. In each round of consensus, the verification node that packed the optimal file block is selected, and the score of the verification node that packed the worst file block is reduced by a coefficient $\gamma$, i.e. $IV_i=\gamma IV_i$, $\gamma\in(0,1)$. When the score of a node in the verification node list L falls below a specified value, the node is removed from the list; when fewer than 2/3 of the nodes remain in the list, the list is dissolved and a new verification node list L is generated.
Further, the consensus process takes place in a dynamic data storage system, where $P_L=\{P_1,P_2,P_3,\dots,P_N\}$ is the set of verification nodes and the candidate set of combinations to be verified for $P_N$, $C(P_N)$, is
[formula image BDA0003123974160000091, not reproduced]
The block of packed file blocks submitted by a terminal is $B_i=\{p_{x1},p_{x2},\dots,p_{xm}\}$, $p_{xm}\in C(P_L)$, and the other verification combinations obtained, together with the revenue set, are denoted $O_i=\{\phi_{i1},\phi_{i2},\dots,\phi_{im}:\mu_i\}$. For the file block $B_i$ packed by a terminal $P_i$, within each verification combination $(\phi_{i1},\phi_{i2},\dots,\phi_{im})$ the verification by any participating node $P_K$ of the file block $B_i$ submitted by $P_i$ is denoted $\phi_{ik}$.
The specific steps of consensus include:
(1) generating the verification node list L; if, for the earliest occurring $P_i\in P_L$, $\mu_i(\phi_{i1},\phi_{i2},\dots,\phi_{im})=n$, selecting the file block packed by $P_i$ as the optimal file block and going to step (4); otherwise, executing step (2);
(2) if, under the condition in [formula image BDA0003123974160000092, not reproduced], $n>\mu_i(\phi_{i1},\phi_{i2},\dots,\phi_{im})>\mu_j(\phi_{j1},\phi_{j2},\dots,\phi_{jm})$, selecting the file block packed by $P_i$ as the optimal file block and going to step (4); otherwise, executing step (3);
(3) if, under the condition in [formula image BDA0003123974160000101, not reproduced], $n>\mu_i(\phi_{i1},\phi_{i2},\dots,\phi_{im})=\mu_j(\phi_{j1},\phi_{j2},\dots,\phi_{jm})>\mu_k(\phi_{k1},\phi_{k2},\dots,\phi_{km})$, then selecting, from $P_i$ and $P_j$, the file block packed by the verification node whose value $\mu_i$ arrived earliest as the optimal file block;
(4) if, under the condition in [formula image BDA0003123974160000102, not reproduced], $\mu_i(\phi_{i1},\phi_{i2},\dots,\phi_{im})<\mu_j(\phi_{j1},\phi_{j2},\dots,\phi_{jm})$, selecting the file block packed by $P_i$ as the worst file block and performing $IV_i=\gamma IV_i$, $\gamma\in(0,1)$, to reduce the node's score; with the specified score value set to $\varepsilon$, if $IV_i<\varepsilon$, node $P_i$ is judged to have failed and is removed from list L.
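One possible reading of steps (1) to (4) is that the optimal file block is the one with the highest revenue value, with ties going to the earliest arrival, and the worst file block is the one with the lowest revenue value. The sketch below encodes only that reading; the formula images that state the exact conditions are not available, so the `Submission` record, the function names, and the tie-break for the worst block (latest arrival) are assumptions.

```python
from typing import NamedTuple


class Submission(NamedTuple):
    node: str      # verification node that packed the file block
    arrival: int   # arrival order, smaller means earlier
    mu: int        # revenue value of the node's verification combination


def select_best(submissions):
    """Highest mu wins; among equal mu, the earliest arrival wins."""
    return min(submissions, key=lambda s: (-s.mu, s.arrival))


def select_worst(submissions):
    """Lowest mu loses; among equal mu, the latest arrival is chosen (assumed)."""
    return min(submissions, key=lambda s: (s.mu, -s.arrival))
```

The node returned by `select_worst` is the one whose score would then be multiplied by $\gamma$ in step (4).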
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A file block synchronization method based on a distributed cloud platform, which uses a distributed blockchain network to complete file block synchronization, characterized by comprising the following steps:
the client and the server communicate and interact to ensure the transmission of file blocks, using TCP/IP links in a multithreaded manner to improve transmission efficiency;
the data layer controls and manages the whole block and completes scheduling; fault information generated by a block is fed back and processed, and once a fault occurs the fault block is notified to handle it, so as to ensure the normal and continuous operation of the client and the server;
monitoring the change status of a designated directory or file block in real time; when a monitored file block changes, the change is placed in a queue in ledger form to await control processing, comprising at least directory or file block monitoring and ledger queue processing;
the application layer locates the increment by partitioning file blocks and synchronizes the incremental file blocks through a file block layer, which comprises at least file block control, file block partitioning and partition checking;
the consensus layer uses a consensus mechanism to address the lack of trust between nodes, so that the two transacting parties can reach consensus without the participation of a third party, and information is maintained jointly by all file blocks.
2. The file block synchronization method based on a distributed cloud platform according to claim 1, wherein the client and the server communicate and interact to ensure the transmission of file blocks, using TCP/IP links in a multithreaded manner to improve transmission efficiency, further comprising:
initialization starts by reading the information in the configuration file block to obtain at least the IP address and network port information;
creating a main thread through the createMainThread() function, which is mainly responsible for the network communication link; executing the main thread, listening on the obtained network port, and waiting for a response message from the server;
when the server responds to a request from the client, the main thread responds by creating a sub-thread, which is mainly responsible for the transmission of file blocks between the two ends;
the sub-thread is closed after the transmission is finished, while the main thread keeps listening on the port and waits for the next link.
3. The file block synchronization method based on a distributed cloud platform according to claim 1, wherein the data layer controls and manages the whole block and completes scheduling, fault information generated by a block is fed back and processed, and once a fault occurs the fault block is notified to handle it, so as to ensure the normal and continuous operation of the client and the server, further comprising:
being responsible for coordination and cooperation among all blocks during the process, for log recording and for fault handling;
generating log messages, where the log records conditions occurring during operation and comprises at least a system operation log and a file block change log; processing fault messages, defining a message queue, and using the API to manage and maintain the message queue;
initializing with the initControl() function, then creating a message queue, and setting and initializing the necessary global variables;
receiving a specified message with the msgrcv function and then responding;
if a message is received, going on to judge its type; if no message is received, continuing to wait;
judging the type of the received message and proceeding to the next operation according to the type: if the message is of the log type, it is passed to the log sub-block for processing, and if it is of the fault type, it is passed to the fault sub-block for processing;
repeatedly receiving messages until the process is actively terminated;
generating the log message, where the log records conditions occurring during operation and comprises at least a system operation log and a file block change log, wherein the processing of log messages further comprises:
initializing the block, including initializing the various parameters and reading the file block paths of the system log and the file block change log;
judging the type of the log according to the read information and parsing the detailed content of the log;
writing to the corresponding log file block according to the type: if the log is of the file block change type, it is recorded in the file block change log; otherwise it is recorded in the system log;
recording the content in the corresponding log in the format that corresponds to the log type;
the processing of the fault message further comprises:
first initializing the block, including initializing the required parameters and file block paths;
after a fault message is received, obtaining its fault type and detailed fault description, and handling it differently according to the fault type;
recording the fault types and contents in the system log according to the format, so that the fault source can be found and repaired more quickly during inspection;
if the fault is judged to be serious and cannot be resolved by the existing handling methods, administrators are notified to inspect and repair it.
4. The file block synchronization method based on a distributed cloud platform according to claim 1, wherein monitoring the change status of a designated directory or file block in real time, placing the change in a queue in ledger form to await control processing when a monitored file block changes, and comprising at least directory or file block monitoring and ledger queue processing, further comprises:
reading the objects to be monitored from the configuration file block; if an object is a file block, monitoring it directly; if an object is a directory, recursively traversing all of its subdirectories and establishing a monitoring instance for each subdirectory;
using epoll in ledger-driven mode as the response mechanism, so that only file block descriptors triggered by a ledger event are processed; when a readable or writable ledger event occurs on a monitored file block descriptor, epoll_wait() notifies the handler to read or write;
if the file block is not completely read or written this time, the next call to epoll_wait() does not notify the handler again, i.e., the handler is notified only once, until a second readable or writable ledger event appears on the file block descriptor.
5. The file block synchronization method based on a distributed cloud platform according to claim 1, wherein the application layer locates the increment by partitioning file blocks and synchronizes the incremental file blocks through a file block layer, which comprises at least file block control, file block partitioning and partition checking, further comprising:
reading data from the initial byte of the file;
storing the read data at the tail of a check interval, where the check interval is defined as a window of length len;
judging whether the amount of data in the check interval equals len; if not, continuing to execute the previous step, and if so, entering the next step;
calculating the total number n of 1 bits over all bytes of data in the check interval and judging whether n is greater than or equal to the set threshold; if so, the tail of the check interval is determined to be a block boundary, the partitioning being done with the RAMM variable-length algorithm;
sliding the check interval one byte toward the end of the file, calculating again the total number n of 1 bits over all bytes of data in the check interval, and judging whether n is greater than threshold; if so, the tail of the check interval is determined to be a block boundary, and if not, entering the next step;
judging whether the distance between the tail of the current check interval and the right boundary of the previous block is greater than or equal to the threshold change interval d; if not, re-entering the previous step; if so, subtracting 1 from the current threshold and then re-entering the previous step, until a block boundary is found;
when a block boundary is found, resetting the current threshold to its initial value; calculating a strong checksum for each block using MD5, denoted G1, G2, G3, G4 and G5; the client sends all checksums to the server;
repeating the above steps until the tail of the check interval reaches the last byte of the file;
taking the tail data that lies at the end of the file and does not satisfy the blocking condition as a single block, where not satisfying the blocking condition means that the data at the end of the file contains no interval of length len in which the total number n of 1 bits is greater than or equal to the current threshold;
the server partitions the old file in the same way with the RAMM algorithm into blocks Grid1', Grid2', Grid3', Grid4' and Grid5', and calculates the strong checksums G1', G2', G3', G4' and G5' of the blocks in the same way;
generating a bloom filter from the received checksums G1, G2, G3, G4 and G5 sent by the client, and then using the bloom filter to judge whether G1', G2', G3', G4' and G5' differ;
sending the difference information to the client to request that the client send the differing file blocks;
the client receives the request and sends the corresponding differing file blocks to the server according to the difference information;
after the server receives the differing file blocks, it recombines them with the old file blocks to complete the synchronization of the file blocks;
for a deletion, if k file blocks are deleted within the i-th file block and the deletion does not extend into the next file block, the block information of the i-th file block comprises the check value and length of the i-th file block, and the block information of the remaining file blocks comprises the offset value of each file block; if the deletion of k file blocks in the i-th file block extends into the next file block, or even as far as the j-th file block, the block information of the i-th file block comprises the check value and length of the i-th file block, the block information of the (i+1)-th file block comprises the check value, length, offset value and index number of the (i+1)-th file block, and the block information of the remaining file blocks comprises the offset value and index number of each file block.
6. The file block synchronization method based on a distributed cloud platform according to claim 1, wherein the consensus layer uses a consensus mechanism to address the lack of trust between nodes, so that the two transacting parties can reach consensus without the participation of a third party, and information is maintained jointly by all file blocks, further comprising:
selecting a subset of trusted nodes to form a verification node list L, $P_L=\{P_1,P_2,P_3,\dots,P_N\}$, and giving each node $P_i$ an initial score $IV_i$; each node must serve other nodes to maintain its score; in each round of consensus, the verification node that packed the optimal file block is selected, and the score of the verification node that packed the worst file block is reduced by a coefficient $\gamma$, i.e. $IV_i=\gamma IV_i$, $\gamma\in(0,1)$; when the score of a node in the verification node list L falls below a specified value, the node is removed from the list; when fewer than 2/3 of the nodes remain in the list, the list is dissolved and a new verification node list L is generated.
CN202110684525.1A 2021-06-21 2021-06-21 File block synchronization method based on distributed cloud platform Pending CN113572812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110684525.1A CN113572812A (en) 2021-06-21 2021-06-21 File block synchronization method based on distributed cloud platform

Publications (1)

Publication Number Publication Date
CN113572812A true CN113572812A (en) 2021-10-29

Family

ID=78162454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110684525.1A Pending CN113572812A (en) 2021-06-21 2021-06-21 File block synchronization method based on distributed cloud platform

Country Status (1)

Country Link
CN (1) CN113572812A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368388A (en) * 2017-06-20 2017-11-21 华南理工大学 A kind of database real time backup method for monitoring file system change
CN109670950A (en) * 2018-10-29 2019-04-23 平安科技(深圳)有限公司 Transaction monitor method, device, equipment and storage medium based on block chain
WO2019228569A2 (en) * 2019-09-12 2019-12-05 Alibaba Group Holding Limited Log-structured storage systems
CN111367871A (en) * 2020-02-29 2020-07-03 华南理工大学 Method for increment synchronization among files based on SAPCI (software application programming interface) variable-length blocks
CN111966529A (en) * 2020-07-14 2020-11-20 上海浩霖汇信息科技有限公司 Method and system for real-time incremental synchronous backup of database files

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
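Qiao Rui (乔蕊): "Research on Key Technologies of Data Storage and Access Control for IoT Consortium Blockchains", Strategic Support Force Information Engineering University *
Qiao Rui (乔蕊) et al.: "Dynamic Data Provenance Mechanism for the IoT Based on Consortium Blockchain", Journal of Software (软件学报) *
Yu Kuai (俞快): "Research and Implementation of Incremental File Synchronization Based on Data Chunking", South China University of Technology *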

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898506A (en) * 2022-05-12 2022-08-12 南京百米需供应链管理有限公司 Intelligent cabinet control system and control method
CN114898506B (en) * 2022-05-12 2023-06-27 南京百米需供应链管理有限公司 Intelligent cabinet control system and control method
CN115631065A (en) * 2022-12-21 2023-01-20 国网江苏省电力有限公司营销服务中心 File accuracy detection method based on front-end electricity utilization service change

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 769, building 2, East Ring Road, Yanqing Park, Zhongguancun, Yanqing District, Beijing 102101

Applicant after: ZHONGDUN innovative digital technology (Beijing) Co.,Ltd.

Address before: Room 769, building 2, East Ring Road, Yanqing Park, Zhongguancun, Yanqing District, Beijing 102101

Applicant before: ZHONGDUN innovation archives management (Beijing) Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20211029