CN106209974A - A kind of method of data synchronization, equipment and system - Google Patents

Publication number
CN106209974A
Authority
CN
China
Prior art keywords
data block
data
transmitted
block
check code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610451188.0A
Other languages
Chinese (zh)
Other versions
CN106209974B (en
Inventor
张建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201610451188.0A priority Critical patent/CN106209974B/en
Publication of CN106209974A publication Critical patent/CN106209974A/en
Application granted granted Critical
Publication of CN106209974B publication Critical patent/CN106209974B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • H04L67/565Conversion or adaptation of application format or content
    • H04L67/5651Reducing the amount or size of exchanged application data

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data synchronization method, device and system. The method includes: configuring a corresponding sampling period for each type of service; when the sampling period corresponding to a target type service is reached, collecting the data that the target type service needs to transmit; dividing the data to be transmitted into at least one data block; determining, among the at least one data block, a first data block that has not been transmitted and a second data block that has already been transmitted, and determining a first data index of the first data block and a second data index of the second data block; and sending the first data block, the first data index and the second data index to a data center, so that the data center stores the at least one data block. This scheme reduces the amount of data transmitted and thereby reduces network bandwidth usage.

Description

Data synchronization method, device and system
Technical Field
The invention relates to the technical field of cloud computing, in particular to a data synchronization method, equipment and system.
Background
Cloud computing uses virtualization to pool resources such as computing, storage and networking, and provides shared software and hardware to users as services over the Internet. The PAAS (Platform as a Service) platform is one service type of cloud computing: it provides software deployment, operation and maintenance to software developers as an on-demand service, and has been a very popular research direction in recent years.
In a PAAS platform, the data generated on each physical machine needs to be synchronously stored in order to implement system services and various customized services. When this data is synchronized in real time, each physical machine collects the data it generates and sends it to the data center over the network, and the data center stores the data in a real-time database.
Network resources are especially precious for a PAAS platform. In the prior art, each physical machine generates a large amount of data, and synchronizing it occupies a large amount of network bandwidth.
Disclosure of Invention
The embodiment of the invention provides a data synchronization method, equipment and a system, which are used for reducing the occupation amount of network bandwidth.
In a first aspect, an embodiment of the present invention provides a data synchronization method, which is applied to a physical machine and respectively configures corresponding sampling periods for different types of services; the method comprises the following steps:
when a sampling period corresponding to a target type service is reached, acquiring data required to be transmitted by the target type service;
dividing the data to be transmitted into at least one data block;
determining a first data block which has not been transmitted and a second data block which has been transmitted in the at least one data block, and determining a first data index of the first data block and a second data index of the second data block;
and sending the first data block, the first data index and the second data index to a data center so that the data center stores the at least one data block.
Preferably,
before dividing the data to be transmitted into at least one data block, further comprising: calculating the division length by using a first formula;
the first formula includes:
$$k(s_i) = \frac{\mathrm{period}(s_i)}{\mathrm{status}(s_i)} \cdot \lambda + \mathrm{default\_size}$$
where k(s_i) denotes the division length corresponding to s_i; s_i denotes the i-th service provided by the platform; period(s_i) denotes the sampling period configured for s_i; status(s_i) denotes the state-change granularity evaluation of s_i; λ denotes the influence factor and is a known constant; default_size denotes the default length of a data block;
dividing the data to be transmitted into at least one data block, including: dividing the data to be transmitted into the at least one data block by using the division length;
and/or,
after sending the first data block, the first data index and the second data index to a data center, further comprising: locally storing the at least one data block and the data index corresponding to each of the at least one data block;
and/or,
the determining that a first data block has not been transmitted and a second data block has been transmitted in the at least one data block includes:
determining a first check code corresponding to each data block in each locally stored data block;
calculating a second check code corresponding to each data block in the at least one data block;
traversing the first check code according to the second check code; and taking the data block corresponding to the second check code in the first check code as the second data block, and taking the data block corresponding to the second check code which is not in the first check code as the first data block.
Preferably, calculating the second check code corresponding to each of the at least one data block includes:
calculating the second check code corresponding to each of the at least one data block by using a second formula, a third formula and a fourth formula;
the second formula includes:
$$A(1,k) = \left(\sum_{j=1}^{k} \mathrm{data}[j]\right) \bmod M;$$
the third formula includes:
$$B(1,k) = \left(\sum_{j=1}^{k} (k-j+1)\,\mathrm{data}[j]\right) \bmod M;$$
the fourth formula includes:
$$\mathrm{Adler32}(1,k) = A(1,k) + 2^{16}\,B(1,k);$$
where Adler32(1,k) denotes the second check code; A(1,k) denotes the first intermediate parameter and B(1,k) the second intermediate parameter; data[j] denotes the j-th byte of the current data block; M is a known constant; k is the number of bytes in the current data block.
In a second aspect, an embodiment of the present invention further provides a data synchronization method, applied to a data center, including:
acquiring a first data block aiming at a target type service, a first data index corresponding to the first data block and a second data index corresponding to a second data block, which are sent by a current physical machine; the first data block is a data block which is not transmitted by the current physical machine, and the second data block is a data block which is transmitted by the current physical machine;
determining a second data block according to each third data block included in the locally stored data copy, a third data index corresponding to each third data block and the second data index;
storing the first data block and the second data block.
Preferably, further comprising: creating the following target number of data copies of the stored first data block and second data block:
where number denotes the target number of data copies to create for the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by data contention when multiple user requests access the first data block and the second data block; all_request denotes the concurrent access volume corresponding to the target type service; a denotes the scale factor and is a known constant; init_size denotes the initial number of data copies.
In a third aspect, an embodiment of the present invention further provides a physical machine, including:
the configuration unit is used for respectively configuring corresponding sampling periods for different types of services;
a collecting unit, configured to collect the data that a target type service needs to transmit when the sampling period corresponding to the target type service is reached;
the dividing unit is used for dividing the data needing to be transmitted into at least one data block;
a determining unit, configured to determine a first data block that has not been transmitted and a second data block that has been transmitted in the at least one data block, and determine a first data index of the first data block and a second data index of the second data block;
a sending unit, configured to send the first data block, the first data index, and the second data index to a data center, so that the data center stores the at least one data block.
Preferably,
further comprising: a calculation unit, configured to calculate the division length by using a first formula;
the first formula includes:
$$k(s_i) = \frac{\mathrm{period}(s_i)}{\mathrm{status}(s_i)} \cdot \lambda + \mathrm{default\_size}$$
where k(s_i) denotes the division length corresponding to s_i; s_i denotes the i-th service provided by the platform; period(s_i) denotes the sampling period configured for s_i; status(s_i) denotes the state-change granularity evaluation of s_i; λ denotes the influence factor and is a known constant; default_size denotes the default length of a data block;
the dividing unit is specifically configured to divide the data to be transmitted into the at least one data block by using the division length;
and/or,
further comprising: a storage unit, configured to locally store the at least one data block and the data index corresponding to each of the at least one data block;
and/or,
the determination unit includes:
the determining module is used for determining a first check code corresponding to each data block in each locally stored data block;
the calculation module is used for calculating a second check code corresponding to each data block in the at least one data block;
the traversing module is used for traversing the first check code according to the second check code; and taking the data block corresponding to the second check code in the first check code as the second data block, and taking the data block corresponding to the second check code which is not in the first check code as the first data block.
In a fourth aspect, an embodiment of the present invention further provides a data center, including:
an acquisition unit, configured to acquire a first data block for a target type service, a first data index corresponding to the first data block and a second data index corresponding to a second data block, all sent by a current physical machine; the first data block is a data block that the current physical machine has not transmitted before, and the second data block is a data block that the current physical machine has already transmitted;
a determining unit, configured to determine, according to each third data block included in the locally stored data copy and a third data index corresponding to each third data block, and according to the second data index, the second data block;
and the storage unit is used for storing the first data block and the second data block.
Preferably, further comprising: a copy unit, configured to create the following target number of data copies of the stored first data block and second data block:
where number denotes the target number of data copies to create for the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by data contention when multiple user requests access the first data block and the second data block; all_request denotes the concurrent access volume corresponding to the target type service; a denotes the scale factor and is a known constant; init_size denotes the initial number of data copies.
In a fifth aspect, an embodiment of the present invention further provides a data synchronization system, including: the data center, and at least one physical machine.
With the data synchronization method, device and system provided by the embodiments of the invention, the collected data to be transmitted is divided into at least one data block. The first data block that has not been transmitted and its first data index, together with the second data index of the second data block that has already been transmitted, are sent to the data center; the second data block itself is not sent. This reduces the amount of data transmitted and, in turn, the network bandwidth occupied.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method provided by one embodiment of the present invention;
FIG. 2 is a flow diagram of another method provided by one embodiment of the present invention;
FIG. 3 is a flow chart of yet another method provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a physical machine according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data center architecture provided by one embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data synchronization system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a data synchronization method, which is applied to a physical machine and respectively configures corresponding sampling periods for different types of services; the method may comprise the steps of:
Step 101: When the sampling period corresponding to a target type service is reached, collect the data that the target type service needs to transmit.
New data may be generated at intervals on each physical machine and stored each time after it is generated. Some data have a requirement on real-time performance, and some data have no requirement on real-time performance, so that the real-time performance of the data needs to be judged.
In this embodiment, sampling periods may be respectively configured for different types of services, so as to perform data acquisition and transmission for the different types of services, thereby implementing classified storage of the services.
Step 102: Divide the data to be transmitted into at least one data block.
To reduce network bandwidth usage, the data to be transmitted is divided into at least one data block. For example, if the data to be transmitted is 1M in size, it may be divided into 10 data blocks.
Step 103: Determine a first data block that has not been transmitted and a second data block that has already been transmitted in the at least one data block, and determine a first data index of the first data block and a second data index of the second data block.
In this embodiment, the data blocks that have already been transmitted to the data center may be stored on the physical machine. By comparing against these locally stored, already-transmitted data blocks, the physical machine determines which of the at least one data block have been transmitted and which have not.
Step 104: Send the first data block, the first data index and the second data index to a data center, so that the data center stores the at least one data block.
To reduce network bandwidth usage, only the first data block, which has not been transmitted, is sent. The second data block, which has already been transmitted, is not sent again; only its second data index is sent to the data center. Network usage is therefore reduced relative to the prior art.
According to the scheme of the embodiment, when the acquired data required to be transmitted is transmitted, the data required to be transmitted is divided into at least one data block, the first data block which is not transmitted in the at least one data block and the first data index thereof and the second data index corresponding to the second data block which is transmitted are transmitted to the data center, and the second data block is not required to be transmitted, so that the data volume in the transmission process can be reduced, and the occupation amount of the network bandwidth can be further reduced.
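Steps 102 to 104 can be illustrated with a minimal Python sketch. The patent does not say how a data index is formed or what the message looks like, so the following are assumptions made only for illustration: a block's data index is the hex MD5 digest of its bytes, the message is a plain dictionary, and the names `split_blocks`, `build_sync_message` and `sent_indexes` are hypothetical.

```python
import hashlib

def split_blocks(data: bytes, block_len: int) -> list[bytes]:
    """Cut data into consecutive blocks of block_len bytes; the last block may be shorter."""
    return [data[i:i + block_len] for i in range(0, len(data), block_len)]

def build_sync_message(data: bytes, block_len: int, sent_indexes: set[str]) -> dict:
    """Build what the physical machine sends to the data center (steps 102 to 104).

    Assumption: a block's data index is the hex MD5 digest of its bytes.
    """
    first_blocks: dict[str, bytes] = {}   # first data index -> first data block
    second_indexes: list[str] = []        # indexes of blocks sent in an earlier round
    for block in split_blocks(data, block_len):
        index = hashlib.md5(block).hexdigest()
        if index in sent_indexes:
            second_indexes.append(index)  # index only, no payload
        else:
            first_blocks[index] = block   # payload travels with its index
    return {"first_blocks": first_blocks, "second_indexes": second_indexes}

# Hypothetical round: the block b"aaaa" was already sent earlier.
sent = {hashlib.md5(b"aaaa").hexdigest()}
message = build_sync_message(b"aaaabbbbcccc", 4, sent)
payload_bytes = sum(len(b) for b in message["first_blocks"].values())
print(payload_bytes)  # 8 of the 12 bytes are sent as block payload
```

In this hypothetical round only two of the three blocks carry a payload; the third is represented by its index alone, which is where the bandwidth saving comes from.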
In one embodiment of the present invention, the data to be transmitted may be divided as follows. Before the division, the method further includes: calculating the division length by using a first formula;
the first formula includes:
$$k(s_i) = \frac{\mathrm{period}(s_i)}{\mathrm{status}(s_i)} \cdot \lambda + \mathrm{default\_size}$$
where k(s_i) denotes the division length corresponding to s_i; s_i denotes the i-th service provided by the platform; period(s_i) denotes the sampling period configured for s_i; status(s_i) denotes the state-change granularity evaluation of s_i; λ denotes the influence factor and is a known constant; default_size denotes the default length of a data block. Once the type of the service is determined, status(s_i) is a known parameter. default_size is included to ensure that the resulting division length is at least the default length.
Wherein the partition length may be a number of bytes, for example, 10 bytes; it may also be a specific space occupation, for example 100 KB.
Dividing the data to be transmitted into at least one data block includes: dividing the data to be transmitted into the at least one data block by using the division length.
For example, if k(s_i) is 10 bytes and the data to be transmitted is 100 bytes, the data can be divided into 10 data blocks of 10 bytes each.
As another example, if k(s_i) is 10 bytes and the data to be transmitted is 99 bytes, the data can be divided into 10 data blocks: each of the first 9 has 10 bytes, and the last has 9 bytes.
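The first formula and the 99-byte example can be reproduced with a short Python sketch. The input values (`period=60`, `status=12`, `lam=1.0`, `default_size=5`) are hypothetical and chosen only so that the division length comes out to 10 bytes; truncating the result to a whole number of bytes is also an assumption, since the patent does not say how a fractional length is handled.

```python
def division_length(period: float, status: float, lam: float, default_size: int) -> int:
    """First formula: k(s_i) = period(s_i) / status(s_i) * lambda + default_size."""
    # Assumption: the scaled term is truncated to a whole number of bytes.
    return int(period / status * lam) + default_size

def split_blocks(data: bytes, k: int) -> list[bytes]:
    """Cut data into consecutive blocks of k bytes; the last block may be shorter."""
    return [data[i:i + k] for i in range(0, len(data), k)]

# Hypothetical parameters: 60 / 12 * 1.0 + 5 = 10 bytes per block.
k = division_length(period=60, status=12, lam=1.0, default_size=5)
blocks = split_blocks(bytes(99), k)       # 99 bytes of data to transmit
print(k, len(blocks), len(blocks[-1]))    # 10 10 9
```

Because default_size is added rather than multiplied, the division length cannot fall below the default length as long as the first term is non-negative.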
In an embodiment of the present invention, to ensure that fewer data blocks need to be transmitted the next time data blocks are sent, after the first data block, the first data index and the second data index are sent to the data center, the method further includes: locally storing the at least one data block and the data index corresponding to each of the at least one data block.
In an embodiment of the present invention, determining a first data block that has not been transmitted and a second data block that has been transmitted in the at least one data block may include:
determining a first check code corresponding to each data block in each locally stored data block;
calculating a second check code corresponding to each data block in the at least one data block;
traversing the first check code according to the second check code; and taking the data block corresponding to the second check code in the first check code as the second data block, and taking the data block corresponding to the second check code which is not in the first check code as the first data block.
The calculation mode of the first check code corresponding to each data block in each locally stored data block is the same as the calculation mode of the second check code corresponding to each data block in the at least one data block.
For example, the at least one data block includes 10 data blocks: data block 1, data block 2, data block 3, ..., data block 10, whose second check codes are A1, A2, A3, ..., A10 respectively. If the locally stored first check codes include A1, A2 and A3, then data block 1, data block 2 and data block 3 are data blocks that have already been transmitted, and data block 4, data block 5, data block 6, ..., data block 10 are data blocks that have not been transmitted.
In an embodiment of the present invention, the second check code corresponding to each of the at least one data block may be calculated as follows:
calculating a second check code corresponding to each data block in the at least one data block by using a second formula, a third formula and a fourth formula;
the second formula includes:
$$A(1,k) = \left(\sum_{j=1}^{k} \mathrm{data}[j]\right) \bmod M;$$
the third formula includes:
$$B(1,k) = \left(\sum_{j=1}^{k} (k-j+1)\,\mathrm{data}[j]\right) \bmod M;$$
the fourth formula includes:
$$\mathrm{Adler32}(1,k) = A(1,k) + 2^{16}\,B(1,k);$$
where Adler32(1,k) denotes the second check code; A(1,k) denotes the first intermediate parameter and B(1,k) the second intermediate parameter; data[j] denotes the j-th byte of the current data block; M is a known constant; k is the number of bytes in the current data block.
If a second check code calculated in this way differs from every locally stored first check code, the data block corresponding to that second check code has not been transmitted. If the second check code is identical to a locally stored first check code, the corresponding data block may have been transmitted, and a further check with a unique check value, for example an MD5 value, is needed: if the MD5 values are also identical, the data block has been transmitted; otherwise it has not.
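The two-level check can be sketched in Python as follows. The patent only states that M is a known constant; the value 65521 used here is an assumption, as are the function names and the layout of the local store (a mapping from weak check code to the MD5 digests of stored blocks). Note that the formulas as written in this document have no initial offset, so the result generally differs from `zlib.adler32`.

```python
import hashlib

M = 65521  # assumption: the patent only says M is a known constant

def weak_check(block: bytes) -> int:
    """Check code built from the second, third and fourth formulas."""
    k = len(block)
    a = sum(block) % M                                                 # second formula
    b = sum((k - j + 1) * block[j - 1] for j in range(1, k + 1)) % M   # third formula
    return a + (b << 16)                                               # fourth formula

def already_transmitted(block: bytes, stored: dict[int, set[str]]) -> bool:
    """stored maps a weak check code to the MD5 digests of locally stored blocks."""
    candidates = stored.get(weak_check(block))
    if candidates is None:
        return False  # weak code never seen: the block has not been transmitted
    # Weak codes can collide, so confirm with the unique check value (MD5).
    return hashlib.md5(block).hexdigest() in candidates

# Hypothetical local store holding one previously transmitted block.
stored = {weak_check(b"abc"): {hashlib.md5(b"abc").hexdigest()}}
print(already_transmitted(b"abc", stored), already_transmitted(b"abd", stored))  # True False
```

The cheap check code filters out most blocks quickly, and the more expensive MD5 comparison runs only when the cheap codes match.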
In an embodiment of the present invention, the unique check value may also be used directly as the second check code to determine the first data block and the second data block.
Referring to fig. 2, an embodiment of the present invention further provides a data synchronization method applied to a data center, including:
Step 201: Acquire a first data block for a target type service, a first data index corresponding to the first data block and a second data index corresponding to a second data block, all sent by a current physical machine; the first data block is a data block that the current physical machine has not transmitted before, and the second data block is a data block that the current physical machine has already transmitted.
Step 202: Determine the second data block according to each third data block included in the locally stored data copy, the third data index corresponding to each third data block, and the second data index.
The data center locally stores data copies of all the data blocks; data replication is a general technique for improving data access efficiency, system fault tolerance and load balancing. A data copy includes not only each third data block but also the third data index corresponding to each third data block. The data center searches the third data indexes for one that is the same as the second data index and uses it to locate the corresponding data block; the data block so determined is the second data block that has already been transmitted.
Step 203: storing the first data block and the second data block.
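Steps 201 to 203 on the data center side can be sketched as below. The dictionary-based data copy, the function name `handle_sync_message` and the choice to raise an error for an unknown index are all assumptions for illustration; the patent does not describe the storage layout or error handling.

```python
def handle_sync_message(first_blocks: dict[str, bytes],
                        second_indexes: list[str],
                        data_copy: dict[str, bytes]) -> dict[str, bytes]:
    """Return the blocks to store for this round (steps 201 to 203).

    data_copy maps each third data index to its third data block.
    """
    to_store = dict(first_blocks)            # first data blocks arrive with their payload
    for index in second_indexes:
        # Step 202: find the third data index equal to the second data index.
        if index not in data_copy:
            raise KeyError(f"no stored block for index {index}")
        to_store[index] = data_copy[index]   # the second data block comes from the local copy
    data_copy.update(first_blocks)           # keep the copy current for later rounds
    return to_store

# Hypothetical round: block "i1" was received earlier, block "i2" is new.
copy = {"i1": b"aaa"}
stored_now = handle_sync_message({"i2": b"bbb"}, ["i1"], copy)
print(sorted(stored_now))  # ['i1', 'i2']
```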
In this embodiment, a threshold may also be set on the amount of data corresponding to a service s_i, for example 100; when the number of data blocks included in the service reaches 100, the data blocks corresponding to the service are stored in cloud storage.
Further, the data center stores the data needing real-time property into a real-time database and stores the data not needing real-time property into a common database by judging the real-time property requirement of the data.
In an embodiment of the present invention, since an unreasonable number of data copies puts storage pressure on the platform and wastes a large amount of storage space, the number of data copies may be determined as follows: creating the following target number of data copies of the stored first data block and second data block:
where number denotes the target number of data copies to create for the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by data contention when multiple user requests access the first data block and the second data block; all_request denotes the concurrent access volume corresponding to the target type service; a denotes the scale factor and is a known constant; init_size denotes the initial number of data copies.
The following takes a PAAS platform as the service-providing platform. The PAAS platform includes at least one physical machine and a data center, and the data synchronization process is described in detail through the interaction between one physical machine and the data center. Referring to FIG. 3, the method may include the following steps:
Step 301: The physical machine configures a corresponding sampling period for each type of service.
New data may be generated at intervals on each physical machine and stored each time after it is generated.
Assuming that 4 service types are included, the four service types are configured as shown in table 1 below:
Table 1:

| Type of service | Sampling period (s) |
| --- | --- |
| Type 1 | 60 |
| Type 2 | 180 |
| Type 3 | 300 |
| Type 4 | 90 |
In this embodiment, different acquisition units may be respectively utilized to perform data acquisition on the four types of services, and each acquisition unit performs acquisition of service data of a corresponding type each time a corresponding sampling period is reached according to the sampling period configured in table 1.
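How the sampling periods of Table 1 could drive collection can be sketched in Python. The patent does not describe the scheduling mechanism, so this sketch assumes a shared counter of elapsed seconds and treats a period as reached whenever the counter is a positive multiple of it; the names `SAMPLING_PERIODS` and `due_services` are hypothetical.

```python
# Sampling periods in seconds, taken from Table 1.
SAMPLING_PERIODS = {"Type 1": 60, "Type 2": 180, "Type 3": 300, "Type 4": 90}

def due_services(elapsed_seconds: int, periods: dict[str, int] = SAMPLING_PERIODS) -> list[str]:
    """Return the service types whose sampling period is reached at this instant.

    Assumption: a period is reached when elapsed_seconds is a positive multiple of it.
    """
    if elapsed_seconds <= 0:
        return []
    return [name for name, period in periods.items() if elapsed_seconds % period == 0]

print(due_services(180))  # ['Type 1', 'Type 2', 'Type 4']
```

With these periods, all four service types are collected together only every 900 seconds, the least common multiple of 60, 180, 300 and 90.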
In this embodiment, the user also needs to register monitoring information, such as a monitoring index, a monitoring object, a monitoring mode, and the like, with the Agent of the data center.
To provide better extensibility, the Agent in a physical machine offers two extension modes. In the first, the user passes a monitoring script to the Agent, and the Agent calls a script-running monitoring module to collect service state information. In the second, service state information is obtained by calling and extending the Agent's API.
Step 302: When the sampling period corresponding to a target type service is reached, collect the data that the target type service needs to transmit.
In this embodiment, the physical machine may include a plurality of virtual machines, and each virtual machine may be configured with an Agent, and the Agent may implement data collection and processing.
Step 303: the partition length is calculated according to the service type.
In the present embodiment, the division length can be calculated by the following formula (1):
$$k(s_i) = \frac{period(s_i)}{status(s_i)} \cdot \lambda + default\_size \tag{1}$$
where $k(s_i)$ denotes the division length corresponding to $s_i$; $s_i$ denotes the i-th service provided by the platform; $period(s_i)$ denotes the sampling period configured for $s_i$; $status(s_i)$ denotes the evaluation of the state change granularity of $s_i$; $\lambda$ denotes the influence factor and is a known constant; $default\_size$ denotes the default length of a data block.
Once the type of the service is determined, $status(s_i)$ is a known parameter.
The term default_size is included to ensure that the finally determined division length is at least the default length. For example, the default length may be set to a minimum of 5 bytes.
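A minimal Python sketch of formula (1) is given below for illustration. It assumes that the formula reads as the ratio of the sampling period to the state change granularity, multiplied by the influence factor, plus the default length. Rounding the scaled ratio up to a whole number of bytes is also an assumption, as the embodiment does not state how fractional lengths are handled.

```python
import math


def division_length(period: float, status: float, lam: float, default_size: int) -> int:
    """Division length per formula (1): period / status * lambda + default_size.

    The scaled ratio is rounded up so that the result is a whole number of
    bytes (an assumption of this sketch).
    """
    if status <= 0:
        raise ValueError("status must be positive")
    return math.ceil(period / status * lam) + default_size
```

With a sampling period of 60 s, a hypothetical granularity of 4, an influence factor of 2 and a default length of 5 bytes, the division length is 35 bytes. With an influence factor of 0 the result falls back to the default length.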
Step 304: Divide the data to be transmitted into m data blocks using the calculated division length.
Assume that the data to be transmitted by the target type service is divided into m data blocks, referred to below as data block 1, data block 2, …, data block m.
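Step 304 may be sketched as fixed-length chunking. The helper name `split_into_blocks` is hypothetical, and letting the last block be shorter than the division length is an assumption of this example.

```python
from typing import List


def split_into_blocks(data: bytes, block_length: int) -> List[bytes]:
    """Split data into consecutive blocks of block_length bytes.

    The final block may be shorter when len(data) is not a multiple of
    block_length.
    """
    if block_length <= 0:
        raise ValueError("block_length must be positive")
    return [data[i:i + block_length] for i in range(0, len(data), block_length)]
```

Here m is simply the length of the returned list.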
Step 305: Acquire each locally stored data block.
The data index corresponding to each data block is also stored locally on the physical machine.
Step 306: Determine a first check code corresponding to each of the locally stored data blocks.
Step 307: Calculate a second check code for each of the m data blocks using the same calculation method as the first check code, and, according to the first check codes and the second check codes, determine the first data blocks (not yet transmitted) and the second data blocks (already transmitted) among the m data blocks.
In this embodiment, to keep the check codes comparable, the second check code must be calculated with the same method as the first check code.
One calculation method is to directly compute a unique check value for each data block; the unique check value may be an MD5 value. For example, suppose the second check codes corresponding to the m data blocks are A1, A2, A3, …, Am, and the locally stored first check codes include A1, A2 and A3. Then data block 1, data block 2 and data block 3 are determined to have already been transmitted and are taken as second data blocks, while data block 4, data block 5, data block 6, …, data block m are taken as first data blocks that have not been transmitted.
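The MD5-based determination can be illustrated with the following Python sketch. The function names and the use of a set of hexadecimal digests to represent the locally stored first check codes are assumptions of this example.

```python
import hashlib
from typing import Iterable, List, Set, Tuple


def md5_hex(block: bytes) -> str:
    """Unique check value of a data block as a hexadecimal MD5 digest."""
    return hashlib.md5(block).hexdigest()


def classify_blocks(
    blocks: Iterable[bytes], stored_check_codes: Set[str]
) -> Tuple[List[bytes], List[bytes]]:
    """Split blocks into (first_blocks, second_blocks).

    first_blocks: not yet transmitted (check code absent from the local store).
    second_blocks: already transmitted (check code present in the local store).
    """
    first_blocks: List[bytes] = []
    second_blocks: List[bytes] = []
    for block in blocks:
        if md5_hex(block) in stored_check_codes:
            second_blocks.append(block)
        else:
            first_blocks.append(block)
    return first_blocks, second_blocks
```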
Because computing the unique check value is complex and time-consuming, the second check code may instead be calculated as follows.
In an embodiment of the present invention, the second check code corresponding to each of the m data blocks may be calculated using formula (2), formula (3) and formula (4):
$$A(1,k) = \left(\sum_{j=1}^{k} data[j]\right) \bmod M \tag{2}$$
$$B(1,k) = \left(\sum_{j=1}^{k} (k-j+1)\, data[j]\right) \bmod M \tag{3}$$
$$Adler32(1,k) = A(1,k) + 2^{16}\, B(1,k) \tag{4}$$
where $Adler32(1,k)$ denotes the second check code; $A(1,k)$ denotes the first intermediate parameter and $B(1,k)$ denotes the second intermediate parameter; $data[j]$ denotes the data of the j-th byte in the current data block; $M$ denotes a known constant; $k$ denotes the number of bytes included in the current data block.
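Formulas (2) to (4) may be sketched in Python as shown below. The embodiment only states that M is a known constant; the default value 65521 used here (the modulus of standard Adler-32) is an assumption, and the function name `weak_checksum` is hypothetical.

```python
def weak_checksum(block: bytes, modulus: int = 65521) -> int:
    """Check code per formulas (2)-(4).

    A is the byte sum and B is the position-weighted byte sum, where the
    first byte has weight k and the last byte has weight 1.
    """
    k = len(block)
    a = sum(block) % modulus
    # enumerate() is zero-based, so byte j (one-based) has weight k - (j - 1).
    b = sum((k - idx) * byte for idx, byte in enumerate(block)) % modulus
    return a + (b << 16)  # A + 2^16 * B
```

Note that standard Adler-32, as implemented for instance by `zlib.adler32`, initializes A to 1, so its output differs from the value defined by formulas (2) to (4).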
If a second check code calculated in this way differs from every locally stored first check code, the data block corresponding to that second check code has not been transmitted. For example, assume that the second check codes corresponding to the m data blocks are A1, A2, A3, …, Am, and that only A1, A2 and A3 appear among the locally stored first check codes. Then the data blocks corresponding to A4, A5, A6, …, Am have not been transmitted, and data block 4, data block 5, data block 6, …, data block m can be determined as first data blocks.
If a second check code is the same as a locally stored first check code, the corresponding data block may have been transmitted, and a further check using a unique check value, for example the MD5 value, is required. In the example, data block 1, data block 2 and data block 3 need this further check: the MD5 values of data block 1, data block 2 and data block 3 are calculated, and the MD5 values of the three locally stored data blocks whose first check codes are A1, A2 and A3 are calculated. If the MD5 values of corresponding data blocks are the same, the data block has been transmitted; otherwise, the data block has not been transmitted. For example, if the MD5 values match for all three, data block 1, data block 2 and data block 3 are determined as second data blocks.
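The two-stage check can be sketched as follows: the inexpensive check code filters out blocks that are certainly new, and MD5 is computed only for candidates whose check code matches. The names, the modulus 65521, and the choice to precompute the MD5 digests of the stored blocks in a dictionary are assumptions of this example.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def weak_checksum(block: bytes, modulus: int = 65521) -> int:
    """Check code per formulas (2)-(4); the modulus is an assumed constant."""
    k = len(block)
    a = sum(block) % modulus
    b = sum((k - idx) * byte for idx, byte in enumerate(block)) % modulus
    return a + (b << 16)


def build_local_store(stored_blocks: List[bytes]) -> Dict[int, Set[str]]:
    """Map each first check code to the MD5 digests of the stored blocks."""
    store: Dict[int, Set[str]] = {}
    for block in stored_blocks:
        digest = hashlib.md5(block).hexdigest()
        store.setdefault(weak_checksum(block), set()).add(digest)
    return store


def classify_two_stage(
    blocks: List[bytes], store: Dict[int, Set[str]]
) -> Tuple[List[bytes], List[bytes]]:
    """Return (first_blocks, second_blocks) using the two-stage check."""
    first_blocks: List[bytes] = []
    second_blocks: List[bytes] = []
    for block in blocks:
        candidates = store.get(weak_checksum(block))
        if candidates is None:
            first_blocks.append(block)  # check code differs: not transmitted
        elif hashlib.md5(block).hexdigest() in candidates:
            second_blocks.append(block)  # MD5 also matches: transmitted
        else:
            first_blocks.append(block)  # check code collision only
    return first_blocks, second_blocks
```

The blocks `b"\x01\x02\x01"` and `b"\x02\x00\x02"` have the same check code under formulas (2) to (4) but different MD5 values, which shows why the second stage is needed.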
Step 308: Determine a first data index corresponding to the first data block and a second data index corresponding to the second data block.
The first data index needs to be generated from the first data block.
The second data index may be determined from the locally stored index.
Step 309: Send the first data block, the first data index and the second data index to the data center, and locally store the first data block and the first data index.
To reduce the occupation of network bandwidth, only the first data block, which has not been transmitted, is sent. The second data block, which has already been transmitted, is not sent again; only the second data index corresponding to the second data block is sent to the data center. Network usage is thereby reduced relative to the prior art.
In this embodiment, after the first data block, the first data index and the second data index are sent to the data center, and in order to reduce the number of data blocks to be transmitted the next time, the method may further include: locally storing the at least one data block and the data index corresponding to each data block in the at least one data block. The stored data may further include timestamps corresponding to the first data block and the second data block.
In this embodiment, the first data block, the first data index and the second data index may also be compressed, and the compressed data packet is then sent to the data center.
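A possible shape of the compressed data packet is sketched below. The embodiment does not define the format of a data index or of the packet; using the MD5 digest as the index, JSON with base64-encoded block contents as the message, and zlib as the compressor are all assumptions of this example.

```python
import base64
import hashlib
import json
import zlib
from typing import Dict, List


def build_payload(first_blocks: List[bytes], second_indexes: List[str]) -> bytes:
    """Pack untransmitted blocks with their indexes, plus indexes only for
    blocks that were already transmitted, and compress the result."""
    message = {
        "first_blocks": [
            {
                "index": hashlib.md5(block).hexdigest(),  # assumed index format
                "data": base64.b64encode(block).decode("ascii"),
            }
            for block in first_blocks
        ],
        "second_indexes": list(second_indexes),
    }
    return zlib.compress(json.dumps(message).encode("utf-8"))


def read_payload(payload: bytes) -> Dict:
    """Inverse of build_payload, as the data center might apply it."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

The contents of second data blocks never appear in the packet; only their indexes do, which is the source of the bandwidth saving described above.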
Step 310: The data center determines the second data block according to the second data index and according to the third data blocks, and their corresponding third data indexes, included in the stored data copy.
Since the second data block has already been transmitted to the data center, the data center stores the second data block and the second data index.
The second data block may be obtained from a data copy.
The data center locally stores data copies of all data blocks; data copies are a general technique for improving data access efficiency, system fault tolerance and load balancing. A data copy includes not only each third data block but also the third data index corresponding to each third data block. The data center searches the third data indexes for the index that is the same as the second data index and uses it to locate the corresponding data block; the data block so determined is the second data block that has already been transmitted.
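On the data center side, the lookup of step 310 may be sketched as a dictionary access. Representing the data copy as a mapping from third data index to third data block, and raising an error for an unknown index, are assumptions of this example.

```python
from typing import Dict, List


def resolve_second_blocks(
    second_indexes: List[str], data_copy: Dict[str, bytes]
) -> List[bytes]:
    """Recover already-transmitted blocks from the stored data copy.

    data_copy maps each third data index to its third data block.
    """
    missing = [index for index in second_indexes if index not in data_copy]
    if missing:
        raise KeyError(f"indexes not found in data copy: {missing}")
    return [data_copy[index] for index in second_indexes]
```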
Step 311: According to the threshold set for the target type service, determine whether the number of first data blocks and second data blocks is larger than the threshold; if so, store the first data block and the second data block in cloud storage; otherwise, store the first data block and the second data block in a real-time database.
Further, a threshold of the data amount may be set for each service $s_i$, for example 100; when the number of data blocks included in the service reaches 100, the data blocks corresponding to the service are stored in the cloud storage.
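The storage decision of step 311 reduces to a single comparison, sketched below. The function name and the returned labels are hypothetical. The sketch follows the "larger than" wording of step 311; whether a count exactly equal to the threshold goes to cloud storage is left open by the text and is treated here as an assumption.

```python
def choose_storage(block_count: int, threshold: int) -> str:
    """Pick the storage tier for a service's data blocks (step 311)."""
    if block_count > threshold:
        return "cloud storage"
    return "real-time database"
```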
Step 312: Replicate data copies of the first data block and the second data block.
An unreasonable number of data copies puts storage pressure on the platform and wastes a large amount of storage space. Therefore, the number of data copies may be determined by the following formula (5):
where number denotes the target number of data copies made of the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by multi-user requests competing for the first data block and the second data block; all_request denotes the concurrent access volume corresponding to the target type service; a denotes the scale factor and is a known constant; init_size denotes the initial number of data copies.
Referring to fig. 4, an embodiment of the present invention further provides a physical machine, which may include:
a configuration unit 401, configured to configure corresponding sampling periods for different types of services, respectively;
an acquisition unit 402, configured to acquire data to be transmitted by a target type service when a sampling period corresponding to the target type service is reached;
a dividing unit 403, configured to divide the data to be transmitted into at least one data block;
a determining unit 404, configured to determine a first data block that has not been transmitted and a second data block that has been transmitted in the at least one data block, and determine a first data index of the first data block and a second data index of the second data block;
a sending unit 405, configured to send the first data block, the first data index, and the second data index to a data center, so that the data center stores the at least one data block.
In one embodiment of the present invention, the physical machine may further include: a calculation unit, configured to calculate the division length using a first formula;
the first formula includes:
$$k(s_i) = \frac{period(s_i)}{status(s_i)} \cdot \lambda + default\_size$$
where $k(s_i)$ denotes the division length corresponding to $s_i$; $s_i$ denotes the i-th service provided by the platform; $period(s_i)$ denotes the sampling period configured for $s_i$; $status(s_i)$ denotes the evaluation of the state change granularity of $s_i$; $\lambda$ denotes the influence factor and is a known constant; $default\_size$ denotes the default length of a data block;
the dividing unit is specifically configured to divide the data to be transmitted into the at least one data block by using the division length;
In one embodiment of the present invention, the physical machine may further include: a storage unit, configured to locally store the at least one data block and a data index corresponding to each data block in the at least one data block;
In an embodiment of the present invention, the determining unit includes:
a determining module, configured to determine a first check code corresponding to each of the locally stored data blocks;
a calculation module, configured to calculate a second check code corresponding to each data block in the at least one data block;
a traversing module, configured to traverse the first check codes according to the second check code, take a data block whose second check code is among the first check codes as the second data block, and take a data block whose second check code is not among the first check codes as the first data block.
Referring to fig. 5, an embodiment of the present invention further provides a data center, which may include:
an obtaining unit 501, configured to obtain a first data block, a first data index corresponding to the first data block, and a second data index corresponding to a second data block, which are sent by a current physical machine and are for a target type service; the first data block is a data block which is not transmitted by the current physical machine, and the second data block is a data block which is transmitted by the current physical machine;
a determining unit 502, configured to determine, according to each third data block included in the locally stored data copy, and a third data index corresponding to each third data block, and according to the second data index, the second data block;
a storage unit 503, configured to store the first data block and the second data block.
In one embodiment of the present invention, the data center may further include: a copy unit, configured to make the following target number of data copies of the stored first data block and second data block:
where number denotes the target number of data copies made of the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by multi-user requests competing for the first data block and the second data block; all_request denotes the concurrent access volume corresponding to the target type service; a denotes the scale factor and is a known constant; init_size denotes the initial number of data copies.
Referring to fig. 6, an embodiment of the present invention further provides a data synchronization system, which may include: a data center 50 as described above, and at least one physical machine 40 as described above.
In summary, the embodiments of the present invention can at least achieve the following beneficial effects:
1. In the embodiment of the invention, when the acquired data to be transmitted is sent, it is divided into at least one data block. The first data block that has not been transmitted, its first data index, and the second data index corresponding to the second data block that has already been transmitted are sent to the data center, without sending the second data block itself. The amount of data in transmission is thereby reduced, which in turn reduces the occupied network bandwidth.
2. In the embodiment of the invention, the Agent in the physical machine can provide flexible extension function, and a user can dynamically extend the data monitoring and acquisition of the Agent in an interface extension or script language calling mode. The user does not need to change the whole platform system because the customized service is introduced.
3. In the embodiment of the invention, the service-oriented multi-copy data storage strategy not only reduces the redundancy quantity of the data copies but also improves the concurrent access degree of the system on the basis of ensuring the real-time storage access of the data.
Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the inclusion of an element by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data synchronization method, characterized in that the method is applied to a physical machine in which corresponding sampling periods are respectively configured for different types of services, the method comprising:
when a sampling period corresponding to a target type service is reached, acquiring data required to be transmitted by the target type service;
dividing the data to be transmitted into at least one data block;
determining a first data block which has not been transmitted and a second data block which has been transmitted in the at least one data block, and determining a first data index of the first data block and a second data index of the second data block;
and sending the first data block, the first data index and the second data index to a data center so that the data center stores the at least one data block.
2. The method of claim 1,
before dividing the data to be transmitted into at least one data block, further comprising: calculating the division length by using a first formula;
the first formula includes:
$$k(s_i) = \frac{period(s_i)}{status(s_i)} \cdot \lambda + default\_size$$
where $k(s_i)$ denotes the division length corresponding to $s_i$; $s_i$ denotes the i-th service provided by the platform; $period(s_i)$ denotes the sampling period configured for $s_i$; $status(s_i)$ denotes the evaluation of the state change granularity of $s_i$; $\lambda$ denotes the influence factor and is a known constant; $default\_size$ denotes the default length of a data block;
dividing the data to be transmitted into at least one data block, including: dividing the data to be transmitted into the at least one data block by using the division length;
and/or,
after sending the first data block, the first data index, and the second data index to a data center, further comprising: locally storing the at least one data block and a data index corresponding to each data block in the at least one data block;
and/or,
the determining that a first data block has not been transmitted and a second data block has been transmitted in the at least one data block includes:
determining a first check code corresponding to each data block in each locally stored data block;
calculating a second check code corresponding to each data block in the at least one data block;
traversing the first check code according to the second check code; and taking the data block corresponding to the second check code in the first check code as the second data block, and taking the data block corresponding to the second check code which is not in the first check code as the first data block.
3. The method of claim 2, wherein the calculating the second check code corresponding to each of the at least one data block comprises:
calculating a second check code corresponding to each data block in the at least one data block by using a second formula, a third formula and a fourth formula;
the second formula includes:
$$A(1,k) = \left(\sum_{j=1}^{k} data[j]\right) \bmod M;$$
the third formula includes:
$$B(1,k) = \left(\sum_{j=1}^{k} (k-j+1)\, data[j]\right) \bmod M;$$
the fourth formula includes:
$$Adler32(1,k) = A(1,k) + 2^{16}\, B(1,k);$$
where $Adler32(1,k)$ denotes the second check code; $A(1,k)$ denotes the first intermediate parameter and $B(1,k)$ denotes the second intermediate parameter; $data[j]$ denotes the data of the j-th byte in the current data block; $M$ denotes a known constant; $k$ denotes the number of bytes included in the current data block.
4. A data synchronization method, applied to a data center, comprising:
acquiring a first data block aiming at a target type service, a first data index corresponding to the first data block and a second data index corresponding to a second data block, which are sent by a current physical machine; the first data block is a data block which is not transmitted by the current physical machine, and the second data block is a data block which is transmitted by the current physical machine;
determining a second data block according to each third data block included in the locally stored data copy, a third data index corresponding to each third data block and the second data index;
storing the first data block and the second data block.
5. The method of claim 4, further comprising: making the following target number of data copies of the stored first data block and second data block:
where number denotes the target number of data copies made of the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by multi-user requests competing for the first data block and the second data block; all_request denotes the concurrent access volume corresponding to the target type service; a denotes the scale factor and is a known constant; init_size denotes the initial number of data copies.
6. A physical machine, comprising:
a configuration unit, configured to configure corresponding sampling periods for different types of services, respectively;
an acquisition unit, configured to acquire data to be transmitted by a target type service when a sampling period corresponding to the target type service is reached;
a dividing unit, configured to divide the data to be transmitted into at least one data block;
a determining unit, configured to determine a first data block that has not been transmitted and a second data block that has been transmitted in the at least one data block, and determine a first data index of the first data block and a second data index of the second data block;
a sending unit, configured to send the first data block, the first data index, and the second data index to a data center, so that the data center stores the at least one data block.
7. The physical machine of claim 6,
further comprising: a calculation unit for calculating the division length using a first formula;
the first formula includes:
$$k(s_i) = \frac{period(s_i)}{status(s_i)} \cdot \lambda + default\_size$$
where $k(s_i)$ denotes the division length corresponding to $s_i$; $s_i$ denotes the i-th service provided by the platform; $period(s_i)$ denotes the sampling period configured for $s_i$; $status(s_i)$ denotes the evaluation of the state change granularity of $s_i$; $\lambda$ denotes the influence factor and is a known constant; $default\_size$ denotes the default length of a data block;
the dividing unit is specifically configured to divide the data to be transmitted into the at least one data block by using the division length;
and/or,
further comprising: a storage unit, configured to locally store the at least one data block and a data index corresponding to each data block in the at least one data block;
and/or,
the determination unit includes:
a determining module, configured to determine a first check code corresponding to each of the locally stored data blocks;
a calculation module, configured to calculate a second check code corresponding to each data block in the at least one data block;
a traversing module, configured to traverse the first check codes according to the second check code, take a data block whose second check code is among the first check codes as the second data block, and take a data block whose second check code is not among the first check codes as the first data block.
8. A data center, comprising:
an acquisition unit, configured to acquire a first data block for a target type service, a first data index corresponding to the first data block, and a second data index corresponding to a second data block, which are sent by a current physical machine; the first data block is a data block which has not been transmitted by the current physical machine, and the second data block is a data block which has been transmitted by the current physical machine;
a determining unit, configured to determine the second data block according to each third data block included in the locally stored data copy, a third data index corresponding to each third data block, and the second data index;
a storage unit, configured to store the first data block and the second data block.
9. The data center of claim 8, further comprising: a copy unit, configured to make the following target number of data copies of the stored first data block and second data block:
where number denotes the target number of data copies made of the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by multi-user requests competing for the first data block and the second data block; all_request denotes the concurrent access volume corresponding to the target type service; a denotes the scale factor and is a known constant; init_size denotes the initial number of data copies.
10. A data synchronization system, comprising: a data centre as claimed in claim 8 or 9 above and at least one physical machine as claimed in claim 6 or 7 above.
CN201610451188.0A 2016-06-21 2016-06-21 A kind of method of data synchronization, equipment and system Active CN106209974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610451188.0A CN106209974B (en) 2016-06-21 2016-06-21 A kind of method of data synchronization, equipment and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610451188.0A CN106209974B (en) 2016-06-21 2016-06-21 A kind of method of data synchronization, equipment and system

Publications (2)

Publication Number Publication Date
CN106209974A true CN106209974A (en) 2016-12-07
CN106209974B CN106209974B (en) 2019-03-12

Family

ID=57460677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610451188.0A Active CN106209974B (en) 2016-06-21 2016-06-21 A kind of method of data synchronization, equipment and system

Country Status (1)

Country Link
CN (1) CN106209974B (en)

Also Published As

Publication number Publication date
CN106209974B (en) 2019-03-12

Similar Documents

Publication Publication Date Title
CN106209974B (en) A kind of method of data synchronization, equipment and system
CN111614690B (en) Abnormal behavior detection method and device
CN108829581B (en) Application program testing method and device, computer equipment and storage medium
CN112800095B (en) Data processing method, device, equipment and storage medium
CN106815254B (en) Data processing method and device
CN110445939B (en) Capacity resource prediction method and device
CN111385142B (en) Kubernetes-based adaptive web container stretching method
CN109597800B (en) Log distribution method and device
CN113760640A (en) Monitoring log processing method, device, equipment and storage medium
CN114095567A (en) Data access request processing method and device, computer equipment and medium
CN109428760B (en) User credit evaluation method based on operator data
US10313457B2 (en) Collaborative filtering in directed graph
CN111984733A (en) Data transmission method and device based on block chain and storage medium
CN110300011B (en) Alarm root cause positioning method, device and computer readable storage medium
CN111260419A (en) Method and device for acquiring user attribute, computer equipment and storage medium
CN104735063B (en) A security evaluation method for cloud infrastructure
CN112749166A (en) Service data processing method, device, equipment and storage medium
CN111209159A (en) Information processing method, device, equipment and storage medium
CN113162960A (en) Data processing method, device, equipment and medium
CN111598390B (en) Method, device, equipment and readable storage medium for evaluating high availability of server
CN114285786A (en) Method and device for constructing network link library
CN110489568B (en) Method and device for generating event graph, storage medium and electronic equipment
CN114996010B (en) Intelligent service guaranteeing method oriented to mobile edge environment
CN104506663B (en) An intelligent cloud computing operation management system
CN112636976B (en) Service quality determination method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200522

Address after: 250100 Building S01, Inspur Science Park, No. 1036 Langchao Road, Jinan High-tech Zone, Shandong Province

Patentee after: Inspur Cloud Information Technology Co., Ltd.

Address before: 250100 No. 1036 Langchao Road, Jinan High-tech Zone, Shandong Province

Patentee before: Inspur Electronic Information Industry Co., Ltd.

CP01 Change in the name or title of a patent holder

Address after: 250100 Building S01, Inspur Science Park, No. 1036 Langchao Road, Jinan High-tech Zone, Shandong Province

Patentee after: Inspur cloud Information Technology Co., Ltd

Address before: 250100 Building S01, Inspur Science Park, No. 1036 Langchao Road, Jinan High-tech Zone, Shandong Province

Patentee before: Inspur Cloud Information Technology Co., Ltd.
