CN106209974A - A kind of method of data synchronization, equipment and system - Google Patents
- Publication number: CN106209974A
- Application number: CN201610451188.0A
- Authority: CN (China)
- Prior art keywords: data, data block, represent, block, transmitted
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/565—Conversion or adaptation of application format or content
- H04L67/5651—Reducing the amount or size of exchanged application data
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
本发明提供了一种数据同步方法、设备及系统,方法包括:针对不同类型的服务分别配置相应的采样周期;在到达目标类型服务对应的采样周期时,采集所述目标类型服务所需传输的数据;将所述所需传输的数据划分为至少一个数据块;确定所述至少一个数据块中未曾传输过的第一数据块、已经传输过的第二数据块,以及确定所述第一数据块的第一数据索引、所述第二数据块的第二数据索引;将所述第一数据块、所述第一数据索引和所述第二数据索引,发送至数据中心,以使所述数据中心存储所述至少一个数据块。根据本方案,可以减少传输过程中的数据量,进而可以降低网络带宽的占用量。
The present invention provides a data synchronization method, device and system. The method includes: configuring a corresponding sampling period for each type of service; when the sampling period corresponding to a target type of service is reached, collecting the data that the target type of service needs to transmit; dividing the data to be transmitted into at least one data block; determining, among the at least one data block, the first data block that has never been transmitted and the second data block that has already been transmitted, and determining the first data index of the first data block and the second data index of the second data block; and sending the first data block, the first data index and the second data index to the data center, so that the data center stores the at least one data block. This solution reduces the amount of data transmitted and thereby the network bandwidth occupied.
Description
技术领域 Technical field
本发明涉及云计算技术领域,特别涉及一种数据同步方法、设备及系统。The present invention relates to the technical field of cloud computing, in particular to a data synchronization method, device and system.
背景技术 Background art
云计算技术利用了虚拟化技术将计算、存储、网络等资源进行池化,通过互联网将共享的软硬件以服务的方式提供给用户。其中,PAAS(Platform as a Service,平台即服务)平台作为云计算的一种服务类型,将软件部署、运维作为一种服务按需提供给软件开发用户,成为近年来非常热门的研究方向。Cloud computing uses virtualization to pool computing, storage, network and other resources, and provides shared software and hardware to users as services over the Internet. The PAAS (Platform as a Service) platform, one service type of cloud computing, provides software deployment and operation and maintenance to software developers on demand as a service, and has become a very popular research direction in recent years.
在PAAS平台中,为了实现系统服务以及各类订制化服务,需要对各个物理机上产生的数据进行同步存储。在对各个物理机上产生的数据进行实时同步时,物理机通过采集到自身产生的数据,将该数据通过网络资源发送给数据中心,由数据中心将该数据存储到实时数据库中。In the PAAS platform, in order to realize system services and various customized services, the data generated on each physical machine needs to be stored synchronously. When synchronizing the data generated on each physical machine in real time, the physical machine collects the data generated by itself, sends the data to the data center through network resources, and the data center stores the data in the real-time database.
网络资源对于PAAS平台来说异常珍贵,且现有技术中各个物理机上会产生大量的数据,导致网络带宽的占用量较大。Network resources are extremely precious to the PAAS platform, and in the prior art, a large amount of data will be generated on each physical machine, resulting in a large occupation of network bandwidth.
发明内容 Summary of the invention
本发明实施例提供了一种数据同步方法、设备及系统,以降低网络带宽的占用量。The embodiment of the present invention provides a data synchronization method, device and system, so as to reduce the occupation of network bandwidth.
第一方面,本发明实施例提供了一种数据同步方法,应用于物理机,针对不同类型的服务分别配置相应的采样周期;所述方法包括:In the first aspect, the embodiment of the present invention provides a data synchronization method, which is applied to a physical machine and configures corresponding sampling periods for different types of services; the method includes:
在到达目标类型服务对应的采样周期时,采集所述目标类型服务所需传输的数据;When the sampling period corresponding to the target type service is reached, collect the data that the target type service needs to transmit;
将所述所需传输的数据划分为至少一个数据块;dividing the data to be transmitted into at least one data block;
确定所述至少一个数据块中未曾传输过的第一数据块、已经传输过的第二数据块,以及确定所述第一数据块的第一数据索引、所述第二数据块的第二数据索引;Determining, among the at least one data block, the first data block that has never been transmitted and the second data block that has already been transmitted, and determining the first data index of the first data block and the second data index of the second data block;
将所述第一数据块、所述第一数据索引和所述第二数据索引,发送至数据中心,以使所述数据中心存储所述至少一个数据块。sending the first data block, the first data index, and the second data index to a data center, so that the data center stores the at least one data block.
优选地,Preferably,
在将所述所需传输的数据划分为至少一个数据块之前,进一步包括:利用第一公式计算划分长度;Before dividing the data to be transmitted into at least one data block, it further includes: using a first formula to calculate the division length;
所述第一公式包括:The first formula includes:
其中,k(s_i)用于表征s_i对应的划分长度;s_i用于表征平台提供的第i个服务;period(s_i)用于表征对s_i配置的采样周期;status(s_i)用于表征s_i的状态变化粒度评估;λ用于表示影响因子,为已知常数;default_size用于表征数据块的默认长度;where k(s_i) denotes the division length corresponding to s_i; s_i denotes the i-th service provided by the platform; period(s_i) denotes the sampling period configured for s_i; status(s_i) denotes the state-change granularity evaluation of s_i; λ denotes the influence factor, a known constant; and default_size denotes the default length of a data block;
将所述所需传输的数据划分为至少一个数据块,包括:利用所述划分长度,将所需传输的数据划分为所述至少一个数据块;Dividing the data to be transmitted into at least one data block includes: dividing the data to be transmitted into the at least one data block by using the division length;
和/或,and / or,
在将所述第一数据块、所述第一数据索引和所述第二数据索引,发送至数据中心之后,进一步包括:将所述至少一个数据块,以及所述至少一个数据块中每一个数据块对应的数据索引进行本地存储;After sending the first data block, the first data index, and the second data index to the data center, further comprising: sending the at least one data block, and each of the at least one data block The data index corresponding to the data block is stored locally;
和/或,and / or,
所述确定所述至少一个数据块中未曾传输过的第一数据块、已经传输过的第二数据块,包括:Determining the first data block that has never been transmitted and the second data block that has already been transmitted in the at least one data block includes:
确定本地存储的各个数据块中每一个数据块对应的第一校验码;Determine the first check code corresponding to each data block in each data block stored locally;
计算所述至少一个数据块中每一个数据块对应的第二校验码;calculating a second check code corresponding to each data block in the at least one data block;
根据所述第二校验码,遍历所述第一校验码;将位于所述第一校验码中的所述第二校验码对应的数据块作为所述第二数据块,将未位于所述第一校验码中的所述第二校验码对应的数据块作为所述第一数据块。Traverse the first check codes according to the second check codes; take a data block whose second check code appears among the first check codes as a second data block, and take a data block whose second check code does not appear among the first check codes as a first data block.
优选地,所述计算所述至少一个数据块中每一个数据块对应的第二校验码,包括:Preferably, the calculating the second check code corresponding to each data block in the at least one data block includes:
利用第二公式、第三公式和第四公式计算所述至少一个数据块中每一个数据块对应的第二校验码;calculating a second check code corresponding to each data block in the at least one data block by using the second formula, the third formula and the fourth formula;
所述第二公式包括:The second formula includes:
所述第三公式包括:The third formula includes:
所述第四公式包括:The fourth formula includes:
$\mathrm{Adler32}(1,k) = A(1,k) + 2^{16}\,B(1,k)$;
其中,Adler32(1,k)用于表征第二校验码;A(1,k)用于表征第一中间参数,B(1,k)用于表征第二中间参数;data[j]用于表征当前数据块中的第j个字节对应的数据;M用于表征已知常数;k用于表征该当前数据块包括的字节数。where Adler32(1,k) denotes the second check code; A(1,k) denotes the first intermediate parameter and B(1,k) the second intermediate parameter; data[j] denotes the data of the j-th byte in the current data block; M denotes a known constant; and k denotes the number of bytes in the current data block.
第二方面,本发明实施例还提供了一种数据同步方法,应用于数据中心,包括:In the second aspect, the embodiment of the present invention also provides a data synchronization method applied to a data center, including:
获取当前物理机发送的针对目标类型服务的第一数据块、所述第一数据块对应的第一数据索引、第二数据块对应的第二数据索引;其中,所述第一数据块为所述当前物理机未曾传输过的数据块,所述第二数据块为所述当前物理机已经传输过的数据块;Obtain the first data block for the target type service sent by the current physical machine, the first data index corresponding to the first data block, and the second data index corresponding to a second data block; wherein the first data block is a data block that the current physical machine has never transmitted, and the second data block is a data block that the current physical machine has already transmitted;
根据本地存储的数据副本中包括的各个第三数据块、以及每一个所述第三数据块对应的第三数据索引,以及根据所述第二数据索引,确定出所述第二数据块;determining the second data block according to each third data block included in the locally stored data copy, and a third data index corresponding to each of the third data blocks, and according to the second data index;
存储所述第一数据块和所述第二数据块。The first data block and the second data block are stored.
优选地,进一步包括:对存储的所述第一数据块和所述第二数据块,进行如下目标数量的数据副本复制:Preferably, it further includes: performing the following target number of data copies on the stored first data block and the second data block:
其中,number用于表征对所述第一数据块和所述第二数据块进行数据副本复制的所述目标数量;request_dead_lock用于表征多用户请求对所述第一数据块和所述第二数据块的数据竞争造成服务阻塞的数量;all_request用于表征所述目标类型服务对应的并发访问量;a用于表征比例因子,为已知常数;init_size用于表征数据副本的初始化数量。where number denotes the target number of data copies made of the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by data contention among multi-user requests for the first data block and the second data block; all_request denotes the number of concurrent accesses for the target type of service; a denotes the scale factor, a known constant; and init_size denotes the initial number of data copies.
第三方面,本发明实施例还提供了一种物理机,包括:In a third aspect, the embodiment of the present invention also provides a physical machine, including:
配置单元,用于针对不同类型的服务分别配置相应的采样周期;The configuration unit is configured to respectively configure corresponding sampling periods for different types of services;
采集单元,用于在到达目标类型服务对应的采样周期时,采集所述目标类型服务所需传输的数据;A collection unit, configured to collect the data required to be transmitted by the target type service when the sampling period corresponding to the target type service is reached;
划分单元,用于将所述所需传输的数据划分为至少一个数据块;a division unit, configured to divide the data to be transmitted into at least one data block;
确定单元,用于确定所述至少一个数据块中未曾传输过的第一数据块、已经传输过的第二数据块,以及确定所述第一数据块的第一数据索引、所述第二数据块的第二数据索引;A determining unit, configured to determine, among the at least one data block, the first data block that has never been transmitted and the second data block that has already been transmitted, and to determine the first data index of the first data block and the second data index of the second data block;
发送单元,用于将所述第一数据块、所述第一数据索引和所述第二数据索引,发送至数据中心,以使所述数据中心存储所述至少一个数据块。A sending unit, configured to send the first data block, the first data index, and the second data index to a data center, so that the data center stores the at least one data block.
优选地,Preferably,
进一步包括:计算单元,用于利用第一公式计算划分长度;Further comprising: a calculation unit, configured to calculate the division length by using the first formula;
所述第一公式包括:The first formula includes:
其中,k(s_i)用于表征s_i对应的划分长度;s_i用于表征平台提供的第i个服务;period(s_i)用于表征对s_i配置的采样周期;status(s_i)用于表征s_i的状态变化粒度评估;λ用于表示影响因子,为已知常数;default_size用于表征数据块的默认长度;where k(s_i) denotes the division length corresponding to s_i; s_i denotes the i-th service provided by the platform; period(s_i) denotes the sampling period configured for s_i; status(s_i) denotes the state-change granularity evaluation of s_i; λ denotes the influence factor, a known constant; and default_size denotes the default length of a data block;
所述划分单元,具体用于利用所述划分长度,将所需传输的数据划分为所述至少一个数据块;The division unit is specifically configured to divide the data to be transmitted into the at least one data block by using the division length;
和/或,and / or,
进一步包括:存储单元,用于将所述至少一个数据块,以及所述至少一个数据块中每一个数据块对应的数据索引进行本地存储;Further comprising: a storage unit, configured to locally store the at least one data block and a data index corresponding to each data block in the at least one data block;
和/或,and / or,
所述确定单元,包括:The determination unit includes:
确定模块,用于确定本地存储的各个数据块中每一个数据块对应的第一校验码;A determining module, configured to determine a first check code corresponding to each data block in each data block stored locally;
计算模块,用于计算所述至少一个数据块中每一个数据块对应的第二校验码;A calculation module, configured to calculate a second check code corresponding to each data block in the at least one data block;
遍历模块,用于根据所述第二校验码,遍历所述第一校验码;将位于所述第一校验码中的所述第二校验码对应的数据块作为所述第二数据块,将未位于所述第一校验码中的所述第二校验码对应的数据块作为所述第一数据块。A traversal module, configured to traverse the first check code according to the second check code; use the data block corresponding to the second check code in the first check code as the second check code For a data block, a data block corresponding to the second check code that is not located in the first check code is used as the first data block.
第四方面,本发明实施例还提供了一种数据中心,包括:In a fourth aspect, the embodiment of the present invention also provides a data center, including:
获取单元,用于获取当前物理机发送的针对目标类型服务的第一数据块、所述第一数据块对应的第一数据索引、第二数据块对应的第二数据索引;其中,所述第一数据块为所述当前物理机未曾传输过的数据块,所述第二数据块为所述当前物理机已经传输过的数据块;An acquiring unit, configured to acquire the first data block for the target type service sent by the current physical machine, the first data index corresponding to the first data block, and the second data index corresponding to a second data block; wherein the first data block is a data block that the current physical machine has never transmitted, and the second data block is a data block that the current physical machine has already transmitted;
确定单元,用于根据本地存储的数据副本中包括的各个第三数据块、以及每一个所述第三数据块对应的第三数据索引,以及根据所述第二数据索引,确定出所述第二数据块;A determining unit, configured to determine the second data block according to the third data blocks included in the locally stored data copy, the third data index corresponding to each third data block, and the second data index;
存储单元,用于存储所述第一数据块和所述第二数据块。A storage unit, configured to store the first data block and the second data block.
优选地,进一步包括:复制单元,用于对存储的所述第一数据块和所述第二数据块,进行如下目标数量的数据副本复制:Preferably, it further includes: a copying unit, configured to perform the following target number of data copies on the stored first data block and the second data block:
其中,number用于表征对所述第一数据块和所述第二数据块进行数据副本复制的所述目标数量;request_dead_lock用于表征多用户请求对所述第一数据块和所述第二数据块的数据竞争造成服务阻塞的数量;all_request用于表征所述目标类型服务对应的并发访问量;a用于表征比例因子,为已知常数;init_size用于表征数据副本的初始化数量。where number denotes the target number of data copies made of the first data block and the second data block; request_dead_lock denotes the number of service blockages caused by data contention among multi-user requests for the first data block and the second data block; all_request denotes the number of concurrent accesses for the target type of service; a denotes the scale factor, a known constant; and init_size denotes the initial number of data copies.
第五方面,本发明实施例还提供了一种数据同步系统,包括:上述所述的数据中心、和至少一个上述所述的物理机。In a fifth aspect, an embodiment of the present invention further provides a data synchronization system, including: the above-mentioned data center, and at least one of the above-mentioned physical machines.
本发明实施例提供了一种数据同步方法、设备及系统,在对采集的所需传输的数据进行传输时,通过对所需传输的数据划分为至少一个数据块,将该至少一个数据块中未曾传输过的第一数据块及其第一数据索引、已经传输过的第二数据块对应的第二数据索引,传输给数据中心即可,无需传输第二数据块,从而可以减少传输过程中的数据量,进而可以降低网络带宽的占用量。Embodiments of the present invention provide a data synchronization method, device, and system. When transmitting the collected data to be transmitted, the data to be transmitted is divided into at least one data block, and the data in the at least one data block is The first data block that has never been transmitted and its first data index, and the second data index corresponding to the second data block that has been transmitted can be transmitted to the data center, without the need to transmit the second data block, which can reduce the transmission process. The amount of data, which in turn can reduce the occupation of network bandwidth.
附图说明 Description of drawings
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are For some embodiments of the present invention, those skilled in the art can also obtain other drawings based on these drawings without creative work.
图1是本发明一个实施例提供的一种方法流程图;Fig. 1 is a kind of method flowchart provided by one embodiment of the present invention;
图2是本发明一个实施例提供的另一种方法流程图;Fig. 2 is another method flowchart provided by an embodiment of the present invention;
图3是本发明一个实施例提供的又一种方法流程图;Fig. 3 is another method flowchart provided by an embodiment of the present invention;
图4是本发明一个实施例提供的物理机结构示意图;Fig. 4 is a schematic structural diagram of a physical machine provided by an embodiment of the present invention;
图5是本发明一个实施例提供的数据中心结构示意图;Fig. 5 is a schematic structural diagram of a data center provided by an embodiment of the present invention;
图6是本发明一个实施例提供的数据同步系统结构示意图。Fig. 6 is a schematic structural diagram of a data synchronization system provided by an embodiment of the present invention.
具体实施方式 Detailed description
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例,基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本发明保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments It is a part of the embodiments of the present invention, but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work belong to the protection of the present invention. scope.
如图1所示,本发明实施例提供了一种数据同步方法,应用于物理机,针对不同类型的服务分别配置相应的采样周期;该方法可以包括以下步骤:As shown in Figure 1, the embodiment of the present invention provides a data synchronization method, which is applied to a physical machine and configures corresponding sampling periods for different types of services; the method may include the following steps:
步骤101:在到达目标类型服务对应的采样周期时,采集所述目标类型服务所需传输的数据。Step 101: When the sampling period corresponding to the target type service is reached, collect the data required to be transmitted by the target type service.
在每一个物理机上每隔一段时间可能会产生一些新数据,每当这些产生新数据之后,需要对这些数据进行存储。其中,有些数据对实时性有要求,有些数据对实时性没有要求,因此,需要对数据的实时性进行判断。Some new data may be generated at regular intervals on each physical machine, and these data need to be stored whenever new data is generated. Among them, some data have requirements for real-time performance, and some data have no requirement for real-time performance. Therefore, it is necessary to judge the real-time performance of the data.
在本实施例中,可以对不同类型的服务分别配置采样周期,以针对不同类型的服务进行数据采集和传输,从而实现对服务的分类存储。In this embodiment, sampling periods may be configured for different types of services, so as to collect and transmit data for different types of services, thereby implementing classified storage of services.
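The per-service sampling periods described above can be sketched as a small scheduler. This is a minimal illustration, not the patent's implementation: the class `ServiceSampler`, the function `services_due`, the service names and the use of seconds as the time unit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceSampler:
    """One service type with its own sampling period (hypothetical structure)."""
    name: str
    period: float           # sampling period, assumed here to be in seconds
    next_due: float = 0.0   # time at which the next sample should be collected

    def due(self, now: float) -> bool:
        return now >= self.next_due

    def mark_sampled(self, now: float) -> None:
        self.next_due = now + self.period


def services_due(samplers: List[ServiceSampler], now: float) -> List[str]:
    """Return the services whose sampling period has elapsed and reschedule them."""
    ready = []
    for sampler in samplers:
        if sampler.due(now):
            ready.append(sampler.name)
            sampler.mark_sampled(now)
    return ready


samplers = [ServiceSampler("monitoring", 1.0), ServiceSampler("logging", 5.0)]
first_round = services_due(samplers, 0.0)   # both services are due at start
second_round = services_due(samplers, 1.0)  # only the 1-second service is due
```

A service with strict real-time requirements would be given a short period and a tolerant one a long period, so that collection and transmission happen per service type.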
步骤102:将所述所需传输的数据划分为至少一个数据块。Step 102: Divide the data to be transmitted into at least one data block.
为了降低网络带宽的占用量,需要所需传输的数据划分为至少一个数据块。例如,所需传输的数据为1M,可以将该所需传输的数据划分为10个数据块。To reduce network bandwidth usage, the data to be transmitted needs to be divided into at least one data block. For example, if the data to be transmitted is 1 MB, it may be divided into 10 data blocks.
步骤103:确定所述至少一个数据块中未曾传输过的第一数据块、已经传输过的第二数据块,以及确定所述第一数据块的第一数据索引、所述第二数据块的第二数据索引。Step 103: Determine, among the at least one data block, the first data block that has never been transmitted and the second data block that has already been transmitted, and determine the first data index of the first data block and the second data index of the second data block.
在本实施例中,可以将已经传输给数据中心的数据块存储在物理机中。并通过物理机本地存储的已经传输过的各个数据块来确定,该至少一个数据块中哪些已经传输过,哪些未曾传输过。In this embodiment, the data blocks that have been transferred to the data center may be stored in the physical machine. And determine which of the at least one data block has been transmitted and which has not been transmitted by using the transmitted data blocks stored locally on the physical machine.
步骤104:将所述第一数据块、所述第一数据索引和所述第二数据索引,发送至数据中心,以使所述数据中心存储所述至少一个数据块。Step 104: Send the first data block, the first data index, and the second data index to a data center, so that the data center stores the at least one data block.
为了降低网络带宽的占用量,可以只传输未曾传输过的第一数据块,而对于已经传输过的第二数据块可以不用传输,只需将第二数据块对应的第二数据索引传输给数据中心即可,从而可以保证网络占用量相对于现有技术减少了第二数据块的网络占用量。To reduce network bandwidth usage, only the first data block that has never been transmitted is sent. The second data block that has already been transmitted need not be sent again; only its second data index is sent to the data center. Compared with the prior art, network usage is therefore reduced by the amount the second data block would have occupied.
根据上述实施例方案,在对采集的所需传输的数据进行传输时,通过对所需传输的数据划分为至少一个数据块,将该至少一个数据块中未曾传输过的第一数据块及其第一数据索引、已经传输过的第二数据块对应的第二数据索引,传输给数据中心即可,无需传输第二数据块,从而可以减少传输过程中的数据量,进而可以降低网络带宽的占用量。According to the solution of the above-mentioned embodiment, when the collected data to be transmitted is transmitted, by dividing the data to be transmitted into at least one data block, the first data block that has never been transmitted in the at least one data block and its The first data index and the second data index corresponding to the second data block that has already been transmitted can be transmitted to the data center without the need to transmit the second data block, thereby reducing the amount of data in the transmission process and reducing the network bandwidth. occupancy.
在本发明一个实施例中,可以通过如下方式实现对所需传输的数据的划分:进一步包括:利用第一公式计算划分长度;In an embodiment of the present invention, the division of the data to be transmitted can be realized in the following manner: further comprising: using the first formula to calculate the division length;
所述第一公式包括:The first formula includes:
其中,k(s_i)用于表征s_i对应的划分长度;s_i用于表征平台提供的第i个服务;period(s_i)用于表征对s_i配置的采样周期;status(s_i)用于表征s_i的状态变化粒度评估;λ用于表示影响因子,为已知常数;default_size用于表征数据块的默认长度;其中,在该服务的类型确定的情况下,status(s_i)为已知的参数。其中,该default_size的设置是为了保证最终确定的划分长度最小为该默认长度。where k(s_i) denotes the division length corresponding to s_i; s_i denotes the i-th service provided by the platform; period(s_i) denotes the sampling period configured for s_i; status(s_i) denotes the state-change granularity evaluation of s_i; λ denotes the influence factor, a known constant; and default_size denotes the default length of a data block. Once the type of the service is determined, status(s_i) is a known parameter. default_size is set to ensure that the division length finally determined is never smaller than the default length.
其中,该划分长度可以是字节数,例如,10个字节;也可以是具体的空间占用量,例如,100KB。The division length may be a number of bytes, for example 10 bytes, or a specific amount of storage space, for example 100 KB.
将所述所需传输的数据划分为至少一个数据块,包括:利用所述划分长度,将所需传输的数据划分为所述至少一个数据块。Dividing the data to be transmitted into at least one data block includes: dividing the data to be transmitted into the at least one data block by using the division length.
例如,k(si)为10个字节,所需传输的数据为100个字节,那么可以将该所需传输的数据划分为10个数据块,每一个数据块10个字节。For example, k(s i ) is 10 bytes, and the data to be transmitted is 100 bytes, then the data to be transmitted can be divided into 10 data blocks, and each data block is 10 bytes.
再如,k(si)为10个字节,所需传输的数据为99个字节,那么可以将该所需传输的数据划分为10个数据块,前9个数据块中每一个数据块10个字节,最后1个数据块为9个字节。For another example, k(s i ) is 10 bytes, and the data to be transmitted is 99 bytes, then the data to be transmitted can be divided into 10 data blocks, and each data block in the first 9 data blocks The block is 10 bytes, and the last 1 data block is 9 bytes.
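The two examples above can be sketched as a fixed-length split. This is a minimal sketch, assuming the division length k is given in bytes; `split_blocks` is a hypothetical helper name and does not come from the patent.

```python
from typing import List


def split_blocks(data: bytes, k: int) -> List[bytes]:
    """Split data into consecutive blocks of k bytes; the last block may be shorter."""
    if k <= 0:
        raise ValueError("division length k must be positive")
    return [data[i:i + k] for i in range(0, len(data), k)]


even = split_blocks(bytes(100), 10)    # 100 bytes with k = 10
uneven = split_blocks(bytes(99), 10)   # 99 bytes with k = 10
even_sizes = [len(block) for block in even]
uneven_sizes = [len(block) for block in uneven]
```

With 100 bytes every block holds 10 bytes; with 99 bytes the first nine blocks hold 10 bytes and the last holds 9, matching the examples in the text.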
在本发明一个实施例中,在该第一数据块、所述第一数据索引和所述第二数据索引发送给数据中心之后,为了保证下一次进行数据块传输时,可以减少传输的数据块数量,可以在将所述第一数据块、所述第一数据索引和所述第二数据索引,发送至数据中心之后,进一步包括:将所述至少一个数据块,以及所述至少一个数据块中每一个数据块对应的数据索引进行本地存储。In an embodiment of the present invention, so that fewer data blocks need to be transmitted the next time, the method may further include, after the first data block, the first data index and the second data index are sent to the data center: storing locally the at least one data block and the data index corresponding to each of the at least one data block.
在本发明一个实施例中,可以通过如下方式确定所述至少一个数据块中未曾传输过的第一数据块、已经传输过的第二数据块,包括:In an embodiment of the present invention, the first data block that has not been transmitted and the second data block that has been transmitted in the at least one data block can be determined in the following manner, including:
确定本地存储的各个数据块中每一个数据块对应的第一校验码;Determine the first check code corresponding to each data block in each data block stored locally;
计算所述至少一个数据块中每一个数据块对应的第二校验码;calculating a second check code corresponding to each data block in the at least one data block;
根据所述第二校验码,遍历所述第一校验码;将位于所述第一校验码中的所述第二校验码对应的数据块作为所述第二数据块,将未位于所述第一校验码中的所述第二校验码对应的数据块作为所述第一数据块。Traverse the first check codes according to the second check codes; take a data block whose second check code appears among the first check codes as a second data block, and take a data block whose second check code does not appear among the first check codes as a first data block.
其中,本地存储的各个数据块中每一个数据块对应的第一校验码的计算方式,与该至少一个数据块中每一个数据块对应的第二校验码的计算方式相同。Wherein, the calculation method of the first check code corresponding to each data block in the locally stored data blocks is the same as the calculation method of the second check code corresponding to each data block in the at least one data block.
例如,该至少一个数据块包括10个数据块,分别为:数据块1、数据块2、数据块3、……、数据块10,每一个数据块对应的第二校验码分别为: A1、A2、A3、……、A10。假设在本地存储的第一校验码中包括A1、A2和A3的校验码,那么可以确定数据块1、数据块2和数据块3为已经传输过的数据块,数据块4、数据块5、数据块6、……数据块10为未曾传输过的数据块。For example, the at least one data block includes 10 data blocks, which are respectively: data block 1, data block 2, data block 3, ..., data block 10, and the second check codes corresponding to each data block are respectively: A1 , A2, A3,..., A10. Assuming that the first check code stored locally includes the check codes of A1, A2, and A3, then it can be determined that data block 1, data block 2, and data block 3 are already transmitted data blocks, and data block 4, data block 5. Data block 6, ... data block 10 are data blocks that have never been transmitted.
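The classification in this example can be sketched as a set-membership test on check codes. This is an illustrative sketch only: `classify_blocks` is a hypothetical name, and `zlib.adler32` from the Python standard library stands in for the check-code calculation.

```python
import zlib
from typing import Iterable, List, Set, Tuple


def classify_blocks(
    blocks: Iterable[bytes], sent_checksums: Set[int]
) -> Tuple[List[bytes], List[bytes]]:
    """Split blocks into (first, second): never transmitted vs. already transmitted."""
    first, second = [], []
    for block in blocks:
        code = zlib.adler32(block)       # second check code of this block
        if code in sent_checksums:       # found among the first check codes
            second.append(block)
        else:
            first.append(block)
    return first, second


# First check codes of the blocks already stored locally (assumed for the example).
sent = {zlib.adler32(b"aaa")}
first, second = classify_blocks([b"aaa", b"bbb", b"ccc"], sent)
```

Using a set for the locally stored check codes makes each lookup constant-time on average, instead of traversing the full list for every block.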
在本发明一个实施例中,可以通过如下方式计算该至少一个数据块中每一个数据块对应的第二校验码,该技术方式可以包括:In an embodiment of the present invention, the second check code corresponding to each data block in the at least one data block may be calculated in the following manner, and the technical method may include:
利用第二公式、第三公式和第四公式计算所述至少一个数据块中每一个数据块对应的第二校验码;calculating a second check code corresponding to each data block in the at least one data block by using the second formula, the third formula and the fourth formula;
所述第二公式包括:The second formula includes:
所述第三公式包括:The third formula includes:
所述第四公式包括:The fourth formula includes:
$\mathrm{Adler32}(1,k) = A(1,k) + 2^{16}\,B(1,k)$;
其中,Adler32(1,k)用于表征第二校验码;A(1,k)用于表征第一中间参数,B(1,k)用于表征第二中间参数;data[j]用于表征当前数据块中的第j个字节对应的数据;M用于表征已知常数;k用于表征该当前数据块包括的字节数。where Adler32(1,k) denotes the second check code; A(1,k) denotes the first intermediate parameter and B(1,k) the second intermediate parameter; data[j] denotes the data of the j-th byte in the current data block; M denotes a known constant; and k denotes the number of bytes in the current data block.
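The fourth formula combines the two intermediate parameters into one 32-bit value. The sketch below computes A and B using the standard Adler-32 definition (A starts at 1, M = 65521) and checks the result against `zlib.adler32`. That the second and third formulas match the standard definition is an assumption, since they are not reproduced in this text; the function names are hypothetical.

```python
import zlib
from typing import Tuple

M = 65521  # largest prime below 2**16, the modulus of standard Adler-32 (assumed value of M)


def adler32_parts(block: bytes) -> Tuple[int, int]:
    """Return (A, B) for a block under the standard Adler-32 definition."""
    a, b = 1, 0
    for byte in block:
        a = (a + byte) % M   # running sum of bytes, offset by the initial 1
        b = (b + a) % M      # running sum of the successive A values
    return a, b


def adler32(block: bytes) -> int:
    """Combine the parts as Adler32 = A + 2**16 * B."""
    a, b = adler32_parts(block)
    return a + (b << 16)


sample = b"Wikipedia"
matches_zlib = adler32(sample) == zlib.adler32(sample)
```

In practice `zlib.adler32` would be called directly; the explicit loop is only there to show how A and B feed the fourth formula.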
在根据上述计算方式计算得到的第二校验码,若根据上述计算方式计算得到的第二校验码,与本地存储的第一校验码均不相同,则表明该不相同的第二校验码对应的第二数据块一定未曾传输过;若与本地存储的第一校验码相同,则表明该相同的第二校验码对应的第二数据块可能已经传输过,还需要利用唯一校验值,例如,MD5值,进行进一步校验,若利用MD5值进一步校验结果相同,则表明该第二校验码对应的第二数据块已经传输过,否则,未曾传输过。If a second check code calculated as above differs from every locally stored first check code, the corresponding data block has definitely never been transmitted. If it equals a locally stored first check code, the corresponding data block may already have been transmitted, and a further check with a unique check value, for example an MD5 value, is needed: if the MD5 values also match, the data block has already been transmitted; otherwise it has not.
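The two-stage check can be sketched as a cheap Adler-32 lookup followed by an MD5 comparison only when the weak check code matches. This is a sketch under assumptions: the layout of `sent_index` (weak check code mapped to a set of MD5 digests) and the names `remember` and `already_sent` are illustrative choices, not taken from the patent.

```python
import hashlib
import zlib
from typing import Dict, Set


def remember(block: bytes, sent_index: Dict[int, Set[str]]) -> None:
    """Record a transmitted block under its weak check code and MD5 digest."""
    weak = zlib.adler32(block)
    sent_index.setdefault(weak, set()).add(hashlib.md5(block).hexdigest())


def already_sent(block: bytes, sent_index: Dict[int, Set[str]]) -> bool:
    """Return True only if both the weak check code and the MD5 digest match."""
    candidates = sent_index.get(zlib.adler32(block))
    if not candidates:
        return False   # no weak match: the block was definitely never transmitted
    # Weak match: confirm with the MD5 digest to rule out a collision.
    return hashlib.md5(block).hexdigest() in candidates


index: Dict[int, Set[str]] = {}
remember(b"hello", index)

# Simulated collision: the weak check code of b"world" is present,
# but it was recorded for a different block.
collided = {zlib.adler32(b"world"): {hashlib.md5(b"other").hexdigest()}}
```

The MD5 digest is computed only for blocks that pass the weak check, so most never-transmitted blocks are rejected at the cost of one Adler-32 calculation.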
在本发明一个实施例中,也可以直接通过唯一校验值的方式计算第二校验码,以确定出第一数据块和第二数据块。In an embodiment of the present invention, the second check code may also be directly calculated by means of a unique check value, so as to determine the first data block and the second data block.
Referring to FIG. 2, an embodiment of the present invention further provides a data synchronization method applied to a data center, including:
Step 201: Obtain the first data block for the target type service sent by the current physical machine, the first data index corresponding to the first data block, and the second data index corresponding to the second data block, where the first data block is a data block that the current physical machine has never transmitted and the second data block is a data block that the current physical machine has already transmitted.
Step 202: Determine the second data block according to the third data blocks included in the locally stored data copy, the third data index corresponding to each third data block, and the second data index.
The data center locally stores a data copy of each data block. Data copying is a general technique for improving data access efficiency, system fault tolerance, and load balancing. The data copy includes not only the third data blocks but also the third data index corresponding to each third data block. The data center searches the third data indexes for one that equals the second data index and uses it to locate the corresponding data block; the data block located in this way is the second data block that has already been transmitted.
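The index lookup described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the data copy is held as an in-memory mapping from data index to block bytes, and the function name `resolve_second_blocks` and the `idx-*` index strings are hypothetical.

```python
def resolve_second_blocks(replica, second_indexes):
    """Recover already-transmitted blocks from the local data copy.

    replica: dict mapping a third data index to its third data block (bytes).
    second_indexes: indexes sent by the physical machine in place of block bodies.
    Returns (resolved, missing), where `missing` lists indexes not found locally.
    """
    resolved = {}
    missing = []
    for idx in second_indexes:
        if idx in replica:
            resolved[idx] = replica[idx]  # the matching third block is the second block
        else:
            missing.append(idx)  # assumption: the caller decides how to handle this
    return resolved, missing


# Hypothetical data copy held by the data center.
replica = {"idx-1": b"cpu=10", "idx-2": b"mem=512", "idx-3": b"disk=70"}
resolved, missing = resolve_second_blocks(replica, ["idx-1", "idx-3", "idx-9"])
```

The source does not say what happens when an index is absent from the data copy, so the sketch only reports such indexes instead of choosing a recovery policy.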
Step 203: Store the first data block and the second data block.
In this embodiment, a threshold on the amount of data for service s_i may also be set. For example, with a threshold of 100, once the service holds 100 data blocks, the data blocks corresponding to the service are stored in cloud storage.
Further, the data center judges the real-time requirement of the data: data that requires real-time access is stored in a real-time database, and data that does not is stored in an ordinary database.
In one embodiment of the present invention, an unreasonable number of data copies can put storage pressure on the platform and waste a large amount of storage space. The number of data copies may therefore be determined as follows, the method further including: replicating the stored first data block and second data block into the following target number of data copies:
Here, number represents the target number of data copies made of the first data block and the second data block; request_dead_lock represents the number of service blockages caused by multiple user requests competing for the first data block and the second data block; all_request represents the number of concurrent accesses to the target type service; a represents a scaling factor and is a known constant; init_size represents the initial number of data copies.
The following takes a PaaS platform as an example of the platform providing services. The PaaS platform includes at least one physical machine and a data center, and the data synchronization process is described in detail through the interaction between one of the physical machines and the data center. Referring to FIG. 3, the method may include the following steps:
Step 301: The physical machine configures a corresponding sampling period for each type of service.
Each physical machine may generate new data at intervals, and this data needs to be stored whenever it is generated.
Assume there are four service types, configured as shown in Table 1 below:
Table 1:
In this embodiment, a separate collection unit may be used for each of the four service types. Each collection unit follows the sampling period configured in Table 1 and collects data for its service type whenever that sampling period is reached.
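The per-service sampling check can be sketched as below. The service names and periods are placeholders, since the values of Table 1 are not available in this text, and `due_services` is a hypothetical helper rather than part of the described system.

```python
def due_services(periods, last_sampled, now):
    """Return the service types whose sampling period has elapsed at time `now`.

    periods: dict mapping service type to its sampling period in seconds.
    last_sampled: dict mapping service type to the time it was last sampled.
    """
    return [
        service
        for service, period in periods.items()
        if now - last_sampled.get(service, 0.0) >= period
    ]


# Hypothetical service types and periods (seconds); not the values of Table 1.
periods = {"web": 5.0, "database": 10.0, "cache": 2.0, "queue": 30.0}
last_sampled = {"web": 0.0, "database": 0.0, "cache": 4.0, "queue": 0.0}
due = due_services(periods, last_sampled, now=6.0)
```

In a running agent each collection unit would more likely own its own timer; a single polling function is used here only to keep the sketch short.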
In this embodiment, the user also needs to register monitoring information, such as monitoring indicators, monitored objects, and monitoring methods, with the Agent in the data center.
To provide good extensibility, the Agent in the physical machine offers two extension mechanisms. The first is script invocation: the user sends a monitoring script to the Agent, and the Agent calls the script to run the monitoring module and collect service status information. The second is the Agent's API: extending the Agent through API calls likewise makes it possible to obtain service status information.
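The second extension mechanism, extending the Agent through its API, might look like the following sketch. The source does not describe the Agent's actual interface, so the `Agent` class, `register_collector`, and `collect` are all hypothetical names chosen for illustration.

```python
class Agent:
    """Hypothetical monitoring agent that accepts user-supplied collectors."""

    def __init__(self):
        self._collectors = {}

    def register_collector(self, name, func):
        """Register a zero-argument callable that returns one status value."""
        self._collectors[name] = func

    def collect(self):
        """Run every registered collector and return the readings by name."""
        return {name: func() for name, func in self._collectors.items()}


agent = Agent()
# A user-defined metric added without modifying the Agent itself.
agent.register_collector("queue_depth", lambda: 42)
status = agent.collect()
```

The point of the design is that a customized service only contributes a collector; the platform code that schedules and ships the data stays unchanged.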
Step 302: When the sampling period corresponding to the target type service is reached, collect the data that the target type service needs to transmit.
In this embodiment, the physical machine may host multiple virtual machines, and an Agent may be configured on each virtual machine to collect and process the data.
Step 303: Calculate the division length according to the service type.
In this embodiment, the division length may be calculated by the following formula (1):
Here, k(s_i) represents the division length corresponding to s_i; s_i represents the i-th service provided by the platform; period(s_i) represents the sampling period configured for s_i; status(s_i) represents the evaluated granularity of state changes of s_i; λ represents an impact factor and is a known constant; default_size represents the default length of a data block.
Once the type of the service is determined, status(s_i) is a known parameter.
default_size is set to guarantee that the finally determined division length is no smaller than this default length. For example, the minimum default length is 5 bytes.
Step 304: Using the calculated division length, divide the data to be transmitted into m data blocks.
Assume that the data to be transmitted by the target type service is divided into m data blocks, referred to below as data block 1, data block 2, ..., data block m.
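Step 304 can be illustrated with a fixed-length split. This sketch assumes the division length is given in bytes and that the final block may be shorter than the others; the source does not state how a trailing remainder is handled, and `split_into_blocks` is a hypothetical name.

```python
def split_into_blocks(data, block_len):
    """Split `data` into consecutive blocks of at most `block_len` bytes."""
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    return [data[i:i + block_len] for i in range(0, len(data), block_len)]


# With the 5-byte minimum default length mentioned in the text.
blocks = split_into_blocks(b"abcdefghijkl", 5)
```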
Step 305: Obtain the locally stored data blocks.
The physical machine locally stores each data block that has already been sent to the data center, together with the data index corresponding to each data block.
Step 306: Determine the first check code corresponding to each locally stored data block.
Step 307: Using the same calculation as for the first check code, calculate the second check code corresponding to each of the m data blocks, and use the first and second check codes to determine which of the m data blocks are first data blocks that have never been transmitted and which are second data blocks that have already been transmitted.
In this embodiment, to keep the check codes consistent, the second check code must be calculated in the same way as the first check code.
The calculation may compute a unique check value, such as an MD5 value, directly for each data block. For example, suppose the second check codes of the m data blocks are A1, A2, A3, ..., Am, and the locally stored first check codes include A1, A2, and A3. Data blocks 1, 2, and 3 have then already been transmitted and are taken as second data blocks, while data blocks 4, 5, 6, ..., m are taken as first data blocks that have never been transmitted.
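The direct classification by unique check value can be sketched with Python's `hashlib`. It assumes the physical machine keeps the MD5 digests of previously sent blocks in a set; the function name and the sample block contents are hypothetical.

```python
import hashlib


def classify_by_md5(blocks, sent_digests):
    """Split blocks into (first_blocks, second_blocks) by MD5 membership.

    first_blocks: never transmitted (digest not in sent_digests).
    second_blocks: already transmitted (digest found in sent_digests).
    """
    first_blocks, second_blocks = [], []
    for block in blocks:
        digest = hashlib.md5(block).hexdigest()
        if digest in sent_digests:
            second_blocks.append(block)
        else:
            first_blocks.append(block)
    return first_blocks, second_blocks


# Blocks 1 to 3 were sent earlier; blocks 4 and 5 are new.
sent = {hashlib.md5(b).hexdigest() for b in (b"block-1", b"block-2", b"block-3")}
new, old = classify_by_md5(
    [b"block-1", b"block-2", b"block-3", b"block-4", b"block-5"], sent
)
```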
Because a unique check value is relatively complex and slow to compute, the second check code may instead be calculated as follows:
In one embodiment of the present invention, the second check code corresponding to each of the at least one data block may be calculated in the following manner:
Use formula (2), formula (3), and formula (4) to calculate the second check code corresponding to each of the m data blocks;
Adler32(1,k) = A(1,k) + 2^{16} · B(1,k)    (4)
Here, Adler32(1,k) represents the second check code; A(1,k) represents the first intermediate parameter and B(1,k) the second intermediate parameter; data[j] represents the j-th byte of the current data block; M represents a known constant; k represents the number of bytes in the current data block.
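For illustration, the sketch below computes the standard Adler-32 checksum, which combines its two running sums exactly as formula (4) does. It assumes the standard definition, in which A starts at 1 and M = 65521; formulas (2) and (3) are not available in this text and may differ (rolling-checksum variants, for instance, omit the initial 1 or use a different modulus). The result is compared against Python's built-in `zlib.adler32`.

```python
import zlib

MOD_ADLER = 65521  # assumed value of M: the largest prime below 2**16


def adler32(data):
    """Standard Adler-32 of a bytes object, computed byte by byte."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER  # first intermediate parameter A(1,k)
        b = (b + a) % MOD_ADLER     # second intermediate parameter B(1,k)
    return a + (b << 16)            # formula (4): A + 2^16 * B


checksum = adler32(b"Wikipedia")
matches_zlib = checksum == zlib.adler32(b"Wikipedia")
```

Each byte costs two additions and two reductions, which is why this check is much cheaper than MD5 and suits a first-pass filter.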
If a second check code calculated in the above manner differs from every locally stored first check code, the data block corresponding to that second check code has definitely never been transmitted. Suppose the second check codes of the m data blocks are A1, A2, A3, ..., Am, and A1, A2, and A3 all appear among the locally stored first check codes. The data blocks corresponding to A4, A5, A6, ..., Am have then never been transmitted, so data blocks 4, 5, 6, ..., m are determined to be first data blocks.
If a second check code matches a locally stored first check code, the corresponding data block may already have been transmitted, and a further check with a unique check value, for example an MD5 value, is required. Data blocks 1, 2, and 3 therefore need a further check: calculate the MD5 values of data blocks 1, 2, and 3, and the MD5 values of the three locally stored data blocks whose first check codes are A1, A2, and A3. If the MD5 values of corresponding data blocks are the same, the data block has already been transmitted; otherwise, it has never been transmitted. For example, if the data blocks corresponding to A1, A2, and A3 have all been transmitted, data blocks 1, 2, and 3 are determined to be second data blocks.
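The two-stage check can be sketched as a weak-then-strong comparison. This is an illustration under assumptions: `zlib.adler32` stands in for the check code of formulas (2) to (4), locally stored blocks are available as bytes, and `classify_two_stage` is a hypothetical name. The example uses `b"bcb"` and `b"cac"`, which have the same Adler-32 value, to exercise the collision path.

```python
import hashlib
import zlib


def classify_two_stage(blocks, sent_blocks):
    """Return (first_blocks, second_blocks) using Adler-32, then MD5."""
    # Map each weak checksum to the MD5 digests of stored blocks sharing it.
    weak_index = {}
    for stored in sent_blocks:
        digests = weak_index.setdefault(zlib.adler32(stored), set())
        digests.add(hashlib.md5(stored).digest())

    first_blocks, second_blocks = [], []
    for block in blocks:
        candidates = weak_index.get(zlib.adler32(block))
        if candidates is None:
            first_blocks.append(block)   # weak miss: definitely never transmitted
        elif hashlib.md5(block).digest() in candidates:
            second_blocks.append(block)  # weak and strong match: already transmitted
        else:
            first_blocks.append(block)   # weak collision only: never transmitted
    return first_blocks, second_blocks


first, second = classify_two_stage([b"bcb", b"cac", b"xyz"], [b"bcb"])
```

MD5 is computed only for blocks that pass the weak check, so new data, the common case for fresh monitoring samples, is rejected after the cheap checksum alone.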
Step 308: Determine the first data index corresponding to the first data block and the second data index corresponding to the second data block.
The first data index needs to be generated from the first data block.
The second data index can be determined from the locally stored indexes.
Step 309: Send the first data block, the first data index, and the second data index to the data center, and store the first data block and the first data index locally.
To reduce network bandwidth usage, only the first data block, which has never been transmitted, is sent. The second data block, which has already been transmitted, does not need to be sent again; only its second data index is sent to the data center. Compared with the prior art, network usage is therefore reduced by the size of the second data block.
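The message of step 309 can be sketched as a plain dictionary. The field names and the `idx-*` index strings are hypothetical, since the source does not define a wire format; the helper `body_bytes` only counts the block bodies that actually travel over the network.

```python
def build_payload(first_blocks, second_indexes):
    """Assemble the message sent to the data center.

    first_blocks: dict mapping first data index to block bytes (never sent before).
    second_indexes: indexes of blocks the data center already holds.
    """
    return {
        "first_blocks": dict(first_blocks),
        "first_indexes": list(first_blocks),
        "second_indexes": list(second_indexes),
    }


def body_bytes(payload):
    """Total size of the block bodies carried by the payload."""
    return sum(len(block) for block in payload["first_blocks"].values())


payload = build_payload({"idx-4": b"dddd"}, ["idx-1", "idx-2", "idx-3"])
sent_size = body_bytes(payload)
```

Here three of the four blocks are represented only by their indexes, so a single 4-byte body is transmitted.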
In this embodiment, after the first data block, the first data index, and the second data index have been sent to the data center, the method may further include locally storing the at least one data block and the data index corresponding to each of the at least one data block, so that fewer data blocks need to be sent in the next transmission. The stored data may also include the timestamps corresponding to the first data block and the second data block.
In this embodiment, the first data block, the first data index, and the second data index may also be compressed, and the compressed data packet is sent to the data center.
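Compression before sending can be sketched with the standard `zlib` and `json` modules. The source does not name a compression algorithm or serialization format, so both choices, along with the Base64 encoding of block bodies and the function names `pack` and `unpack`, are assumptions made for illustration.

```python
import base64
import json
import zlib


def pack(payload):
    """Serialize the payload to JSON and compress it for transmission."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return zlib.compress(raw)


def unpack(packet):
    """Inverse of pack(), as the data center would apply it."""
    return json.loads(zlib.decompress(packet).decode("utf-8"))


# Block bodies are Base64 text because JSON cannot carry raw bytes.
payload = {
    "first_blocks": {"idx-4": base64.b64encode(b"cpu=10;" * 50).decode("ascii")},
    "second_indexes": ["idx-1", "idx-2", "idx-3"],
}
packet = pack(payload)
restored = unpack(packet)
```

Monitoring samples tend to be repetitive, so a general-purpose compressor shrinks them considerably; very small or high-entropy payloads can instead grow slightly.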
Step 310: The data center determines the second data block using the third data blocks included in the stored data copy, the third data index corresponding to each third data block, and the second data index.
Because the second data block has already been transmitted to the data center, the data center holds both the second data block and the second data index.
The second data block can be obtained from the data copy.
The data center locally stores a data copy of each data block. Data copying is a general technique for improving data access efficiency, system fault tolerance, and load balancing. The data copy includes not only the third data blocks but also the third data index corresponding to each third data block. The data center searches the third data indexes for one that equals the second data index and uses it to locate the corresponding data block; the data block located in this way is the second data block that has already been transmitted.
Step 311: According to the threshold set for the target type of service, determine whether the number of first data blocks and second data blocks is greater than the threshold. If it is, store the first data block and the second data block in cloud storage; if it is not, store them in the real-time database.
Further, a threshold on the amount of data for service s_i may also be set. For example, with a threshold of 100, once the service holds 100 data blocks, the data blocks corresponding to the service are stored in cloud storage.
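The storage decision of step 311 reduces to a threshold comparison. The sketch follows the strict "greater than" wording of step 311; the example that follows it in the text says the blocks move to cloud storage when the count reaches 100, so the boundary case is ambiguous in the source. The store names are hypothetical labels.

```python
def choose_store(block_count, threshold=100):
    """Pick the storage target for a service's data blocks.

    Uses the strict comparison from step 311; whether a count exactly equal
    to the threshold should go to cloud storage is not settled by the source.
    """
    if block_count > threshold:
        return "cloud_storage"
    return "realtime_database"


small = choose_store(100)
large = choose_store(101)
```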
Step 312: Replicate the first data block and the second data block as data copies.
An unreasonable number of data copies can put storage pressure on the platform and waste a large amount of storage space. The number of data copies may therefore be determined as follows:
The number of data copies can be determined by the following formula (5):
Here, number represents the target number of data copies made of the first data block and the second data block; request_dead_lock represents the number of service blockages caused by multiple user requests competing for the first data block and the second data block; all_request represents the number of concurrent accesses to the target type service; a represents a scaling factor and is a known constant; init_size represents the initial number of data copies.
Referring to FIG. 4, an embodiment of the present invention further provides a physical machine, which may include:
a configuration unit 401, configured to configure a corresponding sampling period for each type of service;
a collection unit 402, configured to collect the data that the target type service needs to transmit when the sampling period corresponding to the target type service is reached;
a dividing unit 403, configured to divide the data to be transmitted into at least one data block;
a determining unit 404, configured to determine, among the at least one data block, the first data block that has never been transmitted and the second data block that has already been transmitted, and to determine the first data index of the first data block and the second data index of the second data block;
a sending unit 405, configured to send the first data block, the first data index, and the second data index to a data center, so that the data center stores the at least one data block.
In one embodiment of the present invention, the physical machine may further include a calculation unit, configured to calculate the division length using a first formula;
the first formula includes:
where k(s_i) represents the division length corresponding to s_i; s_i represents the i-th service provided by the platform; period(s_i) represents the sampling period configured for s_i; status(s_i) represents the evaluated granularity of state changes of s_i; λ represents an impact factor and is a known constant; default_size represents the default length of a data block;
the dividing unit is specifically configured to use the division length to divide the data to be transmitted into the at least one data block;
In one embodiment of the present invention, the physical machine may further include a storage unit, configured to locally store the at least one data block and the data index corresponding to each of the at least one data block;
In one embodiment of the present invention, the determining unit includes:
a determining module, configured to determine the first check code corresponding to each locally stored data block;
a calculation module, configured to calculate the second check code corresponding to each of the at least one data block;
a traversal module, configured to traverse the first check codes according to the second check code, take each data block whose second check code is found among the first check codes as the second data block, and take each data block whose second check code is not found among the first check codes as the first data block.
Referring to FIG. 5, an embodiment of the present invention further provides a data center, which may include:
an obtaining unit 501, configured to obtain the first data block for the target type service sent by the current physical machine, the first data index corresponding to the first data block, and the second data index corresponding to the second data block, where the first data block is a data block that the current physical machine has never transmitted and the second data block is a data block that the current physical machine has already transmitted;
a determining unit 502, configured to determine the second data block according to the third data blocks included in the locally stored data copy, the third data index corresponding to each third data block, and the second data index;
a storage unit 503, configured to store the first data block and the second data block.
In one embodiment of the present invention, the data center may further include a replication unit, configured to replicate the stored first data block and second data block into the following target number of data copies:
Here, number represents the target number of data copies made of the first data block and the second data block; request_dead_lock represents the number of service blockages caused by multiple user requests competing for the first data block and the second data block; all_request represents the number of concurrent accesses to the target type service; a represents a scaling factor and is a known constant; init_size represents the initial number of data copies.
Referring to FIG. 6, an embodiment of the present invention further provides a data synchronization system, which may include any data center 50 described above and at least one physical machine 40 of any kind described above.
In summary, the embodiments of the present invention can achieve at least the following beneficial effects:
1. In the embodiments of the present invention, the collected data to be transmitted is divided into at least one data block. Only the first data block that has never been transmitted, its first data index, and the second data index of the second data block that has already been transmitted are sent to the data center; the second data block itself is not sent. This reduces the amount of data transmitted and therefore the network bandwidth used.
2. In the embodiments of the present invention, the Agent in the physical machine provides flexible extension functions. Users can dynamically extend the Agent's data monitoring and collection through interface extension or script invocation, and do not need to modify the whole platform system when customized services are introduced.
3. In the embodiments of the present invention, the service-oriented multi-copy data storage strategy guarantees real-time storage of and access to data while reducing the number of redundant data copies and improving the system's capacity for concurrent access.
The information exchange and execution processes among the units of the above apparatus are based on the same concept as the method embodiments of the present invention; for details, refer to the description of the method embodiments, which is not repeated here.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above are only preferred embodiments of the present invention. They serve only to illustrate the technical solutions of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610451188.0A CN106209974B (en) | 2016-06-21 | 2016-06-21 | A data synchronization method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106209974A true CN106209974A (en) | 2016-12-07 |
CN106209974B CN106209974B (en) | 2019-03-12 |
Family
ID=57460677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610451188.0A Active CN106209974B (en) | 2016-06-21 | 2016-06-21 | A data synchronization method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106209974B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107070740A (en) * | 2017-03-11 | 2017-08-18 | 郑州云海信息技术有限公司 | A kind of efficient PAAS platform monitoring methods and system |
CN107241447A (en) * | 2017-07-31 | 2017-10-10 | 广东欧珀移动通信有限公司 | Data synchronization control method and device, storage medium and electronic equipment |
CN107357746A (en) * | 2017-07-26 | 2017-11-17 | 郑州云海信息技术有限公司 | A kind of communication means and system |
CN109733444A (en) * | 2018-09-19 | 2019-05-10 | 比亚迪股份有限公司 | Database Systems and train supervision management equipment |
WO2019157881A1 (en) * | 2018-02-13 | 2019-08-22 | 论客科技(广州)有限公司 | Method and device for mail synchronization, and computer-readable storage medium |
CN113364555A (en) * | 2020-03-04 | 2021-09-07 | 英飞凌科技股份有限公司 | Device, controller for device and method of communication |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101355588B (en) * | 2008-09-08 | 2012-08-01 | 创新科存储技术(深圳)有限公司 | Data transmission method and transmission terminal base on peer-to-peer network |
CN102436478B (en) * | 2011-10-12 | 2013-06-19 | 浪潮(北京)电子信息产业有限公司 | System and method for accessing massive data |
CN103873503A (en) * | 2012-12-12 | 2014-06-18 | 鸿富锦精密工业(深圳)有限公司 | Data block backup system and method |
CN104063377A (en) * | 2013-03-18 | 2014-09-24 | 联想(北京)有限公司 | Information processing method and electronic equipment using same |
CN104348884A (en) * | 2013-08-08 | 2015-02-11 | 中国科学院计算机网络信息中心 | Cloud storage automatic synchronization method |
Also Published As
Publication number | Publication date |
---|---|
CN106209974B (en) | 2019-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106209974B (en) | A data synchronization method, device and system | |
CN106980625B (en) | Data synchronization method, device and system | |
CN105718364B (en) | A dynamic assessment method for computing resource capability in a cloud computing platform | |
CN108829581B (en) | Application program testing method and device, computer equipment and storage medium | |
CN103229487B (en) | Partition balancing method, device and server in distributed memory system | |
CN110308984B (en) | Cross-cluster computing system for processing geographically distributed data | |
CN103761146B (en) | A method for dynamically setting the number of slots in MapReduce | |
WO2013078583A1 (en) | Method and apparatus for optimizing data access, method and apparatus for optimizing data storage | |
US11341842B2 (en) | Metering data management system and computer readable recording medium | |
CN110868330B (en) | Evaluation method, device and evaluation system for dividing CPU resources of cloud platform | |
CN110839069A (en) | Node data deployment method, node data deployment system and medium | |
CN111639902A (en) | Data auditing method based on kafka, control device, computer equipment and storage medium | |
CN107315756B (en) | A log processing method and device | |
CN108038009A (en) | Front-end and back-end interaction method, device and computer equipment based on Web applications | |
WO2024051454A1 (en) | Method and apparatus for processing transaction log | |
CN104866402A (en) | Server testing method and apparatus | |
CN110391952A (en) | A performance analysis method, apparatus and device | |
CN104503846B (en) | A resource management system based on a cloud computing system | |
CN103475686B (en) | Communication data distribution system and communication data distribution method for electric analog | |
CN112306848B (en) | Architectural view generation method and device for microservice system | |
CN106453594A (en) | A distributed method for global logical clock synchronization | |
CN104506663B (en) | An intelligent cloud computing operation management system | |
CN107707383B (en) | Put-through processing method and device, first network element and second network element | |
CN103106103A (en) | Requesting information classification method and device | |
CN116701410B (en) | Method and system for storing memory state data for data language of digital networking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2020-05-22; Address after: 250100 No. 1036 Tidal Road, Jinan High-tech Zone, Shandong Province, S01 Building, Tidal Science Park; Patentee after: Tidal Cloud Information Technology Co., Ltd.; Address before: 250100 Ji'nan high tech Zone, Shandong, No. 1036 wave road; Patentee before: INSPUR ELECTRONIC INFORMATION INDUSTRY Co., Ltd. |
CP01 | Change in the name or title of a patent holder | Address after: 250100 No. 1036 Tidal Road, Jinan High-tech Zone, Shandong Province, S01 Building, Tidal Science Park; Patentee after: Inspur cloud Information Technology Co., Ltd; Address before: 250100 No. 1036 Tidal Road, Jinan High-tech Zone, Shandong Province, S01 Building, Tidal Science Park; Patentee before: Tidal Cloud Information Technology Co., Ltd. |