CN111309673A

CN111309673A - Snapshot data generation method and device of incremental data

Info

Publication number: CN111309673A
Application number: CN202010089677.2A
Authority: CN
Inventors: 赵平; 孙森
Original assignee: Puxin Hengye Technology Development Beijing Co ltd
Current assignee: Puxin Hengye Technology Development Beijing Co ltd
Priority date: 2020-02-12
Filing date: 2020-02-12
Publication date: 2020-06-19
Anticipated expiration: 2040-02-12
Also published as: CN111309673B

Abstract

The application provides a snapshot data generation method and device for incremental data. The method comprises the following steps: after the incremental data of the current period is obtained, storing each piece of incremental data into the incremental data file corresponding to its primary key, ensuring that incremental data with the same primary key is stored in the same incremental data file; and obtaining, from the incremental data file, the incremental data with the latest timestamp for the same primary key, which gives the snapshot data corresponding to that primary key. Because incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node, the time otherwise spent transmitting incremental data from different nodes to one node is saved, so the scheme improves the speed and efficiency of generating snapshot data.

Description

Snapshot data generation method and device of incremental data

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a snapshot data generation method and device of incremental data.

Background

Streaming incremental data refers to data that has been added, deleted, or changed in a database at a certain time or after a certain checkpoint. Accumulated incremental data refers to all streaming incremental data written to the incremental data file. A snapshot refers to an image of the data at a certain point in time.

At present, generating a snapshot from streaming incremental data requires reading all accumulated incremental data at once and performing a Reduce calculation on it. The Reduce calculation aggregates the data by primary key and, among data with the same primary key, keeps only the data with the latest update time. This snapshot generation mode is therefore inefficient.

Disclosure of Invention

In view of this, an object of the present application is to provide a snapshot data generating method and apparatus for incremental data, so as to solve the technical problem that the current snapshot data generating efficiency is low, and a specific technical scheme thereof is as follows:

In a first aspect, the present application provides a snapshot data generation method of incremental data, including:

obtaining incremental data;

respectively storing each incremental data into an incremental data file corresponding to the primary key of the incremental data, wherein the incremental data of the same primary key are stored into the same incremental data file, and the same incremental data file is stored on the same node;

and acquiring the data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain the snapshot data of the primary key.

In a possible implementation manner, the respectively storing each incremental data into an incremental data file corresponding to a primary key of the incremental data includes:

obtaining a mapping value of the primary key of the incremental data according to a preset mapping relation;

and storing the incremental data into an incremental data file with a file name corresponding to the mapping value.

In a possible implementation manner, obtaining data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain snapshot data of the primary key includes:

for any primary key, reading all incremental data corresponding to the primary key from the incremental data file corresponding to the primary key;

and acquiring the data with the latest timestamp from all the incremental data corresponding to the primary key to obtain the snapshot data corresponding to the primary key, and storing the snapshot data into the snapshot data file corresponding to the primary key, wherein the snapshot data corresponding to the same primary key are stored in the same snapshot data file.

In another possible implementation manner, obtaining data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain snapshot data of the primary key includes:

for any incremental data file, reading the incremental data corresponding to the same primary key in the current period from the incremental data file;

acquiring snapshot data corresponding to the primary key in the previous period adjacent to the current period;

and searching, from the incremental data of the current period and the snapshot data of the previous period corresponding to the primary key, for the data with the latest timestamp, and determining that data as the snapshot data of the primary key in the current period.

In a possible implementation manner, the obtaining snapshot data corresponding to the primary key in a previous cycle adjacent to the current cycle includes:

obtaining a mapping value corresponding to the primary key according to a preset mapping relation;

and reading the snapshot data corresponding to the primary key in the previous period from the snapshot data file whose file name corresponds to the mapping value, wherein the snapshot data corresponding to the same primary key are stored in the same snapshot data file, and the primary keys of the data stored in each snapshot data file all have the same mapping value.

In a possible implementation manner, after obtaining the snapshot data of the primary key in the current cycle, the method further includes:

and storing the snapshot data of the primary key corresponding to the current period into a snapshot data file corresponding to the current period.

In a second aspect, the present application further provides a snapshot data generating apparatus for incremental data, including:

the first acquisition module is used for acquiring incremental data;

the storage module is used for respectively storing each incremental data into an incremental data file corresponding to the primary key of the incremental data, wherein the incremental data of the same primary key are stored into the same incremental data file, and the same incremental data file is stored on the same node;

and the snapshot data acquisition module is used for acquiring the data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain the snapshot data of the primary key.

In one possible implementation, the snapshot data obtaining module includes:

the first incremental data reading submodule is used for reading, for any primary key, all incremental data corresponding to the primary key from the incremental data file corresponding to the primary key;

and the first snapshot data determining submodule is used for acquiring the data with the latest timestamp from all incremental data corresponding to the primary key, obtaining the snapshot data corresponding to the primary key, and storing the snapshot data into the snapshot data file corresponding to the primary key, wherein the snapshot data corresponding to the same primary key are stored in the same snapshot data file.

In one possible implementation, the snapshot data obtaining module includes:

the second incremental data reading submodule is used for reading, for any incremental data file, the incremental data corresponding to the same primary key in the current period from the incremental data file;

the historical snapshot data acquisition submodule is used for acquiring snapshot data corresponding to the primary key in the previous period adjacent to the current period;

and the second snapshot data determination submodule is used for searching, from the incremental data of the current period and the snapshot data of the previous period corresponding to the primary key, for the data with the latest timestamp, and determining that data as the snapshot data of the primary key in the current period.

In a third aspect, the present application further provides a storage medium, on which a program is stored, where the program is loaded and executed by a processor, and the method for generating snapshot data of incremental data according to any one of the possible implementation manners of the first aspect is implemented.

According to the snapshot data generation method of incremental data provided by the application, after the incremental data of the current period is obtained, each piece of incremental data is stored in the incremental data file corresponding to its primary key, ensuring that incremental data with the same primary key is stored in the same incremental data file; the incremental data with the latest timestamp for the same primary key is then obtained from the incremental data file, which gives the snapshot data corresponding to that primary key. Because incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node, the incremental data of a given primary key already resides on one node, and the time otherwise spent transmitting incremental data from different nodes to the same node is saved. The scheme therefore increases the speed of generating snapshot data and saves the memory resources consumed by transmitting incremental data between nodes.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flowchart of a snapshot data generation method of incremental data according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a mapping relationship between incremental data and an incremental data file according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a snapshot data generation process of incremental data according to an embodiment of the present application;

Fig. 4 is a flowchart of another snapshot data generation method for incremental data according to an embodiment of the present application;

Fig. 5 is a schematic diagram of another snapshot data generation process for incremental data according to an embodiment of the present application;

Fig. 6 is a block diagram of a snapshot data generation apparatus for incremental data according to an embodiment of the present application;

Fig. 7 is a block diagram of a snapshot data obtaining module according to an embodiment of the present application;

Fig. 8 is a block diagram of another snapshot data obtaining module according to an embodiment of the present application.

Detailed Description

In the process of researching the invention, the inventor finds that in the traditional snapshot data generation mode, data of the same primary key needs to be transmitted from each node to the same node first, and then Reduce calculation is performed, so that a large amount of time is consumed in the process of transmitting a large amount of data in a network, and meanwhile, memory resources of the nodes are consumed, and therefore, the efficiency of the snapshot generation process is low. In order to solve the problem, the present application provides a snapshot data generation method of incremental data, in this scheme, when the incremental data is stored, the incremental data corresponding to the same primary key is stored in the same incremental data file, and the same incremental data file is stored in the same node in the distributed system, so when the snapshot data is generated, transmission time consumed for transmitting the incremental data from different nodes to the same node is saved. Therefore, the speed of generating the snapshot data can be improved by using the scheme, and the memory resources consumed by transmitting the incremental data among different nodes are saved.

Referring to fig. 1, a flowchart of a snapshot data generation method for incremental data provided in an embodiment of the present application is shown, where the method is applied to a device with computing capability, such as a server or a server cluster.

As shown in fig. 1, the method comprises the steps of:

S110: acquire incremental data.

In an application scenario, log data includes history data of multiple cycles, and in such an application scenario, incremental data corresponding to all history cycles of the log data needs to be acquired, that is, all data in the log data is acquired. In another application scenario, after the historical incremental data in the log data is processed by the method, only the incremental data of the current period needs to be acquired in a new period.

The period for acquiring the incremental data may be set according to actual needs, for example, one day, or may be less than one day, or may be more than one day, which is not limited herein.

S120: store each piece of incremental data into the incremental data file corresponding to its primary key. Incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node.

In a data table, there is typically a combination of one or more columns whose value uniquely identifies each row in the data table, such a combination of one or more columns being referred to as a primary key of the data table.

First, N files are created on a Hadoop Distributed File System (HDFS). Wherein the file name of each file corresponds to the mapping value of the primary key of the incremental data.

And then, writing the incremental data into an incremental data file corresponding to the primary key of the incremental data according to a preset mapping relation, and ensuring that the incremental data of the same primary key is written into the same incremental data file. For example, the preset mapping relationship may be a Hash algorithm.

Specifically, for any obtained incremental data, a primary key value corresponding to the incremental data is obtained, a mapping value corresponding to the primary key value is calculated according to a preset mapping relation, and the incremental data is stored in an incremental data file with a file name corresponding to the mapping value.

It should be noted that different primary keys may yield the same mapping value under the preset mapping relationship. In that case the incremental data of all primary keys sharing the mapping value is written into the same incremental data file. For example, if hash(A) = hash(B) = m, where A and B are two different primary key values and m is their common hash value, then both the incremental data with the primary key value A and the incremental data with the primary key value B are written into the incremental data file corresponding to m.
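The mapping step can be sketched as follows. This is a minimal illustration only: the source says the mapping "may be a Hash algorithm" without naming one, so the use of `zlib.crc32`, the modulo step, and the value `NUM_FILES = 4` are assumptions, and the resulting numbers will not match the hash(Bob) = 3 style values used in the examples of this description.

```python
import zlib

NUM_FILES = 4  # hypothetical N; the source does not fix a value


def mapping_value(primary_key: str, num_files: int = NUM_FILES) -> int:
    """Map a primary key to a mapping value in the range [0, num_files)."""
    # crc32 gives the same result in every process, unlike Python's
    # built-in hash() for strings, which is randomised per interpreter run.
    return zlib.crc32(primary_key.encode("utf-8")) % num_files


def incremental_file_name(primary_key: str, num_files: int = NUM_FILES) -> str:
    """Name of the incremental data file that receives this primary key."""
    return f"{mapping_value(primary_key, num_files)}.log"
```

Because the mapping is deterministic, every record for a given primary key resolves to the same file name, and with more distinct keys than files some keys necessarily share a file, which is the collision case described above.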

For example, in the partial content of the incremental data shown in fig. 2, the "name" field is the primary key value corresponding to the incremental data, the "timestamp" field is the timestamp of the incremental data, and the "id" field is the number of the incremental data.

In one example, the preset mapping relationship gives hash(Bob) = 3, hash(Mary) = 2, and hash(Jue) = 1. The data records whose name is Bob are therefore written into the incremental data file 3.log, where "3" is the name of the incremental data file and "log" is its extension; the data records whose name is Mary are written into the incremental data file 2.log; and the data records whose name is Jue are written into the incremental data file 1.log.

As can be seen from the example shown in fig. 2, according to the preset mapping relationship, the mapping values corresponding to the same primary key are the same, so that the incremental data corresponding to the same primary key are stored in the same file, and the same file is stored in the same node in the distributed file system, that is, all the incremental data corresponding to the same primary key are stored in the same node.
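The routing of records into files can be sketched as below. The sketch uses a local temporary directory as a stand-in for the HDFS directory, JSON lines as the record format, and a fixed dictionary holding the mapping values from the example; all three are assumptions made for illustration. The record with id 0 is invented; the other three rows are taken from Table 1.

```python
import json
import tempfile
from pathlib import Path

# Mapping values taken from the example in the text; a real system
# would compute them with the preset mapping relationship.
EXAMPLE_MAPPING = {"Bob": 3, "Mary": 2, "Jue": 1}


def append_incremental(record: dict, log_dir: Path) -> Path:
    """Append one record to the file named after its primary key's mapping value."""
    target = log_dir / f"{EXAMPLE_MAPPING[record['name']]}.log"
    with target.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return target


log_dir = Path(tempfile.mkdtemp())  # local stand-in for the HDFS directory
records = [
    # Hypothetical earlier record for Bob, invented for illustration.
    {"id": 0, "name": "Bob", "url": "/home", "timestamp": "2019-01-01 08:00:00"},
    {"id": 1, "name": "Bob", "url": "/edu", "timestamp": "2019-01-01 10:00:00"},
    {"id": 2, "name": "Jue", "url": "/cue", "timestamp": "2019-01-01 09:10:12"},
    {"id": 3, "name": "Mary", "url": "/draw?id=2", "timestamp": "2019-01-01 09:12:10"},
]
written = [append_incremental(rec, log_dir).name for rec in records]
```

Both Bob records end up in 3.log, so a later Reduce over Bob only has to read that one file.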

S130: acquire, from the incremental data file, the incremental data with the latest timestamp corresponding to the same primary key, to obtain the snapshot data of the primary key.

The snapshot data refers to the image of the data at a certain time point, i.e. the latest data corresponding to the time point.

In one embodiment, all incremental data corresponding to the same primary key are read from an incremental data file, and then Reduce calculation is performed on all incremental data corresponding to the primary key, that is, data with the latest timestamp is obtained from all incremental data corresponding to the primary key and is used as snapshot data corresponding to the primary key.

For example, in the example shown in fig. 2, of the 3 pieces of data whose name field is Bob, only the one with the latest value in the "timestamp" field is kept, that is, the piece of data whose timestamp is 2019-01-01 10:00:00. Similarly, of the 2 pieces of data whose name field is Mary, only the one with the timestamp 2019-01-01 09:12:10 is kept. The snapshot data corresponding to the example shown in fig. 2 is shown in Table 1:

TABLE 1

id	name	url	timestamp
1	Bob	/edu	2019-01-01 10:00:00
2	Jue	/cue	2019-01-01 09:10:12
3	Mary	/draw?id=2	2019-01-01 09:12:10

In table 1, the "id" field indicates the number of the snapshot data, the "name" field indicates the primary key of the data, and the "timestamp" field indicates the timestamp of the data.
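The Reduce calculation described above can be sketched as a single pass that keeps the latest record per primary key. The function name `reduce_latest` is hypothetical. Fig. 2 is not reproduced in the text, so the earlier Bob and Mary records below are invented placeholders; only the latest record of each key is taken from Table 1.

```python
def reduce_latest(records):
    """Keep, for each primary key ('name'), the record with the latest timestamp."""
    latest = {}
    for rec in records:
        key = rec["name"]
        # Timestamps in 'YYYY-MM-DD HH:MM:SS' form sort chronologically
        # as plain strings, so a string comparison is sufficient here.
        if key not in latest or rec["timestamp"] > latest[key]["timestamp"]:
            latest[key] = rec
    return latest


incremental = [
    {"name": "Bob", "url": "/home", "timestamp": "2019-01-01 08:00:00"},   # invented
    {"name": "Bob", "url": "/news", "timestamp": "2019-01-01 09:00:00"},   # invented
    {"name": "Bob", "url": "/edu", "timestamp": "2019-01-01 10:00:00"},
    {"name": "Mary", "url": "/draw", "timestamp": "2019-01-01 09:00:00"},  # invented
    {"name": "Mary", "url": "/draw?id=2", "timestamp": "2019-01-01 09:12:10"},
    {"name": "Jue", "url": "/cue", "timestamp": "2019-01-01 09:10:12"},
]
snapshot = reduce_latest(incremental)
```

The three records left in `snapshot` carry the same name, url and timestamp values as the rows of Table 1.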

In an embodiment of the present application, after the snapshot data corresponding to each primary key is obtained, the snapshot data may also be written into the corresponding snapshot data file.

Firstly, N snapshot data files are created on an HDFS, wherein the file name rule of the snapshot data files is the same as the file name rule of the incremental data files, and the snapshot data file names correspond to the incremental data file names one by one.

Referring to fig. 3, a schematic diagram of a process for generating snapshot data of incremental data according to an embodiment of the present application is shown, in which a file with an extension log is a delta data file, and a file with an extension snapshot is a snapshot data file.

The data exchange platform acquires incremental data from log data and writes each piece of incremental data into the corresponding incremental data file based on the primary key of the data. It then performs a Reduce calculation by primary key to obtain the snapshot data corresponding to each primary key, and writes that snapshot data into the corresponding snapshot data file. For example, the incremental data with the primary key Bob is written into the incremental data file 3.log, and the snapshot data with the primary key Bob is written into the snapshot data file 3.snapshot.

It should be noted that the incremental data files and the snapshot data files are created in different directories of the HDFS. For example, the path of the incremental data files is hdfs://wh/log/, and the path of the snapshot data files is hdfs://wh/snapshot/.

In the snapshot data generation method of incremental data provided in this embodiment, the incremental data corresponding to the same primary key is stored in the same incremental data file, and the same incremental data file is stored in the same node, so that when the snapshot data is generated, transmission time consumed for transmitting the incremental data from different nodes to the same node is saved, and therefore, the speed of generating the snapshot data is increased, and memory resources consumed for transmitting the incremental data between different computing nodes are saved.

As time passes, the amount of data stored in each incremental data file keeps growing. If a Reduce calculation is performed on all the incremental data stored in the incremental data file each time snapshot data is generated, a large amount of computing resources is consumed, and the efficiency of generating snapshots is low.

In order to further improve the efficiency of generating snapshot data, the application also provides another method for generating snapshot data of incremental data.

As shown in fig. 4, another method for generating snapshot data of incremental data includes the following steps:

S210: acquire the incremental data of the current period.

S220: for each piece of incremental data, obtain its primary key, calculate the corresponding mapping value according to a preset mapping relation, and store the incremental data into the incremental data file corresponding to the mapping value.

The specific implementation manners of S210 and S220 are the same as the implementation manners of S110 and S120 in the embodiment shown in fig. 1, and are not described herein again.

For example, the incremental data corresponding to the current period (the whole day of October 29) is shown in Table 2:

TABLE 2

id	name	url	timestamp
11	Bob	/org	2019-10-29 10:00:00
12	Jue	/cue?id=3	2019-10-29 10:01:00
13	Xia	/pub	2019-10-29 10:02:00

Meanwhile, assuming that the preset mapping relationship gives hash(Bob) = 3, hash(Xia) = 3, and hash(Jue) = 1, then, as shown in fig. 5, the data with the primary key Bob is written into the incremental data file 3.log, the data with the primary key Jue is written into the incremental data file 1.log, and the data with the primary key Xia is written into the incremental data file 3.log.

S230: read the incremental data corresponding to the same primary key in the current period from the incremental data file.

Each incremental data file stores all historical incremental data of its primary keys. All data that corresponds to the same primary key and whose timestamp falls within the time range of the current period, namely the incremental data of that primary key in the current period, is obtained from the incremental data file. For example, if the period is one day, the incremental data of the previous day may be acquired early each morning: at 1 a.m. on January 2, the incremental data for the whole day of January 1 is acquired.

As shown in fig. 5, the incremental data of the current period (e.g., October 29, 2019) is read from the files with the extension log; for example, the data finally read is the incremental data shown in Table 2.
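Selecting the current period's records for one primary key can be sketched as a timestamp range filter. The function name, the half-open interval convention, and the in-memory list standing in for the contents of 3.log are assumptions; the sample rows combine the Bob and Xia rows of Table 2 with an older Bob record dated 2019-10-28, which is assumed to still be present in the file as history.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def records_in_period(records, primary_key, period_start, period_length):
    """Records of one primary key with a timestamp in [start, start + length)."""
    period_end = period_start + period_length
    return [
        rec
        for rec in records
        if rec["name"] == primary_key
        and period_start <= datetime.strptime(rec["timestamp"], TS_FORMAT) < period_end
    ]


# Stand-in for the accumulated contents of the incremental data file 3.log.
log_3 = [
    {"name": "Bob", "url": "/edu", "timestamp": "2019-10-28 10:00:00"},
    {"name": "Bob", "url": "/org", "timestamp": "2019-10-29 10:00:00"},
    {"name": "Xia", "url": "/pub", "timestamp": "2019-10-29 10:02:00"},
]
bob_today = records_in_period(log_3, "Bob", datetime(2019, 10, 29), timedelta(days=1))
```

Only Bob's record from October 29 is selected; the older record is left to be represented by the previous period's snapshot.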

S240: acquire the snapshot data corresponding to the primary key in the previous period adjacent to the current period.

In one embodiment, N snapshot data files are created, each of which stores historical, up-to-date snapshot data corresponding to a corresponding primary key. The historical latest snapshot data is the latest snapshot data available at the current time.

In this embodiment, the process of obtaining snapshot data corresponding to a certain primary key in the previous period is as follows:

a. Obtain the mapping value corresponding to the primary key according to a preset mapping relation.

b. Read the snapshot data corresponding to the primary key in the previous period from the snapshot data file whose file name corresponds to the mapping value.

For example, if the preset mapping relationship gives hash(Bob) = 3, hash(Xia) = 3, hash(Mary) = 2, and hash(Jue) = 1, the snapshot data with the primary key Bob is written into the snapshot data file 3.snapshot.

As shown in fig. 5, the snapshot data corresponding to each primary key in the previous period is read from the snapshot data files with the extension snapshot. Assuming that the current period is October 29, 2019, the previous period is October 28, 2019, and the snapshot data corresponding to October 28, 2019 is read from each snapshot data file; for example, the snapshot data finally read is shown in Table 3:

TABLE 3

id	name	url	timestamp
1	Bob	/edu	2019-10-28 10:00:00
2	Jue	/cue	2019-10-28 09:10:12
3	Mary	/draw?id=2	2019-10-28 09:12:10
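Steps a and b can be sketched as a lookup that first resolves the file name from the mapping value and then scans that file for the primary key. The dictionary `snapshot_files` is an in-memory stand-in for the snapshot data files and is an assumption of this sketch; its rows are those of Table 3, placed according to the mapping values given in the example.

```python
# Mapping values from the example in the text.
EXAMPLE_MAPPING = {"Bob": 3, "Xia": 3, "Mary": 2, "Jue": 1}

# In-memory stand-in for the previous period's snapshot data files.
snapshot_files = {
    "1.snapshot": [
        {"id": 2, "name": "Jue", "url": "/cue", "timestamp": "2019-10-28 09:10:12"},
    ],
    "2.snapshot": [
        {"id": 3, "name": "Mary", "url": "/draw?id=2", "timestamp": "2019-10-28 09:12:10"},
    ],
    "3.snapshot": [
        {"id": 1, "name": "Bob", "url": "/edu", "timestamp": "2019-10-28 10:00:00"},
    ],
}


def previous_snapshot(primary_key):
    """Return the previous period's snapshot record for a key, or None."""
    # Step a: mapping value of the primary key gives the file name.
    file_name = f"{EXAMPLE_MAPPING[primary_key]}.snapshot"
    # Step b: read that file and pick the record of this primary key.
    for rec in snapshot_files.get(file_name, []):
        if rec["name"] == primary_key:
            return rec
    return None
```

A key such as Xia, which maps to 3.snapshot but has no record there yet, yields `None`; its snapshot for the current period then comes from the incremental data alone.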

S250: search, from the incremental data of the current period and the snapshot data of the previous period corresponding to the primary key, for the data with the latest timestamp, and use it as the snapshot data of the primary key in the current period.

Finally, when the real-time snapshot of the current period is calculated, a Reduce calculation by primary key is performed on the incremental data of the current period, read from the incremental data files with the extension log, and the snapshot data of the previous period, read from the snapshot data files with the extension snapshot. That is, among data with the same primary key, only the data with the latest timestamp is kept, which yields the snapshot data corresponding to the current period.

For example, the data whose name is Bob and whose timestamp falls on October 29, 2019, namely the 1st piece of data in Table 2, is read from the incremental data file 3.log; meanwhile, the data whose name is Bob, namely the 1st piece of data in Table 3, is read from the snapshot data file 3.snapshot. A Reduce calculation is then performed on these two pieces of data, and only the one with the latest timestamp is kept, namely the data with the timestamp 2019-10-29 10:00:00.

The Reduce calculation is performed on the data of the other primary keys in the same way, and the real-time snapshot of the current period is finally obtained, as shown in Table 4:

TABLE 4

id	name	url	timestamp
3	Mary	/draw?id=2	2019-10-28 09:12:10
11	Bob	/org	2019-10-29 10:00:00
12	Jue	/cue?id=3	2019-10-29 10:01:00
13	Xia	/pub	2019-10-29 10:02:00
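The step from Tables 2 and 3 to Table 4 can be sketched as a merge of the previous snapshot with the current period's incremental data. The function name `merge_snapshot` and the choice to keep the previous snapshot on a timestamp tie are assumptions; the input rows are exactly those of Table 3 and Table 2.

```python
def merge_snapshot(previous_snapshot, current_increments):
    """Reduce previous snapshot plus current increments to one record per key."""
    latest = {rec["name"]: rec for rec in previous_snapshot}
    for rec in current_increments:
        key = rec["name"]
        # 'YYYY-MM-DD HH:MM:SS' strings compare chronologically.
        if key not in latest or rec["timestamp"] > latest[key]["timestamp"]:
            latest[key] = rec
    # Sort by id only to present the rows in the same order as Table 4.
    return sorted(latest.values(), key=lambda rec: rec["id"])


table_3 = [
    {"id": 1, "name": "Bob", "url": "/edu", "timestamp": "2019-10-28 10:00:00"},
    {"id": 2, "name": "Jue", "url": "/cue", "timestamp": "2019-10-28 09:10:12"},
    {"id": 3, "name": "Mary", "url": "/draw?id=2", "timestamp": "2019-10-28 09:12:10"},
]
table_2 = [
    {"id": 11, "name": "Bob", "url": "/org", "timestamp": "2019-10-29 10:00:00"},
    {"id": 12, "name": "Jue", "url": "/cue?id=3", "timestamp": "2019-10-29 10:01:00"},
    {"id": 13, "name": "Xia", "url": "/pub", "timestamp": "2019-10-29 10:02:00"},
]
table_4 = merge_snapshot(table_3, table_2)
```

Mary has no new data in the current period, so her record from the previous snapshot is carried over unchanged, while Bob and Jue are replaced by their newer records and Xia appears for the first time.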

S260: store the snapshot data corresponding to the primary key in the current period into the corresponding snapshot data file.

After each piece of snapshot data corresponding to the current period is calculated, it is written to the corresponding snapshot data file. For example, if the snapshot data of the current period is the data in Table 4, then the data whose name is Mary is written into the snapshot data file 2.snapshot, the data whose name is Bob is written into the snapshot data file 3.snapshot, the data whose name is Jue is written into the snapshot data file 1.snapshot, and the data whose name is Xia is written into the snapshot data file 3.snapshot.

In an embodiment of the application, a corresponding snapshot data file is established in each period, and after the snapshot data of the current period is obtained through calculation, the snapshot data is stored in the corresponding snapshot data file of the current period according to the primary key corresponding to the snapshot data.

For example, the snapshot data corresponding to 2019-10-28 are stored in the snapshot data files corresponding to 2019-10-28 according to the primary key respectively; and the snapshot data corresponding to 2019-10-29 are stored in the snapshot data files corresponding to 2019-10-29 according to the primary keys.

In another embodiment of the present application, the latest data snapshot is kept in each snapshot data file, i.e., the snapshot data file updates data in a data overlay manner.
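The two storage embodiments differ only in where a period's snapshot is written. A minimal sketch, assuming a hypothetical layout in which the per-period variant places each period's files in a subdirectory named after the date (the source does not specify how per-period files are named):

```python
SNAPSHOT_ROOT = "hdfs://wh/snapshot/"


def snapshot_file_path(mapping_value: int, period: str, keep_history: bool) -> str:
    """Path of the snapshot data file for one mapping value and one period."""
    if keep_history:
        # One set of snapshot files per period; the date subdirectory
        # is a hypothetical naming scheme chosen for this sketch.
        return f"{SNAPSHOT_ROOT}{period}/{mapping_value}.snapshot"
    # Overwrite variant: a single file per mapping value that always
    # holds the latest snapshot.
    return f"{SNAPSHOT_ROOT}{mapping_value}.snapshot"
```

Keeping one set of files per period allows earlier snapshots to be read back later, at the cost of storage; the overwrite variant keeps storage constant but retains only the latest snapshot.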

In the snapshot data generation method of incremental data provided in this embodiment, snapshot data calculated in each period is stored in a corresponding snapshot data file. When calculating the snapshot data corresponding to the current period, directly performing Reduce calculation according to the incremental data of the current period and the snapshot data of the previous period to obtain the real-time snapshot data corresponding to the current period. According to the scheme, historical accumulated incremental data does not need to be read, and compared with the method for reading the snapshot data and the accumulated incremental data, consumed time and memory resources are greatly reduced, meanwhile, the calculation amount is reduced, the calculation speed is increased, and the efficiency of generating the real-time snapshot data is improved.

Corresponding to the embodiment of the snapshot data generation method of the incremental data, the application also provides an embodiment of a snapshot data generation device of the incremental data.

Referring to fig. 6, a block diagram of a snapshot data generating apparatus for incremental data according to an embodiment of the present application is shown, where the apparatus includes:

a first obtaining module 110, configured to obtain the incremental data.

A storage module 120, configured to store each incremental data into an incremental data file corresponding to the primary key of the incremental data.

Incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node.

In an embodiment of the present application, the storage module 120 is specifically configured to: obtaining a mapping value of a primary key of the incremental data according to a preset mapping relation (such as a hash algorithm); and storing the incremental data into an incremental data file with a file name corresponding to the mapping value. In this embodiment, the incremental data with the same mapping value obtained by mapping the primary key are stored in the same incremental data file, that is, the incremental data file corresponding to the mapping value.

Of course, in other embodiments of the present application, the incremental data corresponding to different primary keys may also be stored in different incremental data files, respectively, in this embodiment, the primary key of the data may be directly used as the file name of the incremental data file, or the identifier corresponding to the primary key in a one-to-one manner may be used as the file name of the incremental data file, which is not described herein again.

A snapshot data obtaining module 130, configured to obtain, from the incremental data file, data with the latest timestamp corresponding to the same primary key, to obtain snapshot data of the primary key.

In an embodiment of the present application, all incremental data corresponding to the same primary key are read from an incremental data file, and then Reduce calculation is performed on all incremental data corresponding to the primary key, that is, data with the latest timestamp is obtained from all incremental data corresponding to the primary key and is used as snapshot data corresponding to the primary key. As shown in fig. 7, the snapshot data obtaining module 130 includes: a first incremental data reading sub-module 1311 and a first snapshot data determination sub-module 1312.

The first incremental data reading sub-module 1311 is configured to, for any primary key, read all incremental data corresponding to the primary key from an incremental data file corresponding to the primary key.

The first snapshot data determining submodule 1312 is configured to obtain the latest timestamp data from all the incremental data corresponding to the primary key, and obtain snapshot data corresponding to the primary key.

The snapshot data corresponding to the same primary key are stored in the same snapshot data file.
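The Reduce calculation carried out by sub-modules 1311 and 1312 amounts to keeping, for each primary key, the record with the largest timestamp. A minimal Python sketch is given below. The record fields `pk` and `ts` and the function name `reduce_to_snapshot` are assumptions made for illustration, and the handling of equal timestamps (the first record seen is kept) is a choice the source does not address.

```python
from typing import Dict, Iterable


def reduce_to_snapshot(records: Iterable[dict]) -> Dict[str, dict]:
    """Return, for each primary key, the record with the latest timestamp."""
    snapshot: Dict[str, dict] = {}
    for rec in records:
        current = snapshot.get(rec["pk"])
        # Replace only when the new record is strictly newer (assumed tie rule).
        if current is None or rec["ts"] > current["ts"]:
            snapshot[rec["pk"]] = rec
    return snapshot
```

Since all records for a given primary key sit in one incremental data file, this reduction can run on a single file without collecting records from other files.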

In order to further improve the efficiency of generating the snapshot data, in another embodiment of the present application, as shown in fig. 8, the snapshot data obtaining module 130 includes: a second incremental data reading sub-module 1321, a historical snapshot data obtaining sub-module 1322, and a second snapshot data determination sub-module 1323.

The second incremental data reading sub-module 1321 is configured to, for any incremental data file, read the incremental data of the current cycle corresponding to the same primary key from the incremental data file.

The historical snapshot data obtaining sub-module 1322 is configured to obtain the snapshot data of the primary key for the previous cycle, that is, the cycle immediately preceding the current cycle.

The second snapshot data determining sub-module 1323 is configured to search the incremental data of the current cycle and the snapshot data of the previous cycle corresponding to the primary key for the data with the latest timestamp, and to determine that data to be the snapshot data of the primary key in the current cycle.
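The cooperation of sub-modules 1321, 1322 and 1323 can be sketched as a merge of the previous cycle's snapshot with the current cycle's incremental data. The Python below is illustrative only: the field names `pk` and `ts`, the function name `merge_cycle`, and the rule that an increment must be strictly newer to replace the previous snapshot are assumptions not stated in the source.

```python
from typing import Dict, Iterable


def merge_cycle(previous_snapshot: Dict[str, dict],
                current_increments: Iterable[dict]) -> Dict[str, dict]:
    """Compute the current-cycle snapshot from the previous snapshot and new increments."""
    # Start from a copy so the previous cycle's snapshot is left untouched.
    snapshot = dict(previous_snapshot)
    for rec in current_increments:
        prior = snapshot.get(rec["pk"])
        # Keep whichever of the prior entry and the increment has the later timestamp.
        if prior is None or rec["ts"] > prior["ts"]:
            snapshot[rec["pk"]] = rec
    return snapshot
```

Only the current cycle's increments and one snapshot record per primary key are examined; the accumulated history is not read.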

The snapshot data of the primary key for the previous cycle adjacent to the current cycle is obtained as follows: a mapping value corresponding to the primary key is obtained according to the preset mapping relation; and the snapshot data of the primary key for the previous cycle is read from the snapshot data file whose file name corresponds to the mapping value. The snapshot data corresponding to the same primary key are stored in the same snapshot data file, and the primary keys of all data stored in any one snapshot data file have the same mapping value.
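The lookup described in the preceding paragraph can be sketched as follows. The sketch assumes, hypothetically, that snapshot files use a JSON-lines layout, are named `snap_NNNN.jsonl` after the mapping value, and that the mapping relation is an MD5 hash reduced modulo a bucket count; none of these details comes from the source.

```python
import hashlib
import json
import os
from typing import Optional


def snapshot_file_name(primary_key: str, num_buckets: int = 16) -> str:
    """Derive the snapshot file name from the key's mapping value (assumed scheme)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    return f"snap_{bucket:04d}.jsonl"


def read_previous_snapshot(primary_key: str, directory: str) -> Optional[dict]:
    """Return the previous-cycle snapshot record for a key, or None if absent."""
    path = os.path.join(directory, snapshot_file_name(primary_key))
    if not os.path.exists(path):
        return None  # no snapshot file has been written for this mapping value
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["pk"] == primary_key:
                return record
    return None  # the file exists but holds no record for this key
```

Only the one file selected by the mapping value is opened, so the lookup does not scan snapshot files belonging to other mapping values.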

The snapshot data generation apparatus for incremental data according to this embodiment stores the incremental data corresponding to the same primary key in the same incremental data file, and each incremental data file is stored on a single node. When snapshot data is generated, the time otherwise spent transmitting incremental data from different nodes to one node is therefore saved, which increases the speed of generating snapshot data and saves the memory resources consumed in transmitting incremental data between computing nodes.

In addition, to further increase the speed of generating the snapshot data, the snapshot data calculated in each period is stored in the corresponding snapshot data file. When snapshot data is generated in the next period, the snapshot data of the preceding period is read from the snapshot data file and compared with the incremental data of the current period to obtain the latest snapshot data for the current period. The historically accumulated incremental data therefore need not be read. Compared with reading the accumulated incremental data, reading the snapshot data greatly reduces the time and memory resources consumed and the amount of calculation, which increases the calculation speed and improves the efficiency of generating real-time snapshot data.
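The saving can be illustrated with a rough count of records read per period. The Python below is a back-of-the-envelope sketch, not a measurement: it assumes a fixed number of increments per period and at most one snapshot record per primary key, and the function name and parameters are hypothetical.

```python
from typing import Tuple


def records_read(period: int, increments_per_period: int, num_keys: int) -> Tuple[int, int]:
    """Estimate records read in a given period (1-based) under two strategies.

    Returns (accumulated, with_snapshot):
      accumulated   - every increment written so far is re-read;
      with_snapshot - upper bound: the previous snapshot plus this period's increments.
    """
    accumulated = period * increments_per_period
    # The previous snapshot holds at most one record per key, and never more
    # records than the increments that have been written before this period.
    previous_snapshot = min(num_keys, (period - 1) * increments_per_period)
    with_snapshot = previous_snapshot + increments_per_period
    return accumulated, with_snapshot
```

Under these assumptions, period 100 with 1,000 increments per period and 5,000 distinct primary keys reads 100,000 records from the accumulated data but at most 6,000 with the stored snapshot, and the first figure keeps growing with each period while the second stays bounded.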

The present application further provides a computing device that includes a processor and a memory storing a program executable on the processor. When running the program stored in the memory, the processor implements the above-described snapshot data generation method of incremental data. The computing device may be a server, a server cluster, or another device with computing capability.

The processor in this document may be a CPU, an MCU, or a combination of a CPU and an MCU. The processor includes a core, which fetches the corresponding program from the memory; one or more cores may be provided.

The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

The application also provides a storage medium executable by the computing device, wherein the storage medium stores a program, and the program realizes the snapshot data generation method of the incremental data when being executed by the computing device.

While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

It should be noted that the embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Since the apparatus embodiments are substantially similar to the method embodiments, they are described only briefly; for relevant details, refer to the corresponding description of the method embodiments.

The steps in the methods of the embodiments of the present application may be reordered, combined, and deleted according to actual needs.

The modules and sub-modules of the apparatus and terminal in the embodiments of the present application may be combined, divided, and deleted according to actual needs.

In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the terminal embodiments described above are merely illustrative: the division into modules or sub-modules is only a logical division, and other divisions are possible in an actual implementation; for instance, a plurality of sub-modules or modules may be combined or integrated into another module, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or in another form.

The modules or sub-modules described as separate parts may or may not be physically separate, and parts that are modules or sub-modules may or may not be physical modules or sub-modules, may be located in one place, or may be distributed over a plurality of network modules or sub-modules. Some or all of the modules or sub-modules can be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, each functional module or sub-module in the embodiments of the present application may be integrated into one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated into one module. The integrated modules or sub-modules may be implemented in the form of hardware, or may be implemented in the form of software functional modules or sub-modules.

Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A snapshot data generation method of incremental data is characterized by comprising the following steps:

obtaining incremental data;

2. The method of claim 1, wherein the respectively storing each incremental data into the incremental data file corresponding to the primary key of the incremental data comprises:

3. The method of claim 1, wherein obtaining the data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain the snapshot data of the primary key comprises:

4. The method of claim 1, wherein obtaining the data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain the snapshot data of the primary key comprises:

5. The method according to claim 4, wherein said obtaining snapshot data corresponding to the primary key in a previous cycle adjacent to the current cycle comprises:

6. The method of claim 5, wherein after obtaining the snapshot data of the primary key in the current cycle, the method further comprises:

7. A snapshot data generating apparatus of incremental data, comprising:

the first acquisition module is used for acquiring incremental data;

8. The apparatus of claim 7, wherein the snapshot data obtaining module comprises:

9. The apparatus of claim 7, wherein the snapshot data obtaining module comprises:

10. A storage medium on which a program is stored, wherein the program is loaded by a processor and when executed implements a snapshot data generation method of incremental data as recited in any one of claims 1 to 6.