CN111309673A - Snapshot data generation method and device of incremental data - Google Patents

Snapshot data generation method and device of incremental data

Info

Publication number
CN111309673A
Authority
CN
China
Prior art keywords
data
snapshot
incremental data
incremental
main key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010089677.2A
Other languages
Chinese (zh)
Other versions
CN111309673B (en)
Inventor
赵平
孙森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puxin Hengye Technology Development Beijing Co ltd
Original Assignee
Puxin Hengye Technology Development Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Puxin Hengye Technology Development Beijing Co ltd filed Critical Puxin Hengye Technology Development Beijing Co ltd
Priority to CN202010089677.2A priority Critical patent/CN111309673B/en
Publication of CN111309673A publication Critical patent/CN111309673A/en
Application granted granted Critical
Publication of CN111309673B publication Critical patent/CN111309673B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/128 Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a snapshot data generation method and device for incremental data. The method comprises: after the incremental data of the current period is obtained, storing each piece of incremental data in the incremental data file corresponding to its primary key, so that incremental data with the same primary key is stored in the same incremental data file; and obtaining, from the incremental data file, the incremental data with the latest timestamp for each primary key, which is the snapshot data for that primary key. Because incremental data with the same primary key is stored in the same incremental data file, and each incremental data file is stored on a single node, the time otherwise spent transmitting incremental data from different nodes to one node is saved, which improves the speed and efficiency of snapshot data generation.

Description

Snapshot data generation method and device of incremental data
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a snapshot data generation method and device of incremental data.
Background
Streaming incremental data refers to data that is added to, deleted from, or changed in a database at a certain time or after a certain checkpoint. Accumulated incremental data refers to all streaming incremental data written to the incremental data files. A snapshot refers to an image of the data at a certain point in time.
At present, generating a snapshot from streaming incremental data requires reading all accumulated incremental data at once and performing a Reduce calculation on it. The Reduce calculation aggregates the data by primary key and, for data with the same primary key, retains only the record with the latest update time. Because all accumulated data must be read each time, this snapshot generation mode is inefficient.
Disclosure of Invention
In view of this, an object of the present application is to provide a snapshot data generation method and device for incremental data, so as to solve the technical problem that current snapshot data generation is inefficient. The specific technical scheme is as follows:
In a first aspect, the present application provides a snapshot data generation method for incremental data, including:
obtaining incremental data;
storing each piece of incremental data in the incremental data file corresponding to the primary key of that incremental data, wherein incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node;
and obtaining, from the incremental data file, the data with the latest timestamp for the same primary key to obtain the snapshot data of the primary key.
In a possible implementation manner, the respectively storing each incremental data into an incremental data file corresponding to a primary key of the incremental data includes:
obtaining a mapping value of the primary key of the incremental data according to a preset mapping relation;
and storing the incremental data into an incremental data file with a file name corresponding to the mapping value.
In a possible implementation manner, obtaining data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain snapshot data of the primary key includes:
for any primary key, reading all incremental data corresponding to the primary key from the incremental data file corresponding to the primary key;
and obtaining the data with the latest timestamp from all the incremental data corresponding to the primary key to obtain the snapshot data of the primary key, and storing the snapshot data in the snapshot data file corresponding to the primary key, wherein snapshot data corresponding to the same primary key is stored in the same snapshot data file.
In a possible implementation manner, obtaining data with the latest timestamp corresponding to the same primary key from the incremental data file to obtain snapshot data of the primary key includes:
for any incremental data file, reading from it the incremental data corresponding to the same primary key in the current period;
obtaining the snapshot data corresponding to the primary key in the previous period adjacent to the current period;
and finding the data with the latest timestamp for the primary key among the incremental data of the current period and the snapshot data of the previous period, and determining it as the snapshot data of the primary key in the current period.
In a possible implementation manner, obtaining the snapshot data corresponding to the primary key in the previous period adjacent to the current period includes:
obtaining a mapping value corresponding to the primary key according to a preset mapping relation;
and reading the snapshot data corresponding to the primary key in the previous period from the snapshot data file whose file name corresponds to the mapping value, wherein snapshot data corresponding to the same primary key is stored in the same snapshot data file, and the primary keys of the data stored in each snapshot data file have the same mapping value.
In a possible implementation manner, after obtaining the snapshot data of the primary key in the current period, the method further includes:
and storing the snapshot data of the primary key for the current period in the snapshot data file corresponding to the current period.
In a second aspect, the present application further provides a snapshot data generating apparatus for incremental data, including:
a first obtaining module, configured to obtain incremental data;
a storage module, configured to store each piece of incremental data in the incremental data file corresponding to the primary key of that incremental data, wherein incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node;
and a snapshot data obtaining module, configured to obtain, from the incremental data file, the data with the latest timestamp for the same primary key to obtain the snapshot data of the primary key.
In one possible implementation, the snapshot data obtaining module includes:
a first incremental data reading submodule, configured to read, for any primary key, all incremental data corresponding to the primary key from the incremental data file corresponding to the primary key;
and a first snapshot data determining submodule, configured to obtain the data with the latest timestamp from all incremental data corresponding to the primary key to obtain the snapshot data of the primary key, and to store the snapshot data in the snapshot data file corresponding to the primary key, wherein snapshot data corresponding to the same primary key is stored in the same snapshot data file.
In one possible implementation, the snapshot data obtaining module includes:
a second incremental data reading submodule, configured to read, from any incremental data file, the incremental data corresponding to the same primary key in the current period;
a historical snapshot data obtaining submodule, configured to obtain the snapshot data corresponding to the primary key in the previous period adjacent to the current period;
and a second snapshot data determining submodule, configured to find the data with the latest timestamp for the primary key among the incremental data of the current period and the snapshot data of the previous period, and to determine it as the snapshot data of the primary key in the current period.
In a third aspect, the present application further provides a storage medium, on which a program is stored, where the program is loaded and executed by a processor, and the method for generating snapshot data of incremental data according to any one of the possible implementation manners of the first aspect is implemented.
According to the snapshot data generation method for incremental data provided by the present application, after the incremental data of the current period is obtained, each piece of incremental data is stored in the incremental data file corresponding to its primary key, so that incremental data with the same primary key is stored in the same incremental data file; the incremental data with the latest timestamp for the same primary key is then obtained from the incremental data file to obtain the snapshot data of the primary key. Because incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node, the incremental data of the same primary key resides on one node, which saves the time otherwise consumed in transmitting incremental data from different nodes to the same node. The scheme can therefore increase the speed of snapshot data generation and save the memory resources consumed by transmitting incremental data between nodes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a snapshot data generation method of incremental data according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a mapping relationship between incremental data and an incremental data file according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a snapshot data generation process of incremental data according to an embodiment of the present application;
Fig. 4 is a flowchart of another snapshot data generation method for incremental data according to an embodiment of the present application;
Fig. 5 is a diagram illustrating a snapshot data generation process for incremental data according to an embodiment of the present disclosure;
Fig. 6 is a block diagram of a snapshot data generation apparatus for incremental data according to an embodiment of the present application;
Fig. 7 is a block diagram of a snapshot data obtaining module according to an embodiment of the present application;
Fig. 8 is a block diagram of another snapshot data obtaining module according to an embodiment of the present application.
Detailed Description
In the process of researching the invention, the inventor finds that in the traditional snapshot data generation mode, data of the same primary key needs to be transmitted from each node to the same node first, and then Reduce calculation is performed, so that a large amount of time is consumed in the process of transmitting a large amount of data in a network, and meanwhile, memory resources of the nodes are consumed, and therefore, the efficiency of the snapshot generation process is low. In order to solve the problem, the present application provides a snapshot data generation method of incremental data, in this scheme, when the incremental data is stored, the incremental data corresponding to the same primary key is stored in the same incremental data file, and the same incremental data file is stored in the same node in the distributed system, so when the snapshot data is generated, transmission time consumed for transmitting the incremental data from different nodes to the same node is saved. Therefore, the speed of generating the snapshot data can be improved by using the scheme, and the memory resources consumed by transmitting the incremental data among different nodes are saved.
Referring to fig. 1, a flowchart of a snapshot data generation method for incremental data provided in an embodiment of the present application is shown, where the method is applied to a device with computing capability, such as a server or a server cluster.
As shown in fig. 1, the method comprises the steps of:
S110: acquire incremental data.
In an application scenario, log data includes history data of multiple cycles, and in such an application scenario, incremental data corresponding to all history cycles of the log data needs to be acquired, that is, all data in the log data is acquired. In another application scenario, after the historical incremental data in the log data is processed by the method, only the incremental data of the current period needs to be acquired in a new period.
The period for acquiring the incremental data may be set according to actual needs, for example, one day, or may be less than one day, or may be more than one day, which is not limited herein.
S120: store each piece of incremental data in the incremental data file corresponding to the primary key of that incremental data. Incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node.
In a data table, there is typically a combination of one or more columns whose value uniquely identifies each row in the data table, such a combination of one or more columns being referred to as a primary key of the data table.
First, N files are created on the Hadoop Distributed File System (HDFS), where the file name of each file corresponds to a mapping value of the primary keys of the incremental data.
And then, writing the incremental data into an incremental data file corresponding to the primary key of the incremental data according to a preset mapping relation, and ensuring that the incremental data of the same primary key is written into the same incremental data file. For example, the preset mapping relationship may be a Hash algorithm.
Specifically, for any obtained incremental data, a primary key value corresponding to the incremental data is obtained, a mapping value corresponding to the primary key value is calculated according to a preset mapping relation, and the incremental data is stored in an incremental data file with a file name corresponding to the mapping value.
It should be noted that if different primary keys yield the same mapping value under the preset mapping relation, the incremental data corresponding to those primary keys is written to the same incremental data file. For example, if hash(A) = hash(B) = m, where A and B are two different primary key values and m is the hash value of both, then the incremental data with primary key value A and the incremental data with primary key value B are both written to the incremental data file corresponding to m.
For example, in the partial content of the incremental data shown in fig. 2, the "name" field is the primary key value corresponding to the incremental data, the "timestamp" field is the timestamp of the incremental data, and the "id" field is the number of the incremental data.
In one example, the preset mapping relation gives hash(Bob) = 3, hash(Mary) = 2, and hash(Jue) = 1. The data records with the name Bob are therefore written to the incremental data file 3.log, where "3" is the name of the incremental data file and "log" is its extension; the data records with the name Mary are written to the incremental data file 2.log; and the data records with the name Jue are written to the incremental data file 1.log.
As can be seen from the example shown in fig. 2, according to the preset mapping relationship, the mapping values corresponding to the same primary key are the same, so that the incremental data corresponding to the same primary key are stored in the same file, and the same file is stored in the same node in the distributed file system, that is, all the incremental data corresponding to the same primary key are stored in the same node.
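The routing described for S120 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source only says the mapping "may be a Hash algorithm", so the use of CRC32, the value of N_FILES, and the function names are all assumptions, and the files are modelled as in-memory lists rather than HDFS files.

```python
import zlib

N_FILES = 4  # hypothetical value for the "N" files created in advance


def mapping_value(primary_key: str, n_files: int = N_FILES) -> int:
    """Map a primary key to a bucket number in [0, n_files).

    CRC32 is an assumed stand-in for the unspecified hash algorithm. A stable
    hash is used because Python's built-in hash() of a str varies per process.
    """
    return zlib.crc32(primary_key.encode("utf-8")) % n_files


def incremental_file_name(primary_key: str, n_files: int = N_FILES) -> str:
    """File name "<mapping value>.log", following the naming in the example."""
    return f"{mapping_value(primary_key, n_files)}.log"


def route(records, n_files: int = N_FILES):
    """Group records by target file; "name" is the primary key field."""
    files = {}
    for record in records:
        target = incremental_file_name(record["name"], n_files)
        files.setdefault(target, []).append(record)
    return files
```

Two different primary keys may land in the same file when their mapping values collide, which matches the hash(A) = hash(B) = m case in the text; what the scheme relies on is only that one primary key never spreads across several files.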
S130: obtain, from the incremental data file, the incremental data with the latest timestamp for the same primary key to obtain the snapshot data of the primary key.
The snapshot data refers to the image of the data at a certain time point, i.e. the latest data corresponding to the time point.
In one embodiment, all incremental data corresponding to the same primary key are read from an incremental data file, and then Reduce calculation is performed on all incremental data corresponding to the primary key, that is, data with the latest timestamp is obtained from all incremental data corresponding to the primary key and is used as snapshot data corresponding to the primary key.
For example, in the example shown in Fig. 2, of the 3 records whose name field is Bob, only the one with the latest value in the "timestamp" field is retained, that is, the record whose timestamp is 2019-01-01 10:00:00. Similarly, of the 2 records whose name field is Mary, only the one with the timestamp 2019-01-01 09:12:10 is retained. The resulting snapshot data for the example shown in Fig. 2 is given in Table 1:
TABLE 1
id name url timestamp
1 Bob /edu 2019-01-01 10:00:00
2 Jue /cue 2019-01-01 09:10:12
3 Mary /draw?id=2 2019-01-01 09:12:10
In table 1, the "id" field indicates the number of the snapshot data, the "name" field indicates the primary key of the data, and the "timestamp" field indicates the timestamp of the data.
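The Reduce calculation used in S130 amounts to keeping, per primary key, the record with the latest timestamp. The sketch below is an assumption-laden illustration: the function names are hypothetical, and because the full contents of Fig. 2 are not reproduced in the text, the older Bob and Mary rows in the sample are invented; only the rows that survive correspond to Table 1.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_ts(text):
    """Parse a timestamp in the format used by the tables."""
    return datetime.strptime(text, TS_FORMAT)


def latest_per_key(records, key="name", ts="timestamp"):
    """Keep, for each primary key, only the record with the latest timestamp."""
    latest = {}
    for record in records:
        held = latest.get(record[key])
        if held is None or parse_ts(record[ts]) > parse_ts(held[ts]):
            latest[record[key]] = record
    return latest


# Sample incremental data. Rows marked "invented" are placeholders for the
# older records of Fig. 2, which the text does not list.
sample = [
    {"name": "Bob", "url": "/home", "timestamp": "2019-01-01 08:00:00"},  # invented
    {"name": "Bob", "url": "/news", "timestamp": "2019-01-01 09:00:00"},  # invented
    {"name": "Bob", "url": "/edu", "timestamp": "2019-01-01 10:00:00"},
    {"name": "Mary", "url": "/draw", "timestamp": "2019-01-01 09:00:00"},  # invented
    {"name": "Mary", "url": "/draw?id=2", "timestamp": "2019-01-01 09:12:10"},
    {"name": "Jue", "url": "/cue", "timestamp": "2019-01-01 09:10:12"},
]

snapshot = latest_per_key(sample)
```

Since all records of one primary key sit in one file, this reduction can run independently per file, with no record needing to be moved between nodes first.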
In an embodiment of the present application, after the snapshot data corresponding to each primary key is obtained, the snapshot data may also be written into the corresponding snapshot data file.
Firstly, N snapshot data files are created on an HDFS, wherein the file name rule of the snapshot data files is the same as the file name rule of the incremental data files, and the snapshot data file names correspond to the incremental data file names one by one.
Referring to Fig. 3, a schematic diagram of a snapshot data generation process for incremental data according to an embodiment of the present application is shown, in which a file with the extension log is an incremental data file and a file with the extension snapshot is a snapshot data file.
The data exchange platform acquires incremental data from the log data and writes each piece of incremental data to the corresponding incremental data file based on its primary key. It then performs the Reduce calculation by primary key to obtain the snapshot data for each primary key, and writes that snapshot data to the corresponding snapshot data file. For example, the incremental data with the primary key Bob is written to the incremental data file 3.log, and the snapshot data with the primary key Bob is written to the snapshot data file 3.snapshot.
It should be noted that the incremental data files and the snapshot data files are created in different directories of the HDFS. For example, the path of the incremental data files is hdfs://wh/log/, and the path of the snapshot data files is hdfs://wh/snapshot/.
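The one-to-one pairing of incremental and snapshot files can be expressed as a pair of path helpers. This is only a sketch: the two directory prefixes are the examples given in the text, while the helper names are hypothetical and no HDFS client is involved.

```python
LOG_DIR = "hdfs://wh/log/"            # example directory from the text
SNAPSHOT_DIR = "hdfs://wh/snapshot/"  # example directory from the text


def incremental_path(mapping_value: int) -> str:
    """Path of the incremental data file for a given mapping value."""
    return f"{LOG_DIR}{mapping_value}.log"


def snapshot_path(mapping_value: int) -> str:
    """Path of the snapshot data file paired with the same mapping value."""
    return f"{SNAPSHOT_DIR}{mapping_value}.snapshot"


def paired_paths(mapping_value: int):
    """Return the (incremental, snapshot) pair that share one mapping value."""
    return incremental_path(mapping_value), snapshot_path(mapping_value)
```

Because both names derive from the same mapping value, the snapshot file for a primary key can be located without any lookup table.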
In the snapshot data generation method of incremental data provided in this embodiment, the incremental data corresponding to the same primary key is stored in the same incremental data file, and the same incremental data file is stored in the same node, so that when the snapshot data is generated, transmission time consumed for transmitting the incremental data from different nodes to the same node is saved, and therefore, the speed of generating the snapshot data is increased, and memory resources consumed for transmitting the incremental data between different computing nodes are saved.
As time passes, the amount of data stored in each incremental data file grows. If the Reduce calculation were performed on all the incremental data stored in the incremental data file each time snapshot data is generated, a large amount of computing resources would be consumed, and snapshot generation would be inefficient.
In order to further improve the efficiency of generating snapshot data, the application also provides another method for generating snapshot data of incremental data.
As shown in fig. 4, another method for generating snapshot data of incremental data includes the following steps:
S210: acquire the incremental data of the current period.
S220: for each piece of incremental data, obtain its primary key, calculate the corresponding mapping value according to the preset mapping relation, and store the incremental data in the incremental data file corresponding to the mapping value.
The specific implementation manners of S210 and S220 are the same as the implementation manners of S110 and S120 in the embodiment shown in fig. 1, and are not described herein again.
For example, the incremental data for the current period (the whole day of October 29) is shown in Table 2:
TABLE 2
id name url timestamp
11 Bob /org 2019-10-29 10:00:00
12 Jue /cue?id=3 2019-10-29 10:01:00
13 Xia /pub 2019-10-29 10:02:00
Meanwhile, assuming the preset mapping relation gives hash(Bob) = 3, hash(Xia) = 3, and hash(Jue) = 1, then, as shown in Fig. 5, the data with primary key Bob is written to the incremental data file 3.log, the data with primary key Jue is written to the incremental data file 1.log, and the data with primary key Xia is written to the incremental data file 3.log.
S230: read the incremental data corresponding to the same primary key in the current period from the incremental data file.
Each incremental data file stores all historical incremental data for its corresponding primary keys. All data that has the same primary key and a timestamp within the time range of the current period, that is, the incremental data of the primary key in the current period, is obtained from the incremental data file. For example, if the period is one day, the incremental data of the previous day may be acquired in the early morning of each day; for instance, at 1 a.m. on January 2, the incremental data for the whole day of January 1 is acquired.
As shown in Fig. 5, the incremental data of the current period (e.g., October 29, 2019) is read from the files with the extension log; for example, the incremental data finally read is that shown in Table 2.
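Selecting the records of one period from an incremental data file is a timestamp range filter. The sketch below assumes a one-day period aligned to midnight and a half-open interval [start, end); neither detail is stated in the source, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def previous_day_bounds(run_time):
    """Return (start, end) covering the whole day before run_time.

    Assumes a daily period aligned to midnight, as in the 1 a.m. example.
    """
    end = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)
    return start, end


def records_in_period(records, start, end):
    """Keep records whose timestamp falls in the half-open range [start, end)."""
    return [
        record
        for record in records
        if start <= datetime.strptime(record["timestamp"], TS_FORMAT) < end
    ]
```

For a job started at 1 a.m. on January 2, `previous_day_bounds` yields the range from January 1 00:00:00 up to, but not including, January 2 00:00:00.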
S240: obtain the snapshot data of the primary key for the previous period adjacent to the current period.
In one embodiment, N snapshot data files are created, each of which stores historical, up-to-date snapshot data corresponding to a corresponding primary key. The historical latest snapshot data is the latest snapshot data available at the current time.
In this embodiment, the process of obtaining snapshot data corresponding to a certain primary key in the previous period is as follows:
a. Obtain the mapping value corresponding to the primary key according to the preset mapping relation.
b. Read the snapshot data corresponding to the primary key in the previous period from the snapshot data file whose file name corresponds to the mapping value.
For example, if the preset mapping relation is hash(Bob) = 3, hash(Xia) = 3, hash(Mary) = 2, and hash(Jue) = 1, the snapshot data with the primary key Bob is written to the snapshot data file 3.snapshot.
As shown in Fig. 5, the snapshot data corresponding to each primary key in the previous period is read from the snapshot data files with the extension snapshot. Assuming the current period is October 29, 2019, the previous period is October 28, 2019, and the snapshot data for October 28, 2019 is read from each snapshot data file. For example, the snapshot data finally read is shown in Table 3:
TABLE 3
id name url timestamp
1 Bob /edu 2019-10-28 10:00:00
2 Jue /cue 2019-10-28 09:10:12
3 Mary /draw?id=2 2019-10-28 09:12:10
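Steps a and b of S240 can be sketched as a lookup that first resolves the file name from the mapping value and then picks the record for the requested primary key. The mapping values and rows below are taken from the example and Table 3; the in-memory dictionary standing in for the snapshot files and the function name are assumptions.

```python
# Mapping values from the example: hash(Bob) = 3, hash(Xia) = 3,
# hash(Mary) = 2, hash(Jue) = 1. A fixed table stands in for the hash function.
MAPPING = {"Bob": 3, "Xia": 3, "Mary": 2, "Jue": 1}

# Snapshot files of the previous period, holding the rows of Table 3.
# A dict of lists is an assumed stand-in for files stored on HDFS.
SNAPSHOT_FILES = {
    "1.snapshot": [
        {"id": 2, "name": "Jue", "url": "/cue", "timestamp": "2019-10-28 09:10:12"},
    ],
    "2.snapshot": [
        {"id": 3, "name": "Mary", "url": "/draw?id=2", "timestamp": "2019-10-28 09:12:10"},
    ],
    "3.snapshot": [
        {"id": 1, "name": "Bob", "url": "/edu", "timestamp": "2019-10-28 10:00:00"},
    ],
}


def previous_snapshot(primary_key, snapshot_files=SNAPSHOT_FILES, mapping=MAPPING):
    """Return the previous-period snapshot record for primary_key, or None."""
    file_name = f"{mapping[primary_key]}.snapshot"  # step a: mapping value to file
    for record in snapshot_files.get(file_name, []):  # step b: read that file
        # One file may hold several primary keys (Bob and Xia both map to 3),
        # so the record still has to be matched on the key itself.
        if record["name"] == primary_key:
            return record
    return None
```

A primary key that first appears in the current period, such as Xia, has no record in the previous snapshot, so the lookup returns None.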
S250: find the data with the latest timestamp for the primary key among the incremental data of the current period and the snapshot data of the previous period, and use it as the snapshot data of the primary key in the current period.
Finally, to calculate the real-time snapshot for the current period, a Reduce calculation by primary key is performed on the incremental data of the current period, read from the incremental data files with the extension log, together with the snapshot data of the previous period, read from the snapshot data files with the extension snapshot. That is, among data with the same primary key, only the record with the latest timestamp is retained, which yields the snapshot data for the current period.
For example, the record with the name Bob and a timestamp on October 29, 2019, that is, the 1st record in Table 2, is read from the incremental data file 3.log; meanwhile, the record with the name Bob, that is, the 1st record in Table 3, is read from the snapshot data file 3.snapshot. A Reduce calculation is then performed on these two records, and only the one with the latest timestamp is retained, that is, the record with the timestamp 2019-10-29 10:00:00.
The Reduce calculation is performed on the data corresponding to the other primary keys in the same way, and the real-time snapshot finally obtained for the current period is shown in Table 4:
TABLE 4
id name url timestamp
3 Mary /draw?id=2 2019-10-28 09:12:10
11 Bob /org 2019-10-29 10:00:00
12 Jue /cue?id=3 2019-10-29 10:01:00
13 Xia /pub 2019-10-29 10:02:00
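The merge in S250 can be checked against the tables: combining the previous snapshot (Table 3) with the current increments (Table 2) and keeping the latest record per primary key should reproduce Table 4. The sketch below is illustrative; the function name and the choice to sort the result by id are assumptions.

```python
# Rows of Table 3 (snapshot of the previous period).
PREVIOUS_SNAPSHOT = [
    {"id": 1, "name": "Bob", "url": "/edu", "timestamp": "2019-10-28 10:00:00"},
    {"id": 2, "name": "Jue", "url": "/cue", "timestamp": "2019-10-28 09:10:12"},
    {"id": 3, "name": "Mary", "url": "/draw?id=2", "timestamp": "2019-10-28 09:12:10"},
]

# Rows of Table 2 (incremental data of the current period).
CURRENT_INCREMENTS = [
    {"id": 11, "name": "Bob", "url": "/org", "timestamp": "2019-10-29 10:00:00"},
    {"id": 12, "name": "Jue", "url": "/cue?id=3", "timestamp": "2019-10-29 10:01:00"},
    {"id": 13, "name": "Xia", "url": "/pub", "timestamp": "2019-10-29 10:02:00"},
]


def merge_snapshot(previous_snapshot, current_increments):
    """Keep the latest record per primary key across both inputs.

    Timestamps are compared as strings, which is valid only because the
    "YYYY-MM-DD HH:MM:SS" format is fixed-width and zero-padded.
    """
    merged = {record["name"]: record for record in previous_snapshot}
    for record in current_increments:
        held = merged.get(record["name"])
        if held is None or record["timestamp"] > held["timestamp"]:
            merged[record["name"]] = record
    return sorted(merged.values(), key=lambda record: record["id"])


current_snapshot = merge_snapshot(PREVIOUS_SNAPSHOT, CURRENT_INCREMENTS)
```

Mary has no increment in the current period, so her previous snapshot row carries over unchanged, while Bob and Jue are replaced by newer rows and Xia is added.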
S260: store the snapshot data of the primary key for the current period in the corresponding snapshot data file.
After the snapshot data for the current period has been calculated, each piece of snapshot data is written to the corresponding snapshot data file. For example, if the snapshot data of the current period consists of the records in Table 4, then the record with the name Mary is written to the snapshot data file 2.snapshot, the record with the name Bob to 3.snapshot, the record with the name Jue to 1.snapshot, and the record with the name Xia to 3.snapshot.
In an embodiment of the application, a corresponding snapshot data file is established in each period, and after the snapshot data of the current period is obtained through calculation, the snapshot data is stored in the corresponding snapshot data file of the current period according to the primary key corresponding to the snapshot data.
For example, the snapshot data corresponding to 2019-10-28 are stored in the snapshot data files corresponding to 2019-10-28 according to the primary key respectively; and the snapshot data corresponding to 2019-10-29 are stored in the snapshot data files corresponding to 2019-10-29 according to the primary keys.
In another embodiment of the present application, each snapshot data file keeps only the latest snapshot data, that is, the snapshot data file is updated by overwriting the existing data.
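The two storage embodiments differ only in how the target file is named. In the sketch below a dictionary stands in for the file system, and the "<period>/<mapping value>.snapshot" layout used for per-period files is a hypothetical choice, since the source does not say how per-period files are named.

```python
def store_per_period(store, period, mapping_value, records):
    """Per-period embodiment: one snapshot file per period and mapping value.

    The "<period>/<mapping value>.snapshot" layout is an assumption.
    """
    store[f"{period}/{mapping_value}.snapshot"] = list(records)


def store_overwrite(store, mapping_value, records):
    """Overwrite embodiment: a single file that always holds the latest snapshot."""
    store[f"{mapping_value}.snapshot"] = list(records)


per_period, latest_only = {}, {}
day_28 = [{"name": "Bob", "url": "/edu", "timestamp": "2019-10-28 10:00:00"}]
day_29 = [{"name": "Bob", "url": "/org", "timestamp": "2019-10-29 10:00:00"}]

for period, records in (("2019-10-28", day_28), ("2019-10-29", day_29)):
    store_per_period(per_period, period, 3, records)
    store_overwrite(latest_only, 3, records)
```

The per-period layout keeps every historical snapshot available at the cost of more storage, whereas the overwrite layout keeps storage constant but retains only the most recent state.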
In the snapshot data generation method of incremental data provided in this embodiment, snapshot data calculated in each period is stored in a corresponding snapshot data file. When calculating the snapshot data corresponding to the current period, directly performing Reduce calculation according to the incremental data of the current period and the snapshot data of the previous period to obtain the real-time snapshot data corresponding to the current period. According to the scheme, historical accumulated incremental data does not need to be read, and compared with the method for reading the snapshot data and the accumulated incremental data, consumed time and memory resources are greatly reduced, meanwhile, the calculation amount is reduced, the calculation speed is increased, and the efficiency of generating the real-time snapshot data is improved.
Corresponding to the embodiment of the snapshot data generation method of the incremental data, the application also provides an embodiment of a snapshot data generation device of the incremental data.
Referring to FIG. 6, a block diagram of a snapshot data generation apparatus for incremental data according to an embodiment of the present application is shown. The apparatus includes:
a first obtaining module 110, configured to obtain incremental data;
a storage module 120, configured to store each piece of incremental data into the incremental data file corresponding to the primary key of that incremental data.
Incremental data with the same primary key is stored in the same incremental data file, and each incremental data file is stored in its entirety on a single node.
In an embodiment of the present application, the storage module 120 is specifically configured to: obtaining a mapping value of a primary key of the incremental data according to a preset mapping relation (such as a hash algorithm); and storing the incremental data into an incremental data file with a file name corresponding to the mapping value. In this embodiment, the incremental data with the same mapping value obtained by mapping the primary key are stored in the same incremental data file, that is, the incremental data file corresponding to the mapping value.
Of course, in other embodiments of the present application, the incremental data corresponding to different primary keys may instead be stored in separate incremental data files. In that case, the primary key of the data may be used directly as the file name of the incremental data file, or an identifier in one-to-one correspondence with the primary key may be used as the file name; details are not repeated here.
a snapshot data obtaining module 130, configured to obtain, from the incremental data file, the data with the latest timestamp corresponding to the same primary key, as the snapshot data of that primary key.
In an embodiment of the present application, all incremental data corresponding to the same primary key are read from an incremental data file, and then Reduce calculation is performed on all incremental data corresponding to the primary key, that is, data with the latest timestamp is obtained from all incremental data corresponding to the primary key and is used as snapshot data corresponding to the primary key. As shown in fig. 7, the snapshot data obtaining module 130 includes: a first incremental data reading sub-module 1311 and a first snapshot data determination sub-module 1312.
The first incremental data reading sub-module 1311 is configured to, for any primary key, read all incremental data corresponding to the primary key from an incremental data file corresponding to the primary key.
The first snapshot data determining submodule 1312 is configured to obtain the data with the latest timestamp from all the incremental data corresponding to the primary key, as the snapshot data corresponding to the primary key.
The snapshot data corresponding to the same primary key is stored in the same snapshot data file.
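The storage and Reduce steps of this embodiment can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: CRC32 stands in for the unspecified "preset mapping relation", `NUM_FILES` is an assumed file count, in-memory lists stand in for incremental data files, and the record fields (`key`, `name`, `ts`) and their values are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_FILES = 4  # assumed number of incremental data files


def mapping_value(primary_key: str, num_files: int = NUM_FILES) -> int:
    """Map a primary key to a file index using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(primary_key.encode("utf-8")) % num_files


def store_increments(records):
    """Group records by mapping value; each list stands in for one incremental data file."""
    files = defaultdict(list)
    for record in records:
        files[mapping_value(record["key"])].append(record)
    return files


def reduce_latest(file_records):
    """For every primary key in one file, keep the record with the latest timestamp."""
    latest = {}
    for record in file_records:
        current = latest.get(record["key"])
        # Timestamps in "YYYY-MM-DD HH:MM:SS" form compare correctly as strings.
        if current is None or record["ts"] > current["ts"]:
            latest[record["key"]] = record
    return latest


increments = [
    {"key": "1001", "name": "Mary", "ts": "2019-10-28 09:00:00"},
    {"key": "1002", "name": "Bob", "ts": "2019-10-28 09:05:00"},
    {"key": "1001", "name": "Mary", "ts": "2019-10-28 10:30:00"},
]
files = store_increments(increments)
snapshots = {}
for records in files.values():
    snapshots.update(reduce_latest(records))
```

Because all records for a given primary key land in the same file, each file can be reduced independently without moving data between files.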
In order to further improve the efficiency of generating the snapshot data, in another embodiment of the present application, as shown in fig. 8, the snapshot data obtaining module 130 includes: a second incremental data reading sub-module 1321, a historical snapshot data obtaining sub-module 1322, and a second snapshot data determination sub-module 1323.
The second incremental data reading submodule 1321 is configured to, for any incremental data file, read from that file the incremental data of the current cycle corresponding to the same primary key.
The historical snapshot data obtaining sub-module 1322 is configured to obtain the snapshot data of the primary key for the previous cycle, that is, the cycle adjacent to and preceding the current cycle.
The second snapshot data determining submodule 1323 is configured to search the incremental data of the current cycle and the snapshot data of the previous cycle corresponding to the primary key for the data with the latest timestamp, and determine that data as the snapshot data of the primary key in the current cycle.
The snapshot data of the primary key for the previous cycle adjacent to the current cycle is obtained as follows: a mapping value corresponding to the primary key is obtained according to a preset mapping relation; the snapshot data of the primary key for the previous cycle is then read from the snapshot data file whose file name corresponds to the mapping value. Snapshot data corresponding to the same primary key is stored in the same snapshot data file, and the primary keys of all data stored in one snapshot data file have the same mapping value.
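The per-cycle comparison performed by submodules 1321 to 1323 can be sketched as below. The function name `merge_cycle`, the record fields, and the sample values are hypothetical. Carrying over primary keys that received no incremental data in the current cycle is also an assumption, although it follows from treating the snapshot as the latest data per primary key.

```python
def merge_cycle(previous_snapshot, current_increments):
    """Return the current-cycle snapshot for the primary keys of one file.

    previous_snapshot: dict mapping primary key -> record from the previous cycle.
    current_increments: list of records produced in the current cycle.
    A stored record is replaced only by a record with a strictly newer timestamp.
    """
    snapshot = dict(previous_snapshot)  # keys without new increments carry over (assumption)
    for record in current_increments:
        stored = snapshot.get(record["key"])
        if stored is None or record["ts"] > stored["ts"]:
            snapshot[record["key"]] = record
    return snapshot


previous = {
    "1001": {"key": "1001", "city": "Beijing", "ts": "2019-10-28 10:30:00"},
    "1002": {"key": "1002", "city": "Tianjin", "ts": "2019-10-28 09:05:00"},
}
today = [
    {"key": "1001", "city": "Shanghai", "ts": "2019-10-29 08:00:00"},
    {"key": "1003", "city": "Wuhan", "ts": "2019-10-29 08:10:00"},
]
current = merge_cycle(previous, today)
```

Only the previous snapshot and the current cycle's incremental data are touched, so the work per cycle is bounded by their combined size rather than by the full history.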
The snapshot data generation apparatus for incremental data according to this embodiment stores incremental data corresponding to the same primary key in the same incremental data file, and the same incremental data file is stored in the same node, so that when snapshot data is generated, transmission time consumed for transmitting incremental data from different nodes to the same node is saved, and thus, a speed of generating snapshot data is increased, and memory resources consumed for transmitting incremental data between different computing nodes are saved.
In addition, to further increase the speed of generating snapshot data, the snapshot data calculated in each period is stored in a corresponding snapshot data file. When snapshot data is generated in the next period, the snapshot data of the preceding period is read from the snapshot data file and compared with the incremental data of the current period to obtain the latest snapshot data for that period. The historically accumulated incremental data therefore does not need to be read. Compared with reading the accumulated incremental data, reading the snapshot data greatly reduces the time and memory consumed, reduces the amount of calculation, increases the calculation speed, and improves the efficiency of generating real-time snapshot data.
The application further provides a computing device that includes a processor and a memory, the memory storing a program executable on the processor. When running the program stored in the memory, the processor implements the above snapshot data generation method for incremental data. The computing device may be a server, a server cluster, or another device with computing capability.
The processor herein may be a CPU, an MCU, or a combination of a CPU and an MCU. The processor includes one or more cores, and each core fetches the corresponding program from the memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The application also provides a storage medium executable by the computing device, wherein the storage medium stores a program, and the program realizes the snapshot data generation method of the incremental data when being executed by the computing device.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another. Because the apparatus embodiments are substantially similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding description of the method embodiments.
The steps in the method of the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs.
The device and the modules and sub-modules in the terminal in the embodiments of the present application can be combined, divided and deleted according to actual needs.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of a module or a sub-module is only one logical division, and there may be other divisions when the terminal is actually implemented, for example, a plurality of sub-modules or modules may be combined or integrated into another module, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules or sub-modules described as separate parts may or may not be physically separate, and parts that are modules or sub-modules may or may not be physical modules or sub-modules, may be located in one place, or may be distributed over a plurality of network modules or sub-modules. Some or all of the modules or sub-modules can be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, each functional module or sub-module in the embodiments of the present application may be integrated into one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated into one module. The integrated modules or sub-modules may be implemented in the form of hardware, or may be implemented in the form of software functional modules or sub-modules.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A snapshot data generation method for incremental data, characterized by comprising the following steps:
obtaining incremental data;
storing each piece of incremental data into an incremental data file corresponding to the primary key of the incremental data, wherein incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node;
and obtaining, from the incremental data file, the data with the latest timestamp corresponding to the same primary key, to obtain the snapshot data of the primary key.
2. The method of claim 1, wherein storing each piece of incremental data into the incremental data file corresponding to the primary key of the incremental data comprises:
obtaining a mapping value of the primary key of the incremental data according to a preset mapping relation;
and storing the incremental data into an incremental data file whose file name corresponds to the mapping value.
3. The method of claim 1, wherein obtaining, from the incremental data file, the data with the latest timestamp corresponding to the same primary key to obtain the snapshot data of the primary key comprises:
for any primary key, reading all incremental data corresponding to the primary key from the incremental data file corresponding to the primary key;
and obtaining the data with the latest timestamp from all the incremental data corresponding to the primary key as the snapshot data corresponding to the primary key, and storing the snapshot data into a snapshot data file corresponding to the primary key, wherein snapshot data corresponding to the same primary key is stored in the same snapshot data file.
4. The method of claim 1, wherein obtaining, from the incremental data file, the data with the latest timestamp corresponding to the same primary key to obtain the snapshot data of the primary key comprises:
for any incremental data file, reading, from the incremental data file, the incremental data of the current period corresponding to the same primary key;
obtaining the snapshot data of the primary key in the previous period adjacent to the current period;
and searching the incremental data of the current period and the snapshot data of the previous period corresponding to the primary key for the data with the latest timestamp, and determining that data as the snapshot data of the primary key in the current period.
5. The method according to claim 4, wherein obtaining the snapshot data of the primary key in the previous period adjacent to the current period comprises:
obtaining a mapping value corresponding to the primary key according to a preset mapping relation;
and reading the snapshot data of the primary key in the previous period from the snapshot data file whose file name corresponds to the mapping value, wherein snapshot data corresponding to the same primary key is stored in the same snapshot data file, and the primary keys of the data stored in each snapshot data file have the same mapping value.
6. The method of claim 5, wherein after obtaining the snapshot data of the primary key in the current period, the method further comprises:
storing the snapshot data of the primary key for the current period into the snapshot data file corresponding to the current period.
7. A snapshot data generation apparatus for incremental data, comprising:
a first obtaining module, configured to obtain incremental data;
a storage module, configured to store each piece of incremental data into an incremental data file corresponding to the primary key of the incremental data, wherein incremental data with the same primary key is stored in the same incremental data file, and the same incremental data file is stored on the same node;
and a snapshot data obtaining module, configured to obtain, from the incremental data file, the data with the latest timestamp corresponding to the same primary key, to obtain the snapshot data of the primary key.
8. The apparatus of claim 7, wherein the snapshot data obtaining module comprises:
a first incremental data reading submodule, configured to, for any primary key, read all incremental data corresponding to the primary key from the incremental data file corresponding to the primary key;
and a first snapshot data determining submodule, configured to obtain the data with the latest timestamp from all the incremental data corresponding to the primary key as the snapshot data corresponding to the primary key, and store the snapshot data into a snapshot data file corresponding to the primary key, wherein snapshot data corresponding to the same primary key is stored in the same snapshot data file.
9. The apparatus of claim 7, wherein the snapshot data obtaining module comprises:
a second incremental data reading submodule, configured to, for any incremental data file, read from the incremental data file the incremental data of the current period corresponding to the same primary key;
a historical snapshot data obtaining submodule, configured to obtain the snapshot data of the primary key in the previous period adjacent to the current period;
and a second snapshot data determining submodule, configured to search the incremental data of the current period and the snapshot data of the previous period corresponding to the primary key for the data with the latest timestamp, and determine that data as the snapshot data of the primary key in the current period.
10. A storage medium on which a program is stored, wherein the program, when loaded and executed by a processor, implements the snapshot data generation method for incremental data as recited in any one of claims 1 to 6.
CN202010089677.2A 2020-02-12 2020-02-12 Snapshot data generation method and device for incremental data Active CN111309673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010089677.2A CN111309673B (en) 2020-02-12 2020-02-12 Snapshot data generation method and device for incremental data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010089677.2A CN111309673B (en) 2020-02-12 2020-02-12 Snapshot data generation method and device for incremental data

Publications (2)

Publication Number Publication Date
CN111309673A true CN111309673A (en) 2020-06-19
CN111309673B CN111309673B (en) 2023-06-23

Family

ID=71145742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010089677.2A Active CN111309673B (en) 2020-02-12 2020-02-12 Snapshot data generation method and device for incremental data

Country Status (1)

Country Link
CN (1) CN111309673B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112838980A (en) * 2020-12-30 2021-05-25 北京奇艺世纪科技有限公司 Message processing method, system, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111309673B (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant