CN107015883B - Dynamic data backup method and device - Google Patents

Dynamic data backup method and device

Info

Publication number
CN107015883B
Authority
CN
China
Prior art keywords
node
nodes
data
database
storage medium
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610058988.6A
Other languages
Chinese (zh)
Other versions
CN107015883A (en)
Inventor
陶鸿飞
冯纲
温士帅
周立
王晓东
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai middle shift information technology Co.,Ltd.
China Mobile Group Shanghai Co Ltd
Original Assignee
China Mobile Group Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Shanghai Co Ltd filed Critical China Mobile Group Shanghai Co Ltd
Priority to CN201610058988.6A priority Critical patent/CN107015883B/en
Publication of CN107015883A publication Critical patent/CN107015883A/en
Application granted granted Critical
Publication of CN107015883B publication Critical patent/CN107015883B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1461Backup scheduling policy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments

Abstract

The invention discloses a dynamic data backup method and a dynamic data backup device, which are used for realizing stable and reliable data migration of a distributed processing system during dynamic data backup. The method comprises the following steps: copying stock data on a first node of a first database to a target storage medium of the first node, wherein the stock data on the first node is the operation data, stored on the first node when the backup is started, of a first processor that bears the first node, the first database comprises N first nodes, and the target storage media of any two first nodes are different, or the target storage media of part of the first nodes are the same; and distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database, wherein the second database comprises P second nodes, the number of second nodes corresponding to the first node is at least 1, and P is greater than N, so that the distributed processing system can perform stable and reliable data migration and distribution during dynamic data backup.

Description

Dynamic data backup method and device
Technical Field
The embodiment of the invention relates to the technical field of data migration, in particular to a dynamic data backup method and device.
Background
With the advent of the big data era, application demand for big data keeps increasing. In big data applications the amount of data grows over time, and as it grows, higher demands are placed on the processing capacity of hardware devices and network structures. To keep up with the ever-expanding data volume and the ever-increasing data processing requirements, the hardware must be continuously upgraded.
Taking capacity, data processing capacity and implementation cost into account, the distributed processing system is a preferred choice when hardware is upgraded and expanded: it has good data processing capacity and good upgrade and expansion capacity, and is suitable as the hardware framework of a big data system. However, the problems faced by a distributed processing system are also considerable, and data migration is very difficult when such a system is upgraded. This is because the data in a distributed processing system is spread over several nodes, and after the upgrade the number of nodes increases, so how to distribute the data reasonably becomes a difficult problem. The problem becomes even more troublesome if the data changes dynamically. In the prior art, most backup modes use a single backup path, that is, only a single backup path exists between the data source (database) and the backup target (tape), and data can only be backed up serially. Migrating massive data therefore takes a long time, and the data on the original processing system changes during that time, so that the data on the original processing system becomes inconsistent with the data on the new processing system, which brings great difficulty to system upgrade and transition. The prior art has no robust solution in this respect.
In summary, there is a need in the prior art for a dynamic data backup method for implementing stable and reliable data migration of a distributed processing system during dynamic data backup.
Disclosure of Invention
The embodiment of the invention provides a dynamic data backup method and device, which are used for realizing stable and reliable data migration of a distributed processing system during dynamic data backup.
The embodiment of the invention provides a dynamic data backup method, which comprises the following steps:
copying stock data on a first node of a first database to a target storage medium of the first node, wherein the stock data on the first node is the operation data, stored on the first node when the backup is started, of a first processor that bears the first node, the first database comprises N first nodes, and the target storage media of any two first nodes are different, or the target storage media of part of the first nodes are the same;
and distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database, wherein the second database comprises P second nodes, the number of second nodes corresponding to the first node is at least 1, and P is greater than N.
An embodiment of the present invention provides a dynamic data backup device, including:
the first processing unit is configured to copy stock data on a first node of a first database onto a target storage medium of the first node, where the stock data on the first node is operation data of a first processor that bears the first node and is stored on the first node when backup is started, and the first database includes N first nodes, where target storage media of any two first nodes are different or target storage media of part of the first nodes are the same;
and the second processing unit is used for distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database, the second database comprises P second nodes, the number of the second nodes corresponding to the first node is at least 1, and P is greater than N.
In the above embodiment, the first database is divided into a plurality of partitioned databases, namely N first nodes, the second database is divided into a plurality of partitioned databases, namely P second nodes, and the second nodes corresponding to each first node are preset. When data backup is performed, the stock data on any first node of the first database is copied to the target storage medium of that first node, and the stock data on the target storage medium is then distributed to the second nodes corresponding to that first node in the second database. In other words, the stock data of the first database is copied into the second database over N copy paths, and the N copy paths can run simultaneously without interfering with one another. Compared with the prior art, in which large batches of stock data are backed up over a single copy path, the N copy paths offer high availability, greatly shorten the backup time and improve the backup efficiency of the whole system. Moreover, because the mapping relationship between the first nodes and the second nodes is preset, during backup the data stored on a given first node is copied, along the designated path, only to the second nodes that have a mapping relationship with that first node, and is not distributed to second nodes that have no such mapping relationship. This avoids the difficulty of reasonably distributing the migrated stock data over a plurality of nodes, and also avoids inconsistency between the data on the original processing system and the data on the new processing system before and after the upgrade. The embodiment of the invention therefore improves the reliability of data migration.
Drawings
Fig. 1 is a flowchart of a dynamic data backup method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a mapping manner between a first node and a second node according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a mapping manner between a first node and a second node according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a mapping manner between a first node and a second node according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a mapping manner between a first node and a second node according to an embodiment of the present invention;
fig. 6a, fig. 6b, and fig. 6c are schematic structural diagrams of erecting a second node on a server according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a dynamic data backup device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
In order to implement stable and reliable data migration of a distributed processing system during dynamic data backup, an embodiment of the present invention provides a dynamic data backup method as shown in fig. 1, where the specific process includes:
101, copying stock data on a first node of a first database to a target storage medium of the first node, wherein the stock data on the first node is operation data of a first processor bearing the first node and stored on the first node when backup is started, the first database comprises N first nodes, and the target storage media of any two first nodes are different or the target storage media of part of the first nodes are the same;
step 102, distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database, wherein the second database comprises P second nodes, the number of the second nodes corresponding to the first node is at least 1, and P is larger than N.
In step 101, N is a positive integer greater than 1, and P is a positive integer greater than or equal to N. The stock data on the first node may be identified according to a first marker, the first marker may be a timestamp, the timestamp of the first marker is a time when the backup of the data stored on the first node is started, the stock data on the first node refers to data stored on the first node before the timestamp of the first marker, and the data stored on the first node refers to running data of a processor carrying the first node.
The second database is used to store operation data of a new distributed processing system, the first database is used to store operation data of an original distributed processing system, and the new distributed processing system is an upgrade and extension of the original distributed processing system, so generally speaking, the number of second nodes in the second database will be greater than the number of first nodes in the first database. The first node is a partition database of the first database, and the second node is a partition database of the second database. When data distribution is performed, data originally distributed on each first node needs to be redistributed to the second node according to a certain rule.
The process of backing up the stock data of the first database to the second database according to this method comprises a plurality of backup paths, which is embodied in two aspects. On the one hand, a mapping relationship between first nodes and second nodes is set, each first node corresponding to one or more second nodes; when data backup is performed, the stock data of a given first node is distributed only to the second nodes corresponding to that first node, so the number of backup paths equals the number of first nodes in the first database. On the other hand, the migration and distribution of data between any first node and its corresponding second nodes is achieved by forwarding through the target storage medium corresponding to that first node: each first node corresponds to one target storage medium, that is, each backup path maps to one target storage medium. In a specific implementation, the target storage media of any two first nodes are different, i.e. the N first nodes correspond to N target storage media, or some first nodes correspond to the same target storage medium, so as to establish multiple backup paths from multiple first nodes to multiple media. Multiple backup paths from multiple first nodes to a single storage medium may also be established, in which case the storage medium is divided into N storage partitions and the N first nodes correspond to the N storage partitions, thereby still realizing N backup paths.
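By way of illustration only, the following Python sketch shows N backup paths running simultaneously, one per first node, each copying that node's stock data to its own target storage medium; the node names, the dump_stock_data helper and the media paths are hypothetical stand-ins for whatever export mechanism the first database actually provides.

```python
# Minimal sketch of N parallel backup paths, one per first node.
# dump_stock_data() is a hypothetical helper standing in for the
# database-specific export of a node's stock data; it is an assumption,
# not part of the patent or of any real library API.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def dump_stock_data(first_node: str, target_medium: Path) -> Path:
    """Copy the stock data of one first node to its target storage medium."""
    target_medium.mkdir(parents=True, exist_ok=True)
    dump_file = target_medium / f"{first_node}.stock.dump"
    # Placeholder for the real export (e.g. a per-partition dump to tape).
    dump_file.write_text(f"stock data of {first_node}")
    return dump_file


def backup_all_nodes(first_nodes: list[str], media_root: Path) -> list[Path]:
    """Run the N copy paths simultaneously; they do not interfere."""
    with ThreadPoolExecutor(max_workers=len(first_nodes)) as pool:
        futures = [
            pool.submit(dump_stock_data, node, media_root / node)
            for node in first_nodes
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    nodes = [f"first_node_{i:02d}" for i in range(1, 33)]  # N = 32
    print(backup_all_nodes(nodes, Path("/tmp/backup_media"))[:2])
```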
The target storage medium of a first node is a physical storage medium or a storage partition of a physical storage medium. The storage medium may be implemented using any of a variety of existing media. In practical applications, the choice of storage medium is driven mainly by implementation cost; from a cost perspective, high-capacity, low-cost tape is a good choice, but as technology evolves other or newer storage media may become more suitable, and such choices should still be considered within the scope of the present invention. The invention selects the most cost-effective medium when backing up data. Backing up data to tape can be realized by various existing technologies, but most backup methods in the prior art use a single backup path, that is, only a single backup path exists between the data source (database) and the backup target (tape), and data can only be backed up serially. For a distributed processing system with a plurality of nodes, such a single-path backup is clearly too inefficient, whereas the process of backing up the stock data of the first database to the second database in the present invention comprises a plurality of backup paths, which makes the backup of the data of the plurality of nodes of the distributed processing system highly efficient.
It is worth noting that there are multiple mapping strategies between the N first nodes and the P second nodes in the embodiments of the present invention. For example, when the number P of second nodes in the second database is an integral multiple n of the number N of first nodes in the first database, the number of second nodes corresponding to each first node is n; the number P of second nodes may also be a non-integral multiple of the number N of first nodes. The mapping strategies are not limited to the examples listed in the embodiments of the invention.
The following is a detailed description of the mapping between N first nodes and P second nodes according to the present invention.
In the embodiment shown in fig. 2, the numbers of first nodes and second nodes are both powers of 2, which are the most common node counts for distributed systems: the number N of first nodes is 32 and the number P of second nodes is 128, so the number of second nodes is 4 times the number of first nodes. The P second nodes corresponding to the N first nodes are represented by a matrix of 4 rows × 32 columns containing all 128 second nodes. The first row is numbered 1-32, the second row 33-64, the third row 65-96, and the fourth row 97-128. The 4 second nodes in each column correspond to one first node: the 32 numbers in the first row correspond in turn to the 32 first nodes, and each first node corresponds to the 4 second nodes in the same column. The second nodes in the same column are erected on the same server, which is typically provided with 4 processors, each processor bearing one second node; in fig. 2 the server where the second nodes are located is indicated by a dashed box.
In practical applications, several servers with the same performance are usually selected, and configuring the number of nodes as a power of 2 is the most common choice, so node expansion can be realized with the matrix shown in fig. 2. Summarizing, the number of nodes in the new distributed processing system will be 2^n times the number of nodes in the old distributed processing system. The numbers in the first row of the matrix correspond to the first nodes in the old distributed system, while the numbers in the entire matrix correspond to the second nodes in the new distributed system. Each old first node corresponds to a plurality (typically 2^n) of new second nodes in the same column. The new second nodes in each column are erected on the same server with a corresponding number of processors, each processor being capable of processing one second node.
The embodiment of the invention shown in fig. 3 provides another mapping manner between N first nodes and P second nodes. The number of first nodes is 36 and the number of second nodes is 108: neither is a power of 2, but the number of second nodes is an integral multiple (3 times) of the number of first nodes. The P second nodes corresponding to the N first nodes are represented by a matrix of 3 rows × 36 columns. The first row of second nodes is numbered 1-36, the second row 37-72, and the third row 73-108. The 3 second nodes in each column correspond to one first node: the 36 numbers 1-36 in the first row can be understood as corresponding to the 36 first nodes, and each first node corresponds to the 3 second nodes in the same column. When the distributed processing system is built, the second nodes in the same column are erected on the same server, each server being provided with 3 processors of the same performance, each processor bearing one second node; in fig. 3 the server where the second nodes are located is indicated by a dashed box.
Although the case shown in fig. 3, in which the numbers of first nodes and second nodes are not powers of 2, is not common, it shows that the present invention is also applicable to such a case.
The embodiment of the invention shown in fig. 4 provides a mapping manner between N first nodes and P second nodes in which the number of first nodes is 19, the number of second nodes is 45, and the number of second nodes is a non-integral multiple of the number of first nodes. In this case, the P second nodes corresponding to the N first nodes are represented by a matrix of 3 rows × 19 columns. The first row of second nodes is numbered 1-19, the second row 20-38, and the third row 39-45. Note that the first and second rows each contain 19 second nodes, while the third row contains only 7; the remaining 12 positions are idle and are not numbered. The second nodes in each column correspond to one first node: the first 7 columns each contain three second nodes, and the last 12 columns each contain only two. The 19 columns correspond to the 19 first nodes, and each first node corresponds to the second nodes in the same column.
In the mapping scheme shown in fig. 4, when the distributed processing system is implemented, the 45 second nodes are erected on 19 servers. The servers used for erection have the same performance, each being provided with three processors, and the second nodes of each column are erected on the same server; because some columns contain only two second nodes, some servers bear only two second nodes and therefore have an idle processor. In fig. 4, the server where the second nodes are located is indicated by a dashed box.
The mapping between N first nodes and P second nodes provided in fig. 4 should be rare in practical applications, but this illustrates that the present invention can also be applied to this case.
More generally, the original N first nodes are expanded into P second nodes (P > N). During expansion, a matrix with m rows and N columns is established, where m is the smallest integer satisfying m × N ≥ P. If P is not an integral multiple of N, some second-node positions in the m-row × N-column matrix are idle; the idle positions may be located anywhere except the first row and are not necessarily limited to the last row.
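A minimal sketch of this expansion rule is given below; it assumes the second nodes are numbered 1..P row by row as in figs. 2-4, and that idle positions fall at the tail of the last row (as in fig. 4). The function name and numbering scheme are illustrative only.

```python
# Sketch of the m-row x N-column expansion matrix from figs. 2-4.
# Second nodes are numbered 1..P row by row; positions beyond P stay idle.
import math


def build_mapping(n_first: int, p_second: int) -> dict[int, list[int]]:
    """Map each first node (one column) to the second nodes in its column."""
    if p_second <= n_first:
        raise ValueError("P must be greater than N")
    m_rows = math.ceil(p_second / n_first)   # smallest m with m * N >= P
    mapping: dict[int, list[int]] = {}
    for col in range(1, n_first + 1):        # first nodes 1..N
        seconds = []
        for row in range(m_rows):
            number = row * n_first + col     # row-by-row numbering
            if number <= p_second:           # skip idle positions
                seconds.append(number)
        mapping[col] = seconds
    return mapping


# Fig. 2: 32 -> 128, every column holds 4 second nodes.
assert build_mapping(32, 128)[1] == [1, 33, 65, 97]
# Fig. 4: 19 -> 45, the last 12 columns hold only 2 second nodes.
assert build_mapping(19, 45)[8] == [8, 27]
```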
In the above embodiment, preferably, the performance of the servers bearing the second nodes is uniform, and the second nodes in the same column are all installed on the same server.
In practical applications, this is not always the ideal situation as described above. It is possible that some of the servers may fail, at which point the new server used to replace the failed server cannot guarantee consistent performance with the original server. For another example, the servers also need to be updated, and due to cost, the servers are updated in batches, which may cause different batches of servers for erecting the second node to have different performances.
It should be noted that, in the above embodiments, there are various deployment strategies for the second nodes corresponding to a first node, which are not limited to the examples listed in the embodiments of the present invention. For example, the second nodes corresponding to the same first node may be deployed on one server or on multiple servers.
For example, fig. 5 shows a mapping manner between N first nodes and P second nodes in which the number N of first nodes is 32 and the number P of second nodes is 128. The matrix used is identical to the embodiment shown in fig. 2; the difference lies in the servers on which the 128 second nodes are erected. The four second nodes of the first column are erected on a server having four processors. The four second nodes of the second column are erected on two servers each having two processors; the processing capacity of the server carrying nodes No. 2 and No. 34 is weaker, while that of the server carrying nodes No. 66 and No. 98 is stronger. The 8 nodes in the third and fourth columns are carried on a server with eight processors, one processor for each second node, and the operation data of each processor will in future be stored in the corresponding second node. The second nodes in the other columns may be erected in the same manner as the first three columns, or in other manners, for example every 4 columns erected on one multiprocessor server, or the second nodes of some two columns erected on the same multiprocessor server. In fig. 5, the server where the second nodes are located is also indicated by a dashed box.
Similarly, the plurality of second nodes in fig. 3 and fig. 4 may also be installed on servers with different processing capabilities, and will not be described herein again. However, those skilled in the art can refer to fig. 2 and fig. 5 to change the deployment of the second nodes in fig. 3 and fig. 4, so as to distribute the second nodes on servers with different performances.
In practical applications, the second node may be installed on the server in various manners, and is not limited to the manner shown in fig. 5, and it should be noted that the illustrated manner in fig. 5 is to illustrate that the processing capability of the server or the processor for carrying the second node is different, and is not to limit the installation manner of the second node.
Different ways of erecting the second nodes on servers are described below with reference to fig. 6a, fig. 6b and fig. 6c.
As shown in fig. 6a, 4 second nodes, numbered 32, 64, 96, 128, respectively, in the same column are spanned on a single server 606 having 4 processors. Each second node 602 is a partitioned database in the second database, each partitioned database being hosted on a processor 604. In the server 606, data access is performed between the 4 processors 604 and the 4 partitioned databases 602 through the memory 608, the memory 608 is shared, and the 4 processors 604 and the 4 partitioned databases 602 share the same memory 608.
As shown in fig. 6b, a plurality of second nodes are spanned over a plurality of servers 616 having a single processor. Each second node 612 is also a partitioned database, each partitioned database being hosted on a server 616, the server 616 having its own processor 614 and memory 618. Data is accessed between the processor 614 and the second node 612 via respective memories 618. A plurality of servers 616 are erected on a unified platform 610 to form a distributed processing system.
As shown in fig. 6c, a plurality of second nodes are installed in a plurality of servers having a plurality of processors. Fig. 6c may be considered an extension of fig. 6a, wherein a plurality of servers 626 are installed on a platform 620 to form a new distributed system. Each server 626 has a plurality of processors 624, and a plurality of partitioned databases 622 are mounted on each server 626, each partitioned database 622 being a second node, and each partitioned database 622 being mounted on a processor 624. For each server 626, data is accessed between the processors 624 and the partitioned databases 622 via the memory 628, the memory 628 is shared, and the processors 624 and the partitioned databases 622 share the same memory 628.
Based on the above examples of erecting the second nodes on the server, the erection of the P second nodes in the second database on the server also includes multiple deployment manners, and the P second nodes in the second database are deployed on multiple servers with the same performance or different performances, or may be all deployed on the same server, and each server includes at least one processor, because each second node corresponds to one processor bearing the second node.
When the data distribution of step 102 is performed, according to the mapping relationship between the first nodes and the second nodes described in the above embodiments, the dynamic data stored on each first node is recovered from the storage medium and distributed to the second nodes corresponding to that first node. Specifically, when the stock data on the target storage medium of a first node is distributed to the second nodes corresponding to that first node in the second database, the following rules are adopted:
first, data originally stored in a first node is distributed to a plurality of second nodes corresponding to the first node, that is, the second nodes in the same column in the matrix. For the same first node, data on the first node is distributed in a column corresponding to the matrix and the first node all the time, cross-column data distribution cannot occur, and therefore data distribution efficiency is improved. The amount of data originally stored at each first node is different, and the total amount of data distributed within each column at the time of distribution is also different.
Next, within each column, data distribution is performed according to the processing capacity of the server carrying the respective second node. The second node with strong processing capability is divided into more data, and the second node with weak processing capability is divided into less data, so that more reasonable load distribution is carried out to obtain better data processing speed.
Specifically, if the processing capability of the server or the processor bearing the second node corresponding to the first node is the same, step 102 specifically includes: and averagely distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database.
If the processing capability of the server or the processor carrying the second node corresponding to the first node is different, step 102 specifically includes: and distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database as required according to the processing capacity of a server or a processor bearing the second node corresponding to the first node.
For example, if the performance of the server carrying the second nodes is the same, several second nodes in each column will be evenly distributed in the manner shown in fig. 2-4. Taking the example shown in fig. 2, the data originally stored in a first node would be evenly distributed to 4 second nodes.
If the performance of the servers carrying the nodes differs, the data distributed to each second node depends on its processing capacity. For example, in the manner shown in fig. 5, the four second nodes numbered 1, 33, 65 and 97 in the first column are erected on a server having four processors, and the processing capacity of these four nodes is the same, so the data originally belonging to the first node numbered 1 will be distributed evenly among these four second nodes. A similar situation applies to the second nodes of the third and fourth columns: although the 8 nodes of these two columns are carried by the same server, the second nodes within each column have the same processing capacity, so the data originally belonging to the first node numbered 3 will be distributed evenly among the second nodes numbered 3, 35, 67 and 99, and the data originally belonging to the first node numbered 4 will be distributed evenly among the second nodes numbered 4, 36, 68 and 100. In the second column, the four second nodes are erected on two servers each having two processors; the processing capacity of the server carrying nodes No. 2 and No. 34 is weaker, and that of the server carrying nodes No. 66 and No. 98 is stronger. Thus the distribution over the second nodes of the second column is not even: nodes No. 66 and No. 98, with stronger processing capacity, will be allocated more data, while nodes No. 2 and No. 34, with weaker processing capacity, will be allocated less data.
Similarly, if the second node in the manner shown in fig. 3 and 4 is installed on a server with different processing capabilities, the data processing capabilities of the nodes are also considered when data is distributed to the second node, and load balancing is performed according to the strength of the processing capabilities of the nodes, so as to achieve the best overall performance.
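As a sketch of this distribution rule (not the patent's implementation), the following snippet splits one first node's records over the second nodes of its column in proportion to assumed capacity weights; equal weights reproduce the even split, and the weights used for the second column of fig. 5 are made-up values.

```python
# Sketch of distributing one first node's stock data over its column of
# second nodes in proportion to processing capacity (equal weights give
# the even split of the first case).
def distribute(records: list, second_nodes: list,
               capacity: dict) -> dict:
    """Split records over second_nodes proportionally to capacity."""
    total = sum(capacity[node] for node in second_nodes)
    shares = {node: capacity[node] / total for node in second_nodes}
    result = {node: [] for node in second_nodes}
    start = 0
    for i, node in enumerate(second_nodes):
        if i == len(second_nodes) - 1:
            end = len(records)               # last node takes the remainder
        else:
            end = start + round(shares[node] * len(records))
        result[node] = records[start:end]
        start = end
    return result


# Column 2 of fig. 5: nodes 2 and 34 sit on a weaker server, 66 and 98 on
# a stronger one, so the latter receive more data (weights are made up).
rows = list(range(1000))
split = distribute(rows, ["n2", "n34", "n66", "n98"],
                   {"n2": 1.0, "n34": 1.0, "n66": 2.0, "n98": 2.0})
print({node: len(part) for node, part in split.items()})
# {'n2': 167, 'n34': 167, 'n66': 333, 'n98': 333}
```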
At this point, the migration and distribution of the stock of dynamic data onto the new distributed system has been accomplished.
After step 102, the method further includes step 103 and step 104.
103, directly importing the incremental data on a first node of the first database into the second node corresponding to the first node; the incremental data on the first node is the operation data of the first processor stored on the first node in the time period from the start of the backup until all the stock data on the first node has been distributed to the second node corresponding to the first node;
step 104, starting the second node to receive the operation data of the second processor carrying the second node.
For the same first node, the incremental data on the first node can be identified by a first marker and a second marker. The first marker and the second marker may be timestamps, and the timestamp of the second marker is later than that of the first marker. The timestamp of the first marker is the time at which the backup is started and is used to mark the stock data on the first node; the timestamp of the second marker is the time at which all the stock data on the first node has been distributed to the corresponding second nodes; and the incremental data of the first node refers to the data newly added on the first node between the timestamp of the first marker and the timestamp of the second marker.
It should be noted that although the timestamp is described as an example of the first marker and the second marker, a person skilled in the art may use other markers to distinguish stock data from incremental data. The principle of generation of stock data and incremental data has been disclosed herein: the stock data refers to data that already exists before the implementation of the method of the present invention is started, and the incremental data refers to data that is newly generated during the backup and distribution of the aforementioned stock data.
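As a compact illustration of the two markers (an assumption-laden sketch, not the patent's data model), suppose every record carries a write timestamp; stock and incremental data can then be separated as follows.

```python
# Sketch of classifying a node's records with the two timestamp markers:
# records written before the first marker are stock data, records written
# between the first and second markers are incremental data, and anything
# newer is handled directly by the second database (step 104).
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    written_at: float  # assumed per-record write timestamp


def classify(records: list, first_marker: float,
             second_marker: float) -> tuple:
    stock = [r for r in records if r.written_at < first_marker]
    incremental = [r for r in records
                   if first_marker <= r.written_at < second_marker]
    return stock, incremental
```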
The data volume of the incremental data is much smaller than that of the stock data, so that migration by a method similar to that of the stock data is not necessary. For incremental data, a method that enables fast data movement between two databases (or distributed processing systems) is a good choice. In one implementation, a remote database cursor (simply referred to as a remote cursor) is selected to implement migration of incremental data, i.e., the remote cursor is used to import incremental dynamic data from the first database into the second database.
One preferred implementation is as follows: incremental data is first identified in a first database (in the original distributed processing system), such as by identifying data with a timestamp between the first marker and the second marker as incremental data. These incremental data will be distributed over the various first nodes. And setting a cursor on each first node, and then establishing a data transfer path between the first database and the second database, which is also called a network pipeline, wherein the two ends of the network pipeline are the first node and the second node corresponding to the first node respectively. When the incremental data is migrated, the incremental data on one first node is only imported to the corresponding second node (i.e. the second nodes in the same column in the mapping expansion matrix) by using the mapping expansion mode and the data distribution rule. A plurality of network pipes between one first node and a plurality of second nodes may be established to import incremental data, and when the incremental data is imported and distributed, the data processing capacity of the second nodes is also considered, and the distributed data amount is determined according to the data processing capacity.
This method has the advantage that intermediate data never lands on a disk or other storage medium, which saves a great deal of data read/write time, and the method is simple to use.
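The remote-cursor pipeline can be sketched with the standard Python DB-API as below; the connection objects, table name, SQL, paramstyle and batch size are all assumptions, and a real deployment would rely on whatever remote-cursor facility the two databases provide, with capacity-aware distribution instead of the simple round-robin shown here.

```python
# Sketch of piping incremental data from one first node to its second
# nodes through a cursor, without staging it on disk. Connections, table
# name and SQL are illustrative DB-API usage, not the patent's API.
from itertools import cycle


def import_incremental(src_conn, dst_conns, first_marker, second_marker,
                       batch_size: int = 1000) -> None:
    src_cur = src_conn.cursor()
    src_cur.execute(
        "SELECT key, payload, written_at FROM records "
        "WHERE written_at >= ? AND written_at < ?",
        (first_marker, second_marker),
    )
    # Round-robin over the column's second nodes; a capacity-weighted
    # choice (as used for stock data) could be substituted here.
    targets = cycle(dst_conns)
    while True:
        rows = src_cur.fetchmany(batch_size)
        if not rows:
            break
        dst = next(targets)
        dst.cursor().executemany(
            "INSERT INTO records (key, payload, written_at) VALUES (?, ?, ?)",
            rows,
        )
    for dst in dst_conns:
        dst.commit()
```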
After the transfer of the incremental data is completed, the backup of the dynamic data from the first database to the second database is complete. It should be noted that although the amount of incremental data is not large, transferring it still takes some time, during which new data is generated. Because the amount of this new data is very small, it can be handled by step 104: after the backup of the stock data is completed (i.e. at the time point of the second marker), the second database starts to work and normally receives the data generated from the time corresponding to the second marker onwards, so once the data with timestamps between the first marker and the second marker has been imported into the second database, the data in the second database is complete.
In summary, the dynamic data backup method of the present invention has good expandability, and can flexibly expand a plurality of second nodes of the second database according to the requirements, and distribute data according to the expanded second nodes. The backup method of the dynamic data can complete migration and backup of a large amount of dynamic data on the distributed processing system at lower cost and higher speed.
In the above method flow, the first database is divided into a plurality of partitioned databases, namely N first nodes, the second database is divided into a plurality of partitioned databases, namely P second nodes, and the second nodes corresponding to each first node are preset. When data backup is performed, the stock data on any first node of the first database is copied to the target storage medium of that first node, and the stock data on the target storage medium is then distributed to the second nodes corresponding to that first node in the second database. In other words, the stock data of the first database is copied into the second database over N copy paths, and the N copy paths can run simultaneously without interfering with one another. Compared with the prior art, in which large batches of stock data are backed up over a single copy path, the N copy paths offer high availability, greatly shorten the backup time and improve the backup efficiency of the whole system. Moreover, because the mapping relationship between the first nodes and the second nodes is preset, during backup the data stored on a given first node is copied, along the designated path, only to the second nodes that have a mapping relationship with that first node, and is not distributed to second nodes that have no such mapping relationship. This avoids the difficulty of reasonably distributing the migrated stock data over a plurality of nodes, and also avoids inconsistency between the data on the original processing system and the data on the new processing system before and after the upgrade. The embodiment of the invention therefore improves the reliability of data migration.
The embodiments described above are provided to enable persons skilled in the art to make or use the invention. Modifications or variations can be made to the embodiments described above by persons skilled in the art without departing from the inventive concept of the present invention; therefore, the scope of protection of the present invention is not limited by the embodiments described above but should be accorded the widest scope consistent with the innovative features set forth in the claims.
Based on the above method flow of the present invention, embodiments of the present invention further provide a dynamic data backup device, and specific contents of these devices are referred to in the above method section, which will not be described herein again.
A dynamic data backup apparatus as shown in fig. 7, comprising:
a first processing unit 701, configured to copy stock data on a first node of a first database onto a target storage medium of the first node, where the stock data on the first node is operation data of a first processor that bears the first node and is stored on the first node when a backup is started, and the first database includes N first nodes, where target storage media of any two first nodes are different or target storage media of some first nodes are the same;
the second processing unit 702 is configured to distribute inventory data on a target storage medium of the first node to a second node corresponding to the first node in a second database, where the second database includes P second nodes, the number of the second nodes corresponding to the first node is at least 1, and P is greater than N.
Further, the device also comprises a third processing unit;
a third processing unit, configured to, after the second processing unit 702 distributes the stock data on the target storage medium of the first node to a second node corresponding to the first node in the second database, directly import the incremental data on the first node of the first database into the second node corresponding to the first node, and start the second node to receive the operation data of the second processor bearing the second node; the incremental data on the first node is the running data of the first processor stored on the first node in a time period from the backup starting to the condition that the stock data on the first node are all distributed to the second node corresponding to the first node.
Further, the target storage medium of the first node is a physical storage medium or a storage partition of a physical storage medium.
Further, if the number P of second nodes in the second database is an integral multiple n of the number N of first nodes in the first database, the number of second nodes corresponding to each first node is n.
Furthermore, a second node corresponding to the first node is deployed on a server; or the second node corresponding to the first node is deployed on a plurality of servers.
Further, the second processing unit 702 is specifically configured to:
if the processing capacity of a server or a processor bearing a second node corresponding to the first node is the same, averagely distributing stock data on a target storage medium of the first node to the second node corresponding to the first node in a second database;
and if the processing capacities of the server or the processor bearing the second node corresponding to the first node are different, distributing the stock data on the target storage medium of the first node to the second node corresponding to the first node in the second database as required according to the processing capacity of the server or the processor bearing the second node corresponding to the first node.
In the above embodiment, the first database is divided into a plurality of partitioned databases, namely N first nodes, the second database is divided into a plurality of partitioned databases, namely P second nodes, and the second nodes corresponding to each first node are preset. When data backup is performed, the stock data on any first node of the first database is copied to the target storage medium of that first node, and the stock data on the target storage medium is then distributed to the second nodes corresponding to that first node in the second database. In other words, the stock data of the first database is copied into the second database over N copy paths, and the N copy paths can run simultaneously without interfering with one another. Compared with the prior art, in which large batches of stock data are backed up over a single copy path, the N copy paths offer high availability, greatly shorten the backup time and improve the backup efficiency of the whole system. Moreover, because the mapping relationship between the first nodes and the second nodes is preset, during backup the data stored on a given first node is copied, along the designated path, only to the second nodes that have a mapping relationship with that first node, and is not distributed to second nodes that have no such mapping relationship. This avoids the difficulty of reasonably distributing the migrated stock data over a plurality of nodes, and also avoids inconsistency between the data on the original processing system and the data on the new processing system before and after the upgrade. The embodiment of the invention therefore improves the reliability of data migration.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (12)

1. A method for dynamic data backup, comprising:
copying stock data on a first node of a first database to a target storage medium of the first node, wherein the stock data on the first node is the operation data, stored on the first node when the backup is started, of a first processor that bears the first node, the first database comprises N first nodes, and the target storage media of any two first nodes are different or the target storage media of part of the first nodes are the same; each first node corresponds to a backup path, and the number of the backup paths is N;
and distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database through the backup path corresponding to the first node, wherein the second database comprises P second nodes, the number of the second nodes corresponding to the first node is at least 1, and P is greater than N.
2. The method of claim 1,
after the distributing of the stock data on the target storage medium of the first node to the second node corresponding to the first node in the second database, the method further includes:
the incremental data on the first node of the first database are directly imported into a second node corresponding to the first node, and the second node is started to receive the running data of a second processor bearing the second node; the incremental data on the first node is the running data of the first processor stored on the first node in a time period from the backup starting to the condition that the stock data on the first node are all distributed to a second node corresponding to the first node.
3. The method of claim 1,
the target storage medium of the first node is a physical storage medium or a storage partition of a physical storage medium.
4. The method of claim 1, wherein if the number P of second nodes in the second database is an integral multiple n of the number N of first nodes in the first database, the number of second nodes corresponding to each first node is n.
5. The method of claim 1,
a second node corresponding to the first node is deployed on a server; or the second node corresponding to the first node is deployed on a plurality of servers.
6. The method of claim 1, wherein said distributing the stock data on the target storage medium of the first node to the second node corresponding to the first node in the second database comprises:
if the processing capacity of a server or a processor bearing a second node corresponding to the first node is the same, distributing the stock data on the target storage medium of the first node to the second node corresponding to the first node in a second database on average; or,
if the processing capacity of the server or the processor bearing the second node corresponding to the first node is different, distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database as required according to the processing capacity of the server or the processor bearing the second node corresponding to the first node.
7. A dynamic data backup apparatus, comprising:
the first processing unit is configured to copy stock data on a first node of a first database onto a target storage medium of the first node, where the stock data on the first node is operation data of a first processor that bears the first node and is stored on the first node when backup is started, and the first database includes N first nodes, where target storage media of any two first nodes are different or target storage media of part of the first nodes are the same; each first node corresponds to a backup path, and the number of the backup paths is N;
and the second processing unit is used for distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database through the backup path corresponding to the first node, the second database comprises P second nodes, the number of the second nodes corresponding to the first node is at least 1, and P is greater than N.
8. The apparatus of claim 7, further comprising a third processing unit;
the third processing unit is configured to, after the second processing unit distributes the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database, directly import incremental data on the first node of the first database into the second node corresponding to the first node, and start the second node to receive operation data of a second processor bearing the second node; the incremental data on the first node is the running data of the first processor stored on the first node in a time period from the backup starting to the condition that the stock data on the first node are all distributed to a second node corresponding to the first node.
9. The apparatus of claim 7,
the target storage medium of the first node is a physical storage medium or a storage partition of a physical storage medium.
10. The apparatus of claim 7, wherein if the number P of second nodes in the second database is an integral multiple n of the number N of first nodes in the first database, the number of second nodes corresponding to each first node is n.
11. The apparatus of claim 7,
a second node corresponding to the first node is deployed on a server; or the second node corresponding to the first node is deployed on a plurality of servers.
12. The apparatus as claimed in claim 7, wherein said second processing unit is specifically configured to:
if the processing capacity of a server or a processor bearing a second node corresponding to the first node is the same, distributing the stock data on the target storage medium of the first node to the second node corresponding to the first node in a second database on average; or,
if the processing capacity of the server or the processor bearing the second node corresponding to the first node is different, distributing the stock data on the target storage medium of the first node to a second node corresponding to the first node in a second database as required according to the processing capacity of the server or the processor bearing the second node corresponding to the first node.
CN201610058988.6A 2016-01-28 2016-01-28 Dynamic data backup method and device Active CN107015883B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610058988.6A CN107015883B (en) 2016-01-28 2016-01-28 Dynamic data backup method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610058988.6A CN107015883B (en) 2016-01-28 2016-01-28 Dynamic data backup method and device

Publications (2)

Publication Number Publication Date
CN107015883A CN107015883A (en) 2017-08-04
CN107015883B true CN107015883B (en) 2020-07-17

Family

ID=59439161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610058988.6A Active CN107015883B (en) 2016-01-28 2016-01-28 Dynamic data backup method and device

Country Status (1)

Country Link
CN (1) CN107015883B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019095227A1 (en) 2017-11-15 2019-05-23 Oppo广东移动通信有限公司 Method for controlling duplication and transmission of data, user equipment, primary node, and secondary node
CN109062727B (en) * 2018-06-20 2023-04-14 平安科技(深圳)有限公司 Data synchronization system and method
CN109407976B (en) * 2018-09-21 2021-09-14 联想(北京)有限公司 Distributed storage method and distributed storage device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662793A (en) * 2012-03-07 2012-09-12 江苏引跑网络科技有限公司 Hot backup and recovery method of distributed database with guarantee of data consistency
CN102880531A (en) * 2012-09-27 2013-01-16 新浪网技术(中国)有限公司 Database backup system and backup method and slave database server of database backup system
CN104679614A (en) * 2015-03-31 2015-06-03 成都文武信息技术有限公司 Database disaster backup system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8630980B2 (en) * 2010-04-06 2014-01-14 Microsoft Corporation Synchronization framework that restores a node from backup
US9792185B2 (en) * 2014-06-24 2017-10-17 International Business Machines Corporation Directed backup for massively parallel processing databases

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662793A (en) * 2012-03-07 2012-09-12 江苏引跑网络科技有限公司 Hot backup and recovery method of distributed database with guarantee of data consistency
CN102880531A (en) * 2012-09-27 2013-01-16 新浪网技术(中国)有限公司 Database backup system and backup method and slave database server of database backup system
CN104679614A (en) * 2015-03-31 2015-06-03 成都文武信息技术有限公司 Database disaster backup system

Also Published As

Publication number Publication date
CN107015883A (en) 2017-08-04

Similar Documents

Publication Publication Date Title
CN111124277B (en) Deep learning data set caching method, system, terminal and storage medium
CN106843745A (en) Capacity expansion method and device
CN108810115B (en) Load balancing method and device suitable for distributed database and server
US8768973B2 (en) Apparatus and method for expanding a shared-nothing system
EP3079065B1 (en) Redo-logging for partitioned in-memory datasets
CN107357689B (en) Fault processing method of storage node and distributed storage system
US9330107B1 (en) System and method for storing metadata for a file in a distributed storage system
CN107015883B (en) Dynamic data backup method and device
US20150170316A1 (en) Subgraph-based distributed graph processing
CN109408590A (en) Expansion method, device, equipment and the storage medium of distributed data base
US20190332275A1 (en) Information processing system and volume allocation method
CN104102535A (en) Process migration method and migratable operating system
CN102999571A (en) Realizing method for multiple nodes of single computer in cluster
CN111291062B (en) Data synchronous writing method and device, computer equipment and storage medium
US20050154786A1 (en) Ordering updates in remote copying of data
CN112256433A (en) Partition migration method and device based on Kafka cluster
US10437797B1 (en) In-memory distributed database with a remote data store
CN114565502A (en) GPU resource management method, scheduling method, device, electronic equipment and storage medium
US10592493B1 (en) Spot-instanced bulk data uploading
CN110298031B (en) Dictionary service system and model version consistency distribution method
US20230113180A1 (en) Methods and systems for expanding gpu memory footprint based on hybrid-memory
CN109120674B (en) Deployment method and device of big data platform
CN114493602B (en) Block chain transaction execution method and device, electronic equipment and storage medium
CN112527561B (en) Data backup method and device based on Internet of things cloud storage
CN109032525A (en) A kind of method, apparatus, equipment and storage medium being automatically positioned low-quality disk

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201223

Address after: No.200 Changshou Road, Putuo District, Shanghai 200060

Patentee after: CHINA MOBILE GROUP SHANGHAI Co.,Ltd.

Patentee after: Shanghai middle shift information technology Co.,Ltd.

Address before: No.200 Changshou Road, Putuo District, Shanghai 200060

Patentee before: CHINA MOBILE GROUP SHANGHAI Co.,Ltd.
