WO2017067397A1 - Data recovery method and apparatus - Google Patents

Data recovery method and apparatus

Info

Publication number
WO2017067397A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
change
log
change log
time
Prior art date
Application number
PCT/CN2016/101730
Inventor
杨卓荦
夏晨
张云远
陈昱康
戴志勇
连杰红
李剑
徐常亮
吕余全
田美红
袁冶平
杨少华
李淼
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017067397A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control

Definitions

  • The present invention relates to the field of data processing, and in particular to a data recovery method and apparatus.
  • Data sets stored in databases, servers, and the like are growing ever larger.
  • A user may operate on a data set, and such operations may change its data; a data change may include deleting, adding, or replacing data in the data set.
  • Systems that hold large-scale data sets must be highly reliable. However, erroneous operations may occur while a data set is in use, producing data changes that should not have happened. To keep the data set reliable, a mechanism is needed that guarantees data changes can be undone within a certain period of time, for example so that a file deleted by mistake can be restored or data added by mistake can be removed.
  • A traditional way to implement data recovery is, for example, the Binlog mechanism in MySQL.
  • The Binlog mechanism periodically creates a mirror of the data set.
  • Binlog records on the basis of a timeline.
  • Starting from the point in time at which the mirror is created, the data changes made to the data set are recorded sequentially until the end of a recording period. For example, if the period is one hour, a mirror of the data set may be created at 19:00, and during that period (from 19:00 to 20:00) Binlog records the data changes to the data set in chronological order.
  • A new mirror of the data set is created at 20:00, and subsequent data changes are recorded in another, corresponding Binlog.
  • When data must be recovered, the corresponding mirror is determined from the time of the data change that needs to be undone.
  • The Binlog of that mirror is then used to replay the recorded data changes in chronological order, starting from the time the mirror was created, until the data has been restored to the specific point in time the user wants.
  • Because Binlog records on a time basis, data recovery can only replay changes in chronological order from the starting point of the corresponding mirror.
  • For example, if the user made an erroneous data change at 19:50 and needs to recover from it, then after the mirror and Binlog are loaded, replay must start from the creation point of the 19:00 mirror and proceed in chronological order; the recovery completes only when replay reaches 19:50. That is, the Binlog approach is inefficient when a large number of data recovery requirements must be met.
  • The present invention provides a data recovery method and apparatus that allow data recovery to be performed in reverse chronological order, thereby improving the efficiency of data recovery.
  • The first change log is the change log created for the first data change performed on the target data object.
  • The first change log includes the creation time of the first change log, the change type of the first data change, and data information on the part of the target data object's data changed by the first data change.
  • The method further includes:
  • The data rewrite request requests that the data of the target data object be rewritten from the data before the first data change was performed to the data after the second data change is performed.
  • The second change log is the change log created for the second data change performed on the target data object.
  • The second change log belongs to the to-be-recovered log set.
  • The method further includes:
  • The data information on the part of the target data object's data changed by the first data change includes a data part and a metadata part; the data part contains the changed data content, and the metadata part contains information describing the content of the data part.
  • The first change log further includes a life cycle; when the first change log has existed for longer than the life cycle, the first change log is deleted.
  • A data recovery device, comprising:
  • a log establishing unit configured to detect data changes performed on the target data object and to create a corresponding change log for each data change;
  • the first change log is the change log created for the first data change performed on the target data object;
  • the first change log includes the creation time of the first change log, the change type of the first data change, and data information on the part of the target data object's data changed by the first data change;
  • a first receiving unit configured to receive a data recovery request carrying the identifier of the first change log, where the data recovery request requests that the data of the target data object be restored to the data before the first data change was performed;
  • a first determining unit configured to obtain a to-be-recovered log set according to the receiving time of the data recovery request and the creation time of the first change log determined from the first change log identifier, where the to-be-recovered log set includes the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log;
  • a data recovery unit configured to perform data recovery on the target data object according to the change logs in the to-be-recovered log set, in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • The device further includes:
  • a second receiving unit configured to receive, after the data recovery unit has been triggered, a data rewrite request carrying the identifier of a second change log, where the data rewrite request requests that the data of the target data object be rewritten from the data before the first data change was performed to the data after the second data change is performed, the second change log is the change log created for the second data change performed on the target data object, and the second change log belongs to the to-be-recovered log set;
  • a second determining unit configured to obtain a to-be-rewritten log set according to the creation time of the second change log determined from the second change log identifier, where the to-be-rewritten log set includes the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log;
  • a data rewriting unit configured to perform data rewriting on the target data object according to the change logs in the to-be-rewritten log set, in ascending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change is performed.
  • The device further includes:
  • a detecting unit configured to detect, after the data recovery unit has been triggered, a third data change to the target data object, and to create a corresponding third change log for the third data change;
  • a setting unit configured to set to unavailable the state of those change logs of the target data object that were in the to-be-recovered log set, that is, those selected on the basis of the creation time of the first change log, where the change logs set to unavailable do not include the third change log.
  • The data information on the part of the target data object's data changed by the first data change includes a data part and a metadata part; the data part contains the changed data content, and the metadata part contains information describing the content of the data part.
  • The first change log further includes a life cycle; when the first change log has existed for longer than the life cycle, the first change log is deleted.
  • The change log includes the creation time of the change log, the change type of the corresponding data change, and data information on the part of the data object's data changed by that data change.
  • When a data recovery request is received, it can be determined from the request that data recovery is to be performed in descending order of change log creation time. The to-be-recovered log set can be determined from the receiving time of the data recovery request and the creation time of the first change log. Data recovery is then performed on the target data object, in the determined order, according to the change logs in the to-be-recovered log set, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • Recovering data in reverse chronological order can effectively satisfy a large number of data recovery requirements and significantly improve data recovery efficiency.
  • FIG. 1 is a flowchart of a data recovery method according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of a data rewriting method according to an embodiment of the present invention;
  • FIG. 3 is a structural diagram of a data recovery apparatus according to an embodiment of the present disclosure;
  • FIG. 4 is a structural diagram of a data rewriting apparatus according to an embodiment of the present invention.
  • Data sets stored in databases, servers, and the like are growing ever larger.
  • A user may operate on a data set, and such operations may change its data; a data change may include deleting, adding, or replacing data in the data set.
  • Systems that hold large-scale data sets must be highly reliable. However, erroneous operations may occur while a data set is in use, producing data changes that should not have happened.
  • A reliable mechanism is needed that guarantees data changes to the data set can be undone within a certain period of time, for example so that a file deleted by mistake can be restored or data added by mistake can be removed.
  • The Binlog mechanism periodically creates a mirror of the data set.
  • Binlog records on the basis of a timeline: starting from the point in time at which the mirror is created, the data changes made to the data set are recorded sequentially until the end of a recording period.
  • When data must be recovered, the corresponding mirror is determined from the time of the data change that needs to be undone.
  • The Binlog of that mirror is then used to replay the recorded data changes in chronological order, starting from the time the mirror was created, until the data has been restored to the specific point in time the user wants.
  • Because Binlog records on a time basis, data recovery can only replay changes in chronological order from the starting point of the corresponding mirror.
  • The downside of this replay mechanism is that recovering from a single erroneous operation may take a long time to complete. Moreover, the Binlog mechanism is driven by time rather than by data changes; that is, within a Binlog time period, content must still be recorded into the mirror whether or not any data change occurs.
  • This approach is workable for the comparatively small data sets of MySQL, but as data sets grow larger, creating periodic, continuous mirrors of a large-scale data set places an additional burden on the system.
  • The Binlog mechanism can neither improve the efficiency of data recovery effectively nor adapt to the large-scale data set scenarios that are common today.
  • To this end, an embodiment of the present invention provides a data recovery method and apparatus in which a corresponding change log is created for each data change of a data object. The change log includes the creation time of the change log, the change type of the corresponding data change, and data information on the part of the data object's data changed by that data change.
  • When a data recovery request is received, it can be determined from the request that data recovery is to be performed in descending order of change log creation time. The to-be-recovered log set can be determined from the receiving time of the data recovery request and the creation time of the first change log. Data recovery is then performed on the target data object, in the determined order, according to the change logs in the to-be-recovered log set, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • Recovering data in reverse chronological order can effectively satisfy a large number of data recovery requirements and significantly improve data recovery efficiency.
  • Further, after data recovery in reverse chronological order, data rewriting in forward chronological order may be performed in response to a data rewrite request. It is determined from the data rewrite request that data rewriting is to be performed in ascending order of change log creation time, and data rewriting is performed on the target data object according to the change logs in the to-be-rewritten log set, in that order, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change is performed.
  • Adding data rewriting on top of data recovery allows the scheme to be applied flexibly to different application scenarios.
  • The change log mechanism of the present invention applies to a data set containing a plurality of data objects, and is characterized in that a corresponding change log can be created for each data change of a data object in the data set.
  • A corresponding change log is created for each data change by detecting the data changes performed on the target data object.
  • The target data object may be any one of the plurality of data objects included in the data set.
  • A data object here may take a specific form such as a data table, a view, or a data resource.
  • A data change may include deleting, adding, overwriting, or rewriting data in the data object, and the like.
  • The data changes that can trigger the creation of a change log may include: writing data to a table or partition in overwrite mode (OVERWRITE), writing data to a table or partition in append mode (INSERT INTO/APPEND), deleting a table (DROP TABLE), deleting a partition (DROP PARTITION), renaming a table, renaming a partition (RENAME PARTITION), adding a column (ALTER TABLE ADD COLUMN), renaming a column (ALTER TABLE CHANGE COLUMN RENAME TO), creating a partition (ALTER TABLE ADD PARTITION), cross-cluster data replication (REPLICATION), and so on.
  • The first change log includes the creation time of the first change log, the change type of the first data change, and data information on the part of the target data object's data changed by the first data change.
  • The creation time of the first change log may coincide with the execution time of the first data change.
  • The change type of the first data change may be one of the specific data change types described above.
  • The data information on the part of the target data object's data changed by the first data change may be embodied as a data part and a metadata part, where the data part contains the changed data content and the metadata part contains information describing the content of the data part.
  • The metadata part includes, for example, the table structure (fields, partition keys, table comment, column comments, partition key comments, etc.), the current partition list, user permission information, statistics, the lifecycle attribute, timestamps, and the like.
  • The lifecycle attribute can be understood as the length of time for which the first change log must be retained. A data change older than a certain age is very unlikely to be undone, so keeping its change log wastes storage space. Therefore, when the first change log has existed for longer than the life cycle, it is deleted. This relieves the storage pressure of keeping change logs and saves system resources.
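  • The change log contents and the life cycle rule described above can be illustrated with a minimal Python sketch. The class name `ChangeLog`, its field names, the seven-day default retention, and the helper `purge_expired` are assumptions made for illustration; the text does not specify a concrete layout or retention period.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ChangeLog:
    """One record per data change (all field names are illustrative)."""
    log_id: str
    created_at: datetime          # creation time of the change log
    change_type: str              # e.g. "OVERWRITE", "DROP TABLE"
    data_part: bytes              # changed data content
    metadata_part: dict = field(default_factory=dict)  # describes data_part
    lifecycle: timedelta = timedelta(days=7)           # assumed retention period


def purge_expired(logs: list[ChangeLog], now: datetime) -> list[ChangeLog]:
    """Keep only the change logs that have not outlived their life cycle."""
    return [log for log in logs if now - log.created_at <= log.lifecycle]
```

  • In this sketch, a change log that has existed for longer than its `lifecycle` is simply dropped from the returned list; a real system would also have to release the storage behind `data_part`.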
  • The change logs may be, for example, as shown in Table 1:
  • The data object corresponding to the change log identified as logid1 is a data view, the creation time is 201510070200, the type of the data change is writing data in overwrite mode, and the state is undoable (Undoable).
  • The data object corresponding to the change log identified as logid2 is a data table, the creation time is 201510070300, the type of the data change is deleting the data table, and the state is undoable.
  • The data object corresponding to the change log identified as logid3 is a data resource, the creation time is 201510070400, the type of the data change is writing data in append mode, and the state is undoable.
  • Note that undoable (Undoable) and rewritable (Redoable) each denote a form of data processing.
  • Undoable means that a data change made earlier can be undone.
  • After the data change corresponding to a change log has been undone, the state of that change log can change from undoable to rewritable.
  • For example, suppose data change a deletes data table a.
  • When change log a corresponding to data change a is created, its state is undoable by default, meaning that the deleted data table a can be restored by data recovery.
  • Rewritable means that a data change that was previously undone can be re-applied.
  • After that recovery, the state of change log a changes correspondingly from undoable to rewritable, meaning that the data change to data table a can be re-applied, that is, data table a can be deleted again through data rewriting.
  • A target data object may have multiple corresponding change logs; that is, each time a data change is performed on the target data object, a corresponding change log is created.
  • FIG. 1 is a flowchart of a data recovery method according to an embodiment of the present invention, where the method includes:
  • S101: Receive a data recovery request carrying the identifier of the first change log, where the data recovery request requests that the data of the target data object be restored to the data before the first data change was performed.
  • In addition to requesting that the data of the target data object be restored to the data before the first data change, the data recovery request may also request that the data of other data objects be restored to the data before their corresponding data changes; details are not repeated here.
  • The receiving time of the data recovery request is later than the creation time of the first change log. That is, when the data recovery request is received, the first data change has already been performed on the target data object.
  • Between the time the first data change was performed on the target data object and the time the data recovery request is received, the target data object may have undergone one or more further data changes, or it may have undergone no other data change.
  • For example, the data recovery request may ask that the data of the target data object be restored to the data before data a was added one hour ago.
  • In that case, the first data change is the addition of data a to the target data object one hour ago.
  • S102: Obtain a to-be-recovered log set according to the receiving time of the data recovery request and the creation time of the first change log determined from the first change log identifier, where the to-be-recovered log set includes the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log.
  • S103: Perform data recovery on the target data object according to the change logs in the to-be-recovered log set, in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • During data recovery, change logs are selected from the to-be-recovered log set in a particular order, and the data changes they record are undone on the target data object.
  • Change logs are selected from the to-be-recovered log set in reverse chronological order, that is, in descending order of change log creation time.
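  • The selection rule of S102 and the ordering of S103 can be sketched as follows. This is a minimal illustration rather than the patented implementation: `LogEntry`, `logs_to_recover`, and the parameter names are hypothetical, and the sketch only computes which change logs are processed and in what order, not the recovery itself.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    log_id: str
    created_at: datetime


def logs_to_recover(logs, first_log_id, received_at):
    """Return the to-be-recovered set, newest change log first."""
    by_id = {log.log_id: log for log in logs}
    first_created = by_id[first_log_id].created_at
    # Keep logs created no earlier than the first change log and no later
    # than the moment the recovery request was received.
    selected = [log for log in logs
                if first_created <= log.created_at <= received_at]
    # Reverse chronological order: largest creation time first.
    return sorted(selected, key=lambda log: log.created_at, reverse=True)
```

  • Both bounds are inclusive, matching the "less than or equal to" and "greater than or equal to" wording of S102, so the first change log itself is always the last entry of the returned list.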
  • The specific type of a data request can be determined from the content it carries. For example, if the data request contains "undo tablename to changelogid", it can be identified as a data recovery request for "tablename".
  • In one case, the to-be-recovered log set includes only the first change log; that is, when the data recovery request is received, the most recently created change log is the first change log, or in other words the last data change performed on the target data object is the first data change.
  • In this case, the data change made to the target data object by the first data change is undone, and the data is restored to what it was before the first data change was performed.
  • In another case, the to-be-recovered log set includes at least one change log in addition to the first change log.
  • For example, suppose the first change log is change log 1.
  • The to-be-recovered log set also includes change log 2, change log 3, and change log 4.
  • In order of creation time, change log 4 was created after change log 3, change log 3 after change log 2, and change log 2 after change log 1. Then, when data recovery is performed on the target data object according to the data recovery request, the change logs are processed in descending order of creation time.
  • The data change recorded in change log 4 is undone on the target data object first, and at the same time the state of change log 4 can be changed from undoable to rewritable.
  • The data change recorded in change log 3 is undone next, and at the same time the state of change log 3 can be changed from undoable to rewritable.
  • The data change recorded in change log 2 is then undone, and at the same time the state of change log 2 can be changed from undoable to rewritable.
  • Finally, the data change recorded in change log 1 is undone, and at the same time the state of change log 1 can be changed from undoable to rewritable.
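  • The four-log example can be traced with a small runnable sketch. It makes simplifying assumptions that are not in the text: every data change is an append of rows to an in-memory list, each change log stores the appended rows as its data part, creation times are plain integers, and all four change logs already exist when the request is received. The names `AppendLog` and `undo_back_to` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppendLog:
    log_id: int
    created_at: int          # larger value = created later
    rows: list               # data part: the rows appended by this change
    state: str = "Undoable"


def undo_back_to(table: list, logs: list, first_log_id: int) -> list:
    """Undo appends newest-first, finishing with the first change log."""
    first = next(log for log in logs if log.log_id == first_log_id)
    pending = [log for log in logs if log.created_at >= first.created_at]
    order = []
    for log in sorted(pending, key=lambda entry: entry.created_at, reverse=True):
        del table[len(table) - len(log.rows):]   # remove the appended rows
        log.state = "Redoable"                   # undone changes become rewritable
        order.append(log.log_id)
    return order


# Build a table with one pre-existing row and four recorded appends.
table = ["r0"]
logs = []
for log_id, rows in enumerate([["r1"], ["r2", "r3"], ["r4"], ["r5"]], start=1):
    table.extend(rows)
    logs.append(AppendLog(log_id, created_at=log_id, rows=rows))

order = undo_back_to(table, logs, first_log_id=1)
```

  • After the call, `order` lists the change logs in the sequence 4, 3, 2, 1, the table holds only the row that existed before change log 1, and every processed change log is marked "Redoable".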
  • It can be seen that the change log includes the creation time of the change log, the change type of the corresponding data change, and data information on the part of the data object's data changed by that data change.
  • When a data recovery request is received, it can be determined from the request that data recovery is to be performed in descending order of change log creation time. The to-be-recovered log set can be determined from the receiving time of the data recovery request and the creation time of the first change log. Data recovery is then performed on the target data object, in the determined order, according to the change logs in the to-be-recovered log set, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • Recovering data in reverse chronological order can effectively satisfy a large number of data recovery requirements and significantly improve data recovery efficiency.
  • After data recovery has been performed on the target data object, there may be a need to re-apply data changes that were undone. For example, when data change a, which should not have been undone, was undone on the target data object by mistake, data change a must be re-applied to the target data object by data rewriting, that is, the data change must be re-executed on the target data object.
  • During data rewriting, change logs are likewise selected from the to-be-rewritten log set in a particular order, and the data changes corresponding to the selected change logs are re-applied to the target data object.
  • Change logs are selected from the to-be-rewritten log set in forward chronological order, that is, in ascending order of change log creation time.
  • The specific type of a data request can be determined from the content it carries. For example, if the data request contains "redo tablename to changelogid", it can be identified as a data rewrite request for "tablename".
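  • One way to tell the two request types apart is sketched below. The text gives only the example strings "undo tablename to changelogid" and "redo tablename to changelogid", so the exact grammar (whitespace-separated tokens, case-insensitive verb) and the function name `classify_request` are assumptions.

```python
import re

# Hypothetical grammar inferred from the two example request strings.
_REQUEST = re.compile(r"^\s*(undo|redo)\s+(\S+)\s+to\s+(\S+)\s*$", re.IGNORECASE)


def classify_request(text: str):
    """Return (kind, table, change_log_id), or None if the text does not match."""
    match = _REQUEST.match(text)
    if match is None:
        return None
    verb, table, log_id = match.groups()
    # "undo" maps to a data recovery request, "redo" to a data rewrite request.
    kind = "recovery" if verb.lower() == "undo" else "rewrite"
    return kind, table, log_id
```

  • For instance, `classify_request("undo orders to logid2")` yields a recovery request for the hypothetical table `orders`, while any string that does not follow the assumed pattern yields `None`.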
  • FIG. 2 is a flowchart of a data rewriting method according to an embodiment of the present invention, where the method includes:
  • S201: Receive a data recovery request carrying the identifier of the first change log, where the data recovery request requests that the data of the target data object be restored to the data before the first data change was performed.
  • S202: Obtain a to-be-recovered log set according to the receiving time of the data recovery request and the creation time of the first change log determined from the first change log identifier, where the to-be-recovered log set includes the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log.
  • S203: Perform data recovery on the target data object according to the change logs in the to-be-recovered log set, in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • S204: Receive a data rewrite request carrying the identifier of a second change log, where the data rewrite request requests that the data of the target data object be rewritten from the data before the first data change was performed to the data after the second data change is performed.
  • The second change log is the change log created for the second data change performed on the target data object, and the second change log belongs to the to-be-recovered log set.
  • When the to-be-recovered log set includes only the first change log, the second change log in the embodiment corresponding to FIG. 2 can be understood to be the first change log.
  • When the to-be-recovered log set includes a plurality of change logs, the second change log in the embodiment corresponding to FIG. 2 can be understood to be either the first change log or another change log in the to-be-recovered log set.
  • When the second change log is a change log in the to-be-recovered log set other than the first change log, the creation time of the second change log is later than that of the first change log. That is, the second data change was performed on the target data object after the first data change.
  • The second change log belonged to the to-be-recovered log set when data was recovered in S201 to S203; that is, the data change made to the target data object by the second data change has already been undone by the data recovery in S201 to S203.
  • The second data change therefore needs to be re-applied to the target data object after the data recovery of S201 to S203.
  • S205: Obtain a to-be-rewritten log set according to the creation time of the second change log determined from the second change log identifier, where the to-be-rewritten log set includes the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log.
  • When the second change log is the first change log, the to-be-rewritten log set includes only the second change log (that is, only the first change log). The second data change corresponding to the second change log is then re-executed directly on the target data object, so that the data of the target data object is restored to the data after the second data change is performed. This accomplishes the data rewriting of the second data change on the target data object.
  • When the second change log is not the first change log, the to-be-rewritten log set includes at least the first change log and the second change log; that is, it includes the first change log, the second change log, and any change logs whose creation time lies between those of the first and second change logs. The change logs in the to-be-rewritten log set are then retrieved one by one in forward chronological order, and the data changes undone in S201 to S203 are re-applied to the target data object. In this process, the last data change to be rewritten is the second data change corresponding to the second change log.
  • It can be seen that, according to the data rewrite request, data rewriting is determined to be performed in ascending order of change log creation time, and data rewriting is performed on the target data object according to the change logs in the to-be-rewritten log set, in that order, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change is performed.
  • Adding data rewriting on top of data recovery allows the scheme to be applied flexibly to different application scenarios.
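  • The to-be-rewritten log set of S205 and its forward ordering can be sketched as below. For brevity the change logs are represented as a hypothetical mapping from change log identifier to creation time, with creation times given as plain integers; the function name `logs_to_rewrite` is likewise an assumption.

```python
def logs_to_rewrite(logs, first_log_id, second_log_id):
    """Return ids of the to-be-rewritten set, oldest change log first.

    `logs` maps a change log id to its creation time (any comparable value).
    """
    first_created = logs[first_log_id]
    second_created = logs[second_log_id]
    if second_created < first_created:
        raise ValueError("second change log must not predate the first")
    # Inclusive on both ends, as in S205.
    selected = [log_id for log_id, created in logs.items()
                if first_created <= created <= second_created]
    # Forward chronological order: smallest creation time first.
    return sorted(selected, key=logs.__getitem__)
```

  • When the first and second identifiers are the same, the result contains that single change log, which corresponds to the case where only one data change is re-applied; otherwise the second change log is always the last entry.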
  • Note that after data recovery the user may make a new data change to the target data object; in that case the status of the change logs that were retrieved during data recovery needs to be updated. Optionally, on the basis of the embodiment corresponding to FIG. 1 or the embodiment corresponding to FIG. 2, after the data of the target data object is restored to the data before the first data change was performed, the method further includes:
  • detecting a third data change to the target data object, and creating a corresponding third change log for the third data change;
  • setting to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, where the change logs set to unavailable do not include the third change log.
  • For example, after the data of the target data object has been restored to the data before the first data change, if another data change is made to the target data object, the data changes corresponding to the change logs in the to-be-recovered log set of the earlier data recovery may no longer be rewritable because of problems such as overwriting. The status of the change logs in that to-be-recovered log set is therefore set to Unavailable.
  • A change log whose status is set to Unavailable is no longer used in data recovery or data rewriting. That is, when data recovery or data rewriting is performed after the third change log is created, the corresponding to-be-recovered log set and to-be-rewritten log set do not include change logs whose status is Unavailable.
  • FIG. 3 is a structural diagram of a data recovery apparatus according to an embodiment of the present invention, where the apparatus includes:
  • the log creation unit 300, configured to detect data changes performed on a target data object and create one corresponding change log for each data change; a first change log is the change log created for a first data change performed on the target data object, and the first change log includes the creation time of the first change log, the change type of the first data change, and data information of the part of the target data object's data changed by the first data change.
  • the first receiving unit 301, configured to receive a data recovery request carrying the identifier of the first change log, where the data recovery request includes a request to restore the data of the target data object to the data before the first data change was performed.
  • Optionally, the data information of the part of the target data object's data changed by the first data change includes a data part and a metadata part, where the data part includes the changed data content and the metadata part includes information describing the content of the data part.
  • Optionally, the first change log further includes a lifecycle, and the first change log is deleted when the time for which it has existed exceeds the lifecycle.
  • the first determining unit 302, configured to obtain a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, where the to-be-recovered log set includes the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log.
  • the data recovery unit 303, configured to perform data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • As can be seen, a corresponding change log is created for each data change to a data object. The change log includes its creation time, the change type of the corresponding data change, and data information of the part of the data object's data changed by the data change.
  • When a data recovery request is received, the order of data recovery can be determined from the request to be descending order of change log creation time, and the to-be-recovered log set can be determined from the receiving time of the request and the creation time of the first change log. Following that order, data recovery is performed on the target data object according to the change logs in the to-be-recovered log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
  • Recovering data in reverse chronological order effectively serves a large share of data recovery needs and markedly improves data recovery efficiency.
  • On the basis of the embodiment corresponding to FIG. 3, FIG. 4 is a structural diagram of a data rewriting apparatus according to an embodiment of the present invention, where the apparatus further includes:
  • the second receiving unit 401, configured to receive, after the data recovery unit is triggered, a data rewriting request carrying the identifier of a second change log, where the data rewriting request includes a request to rewrite the data of the target data object from the data before the first data change was performed to the data after a second data change is performed, the second change log is the change log for the second data change performed on the target data object, and the second change log belongs to the to-be-recovered log set.
  • the second determining unit 402, configured to obtain a to-be-rewritten log set according to the creation time of the second change log as determined from the identifier of the second change log, where the to-be-rewritten log set includes the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log.
  • the data rewriting unit 403, configured to perform data rewriting on the target data object according to the change logs in the to-be-rewritten log set, one by one in ascending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.
  • As can be seen, the data rewriting request determines that data rewriting proceeds in ascending order of change log creation time: data rewriting is performed on the target data object according to the change logs in the to-be-rewritten log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.
  • Adding data rewriting on top of data recovery allows the apparatus to be applied flexibly to different application scenarios.
  • Optionally, on the basis of the embodiment corresponding to FIG. 3 or the embodiment corresponding to FIG. 4, the apparatus further includes:
  • a detecting unit, configured to detect, after the data recovery unit is triggered, a third data change to the target data object, and to create a corresponding third change log for the third data change.
  • a setting unit, configured to set to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, where the change logs set to unavailable do not include the third change log.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

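A data recovery method and apparatus. The method includes: detecting data changes performed on a target data object, and creating one corresponding change log for each data change; receiving a data recovery request carrying the identifier of the first change log (S101); obtaining a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log (S102); and performing data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed (S103). Recovering data in reverse chronological order effectively serves a large share of data recovery needs and markedly improves data recovery efficiency.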

Description

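A data recovery method and apparatus
This application claims priority to Chinese patent application No. 201510684597.0, filed on October 20, 2015 and entitled "Data recovery method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of data processing, and in particular to a data recovery method and apparatus.
Background
With the development of network technology, the data sets stored in databases, servers and the like keep growing. Users can operate on a data set, and such operations may change its data. A data change may include deleting, adding or replacing data in the data set, or any other operation that alters the data.
Systems place high reliability requirements on large data sets. However, when people operate on a data set, mistakes may cause data changes that should not have occurred. To keep the data set reliable, a dependable mechanism is needed which ensures that the data changes made within a certain period can be undone, for example by restoring a file deleted by mistake or by deleting data added by mistake.
A conventional way to implement data recovery is, for example, the Binlog mechanism in MySQL. The Binlog mechanism periodically creates an image of the data set and, using the timeline as the basis for recording, sequentially records the operations that change the data set, from the time the image is created until the recording period ends. For example, with a period of one hour, an image of the data set may be created at 19:00 sharp, and during that period (from 19:00 to 20:00) the Binlog records the data changes to the data set in chronological order. In the next period (20:00 to 21:00), a new image of the data set is created at 20:00 sharp and another corresponding Binlog is used to record data changes. When data recovery is needed, the corresponding image is identified from the time of the data change to be undone, and the data changes recorded in that image's Binlog are replayed in chronological order, starting from the time the image was created, up to the specific point in time to which the user wants the data restored.
A large share of data recovery requests concern the data change just completed or the most recent few data changes. Because Binlog records along a timeline, recovery can only replay in chronological order from the starting point of the corresponding image. In the example above, if the user makes an erroneous data change at 19:50 and needs data recovery, then after the image and the Binlog are loaded, replay must start at 19:00, the point at which the image was created, and proceed in chronological order up to just before 19:50 before the recovery is complete. In other words, the Binlog approach is inefficient for a large share of data recovery requests.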
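Summary of the Invention
To solve the above technical problem, the present invention provides a data recovery method and apparatus that allow data recovery to proceed in reverse chronological order, thereby improving data recovery efficiency.
The embodiments of the present invention disclose the following technical solutions:
A data recovery method, in which data changes performed on a target data object are detected and one corresponding change log is created for each data change; a first change log is the change log created for a first data change performed on the target data object, and the first change log includes the creation time of the first change log, the change type of the first data change, and data information of the part of the target data object's data changed by the first data change; the method includes:
receiving a data recovery request carrying the identifier of the first change log, the data recovery request including a request to restore the data of the target data object to the data before the first data change was performed;
obtaining a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, the to-be-recovered log set including the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log;
performing data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
Optionally, after the data of the target data object is restored to the data before the first data change was performed, the method further includes:
receiving a data rewriting request carrying the identifier of a second change log, the data rewriting request including a request to rewrite the data of the target data object from the data before the first data change was performed to the data after a second data change is performed, the second change log being the change log for the second data change performed on the target data object, and the second change log belonging to the to-be-recovered log set;
obtaining a to-be-rewritten log set according to the creation time of the second change log as determined from the identifier of the second change log, the to-be-rewritten log set including the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log;
performing data rewriting on the target data object according to the change logs in the to-be-rewritten log set, one by one in ascending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.
Optionally, after the data of the target data object is restored to the data before the first data change was performed, the method further includes:
detecting a third data change to the target data object, and creating a corresponding third change log for the third data change;
setting to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, the change logs set to unavailable not including the third change log.
Optionally, the data information of the part of the target data object's data changed by the first data change includes a data part and a metadata part, the data part including the changed data content and the metadata part including information describing the content of the data part.
Optionally, the first change log further includes a lifecycle, and the first change log is deleted when the time for which it has existed exceeds the lifecycle.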
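A data recovery apparatus, the apparatus including:
a log creation unit, configured to detect data changes performed on a target data object and create one corresponding change log for each data change; a first change log is the change log created for a first data change performed on the target data object, and the first change log includes the creation time of the first change log, the change type of the first data change, and data information of the part of the target data object's data changed by the first data change;
a first receiving unit, configured to receive a data recovery request carrying the identifier of the first change log, the data recovery request including a request to restore the data of the target data object to the data before the first data change was performed;
a first determining unit, configured to obtain a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, the to-be-recovered log set including the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log;
a data recovery unit, configured to perform data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
Optionally, the apparatus further includes:
a second receiving unit, configured to receive, after the data recovery unit is triggered, a data rewriting request carrying the identifier of a second change log, the data rewriting request including a request to rewrite the data of the target data object from the data before the first data change was performed to the data after a second data change is performed, the second change log being the change log for the second data change performed on the target data object, and the second change log belonging to the to-be-recovered log set;
a second determining unit, configured to obtain a to-be-rewritten log set according to the creation time of the second change log as determined from the identifier of the second change log, the to-be-rewritten log set including the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log;
a data rewriting unit, configured to perform data rewriting on the target data object according to the change logs in the to-be-rewritten log set, one by one in ascending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.
Optionally, the apparatus further includes:
a detecting unit, configured to detect, after the data recovery unit is triggered, a third data change to the target data object, and to create a corresponding third change log for the third data change;
a setting unit, configured to set to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, the change logs set to unavailable not including the third change log.
Optionally, the data information of the part of the target data object's data changed by the first data change includes a data part and a metadata part, the data part including the changed data content and the metadata part including information describing the content of the data part.
Optionally, the first change log further includes a lifecycle, and the first change log is deleted when the time for which it has existed exceeds the lifecycle.
As can be seen from the above technical solutions, a corresponding change log is created for each data change to a data object; the change log includes its creation time, the change type of the corresponding data change, and data information of the part of the data object's data changed by the data change. When a data recovery request is received, the order of data recovery can be determined from the request to be descending order of change log creation time, and the to-be-recovered log set can be determined from the receiving time of the request and the creation time of the first change log. Following that order, data recovery is performed on the target data object according to the change logs in the to-be-recovered log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed. Recovering data in reverse chronological order effectively serves a large share of data recovery needs and markedly improves data recovery efficiency.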
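Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a data recovery method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data rewriting method according to an embodiment of the present invention;
FIG. 3 is a structural diagram of a data recovery apparatus according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a data rewriting apparatus according to an embodiment of the present invention.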
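Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
With the development of network technology, the data sets stored in databases, servers and the like keep growing. Users can operate on a data set, and such operations may change its data. A data change may include deleting, adding or replacing data in the data set, or any other operation that alters the data. Systems place high reliability requirements on large data sets. However, when people operate on a data set, mistakes may cause data changes that should not have occurred. To keep the data set reliable, a dependable mechanism is needed which ensures that the data changes made within a certain period can be undone, for example by restoring a file deleted by mistake or by deleting data added by mistake.
A conventional way to implement data recovery is the Binlog mechanism in MySQL. The Binlog mechanism periodically creates an image of the data set and, using the timeline as the basis for recording, sequentially records the operations that change the data set, from the time the image is created until the recording period ends. When data recovery is needed, the corresponding image is identified from the time of the data change to be undone, and the data changes recorded in that image's Binlog are replayed in chronological order, starting from the time the image was created, up to the specific point in time to which the user wants the data restored.
A large share of data recovery requests concern the data change just completed or the most recent few data changes. Because Binlog records along a timeline, recovery can only replay in chronological order from the starting point of the corresponding image. The drawback of this replay mechanism is that recovering from a mistake that has only just occurred may require a long replay. Moreover, the Binlog mechanism is driven by time rather than by data changes: within a Binlog period, content still has to be recorded to the image whether or not any data change occurs. This approach is still workable for MySQL deployments with small data sets, but as data sets grow, periodically creating images of a large data set places an extra burden on the system. The Binlog mechanism therefore neither improves data recovery efficiency effectively nor suits the large-data-set scenarios that are now common.
For this reason, the embodiments of the present invention provide a data recovery method and apparatus. A corresponding change log is created for each data change to a data object; the change log includes its creation time, the change type of the corresponding data change, and data information of the part of the data object's data changed by the data change. When a data recovery request is received, the order of data recovery can be determined from the request to be descending order of change log creation time, and the to-be-recovered log set can be determined from the receiving time of the request and the creation time of the first change log. Following that order, data recovery is performed on the target data object according to the change logs in the to-be-recovered log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed. Recovering data in reverse chronological order effectively serves a large share of data recovery needs and markedly improves data recovery efficiency.
Further, in the embodiments of the present invention, after data recovery in reverse chronological order, data rewriting in forward chronological order can be performed in response to a data rewriting request. The data rewriting request determines that data rewriting proceeds in ascending order of change log creation time: data rewriting is performed on the target data object according to the change logs in the to-be-rewritten log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed. Adding data rewriting on top of data recovery allows the solution to be applied flexibly to different application scenarios.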
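Embodiment 1
Before the data recovery method of the embodiments is introduced, the change log mechanism provided by the embodiments needs to be explained.
The change log mechanism of the present invention applies to a data set containing multiple data objects. Its distinguishing feature is that a corresponding change log is created for every data change to a data object in the data set: data changes performed on the target data object are detected, and one corresponding change log is created for each data change. The target data object may be any one of the data objects in the data set. A data object here can be understood as taking a concrete form such as a table, a view or a resource.
A data change may include deleting, adding, overwriting or rewriting data in a data object. Specifically, the data changes that can trigger creation of a change log may include: writing data to a table or partition by overwriting (OVERWRITE), writing data to a table or partition incrementally (INSERT INTO/APPEND), dropping a non-partitioned table (DROP TABLE), dropping a specified partition of a partitioned table (DROP PARTITION), renaming a table (RENAME TABLE), renaming a partition (RENAME PARTITION), adding a column (ALTER TABLE ADD COLUMN), renaming a column (ALTER TABLE CHANGE COLUMN RENAME TO), creating a partition (ALTER TABLE ADD PARTITION), cross-cluster data replication (REPLICATION), and so on.
The first change log includes the creation time of the first change log, the change type of the first data change, and data information of the part of the target data object's data changed by the first data change. The creation time of the first change log may coincide with the time at which the first data change was performed. The change type of the first data change may be one of the specific change types listed above. Optionally, the data information may be represented by a data part and a metadata part, where the data part includes the changed data content and the metadata part includes information describing the content of the data part. Specifically, the metadata part includes the table structure (fields, partition keys, table comments, column comments, partition key comments, and so on) as well as, for example, the current partition list, user permission information, statistics items, the lifecycle attribute and timestamps. The lifecycle attribute can be understood as the length of time for which the first change log needs to be retained: a data change made more than a certain period ago is very unlikely to need recovery, so there is no point in spending storage space on it. Therefore, when the time for which the first change log has existed exceeds the lifecycle, the first change log is deleted, which relieves the storage pressure of keeping change logs and saves system resources. Change logs may look, for example, like Table 1: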
Table 1
logid   type      time          operation   status
Logid1  View      201510070200  Overwrite   Undoable
Logid2  Table     201510070300  Drop table  Undoable
Logid3  Resource  201510070400  Append      Undoable
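Here, the change log identified as logid1 corresponds to a data object that is a view; its creation time is 201510070200, the type of data change is writing data by overwriting, and its status is Undoable. The change log identified as logid2 corresponds to a data object that is a table; its creation time is 201510070300, the type of data change is dropping the table, and its status is Undoable. The change log identified as logid3 corresponds to a data object that is a resource; its creation time is 201510070400, the type of data change is writing data by appending, and its status is Undoable. Note that undoable and redoable are both forms of data processing. Undoable means that a previously made data change can be undone. In general, when a change log is created its status may default to Undoable, and after the data change corresponding to the change log has been undone the status may change from Undoable to Redoable. For example, if data change a deleted table a, then when change log a is created for data change a its status may default to Undoable, meaning that data recovery can undelete table a and bring it back. Redoable means that a previously undone data change can be carried out again. For example, after data recovery has restored the deleted table a, the status of change log a changes from Undoable to Redoable accordingly, meaning that the data change of deleting table a can be carried out again; in other words, data rewriting can re-execute data change a and delete the restored table a once more.
The target data object may have multiple corresponding change logs; in other words, each data change made to the target data object results in one corresponding change log.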
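The change log record described above can be sketched as a small data structure. This is only an illustrative sketch, not the implementation disclosed in the application: the class and field names (`ChangeLog`, `lifecycle`, `purge_expired`) are hypothetical, and the expiry check assumes creation times lie on a linear scale such as epoch seconds. The calendar-style values from Table 1 are used here only for ordering.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Status(Enum):
    UNDOABLE = "Undoable"        # default status of a newly created change log
    REDOABLE = "Redoable"        # set once the change has been undone
    UNAVAILABLE = "Unavailable"  # no longer usable for recovery or rewriting


@dataclass
class ChangeLog:
    log_id: str
    object_type: str             # "Table", "View" or "Resource"
    created_at: int              # creation time; only its ordering matters here
    operation: str               # change type, e.g. "Overwrite", "Drop table"
    data: Any = None             # data part: the changed content
    metadata: Dict[str, Any] = field(default_factory=dict)  # metadata part
    lifecycle: int = 0           # hypothetical retention span; 0 = keep forever
    status: Status = Status.UNDOABLE

    def expired(self, now: int) -> bool:
        """True once the log has existed for longer than its lifecycle."""
        return self.lifecycle > 0 and now - self.created_at > self.lifecycle


def purge_expired(logs: List[ChangeLog], now: int) -> List[ChangeLog]:
    """Return only the change logs whose lifecycle has not run out."""
    return [log for log in logs if not log.expired(now)]


# The three rows of Table 1.
table_1 = [
    ChangeLog("Logid1", "View", 201510070200, "Overwrite"),
    ChangeLog("Logid2", "Table", 201510070300, "Drop table"),
    ChangeLog("Logid3", "Resource", 201510070400, "Append"),
]
```

Keeping the status on the record itself lets later steps filter on it without any separate bookkeeping.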
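The data recovery method provided by the embodiments of the present invention is described next. FIG. 1 is a flowchart of a data recovery method according to an embodiment of the present invention; the method includes:
S101: Receive a data recovery request carrying the identifier of the first change log, where the data recovery request includes a request to restore the data of the target data object to the data before the first data change was performed.
For example, a user operating on a data set may need to undo data changes previously made to it. Besides the request to restore the data of the target data object to the data before the first data change, the data recovery request may also include requests to restore the data of other data objects to the data before the corresponding data changes, which is not detailed here. The receiving time of the data recovery request is later than the creation time of the first change log; that is, when the data recovery request is received, the first data change has already been performed on the target data object.
Note that between the first data change and the receipt of the data recovery request, one or more further data changes may have been performed on the target data object, or no other data change may have been performed on it in that interval.
For example, if the first data change added data a to the target data object one hour ago, the data recovery request may ask to restore the data of the target data object to the data before data a was added one hour ago.
S102: Obtain a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, where the to-be-recovered log set includes the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log.
S103: In descending order of change log creation time, perform data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.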
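When data recovery is performed on the target data object, change logs are selected from the to-be-recovered log set in a certain order so as to undo, on the target data object, the data changed by the data change corresponding to each selected change log. In the embodiments of the present invention, when the data recovery request is received, change logs are selected from the to-be-recovered log set in reverse chronological order, that is, in descending order of creation time. The specific type of a data request can be determined from the content it carries. For example, if a data request contains "undo tablename to changelogid", it can be identified as a data recovery request for that "tablename".
Two situations may arise for the change logs contained in the to-be-recovered log set.
In the first case, the to-be-recovered log set includes only the first change log; that is, when the data recovery request is received, the most recently created change log is the first change log, or in other words the most recent data change performed on the target data object is the first data change. In this case, the data change made to the target data object by the first data change is undone directly according to the first change log, restoring the data to what it was before the first data change was performed.
In the second case, the to-be-recovered log set includes at least one change log besides the first change log. For example, suppose the first change log is change log 1 and the to-be-recovered log set also includes change log 2, change log 3 and change log 4, where change log 1 was created before change log 2, change log 2 before change log 3, and change log 3 before change log 4. When data recovery is performed on the target data object according to the data recovery request, it proceeds in descending order of change log creation time. The data change made to the target data object by change log 4 is undone first, and the status of change log 4 may be changed from Undoable to Redoable. The data change made by change log 3 is undone next, and the status of change log 3 may be changed from Undoable to Redoable. The data change made by change log 2 is then undone, and the status of change log 2 may be changed from Undoable to Redoable. Finally, the data change made by change log 1 is undone, and the status of change log 1 may be changed from Undoable to Redoable.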
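Steps S102 and S103 can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the target data object is modelled as a plain dictionary, each change log carries a hypothetical `undo` callable that reverts its own change, and the names `to_be_recovered` and `recover` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ChangeLog:
    log_id: str
    created_at: int
    undo: Callable[[Dict], None]   # reverts this one change on the target
    status: str = "Undoable"


def to_be_recovered(logs: List[ChangeLog], first_log_id: str,
                    received_at: int) -> List[ChangeLog]:
    """Logs with first.created_at <= created_at <= received_at, newest first."""
    first = next(log for log in logs if log.log_id == first_log_id)
    selected = [log for log in logs
                if first.created_at <= log.created_at <= received_at]
    return sorted(selected, key=lambda log: log.created_at, reverse=True)


def recover(target: Dict, logs: List[ChangeLog], first_log_id: str,
            received_at: int) -> Dict:
    """Undo changes newest-first, ending with the first change log."""
    for log in to_be_recovered(logs, first_log_id, received_at):
        log.undo(target)
        log.status = "Redoable"    # the undone change can now be rewritten
    return target


# Three successive writes to key "a": 1 -> 5 -> 9.
target = {"a": 9}
logs = [
    ChangeLog("log1", 1, lambda t: t.pop("a")),
    ChangeLog("log2", 2, lambda t: t.update(a=1)),
    ChangeLog("log3", 3, lambda t: t.update(a=5)),
]
recover(target, logs, "log2", received_at=4)
print(target)                        # {'a': 1}
print([log.status for log in logs])  # ['Undoable', 'Redoable', 'Redoable']
```

The example is chosen so that the order matters: undoing log2 before log3 would leave `a` at 5 instead of 1. Only the two newest changes are touched, which is the efficiency argument made against replaying forward from an image.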
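As can be seen, a corresponding change log is created for each data change to a data object; the change log includes its creation time, the change type of the corresponding data change, and data information of the part of the data object's data changed by the data change. When a data recovery request is received, the order of data recovery can be determined from the request to be descending order of change log creation time, and the to-be-recovered log set can be determined from the receiving time of the request and the creation time of the first change log. Following that order, data recovery is performed on the target data object according to the change logs in the to-be-recovered log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed. Recovering data in reverse chronological order effectively serves a large share of data recovery needs and markedly improves data recovery efficiency.
After data recovery has been performed on the target data object, there may be a need to carry out an undone data change again. For example, if data change a, which should not have been undone, was undone on the target data object by mistake, data rewriting is needed to rewrite data change a on the target data object, in other words to execute data change a on the target data object once more.
If data rewriting is needed on the target data object after data recovery, change logs are likewise selected from the to-be-rewritten log set in a certain order, and the data change corresponding to each selected change log is carried out again on the target data object. When the data rewriting request is received, change logs are selected from the to-be-rewritten log set in forward chronological order, that is, in ascending order of creation time.
In the embodiments of the present invention, the specific type of a data request can be determined from the content it carries. For example, if a data request contains "redo tablename to changelogid", it can be identified as a data rewriting request for that "tablename".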
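Telling the two request types apart by their content could look like the sketch below. The source gives only the two example strings ("undo tablename to changelogid" and "redo tablename to changelogid"); the exact grammar, the case-insensitive matching and the names `Request` and `parse_request` are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Request(NamedTuple):
    kind: str         # "undo" = data recovery, "redo" = data rewriting
    object_name: str  # the target data object, e.g. a table name
    log_id: str       # identifier of the first or second change log


# Assumed grammar: "<undo|redo> <object name> to <change log id>".
_PATTERN = re.compile(r"^\s*(undo|redo)\s+(\S+)\s+to\s+(\S+)\s*$", re.IGNORECASE)


def parse_request(text: str) -> Request:
    """Classify a request string and extract its object name and log id."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised request: {text!r}")
    kind, object_name, log_id = match.groups()
    return Request(kind.lower(), object_name, log_id)
```

For instance, `parse_request("undo orders to Logid2")` yields a recovery request for the hypothetical object `orders`, and anything that does not match the assumed grammar raises `ValueError`.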
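Optionally, on the basis of the embodiment corresponding to FIG. 1, FIG. 2 is a flowchart of a data rewriting method according to an embodiment of the present invention; the method includes:
S201: Receive a data recovery request carrying the identifier of the first change log, where the data recovery request includes a request to restore the data of the target data object to the data before the first data change was performed.
S202: Obtain a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, where the to-be-recovered log set includes the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log.
S203: In descending order of change log creation time, perform data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
S204: Receive a data rewriting request carrying the identifier of a second change log, where the data rewriting request includes a request to rewrite the data of the target data object from the data before the first data change was performed to the data after a second data change is performed, the second change log is the change log for the second data change performed on the target data object, and the second change log belongs to the to-be-recovered log set.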
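For example, as mentioned above, two situations may arise for the change logs contained in the to-be-recovered log set. If the to-be-recovered log set includes only the first change log, then in the embodiment corresponding to FIG. 2 the second change log can be understood to be the first change log itself. If the to-be-recovered log set includes multiple change logs, the second change log can be understood to be either the first change log or another change log in the set. When the second change log is a change log in the to-be-recovered log set other than the first change log, the creation time of the second change log is later than that of the first change log; that is, the first data change was performed on the target data object before the second data change.
Because the second change log belongs to the to-be-recovered log set used for the data recovery in S201 to S203, the data change made to the target data object by the second data change has already been undone by that data recovery. Depending on the circumstances, however, the second data change may need to be carried out again on the target data object as it stands after the data recovery of S201 to S203.
S205: Obtain a to-be-rewritten log set according to the creation time of the second change log as determined from the identifier of the second change log, where the to-be-rewritten log set includes the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log.
S206: In ascending order of change log creation time, perform data rewriting on the target data object according to the change logs in the to-be-rewritten log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.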
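For example, if the second change log and the first change log are the same change log, the to-be-rewritten log set includes only the second change log (that is, only the first change log). The second data change corresponding to the second change log is then simply re-executed once on the target data object, which restores the data of the target data object to the data after the second data change and thereby rewrites the second data change on the target data object.
If the second change log and the first change log are different change logs, the to-be-rewritten log set includes at least the first change log and the second change log, together with any change logs whose creation time lies between the creation times of the first and second change logs. The change logs in the to-be-rewritten log set are then retrieved one by one in forward chronological order, and the data changes that were undone by the data recovery in S201 to S203 are re-applied to the target data object. The last data change to be rewritten is the second data change corresponding to the second change log.
As can be seen, the data rewriting request determines that data rewriting proceeds in ascending order of change log creation time: data rewriting is performed on the target data object according to the change logs in the to-be-rewritten log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed. Adding data rewriting on top of data recovery allows the method to be applied flexibly to different application scenarios.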
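Steps S205 and S206 can be sketched in the same style. Again this is an illustrative sketch rather than the disclosed implementation: the `redo` callable, the dictionary target and the function names are hypothetical, and flipping the status back to "Undoable" after a rewrite is an assumption, since the source does not state what the status becomes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ChangeLog:
    log_id: str
    created_at: int
    redo: Callable[[Dict], None]   # re-applies this one change to the target
    status: str = "Redoable"       # i.e. already undone by a prior recovery


def to_be_rewritten(logs: List[ChangeLog], first_log_id: str,
                    second_log_id: str) -> List[ChangeLog]:
    """Logs with first.created_at <= created_at <= second.created_at, oldest first."""
    by_id = {log.log_id: log for log in logs}
    lower = by_id[first_log_id].created_at
    upper = by_id[second_log_id].created_at
    if upper < lower:
        raise ValueError("second change log is not in the recovered range")
    selected = [log for log in logs if lower <= log.created_at <= upper]
    return sorted(selected, key=lambda log: log.created_at)


def rewrite(target: Dict, logs: List[ChangeLog], first_log_id: str,
            second_log_id: str) -> Dict:
    """Re-apply changes oldest-first, ending with the second change log."""
    for log in to_be_rewritten(logs, first_log_id, second_log_id):
        log.redo(target)
        log.status = "Undoable"    # assumption: the source does not say this
    return target


# State after log2 and log3 were undone: "a" is back to 1.
target = {"a": 1}
logs = [
    ChangeLog("log2", 2, lambda t: t.update(a=5)),
    ChangeLog("log3", 3, lambda t: t.update(a=9)),
]
rewrite(target, logs, "log2", "log2")   # second change log == first change log
print(target)                           # {'a': 5}
```

Passing `"log3"` as the second identifier instead would re-apply both changes in creation order and leave `a` at 9.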
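Note that after data recovery the user may make a new data change to the target data object; in that case the status of the change logs that were retrieved during data recovery needs to be updated. Optionally, on the basis of the embodiment corresponding to FIG. 1 or the embodiment corresponding to FIG. 2, after the data of the target data object is restored to the data before the first data change was performed, the method further includes:
detecting a third data change to the target data object, and creating a corresponding third change log for the third data change;
setting to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, where the change logs set to unavailable do not include the third change log.
For example, after the data of the target data object has been restored to the data before the first data change, if another data change is made to the target data object, the data changes corresponding to the change logs in the to-be-recovered log set of the earlier data recovery may no longer be rewritable because of problems such as overwriting. The status of the change logs in that to-be-recovered log set is therefore set to Unavailable. A change log whose status is set to Unavailable is no longer used in data recovery or data rewriting; that is, when data recovery or data rewriting is performed after the third change log is created, the corresponding to-be-recovered log set and to-be-rewritten log set do not include change logs whose status is Unavailable.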
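The status update triggered by the third data change can be sketched as below. The function names `invalidate_on_new_change` and `usable` are hypothetical, and the record is reduced to the fields the rule needs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeLog:
    log_id: str
    created_at: int
    status: str = "Undoable"


def invalidate_on_new_change(logs: List[ChangeLog], first_log_id: str,
                             third_log_id: str) -> None:
    """Mark every log created at or after the first change log as Unavailable,
    except the newly created third change log."""
    first = next(log for log in logs if log.log_id == first_log_id)
    for log in logs:
        if log.created_at >= first.created_at and log.log_id != third_log_id:
            log.status = "Unavailable"


def usable(logs: List[ChangeLog]) -> List[ChangeLog]:
    """Logs that may still appear in a to-be-recovered or to-be-rewritten set."""
    return [log for log in logs if log.status != "Unavailable"]


logs = [
    ChangeLog("log0", 0),
    ChangeLog("log1", 1, "Redoable"),   # undone by the earlier recovery
    ChangeLog("log2", 2, "Redoable"),   # undone by the earlier recovery
    ChangeLog("log3", 5),               # third change log, made after recovery
]
invalidate_on_new_change(logs, first_log_id="log1", third_log_id="log3")
print([log.log_id for log in usable(logs)])   # ['log0', 'log3']
```

Filtering through `usable` before building either log set is one way to guarantee that an invalidated change is never replayed over the newer data.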
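Embodiment 2
FIG. 3 is a structural diagram of a data recovery apparatus according to an embodiment of the present invention; the apparatus includes:
a log creation unit 300, configured to detect data changes performed on a target data object and create one corresponding change log for each data change; a first change log is the change log created for a first data change performed on the target data object, and the first change log includes the creation time of the first change log, the change type of the first data change, and data information of the part of the target data object's data changed by the first data change.
a first receiving unit 301, configured to receive a data recovery request carrying the identifier of the first change log, where the data recovery request includes a request to restore the data of the target data object to the data before the first data change was performed.
Optionally, the data information of the part of the target data object's data changed by the first data change includes a data part and a metadata part, where the data part includes the changed data content and the metadata part includes information describing the content of the data part.
Optionally, the first change log further includes a lifecycle, and the first change log is deleted when the time for which it has existed exceeds the lifecycle.
a first determining unit 302, configured to obtain a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, where the to-be-recovered log set includes the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log.
a data recovery unit 303, configured to perform data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.
As can be seen, a corresponding change log is created for each data change to a data object; the change log includes its creation time, the change type of the corresponding data change, and data information of the part of the data object's data changed by the data change. When a data recovery request is received, the order of data recovery can be determined from the request to be descending order of change log creation time, and the to-be-recovered log set can be determined from the receiving time of the request and the creation time of the first change log. Following that order, data recovery is performed on the target data object according to the change logs in the to-be-recovered log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed. Recovering data in reverse chronological order effectively serves a large share of data recovery needs and markedly improves data recovery efficiency.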
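On the basis of the embodiment corresponding to FIG. 3, FIG. 4 is a structural diagram of a data rewriting apparatus according to an embodiment of the present invention; the apparatus further includes:
a second receiving unit 401, configured to receive, after the data recovery unit is triggered, a data rewriting request carrying the identifier of a second change log, where the data rewriting request includes a request to rewrite the data of the target data object from the data before the first data change was performed to the data after a second data change is performed, the second change log is the change log for the second data change performed on the target data object, and the second change log belongs to the to-be-recovered log set.
a second determining unit 402, configured to obtain a to-be-rewritten log set according to the creation time of the second change log as determined from the identifier of the second change log, where the to-be-rewritten log set includes the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log.
a data rewriting unit 403, configured to perform data rewriting on the target data object according to the change logs in the to-be-rewritten log set, one by one in ascending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.
As can be seen, the data rewriting request determines that data rewriting proceeds in ascending order of change log creation time: data rewriting is performed on the target data object according to the change logs in the to-be-rewritten log set, one by one, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed. Adding data rewriting on top of data recovery allows the apparatus to be applied flexibly to different application scenarios.
Note that after data recovery the user may make a new data change to the target data object; in that case the status of the change logs that were retrieved during data recovery needs to be updated. Optionally, on the basis of the embodiment corresponding to FIG. 3 or the embodiment corresponding to FIG. 4, the apparatus further includes:
a detecting unit, configured to detect, after the data recovery unit is triggered, a third data change to the target data object, and to create a corresponding third change log for the third data change.
a setting unit, configured to set to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, where the change logs set to unavailable do not include the third change log.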
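A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium may be at least one of the following: read-only memory (ROM), RAM, a magnetic disk, an optical disc, or any other medium that can store program code.
It should be noted that the embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on how it differs from the others. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for related details, refer to the description of the method embodiments. The device and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The above is merely a preferred specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.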

Claims (10)

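  1. A data recovery method, characterized in that data changes performed on a target data object are detected and one corresponding change log is created for each data change; a first change log is the change log created for a first data change performed on the target data object, and the first change log includes the creation time of the first change log, the change type of the first data change, and data information of the part of the target data object's data changed by the first data change; the method includes:
    receiving a data recovery request carrying the identifier of the first change log, the data recovery request including a request to restore the data of the target data object to the data before the first data change was performed;
    obtaining a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, the to-be-recovered log set including the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log;
    performing data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.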
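  2. The method according to claim 1, characterized in that, after the data of the target data object is restored to the data before the first data change was performed, the method further includes:
    receiving a data rewriting request carrying the identifier of a second change log, the data rewriting request including a request to rewrite the data of the target data object from the data before the first data change was performed to the data after a second data change is performed, the second change log being the change log for the second data change performed on the target data object, and the second change log belonging to the to-be-recovered log set;
    obtaining a to-be-rewritten log set according to the creation time of the second change log as determined from the identifier of the second change log, the to-be-rewritten log set including the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log;
    performing data rewriting on the target data object according to the change logs in the to-be-rewritten log set, one by one in ascending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.
  3. The method according to claim 1 or 2, characterized in that, after the data of the target data object is restored to the data before the first data change was performed, the method further includes:
    detecting a third data change to the target data object, and creating a corresponding third change log for the third data change;
    setting to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, the change logs set to unavailable not including the third change log.
  4. The method according to claim 1, characterized in that the data information of the part of the target data object's data changed by the first data change includes a data part and a metadata part, the data part including the changed data content and the metadata part including information describing the content of the data part.
  5. The method according to claim 1, characterized in that
    the first change log further includes a lifecycle, and the first change log is deleted when the time for which it has existed exceeds the lifecycle.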
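  6. A data recovery apparatus, characterized in that the apparatus includes:
    a log creation unit, configured to detect data changes performed on a target data object and create one corresponding change log for each data change; a first change log is the change log created for a first data change performed on the target data object, and the first change log includes the creation time of the first change log, the change type of the first data change, and data information of the part of the target data object's data changed by the first data change;
    a first receiving unit, configured to receive a data recovery request carrying the identifier of the first change log, the data recovery request including a request to restore the data of the target data object to the data before the first data change was performed;
    a first determining unit, configured to obtain a to-be-recovered log set according to the time at which the data recovery request was received and the creation time of the first change log as determined from the identifier of the first change log, the to-be-recovered log set including the change logs whose creation time is less than or equal to the receiving time of the data recovery request and greater than or equal to the creation time of the first change log;
    a data recovery unit, configured to perform data recovery on the target data object according to the change logs in the to-be-recovered log set, one by one in descending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the first change log, to the data before the first data change was performed.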
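  7. The apparatus according to claim 6, characterized in that it further includes:
    a second receiving unit, configured to receive, after the data recovery unit is triggered, a data rewriting request carrying the identifier of a second change log, the data rewriting request including a request to rewrite the data of the target data object from the data before the first data change was performed to the data after a second data change is performed, the second change log being the change log for the second data change performed on the target data object, and the second change log belonging to the to-be-recovered log set;
    a second determining unit, configured to obtain a to-be-rewritten log set according to the creation time of the second change log as determined from the identifier of the second change log, the to-be-rewritten log set including the change logs whose creation time is less than or equal to the creation time of the second change log and greater than or equal to the creation time of the first change log;
    a data rewriting unit, configured to perform data rewriting on the target data object according to the change logs in the to-be-rewritten log set, one by one in ascending order of change log creation time, until the data of the target data object is restored, according to the change type and data information included in the second change log, to the data after the second data change was performed.
  8. The apparatus according to claim 6 or 7, characterized in that it further includes:
    a detecting unit, configured to detect, after the data recovery unit is triggered, a third data change to the target data object, and to create a corresponding third change log for the third data change;
    a setting unit, configured to set to unavailable the status of those change logs of the target data object whose creation time is greater than or equal to the creation time of the first change log, the change logs set to unavailable not including the third change log.
  9. The apparatus according to claim 6, characterized in that the data information of the part of the target data object's data changed by the first data change includes a data part and a metadata part, the data part including the changed data content and the metadata part including information describing the content of the data part.
  10. The apparatus according to claim 6, characterized in that
    the first change log further includes a lifecycle, and the first change log is deleted when the time for which it has existed exceeds the lifecycle.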
PCT/CN2016/101730 2015-10-20 2016-10-11 Data recovery method and apparatus WO2017067397A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510684597.0A CN106599006B (zh) 2015-10-20 2015-10-20 Data recovery method and apparatus
CN201510684597.0 2015-10-20

Publications (1)

Publication Number Publication Date
WO2017067397A1 true WO2017067397A1 (zh) 2017-04-27

Family

ID=58555110

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/101730 WO2017067397A1 (zh) 2015-10-20 2016-10-11 Data recovery method and apparatus

Country Status (2)

Country Link
CN (1) CN106599006B (zh)
WO (1) WO2017067397A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117312B (zh) * 2018-08-23 2022-03-01 北京小米智能科技有限公司 Data recovery method and apparatus
CN111913972A (zh) * 2019-05-10 2020-11-10 阿里巴巴集团控股有限公司 Data processing method, apparatus and device
CN114077517A (zh) * 2020-08-13 2022-02-22 华为技术有限公司 Data processing method, device and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060031267A1 (en) * 2004-08-04 2006-02-09 Lim Victor K Apparatus, system, and method for efficient recovery of a database from a log of database activities
CN101436207A (zh) * 2008-12-16 2009-05-20 浪潮通信信息系统有限公司 Data recovery and synchronization method based on log snapshots
CN102609337A (zh) * 2012-01-19 2012-07-25 北京神州数码思特奇信息技术股份有限公司 Fast data recovery method for an in-memory database
CN104715041A (zh) * 2015-03-24 2015-06-17 深圳市乾华数据科技有限公司 Database recovery method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100504905C (zh) * 2007-11-16 2009-06-24 中国科学院软件研究所 Method and system for handling malicious database transactions
US8856593B2 (en) * 2010-04-12 2014-10-07 Sandisk Enterprise Ip Llc Failure recovery using consensus replication in a distributed flash memory system
CN103412803B (zh) * 2013-08-15 2016-08-10 华为技术有限公司 Data recovery method and apparatus

Also Published As

Publication number Publication date
CN106599006A (zh) 2017-04-26
CN106599006B (zh) 2020-08-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16856830

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16856830

Country of ref document: EP

Kind code of ref document: A1