WO2022033269A1 - Method, device and system for data processing - Google Patents


Info

Publication number
WO2022033269A1
Authority
WO
WIPO (PCT)
Prior art keywords
version
data
host
log
shared server
Application number
PCT/CN2021/106701
Inventor
阙鸣健
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022033269A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Definitions

  • the present application relates to the field of computers, and in particular, to a method, device and system for data processing.
  • data modification does not act directly on the memory page on disk (the data file). Instead, the data is modified in the buffer pool, the modification generates a log that is stored on the log disk, and the memory pages in the buffer pool are written to the file disk when the buffer pool is full or another preset condition is met.
  • to obtain the latest version of the data, the standby machine or the host must obtain a historical version of the memory pages from the file disk and then play back the logs from the log disk in the order in which they were stored. If the buffer pool holds a large amount of data that has not yet been written to the file disk, log playback takes a long time. This slows down data backup on the standby machine as well as recovery after the host fails, so the entire data backup and failure recovery process takes a long time.
  • the present invention provides a data processing method, device and system, which are used to solve the problem that data backup and failure recovery take a long time.
  • a data processing method includes the following steps: a standby machine obtains a data version relationship recorded by a shared server, and obtains first version data and a first log according to the data version relationship, where the first log identifies the operations by which the second version changed relative to the first version.
  • the data version relationship records the recovery dependencies of different versions of the data.
  • the standby machine restores the second version data in the standby machine according to the recovery dependencies, the first version data and the first log.
  • the recovery dependencies can refer to the order of log playback. For example, the V2 version is obtained by playing back the log LOGV2 on the V1 memory pages, and the V3 version is then obtained by playing back the log LOGV3 on the V2 memory pages.
  • because writing memory pages to the shared server over the network consumes few system resources on the host, the host can persist memory pages more frequently. As a result, when the standby machine performs data recovery, the version of the memory page it obtains from the shared server differs little from the latest log stored on the log disk, and fewer log playbacks are needed to restore the latest version of the data. In some scenarios no log playback is needed at all: the latest version of the memory page is simply obtained from the shared server.
  • this improves the efficiency of data backup on the standby machine and the efficiency of standby failover when the host fails.
  • the data of the first version refers to the memory page of the first version
  • the data of the second version refers to the second version of the memory page
  • the first version is earlier than the second version.
  • that the first version is earlier than the second version may mean that the first version is adjacent to the second version; in other words, no other version of the data exists between the first version and the second version.
  • the standby machine can perform a log playback according to the first version data and the first log to restore the second version data.
  • that the first version is earlier than the second version may also mean that other versions of the data exist between the first version and the second version. For example, the second version is adjacent to a third version, and the third version is adjacent to the first version.
  • in this case, the first log may include a second log and a third log, where the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version.
  • when the standby machine restores the second version data according to the first version data and the first log, it can first restore the third version data in the standby machine according to the first version data and the second log, and then restore the second version data in the standby machine according to the third version data and the third log.
  • the first version can also be the same version as the second version.
  • in that case the first version data stored in the shared server is the host's current latest data, so the standby machine does not need log playback: it obtains the first version data from the shared server and data recovery is complete. This improves the efficiency of data backup on the standby machine and the efficiency of standby failover when the host fails.
  • when the standby machine obtains the first version data and the first log according to the data version relationship, it may first determine that the data version stored in the standby machine is earlier than the first version, and then obtain the first version data from the shared server. If the data version stored in the standby machine is the same as the first version, the standby machine can obtain the first version data locally.
  • because the standby machine first confirms that the first version data does not exist locally before obtaining it from the shared server, it avoids fetching data from the shared server again when the first version data already exists locally, which improves data recovery efficiency.
  • a data processing method includes the following steps: the host sends the first version data and the information of the current latest version to the shared server, and sends the first log to the standby machine, where the second version data is the latest data in the host at the current moment and the information of the current latest version includes the information of the second version.
  • the shared server is connected to the host and the standby server
  • the host is used to receive read and write requests to the database system
  • the standby server is used in the database system to back up data
  • the first log is used to identify the operation process of the change of the second version relative to the first version.
  • the host can write memory pages to the shared server periodically, for example writing the latest version of a memory page to the shared server every 1 minute. It can also write them according to the number of modifications, for example writing the latest version of a memory page to the shared server after every 5 modifications, or according to the amount modified, for example writing the latest version of the memory pages to the shared server each time 50 GB of memory pages have been modified.
  • the frequency at which memory pages are written to the shared server can be determined based on experience, which is not specifically limited in this application.
  • the host can therefore persist memory pages more frequently. When the host or the standby machine restores data, the version of the memory page it obtains from the shared server differs little from the logs stored on the log disk, and fewer log playbacks are needed to restore the latest version of the data. In some scenarios no log playback is needed at all, and the latest version of the memory page can be obtained directly.
  • this improves the efficiency of data backup on the standby machine, as well as the efficiency of host failure recovery and of standby failover after a host failure.
  • the host may first obtain the data version relationship recorded by the shared server. The data version relationship records the recovery dependencies of different versions of the data, and the shared server derives it from the first version data and the information of the current latest version sent by the host. The host then obtains the first version data and the first log according to the data version relationship, and finally restores the second version data in the host according to the first version data and the first log.
  • the host can thus persist memory pages more frequently. When the host recovers data after a failure, the version of the memory page obtained from the shared server differs little from the latest log stored on the log disk. Fewer log playbacks are needed for data recovery, and in some scenarios none at all, which improves the efficiency of failure recovery after a host failure.
  • the host may write the first version data into the shared server through a remote direct memory access (RDMA) method, and send the information of the current latest version to the shared server.
  • RDMA (remote direct memory access) is a direct memory access technology.
  • an intelligent network interface card (iNIC) can transfer memory pages from the host's memory to the memory of the remote shared server without the involvement of either CPU. This eliminates the data copying and context switching overhead that the remote shared server would otherwise incur when receiving data, achieving low-latency, low-overhead, high-bandwidth data transmission.
  • the frequency of persistent storage of memory pages by the host through the shared server is increased, so that when data recovery is performed after a host failure, the version gap between the memory page version obtained from the shared server and the log stored in the log disk is small.
  • the number of log replays required for data recovery is reduced, and the efficiency of failure recovery after a host failure can also be improved.
  • the data of the first version refers to the memory page of the first version
  • the data of the second version refers to the second version of the memory page
  • the first version is earlier than the second version.
  • that the first version is earlier than the second version may mean that the first version is adjacent to the second version; in other words, no other version of the data exists between the first version and the second version.
  • the host can then perform one log playback according to the first version data and the first log to restore the second version data.
  • that the first version is earlier than the second version may also mean that other versions of the data exist between the first version and the second version. For example, the second version is adjacent to a third version, and the third version is adjacent to the first version.
  • in this case, the first log may include a second log and a third log, where the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version.
  • when the host restores the second version data according to the first version data and the first log, it can first restore the third version data in the host according to the first version data and the second log, and then restore the second version data in the host according to the third version data and the third log.
  • the first version can also be the same version as the second version.
  • in that case the first version data stored in the shared server is the host's current latest data, so the host does not need log playback: it obtains the first version data from the shared server and data recovery is complete. This improves the efficiency of data recovery on the host after a host failure.
  • a data processing method includes the following steps. The shared server receives the first version data and the information of the current latest version sent by the host, where the second version data is the latest data in the host at the current moment and the information of the current latest version includes the information of the second version. The shared server then generates a data version relationship according to the first version data and the information of the current latest version, and sends the data version relationship to the standby machine. The data version relationship records the recovery dependencies of different versions of the data.
  • because writing memory pages to the shared server over the network consumes few system resources on the host, the host can persist memory pages more frequently. As a result, when the standby machine performs data recovery, the version of the memory page it obtains from the shared server differs little from the latest log stored on the log disk, and fewer log playbacks are needed to restore the latest version of the data. In some scenarios no log playback is needed at all: the latest version of the memory page is simply obtained from the shared server.
  • this improves the efficiency of data backup on the standby machine and the efficiency of standby failover when the host fails.
  • the shared buffer pool in the shared server can record the data version relationship by recording a starting point, a recovery point and an end point for each memory page. The starting point is the version in which the memory page first appeared in the shared buffer pool.
  • the recovery point is the version of the first version data sent by the host and received by the shared server, that is, the first version.
  • the end point is the version indicated by the information of the current latest version sent by the host and received by the shared server, that is, the second version.
  • the version of memory page 1 received by the shared buffer pool for the first time is V0
  • the shared buffer pool can record the starting point of memory page 1 in the shared buffer pool as V0.
  • when the shared buffer pool receives from the host the V2 version of memory page 1 together with the information that the latest version of memory page 1 is V3, it can record that the recovery point of memory page 1 is V2 and its end point is V3. In this way, whenever the shared buffer pool receives first version data and the information of the current latest version from the host, it can update the recovery point and end point of the corresponding memory page, and so records the recovery dependencies of different versions of the data.
  • the standby machine may send an acquisition request for the first version data to the shared server according to the received data version relationship. The shared server receives the request and returns the first version data to the standby machine. The standby machine generates the acquisition request after determining, according to the data version relationship, that the data version it stores is earlier than the first version.
  • for example, if the standby machine needs to restore the latest version of memory page 1, the acquisition request can carry the identifier of memory page 1, and the shared server responds to the request by writing the requested data into the standby machine via RDMA. If the shared server and the standby machine are the same server, the shared server can also write the data into the standby machine via DMA.
  • because the standby machine first confirms that the first version data does not exist locally before obtaining it from the shared server, it avoids fetching data from the shared server again when the first version data already exists locally, which improves data recovery efficiency.
  • a check code may be appended to the end of the first version data. The standby machine uses it to determine that the received first version data is complete.
  • when the standby machine determines whether the first version data exists in its buffer pool, it can further use the check code to confirm whether that data is complete. If the first version data is incomplete, log playback fails; if it is complete, the standby machine sends an acquisition request for the first version data to the shared server.
  • check codes can ensure the integrity of data transfer between the shared server and the standby machine. For example, the completion queue can be used to determine whether the first version data has been received completely; it contains the completed work requests of the work queue.
  • the shared server may store the data of the first version, and delete the data whose data version is earlier than the first version.
  • the shared buffer pool can manage the received memory pages. Each time a new version of a memory page is received, its historical version is deleted to improve memory utilization. A memory page that has not been modified for a long time can also be deleted. This saves memory and improves the recovery efficiency of frequently modified memory pages (hot pages).
  • the shared server can manage the shared buffer pool through a linked list.
  • the linked list can follow the principle of first-in, first-out.
  • the above linked list can be implemented by the least recently used (LRU) algorithm.
  • the LRU algorithm can assign an access field to each memory page in the shared buffer pool to record the time t elapsed since the page was last accessed. When a page must be evicted, the page with the largest t among the existing pages (the page that has gone unused for the longest time) is evicted.
  • the linked list can also be implemented by other algorithms, which is not limited in this application.
  • because the shared server deletes data whose version is earlier than the first version, only one version of each memory page is stored in the shared server. This greatly reduces the occupation of the shared buffer pool and improves memory utilization.
  • when the amount of stored data reaches a threshold, the earliest received data may be deleted.
  • the above linked list can be given a fixed length. After a page is stored in the linked list, if the linked list is full, the memory page at the head (the memory page that has not been modified for the longest time) is deleted from the linked list. This saves memory and lets the shared buffer pool hold more hot pages, improving their data recovery efficiency.
  • each new page received by the shared buffer pool is placed at the tail of the linked list. The page at the head has therefore not been modified for a long time (a cold page), and it has most likely already been written by the host to persistent storage.
  • such pages can therefore be deleted from the shared buffer pool to save memory, so that the shared buffer pool can hold more frequently modified hot pages. This improves the efficiency of host failure recovery, of standby failover, and of data backup on the standby machine.
  • a data processing system includes a host, a standby machine and a shared server, where the standby machine implements the operation steps of the method described in the first aspect or any possible implementation of the first aspect,
  • the host implements the operation steps of the method described in the second aspect or any possible implementation of the second aspect, and
  • the shared server implements the operation steps of the method described in the third aspect or any possible implementation of the third aspect.
  • in a fifth aspect, a standby machine is provided, including an acquisition unit and a recovery unit.
  • the acquisition unit is used to acquire the data version relationship recorded by the shared server, the data version relationship is used to record the recovery dependency relationship of data of different versions
  • the standby machine is the device used for backing up data in the database system
  • the shared server is connected to the standby machine and the host, and the host is used to receive read and write requests to the database system.
  • the acquisition unit is used to obtain the first version data and the first log according to the data version relationship. The second version data is the latest data in the host at the current moment. The first log identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version.
  • the recovery unit is configured to restore the second version data in the standby machine according to the first version data and the first log.
  • the first version being earlier than the second version includes: the first version is a version adjacent to the second version.
  • that the first version is earlier than the second version includes: the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log includes a second log and a third log, where the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version.
  • the recovery unit is used to restore the third version data in the standby machine according to the first version data and the second log, and to restore the second version data in the standby machine according to the third version data and the third log.
  • the acquisition unit is configured to determine, according to the data version relationship, that the data version stored on the standby machine is earlier than the first version, and to obtain the first version data from the shared server. The standby machine receives the first log sent by the shared server.
  • in a sixth aspect, a host is provided, including a sending unit, an acquisition unit and a recovery unit.
  • the sending unit is used by the host to send the first version data and the information of the current latest version to the shared server. The second version data is the latest data in the host at the current moment, and the information of the current latest version includes the information of the second version. The shared server is connected to the host and the standby machine, the host is used to receive read and write requests to the database system, and the standby machine is the device used for backing up data in the database system.
  • the sending unit is further configured to send a first log to the standby machine, where the first log is used to identify an operation process of changing the second version relative to the first version, and the first version is earlier than the second version.
  • the host also includes an acquisition unit and a recovery unit. The acquisition unit is used to acquire the data version relationship recorded by the shared server. The data version relationship records the recovery dependencies of different versions of the data, and the shared server derives it from the first version data and the information of the current latest version sent by the host.
  • the acquisition unit is used to obtain the first version data and the first log according to the data version relationship, and the recovery unit is used to restore the second version data in the host according to the first version data and the first log.
  • the sending unit is configured to write the first version data into the shared server by remote direct memory access (RDMA), and to send the information of the current latest version to the shared server.
  • the first version being earlier than the second version includes: the first version is a version adjacent to the second version.
  • that the first version is earlier than the second version includes: the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log includes a second log and a third log, where the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version.
  • the recovery unit is used to restore the third version data in the host according to the first version data and the second log, and to restore the second version data in the host according to the third version data and the third log.
  • in a seventh aspect, a shared server is provided, including a receiving unit, a generating unit and a sending unit.
  • the receiving unit is used to receive the first version data and the information of the current latest version sent by the host. The second version data is the latest data in the host at the current moment, and the information of the current latest version includes the information of the second version. The shared server is connected to the host and the standby machine, the host is used to receive read and write requests to the database system, and the standby machine is the device used for backing up data in the database system.
  • the generating unit is configured to generate a data version relationship according to the first version data and the information of the current latest version, and the data version relationship is used to record the restoration dependency relationship of data of different versions.
  • the sending unit is used to send the data version relationship to the standby machine.
  • the receiving unit is further configured to receive an acquisition request for the first version data sent by the standby machine. The standby machine generates this request after determining, according to the data version relationship, that the data version it stores is earlier than the first version. The sending unit is further used to send the first version data to the standby machine.
  • the sending unit is used to send the first version data and the check code to the standby machine. The check code is located at the end of the first version data, and the standby machine uses it to determine that the received first version data is complete.
  • the shared server 300 further includes a deletion unit 840, which is configured to store the first version data and delete the data whose version is earlier than the first version.
  • the deletion unit 840 is further configured to delete the earliest received data when the amount of stored data reaches a threshold.
  • a computer-readable storage medium is provided, and instructions are stored in the computer-readable storage medium, which, when executed on a computer, cause the computer to execute the methods described in the above aspects.
  • in a tenth aspect, a data processing device is provided, including a processor configured to execute the methods described in the above aspects.
  • the present application may further combine to provide more implementation manners.
  • FIG. 1 is a schematic structural diagram of a database system provided by this embodiment.
  • FIG. 2 is a schematic diagram of the deployment of a database system provided by this embodiment.
  • FIG. 3 is a schematic flowchart of the steps of a data processing method provided by this embodiment.
  • FIG. 4 is an example diagram of the start point, recovery point and end point of memory page 1 in a shared buffer pool provided by this embodiment.
  • FIG. 5 is a schematic structural diagram of a shared buffer pool provided by this embodiment.
  • FIG. 6 is a schematic flowchart of the steps of a data processing method in an application scenario provided by this embodiment.
  • FIG. 7 is a schematic structural diagram of a host provided by this embodiment.
  • FIG. 9 is a schematic structural diagram of a standby machine provided by this embodiment.
  • FIG. 10 is a schematic structural diagram of a data processing device provided by this embodiment.
  • Data backup is the basis of disaster recovery. It refers to the process of copying all or part of the data set from the host's hard disk or array to other storage media in order to prevent data loss due to system operating errors or system failures.
  • RDMA: remote direct memory access.
  • Linked list: a storage structure that is non-contiguous and non-sequential on physical storage units.
  • the logical order of storage units is realized by the link order of pointers in the linked list.
  • the linked list can realize flexible memory dynamic management with the help of pointers.
  • Host: also known as the master node; it acts as the master in active-standby mode and provides read and write services externally.
  • Standby: also known as a standby node; it acts as the standby in active-standby mode, does not provide read and write services externally, and keeps its data synchronized with the host through log playback.
  • a database can be regarded as an electronic filing cabinet: an organized, shareable, and centrally managed collection of large amounts of data stored in a computer over the long term.
  • the user can perform operations such as adding, querying, updating, and deleting files in the database.
  • after receiving the user's operation request, the host usually modifies, in memory, the memory page where the file is located (for example, writing new data at a specific address or updating the data of an existing address page) and generates the corresponding log.
  • the log records the database operations performed when the old version V1 of a memory page is modified into the new version V2. Log playback re-executes the operations recorded in the log on the old version V1 to obtain the new version V2 again. Therefore, if the new version V2 of the memory page is lost, it can be recovered by replaying the log on the old version V1.
  • the logs are written to disk in the order of modification, and a corresponding log is generated for each modification.
  • for example, the original version of memory page X is V1.
  • after the first modification, version V2 of memory page X is obtained, and the corresponding log is LOGV2.
  • after the second modification, version V3 of memory page X is obtained, and the corresponding log is LOGV3. Version V2 of memory page X can then be recovered from version V1 and log LOGV2, and version V3 from version V2 and log LOGV3.
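  • The version chain above can be sketched in Python. Here a memory page is modeled as a byte string and a log as an ordered list of (offset, bytes) writes; this log format and the `Log` and `replay` names are illustrative assumptions, not the database's actual log structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Log:
    """Hypothetical redo log: the version it produces and the writes it records."""
    target_version: int
    writes: Tuple[Tuple[int, bytes], ...]  # (offset, new bytes) pairs, in order


def replay(page: bytes, log: Log) -> bytes:
    """Log playback: re-execute the recorded writes on the older page image."""
    buf = bytearray(page)
    for offset, data in log.writes:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


page_x_v1 = b"hello world"
log_v2 = Log(2, ((0, b"H"),))  # first modification
log_v3 = Log(3, ((6, b"W"),))  # second modification

page_x_v2 = replay(page_x_v1, log_v2)  # V1 + LOGV2 -> V2
page_x_v3 = replay(page_x_v2, log_v3)  # V2 + LOGV3 -> V3
print(page_x_v2, page_x_v3)  # b'Hello world' b'Hello World'
```

Each log only moves a page forward by one version, so recovering V3 from V1 requires replaying LOGV2 and then LOGV3 in that order.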
  • the latest-version memory page is the memory page produced by the host's most recent modification as of the current moment.
  • in the example above, the V3 version of memory page X is the latest-version memory page,
  • and the V2 and V1 versions are both old-version memory pages.
  • for consistency, the following description assumes that the version number of the host's current latest-version memory page is larger than the version numbers of its old-version memory pages.
  • for example, if the V4 version is the latest-version memory page, then the V1 to V3 versions are old-version memory pages.
  • log playback can occur in the following application scenarios.
  • the first application scenario is one in which the host on which the database is deployed fails and then recovers from the failure. It should be understood that, to keep data safe when the server crashes while also maintaining system performance, modern databases typically modify the memory page in the buffer pool first and generate the corresponding log in the buffer pool; each log is then persisted to the log disk used to store logs, and the modified memory page is afterwards persisted to the file disk used to store data files.
  • the log disk and the file disk may be two storage spaces on a hard disk. However, because persisting memory pages to disk consumes a lot of resources, not every version of a memory page is written to the file disk; usually, the latest version of a memory page is written to the file disk only when certain conditions are met.
  • for example, the latest versions of the memory pages in the buffer pool may be written to the file disk every half hour, when 70% of the buffer pool is occupied, or whenever the amount of memory page modification reaches 1 GB. Therefore, when a user commits a transaction that modifies a memory page, each version of the memory page is persisted asynchronously, whereas each log is persisted synchronously. As a result, when the database crashes, memory pages in the buffer pool that have not yet been written to the file disk are lost, and after the database restarts these lost memory pages must be recovered. To do so, the version of the data page already persisted on the file disk can be compared with the latest log version on the log disk to determine which logs are involved in failure recovery, and these logs are then replayed in order to restore the data in the buffer pool.
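  • The recovery step described above (compare the persisted page version with the log versions, then replay the newer logs in order) can be sketched as follows. It assumes, as this description does, that version numbers increase with each modification, and additionally assumes that they are consecutive integers; the log-disk layout and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical log disk: version produced by the log -> (offset, bytes) writes
LogDisk = Dict[int, List[Tuple[int, bytes]]]


def select_logs(persisted_version: int, log_disk: LogDisk) -> List[int]:
    """Only logs newer than the page persisted on the file disk take part in recovery."""
    return sorted(v for v in log_disk if v > persisted_version)


def recover_page(persisted_page: bytes, persisted_version: int,
                 log_disk: LogDisk) -> Tuple[bytes, int]:
    """Replay the selected logs in version order to rebuild the lost page."""
    buf = bytearray(persisted_page)
    version = persisted_version
    for v in select_logs(persisted_version, log_disk):
        if v != version + 1:  # assumes consecutive version numbers
            raise ValueError(f"log for version {version + 1} is missing")
        for offset, data in log_disk[v]:
            buf[offset:offset + len(data)] = data
        version = v
    return bytes(buf), version


# The file disk holds version 2; the log disk holds logs for versions 2 to 4.
log_disk = {2: [(0, b"b")], 3: [(1, b"c")], 4: [(2, b"d")]}
print(select_logs(2, log_disk))            # [3, 4]
print(recover_page(b"baaa", 2, log_disk))  # (b'bcda', 4)
```

The gap check makes a missing log fail loudly instead of silently producing a page that skips a version.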
  • the second application scenario is one in which the host on which the database is deployed fails and cannot be restarted, so a standby machine restores the host's data and is promoted to host.
  • this process is also called standby failover or role switching (switchover).
  • the database generally uses backup to prevent data loss.
  • the host where the database is deployed is usually connected to multiple standby machines.
  • the logs on disk are backed up. Therefore, during database failure recovery, not only can the host recover data through the log playback method described above; if the host cannot restart after a failure, a standby machine can also use log playback to restore the data state at the moment the host failed and thereby complete failover.
  • the third application scenario is one in which a standby machine backs up the data in the host, also called the primary-standby replication scenario.
  • the host can modify the memory page in the buffer pool and generate the corresponding log, and then send the log to the standby machine so that the standby machine can obtain the latest version of data by replaying the log.
  • in this way, modifications to data pages on the host can be synchronized to the standby machine, so that the standby machine replicates the memory pages in the host. Because a log is much smaller than a memory page, replicating by replaying logs on the standby machine reduces network traffic.
  • the present application provides a database system 10. As shown in FIG. 1, the system 10 includes a host 100, a standby machine 200, and a shared server 300. The host 100 and the shared server 300, as well as the standby machine 200 and the shared server 300, communicate over a network such as Ethernet, which may be a wired or wireless network; this is not specifically limited in this application.
  • the host 100 and the standby machine 200 may be connected through the above-mentioned network, or through a bus such as a Peripheral Component Interconnect Express (PCIe) bus, which is not specifically limited in this application.
  • the host 100, the standby machine 200, and the shared server 300 may be physical servers, such as X86 servers or ARM servers; they may also be virtual machines (VMs) implemented on general-purpose physical servers combined with network functions virtualization (NFV) technology.
  • a virtual machine is a complete computer system that has full hardware system functions, is simulated by software, and runs in a completely isolated environment, such as a virtual machine in a cloud data center; this is not specifically limited in this application.
  • the host 100, the standby machine 200, and the shared server 300 may also be a server cluster composed of multiple such physical servers or virtual machines, or other devices with storage functions, such as storage arrays, which is not specifically limited in this application.
  • the host 100 and the standby machine 200 respectively include a buffer pool, a file disk and a log disk.
  • the shared server 300 includes a shared buffer pool 310.
  • the host 1 includes buffer pool 1, file disk 1 and log disk 1
  • the standby machine 1 includes buffer pool 2, file disk 2 and log disk 2
  • the standby machine 2 includes buffer pool 3, file disk 3 and log disk 3.
  • the host 100, the standby machine 200, and the shared server 300 may be divided into unit modules in various ways; for example, they may also include processors, communication modules, and the like.
  • each module may be a software module, or a hardware module, or a part of a software module and a part of a hardware module, which is not limited in this application.
  • the positional relationship between the devices and modules shown in FIG. 1 does not constitute any limitation.
  • for example, file disk 1 and log disk 1 are located in host 1, and file disk 2 and log disk 2 are located in standby machine 1,
  • file disk 3 and log disk 3 are located in standby machine 2,
  • and the shared buffer pool 310 is located in the shared server 300.
  • the file disk and/or the log disk may also be external storage of the host and/or the standby machine,
  • and the shared buffer pool 310 may also be external storage of the shared server 300, which is not specifically limited in this application.
  • File disks and log disks are disks or storage arrays that can store data persistently, which can be on the host or standby. When the host or standby fails, the data in the file disks and log disks on the host or standby will not be lost.
  • the file disk is used to store data pages, which may be obtained by persistently storing memory pages in the buffer pool
  • the log disk is used to store logs.
  • the file disk and the log disk may be a hard disk drive (HDD), a solid-state drive (SSD), a hybrid hard drive (HHD), a redundant array of independent disks (RAID), and so on.
  • the buffer pool and the shared buffer pool 310 are memories capable of high-speed data exchange. The buffer pool in the host or the standby machine may include a contiguous or non-contiguous storage space in the memory or cache of that host or standby machine, and the shared buffer pool 310 may be a contiguous or non-contiguous storage space in the memory or cache of the shared server 300.
  • the buffer pool and the shared buffer pool 310 may also be a collection of partial storage spaces in the memories or caches of multiple servers.
  • the buffer pool has very fast read and write rates, and memory page modification and log generation take place in it; however, when the host or standby machine fails, the data in its buffer pool is lost.
  • the buffer pool and the shared buffer pool 310 may be volatile memory, random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), synchronous dynamic RAM (SDRAM), double data rate SDRAM (DDR SDRAM), read-only memory (ROM), cache, and so on, which is not specifically limited in this application.
  • the host 100 sends the modified memory page (specifically, the data stored in the memory page, or indication information indicating the data or data structure in the memory page, where the data refers to the valid data stored in the database) to the shared server 300, so that either the standby machine 200 or the host 100 can obtain memory pages from the shared server 300 as required.
  • writing memory pages to disk consumes a lot of the system resources of the host 100, so the host 100 persists memory pages to disk only infrequently. As a result, there is a large version gap between the persisted page and the latest log, the number of valid logs participating in log playback is large, and log playback takes a long time.
  • the host 100 may send memory pages to the shared server 300 over the network, and the shared server 300 persistently stores them. Compared with the host 100 writing memory pages to disk for persistent storage, sending them directly to the shared server 300 consumes very few system resources. Therefore, in the solution provided by this application, the host 100 can have the shared server 300 persistently store memory pages, which greatly increases the frequency at which memory pages are persisted. When the latest version of a memory page is then obtained through log playback, the version gap between the persisted memory page and the latest version is greatly reduced, so far fewer logs need to be replayed; in some cases no log playback is needed at all, because the latest version of the memory page can be obtained directly from the shared server 300. This improves the efficiency of data recovery.
  • the host 100 can write memory pages to the shared server 300 periodically, for example, writing the latest version of a memory page to the shared server 300 every minute; according to the number of modifications, for example, writing the latest version every time the memory page has been modified 5 times; or according to the modification amount, for example, writing the latest version each time 50 GB of memory page data has been modified. The frequency of writing memory pages to the shared server 300 may be determined empirically, which is not specifically limited in this application.
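  • The three example triggers can be combined into one write policy, sketched below. Combining them with a logical OR, the default thresholds (taken from the examples in the text), and all class and function names are assumptions; the text allows any one of these criteria to be used on its own.

```python
from dataclasses import dataclass


@dataclass
class WritePolicy:
    """Hypothetical thresholds taken from the examples in the text."""
    period_s: float = 60.0                    # every 1 minute
    max_modifications: int = 5                # every 5 modifications
    max_modified_bytes: int = 50 * 1024 ** 3  # every 50 GB modified


@dataclass
class PageWriteState:
    last_write_s: float = 0.0
    modifications: int = 0
    modified_bytes: int = 0


def record_modification(state: PageWriteState, nbytes: int) -> None:
    state.modifications += 1
    state.modified_bytes += nbytes


def should_write(state: PageWriteState, policy: WritePolicy, now_s: float) -> bool:
    """True if any trigger fires; the host would then send the latest version."""
    return (now_s - state.last_write_s >= policy.period_s
            or state.modifications >= policy.max_modifications
            or state.modified_bytes >= policy.max_modified_bytes)


def mark_written(state: PageWriteState, now_s: float) -> None:
    """Reset the counters after the page has been sent to the shared server."""
    state.last_write_s = now_s
    state.modifications = 0
    state.modified_bytes = 0


policy, state = WritePolicy(), PageWriteState()
for _ in range(4):
    record_modification(state, 4096)
print(should_write(state, policy, now_s=10.0))  # False
record_modification(state, 4096)
print(should_write(state, policy, now_s=10.0))  # True (5 modifications)
```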
  • the database system 10 is flexible in deployment. It can be deployed as a system including a single host 100, multiple standby machines 200, and a single shared server 300, such as the database system shown in FIG. 1, or as a system including multiple hosts 100, multiple standby machines 200, and multiple shared servers 300, such as the database system shown in FIG. 2. That database system includes two hosts 100, host 1001 and host 1002. Standby machines 2001, 2002, and 2003 back up the data in host 1001, and host 1001 can write the latest version of a memory page to the shared buffer pool 3101 of shared server 3001 at a preset frequency for persistent storage.
  • the backup machine 2004, the backup machine 2005 and the backup machine 2006 back up the data in the host 1002, and the host 1002 can write the latest version of the memory page to the shared buffer pool 3102 of the shared server 3002 at a preset frequency for persistent storage.
  • FIG. 2 is only used for illustration; the database system provided by this application does not limit the number of hosts 100, standby machines 200, and shared servers 300, and each host 100 may correspond to one shared server 300 or to multiple shared servers 300.
  • the number of shared servers 300 is not limited in this application.
  • the shared server 300 may be a separate server independent of the host 100 and the standby machine 200, or may be the same server as the standby machine 200. In the latter case, the shared buffer pool 310 is part of the cache of the standby machine 200, and the standby machine 200 can read memory pages from the shared buffer pool 310 by DMA, which is not specifically limited in this application.
  • the host 100 can write modified memory pages into the shared buffer pool 310 of the shared server 300 over the network, which greatly increases how often the host 100 persists memory pages. When logs are later replayed on memory pages to restore the latest version, the more frequent persistence narrows the version gap between the persisted memory page and the latest version, so far fewer logs need to be replayed during data recovery. In some scenarios no log playback is needed at all, and the latest version of a memory page is obtained directly from the shared server 300. This improves the efficiency of log playback and data recovery, and thus of failure recovery, failover, and primary-standby replication in the database system.
  • the present application provides a data processing method, which is applied to the database system shown in FIG. 1 .
  • the system includes a host 100, a standby machine 200, and a shared server 300 that are connected to one another, and the method includes the following steps:
  • S410 The host 100 sends the first version data and the current latest version information to the shared server 300.
  • the first version data may be a memory page of a certain version, such as the memory page of the V1 version shown in FIG. 1 .
  • the host 100 can write the memory pages to the shared server 300 according to the preset frequency, and the preset frequency can include the time period, the modification times, and the modification amount, etc., which will not be repeated here.
  • the current latest version information refers to the latest version of the memory page of the host 100 at the time of step S410. Assuming that the second version data is currently the latest version of the memory page, the current latest version information may include information about the second version.
  • the host 100 may write the first version data to the shared server 300 by means of remote direct memory access (RDMA).
  • RDMA is a direct memory access technology.
  • An intelligent network interface card (iNIC) can transfer memory pages from the memory of the host 100 to the memory of the remote shared server 300 without the intervention of the CPUs of both parties.
  • the information of the current latest version is the information of the second version.
  • when the host 100 writes the first version data into the shared buffer pool by RDMA, the first version data is usually first copied into an RDMA buffer, and the RDMA transfer is performed afterwards. During this copy, the host 100 may be modifying the first version data into the second version data, so by the time the RDMA transfer starts, the latest version of the memory page is no longer the first version but the second version. Therefore, after copying the first version data into the buffer, the host 100 sends, when the RDMA transfer starts, the current latest version information of the corresponding memory page to the shared buffer pool along with the data.
  • for example, the version of the memory page written may be V1 while the latest version is V3.
  • the shared buffer pool can record the version of the received first version data and the latest version information, and generate a data version relationship from them.
  • the data version relationship is used to record the recovery dependencies of different versions of the data.
  • the host or standby machine can obtain the data version relationship from the shared buffer pool, and use this for log playback.
  • compared with writing data to the file disk of the host 100 for persistent storage, having the host 100 write the first version data to the shared server 300 through RDMA and letting the shared server 300 persist it reduces the system resources that the host 100 spends on persistent storage. This allows the host 100 to persist memory pages more frequently, so that during log playback the version gap between the persisted memory page and the latest memory page is greatly reduced and fewer valid logs participate in log playback; the latest version of a memory page may even be obtained directly from the shared server 300 without log playback, which improves data recovery efficiency.
  • the host 100 sends the first log to the standby machine 200.
  • the first log records the operations that modify the first version data into the second version data; that is, the first log identifies how the second version changed relative to the first version, where the first version is earlier than the second version.
  • the first log is used for log playback: replaying the first log on the first version data yields the second version data.
  • the process of the host 100 modifying the first version data and generating the second version data and the first log and the description of the log may refer to the embodiments in FIG. 1 to FIG. 2 , which will not be repeated here.
  • that the first version is earlier than the second version may mean that the two are adjacent versions, the second version being the new version and the first version the old version, as shown in FIG. 1:
  • the V1 version is the first version
  • the V2 version is the second version.
  • that the first version is earlier than the second version may also mean that one or more versions lie between them. For example, a third version may lie between the first version and the second version and be adjacent to both: the first version can be modified to obtain the third version, and the third version can be modified to obtain the second version.
  • in this case, the first log may include a second log and a third log.
  • the second log identifies how the third version changed relative to the first version.
  • the third log identifies how the second version changed relative to the third version. Still taking FIG. 1 as an example:
  • the V1 version is the first version
  • the V2 version is the third version
  • V3 is the second version
  • LOGV2 is the second log
  • LOGV3 is the third log.
  • the host 100 may send the first log to the standby machine 200 by means of direct memory access (DMA).
  • the standby machine 200 can also first obtain the data version relationship from the shared buffer pool, obtain the first version data from the shared buffer pool accordingly, and then, using the first log sent by the host 100, perform log playback on the first version data according to the recovery dependencies recorded in the data version relationship to complete failover.
  • each time the host 100 generates a new log the log will be stored in the log disk for persistent storage. In this way, even if the host 100 fails and the memory pages in the cache are lost, the host 100 can Obtain the data version relationship from the shared buffer pool, obtain the first version data from the shared buffer pool based on this, obtain the first log from the log disk, and then perform log playback of the first version data according to the recovery dependency recorded by the data version relationship, The failure recovery of the host 100 is realized. The detailed description of this process will be described in steps S470 to S480 below.
  • the shared server 300 generates a data version relationship according to the first version data and the current latest version information.
  • the data version relationship is used to record the recovery dependencies of different versions of data, and the recovery dependencies can refer to the order of log playback.
  • for example, the V3 version is obtained by replaying log LOGV3 on the V2 version of the memory page.
  • the shared buffer pool 310 in the shared server 300 can record the data version relationship by recording a start point, a recovery point, and an end point for each memory page. The start point is the version of the memory page when the shared buffer pool first received it;
  • the recovery point is the version of the first version data that the shared server 300 received from the host, that is, the first version;
  • the end point is the current latest version information that the shared server 300 received from the host.
  • for example, the shared buffer pool can record that the start point of memory page 1 in the shared buffer pool is V0,
  • and that the recovery point of memory page 1 is V2 and the end point is V3.
  • each time new first version data and current latest version information are received, the recovery point and end point of the corresponding memory page can be updated accordingly, thereby recording the recovery dependencies between different versions of the data.
  • FIG. 4 shows the data version relationship recorded in the shared buffer pool in an application scenario.
  • the data version relationship is the data version relationship of memory page 1. It is assumed that when the shared buffer pool receives memory page 1 for the first time at time T0, its data version is V0, so the starting point 1 of memory page 1 corresponds to version V0.
  • if the shared buffer pool receives the V3 version of memory page 1 at time T1, together with current latest version information V5 for memory page 1, the shared buffer pool can update the recovery point to V3 and the end point to V5.
  • when memory page 1 needs to be recovered, the data version relationship can be obtained from the shared server 300, and the recovery dependency is determined as follows: first, the V4 version is recovered from the V3 version, and then the V5 version is recovered from the V4 version.
  • the memory page 1 of the V3 version can be obtained from the shared server 300, and log playback is performed in sequence according to the recovery dependencies.
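  • The start point, recovery point, and end point bookkeeping, together with the FIG. 4 example, can be sketched as follows. Versions are modeled as integers (V3 as 3); the class and method names, the in-memory dictionaries, and the rule of ignoring an older page image that arrives late are assumptions, not details given in the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VersionRecord:
    start: int     # version of the page when the shared buffer pool first received it
    recovery: int  # version of the page image currently stored (the "first version")
    end: int       # latest version reported by the host when that image was sent


class SharedBufferPool:
    """Hypothetical shared-server store of page images plus data version relationships."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}
        self.relations: Dict[int, VersionRecord] = {}

    def receive(self, page_id: int, data: bytes, version: int, latest_version: int) -> None:
        rec = self.relations.get(page_id)
        if rec is None:
            self.relations[page_id] = VersionRecord(version, version, latest_version)
        elif version < rec.recovery:
            return  # assumption: ignore an older image that arrives late
        else:
            rec.recovery, rec.end = version, latest_version
        self.pages[page_id] = data  # keep only the newest image (older versions deleted)

    def replay_plan(self, page_id: int) -> List[int]:
        """Versions whose logs must be replayed, in order, on the stored image."""
        rec = self.relations[page_id]
        return list(range(rec.recovery + 1, rec.end + 1))


pool = SharedBufferPool()
pool.receive(1, b"page1@V0", version=0, latest_version=0)  # time T0
pool.receive(1, b"page1@V3", version=3, latest_version=5)  # time T1
print(pool.relations[1])    # VersionRecord(start=0, recovery=3, end=5)
print(pool.replay_plan(1))  # [4, 5]
```

The plan `[4, 5]` corresponds to recovering V4 from V3 and then V5 from V4.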
  • the shared buffer pool 310 can manage the received memory pages: each time a new memory page is received, the historical version of that memory page already stored in the shared buffer pool is deleted, which improves memory utilization. A memory page that has not been modified for a long time can also be deleted, which saves memory and improves the recovery efficiency of frequently modified memory pages (that is, hot pages).
  • the shared server 300 can manage the shared buffer pool through a linked list.
  • the linked list can follow the principle of first-in, first-out.
  • the linked list can be set to a fixed length.
  • when the length of the linked list reaches the fixed length, the memory page at the head (that is, the memory page that has not been modified for the longest time) is deleted from the linked list. This saves memory and allows more hot pages to be stored in the shared buffer pool, improving the data recovery efficiency of hot pages.
  • for example, the shared server 300 first determines that a historical version of page 1, namely the V1 version, already exists in linked list S1; it then puts the new page at the tail of linked list S1 and deletes the historical version of page 1 from linked list S1 to obtain the updated linked list S2. Similarly, assuming that at time T2 the host 100 writes page n+1 into the shared buffer pool through RDMA, the shared server 300 first determines that linked list S2 contains no historical version of page n+1 and puts the new page at the tail of linked list S2; if the length of the linked list reaches the threshold, the V6 version of page 0 at the head is deleted. It should be understood that the above example is only for illustration and is not specifically limited in the present application.
  • each new page received by the shared buffer pool 310 is placed at the tail of the linked list, so the page at the head is one that has not been modified for a long time (also called a cold page) and is unlikely to be modified again soon.
  • such pages can be deleted from the shared buffer pool to save memory, so that the shared buffer pool can hold more frequently modified hot pages, improving the efficiency of failure recovery, failover, and primary-standby replication.
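  • The first-in, first-out linked list described above can be sketched with Python's `collections.OrderedDict`, whose insertion order stands in for the list order (head = oldest entry, tail = newest). The class name, the `capacity` parameter, and the choice to evict only when the list exceeds `capacity` are assumptions for illustration.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class FifoPageList:
    """Hypothetical fixed-length FIFO list of pages in the shared buffer pool."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # page_id -> (version, data); iteration order is head -> tail
        self._pages: "OrderedDict[int, Tuple[int, bytes]]" = OrderedDict()

    def put(self, page_id: int, version: int, data: bytes) -> Optional[int]:
        """Insert a newly received page; return the id of an evicted cold page, if any."""
        self._pages.pop(page_id, None)          # delete the page's historical version
        self._pages[page_id] = (version, data)  # the new page goes to the tail
        if len(self._pages) > self.capacity:
            evicted_id, _ = self._pages.popitem(last=False)  # head = longest unmodified
            return evicted_id
        return None

    def order(self) -> List[int]:
        return list(self._pages)


lst = FifoPageList(capacity=3)
lst.put(0, 6, b"p0")
lst.put(1, 1, b"p1")
lst.put(2, 1, b"p2")
lst.put(1, 2, b"p1 v2")                   # page 1 V1 removed, V2 appended at the tail
print(lst.put(3, 1, b"p3"), lst.order())  # 0 [2, 1, 3]
```

Because a rewritten page moves to the tail, pages that keep being modified stay in the list, while the head drifts toward pages that have gone unmodified the longest.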
  • FIG. 4 is only used for illustration, and is not specifically limited in the present application.
  • the linked list can be implemented by the least recently used (LRU) algorithm.
  • the LRU algorithm can assign an access field to each memory page in the shared buffer pool to record the time t elapsed since the page was last accessed, and, when a page must be eliminated, select the page with the largest t, that is, the page that has gone unused for the longest time. It should be understood that the linked list can also be implemented by other algorithms, which is not limited in this application.
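  • A sketch of the access-field variant follows: each page records the logical time of its last access, and eviction picks the page with the largest elapsed time t. The class name and the logical clock are assumptions. The linear scan mirrors the description; production LRU caches usually pair a hash map with a doubly linked list so that eviction is O(1).

```python
from typing import Dict, Optional


class LruPagePool:
    """Hypothetical LRU pool driven by a per-page access field."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.clock = 0                          # logical time, advanced on every access
        self.last_access: Dict[int, int] = {}  # the per-page access field
        self.pages: Dict[int, bytes] = {}

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def get(self, page_id: int) -> Optional[bytes]:
        if page_id not in self.pages:
            return None
        self.last_access[page_id] = self._tick()
        return self.pages[page_id]

    def put(self, page_id: int, data: bytes) -> Optional[int]:
        """Store a page; if over capacity, evict the page with the largest elapsed time t."""
        self.pages[page_id] = data
        self.last_access[page_id] = self._tick()
        if len(self.pages) <= self.capacity:
            return None
        now = self.clock
        victim = max(self.pages, key=lambda p: now - self.last_access[p])
        del self.pages[victim]
        del self.last_access[victim]
        return victim


pool = LruPagePool(capacity=2)
pool.put(1, b"a")
pool.put(2, b"b")
pool.get(1)               # page 1 becomes the most recently used
print(pool.put(3, b"c"))  # 2
```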
  • each shared server can serve different hosts and store different memory pages sent by different hosts.
  • for example, shared server 1 receives and stores the memory pages sent by host 1 and records the data version relationships corresponding to those memory pages,
  • and shared server 2 receives and stores the memory pages sent by host 2 and records the data version relationships corresponding to those memory pages.
  • if a shared server provides services to multiple hosts, then when it stores memory pages and records data version relationships, it additionally records the host and standby machine to which each memory page belongs.
  • for example, if shared server 3 serves host 3 and host 4, the data version relationship of memory page 1 recorded by shared server 3 may correspond to host 3, and that of memory page 2 to host 4.
  • the storage space of the shared buffer pool can also be divided according to the number of hosts.
  • For example, storage space 1 stores the memory pages of host 1 and storage space 2 stores the memory pages of host 2; this is not specifically limited in this application.
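  • The multi-host bookkeeping described above might look like the following sketch. It is an illustration only: the class and field names (PageRecord, MultiHostSharedServer, relation_for_standby) and the one-to-one pairing of each host with one standby machine are assumptions, not details given in this application. Each host gets its own storage space, and each data version relationship is tagged with the host and standby machine it belongs to, so identically numbered pages from different hosts do not collide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PageRecord:
    host_id: str         # host the memory page belongs to
    standby_id: str      # standby machine paired with that host (assumed 1:1)
    recovery_point: str  # version held in the shared buffer pool
    end_point: str       # latest version reported by the host


@dataclass
class MultiHostSharedServer:
    # One storage space per host: host_id -> {page_id: page bytes}
    spaces: Dict[str, Dict[int, bytes]] = field(default_factory=dict)
    # Version relationships tagged with their owner: (host_id, page_id) -> PageRecord
    records: Dict[Tuple[str, int], PageRecord] = field(default_factory=dict)

    def store(self, host_id: str, standby_id: str, page_id: int,
              version: str, latest: str, data: bytes) -> None:
        # Store the page in its host's own storage space and record its owner.
        self.spaces.setdefault(host_id, {})[page_id] = data
        self.records[(host_id, page_id)] = PageRecord(host_id, standby_id, version, latest)

    def relation_for_standby(self, standby_id: str, page_id: int) -> Optional[PageRecord]:
        # A standby machine only sees records of the host it backs up.
        for (_, pid), rec in self.records.items():
            if pid == page_id and rec.standby_id == standby_id:
                return rec
        return None
```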
  • S440 The standby machine 200 obtains the data version relationship from the shared server 300.
  • Steps S440 to S450 may occur in a primary-standby replication scenario or a standby failover scenario.
  • the standby machine 200 may send a request for obtaining the data version relationship to the shared server 300. If the standby machine 200 needs to restore the latest version of memory page 1, the acquisition request may carry the identifier of memory page 1.
  • The shared buffer pool 310 writes the data version relationship into the standby machine 200 by means of RDMA. If the shared buffer pool 310 is deployed on the standby machine 200, that is, the shared server 300 and the standby machine 200 are the same server, the shared buffer pool 310 can instead write the data version relationship into the buffer pool of the standby machine 200 by means of DMA.
  • S450 The standby machine obtains the first version data and the first log according to the data version relationship.
  • the standby machine may first determine whether the first version data and the first log already exist in the file disk or buffer pool of the standby machine according to the data version relationship.
  • If the first version data already exists in the file disk or buffer pool of the standby machine and the first log already exists in its buffer pool or log disk, the standby machine can perform step S460 to play back the log. If the first version data exists in the file disk or buffer pool of the standby machine but the first log does not exist in its buffer pool or log disk, the standby machine can send a log acquisition request to the host 100, or wait for the host 100 to transmit the first log, before performing log playback, which further improves the accuracy of log playback.
  • If the first version data does not exist in the file disk or buffer pool of the standby machine, the standby machine 200 may send an acquisition request to the shared buffer pool 310; in response, the shared buffer pool 310 writes the first version data into the standby machine 200 by means of RDMA. If the shared buffer pool 310 is deployed on the standby machine 200, that is, the shared server 300 and the standby machine 200 are the same server, the shared buffer pool 310 can instead write the first version data into the buffer pool of the standby machine 200 by means of DMA.
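  • The checks in the preceding paragraphs amount to a small decision procedure, sketched below. The function plan_standby_recovery, the Action enumeration, and the modeling of local storage as sets of version and log identifiers are hypothetical; the sketch only orders the actions the text describes and performs no I/O.

```python
from enum import Enum
from typing import List, Set


class Action(Enum):
    FETCH_DATA = "request the first version data from the shared buffer pool"
    FETCH_LOG = "request (or wait for) the first log from the host"
    REPLAY = "perform step S460 log playback"


def plan_standby_recovery(local_pages: Set[str], local_logs: Set[str],
                          first_version: str, first_log: str) -> List[Action]:
    """Decide what the standby machine must do before log playback (hypothetical helper)."""
    actions: List[Action] = []
    # First version data missing from the file disk and buffer pool: pull it from the pool.
    if first_version not in local_pages:
        actions.append(Action.FETCH_DATA)
    # First log missing from the buffer pool and log disk: ask the host or wait for it.
    if first_log not in local_logs:
        actions.append(Action.FETCH_LOG)
    # With both inputs available, play back the log.
    actions.append(Action.REPLAY)
    return actions
```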
  • Optionally, a check code may be appended to the end of the first version data; the standby machine uses the check code to determine whether the received first version data is complete. It should be understood that in the active-standby replication scenario, the standby machine continuously obtains memory pages from the shared server and plays back the logs sent by the host to achieve active-standby replication. If the shared server is still writing the first version data into the standby machine when the standby machine determines that the first version data exists in its buffer pool, the check code can be used to further confirm whether that data is complete.
  • If the data is incomplete, the standby machine 200 can send an acquisition request to the shared server 300, avoiding a log playback failure caused by incomplete first version data; if the data is complete, the standby machine 200 can perform step S460.
  • The check code above is used for illustration; the shared server and the standby machine can also ensure the integrity of data communication in other ways, for example through a completion queue, which contains the completed work requests of the work queue, so that whether the first version data is complete is determined from the completion status in the queue.
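  • A check code of this kind can be sketched as a trailer appended to the page. This application does not specify the check code algorithm; CRC32 (Python's zlib.crc32) and a 4-byte big-endian trailer are assumptions chosen for illustration, and the helper names are hypothetical. A page whose trailer does not match its body is treated as incomplete, and the standby machine would request it again.

```python
import struct
import zlib

TRAILER = struct.Struct(">I")  # 4-byte big-endian check code (assumed layout)


def append_check_code(page: bytes) -> bytes:
    """Shared-server side: append a CRC32 check code to the end of the page."""
    return page + TRAILER.pack(zlib.crc32(page))


def is_complete(buf: bytes) -> bool:
    """Standby side: True only if the trailer matches the CRC32 of the body."""
    if len(buf) < TRAILER.size:
        return False
    body, trailer = buf[:-TRAILER.size], buf[-TRAILER.size:]
    (expected,) = TRAILER.unpack(trailer)
    return zlib.crc32(body) == expected
```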
  • S460 The standby machine restores the data of the second version according to the data of the first version and the first log.
  • the standby machine can restore the data of the second version according to the data of the first version and the first log.
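  • When the first version is not adjacent to the second version (for example, the second version is adjacent to a third version, which is in turn adjacent to the first version), the first log includes a second log and a third log: the second log identifies the operations that change the third version relative to the first version, and the third log identifies the operations that change the second version relative to the third version.
  • In this case, the standby machine first plays back the second log on the first version data to obtain the third version data, and then plays back the third log on the third version data to obtain the second version data.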
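  • The two-stage playback can be sketched as a fold over an ordered list of logs. Purely for illustration, a page is modeled as a byte string and each log as a list of hypothetical (offset, new_bytes) writes; real database redo records are far richer. With one log the call covers the adjacent-version case, with two logs it covers the first, third, second version chain above, and with none it returns the first version data unchanged.

```python
from functools import reduce
from typing import List, Tuple

# A log is modeled (hypothetically) as a list of (offset, new_bytes) writes.
Log = List[Tuple[int, bytes]]


def replay(page: bytes, log: Log) -> bytes:
    """Apply every write recorded in one log to a copy of the page."""
    buf = bytearray(page)
    for offset, new_bytes in log:
        buf[offset:offset + len(new_bytes)] = new_bytes
    return bytes(buf)


def restore(first_version_data: bytes, logs: List[Log]) -> bytes:
    """Replay logs in order, e.g. [second log, third log]: first -> third -> second version."""
    return reduce(replay, logs, first_version_data)
```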
  • In some cases, when the standby machine performs primary-standby replication or failover, or when the host performs fault recovery, data recovery can be completed as soon as the first version data is obtained from the shared server 300, without any log playback, which greatly improves efficiency.
  • For example, the host 100 obtains the log LOGV2 and the V2 version of memory page 1 after modifying the V1 version of memory page 1, and obtains the log LOGV3 after modifying the V2 version of memory page 1.
  • The host 100 can send the log LOGV2 and the log LOGV3 to the standby machine 200 and write the V3 version of memory page 1 into the shared buffer pool 310. If the standby machine 200 then performs failover, it can first obtain the data version relationship of memory page 1 from the shared buffer pool 310 and learn that the recovery point of memory page 1 is the V3 version and the end point is also the V3 version. The version gap between the two is 0, which means that this failover requires no log playback.
  • The standby machine 200 can then determine that the V3 version of memory page 1 does not exist locally, obtain the V3 version of memory page 1 from the shared buffer pool 310, and thereby complete this failover, which greatly improves failover efficiency. Active-standby replication efficiency can be improved in the same way. It should be understood that the above example is for illustration only and does not constitute a specific limitation.
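  • The version-gap check can be sketched as a function that lists the logs needing playback between a recovery point and an end point. It assumes version tags of the form V<n> and, following the example above, that log LOGV<n> describes the change from V<n-1> to V<n>; the function names are hypothetical. A gap of 0 yields an empty list, meaning the page fetched from the shared server is already the latest version.

```python
from typing import List


def version_number(tag: str) -> int:
    """Parse a version tag such as "V4" into 4 (tag format assumed from the examples)."""
    if not (tag.startswith("V") and tag[1:].isdigit()):
        raise ValueError(f"unexpected version tag: {tag!r}")
    return int(tag[1:])


def logs_to_replay(recovery_point: str, end_point: str) -> List[str]:
    """Return the logs to play back, oldest first, to move from recovery_point to end_point."""
    start = version_number(recovery_point)
    end = version_number(end_point)
    if end < start:
        raise ValueError("end point precedes recovery point")
    # Assumed naming: LOGV<n> records the change from V<n-1> to V<n>.
    return [f"LOGV{n}" for n in range(start + 1, end + 1)]
```

  • For the example above, logs_to_replay("V3", "V3") returns an empty list; for a recovery point of V4 and an end point of V5 it returns ["LOGV5"].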
  • S470 The host 100 obtains the data version relationship from the shared server 300, and obtains the first version data and the first log according to the data version relationship. It should be understood that steps S470 to S480 occur in the fault recovery scenario of the host 100; for the process in which the host 100 obtains the data version relationship and the first version data from the shared server 300, reference may be made to step S440 and its optional steps, which will not be repeated here.
  • Specifically, the host 100 can first determine whether the first version data is stored in its file disk; if not, it can obtain the first version data from the shared server 300, and then obtain the first log from its log disk.
  • S480 The host 100 restores the second version data according to the first version data and the first log. For this step, reference may be made to the foregoing step S460 and its optional steps, which will not be repeated here.
  • For example, the database system includes a host 1, a standby machine 1, and a shared server 1, where host 1 includes buffer pool 1, log disk 1, and file disk 1; standby machine 1 includes buffer pool 2, log disk 2, and file disk 2; and shared server 1 includes shared buffer pool 1.
  • Standby machine 1 first performs primary-standby replication of host 1. After host 1 generates the V5 version of memory page 1, host 1 fails and cannot be restarted for the time being, so standby machine 1 performs failover; afterwards, host 1 restarts successfully and starts data recovery.
  • The data processing method provided by this application includes the following steps 1 to 11, where steps 1 to 4 are the primary-standby replication scenario, steps 5 to 8 are the standby failover scenario, and steps 9 to 11 are the host fault recovery scenario.
  • Step 1 Host 1 modifies the V1 version of memory page 1 into the V2 version, writes the V2 version of memory page 1 into shared buffer pool 1, and sends the information of the current latest version of memory page 1 to shared buffer pool 1. For this step, reference may be made to steps S410 to S420 in the foregoing content, which will not be repeated here.
  • FIG. 6 takes the case in which the current latest version in step 1 is V3 as an example.
  • Specifically, host 1 copies the V2 version of memory page 1 into the cache used for RDMA. Before the RDMA step is executed, host 1 has already modified memory page 1 to the V3 version, so host 1 writes the V2 version of memory page 1 into shared buffer pool 1 by RDMA and also sends shared buffer pool 1 the information that the latest version of memory page 1 is V3.
  • Step 2 Shared server 1 records or updates the data version relationship of memory page 1 based on the received V2 version of memory page 1 and the latest version information V3. Specifically, the memory pages in the shared buffer pool can be updated through the linked list: the historical version of memory page 1 is deleted from the shared buffer pool, and then the recovery point and end point of memory page 1 are updated. For the content not described in this step, reference may be made to the foregoing step S430, which will not be repeated here.
  • Specifically, shared server 1 can first determine whether a historical version of memory page 1 exists in shared buffer pool 1. If so, it first deletes that historical version and then places the V2 version of memory page 1 at the end of the linked list in the shared buffer pool; if not, it places the V2 version of memory page 1 directly at the end of the linked list. It then points the recovery point of memory page 1 to version V2 to obtain recovery point 1, and points the end point of memory page 1 to version V3 to obtain end point 1.
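  • Steps 2 and 6 can be sketched as one update routine on the shared server. The names VersionedPool, VersionRelation, and on_write are hypothetical, and an OrderedDict stands in for the linked list; the sketch only mirrors the order of operations in the text: delete the historical version, append the new version at the tail, then repoint the recovery point and the end point.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class VersionRelation:
    recovery_point: str  # version of the page currently held in the shared buffer pool
    end_point: str       # latest version the host reported for the page


class VersionedPool:
    """Hypothetical sketch of steps 2 and 6: store the page, then update its relation."""

    def __init__(self) -> None:
        self.pages = OrderedDict()  # page_id -> (version, data); head is the coldest
        self.relations = {}         # page_id -> VersionRelation

    def on_write(self, page_id: int, version: str, data: bytes, latest: str) -> None:
        # Delete the historical version first, if the pool still holds one.
        if page_id in self.pages:
            del self.pages[page_id]
        # Place the newly received version at the tail of the list.
        self.pages[page_id] = (version, data)
        # Point the recovery point at the stored version and the end point at the latest.
        self.relations[page_id] = VersionRelation(recovery_point=version, end_point=latest)
```

  • Replaying the example, writing V2 with latest version V3 yields recovery point V2 and end point V3 (recovery point 1 and end point 1); writing V4 with latest version V5 replaces the V2 copy and yields recovery point V4 and end point V5 (recovery point 2 and end point 2).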
  • Step 3 Standby machine 1 obtains the data version relationship of memory page 1 from shared buffer pool 1, and obtains the V2 version of memory page 1 and the log LOGV3 according to the data version relationship.
  • For this step, reference may be made to steps S440 to S450 in the foregoing content, which will not be repeated here.
  • Specifically, standby machine 1 can first determine whether memory page 1 exists in its file disk or buffer pool. If it does not, standby machine 1 can send an acquisition request for memory page 1 to shared buffer pool 1, and shared buffer pool 1 returns the V2 version of memory page 1 to the standby machine. If the V2 version of memory page 1 exists in the file disk or buffer pool of standby machine 1, standby machine 1 can further determine whether it is complete, for example by verifying its check code. For optional implementations, refer to the foregoing step S450, which will not be repeated here.
  • If the V2 version of memory page 1 is incomplete, it can be obtained again from the shared buffer pool; if it is complete, step 4 can be performed to play back the log. It should be understood that if standby machine 1 has not yet received the log LOGV3 sent by host 1, standby machine 1 may send an acquisition request to the host, or perform step 4 after receiving the log LOGV3 from host 1.
  • Step 4 Based on the data version relationship, standby machine 1 plays back the log LOGV3 on the V2 version of memory page 1 obtained in step 3 to obtain the V3 version of memory page 1, the current latest version, so that standby machine 1 synchronizes the memory pages of host 1 and primary-standby replication is achieved. For the content not described in this step, reference may be made to the foregoing step S460, which will not be repeated here.
  • It should be understood that the step in which host 1 sends the log to standby machine 1 can occur at any time between step 1 and step 4, which is not limited in this application.
  • Step 5 The host 1 modifies the memory page 1 of the V3 version to the V4 version, and writes the memory page 1 of the V4 version to the shared buffer pool 1.
  • FIG. 6 takes the case in which the current latest version in step 5 is V5 as an example.
  • Specifically, host 1 copies the V4 version of memory page 1 into the cache used for RDMA. Before the RDMA step is executed, host 1 has already modified memory page 1 to the V5 version, so host 1 writes the V4 version of memory page 1 into shared buffer pool 1 by RDMA and also sends shared buffer pool 1 the information that the latest version of memory page 1 is V5.
  • Step 6 Shared server 1 updates the data version relationship of memory page 1 according to the received V4 version of memory page 1 and the latest version information V5. Specifically, the memory pages in the shared buffer pool can be updated through the linked list: the historical version of memory page 1 is deleted from the shared buffer pool (the V2 version of memory page 1 received in step 1 is deleted, and the newly received V4 version is stored at the end of the linked list), and then the recovery point and end point of memory page 1 are updated.
  • For the content not described in this step, reference may be made to the foregoing step S430 and step 2, and details are not repeated here.
  • Specifically, shared server 1 can first determine whether a historical version of memory page 1 exists in shared buffer pool 1. Because the V2 version of memory page 1 received in step 1 is still there, shared server 1 first deletes it and then places the V4 version of memory page 1 at the end of the linked list in the shared buffer pool. It then points the recovery point of memory page 1 from version V2 to version V4 to obtain recovery point 2, and points the end point of memory page 1 from version V3 to version V5 to obtain end point 2.
  • Step 7 Standby machine 1 receives a message that host 1 has failed and cannot be recovered for the time being, starts failover, obtains the data version relationship of memory page 1 from shared buffer pool 1, and obtains the V4 version of memory page 1 and the log LOGV5 according to the data version relationship. For this step, reference may be made to steps S440 to S450 and step 3 in the foregoing content, which will not be repeated here.
  • Step 8 Based on the version gap between recovery point 2 and end point 2 of memory page 1, that is, the gap between the V4 version and the V5 version, standby machine 1 determines that the valid log required for this log playback is LOGV5. It then plays back the valid log LOGV5 on the V4 version of memory page 1 obtained in step 7 to obtain the V5 version of memory page 1, the latest version, completing the failover of standby machine 1. For this step, reference may be made to step S460 and step 4 in the foregoing content, and details are not repeated here.
  • Step 9 After receiving the fault recovery command, host 1 obtains the data version relationship from shared buffer pool 1, obtains the V4 version of memory page 1 from the local file disk or shared server 1 according to the data version relationship, and obtains the log LOGV5 from the local log disk.
  • For this step, reference may be made to step S470, step 3, and step 7 in the foregoing content, and details are not repeated here.
  • Step 10 According to the recovery dependency relationship recorded in the data version relationship, host 1 plays back the log LOGV5 on the V4 version of memory page 1 obtained in step 9 to obtain the V5 version of memory page 1, the latest version, completing host fault recovery. For the content not described in this step, reference may be made to the foregoing step S480, step 4, and step 8, which will not be repeated here.
  • In summary, the host can write modified memory pages into the shared buffer pool of the shared server over the network, which reduces the resources the host consumes to persistently store memory pages and lets the host persist memory pages more frequently, thereby reducing the amount of log playback required during data recovery.
  • In some scenarios, no log playback is needed at all and the latest version of a memory page is obtained directly from the shared server. This improves the efficiency of log playback and data recovery, and thus the efficiency of host fault recovery, standby failover, and active-standby replication in the database system.
  • FIG. 7 is a schematic structural diagram of a host 100 provided by the present application.
  • The host 100 includes a sending unit 710, an obtaining unit 720, and a restoring unit 730.
  • The sending unit 710 is configured to send the first version data and the information of the current latest version to the shared server, where the second version data is the latest data in the host at the current moment and the information of the current latest version includes the information of the second version. The shared server is connected to the host and the standby machine; the host is used to receive read and write requests to the database system, and the standby machine is a device used to back up data in the database system.
  • the sending unit 710 is further configured to send a first log to the standby machine, where the first log is used to identify an operation process of changing the second version relative to the first version, and the first version is earlier than the second version.
  • The host 100 further includes the obtaining unit 720 and the restoring unit 730. The obtaining unit 720 is configured to obtain the data version relationship recorded by the shared server, where the data version relationship records the recovery dependency relationship between data of different versions and is obtained by the shared server from the first version data and the current latest version information sent by the host; the obtaining unit 720 is further configured to obtain the first version data and the first log according to the data version relationship; and the restoring unit 730 is configured to restore the second version data in the host according to the first version data and the first log.
  • the sending unit 710 is configured to write the data of the first version into the shared server by using the remote direct memory access (RDMA) method, and send the information of the current latest version to the shared server.
  • Optionally, that the first version is earlier than the second version includes the case in which the first version is adjacent to the second version.
  • Optionally, that the first version is earlier than the second version includes the case in which the second version is adjacent to a third version and the third version is adjacent to the first version. In this case, the first log includes a second log and a third log, where the second log identifies the operations that change the third version relative to the first version, and the third log identifies the operations that change the second version relative to the third version. The restoring unit 730 is configured to restore the third version data in the host according to the first version data and the second log, and then to restore the second version data in the host according to the third version data and the third log.
  • The host provided by this application can write modified memory pages into the shared buffer pool of the shared server over the network, which reduces the resources the host consumes to persistently store memory pages and increases how often the host persists them, thereby reducing the amount of log playback required during data recovery.
  • In some scenarios, no log playback is needed at all and the latest version of a memory page is obtained directly from the shared server, which improves the efficiency of log playback and data recovery and, in turn, the efficiency of host fault recovery, standby failover, and active-standby replication in the database system.
  • The host in this embodiment may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The host 100 according to this embodiment may correspondingly perform the methods described in this embodiment, and the above and other operations and/or functions of the units in the host 100 are respectively intended to implement the corresponding procedures of the methods in FIG. 2 to FIG. 6. For brevity, details are not repeated here.
  • FIG. 8 is a schematic structural diagram of a shared server 300 provided by the present application.
  • The shared server 300 includes a receiving unit 810, a generating unit 820, and a sending unit 830.
  • The receiving unit 810 is configured to receive the first version data and the information of the current latest version sent by the host, where the second version data is the latest data in the host at the current moment and the information of the current latest version includes the information of the second version. The shared server is connected to the host and the standby machine; the host is used to receive read and write requests to the database system, and the standby machine is a device used to back up data in the database system.
  • the generating unit 820 is configured to generate a data version relationship according to the first version data and the information of the current latest version, and the data version relationship is used to record the restoration dependency relationship of data of different versions.
  • the sending unit 830 is configured to send the data version relationship to the standby machine.
  • The receiving unit 810 is further configured to receive an acquisition request for the first version data sent by the standby machine, where the standby machine generates the acquisition request after determining, according to the data version relationship, that the data version it stores is earlier than the first version; the sending unit 830 is further configured to send the first version data to the standby machine.
  • The sending unit 830 is configured to send the first version data and a check code to the standby machine, where the check code is located at the end of the first version data and is used by the standby machine to determine that the received first version data is complete.
  • The shared server 300 further includes a deletion unit 840, configured to store the first version data and delete data whose version is earlier than the first version.
  • the deletion unit 840 is further configured to delete the earliest received data when the amount of stored data reaches a threshold.
  • The shared server provided by this application can receive and store memory pages sent by the host over the network, which reduces the resources the host consumes to persistently store memory pages and increases how often the host persists them, thereby reducing the amount of log playback required during data recovery.
  • In some scenarios, no log playback is needed at all and the latest version of a memory page can be obtained directly from the shared server, which improves the efficiency of log playback and data recovery and makes host fault recovery, standby failover, and primary-standby replication in the database system more efficient.
  • The shared server in this embodiment may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The shared server 300 according to this embodiment may correspondingly perform the methods described in this embodiment, and the above and other operations and/or functions of the units in the shared server 300 are respectively intended to implement the corresponding procedures of the methods in FIG. 2 to FIG. 6. For brevity, details are not repeated here.
  • FIG. 9 is a schematic structural diagram of a standby machine 200 provided by this application.
  • The standby machine 200 includes an obtaining unit 910 and a recovery unit 920.
  • the obtaining unit 910 is used to obtain the data version relationship recorded by the shared server, the data version relationship is used to record the recovery dependency relationship of data of different versions, the backup machine is a device used for backing up data in the database system, and the shared server is connected to the backup machine and the host , the host is used to receive read and write requests to the database system;
  • the obtaining unit 910 is configured to obtain the first version data and the first log according to the data version relationship, wherein the second version data is the latest data in the host at the current moment, and the first log is used to identify the change of the second version relative to the first version. Operation process, the first version is earlier than the second version;
  • the restoration unit 920 is configured to restore the data of the second version in the standby machine according to the data of the first version and the first log.
  • Optionally, that the first version is earlier than the second version includes the case in which the first version is adjacent to the second version.
  • Optionally, that the first version is earlier than the second version includes the case in which the second version is adjacent to a third version and the third version is adjacent to the first version. In this case, the first log includes a second log and a third log, where the second log identifies the operations that change the third version relative to the first version, and the third log identifies the operations that change the second version relative to the third version. The recovery unit 920 is configured to restore the third version data in the standby machine according to the first version data and the second log, and then to restore the second version data in the standby machine according to the third version data and the third log.
  • The obtaining unit 910 is configured to determine, according to the data version relationship, that the data version stored by the standby machine is earlier than the first version; the obtaining unit 910 is then configured to obtain the first version data from the shared server, and the standby machine receives the first log sent by the host.
  • With the standby machine provided by this application, after the host writes memory pages to the shared server over the network for persistent storage, the standby machine can obtain those memory pages from the shared server for log playback. Because the host persists memory pages through the shared server, it consumes fewer resources doing so and can persist memory pages more frequently, which reduces the amount of log playback required for data recovery. In some scenarios no log playback is needed at all, and the latest version of a memory page is obtained directly from the shared server, which improves the efficiency of log playback and data recovery and thus of host fault recovery, standby failover, and active-standby replication in the database system.
  • The standby machine in this embodiment may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The standby machine 200 according to this embodiment may correspondingly perform the methods described in this embodiment, and the above and other operations and/or functions of the units in the standby machine 200 are respectively intended to implement the corresponding procedures of the methods in FIG. 2 to FIG. 6. For brevity, details are not repeated here.
  • FIG. 10 is a schematic structural diagram of a data processing device 1000 provided by an embodiment of the present application.
  • the device 1000 for data processing may be the host, standby or shared server in the foregoing content.
  • The device 1000 for data processing includes a processor 1010, a communication interface 1020, and a memory 1030.
  • the processor 1010, the communication interface 1020 and the memory 1030 can be connected to each other through the internal bus 1040, and can also communicate through other means such as wireless transmission.
  • the embodiments of the present application take the connection via the bus 1040 as an example, and the bus 1040 may be a peripheral component interconnect express (PCIe) bus or an extended industry standard architecture (EISA) bus or the like.
  • The bus 1040 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or only one type of bus.
  • This embodiment may be implemented by a general-purpose physical server, for example an X86 server; by a virtual machine (VM) built on a general-purpose physical server combined with network functions virtualization (NFV) technology, where a virtual machine is a complete computer system with full hardware system functionality that is simulated by software and runs in a completely isolated environment; or by a server cluster composed of multiple such physical servers or virtual machines. This is not specifically limited in this application.
  • The processor 1010 may consist of at least one general-purpose processor, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The processor 1010 executes various types of digitally stored instructions, such as software or firmware programs stored in the memory 1030, which enable the device 1000 to provide a wide variety of services.
  • When the device 1000 is the host, the memory 1030 stores program code that is executed under the control of the processor 1010 to perform the processing steps of the host in the embodiments of FIG. 1 to FIG. 6.
  • The program code may include one or more software modules.
  • The one or more software modules may be those provided in the embodiment shown in FIG. 7, such as a sending unit, an obtaining unit, and a restoring unit.
  • The sending unit may be used by the host to send the first version data and the information about the current latest version to the shared server.
  • The obtaining unit may be used to obtain the data version relationship recorded by the shared server, and to obtain the first version data and the first log according to the data version relationship.
  • The restoring unit may be used to restore the second version data in the host according to the first version data and the first log.
  • These modules may be used to perform steps S410 to S420 and S470 to S480 and their optional steps in the embodiment of FIG. 3, steps 1 and 4 and their optional steps in the embodiment of FIG. 6, and other steps described in the embodiments of FIG. 1 to FIG. 6, which are not repeated here.
  • When the device 1000 is the shared server, the memory 1030 stores program code that is executed under the control of the processor 1010 to perform the processing steps of the shared server in the embodiments of FIG. 1 to FIG. 6.
  • The program code may include one or more software modules.
  • The one or more software modules may be those provided in the embodiment shown in FIG. 8, such as a receiving unit, a generating unit, and a sending unit.
  • The receiving unit is used to receive the first version data and the information about the current latest version sent by the host.
  • The generating unit is used to generate the data version relationship from the first version data and the information about the current latest version.
  • The sending unit is used to send the data version relationship to the standby machine.
  • These modules may be used to perform step S430 and its optional steps in the embodiment of FIG. 3, steps 2 and 6 and their optional steps in the embodiment of FIG. 6, and other steps described in the embodiments of FIG. 1 to FIG. 6, which are not repeated here.
  • When the device 1000 is the standby machine, the memory 1030 stores program code that is executed under the control of the processor 1010 to perform the processing steps of the standby machine in the embodiments of FIG. 1 to FIG. 6.
  • The program code may include one or more software modules.
  • The one or more software modules may be those provided in the embodiment shown in FIG. 9, such as an obtaining unit and a restoring unit.
  • The obtaining unit is used to obtain the data version relationship recorded by the shared server, and to obtain the first version data and the first log according to the data version relationship.
  • The restoring unit is used to restore the second version data in the standby machine according to the first version data and the first log.
  • These modules may be used to perform steps S440 to S460 and their optional steps in the embodiment of FIG. 3, steps 4 to 5 and 7 to 8 and their optional steps in the embodiment of FIG. 6, and other steps described in the embodiments of FIG. 1 to FIG. 6, which are not repeated here.
  • The memory 1030 may include volatile memory, such as random access memory (RAM); the memory 1030 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1030 may also include a combination of the above types.
  • The memory 1030 may store program code, which may specifically include program code for performing other steps described in the embodiment of FIG. 4 or FIG. 5; details are not repeated here.
  • The memory 1030 may include a buffer pool, a file disk, and a log disk; when the data processing device 1000 is the shared server described above, the memory 1030 may include a shared buffer pool.
  • The communication interface 1020 may be a wired interface (such as an Ethernet interface), an internal interface (such as a Peripheral Component Interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless local area network interface), and is used to communicate with other devices or modules.
  • The data processing device 1000 according to the embodiments of this application may correspond to the host 100, the shared server 300, or the standby machine 200, and to the entity that performs the method shown in FIG. 3. The above and other operations and/or functions of the modules in the device 1000 respectively implement the corresponding procedures of the methods in FIG. 2 to FIG. 6; for brevity, details are not repeated here.
  • FIG. 10 shows only one possible implementation of the embodiments of this application.
  • The data processing device 1000 may include more or fewer components, which is not limited here.
  • The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof.
  • The above embodiments may be implemented wholly or partly in the form of a computer program product.
  • The computer program product includes at least one computer instruction.
  • When the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of the present invention are generated wholly or partly.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or by wireless means.
  • The computer-readable storage medium may be any available medium accessible by a computer, or a data storage node, such as a server or data center, that integrates at least one available medium.
  • The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), or a semiconductor medium.
  • The semiconductor medium may be an SSD.

Abstract

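A data processing method includes the following steps. A standby machine obtains a data version relationship recorded by a shared server, and then obtains first version data and a first log according to the data version relationship. Second version data is the latest data in a host at the current moment, and the first log identifies the operations by which the second version changed relative to the first version. The standby machine restores the second version data locally according to the first version data and the first log. Because the host writes the first version data to the shared server, the host or the standby machine can obtain data from the shared server during data backup or data recovery, which reduces the time required for data backup and data recovery.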

Description

Data processing method, device, and system
Technical Field
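This application relates to the computer field, and in particular, to a data processing method, device, and system.
Background
To ensure database throughput, modern databases do not apply data modifications directly to the data file. Instead, the data is modified in a buffer pool and the modification is recorded as a log, which is stored on a log disk. The memory pages in the buffer pool are written to a file disk only when the buffer pool is full or another preset condition is met.
Because the buffer pool holds some data that has not yet been written to the file disk, the standby machine or host has to take extra steps when a standby machine backs up data or a failed host recovers data. It must obtain a historical version of the memory page from the file disk and then replay the logs stored on the log disk in order before the latest version of the data can be obtained. If the buffer pool holds a large amount of data not yet written to the file disk, log replay takes a long time. This lowers both the backup efficiency of the standby machine and the recovery efficiency after the host fails, so the whole data backup and failure recovery process is slow.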
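Summary
The present invention provides a data processing method, device, and system to solve the problem that data backup and failure recovery take a long time.
According to a first aspect, a data processing method is provided. A standby machine obtains a data version relationship recorded by a shared server, and obtains first version data and a first log according to the data version relationship. The first log identifies the operations by which the second version changed relative to the first version, and the data version relationship records the recovery dependencies between different versions of data. The standby machine restores the second version data locally from the first version data and the first log according to the recovery dependencies. A recovery dependency may refer to the order of log replay. For example, version V2 is first obtained by replaying log LOGV2 on the V1 memory page, and version V3 is then obtained by replaying log LOGV3 on the V2 memory page.
Writing memory pages to the shared server over the network consumes few system resources on the host, so the host can persist memory pages more frequently. When the standby machine recovers data, it can obtain the memory page from the shared server, and the version gap between that page and the latest log stored on the log disk is small. Fewer log replays are therefore needed to restore the latest version of the data. In some scenarios no log replay is needed at all, because the latest version of the memory page can be obtained directly from the shared server. This improves the efficiency of data backup on the standby machine and of standby failover after the host fails.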
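In a possible implementation, the first version data is the first version of a memory page, the second version data is the second version of the same memory page, and the first version is earlier than the second version.
Optionally, "the first version is earlier than the second version" means that the first version is adjacent to the second version, so no other version exists between them. In this case the standby machine can restore the second version data with a single log replay based on the first version data and the first log.
Optionally, "the first version is earlier than the second version" may also mean that other versions exist between them. For example, the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log may then include a second log and a third log. The second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. In this case, the standby machine may first restore the third version data from the first version data and the second log, and then restore the second version data from the third version data and the third log.
Optionally, the first version may also be the same as the second version, meaning that the first version data stored in the shared server is already the latest version of the data on the host. The standby machine then needs no log replay and completes the data recovery by obtaining the first version data from the shared server. This improves the efficiency of data backup on the standby machine and of standby failover after the host fails.
In a possible implementation, when obtaining the first version data and the first log according to the data version relationship, the standby machine may first determine that the version of the data it stores is earlier than the first version, and only then obtain the first version data from the shared server. If the version stored on the standby machine is the same as the first version, the standby machine can obtain the first version data locally.
The standby machine first confirms that it does not already hold the first version data before requesting it from the shared server. This avoids fetching data that already exists locally and improves the efficiency of data backup and data recovery on the standby machine.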
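According to a second aspect, a data processing method is provided. A host sends first version data and information about the current latest version to a shared server, and sends a first log to a standby machine. The second version data is the latest data in the host at the current moment, and the information about the current latest version includes information about the second version. The shared server is connected to the host and the standby machine. The host receives read and write requests for the database system, and the standby machine is a device in the database system used to back up data. The first log identifies the operations by which the second version changed relative to the first version.
In a specific implementation, the host may write memory pages to the shared server in any of the following ways:
- periodically, for example, writing the latest version of a memory page every minute;
- by modification count, for example, writing the latest version each time a memory page has been modified 5 times;
- by modification volume, for example, writing the latest version once 50 GB of memory pages have been modified.
The frequency may be determined empirically and is not specifically limited in this application.
Writing memory pages to the shared server over the network consumes few system resources on the host, so the host can persist memory pages more frequently. When the host or the standby machine recovers data, it can obtain the memory page from the shared server, and the version gap between that page and the logs stored on the log disk is small. Fewer log replays are needed to restore the latest version of the data. In some scenarios none are needed, because the latest version of the memory page can be obtained directly from the shared server. This improves the efficiency of data backup on the standby machine, and of host failure recovery or standby failover after the host fails.
In a possible implementation, when the host fails and performs failure recovery, the host may first obtain the data version relationship recorded by the shared server. The data version relationship records the recovery dependencies between different versions of data. The shared server derives it from the first version data and the information about the current latest version sent by the host. The host then obtains the first version data and the first log according to the data version relationship, and finally restores the second version data in the host according to the first version data and the first log.
Because the host can persist memory pages more frequently at low cost, the version gap is small when the host recovers data after a failure. This gap lies between the memory page obtained from the shared server and the latest log stored on the log disk. Fewer log replays are needed for data recovery, and in some scenarios none at all, which improves the efficiency of failure recovery after the host fails.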
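In a possible implementation, the host may write the first version data to the shared server by remote direct memory access (RDMA), and send the information about the current latest version to the shared server.
RDMA is a direct memory access technology. An intelligent network interface card (iNIC) transfers memory pages from the memory of the host to the memory of the remote shared server without involving the CPUs on either side. This eliminates the copy and context-switch overhead that the remote shared server would otherwise incur when receiving data, giving low-latency, low-overhead, high-bandwidth data transfer. The host can therefore persist memory pages through the shared server more frequently. When the host recovers data after a failure, the version gap between the memory page obtained from the shared server and the logs stored on the log disk is small, fewer log replays are needed, and failure recovery is faster.
In a possible implementation, the first version data is the first version of a memory page, the second version data is the second version of the same memory page, and the first version is earlier than the second version.
Optionally, "the first version is earlier than the second version" means that the first version is adjacent to the second version, so no other version exists between them. In this case the host can restore the second version data with a single log replay based on the first version data and the first log.
Optionally, "the first version is earlier than the second version" may also mean that other versions exist between them. For example, the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log may then include a second log and a third log. The second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. In this case, the host may first restore the third version data from the first version data and the second log, and then restore the second version data from the third version data and the third log.
Optionally, the first version may also be the same as the second version, meaning that the first version data stored in the shared server is already the latest version of the data on the host. The host then needs no log replay and completes the data recovery by obtaining the first version data from the shared server, which improves the efficiency of failure recovery after the host fails.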
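According to a third aspect, a data processing method is provided. A shared server receives first version data and information about the current latest version sent by the host. The second version data is the latest data in the host at the current moment, and the information about the current latest version includes information about the second version. The shared server then generates a data version relationship from the first version data and the information about the current latest version, and sends the data version relationship to the standby machine. The data version relationship records the recovery dependencies between different versions of data.
As in the first aspect, the host can persist memory pages to the shared server frequently and at low cost. The memory page that the standby machine obtains from the shared server is therefore close in version to the latest log, so few log replays, or none, are needed. This improves the efficiency of data backup on the standby machine and of standby failover after the host fails.
In a specific implementation, the shared buffer pool in the shared server may record the data version relationship by recording a start point, a recovery point, and an end point for each memory page:
- The start point is the earliest version of the memory page that appeared in the shared buffer pool.
- The recovery point is the version of the first version data received by the shared server from the host, that is, the first version.
- The end point is the version indicated by the information about the current latest version received by the shared server from the host, that is, the second version.
For example, if the first version of memory page 1 received by the shared buffer pool is V0, the shared buffer pool records V0 as the start point of memory page 1. Later, the shared buffer pool receives version V2 of memory page 1 from the host together with the information that the latest version of memory page 1 is V3. It then records V2 as the recovery point and V3 as the end point of memory page 1. Each time the shared buffer pool receives first version data and information about the current latest version from the host, it can update the recovery point and end point of the corresponding memory page. In this way, it records the recovery dependencies between different versions of data.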
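In a possible implementation, the standby machine may send an acquisition request for the first version data to the shared server according to the received data version relationship. The shared server receives the request and returns the first version data to the standby machine. The standby machine generates the request after determining, according to the data version relationship, that the version of the data it stores is earlier than the first version. If the standby machine needs to restore the latest version of memory page 1, the request may carry the identifier of memory page 1. In response, the shared server writes the first version data into the standby machine by RDMA. If the shared server and the standby machine are the same server, the shared server may instead write the first version data into the buffer pool of the standby machine by DMA.
As in the first aspect, checking for a local copy before requesting the first version data avoids fetching data that already exists locally. This improves the efficiency of data backup and data recovery on the standby machine.
In a possible implementation, when sending the first version data to the standby machine, the shared server may append a check code to the end of the first version data. The standby machine uses the check code to determine that the received first version data is complete.
When the standby machine finds the first version data in its buffer pool, it can use the check code to confirm whether that data is complete. If the data is incomplete, the standby machine can send an acquisition request to the shared server, which prevents log replay from failing because of incomplete first version data. If the data is complete, the standby machine can proceed directly to log replay. The shared server and the standby machine may also ensure the integrity of the data communication by means other than a check code. One example is a completion queue, which holds the completed work requests from the work queue; whether the first version data is complete is then determined from the completion status in the queue.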
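In a possible implementation, the shared server may store the first version data and delete data whose version is earlier than the first version. Specifically, the shared buffer pool may manage the received memory pages. Each time a new memory page is received, its historical versions are deleted, which improves memory utilization. Memory pages that have not been modified for a long time may also be deleted to save memory, while improving the recovery efficiency of frequently modified memory pages (hot pages).
In a specific implementation, the shared server may manage the shared buffer pool with a linked list that follows the first-in, first-out principle. Each time the host sends first version data, the shared server first checks whether the linked list holds a historical version of the corresponding memory page. If it does, the historical version is deleted and the first version data is placed at the tail of the list. If not, the first version data is placed directly at the tail of the list. The linked list may be implemented with the least recently used (LRU) algorithm. LRU gives each memory page in the shared buffer pool an access field that records the time t elapsed since the page was last accessed, and evicts the page with the largest t, that is, the page that has gone unused the longest. The linked list may also be implemented with other algorithms, which is not limited in this application.
Because the shared server deletes data whose version is earlier than the first version, each memory page has one and only one version stored in the shared server. This greatly reduces the space occupied by the shared buffer pool and improves memory utilization.
In a possible implementation, when the data stored by the shared server reaches a threshold, the earliest received data may be deleted. In a specific implementation, the linked list may have a fixed length. After a page is added, if the list is full, the memory page at the head is removed from the list; this is the page that has not been modified for the longest time. Removing it saves memory and lets more hot pages be stored in the shared buffer pool, improving the data recovery efficiency of hot pages.
Each new page received by the shared buffer pool is placed at the tail of the linked list, so the page at the head has not been modified for a long time (a cold page). Such a page has very likely already been written to disk by the host for persistent storage, so it can be deleted from the shared buffer pool to save memory. The shared buffer pool can then store more frequently modified hot pages, improving the efficiency of host failure recovery, standby failover, and standby data backup.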
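According to a fourth aspect, a data processing system is provided, including a host, a standby machine, and a shared server. The standby machine implements the operation steps of the method described in the first aspect or any possible implementation of the first aspect. The host implements the operation steps of the second aspect or any possible implementation thereof. The shared server implements the operation steps of the third aspect or any possible implementation thereof.
According to a fifth aspect, a standby machine is provided, including an obtaining unit and a restoring unit. The obtaining unit is configured to obtain the data version relationship recorded by the shared server, which records the recovery dependencies between different versions of data. The standby machine is a device in the database system used to back up data. The shared server is connected to the standby machine and the host, and the host receives read and write requests for the database system. The obtaining unit is further configured to obtain first version data and a first log according to the data version relationship. The second version data is the latest data in the host at the current moment, the first log identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version. The restoring unit is configured to restore the second version data in the standby machine according to the first version data and the first log.
Optionally, "the first version is earlier than the second version" includes: the first version is adjacent to the second version.
Optionally, "the first version is earlier than the second version" includes: the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log includes a second log and a third log. The second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. The restoring unit is configured to restore the third version data in the standby machine according to the first version data and the second log. It then restores the second version data in the standby machine according to the third version data and the third log.
Optionally, the obtaining unit is configured to determine, according to the data version relationship, that the version of the data stored on the standby machine is earlier than the first version, and then to obtain the first version data from the shared server. The standby machine receives the first log sent by the host.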
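According to a sixth aspect, a host is provided, including a sending unit, an obtaining unit, and a restoring unit. The host uses the sending unit to send first version data and information about the current latest version to the shared server. The second version data is the latest data in the host at the current moment, and the information about the current latest version includes information about the second version. The shared server is connected to the host and the standby machine. The host receives read and write requests for the database system, and the standby machine is a device in the database system used to back up data. The sending unit is further configured to send a first log to the standby machine. The first log identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version.
Optionally, the host further includes the obtaining unit and the restoring unit. The obtaining unit is configured to obtain the data version relationship recorded by the shared server. This relationship records the recovery dependencies between different versions of data, and the shared server derives it from the first version data and the information about the current latest version sent by the host. The obtaining unit is further configured to obtain the first version data and the first log according to the data version relationship. The restoring unit is configured to restore the second version data in the host according to the first version data and the first log.
Optionally, the sending unit is configured to write the first version data to the shared server by remote direct memory access (RDMA) and to send the information about the current latest version to the shared server.
Optionally, "the first version is earlier than the second version" includes: the first version is adjacent to the second version.
Optionally, "the first version is earlier than the second version" includes: the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log includes a second log and a third log. The second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. The restoring unit is configured to restore the third version data in the host according to the first version data and the second log. It then restores the second version data in the host according to the third version data and the third log.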
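According to a seventh aspect, a shared server is provided, including a receiving unit, a generating unit, and a sending unit. The receiving unit is configured to receive first version data and information about the current latest version sent by the host. The second version data is the latest data in the host at the current moment, and the information about the current latest version includes information about the second version. The shared server is connected to the host and the standby machine. The host receives read and write requests for the database system, and the standby machine is a device in the database system used to back up data. The generating unit is configured to generate a data version relationship from the first version data and the information about the current latest version; the data version relationship records the recovery dependencies between different versions of data. The sending unit is configured to send the data version relationship to the standby machine.
Optionally, the receiving unit is further configured to receive an acquisition request for the first version data from the standby machine. The standby machine generates this request after determining, according to the data version relationship, that the version of the data it stores is earlier than the first version. The sending unit is further configured to send the first version data to the standby machine.
Optionally, the sending unit is configured to send the first version data and a check code to the standby machine. The check code is located at the end of the first version data, and the standby machine uses it to determine that the received first version data is complete.
Optionally, the shared server 300 further includes a deletion unit 840, configured to store the first version data and delete data whose version is earlier than the first version.
Optionally, the deletion unit 840 is further configured to delete the earliest received data when the amount of stored data reaches a threshold.
According to an eighth aspect, a computer program product is provided which, when run on a computer, causes the computer to perform the methods described in the above aspects.
According to a ninth aspect, a computer-readable storage medium is provided. It stores instructions which, when run on a computer, cause the computer to perform the methods described in the above aspects.
According to a tenth aspect, a data processing device is provided, including a processor configured to perform the methods described in the above aspects.
The implementations provided in the above aspects may be further combined in this application to provide more implementations.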
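Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a database system according to an embodiment;
FIG. 2 is a schematic deployment diagram of a database system according to an embodiment;
FIG. 3 is a schematic flowchart of a data processing method according to an embodiment;
FIG. 4 is an example diagram of the start point, recovery point, and end point of memory page 1 in a shared buffer pool according to an embodiment;
FIG. 5 is a schematic structural diagram of a shared buffer pool according to an embodiment;
FIG. 6 is a schematic flowchart of a data processing method in an application scenario according to an embodiment;
FIG. 7 is a schematic structural diagram of a host according to an embodiment;
FIG. 8 is a schematic structural diagram of a shared server according to an embodiment;
FIG. 9 is a schematic structural diagram of a standby machine according to an embodiment;
FIG. 10 is a schematic structural diagram of a data processing device according to an embodiment.
Detailed Description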
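To make the technical solutions of the present invention easier to understand, some terms used in the present invention are first explained.
Backup: data backup is the foundation of disaster recovery. It is the process of copying all or part of a data set from the hard disk or array of a host to another storage medium, to prevent data loss caused by operating errors or system failures.
Remote direct memory access (RDMA): RDMA is a direct memory access technology that transfers data from the memory of one server to the memory of another server without involving the operating systems on either side.
Linked list: a linked list is a storage structure whose elements are non-contiguous and non-sequential in physical storage. The logical order of its elements is given by the order of the pointers that link them, and these pointers let the list manage memory flexibly and dynamically.
Host (master): also called the primary node. The host plays the primary role in primary/standby mode and provides read and write services externally.
Standby machine (standby): also called the standby node. It plays the standby role in primary/standby mode and does not provide read and write services externally. Instead, it synchronizes data with the host by log replay.
The terms used in this implementation are only intended to explain specific embodiments and are not intended to limit the technical solutions of the present invention.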
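Next, the application scenarios of "log replay" involved in the present invention are explained.
A database can be viewed as an electronic filing cabinet: an organized, shareable, centrally managed collection of large amounts of data stored long-term in a computer. Users can add, query, update, and delete files in the database. After receiving a user's operation request, the host usually modifies in memory the memory page that holds the file, for example by writing new data at a specific address or updating data at an existing address. It also generates a corresponding log. The log records the operations the database performed when modifying the old version V1 of a memory page into the new version V2. Log replay re-executes those logged operations on the old version V1 to obtain the new version V2. Therefore, if the new version V2 is lost, it can be regained by replaying the log on the old version V1.
Logs are written to disk in modification order, and each modification generates one corresponding log. For example, suppose the original version of memory page X is V1. The first modification produces version V2 of memory page X with log LOGV2, and the second modification produces version V3 with log LOGV3. Version V2 of memory page X can then be restored from version V1 and log LOGV2, and version V3 can be restored from version V2 and log LOGV3.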
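The replay chain above (V1 + LOGV2 gives V2, V2 + LOGV3 gives V3) can be illustrated with a minimal Python sketch. Purely for illustration, it assumes that a memory page is a small key/value dictionary and that a log record is a list of (key, value) writes tagged with the version it produces. Real redo records operate on bytes or rows, and the names `Page`, `LogRecord`, `replay`, and `replay_all` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Page:
    page_id: int
    version: int
    data: Dict[str, int]


@dataclass(frozen=True)
class LogRecord:
    page_id: int
    version: int                         # version this record produces, e.g. 2 for LOGV2
    writes: Tuple[Tuple[str, int], ...]  # hypothetical redo payload: (key, value) pairs


def replay(page: Page, log: LogRecord) -> Page:
    """Redo one logged modification: page V(n) + LOGV(n+1) -> page V(n+1)."""
    if log.page_id != page.page_id:
        raise ValueError("log belongs to a different page")
    if log.version != page.version + 1:
        raise ValueError(f"cannot apply LOGV{log.version} to V{page.version}")
    new_data = dict(page.data)  # never mutate the older version in place
    for key, value in log.writes:
        new_data[key] = value
    return Page(page.page_id, log.version, new_data)


def replay_all(page: Page, logs: Iterable[LogRecord]) -> Page:
    """Apply every newer log in modification order; logs the page already reflects are skipped."""
    newer = sorted((r for r in logs if r.version > page.version), key=lambda r: r.version)
    for record in newer:
        page = replay(page, record)
    return page


v1 = Page(page_id=1, version=1, data={"a": 0})
logs = [LogRecord(1, 3, (("b", 7),)), LogRecord(1, 2, (("a", 1),))]
v3 = replay_all(v1, logs)
print(v3.version, v3.data)  # 3 {'a': 1, 'b': 7}
```

The check that each record's version is exactly one more than the page's version mirrors the rule that a log can only be replayed on the version it was generated from.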
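The latest version of a memory page is the one produced by the host's most recent modification. In the example above, V3 is the latest version, and V2 and V1 are old versions. For clarity, the following description assumes that the version number of the current latest version of a memory page on the host is greater than those of its old versions. For example, if V4 is the latest version, V1 to V3 are old versions.
Log replay typically occurs in the following application scenarios.
The first scenario is failure recovery of a host on which a database is deployed. Modern databases must keep data safe when a server crashes while still maintaining system performance. They therefore usually first modify memory pages in the buffer pool and generate the corresponding logs there. Each log is persisted first, by writing it to the log disk that stores logs. The modified memory page is persisted afterward, by writing it to the file disk that stores data files. The log disk and file disk may be two storage areas on one hard disk. However, persisting memory pages to disk consumes a large amount of resources, so not every version of a memory page is written to the file disk. The latest version is usually written only under certain conditions, for example every half hour, when 70% of the buffer pool is occupied, or whenever the modification volume of memory pages reaches 1 GB. As a result, when a user submits a transaction that modifies a memory page, each version of the memory page is stored asynchronously, whereas each version's log is stored synchronously. If the database then crashes, the memory pages in the buffer pool that have not yet been written to the file disk are lost, and the database must recover them after it restarts. To do so, it compares the version of the data pages persisted on the file disk with the latest log version on the log disk to determine which logs participate in recovery. It then replays those logs in order to restore the data in the buffer pool.
The second scenario arises when the host on which a database is deployed fails and cannot restart. A standby machine then restores the host's data and is promoted to host; this process is also called standby failover or role switchover. To keep massive amounts of data reliable, databases generally use backups to prevent data loss. The host is usually connected to multiple standby machines, each of which backs up the data pages on the host's file disk and the logs on its log disk. Therefore, during database failure recovery, the host can recover data by log replay as described above. If the host cannot restart after a failure, a standby machine can also use log replay to restore the data state the host had when it failed, achieving failover.
The third scenario is when a standby machine backs up the data in the host, also called primary/standby replication. In this scenario, the host modifies a memory page in the buffer pool, generates the corresponding log, and sends the log to the standby machine. The standby machine replays the log to obtain the latest version of the data. Modifications to data pages on the host are thus synchronized to the standby machine, so the standby machine replicates the host's memory pages. A log is much smaller than a memory page, so primary/standby replication through log replay on the standby machine reduces communication traffic.
The data processing method provided in this application also applies to other scenarios in which data is recovered by log replay. They are not listed here one by one, and this application does not limit them.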
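In conventional technologies, inefficient log replay makes failure recovery of the host 100, failover, and primary/standby replication inefficient. To solve this problem, this application provides a database system 10. As shown in FIG. 1, the system 10 includes a host 100, a standby machine 200, and a shared server 300. The host 100 communicates with the shared server 300 over a network such as Ethernet, and so does the standby machine 200; the network may be wired or wireless, which this application does not specifically limit. The host 100 and the standby machine 200 may be connected through the above network or through a bus, such as a Peripheral Component Interconnect express (PCIe) bus, which this application does not specifically limit.
The host 100, the standby machine 200, and the shared server 300 may be physical servers, such as X86 servers or ARM servers. They may also be virtual machines (VMs) implemented on general-purpose physical servers combined with network functions virtualization (NFV) technology, such as virtual machines in a cloud data center. A virtual machine is a complete computer system that is simulated by software, has complete hardware system functions, and runs in a fully isolated environment. The host 100, the standby machine 200, and the shared server 300 may also be server clusters composed of multiple such physical servers or virtual machines. They may also be other devices with storage functions, such as storage arrays. This application does not specifically limit this.
The host 100 and the standby machine 200 each include a buffer pool, a file disk, and a log disk, and the shared server 300 includes a shared buffer pool 310. FIG. 1 illustrates this with three machines:
- host 1, which includes buffer pool 1, file disk 1, and log disk 1;
- standby machine 1, which includes buffer pool 2, file disk 2, and log disk 2;
- standby machine 2, which includes buffer pool 3, file disk 3, and log disk 3.
The host 100, the standby machine 200, and the shared server 300 can be divided into unit modules in many ways; for example, they may also include processors, communication modules, and so on. FIG. 1 is only an exemplary division. Each module may be a software module, a hardware module, or partly software and partly hardware, which this application does not limit. The positional relationships between the devices and modules shown in FIG. 1 also do not constitute any limitation. In FIG. 1, file disk 1 and log disk 1 are located in the host, file disk 2 and log disk 2 in standby machine 1, file disk 3 and log disk 3 in standby machine 2, and the shared buffer pool 310 in the shared server 300. In other cases, the file disks and/or log disks may be external storage of the host and/or standby machine, and the shared buffer pool 310 may be external storage of the shared server 300. This application does not specifically limit this.
The file disk and log disk are disks or storage arrays capable of persistent storage: if the host or standby machine fails, the data on its file disk and log disk is not lost. The file disk stores data pages, which may be obtained by persisting memory pages from the buffer pool, and the log disk stores logs. Specifically, the file disk and log disk may be a hard disk drive (HDD), a solid-state drive (SSD), a hybrid hard drive (HHD), a redundant array of independent disks (RAID), and so on.
The buffer pool and the shared buffer pool 310 are memories capable of high-speed data exchange. The buffer pool in the host or standby machine may include a contiguous or non-contiguous storage area in the memory or cache of that machine. The shared buffer pool 310 may be a contiguous or non-contiguous storage area in the memory or cache of the shared server 300. When the host 100, the standby machine 200, and the shared server 300 are server clusters, the buffer pool and the shared buffer pool 310 may also be a collection of partial storage areas in the memory or cache of multiple servers. The buffer pool reads and writes very fast, and memory page modification and log generation take place in it. However, when the host or standby machine fails, the data in its buffer pool is lost.
Specifically, the buffer pool and the shared buffer pool 310 may be any of the following, which this application does not specifically limit:
- volatile memory;
- random access memory (RAM), including dynamic RAM (DRAM), static RAM (SRAM), synchronous dynamic RAM (SDRAM), and double data rate SDRAM (DDR);
- read-only memory (ROM);
- cache.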
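In this embodiment, after modifying a memory page, the host 100 may send the modified memory page to the shared server 300. What is sent may be the data stored in the memory page, or information indicating the data or data structures in the memory page, where "data" means valid data stored in the database. The standby machine 200 or the host 100 can then obtain the memory page from the shared server 300 as needed. As described above, writing memory pages to disk consumes a large amount of system resources on the host 100. In the system shown in FIG. 1, the host 100 therefore persists memory pages to disk infrequently. As a result, during log replay the version gap between the data pages on the file disk and the latest log is large, many valid logs participate in replay, and replay takes a long time. The host 100 can instead send memory pages over the network to the shared server 300, which stores them persistently. Compared with writing memory pages to disk, sending them directly to the shared server 300 consumes very few system resources. In the solution provided by this application, the host 100 can therefore persist memory pages through the shared server 300 much more frequently. When the latest version of a memory page is obtained by log replay, the version gap between the persisted memory page and the latest version is greatly reduced, so far fewer log replays are required. Replay may even be skipped entirely, with the latest version obtained directly from the shared server 300, which improves data recovery efficiency.
In a specific implementation, the host 100 may write memory pages to the shared server 300 in any of the following ways:
- periodically, for example, writing the latest version of a memory page every minute;
- by modification count, for example, writing the latest version each time a memory page has been modified 5 times;
- by modification volume, for example, writing the latest version once 50 GB of memory pages have been modified.
The frequency may be determined empirically and is not specifically limited in this application.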
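The three example triggers (every minute, every 5 modifications, every 50 GB modified) can be expressed as a small policy object. The text presents the triggers as alternatives and leaves the frequency to experience, so several choices here are assumptions. This sketch combines the triggers with a logical OR, counts modifications globally rather than per page, and reads "50G" as 50 GiB. The names `FlushPolicy` and `FlushTrigger` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlushPolicy:
    period_s: float = 60.0               # "every 1 minute"
    max_modifications: int = 5           # "every 5 modifications"
    max_dirty_bytes: int = 50 * 1024**3  # "50G modified", read here as 50 GiB (assumption)


class FlushTrigger:
    """Decides when the host should push the latest page versions to the shared server."""

    def __init__(self, policy: FlushPolicy, now: float) -> None:
        self.policy = policy
        self.last_flush = now
        self.modifications = 0
        self.dirty_bytes = 0

    def record_modification(self, nbytes: int) -> None:
        self.modifications += 1
        self.dirty_bytes += nbytes

    def should_flush(self, now: float) -> bool:
        p = self.policy
        return (
            now - self.last_flush >= p.period_s
            or self.modifications >= p.max_modifications
            or self.dirty_bytes >= p.max_dirty_bytes
        )

    def mark_flushed(self, now: float) -> None:
        self.last_flush = now
        self.modifications = 0
        self.dirty_bytes = 0


trigger = FlushTrigger(FlushPolicy(), now=0.0)
for _ in range(5):
    trigger.record_modification(16 * 1024)  # hypothetical 16 KiB page writes
print(trigger.should_flush(now=10.0))       # True: 5 modifications reached
```

A real host would typically call `mark_flushed` only after the shared server has acknowledged the write.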
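The database system 10 provided in this application can be deployed flexibly. It can be deployed with a single host 100, multiple standby machines 200, and a single shared server 300, as in the system shown in FIG. 1. It can also be deployed with multiple hosts 100, multiple standby machines 200, and multiple shared servers 300, as in the system shown in FIG. 2. That system includes two hosts 100, host 1001 and host 1002, arranged as follows:
- Standby machines 2001, 2002, and 2003 back up the data in host 1001. At a preset frequency, host 1001 writes the latest versions of memory pages to the shared buffer pool 3101 of shared server 3001 for persistent storage.
- Standby machines 2004, 2005, and 2006 back up the data in host 1002. At a preset frequency, host 1002 writes the latest versions of memory pages to the shared buffer pool 3102 of shared server 3002 for persistent storage.
FIG. 2 is only an example. The database system provided in this application does not limit the numbers of hosts 100, standby machines 200, and shared servers 300, and each host 100 may correspond to one or more shared servers 300.
In some possible implementations, the shared server 300 may be a server independent of the host 100 and the standby machine 200. It may also be the same server as the standby machine 200. In that case the shared buffer pool 310 is part of the cache on the standby machine 200, and the standby machine 200 can read memory pages from the shared buffer pool 310 by DMA. This application does not specifically limit this.
In summary, in the database system provided by this application, the host 100 can write modified memory pages over the network into the shared buffer pool 310 of the shared server 300. This lets the host persist memory pages far more frequently. When the latest version of a memory page is restored by replaying logs, the version gap between the persisted memory page and the latest version is therefore smaller, and far fewer log replays are needed. In some scenarios no replay is needed at all, and the latest version is obtained directly from the shared server 300. Log replay and data recovery become more efficient, and so do failure recovery, failover, and primary/standby replication in the database system.
The specific steps by which the database system 10 shown in FIG. 1 performs log replay are described in detail below with reference to the accompanying drawings.
As shown in FIG. 3, this application provides a data processing method applied to the database system shown in FIG. 1. The system includes a host 100, a standby machine 200, and a shared server 300 that are connected to each other. The method includes the following steps: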
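S410: The host 100 sends first version data and information about the current latest version to the shared server 300.
In a specific implementation, the first version data may be a particular version of a memory page, such as version V1 of the memory page in FIG. 1. As described above, the host 100 may write memory pages to the shared server 300 at a preset frequency, which may be based on a time period, a modification count, a modification volume, and so on; details are not repeated here. The information about the current latest version is the latest version of the memory page on the host 100 at the time of step S410. If the second version data is currently the latest version of the memory page, this information may include information about the second version.
In a possible embodiment, the host 100 may write the first version data to the shared server 300 by remote direct memory access (RDMA). RDMA is a direct memory access technology. An intelligent network interface card (iNIC) transfers memory pages from the memory of the host 100 to the memory of the remote shared server 300 without involving the CPUs on either side. This eliminates the copy and context-switch overhead that the remote shared server 300 would otherwise incur when receiving data, giving low-latency, low-overhead, high-bandwidth data transfer.
In a possible embodiment, if the latest version on the host of the memory page corresponding to the first version data is the second version, the information about the current latest version is the information about the second version. When the host 100 writes the first version data to the shared buffer pool by RDMA, it usually first copies the first version data into a buffer used for RDMA and then performs the RDMA transfer. During the copy, the host 100 may be modifying the first version data into the second version data. By the time the RDMA transfer runs, the latest version of the memory page is therefore no longer the first version but the second version. For this reason, when starting the RDMA transfer after the copy, the host 100 may also send the information about the current latest version of the corresponding memory page to the shared buffer pool. For example, the written memory page is version V1 and the latest version is V3. The shared buffer pool can then record the version of the received first version data and the latest version information, and generate a data version relationship from them. The data version relationship records the recovery dependencies between different versions of data. When the standby machine performs primary/standby replication, the host performs failure recovery, or the standby machine performs failover, the host or standby machine can obtain this data version relationship from the shared buffer pool and replay logs accordingly.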
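The race described above is why the host sends the latest version alongside the snapshot: the page keeps changing after it has been copied into the RDMA buffer. The sketch below models only this bookkeeping and does not show the RDMA transfer itself. The names `BufferPage`, `FlushMessage`, `stage_for_rdma`, and `build_flush_message` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BufferPage:
    page_id: int
    version: int
    data: bytearray  # mutable: the host keeps modifying it


@dataclass(frozen=True)
class FlushMessage:
    page_id: int
    snapshot_version: int  # version of the bytes being transferred (the first version)
    snapshot: bytes
    latest_version: int    # newest version on the host when the transfer starts


def stage_for_rdma(page: BufferPage) -> Tuple[int, bytes]:
    """Copy the page into a staging buffer (stands in for the RDMA buffer)."""
    return page.version, bytes(page.data)


def build_flush_message(page: BufferPage, staged: Tuple[int, bytes]) -> FlushMessage:
    version, data = staged
    # The page may have been modified while it was being copied, so the latest
    # version is read when the transfer starts, not when the copy was taken.
    return FlushMessage(page.page_id, version, data, latest_version=page.version)


page = BufferPage(page_id=1, version=1, data=bytearray(b"v1"))
staged = stage_for_rdma(page)
page.data[:] = b"v3"  # concurrent modifications V1 -> V2 -> V3
page.version = 3
msg = build_flush_message(page, staged)
print(msg.snapshot_version, msg.latest_version, msg.snapshot)  # 1 3 b'v1'
```

The snapshot and the latest version number together are exactly what the shared buffer pool needs to set the recovery point and end point.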
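The alternative is for the host 100 to write the data to its own file disk for persistent storage. Writing the first version data to the shared server 300 by RDMA and letting the shared server 300 persist it costs the host 100 fewer system resources. The host 100 can therefore persist memory pages more frequently. During log replay, the version gap between the persisted memory page and the latest memory page is greatly reduced and few valid logs participate in replay. Replay may even be skipped, with the latest version obtained directly from the shared server 300, which improves data recovery efficiency.
S420: The host 100 sends a first log to the standby machine 200. The first log records the operations by which the first version data was modified into the second version data. In other words, it identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version.
The first log is used for log replay: replaying the first log on the first version data yields the second version data. For how the host 100 modifies the first version data and generates the second version data and the first log, and for the description of logs, refer to the embodiments of FIG. 1 and FIG. 2; details are not repeated here.
"The first version is earlier than the second version" may mean that the two versions are adjacent, with the second version being the newer one and the first version the older one. In the application scenario of FIG. 1, for example, V1 is the first version and V2 is the second version. It may also mean that one or more versions exist between them. For example, a third version may lie between the first and second versions: the third version is adjacent to both, modifying the first version yields the third version, and modifying the third version yields the second version. The first log may then include a second log and a third log. The second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. Again taking the scenario of FIG. 1 as an example, V1 is the first version, V2 is the third version, V3 is the second version, LOGV2 is the second log, and LOGV3 is the third log. This example uses a single intermediate version (the third version) between the first and second versions. In a specific implementation, this application does not limit the number of versions between them.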
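In a possible embodiment, the host 100 may send the first log to the standby machine 200 by direct memory access (DMA). Each time the host 100 generates a new log, it transfers the log to the standby machine 200. The standby machine 200 replays the log against the memory page from the shared buffer pool, achieving primary/standby replication. When the host 100 fails and the standby machine 200 must perform failover, the standby machine 200 first obtains the data version relationship from the shared buffer pool. Based on that relationship, it obtains the first version data from the shared buffer pool. It then replays the first log sent by the host 100 on that first version data, following the recovery dependencies recorded in the data version relationship, to complete failover. This process is described in detail in steps S450 to S460 below.
In a possible embodiment, each time the host 100 generates a new log, it stores the log on the log disk for persistent storage. A failure of the host 100 may cause the memory pages in its cache to be lost. Even then, the host 100 can first obtain the data version relationship from the shared buffer pool and, based on it, obtain the first version data from the shared buffer pool. It then obtains the first log from the log disk and replays the logs on the first version data, following the recovery dependencies recorded in the data version relationship. This completes failure recovery of the host 100, and the process is described in detail in steps S470 to S480 below.
S430: The shared server 300 generates a data version relationship from the first version data and the information about the current latest version. The data version relationship records the recovery dependencies between different versions of data. A recovery dependency may refer to the order of log replay. For example, version V2 is first obtained by replaying log LOGV2 on the V1 memory page, and version V3 is then obtained by replaying log LOGV3 on the V2 memory page.
In a specific implementation, the shared buffer pool 310 in the shared server 300 may record the data version relationship by recording a start point, a recovery point, and an end point for each memory page:
- The start point is the earliest version of the memory page that appeared in the shared buffer pool.
- The recovery point is the version of the first version data received by the shared server 300 from the host, that is, the first version.
- The end point is the version indicated by the information about the current latest version received by the shared server 300 from the host, that is, the second version.
For example, if the first version of memory page 1 received by the shared buffer pool is V0, the shared buffer pool records V0 as the start point of memory page 1. Later, the shared buffer pool receives version V2 of memory page 1 from the host together with the information that the latest version of memory page 1 is V3. It then records V2 as the recovery point and V3 as the end point of memory page 1. Each time the shared buffer pool receives first version data and information about the current latest version from the host, it can update the recovery point and end point of the corresponding memory page. In this way, it records the recovery dependencies between different versions of data.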
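Below is a minimal sketch of the per-page start/recovery/end bookkeeping, assuming integer version numbers. The text does not specify how the recovery point and end point are initialized on the very first receipt, so here they are simply taken from that first message. `VersionRecord` and `VersionTable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VersionRecord:
    start: int     # earliest version of the page seen by the shared buffer pool
    recovery: int  # version of the page image currently held (the "first version")
    end: int       # latest version reported by the host (the "second version")


class VersionTable:
    """Per-page start/recovery/end points kept by the shared buffer pool."""

    def __init__(self) -> None:
        self._records: Dict[int, VersionRecord] = {}

    def on_page_received(self, page_id: int, page_version: int, latest_version: int) -> VersionRecord:
        if latest_version < page_version:
            raise ValueError("latest version cannot be older than the page received")
        record = self._records.get(page_id)
        if record is None:
            # First time this page is seen: initialise all three points from it (assumption).
            record = VersionRecord(page_version, page_version, latest_version)
            self._records[page_id] = record
        else:
            record.recovery = page_version
            record.end = latest_version
        return record

    def get(self, page_id: int) -> Optional[VersionRecord]:
        return self._records.get(page_id)


table = VersionTable()
table.on_page_received(page_id=1, page_version=0, latest_version=0)
print(table.on_page_received(page_id=1, page_version=2, latest_version=3))
# VersionRecord(start=0, recovery=2, end=3)
```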
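For example, FIG. 4 shows the data version relationship of memory page 1 recorded by the shared buffer pool in one application scenario. The shared buffer pool first receives memory page 1 at time T0, at version V0, so start point 1 of memory page 1 corresponds to version V0. At time T1, the shared buffer pool receives version V3 of memory page 1 together with the information that the current latest version of memory page 1 is V5. It updates the recovery point to V3 and the end point to V5. When the host 100 or the standby machine 200 needs to recover data, it can obtain this data version relationship from the shared server 300 and determine the recovery dependency: first restore V4 from V3, then restore V5 from V4. It then obtains version V3 of memory page 1 from the shared server 300 and replays the logs in the order given by the recovery dependency. This example is only illustrative and does not constitute a specific limitation.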
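Given the recovery point and end point, the recovery dependency reduces to an ordered list of logs to replay. The sketch assumes consecutive integer versions and the LOGV<n> naming used in this document, where LOGV<n> turns version n-1 into version n. `replay_plan` is a hypothetical helper.

```python
from typing import List, Tuple


def replay_plan(recovery: int, end: int) -> List[Tuple[int, str, int]]:
    """Return (base version, log to replay, resulting version) steps in order."""
    if end < recovery:
        raise ValueError("end point cannot precede the recovery point")
    return [(v, f"LOGV{v + 1}", v + 1) for v in range(recovery, end)]


print(replay_plan(3, 5))  # [(3, 'LOGV4', 4), (4, 'LOGV5', 5)]
print(replay_plan(3, 3))  # []
```

An empty plan corresponds to the zero-gap case, in which the page obtained from the shared server is already the latest version and nothing needs to be replayed.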
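In a possible embodiment, the shared buffer pool 310 may manage the received memory pages. Each time a new memory page is received, the historical versions of that page already stored on the standby machine are deleted, which improves memory utilization. Memory pages that have not been modified for a long time may also be deleted to save memory, while improving the recovery efficiency of frequently modified memory pages (hot pages).
In a specific implementation, the shared server 300 may manage the shared buffer pool with a linked list that follows the first-in, first-out principle. Each time the host 100 sends first version data, the shared server 300 first checks whether the linked list holds a historical version of the corresponding memory page. If it does, the shared server deletes that historical version and places the first version data at the tail of the list. If not, it places the first version data directly at the tail. Each memory page therefore has one and only one version stored in the shared server 300, which greatly reduces the space occupied by the shared buffer pool and improves memory utilization. The linked list may also have a fixed length. After a page is added, if the list is full, the memory page at the head is removed from the list; this is the page not modified for the longest time. Removing it saves memory and lets more hot pages be stored in the shared buffer pool, improving the data recovery efficiency of hot pages.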
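The linked-list management above keeps one version per page, puts new arrivals at the tail, and evicts the head when the fixed length is exceeded. This maps naturally onto Python's `collections.OrderedDict`, which keeps entries in insertion order and can pop from either end. The class below is a sketch under that assumption, not the shared server's actual structure; `SharedBufferPool` and its methods are hypothetical. The usage lines replay a shortened FIG. 5 example with only pages 0 to 3 and a capacity of four. The version of the newly arriving page 4 is invented for illustration.

```python
from collections import OrderedDict
from typing import Any, List, Optional, Tuple


class SharedBufferPool:
    """Fixed-capacity page store: one version per page, least recently written page at the head."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._pages: "OrderedDict[int, Tuple[int, Any]]" = OrderedDict()

    def put(self, page_id: int, version: int, data: Any) -> List[int]:
        """Store a new page version and return the ids of any evicted pages."""
        if page_id in self._pages:
            del self._pages[page_id]            # drop the historical version
        self._pages[page_id] = (version, data)  # newest arrival goes to the tail
        evicted = []
        while len(self._pages) > self.capacity:
            head_id, _ = self._pages.popitem(last=False)  # coldest page sits at the head
            evicted.append(head_id)
        return evicted

    def get(self, page_id: int) -> Optional[Tuple[int, Any]]:
        return self._pages.get(page_id)

    def order(self) -> List[Tuple[int, int]]:
        return [(pid, version) for pid, (version, _) in self._pages.items()]


pool = SharedBufferPool(capacity=4)
for pid, ver in [(0, 6), (1, 1), (2, 10), (3, 55)]:  # list S1 at T0, pages 0..3 only
    pool.put(pid, ver, data=None)
pool.put(1, 3, data=None)         # T1: V3 of page 1 replaces V1 and moves to the tail
print(pool.order())               # [(0, 6), (2, 10), (3, 55), (1, 3)]
print(pool.put(4, 1, data=None))  # T2: a new page arrives, list is full -> [0]
```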
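For example, as shown in FIG. 5, assume that at time T0 linked list S1 stores memory pages 0 to n. Page 0 is at version V6, page 1 at V1, page 2 at V10, page 3 at V55, …, and page n at V32. At time T1 the host 100 writes a new page, version V3 of page 1 in FIG. 5, into the shared buffer pool by RDMA. The shared server 300 first determines that linked list S1 already holds a historical version of page 1 (version V1). It then places the new page at the tail of S1 and deletes the historical version of page 1 from S1, obtaining the updated linked list S2. Similarly, at time T2 the host 100 writes page n+1 into the shared buffer pool by RDMA. The shared server 300 first determines that linked list S2 holds no historical version of page n+1, and places the new page at the tail of S2. If the list length has reached the threshold, page 0 (version V6) at the head is deleted. This example is only illustrative and is not specifically limited in this application.
Each new page received by the shared buffer pool 310 is placed at the tail of the linked list, so the page at the head has not been modified for a long time (a cold page). Such a page has very likely already been written to disk by the host 100 for persistent storage. It can therefore be deleted from the shared buffer pool to save memory. The shared buffer pool can then store more frequently modified hot pages, which improves the efficiency of failure recovery, failover, and primary/standby replication. FIG. 5 is only an example and is not specifically limited in this application.
In a specific implementation, the linked list may be implemented with the least recently used (LRU) algorithm. LRU gives each memory page in the shared buffer pool an access field that records the time t elapsed since the page was last accessed. It then evicts the page with the largest t among the existing pages, that is, the page that has gone unused the longest. The linked list may also be implemented with other algorithms, which is not limited in this application.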
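The LRU variant described above keeps an access field per page and evicts the page with the largest elapsed time t. It can be sketched as follows. Scanning all pages for the maximum t is O(n) and is chosen only for clarity; a production pool would typically keep pages in access order instead. `AgeFieldLRU` is a hypothetical name, and treating any call to `access` as a page access is an assumption.

```python
from typing import Dict, Optional


class AgeFieldLRU:
    """LRU eviction driven by a per-page last-access field."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._last_access: Dict[int, float] = {}  # page_id -> time of last access

    def access(self, page_id: int, now: float) -> Optional[int]:
        """Touch a page; return the id of the evicted page, if any."""
        evicted = None
        if page_id not in self._last_access and len(self._last_access) >= self.capacity:
            # t = now - last access; evict the page with the largest t.
            evicted = max(self._last_access, key=lambda pid: now - self._last_access[pid])
            del self._last_access[evicted]
        self._last_access[page_id] = now
        return evicted


lru = AgeFieldLRU(capacity=2)
lru.access(1, now=0.0)
lru.access(2, now=1.0)
lru.access(1, now=2.0)         # page 1 is touched again
print(lru.access(3, now=3.0))  # 2: page 2 has gone unused the longest (t = 2.0)
```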
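If the database system consists of multiple hosts, multiple standby machines, and multiple shared servers, each shared server may serve a different host and store the memory pages sent by that host. For example, in the structure shown in FIG. 2, shared server 1 receives and stores the memory pages sent by host 1 and records their data version relationships. Shared server 2 does the same for host 2. If one shared server serves multiple hosts, it also records which host and standby machines each memory page belongs to when it stores pages and records data version relationships. For example, if shared server 3 serves host 3 and host 4, the data version relationship it records for memory page 1 may correspond to host 3, and that for memory page 2 to host 4. In a specific implementation, the storage space of the shared buffer pool may also be partitioned by host. For example, storage space 1 may store the memory pages of host 1 and storage space 2 those of host 2. This application does not specifically limit this.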
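When one shared server serves several hosts, the version bookkeeping must be keyed by host as well as by page, because different hosts may use the same page identifiers. The following is a minimal sketch assuming integer host and page identifiers; `MultiHostVersionTable` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class MultiHostVersionTable:
    """Recovery/end points kept separately for each host served by one shared server."""

    def __init__(self) -> None:
        # host_id -> {page_id: (recovery point, end point)}
        self._by_host: Dict[int, Dict[int, Tuple[int, int]]] = defaultdict(dict)

    def update(self, host_id: int, page_id: int, page_version: int, latest_version: int) -> None:
        self._by_host[host_id][page_id] = (page_version, latest_version)

    def lookup(self, host_id: int, page_id: int) -> Optional[Tuple[int, int]]:
        # .get() avoids creating an empty entry for an unknown host.
        return self._by_host.get(host_id, {}).get(page_id)


table = MultiHostVersionTable()
table.update(host_id=3, page_id=1, page_version=2, latest_version=3)
table.update(host_id=4, page_id=1, page_version=7, latest_version=9)  # same page id, other host
print(table.lookup(3, 1), table.lookup(4, 1))  # (2, 3) (7, 9)
```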
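S440: The standby machine 200 obtains the data version relationship from the shared server 300.
Steps S440 to S450 may occur in primary/standby replication or standby failover scenarios. In a specific implementation, the standby machine 200 may send an acquisition request for the data version relationship to the shared server 300. If the standby machine 200 needs to restore the latest version of memory page 1, the request may carry the identifier of memory page 1. In response, the shared buffer pool 310 writes the data version relationship into the standby machine 200 by RDMA. The shared buffer pool 310 may instead be deployed on the standby machine 200, meaning the shared server 300 and the standby machine 200 are the same server. In that case, the shared buffer pool 310 may write the data version relationship into the buffer pool of the standby machine 200 by DMA.
S450: The standby machine obtains the first version data and the first log according to the data version relationship.
In a possible embodiment, the standby machine may first determine, according to the data version relationship, whether the first version data and the first log already exist locally. The first version data may be on its file disk or in its buffer pool, and the first log may be in its buffer pool or on its log disk. If both exist, the standby machine can perform step S460 to replay the log. If the first version data exists but the first log does not, the standby machine can send a log acquisition request to the host 100, or wait for the host 100 to transfer the first log, before replaying. This further improves the accuracy of log replay.
In a possible embodiment, the standby machine may determine that the first version data is not on its file disk or in its buffer pool. The standby machine 200 may then send an acquisition request to the shared buffer pool 310, which responds by writing the first version data into the standby machine 200 by RDMA. If the shared buffer pool 310 is deployed on the standby machine 200, so that the shared server 300 and the standby machine 200 are the same server, the shared buffer pool 310 may instead write the first version data into the buffer pool of the standby machine 200 by DMA.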
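The standby machine's decision in S450 can be summarized as a small planning function. It fetches the base page from the shared server only if the local copy is missing or older than the recovery point, then works out which logs are still needed. This is a sketch under stated assumptions: versions are consecutive integers, and a local copy newer than the recovery point is treated as usable, although the text only covers the equal case. `plan_standby_recovery` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Union


def plan_standby_recovery(
    local_version: Optional[int],
    local_logs: Iterable[int],
    recovery: int,
    end: int,
) -> Dict[str, Union[str, int, List[int]]]:
    """Decide where the base page comes from and which logs are still missing.

    local_version: version of the page on the standby's file disk or buffer pool (None if absent)
    local_logs:    versions N for which LOGVN is already in the buffer pool or on the log disk
    recovery, end: recovery point and end point from the data version relationship
    """
    if local_version is None or local_version < recovery:
        source, base = "shared_server", recovery  # fetch the first version data (RDMA or DMA)
    else:
        source, base = "local", local_version     # first version data already present
    needed = list(range(base + 1, end + 1))       # LOGV(base+1) .. LOGV(end)
    have = set(local_logs)
    missing = [v for v in needed if v not in have]  # request from the host, or wait for it
    return {"page_source": source, "base_version": base,
            "logs_to_replay": needed, "logs_to_request": missing}


# Steps 7 and 8 of the FIG. 6 scenario: the standby holds V3, the recovery point is V4.
plan = plan_standby_recovery(3, {4, 5}, recovery=4, end=5)
print(plan["page_source"], plan["logs_to_replay"])  # shared_server [5]
```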
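In a possible embodiment, when sending the first version data to the standby machine, the shared server may append a check code to the end of the first version data. The standby machine uses the check code to determine whether the received first version data is complete. In primary/standby replication, the standby machine continually obtains memory pages from the shared server and replays the logs sent by the host to replicate the host. The host may fail while the shared server is writing the first version data to the standby machine. During recovery, if the standby machine finds the first version data in its buffer pool, it can use the check code to confirm whether that data is complete. If the data is incomplete, the standby machine 200 can send an acquisition request to the shared server 300, which prevents log replay from failing because of incomplete first version data. If the data is complete, the standby machine 200 can perform step S460. The check code is only an example. The shared server and the standby machine may also ensure the integrity of data communication by other means, such as a completion queue. The completion queue holds the completed work requests from the work queue, and whether the first version data is complete is determined from the completion status in the queue.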
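The text does not specify the check code. As an illustration only, the sketch below uses a CRC-32 computed with Python's `zlib.crc32` and packed as a 4-byte big-endian trailer. `append_checksum` and `verify_and_strip` are hypothetical names. A CRC-32 detects every single-bit error and most other corruption. It is an integrity check, though, and does not guarantee that every partially written frame is caught.

```python
import struct
import zlib
from typing import Optional


def append_checksum(page: bytes) -> bytes:
    """Shared-server side: append a 4-byte CRC-32 trailer to the page image."""
    return page + struct.pack(">I", zlib.crc32(page) & 0xFFFFFFFF)


def verify_and_strip(frame: bytes) -> Optional[bytes]:
    """Standby side: return the page if the trailer matches, else None (re-request it)."""
    if len(frame) < 4:
        return None
    body, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(body) & 0xFFFFFFFF != expected:
        return None
    return body


frame = append_checksum(b"page 1, version V2")
print(verify_and_strip(frame))                  # b'page 1, version V2'
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]  # one flipped bit
print(verify_and_strip(damaged))                # None
```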
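S460: The standby machine restores the second version data from the first version data and the first log.
In a specific implementation, if the first version data and the second version data are adjacent versions, the standby machine restores the second version data directly from the first version data and the first log.
One or more versions may lie between the first version data and the second version data. Suppose, for example, that a third version lies between them and is adjacent to both. The second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. Following the data version relationship, the standby machine first restores the third version data from the first version data and the second log. It then restores the second version data from the third version data and the third log.
There may also be no version gap between the first version data and the second version data. In that case the first version data is the second version data, and the copy stored in the shared buffer pool is already the latest version of the memory page on the host. Primary/standby replication, failover, and host failure recovery then need no log replay. Data recovery is complete once the first version data has been obtained from the shared server 300, which greatly improves the efficiency of failover, failure recovery, and primary/standby replication.
Take the application scenario in FIG. 1 as an example. The host 100 modifies version V1 of memory page 1 to obtain log LOGV2 and version V2. It then modifies version V2 to obtain log LOGV3 and version V3. With the data processing method provided in this application, the host 100 sends logs LOGV2 and LOGV3 to the standby machine 200 and writes version V3 of memory page 1 into the shared buffer pool 310. Suppose the host 100 then fails and the standby machine 200 must perform failover. The standby machine 200 first obtains the data version relationship of memory page 1 from the shared buffer pool 310 and finds that both the recovery point and the end point of memory page 1 are V3. The version gap between them is 0, so this failover requires no log replay. After confirming that it does not hold version V3 of memory page 1 locally, the standby machine 200 obtains version V3 from the shared buffer pool 310 to complete the failover. This greatly improves failover efficiency, and primary/standby replication becomes more efficient in the same way. This example is only illustrative and does not constitute a specific limitation.
S470: The host 100 obtains the data version relationship from the shared server 300, and obtains the first version data and the first log according to the data version relationship. Steps S470 to S480 occur in the host 100 failure recovery scenario. For how the host 100 obtains the data version relationship and the first version data from the shared server 300, refer to steps S440 to S450 and their optional steps; details are not repeated here.
After the host 100 fails, the memory pages in its cache are lost, while the data already written to the file disk and log disk remains. In the host 100 failure recovery scenario, the host 100 can therefore first determine whether the first version data is stored on its file disk. If it is not, the host obtains the first version data from the shared server 300 and then obtains the first log from its log disk.
S480: The host 100 restores the second version data from the first version data and the first log. For this step, refer to step S460 and its optional steps; details are not repeated here.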
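The data processing method provided in this application is explained below with reference to a specific application scenario.
Assume that in this scenario the database system provided in this application is as shown in FIG. 6. It includes host 1, standby machine 1, and shared server 1. Host 1 includes buffer pool 1, log disk 1, and file disk 1. Standby machine 1 includes buffer pool 2, log disk 2, and file disk 2. The shared server includes shared buffer pool 1. Standby machine 1 first performs primary/standby replication of host 1. Host 1 then fails after generating version V5 of memory page 1 and temporarily cannot restart. After standby machine 1 performs failover, host 1 restarts successfully and begins data recovery.
In this scenario, the data processing method provided in this application includes the following steps 1 to 10:
- Steps 1 to 4 cover primary/standby replication.
- Steps 5 to 8 cover standby failover.
- Steps 9 to 10 cover host failure recovery.
The steps are explained in detail below with reference to FIG. 6.
Step 1: Host 1 modifies version V1 of memory page 1 into version V2. It writes version V2 of memory page 1 into shared buffer pool 1, together with the information about the current latest version of memory page 1. For this step, refer to steps S410 to S420 above; details are not repeated here.
In FIG. 6, the latest version information at step 1 is V3. After host 1 copied version V2 of memory page 1 into the buffer used for RDMA, it had already modified memory page 1 into version V3. When host 1 performs the RDMA step that writes version V2 of memory page 1 into shared buffer pool 1, it therefore also sends the information that the latest version of memory page 1 is V3.
Step 2: Shared server 1 records or updates the data version relationship of memory page 1 based on the received version V2 and the information that the latest version is V3. Specifically, it may use the linked list to update the memory pages in the shared buffer pool, deleting the historical version of memory page 1. It then updates the recovery point and end point of memory page 1. For anything not described here, refer to step S430 above.
In a specific implementation, shared server 1 may first determine whether shared buffer pool 1 holds a historical version of memory page 1. If it does, it deletes that historical version and places version V2 of memory page 1 at the tail of the linked list in the shared buffer pool. If not, it places version V2 directly at the tail. It then points the recovery point of memory page 1 to version V2, obtaining recovery point 1, and the end point to version V3, obtaining end point 1.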
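Step 3: Standby machine 1 obtains the data version relationship of memory page 1 from shared buffer pool 1. According to that relationship, it obtains version V2 of memory page 1 and log LOGV3. For this step, refer to steps S440 to S450 above; details are not repeated here.
Specifically, standby machine 1 may first determine whether memory page 1 exists on its file disk or in its buffer pool. If it does not, standby machine 1 sends an acquisition request for memory page 1 to the shared buffer pool, and shared buffer pool 1 returns version V2 of memory page 1. If version V2 of memory page 1 does exist on the file disk or in the buffer pool of standby machine 1, standby machine 1 may further check whether that copy is complete, for example by verifying the check code (see step S450 above and its optional implementations). If the copy is incomplete, standby machine 1 obtains version V2 of memory page 1 from the shared buffer pool. If it is complete, step 4 can be performed to replay the log. If standby machine 1 has not yet received log LOGV3 from host 1, it can send an acquisition request to the host, or wait until it receives LOGV3 before performing step 4.
Step 4: Based on the data version relationship, standby machine 1 replays log LOGV3 on version V2 of memory page 1 obtained in step 3. This yields the current latest version V3 of the memory page, so standby machine 1 synchronizes the memory page of host 1 and achieves primary/standby replication. For anything not described here, refer to step S460 above.
After host 1 generates a new log, it transfers the log to standby machine 1 synchronously. The transfer may occur at any time during steps 1 to 4, which this application does not limit.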
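Step 5: Host 1 modifies version V3 of memory page 1 into version V4 and writes version V4 of memory page 1 into shared buffer pool 1. For anything not described here, refer to steps S410 to S420 and step 1 above.
In FIG. 6, the latest version information at step 5 is V5. After host 1 copied version V4 of memory page 1 into the buffer used for RDMA, it had already modified memory page 1 into version V5. When host 1 performs the RDMA step that writes version V4 into shared buffer pool 1, it therefore also sends the information that the latest version of memory page 1 is V5.
Step 6: Shared server 1 updates the data version relationship of memory page 1 based on the received version V4 and the information that the latest version is V5. Specifically, it may use the linked list to update the memory pages in the shared buffer pool. It deletes the historical version of memory page 1 (the version V2 received in step 1) and stores the newly received version V4 at the tail of the linked list. It then updates the recovery point and end point of memory page 1. For anything not described here, refer to step S430 and step 2 above.
Specifically, shared server 1 may first determine whether shared buffer pool 1 holds a historical version of memory page 1. Version V2 of memory page 1 was received in step 1, so shared server 1 first deletes version V2 and then places version V4 at the tail of the linked list in the shared buffer pool. It then moves the recovery point of memory page 1 from V2 to V4, obtaining recovery point 2, and the end point from V3 to V5, obtaining end point 2.
Step 7: Standby machine 1 receives a message that host 1 has failed and cannot recover for the time being, and starts failover. It obtains the data version relationship of memory page 1 from shared buffer pool 1. According to that relationship, it obtains version V4 of memory page 1 and log LOGV5. For this step, refer to steps S440 to S450 and step 3 above.
Step 8: Standby machine 1 looks at the version gap between recovery point 2 and end point 2 of memory page 1, that is, between versions V4 and V5. From this gap it determines that the valid log for this replay is LOGV5. It then replays LOGV5 on version V4 of memory page 1 obtained in step 7 to obtain the current latest version V5, completing the failover of standby machine 1. For this step, refer to step S460 and step 4 above.
Step 9: After receiving a failure recovery command, host 1 obtains the data version relationship from shared buffer pool 1. According to that relationship, it obtains version V4 of memory page 1 from its local file disk or from shared server 1, and obtains log LOGV5 from its local log disk. For this step, refer to step S470 and steps 3 and 7 above.
Step 10: Following the recovery dependency recorded in the data version relationship, host 1 replays log LOGV5 on version V4 of memory page 1 obtained in step 9. This yields the current latest version V5 and completes host failure recovery. For anything not described here, refer to step S480 and steps 4 and 8 above.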
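In summary, with the data processing method provided in this application, the host can write modified memory pages over the network into the shared buffer pool of the shared server. This reduces the resources the host spends persisting memory pages, so the host can persist them more frequently. Fewer log replays are then needed during data recovery, and in some scenarios none at all, because the latest version is obtained directly from the shared server. Log replay and data recovery become more efficient, and so do host failure recovery, standby failover, and primary/standby replication in the database system.
For simplicity of description, the above method embodiments are expressed as a series of action combinations. A person skilled in the art should know that this application is not limited by the described order of actions. A person skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by this application.
Other reasonable combinations of steps that a person skilled in the art can conceive of from the above description also fall within the protection scope of this application.
The data processing method provided in this embodiment has been described in detail above with reference to FIG. 1 to FIG. 6. The data processing device and apparatus provided in this embodiment are described below with reference to FIG. 7 to FIG. 10.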
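FIG. 7 is a schematic structural diagram of a host 100 provided in this application. The host 100 includes a sending unit 710, an obtaining unit 720, and a recovery unit 730.
The sending unit 710 is configured to send first-version data and information about the current latest version to the shared server, where second-version data is the latest data in the host at the current moment, and the information about the current latest version includes information about the second version. The shared server is connected to the host and the standby; the host is configured to receive read and write requests for the database system, and the standby is a device in the database system used to back up data.
The sending unit 710 is further configured to send a first log to the standby, where the first log identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version.
Optionally, the host 100 further includes the obtaining unit 720 and the recovery unit 730. The obtaining unit 720 is configured to obtain the data version relationship recorded by the shared server, where the data version relationship records the recovery dependencies between different versions of data and is obtained by the shared server from the first-version data and the information about the current latest version sent by the host. The obtaining unit 720 is further configured to obtain the first-version data and the first log according to the data version relationship, and the recovery unit 730 is configured to recover the second-version data in the host from the first-version data and the first log.
Optionally, the sending unit 710 is configured to write the first-version data into the shared server by remote direct memory access (RDMA) and to send the information about the current latest version to the shared server.
Optionally, that the first version is earlier than the second version includes: the first version is adjacent to the second version.
Optionally, that the first version is earlier than the second version includes: the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log includes a second log and a third log, where the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. The recovery unit is configured to recover third-version data in the host from the first-version data and the second log, and then to recover the second-version data in the host from the third-version data and the third log.
In summary, the host provided in this application can write modified memory pages over the network into the shared buffer pool of the shared server, which reduces the resources the host spends on persisting memory pages and lets it persist them more frequently. This reduces the number of log replays required during data recovery; in some scenarios no log replay is needed at all, because the latest version of the memory page can be fetched directly from the shared server. The efficiency of log replay and data recovery improves, and with it the efficiency of host fault recovery, standby failover, and primary/standby replication in the database system.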
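It should be understood that the host of this embodiment may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the data processing method shown in FIG. 3 is implemented by software, the host and its modules may also be software modules.
The host 100 according to the embodiments of this application may correspondingly perform the method described in the embodiments, and the foregoing and other operations and/or functions of the units in the host 100 respectively implement the corresponding procedures of the methods in FIG. 2 to FIG. 6. For brevity, details are not repeated here.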
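FIG. 8 is a schematic structural diagram of a shared server 300 provided in this application. The shared server 300 includes a receiving unit 810, a generation unit 820, and a sending unit 830.
The receiving unit 810 is configured to receive first-version data and information about the current latest version sent by the host, where second-version data is the latest data in the host at the current moment, and the information about the current latest version includes information about the second version. The shared server is connected to the host and the standby; the host is configured to receive read and write requests for the database system, and the standby is a device in the database system used to back up data.
The generation unit 820 is configured to generate a data version relationship from the first-version data and the information about the current latest version, where the data version relationship records the recovery dependencies between different versions of data.
The sending unit 830 is configured to send the data version relationship to the standby.
Optionally, the receiving unit 810 is further configured to receive a request from the standby to obtain the first-version data, where the standby generates the request after determining, according to the data version relationship, that the version of the data it stores is earlier than the first version. The sending unit 830 is further configured to send the first-version data to the standby.
Optionally, the sending unit 830 is configured to send the first-version data and a check code to the standby, where the check code is located at the end of the first-version data and is used by the standby to determine that the received first-version data is complete.
Optionally, the shared server 300 further includes a deletion unit 840, configured to store the first-version data and delete data whose version is earlier than the first version.
Optionally, the deletion unit 840 is further configured to delete the earliest received data when the amount of stored data reaches a threshold.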
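The two optional behaviours of the deletion unit 840 (keep only the newest version of a page; delete the earliest-received data once a threshold is reached) fit in one small structure. This sketch assumes the threshold is measured in stored bytes and that the most recently received page is never evicted; both choices, and the class name `BoundedSharedPool`, are illustrative assumptions rather than details from the application.

```python
from collections import OrderedDict
from typing import Tuple


class BoundedSharedPool:
    def __init__(self, threshold_bytes: int) -> None:
        self.threshold_bytes = threshold_bytes
        # page_id -> (version, data), earliest-received first
        self.pages: "OrderedDict[int, Tuple[int, bytes]]" = OrderedDict()
        self.used = 0

    def put(self, page_id: int, version: int, data: bytes) -> None:
        # Keep only the newest version of a page: drop the older copy first.
        old = self.pages.pop(page_id, None)
        if old is not None:
            self.used -= len(old[1])
        self.pages[page_id] = (version, data)
        self.used += len(data)
        # Once the stored volume reaches the threshold, delete the earliest-received data.
        while self.used >= self.threshold_bytes and len(self.pages) > 1:
            _, (_, evicted) = self.pages.popitem(last=False)
            self.used -= len(evicted)
```

`popitem(last=False)` removes the head of the ordered map, so a page that has just been rewritten (and therefore moved to the tail) is the last candidate for eviction.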
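In summary, the shared server provided in this application can receive and store, over the network, the memory pages sent by the host, which reduces the resources the host spends on persisting memory pages and lets it persist them more frequently. This reduces the number of log replays required during data recovery; in some scenarios no log replay is needed at all, because the latest version of the memory page can be fetched directly from the shared server. The efficiency of log replay and data recovery improves, and with it the efficiency of host fault recovery, standby failover, and primary/standby replication in the database system.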
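It should be understood that the shared server of this embodiment may be implemented by an ASIC or a PLD. The PLD may be a CPLD, an FPGA, GAL, or any combination thereof. When the data processing method shown in FIG. 3 is implemented by software, the shared server and its modules may also be software modules.
The shared server 300 according to the embodiments of this application may correspondingly perform the method described in the embodiments, and the foregoing and other operations and/or functions of the units in the shared server 300 respectively implement the corresponding procedures of the methods in FIG. 2 to FIG. 6. For brevity, details are not repeated here.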
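FIG. 9 is a schematic structural diagram of a standby 200 provided in this application. The standby 200 includes an obtaining unit 910 and a recovery unit 920.
The obtaining unit 910 is configured to obtain the data version relationship recorded by the shared server, where the data version relationship records the recovery dependencies between different versions of data, the standby is a device in the database system used to back up data, the shared server is connected to the standby and the host, and the host is configured to receive read and write requests for the database system.
The obtaining unit 910 is further configured to obtain first-version data and a first log according to the data version relationship, where second-version data is the latest data in the host at the current moment, the first log identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version.
The recovery unit 920 is configured to recover the second-version data in the standby from the first-version data and the first log.
Optionally, that the first version is earlier than the second version includes: the first version is adjacent to the second version.
Optionally, that the first version is earlier than the second version includes: the second version is adjacent to a third version, and the third version is adjacent to the first version. The first log includes a second log and a third log, where the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version. The recovery unit 920 is configured to recover third-version data in the standby from the first-version data and the second log, and then to recover the second-version data in the standby from the third-version data and the third log.
Optionally, the obtaining unit 910 is configured to determine, according to the data version relationship, that the version of the data stored by the standby is earlier than the first version, and to obtain the first-version data from the shared server; the standby receives the first log sent by the shared server.
In summary, with the standby provided in this application, after the host writes memory pages over the network to the shared server for persistent storage, the standby can obtain memory pages from the shared server and replay logs on them. Because the host persists memory pages through the shared server, the resources it spends on persistence are reduced and it can persist memory pages more frequently. This reduces the number of log replays required during data recovery; in some scenarios no log replay is needed at all, because the latest version of the memory page can be fetched directly from the shared server. The efficiency of log replay and data recovery improves, and with it the efficiency of host fault recovery, standby failover, and primary/standby replication in the database system.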
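It should be understood that the standby of this embodiment may be implemented by an ASIC or a PLD. The PLD may be a CPLD, an FPGA, GAL, or any combination thereof. When the data processing method shown in FIG. 3 is implemented by software, the standby and its modules may also be software modules.
The standby 200 according to the embodiments of this application may correspondingly perform the method described in the embodiments, and the foregoing and other operations and/or functions of the units in the standby 200 respectively implement the corresponding procedures of the methods in FIG. 2 to FIG. 6. For brevity, details are not repeated here.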
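FIG. 10 is a schematic structural diagram of a data processing device 1000 according to an embodiment of this application. The device 1000 may be the host, the standby, or the shared server described above. As shown in FIG. 10, the device 1000 includes a processor 1010, a communication interface 1020, and a memory 1030, which may be connected to each other through an internal bus 1040 or may communicate by other means such as wireless transmission. This embodiment uses connection through the bus 1040 as an example. The bus 1040 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or only one type of bus.
It should be noted that this embodiment may be implemented by a general-purpose physical server, for example an X106 server; by a virtual machine (VM) implemented on a general-purpose physical server combined with network functions virtualization (NFV) technology, where a virtual machine is a complete computer system, emulated by software, that has full hardware system functions and runs in a completely isolated environment; or by a server cluster composed of multiple such physical servers or virtual machines. This application does not specifically limit this.
The processor 1010 may consist of at least one general-purpose processor, for example a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an ASIC, a PLD, or a combination thereof, and the PLD may be a CPLD, an FPGA, GAL, or any combination thereof. The processor 1010 executes various types of digitally stored instructions, for example software or firmware programs stored in the memory 1030, enabling the device 1000 to provide a wide variety of services.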
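When the data processing device 1000 is the host described above, the memory 1030 stores program code whose execution is controlled by the processor 1010 to perform the processing steps of the host in the embodiments of FIG. 1 to FIG. 6. The program code may include one or more software modules, which may be the software modules provided in the embodiment shown in FIG. 7 (the sending unit, the obtaining unit, and the recovery unit). For example, the sending unit may be configured to send the first-version data and the information about the current latest version to the shared server; the obtaining unit may be configured to obtain the data version relationship recorded by the shared server and to obtain the first-version data and the first log according to that relationship; and the recovery unit may be configured to recover the second-version data in the host from the first-version data and the first log. Specifically, the code may be used to perform steps S410 to S420 and steps S470 to S480 in the embodiment of FIG. 3 and their optional steps, steps 1 and 4 in the embodiment of FIG. 6 and their optional steps, and other steps described in the embodiments of FIG. 1 to FIG. 6, which are not repeated here.
When the data processing device 1000 is the shared server described above, the memory 1030 stores program code whose execution is controlled by the processor 1010 to perform the processing steps of the shared server in the embodiments of FIG. 1 to FIG. 6. The program code may include one or more software modules, which may be the software modules provided in the embodiment shown in FIG. 8 (the receiving unit, the generation unit, and the sending unit). For example, the receiving unit is configured to receive the first-version data and the information about the current latest version sent by the host, the generation unit is configured to generate the data version relationship from them, and the sending unit is configured to send the data version relationship to the standby. Specifically, the code may be used to perform step S430 in the embodiment of FIG. 3 and its optional steps, steps 2 and 6 in the embodiment of FIG. 6 and their optional steps, and other steps described in the embodiments of FIG. 1 to FIG. 6, which are not repeated here.
When the data processing device 1000 is the standby described above, the memory 1030 stores program code whose execution is controlled by the processor 1010 to perform the processing steps of the standby in the embodiments of FIG. 1 to FIG. 6. The program code may include one or more software modules, which may be the software modules provided in the embodiment shown in FIG. 9 (the obtaining unit and the recovery unit). For example, the obtaining unit is configured to obtain the data version relationship recorded by the shared server and to obtain the first-version data and the first log according to that relationship, and the recovery unit is configured to recover the second-version data in the standby from the first-version data and the first log. Specifically, the code may be used to perform steps S440 to S460 in the embodiment of FIG. 3 and their optional steps, steps 4 to 5 and steps 7 to 8 in the embodiment of FIG. 6 and their optional steps, and other steps described in the embodiments of FIG. 1 to FIG. 6, which are not repeated here.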
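The memory 1030 may include a volatile memory, for example a random access memory (RAM); a non-volatile memory, for example a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the foregoing types. The memory 1030 may store program code, which may specifically include program code for performing other steps described in the embodiments of FIG. 4 or FIG. 5; details are not repeated here. When the device 1000 is the host or the standby described above, the memory 1030 may include a buffer pool, a file disk, and a log disk; when the device 1000 is the shared server described above, the memory 1030 may include a shared buffer pool.
The communication interface 1020 may be a wired interface (for example an Ethernet interface), an internal interface (for example a Peripheral Component Interconnect express (PCIe) bus interface), or a wireless interface (for example a cellular network interface or a wireless local area network interface), and is used to communicate with other devices or modules.
It should be understood that the data processing device 1000 according to this embodiment may correspond to the host 100, the shared server 300, and the standby 200 in the embodiments of this application, and may correspond to the respective entities performing the method shown in FIG. 3; the foregoing and other operations and/or functions of the modules in the device 1000 respectively implement the corresponding procedures of the methods in FIG. 2 to FIG. 6. For brevity, details are not repeated here.
It should be noted that FIG. 10 is merely one possible implementation of the embodiments of this application. In practice, the data processing device 1000 may include more or fewer components; this is not limited here. For content not shown or described in this embodiment, see the related descriptions in the embodiments of FIG. 1 to FIG. 6; details are not repeated here.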
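The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product, which includes at least one computer instruction. When the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage node, such as a server or data center, that contains at least one usable medium. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium. The semiconductor medium may be an SSD.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent modification or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.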

Claims (16)

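  1. A data processing method, wherein the method comprises:
    obtaining, by a standby, a data version relationship recorded by a shared server, wherein the data version relationship records recovery dependencies between different versions of data, the standby is a device in a database system used to back up data, the shared server is connected to the standby and a host, and the host is configured to receive read and write requests for the database system;
    obtaining, by the standby, first-version data and a first log according to the data version relationship, wherein second-version data is the latest data in the host at the current moment, the first log identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version; and
    recovering, by the standby, the second-version data in the standby according to the first-version data and the first log.
  2. The method according to claim 1, wherein that the first version is earlier than the second version comprises: the first version is a version adjacent to the second version.
  3. The method according to claim 1, wherein that the first version is earlier than the second version comprises:
    the second version is adjacent to a third version, the third version is adjacent to the first version, and the first log comprises a second log and a third log, wherein the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version; and
    the recovering, by the standby, the second-version data in the standby according to the first-version data and the first log comprises:
    recovering, by the standby, third-version data in the standby according to the first-version data and the second log; and
    recovering, by the standby, the second-version data in the standby according to the third-version data and the third log.
  4. The method according to any one of claims 1 to 3, wherein the obtaining, by the standby, the first-version data and the first log according to the data version relationship comprises:
    determining, by the standby according to the data version relationship, that the version of the data stored by the standby is earlier than the first version; and
    obtaining, by the standby, the first-version data from the shared server, and receiving, by the standby, the first log sent by the shared server.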
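  5. A data processing method, wherein the method comprises:
    sending, by a host, first-version data and information about a current latest version to a shared server, wherein second-version data is the latest data in the host at the current moment, the information about the current latest version comprises information about the second version, the shared server is connected to the host and a standby, the host is configured to receive read and write requests for a database system, and the standby is a device in the database system used to back up data; and
    sending, by the host, a first log to the standby, wherein the first log identifies the operations by which the second version changed relative to the first version, and the first version is earlier than the second version.
  6. The method according to claim 5, wherein the method further comprises:
    obtaining, by the host, a data version relationship recorded by the shared server, wherein the data version relationship records recovery dependencies between different versions of data and is obtained by the shared server according to the first-version data and the information about the current latest version sent by the host;
    obtaining, by the host, the first-version data and the first log according to the data version relationship; and
    recovering, by the host, the second-version data in the host according to the first-version data and the first log.
  7. The method according to claim 5 or 6, wherein the sending, by a host, first-version data and information about a current latest version to a shared server comprises:
    writing, by the host, the first-version data into the shared server by remote direct memory access (RDMA), and sending the information about the current latest version to the shared server.
  8. The method according to any one of claims 5 to 7, wherein that the first version is earlier than the second version comprises: the first version is a version adjacent to the second version.
  9. The method according to any one of claims 5 to 7, wherein that the first version is earlier than the second version comprises:
    the second version is adjacent to a third version, and the third version is adjacent to the first version;
    the first log comprises a second log and a third log, wherein the second log identifies the operations by which the third version changed relative to the first version, and the third log identifies the operations by which the second version changed relative to the third version; and
    the recovering, by the host, the second-version data in the host according to the first-version data and the first log comprises:
    recovering, by the host, third-version data in the host according to the first-version data and the second log; and
    recovering, by the host, the second-version data in the host according to the third-version data and the third log.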
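  10. A data processing method, wherein the method comprises:
    receiving, by a shared server, first-version data and information about a current latest version sent by a host, wherein second-version data is the latest data in the host at the current moment, the information about the current latest version comprises information about the second version, the shared server is connected to the host and a standby, the host is configured to receive read and write requests for a database system, and the standby is a device in the database system used to back up data;
    generating, by the shared server, a data version relationship according to the first-version data and the information about the current latest version, wherein the data version relationship records recovery dependencies between different versions of data; and
    sending, by the shared server, the data version relationship to the standby.
  11. The method according to claim 10, wherein the method further comprises:
    receiving, by the shared server, a request sent by the standby for obtaining the first-version data, wherein the request is generated by the standby after determining, according to the data version relationship, that the version of the data stored by the standby is earlier than the first version; and
    sending, by the shared server, the first-version data to the standby.
  12. The method according to claim 11, wherein the sending, by the shared server, the first-version data to the standby comprises:
    sending, by the shared server, the first-version data and a check code to the standby, wherein the check code is located at the end of the first-version data and is used by the standby to determine that the received first-version data is complete.
  13. The method according to claim 10 or 12, wherein the shared server stores the first-version data and deletes data whose version is earlier than the first version.
  14. The method according to any one of claims 10 to 12, wherein the method further comprises: deleting, by the shared server, the earliest received data when the data stored by the shared server reaches a threshold.
  15. A data processing system, wherein the system comprises a host, a standby, and a shared server, the shared server is connected to the standby and the host, the standby is configured to perform the operation steps of the method according to any one of claims 1 to 4, the host is configured to perform the operation steps of the method according to any one of claims 5 to 9, and the shared server is configured to perform the operation steps of the method according to any one of claims 10 to 14.
  16. A data processing device, wherein the device comprises a processor, and the processor is configured to perform the operation steps of the method according to any one of claims 1 to 14.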
PCT/CN2021/106701 2020-08-13 2021-07-16 Data processing method, device, and system WO2022033269A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010814112.6A CN114077517A (zh) 2020-08-13 2020-08-13 Data processing method, device, and system
CN202010814112.6 2020-08-13

Publications (1)

Publication Number Publication Date
WO2022033269A1 true WO2022033269A1 (zh) 2022-02-17

Family

ID=80246843

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106701 WO2022033269A1 (zh) 2020-08-13 2021-07-16 Data processing method, device, and system

Country Status (2)

Country Link
CN (1) CN114077517A (zh)
WO (1) WO2022033269A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116302673B (zh) * 2023-05-26 2023-08-22 四川省华存智谷科技有限责任公司 Method for improving the data recovery rate of a Ceph storage system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136783A (zh) * 2007-10-15 2008-03-05 中兴通讯股份有限公司 Method and device for backing up and restoring configuration data of a network management system
CN101436207A (zh) * 2008-12-16 2009-05-20 浪潮通信信息系统有限公司 Data recovery and synchronization method based on log snapshots
US20160117228A1 (en) * 2014-10-28 2016-04-28 Microsoft Corporation Point in Time Database Restore from Storage Snapshots
CN105955845A (zh) * 2016-04-26 2016-09-21 浪潮电子信息产业股份有限公司 Data recovery method and device
CN106599006A (zh) * 2015-10-20 2017-04-26 阿里巴巴集团控股有限公司 Data recovery method and apparatus
CN109753381A (zh) * 2018-11-09 2019-05-14 深圳供电局有限公司 Continuous data protection method based on object storage
CN110196788A (zh) * 2018-03-30 2019-09-03 腾讯科技(深圳)有限公司 Data reading method, apparatus, system, and storage medium

Also Published As

Publication number Publication date
CN114077517A (zh) 2022-02-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21855322

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21855322

Country of ref document: EP

Kind code of ref document: A1