WO2024078001A1 - Data processing system, data processing method, device and related equipment

Data processing system, data processing method, device and related equipment

Info

Publication number
WO2024078001A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage node
data
target
node
data processing
Prior art date
Application number
PCT/CN2023/101476
Other languages
English (en)
French (fr)
Inventor
任仁
陈明军
王伟
武装
曹宇
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024078001A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 Cache access modes
    • G06F 12/0882 Page mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/18 File system types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/242 Query formulation

Definitions

  • the present application relates to the field of database technology, and in particular to a data processing system, a data processing method, a device and related equipment.
  • data processing systems are usually deployed with a main center (or production center) and at least one disaster recovery center.
  • the main center provides data reading and writing services to the outside world
  • the disaster recovery center is responsible for backing up the data stored in the main center.
  • the disaster recovery center can continue to provide data reading and writing services to the outside world using the backed-up data to avoid data loss, thereby ensuring the reliability of data storage.
  • the main center usually copies the data of the main center to the disaster recovery center by sending a binlog file (a binary log file) to the disaster recovery center.
  • the binlog file sent by the main center records the database statements used to update the data on the main center.
  • the disaster recovery center updates the data of the disaster recovery center by executing the database statements in the binlog, thereby replicating the data of the main center.
  • this method of data replication will usually result in a longer recovery time objective (RTO) for the data processing system, affecting the fault recovery performance of the data processing system.
  • a data processing system, a data processing method, a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program product are provided to shorten the RTO of the data processing system and improve its fault recovery performance.
  • an embodiment of the present application provides a data processing system, which includes a computing cluster and a storage cluster.
  • the computing cluster and the storage cluster are connected via a network, for example, they can be connected via a wired network or a wireless network, etc.
  • the computing cluster includes a master computing node and a slave computing node, the slave computing node serves as a disaster recovery for the master computing node, and the storage cluster includes at least one storage node.
  • the master computing node and the slave computing node can share the storage node, and when the storage cluster includes multiple storage nodes, some storage nodes can serve as disaster recovery for another part of the storage nodes, with the master computing node and the slave computing node respectively accessing different storage nodes; the master computing node is used to receive access requests, such as an access request for writing new data to the data processing system or an access request for modifying or deleting data that has been persistently stored in the data processing system, and to write data to the storage cluster, where the written data may be, for example, a redo log, a data page, or other types of data generated while responding to the access request; a data processing device deployed on the storage side is used to monitor the data written by the master computing node to the storage cluster and, when it recognizes that the written data includes a target redo log, to control a first storage node among the at least one storage node to replay the target redo log, so as to update the target data recorded in the target redo log into the data persistently stored by the at least one storage node.
  • in this way, when the slave computing node is upgraded to the master computing node (such as when the original master computing node fails or the slave computing node receives an upgrade instruction for a master-slave switch), since the data processing device on the storage side has already controlled the storage node to update the persistently stored data by replaying the redo log, the slave computing node does not need to perform the redo log replay process and can directly take over the access requests on the master computing node according to the data persistently stored in the storage node, so as to continue to provide data read and write services for the client or other devices, thereby effectively shortening the RTO of the data processing system and improving its fault recovery performance.
  • moreover, the storage node updates the persistently stored data according to the redo log (a physical log); compared with updating data through the binlog (a logical log), there is no need to re-execute the database statements, and the data on the physical data pages is modified directly, which can effectively reduce the resources consumed by data updates.
  • the at least one storage node includes a second storage node in addition to the first storage node, and the first storage node serves as a disaster recovery for the second storage node; for example, the second storage node supports persistent storage of data for the main computing node.
  • the first storage node is used to back up the data persistently stored by the second storage node, etc.;
  • the second storage node is used to store the data written by the main computing node;
  • the data processing device is used to control the second storage node to send the target redo log to the first storage node when it is identified that the data written by the main computing node includes the target redo log, so that the first storage node plays back the target redo log to update the data persistently stored in the first storage node.
  • in this way, when the slave computing node takes over the access requests on the main computing node, there is no need to control the first storage node to execute the redo log replay process; instead, the slave computing node can directly take over the access requests on the main computing node according to the data persistently stored in the first storage node, thereby effectively shortening the RTO of the data processing system and improving its fault recovery performance.
  • the second storage node and the first storage node are deployed in the same physical area, such as in the same data center. In this way, the reliability of data storage in the local area can be achieved.
  • the second storage node and the first storage node are deployed in different physical areas, such as the first storage node is deployed in AZ1, and the second storage node is deployed in AZ2, etc. In this way, the reliability of data storage in a different location can be improved.
  • the first storage node and the second storage node are deployed in the same physical area, and the data processing device is further used to create the first storage node in the physical area according to the second storage node, and the data in the first storage node is obtained by taking a snapshot or cloning the data in the second storage node.
  • the storage node used as a disaster recovery can be quickly created by means of snapshots or cloning.
  • a target application (such as MySQL) runs on the main computing node, and the target redo log is generated by the target application during its operation.
  • the target application may include a service layer and a storage engine layer, wherein, in the process of responding to access requests, the service layer may generate a corresponding binlog, and the storage engine layer may generate a corresponding redo log, such as the above-mentioned target redo log.
  • the main computing node may be deployed with multiple applications, including the target application.
  • the data processing device is specifically used to identify the target redo log according to the configuration file of the target application or the naming format of the redo log of the target application.
  • the naming rules of the redo log generated by the target application can be known in advance, so that the data processing device can identify the target redo log according to the naming rules; or, the configuration file can record information for distinguishing the target redo log generated by the target application, such as the name of the target redo log, so that the data processing device can identify the target redo log according to the configuration file of the target application.
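  • As an illustration of the identification step above, the following minimal sketch (in Python) checks a written file name against a known redo-log naming rule or against a configuration file; the ib_logfile naming follows the example given later in this application, while the function name and the "redo_log_files" configuration key are hypothetical:

        import json
        import re

        # Naming rule assumed for the redo logs generated by the target application;
        # MySQL/InnoDB, for example, names its redo log files ib_logfile0, ib_logfile1, ...
        REDO_LOG_NAME_PATTERN = re.compile(r"^ib_logfile\d+$")

        def is_target_redo_log(file_name, app_config_path=None):
            """Decide whether a file written by the main computing node is the target redo log.

            Identification follows the two options described above: match the known
            naming format, or look the name up in the target application's configuration
            file, assumed here to list redo log file names under "redo_log_files".
            """
            if REDO_LOG_NAME_PATTERN.match(file_name):
                return True
            if app_config_path:
                with open(app_config_path) as f:
                    config = json.load(f)
                return file_name in config.get("redo_log_files", [])
            return False

        # Example: files written by the main computing node are checked one by one.
        for name in ["ib_logfile0", "my_table.ibd", "my.cnf"]:
            print(name, "->", is_target_redo_log(name))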
  • the first storage node is used to replay the target redo log according to the format of the data page corresponding to the target application to update the data page on the first storage node.
  • the first storage node can restore the data page that needs to be modified in the first storage node according to the format of the data page, apply the modification to that data page according to the modification operation recorded in the target redo log, and then persistently store the modified data page.
  • the target redo log can be used to update the data page on the storage node.
  • the first storage node is also used to obtain the format of the data page corresponding to the target application before replaying the target redo log, so as to restore the data page that needs to be modified on the first storage node according to the format of the data page.
  • the format of the data page can be pre-configured in the code program of the first storage node by a technician; or, the format of the data page can be configured in a configuration file in the first storage node, so that the first storage node can read the format of the data page from the configuration file; or, the data processing device can determine the format of the data page by identifying the data page in the data written by the main computing node, and notify the first storage node of the format of the data page so that the first storage node knows the format of the data page.
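  • The following sketch illustrates, under simplified assumptions, how a storage node could replay a physical redo record against a persistently stored data page; the fixed-size page layout and the record fields (page_id, offset, new bytes) are illustrative assumptions rather than the actual on-disk formats of any particular application:

        from dataclasses import dataclass

        PAGE_SIZE = 16 * 1024  # assumed page size

        @dataclass
        class RedoRecord:
            # A physical redo record: "at offset X of page Y, the data becomes Z".
            page_id: int
            offset: int
            new_bytes: bytes

        class PageStore:
            """Toy page store standing in for the first storage node's persistent pages."""
            def __init__(self):
                self.pages = {}  # page_id -> bytearray of PAGE_SIZE

            def read_page(self, page_id):
                # Restore the page that needs to be modified, using the known page format.
                return self.pages.setdefault(page_id, bytearray(PAGE_SIZE))

            def write_page(self, page_id, page):
                # Persist the modified page (a real node would flush it to disk).
                self.pages[page_id] = page

        def replay(store, records):
            """Apply each redo record directly to the physical data page."""
            for rec in records:
                page = store.read_page(rec.page_id)
                page[rec.offset:rec.offset + len(rec.new_bytes)] = rec.new_bytes
                store.write_page(rec.page_id, page)

        store = PageStore()
        replay(store, [RedoRecord(page_id=7, offset=128, new_bytes=b"target data")])
        print(bytes(store.read_page(7)[128:139]))  # b'target data'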
  • the target application on the main computing node may include a relational database management system (RDBMS), which may be at least one of MySQL, PostgreSQL, openGauss, and Oracle, or may be another type of application.
  • when the application on the main computing node is an open source database application such as MySQL, PostgreSQL, or openGauss, since the replay operation of the redo log is controlled by the data processing device on the storage side, no intervention by the application on the computing-side main computing node is required, and the process is transparent to the application.
  • the first storage node and the second storage node are deployed in different physical areas.
  • the second storage node is also used to send baseline data to the first storage node before sending the target redo log to the first storage node.
  • the baseline data can be, for example, the data persistently stored by the second storage node at a certain moment together with the redo logs it holds at that moment.
  • the first storage node is also used to store the baseline data before replaying the target redo log. In this way, the first storage node and the second storage node can achieve initial data synchronization in advance through the baseline replication process, so that real-time synchronization of data between the two storage nodes can subsequently be maintained according to the target redo log.
  • the data processing device is further used to set a lower limit value for the storage space used to cache redo logs in the second storage node before controlling the second storage node to send the baseline data. In this way, it is largely avoided that redo logs generated during the baseline copy process are recycled before they are copied to the first storage node, which in turn largely avoids having to execute the baseline copy process a second time within a short period.
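  • A minimal sketch of the baseline-then-log replication sequence described in the preceding items; the ordering of the steps follows this application, while the class and method names are illustrative assumptions:

        class SecondStorageNode:
            """Simplified model of the second (primary-side) storage node."""
            def __init__(self):
                self.data = {"page_1": b"v1"}   # persistently stored data (the baseline)
                self.redo_cache = []            # redo logs cached before being shipped
                self.redo_cache_min_bytes = 0   # lower limit reserved for caching redo logs

            def set_redo_cache_lower_limit(self, n_bytes):
                # Reserving space makes it less likely that redo logs generated during a
                # (possibly long) baseline copy are recycled before they are shipped.
                self.redo_cache_min_bytes = n_bytes

            def send_baseline(self, first_node):
                first_node.store_baseline(dict(self.data))

            def send_redo_logs(self, first_node):
                first_node.replay(self.redo_cache)
                self.redo_cache.clear()

        class FirstStorageNode:
            """Simplified model of the first (disaster-recovery) storage node."""
            def __init__(self):
                self.data = {}

            def store_baseline(self, baseline):
                self.data.update(baseline)       # initial data synchronization

            def replay(self, redo_logs):
                for page_id, new_value in redo_logs:
                    self.data[page_id] = new_value   # stay in sync via redo replay

        second, first = SecondStorageNode(), FirstStorageNode()
        second.set_redo_cache_lower_limit(64 * 1024 * 1024)  # e.g. reserve 64 MB
        second.send_baseline(first)                          # step 1: baseline replication
        second.redo_cache.append(("page_1", b"v2"))          # change made during/after the copy
        second.send_redo_logs(first)                         # step 2: log replication
        print(first.data)                                    # {'page_1': b'v2'}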
  • the data processing device is also used to control the second storage node to send the binary log (binlog) corresponding to the target data to the first storage node, and the binary log is used to record database statements; the first storage node is specifically used to verify the target redo log using the binary log, and if the verification passes, replay the target redo log to update the data persistently stored in the first storage node.
  • the binlog can be used to verify the correctness of the redo log to be replayed by the first storage node to ensure the correctness of the data stored in the first storage node.
  • if the verification fails, the first storage node may refuse to replay the target redo log and may instruct the slave computing node to perform a binlog replay process, so as to ensure the correctness of the data stored in the first storage node.
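  • A minimal sketch of this verification path; matching redo records against committed transactions recorded in the binlog is an assumed rule chosen only to illustrate the check, not the actual verification algorithm:

        def verify_redo_with_binlog(redo_records, binlog_entries):
            """Cross-check the target redo log against the binary log before replaying it.

            Assumed rule for illustration: every transaction id referenced by the redo
            records must also appear as a committed statement in the binlog.
            """
            committed = {entry["txn_id"] for entry in binlog_entries}
            return all(record["txn_id"] in committed for record in redo_records)

        redo_records = [{"txn_id": 42, "page_id": 7, "offset": 0, "bytes": b"x"}]
        binlog_entries = [{"txn_id": 42, "sql": "UPDATE t SET c = 'x' WHERE id = 1"}]

        if verify_redo_with_binlog(redo_records, binlog_entries):
            print("verification passed: replay the target redo log")
        else:
            # As described above, the node may refuse to replay and instead have the
            # slave computing node fall back to a binlog replay process.
            print("verification failed: fall back to binlog replay")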
  • the binary log is sent from the master computing node to the slave computing node, and the slave computing node sends the binary log to the first storage node.
  • both the master computing node and the slave computing node can access the first storage node, and the first storage node includes a read cache area. Then, the first storage node is also used to cache the target data to the read cache area before replaying the target redo log; the master computing node is also used to read the target data from the read cache area. In this way, when the master computing node needs to read the target data, compared with reading the data from the persistent storage of the first storage node, directly reading the target data from the read cache area can effectively improve data reading efficiency.
  • the first storage node is used to eliminate the target data from the read cache area after replaying the target redo log. In this way, the storage space occupied by the target data in the read cache area can be released, so that the released storage space can be used to continue caching other data pages newly written by the main computing node, thereby realizing sustainable utilization of the read cache area.
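  • A minimal sketch of the read cache behaviour described in the two items above: the target data is cached before the redo log is replayed, served to the master computing node from memory, and eliminated once replay has persisted it; the dictionary-based cache stands in for the node's read cache area:

        class FirstStorageNodeWithReadCache:
            def __init__(self):
                self.read_cache = {}   # read cache area (e.g. DRAM)
                self.persistent = {}   # persistently stored data pages

            def on_write(self, page_id, data):
                # Cache the target data before the redo log has been replayed, so that
                # reads do not have to wait for persistence.
                self.read_cache[page_id] = data

            def read(self, page_id):
                # Serve from the read cache when possible, otherwise from persistent storage.
                if page_id in self.read_cache:
                    return self.read_cache[page_id]
                return self.persistent.get(page_id)

            def replay_redo(self, page_id, data):
                # Replaying the redo log persists the target data ...
                self.persistent[page_id] = data
                # ... after which the cached copy is eliminated to free space for data
                # pages newly written by the main computing node.
                self.read_cache.pop(page_id, None)

        node = FirstStorageNodeWithReadCache()
        node.on_write("page_9", b"target data")
        print(node.read("page_9"))            # served from the read cache before replay
        node.replay_redo("page_9", b"target data")
        print(node.read("page_9"))            # now served from persistent storage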
  • At least one storage node in the storage cluster includes a storage array, and the storage array is used to persistently store data, thereby improving the reliability of data storage in the storage array.
  • an embodiment of the present application provides a data processing system, which includes a computing cluster and a storage cluster.
  • the computing cluster and the storage cluster are connected through a network, for example, they can be connected through a wired network or a wireless network, etc.
  • the computing cluster includes a master computing node and a slave computing node, the slave computing node serves as a disaster recovery for the master computing node, and the storage cluster includes a first storage node and a second storage node, the second storage node serves as a disaster recovery for the first storage node;
  • the master computing node is used to receive an access request, generate a binary log corresponding to the target data (the binary log records database statements), and write data to the storage cluster;
  • a data processing device is used to monitor the data written by the master computing node and, when it identifies that the written data includes the binary log, to control the first storage node to send the binary log to the second storage node, or to send the binary log to the second storage node itself;
  • the slave computing node is used to replay the binary log to update the target data to the data persistently stored in the second storage node, and take over the access request on the master computing node according to the updated data persistently stored in the second storage node.
  • each disaster recovery center can receive binlog through the storage node without the need for the main computing node to send binlog to each disaster recovery center.
  • the slave computing nodes in each disaster recovery center can synchronize data with the main center by replaying the binlog, ensuring that the RPO (recovery point objective) of the data processing system can be 0, and avoiding the situation where an excessive load on the main computing node affects binlog replication between the main center and the disaster recovery centers, which would make the data of the disaster recovery centers inconsistent with that of the main center and degrade the RPO of the data processing system.
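  • A minimal sketch of this binlog path, assuming the binlog is modelled as an ordered list of SQL statements and using SQLite as a stand-in for the slave computing node's database engine; the real binary log format and transport are, of course, different:

        import sqlite3

        # Binlog written by the master computing node: modelled here as an ordered
        # list of database statements (a simplification of the real binary log).
        binlog = [
            "CREATE TABLE t (id INTEGER PRIMARY KEY, c TEXT)",
            "INSERT INTO t (id, c) VALUES (1, 'a')",
            "UPDATE t SET c = 'b' WHERE id = 1",
        ]

        def forward_via_storage(first_node_queue, second_node_queue, entries):
            """Storage-side forwarding: the first storage node relays the binlog to the
            second storage node, so the master computing node need not send the binlog
            to every disaster recovery center itself."""
            first_node_queue.extend(entries)
            second_node_queue.extend(first_node_queue)
            first_node_queue.clear()

        def slave_replay(second_node_queue, db):
            # The slave computing node replays the statements to synchronize its data.
            for stmt in second_node_queue:
                db.execute(stmt)
            db.commit()
            second_node_queue.clear()

        first_queue, second_queue = [], []
        forward_via_storage(first_queue, second_queue, binlog)

        slave_db = sqlite3.connect(":memory:")
        slave_replay(second_queue, slave_db)
        print(slave_db.execute("SELECT * FROM t").fetchall())  # [(1, 'b')]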
  • an embodiment of the present application provides a data processing method, which is applied to a data processing system, wherein the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node and a data processing device, and the slave computing node serves as a disaster recovery for the master computing node.
  • the method includes: the data processing device monitors the data written by the master computing node to the storage cluster; when the data processing device identifies that the written data includes a target redo log, the data processing device controls a first storage node among at least one storage node to replay the target redo log to update the target data recorded in the target redo log to the data persistently stored by at least one storage node, and the updated persistently stored data in the at least one storage node is used by the slave computing node to take over the access request on the master computing node.
  • At least one storage node further includes a second storage node, and the first storage node serves as a disaster recovery for the second storage node.
  • the method further includes: when the data processing device recognizes that the written data includes the target redo log, controlling the second storage node to send the target redo log to the first storage node, so that the first storage node replays the target redo log to update the data persistently stored in the first storage node.
  • the second storage node and the first storage node are deployed in the same physical area; or, the second storage node and the first storage node are deployed in different physical areas.
  • a target application is running on the primary computing node, and a target redo log is generated by the target application during its running.
  • the main computing node is deployed with multiple applications, and the multiple applications include the above-mentioned target application.
  • the data processing device identifies that the written data includes a target redo log, including: the data processing device identifies that the written data includes a target redo log according to a configuration file of a target application or a naming format of a redo log of a target application.
  • the first storage node and the second storage node are deployed in the same physical area
  • the method further includes: a data processing device creates a first storage node in the physical area based on the second storage node, and the data in the first storage node is obtained by taking a snapshot or cloning of the data in the second storage node.
  • a data processing device controls a first storage node among at least one storage node to replay a target redo log, including: the data processing device controls the first storage node to replay the target redo log according to the format of the data page corresponding to the target application to update the data page on the first storage node.
  • the method further includes: the data processing device controls the first storage node to obtain the format of the data page corresponding to the target application before playing back the target redo log.
  • the method further includes: the data processing device controls the second storage node to send the binary log corresponding to the target data to the first storage node, the binary log is used to record database statements; the data processing device controls the first storage node among the at least one storage node to replay the target redo log, including: the data processing device controls the first storage node to verify the target redo log using the binary log, so that the first storage node replays the target redo log when determining that the verification is passed.
  • both the master computing node and the slave computing node can access the first storage node, and the first storage node includes a read cache area; the method also includes: the data processing device controls the first storage node to cache the target data to the read cache area before playing back the target redo log, and the target data in the read cache area can be read by the master computing node.
  • the method further includes: the data processing device controls the first storage node to eliminate the target data from the read cache area after playing back the target redo log.
  • the target application includes a relational database management system RDBMS
  • the RDBMS includes at least one of MySQL, PostgreSQL, openGauss, and Oracle.
  • the first storage node and the second storage node are deployed in different physical areas; the method also includes: the data processing device controls the second storage node to send the baseline data to the first storage node before sending the target redo log to the first storage node; the data processing device controls the first storage node to store the baseline data before playing back the target redo log.
  • the method further includes: before the data processing device controls the second storage node to send the baseline data, setting a lower limit value of the storage space in the second storage node for caching redo logs.
  • At least one storage node includes a storage array, and the storage array is used for persistent storage of data.
  • the binary log is sent from the master computing node to the slave computing node, and the slave computing node sends the binary log to the first storage node.
  • the technical effects of the third aspect and each implementation method in the third aspect can refer to the corresponding first aspect and the technical effects of each implementation method in the first aspect, and will not be repeated here.
  • an embodiment of the present application provides a data processing device applied to a data processing system, wherein the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node and a data processing device, the slave computing node serves as a disaster recovery for the master computing node, and the data processing device includes: a monitoring module for monitoring data written by the master computing node to the storage cluster; a control module for controlling a first storage node in at least one storage node to replay the target redo log when identifying that the written data includes a target redo log, so as to update the target data recorded in the target redo log to the data persistently stored by at least one storage node, and the updated persistently stored data in the at least one storage node is used by the slave computing node to take over the access request on the master computing node.
  • At least one storage node also includes a second storage node, and the first storage node serves as a disaster recovery for the second storage node; the control module is further used to control the second storage node to send the target redo log to the first storage node when it is identified that the written data includes a target redo log, so that the first storage node replays the target redo log to update the data persistently stored in the first storage node.
  • a target application is running on the primary computing node, and a target redo log is generated by the target application during its running.
  • control module is configured to identify, based on a configuration file of a target application or a naming format of a redo log of a target application, that the written data includes a target redo log.
  • control module is specifically configured to control the first storage node to replay the target redo log according to the format of the data page corresponding to the target application, so as to update the data page on the first storage node.
  • control module is further configured to control the first storage node to obtain the format of the data page corresponding to the target application before playing back the target redo log.
  • control module is further used to control the second storage node to send the binary log corresponding to the target data to the first storage node, and the binary log is used to record database statements; the control module is specifically used to control the first storage node to use the binary log to verify the target redo log, so that the first storage node replays the target redo log when determining that the verification is passed.
  • both the master computing node and the slave computing node can access the first storage node, and the first storage node includes a read cache area;
  • the control module is also used to control the first storage node to cache the target data into the read cache area before playing back the target redo log, and the target data in the read cache area can be read by the main computing node.
  • control module is further configured to control the first storage node to eliminate target data from the read cache area after replaying the target redo log.
  • At least one storage node includes a storage array, and the storage array is used for persistent storage of data.
  • control module is further configured to set a lower limit value of a storage space in the second storage node for caching redo logs before controlling the second storage node to send the baseline data.
  • the main computing node is deployed with multiple applications, and the multiple applications include the above-mentioned target application.
  • the first storage node and the second storage node are deployed in the same physical area, and the control module is further used to create the first storage node in the physical area based on the second storage node, and the data in the first storage node is obtained by taking a snapshot or cloning of the data in the second storage node.
  • the binary log is sent from the master computing node to the slave computing node, and the slave computing node sends the binary log to the first storage node.
  • an embodiment of the present application also provides a data processing method, which is applied to a data processing system, wherein the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected through a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node and a data processing device, and the slave computing node serves as a disaster recovery for the master computing node.
  • the method includes: a first storage node among at least one storage node obtains a target redo log; the first storage node obtains the format of a data page in the first storage node; the first storage node replays the target redo log according to the format of the data page to update the target data recorded in the target redo log to the data persistently stored in at least one storage node, and the updated persistently stored data in the first storage node is used by the slave computing node to take over the access request on the master computing node.
  • the first storage node reads the format of the data page from a configuration file.
  • the first storage node receives the format of the data page sent by the data processing device.
  • the format of the data page may also be pre-configured in the code program of the first storage node.
  • an embodiment of the present application provides a data processing device, which is applied to a first storage node in a data processing system, the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected through a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node and a data processing device, the slave computing node serves as a disaster recovery for the master computing node, the at least one storage node includes the first storage node, and the data processing device includes: an acquisition module, which is used to acquire a target redo log and acquire the format of a data page in the first storage node; a playback module, which is used to play back the target redo log according to the format of the data page, so as to update the target data recorded in the target redo log to the data persistently stored by at least one storage node, The updated persistently stored data in the first storage node is used by the slave computing node to take over the access request on the master computing node
  • the acquisition module is used to read the format of the data page from a configuration file.
  • the acquisition module is used to receive the format of the data page sent by the data processing device.
  • the format of the data page may also be pre-configured in the code program of the first storage node.
  • an embodiment of the present application provides a data processing device, comprising: a processor and a memory; the memory is used to store instructions, and when the data processing device is running, the processor executes the instructions stored in the memory, so that the data processing device executes the data processing method described in the third aspect or any implementation of the third aspect.
  • the memory can be integrated into the processor or can be independent of the processor.
  • the computing device may also include a bus.
  • the processor is connected to the memory via a bus.
  • the memory may include a read-only memory and a random access memory.
  • an embodiment of the present application provides a chip, comprising a power supply circuit and a processing circuit, wherein the power supply circuit is used to power the processing circuit, and the processing circuit executes the data processing method as described in the third aspect or any implementation of the third aspect.
  • an embodiment of the present application further provides a computer-readable storage medium, in which a program or instruction is stored.
  • when the program or instructions are run on a computer, the data processing method described in the above-mentioned third aspect or any implementation of the third aspect is executed.
  • an embodiment of the present application also provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to execute the data processing method described in the third aspect or any implementation of the third aspect.
  • FIG. 1 is a schematic diagram of the structure of a data processing system.
  • FIG. 2 is a schematic diagram of the structure of an exemplary data processing system provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram showing that storage node 1021 and storage node 1022 can be deployed in the same physical area.
  • FIG. 6 is a schematic diagram showing that storage node 1021 and storage node 1022 may be deployed in different physical areas.
  • FIG. 7 is a schematic diagram of an exemplary execution of baseline replication and log replication between two storage nodes.
  • FIG. 8 is a schematic diagram of another exemplary implementation of baseline replication and log replication between two storage nodes.
  • FIG. 9 is a flow chart of a data processing method provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the structure of a data processing device provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the hardware structure of a data processing device provided in an embodiment of the present application.
  • FIG. 1 is a schematic diagram of the structure of an exemplary data processing system 100, which may adopt a storage-computing separation architecture.
  • the data processing system 100 includes a computing cluster 101 and a storage cluster 102, and the computing cluster 101 and the storage cluster 102 may communicate with each other through a network, such as a wired network or a wireless network.
  • the computing cluster 101 includes a plurality of computing nodes, and different computing nodes can communicate with each other, and each computing node can be a computing device including a processor, such as a server, a desktop computer, etc.
  • some computing nodes can be used as disaster recovery for other computing nodes.
  • FIG1 takes the computing cluster 101 including a master computing node 1011 and a slave computing node 1012 as an example for illustrative explanation, and the slave computing node 1012 is used as a disaster recovery for the master computing node 1011.
  • specifically, the slave computing node 1012 can serve as a cold standby for the main computing node 1011; that is, while the main computing node 1011 is running, the slave computing node 1012 may not be running, or it may use its computing resources to process other services.
  • after the main computing node 1011 fails, the slave computing node 1012 starts running or reclaims the computing resources and takes over the services on the main computing node 1011 using the backed-up data.
  • the storage cluster 102 may include one or more storage nodes, each of which may be a device including a persistent storage medium, such as a network attached storage (NAS), a storage server, etc., which may be used to store data persistently.
  • the persistent storage medium in the storage node may be, for example, a hard disk, such as a solid state disk or a shingled magnetic recording hard disk.
  • some storage nodes may serve as disaster recovery for another part of the storage nodes.
  • FIG1 takes the example that the storage cluster 102 includes a storage node 1021 and a storage node 1022, and the storage node 1022 serves as a disaster recovery for the storage node 1021.
  • the main computing node 1011 uses the storage node 1021 to provide data read and write services, and after the main computing node 1011 fails, the slave computing node 1012 uses the data backed up on the storage node 1022 to take over the business on the main computing node 1011.
  • the master computing node 1011 and the storage node 1021 can constitute a master center (usually a production site), and the slave computing node 1012 and the storage node 1022 can constitute a disaster recovery center (usually a disaster recovery site).
  • the storage cluster 102 can also include a storage node, in which case the master computing node 1011 and the slave computing node 1012 can share the storage node, so that after the master computing node 1011 fails, the slave computing node 1012 can continue to provide data read and write services using the data stored in the storage node.
  • One or more applications may be deployed on the main computing node 1011, and the deployed applications may be, for example, database applications or other applications.
  • the database application may be, for example, a relational database management system (RDBMS), and the RDBMS may include at least one of MySQL, PostgreSQL, openGauss, and Oracle, or may be other types of database systems.
  • the main computing node 1011 usually receives an access request sent by a client or other device on the user side, such as receiving an access request sent by a client on the user side for reading or modifying data in the storage node 1021.
  • the application on the main computing node 1011 may respond to the access request and provide corresponding data read and write services for the client or other device.
  • the application on the master computing node 1011 will generate a binary log (binlog), which is a logical log, and is used to record database statements, such as SQL statements, for updating the data persistently stored in the storage node 1021.
  • the application may include a service layer and a storage engine layer, and the binlog may be generated by the service layer. Then, the master computing node 1011 will send the generated binlog to the slave computing node 1012, and the slave computing node 1012 will update the data in the storage node 1022 by executing the database statements in the binlog, so that the data in the storage node 1022 is consistent with the data in the storage node 1021, that is, the data in the storage node 1021 is copied to the storage node 1022.
  • after the failure of the master computing node 1011, the slave computing node 1012 needs to run/reclaim computing resources and use these computing resources to start the application on the slave computing node 1012. The application on the slave computing node 1012 then executes the database statements recorded in the binlog sent by the master computing node 1011 before the failure, so that the data in the storage node 1022 is consistent with the data in the storage node 1021 before the failure. In this way, the slave computing node 1012 can take over the uncompleted access requests on the master computing node 1011 according to the data stored in the storage node 1022.
  • RTO (recovery time objective) refers to the time interval between the moment at which the services of the data processing system 100 are suspended after a disaster occurs and the moment at which the data processing system 100 resumes its services.
  • the application on the slave computing node 1012 needs to re-execute the database statements in the binlog to synchronize the data stored in the storage node 1021 and the storage node 1022, which also consumes more resources.
  • the present application provides a data processing system 200.
  • a data processing device 201 is added on the storage side.
  • the data processing device 201 can be deployed on the storage node 1021, for example, to control the copying of data in the storage node 1021 to the storage node 1022.
  • the redo log is a physical log and can record the new data requested to be written, the data requested to be modified, or the data requested to be deleted by the client or other devices.
  • the data processing device 201 monitors the data written by the main computing node 1011 to the storage cluster 102.
  • the written data may include redo logs, and may also include updated data pages or other types of data. When the data processing device 201 recognizes that the data written by the main computing node 1011 includes a redo log, it controls the storage node 1021 to send the redo log to the storage node 1022, so that the storage node 1022, by replaying the redo log, updates the newly written or modified data into the data it persistently stores, or deletes part of the data it persistently stores.
  • the storage node 1021 can also update its own persistently stored data by replaying the redo log, so as to keep the data persistently stored in the storage node 1021 and the storage node 1022 consistent.
  • in this way, when the slave computing node 1012 is upgraded to the master computing node (such as when the original master computing node 1011 fails or the slave computing node 1012 receives an upgrade instruction for a master-slave switch), since the data processing device 201 on the storage side has already controlled the storage node 1022 to update the persistently stored data by replaying the redo log, the slave computing node 1012 does not need to execute the redo log replay process and can directly take over the access requests on the master computing node 1011 based on the data persistently stored in the storage node 1022, so as to continue to provide data read and write services for clients or other devices, thereby effectively shortening the RTO of the data processing system 200 and improving its fault recovery performance.
  • moreover, the storage node 1022 updates the persistently stored data according to the redo log (a physical log) rather than through the binlog (a logical log), so there is no need to re-execute database statements and the data on the physical data pages is modified directly.
  • the process of updating the data page in the storage node using the data recorded in the redo log can be called log consolidation.
  • when the application on the main computing node 1011 is an open source database application such as MySQL, PostgreSQL, or openGauss, since the replay operation of the redo log is controlled by the data processing device 201 on the storage side, no intervention by the application on the computing-side main computing node 1011 is required, and the process is transparent to the application.
  • This makes it unnecessary to modify the application to have the ability to control the storage node 1021 to play back the redo log when the application is deployed on the main computing node 1011, thereby effectively reducing the difficulty of deploying applications on a single machine and improving the efficiency of application deployment.
  • the data processing device 201 deployed on the storage side may be implemented by software, such as program code deployed on a hardware device, etc.
  • the data processing device 201 may be deployed in the storage node 1021 (for example, deployed in the controller of the storage node 1021) in the form of software such as a plug-in, component or application.
  • the data processing device 201 may be implemented by a physical device, wherein the physical device may be, for example, a CPU, or may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), a system on chip (SoC), a software-defined infrastructure (SDI) chip, an artificial intelligence (AI) chip, a data processing unit (DPU), or any other processor or any combination thereof.
  • data processing system 200 shown in Fig. 2 is only an exemplary description, and in actual application, the data processing system 200 may also be implemented in other ways. For ease of understanding, this embodiment provides the following implementation examples.
  • the data processing device 201 may also be deployed outside the storage node 1021, such as being deployed in the storage node 1022, or the data processing device 201 may be deployed independently of the storage node 1021 and the storage node 1022, etc.
  • the master computing node 1011 and the slave computing node 1012 can share the same storage node 301, and the data processing device 201 is deployed in the storage node 301, as shown in FIG3. At this time, both the master computing node 1011 and the slave computing node 1012 can access the storage node 301, so that in the case of a failure of the master computing node 1011, the slave computing node 1012 can take over the business on the master computing node 1011 according to the data in the storage node 301.
  • the data processing device 201 on the storage side has controlled the storage node 301 to complete the data update process according to the redo log when the slave computing node 1012 takes over the business on the master computing node 1011, this enables the slave computing node 1012 to continue to provide data read and write services directly according to the data persistently stored in the storage node 301, thereby effectively shortening the RTO of the data processing system 200, thereby improving the fault recovery performance of the data processing system 200.
  • the storage node 301 may also be configured with a read cache area 202.
  • the read cache area 202 may be implemented by a storage medium such as a dynamic random access memory (DRAM).
  • the read cache area 202 is used to cache the data pages written by the main computing node 1011.
  • the data pages cached in the read cache area 202 may be, for example, new data pages generated by the main computing node 1011.
  • the main computing node 1011 may write the modified data pages to the storage node 1021.
  • the read cache area 202 may cache the data page.
  • when the main computing node 1011 needs to read the data page, it may read the data directly from the read cache area 202 in the storage node 1021. Compared with reading data from the persistent storage medium (such as a hard disk) of the storage node 1021, this can effectively improve the data reading efficiency of the main computing node 1011 and reduce the data access latency.
  • after the data page has been persistently stored, the data can be eliminated from the read cache area 202 to release the cache space occupied by the data page, thereby allowing the read cache area 202 to cache data pages newly written by the main computing node 1011; this embodiment does not limit this.
  • the computing cluster and the storage cluster in the data processing system 200 may include three or more nodes, as shown in FIG4.
  • the computing cluster includes a plurality of computing nodes 410, each of which can communicate with each other, and some computing nodes 410 can serve as disaster recovery for another part of the computing nodes 410.
  • Each computing node 410 is a computing device including a processor, such as a server, a desktop computer, etc.
  • the computing node 410 includes at least a processor 412, a memory 413, a network card 414, and a storage medium 415.
  • the processor 412 is a central processing unit (CPU) for processing data access requests from outside the computing node 410, or requests generated inside the computing node 410.
  • the processor 412 reads data from the memory 413, or, when the total amount of data in the memory 413 reaches a certain threshold, the processor 412 sends the data stored in the memory 413 to the storage node 400 for persistent storage.
  • FIG4 shows only one CPU 412. In practical applications, there are often multiple CPUs 412, wherein one CPU 412 has one or more CPU cores. This embodiment does not limit the number of CPUs or CPU cores.
  • Memory 413 refers to an internal memory that directly exchanges data with the processor. It can read and write data at any time and at a high speed, and serves as a temporary data storage for the operating system or other running programs. Memory includes at least two types of memory. For example, memory can be either a random access memory or a read-only memory (ROM). In actual applications, multiple memories 413 and different types of memories 413 can be configured in the computing node 410. This embodiment does not limit the number and type of memory 413.
  • the network card 414 is used to communicate with the storage node 400. For example, when the total amount of data in the memory 413 reaches a certain threshold, the computing node 410 can send a request to the storage node 400 through the network card 414 to store the data persistently.
  • the computing node 410 can also include a bus for communication between components inside the computing node 410.
  • the computing node 410 can also have a small number of hard disks built in, or a small number of hard disks connected externally.
  • Each computing node 410 can access the storage node 400 in the storage cluster through the network.
  • the storage cluster includes multiple storage nodes 400, and some storage nodes 400 can be used as disaster recovery for another part of the storage nodes 400.
  • a storage node 400 includes one or more controllers 401, a network card 404 and multiple hard disks 405.
  • the network card 404 is used to communicate with the computing node 410.
  • the hard disk 405 is used to store data persistently, and can be a disk or other types of storage media, such as a solid state drive or a shingled magnetic recording hard disk.
  • the controller 401 is used to write data to the hard disk 405 or read data from the hard disk 405 according to the read/write data request sent by the computing node 410.
  • in the process of reading and writing data, the controller 401 needs to convert the address carried in the read/write data request into an address that the hard disk can recognize. In addition, some controllers 401 can also be used to implement the functions of the above-mentioned data processing device 201, so as to achieve data synchronization between the storage node 400 and other storage nodes 400.
  • one or more applications are running on the master computing node 1011.
  • the target application can support the master computing node 1011 to provide data read and write services for users.
  • the target application can first read the data page where the data requested to be modified by the access request is located from the storage node 1021 to the buffer pool in the master computing node 1011, and complete the modification of the data page in the buffer pool according to the access request.
  • the target application will generate a binlog and a target redo log for the modified content of the data page.
  • the binlog is used to record the database statement for modifying the data page
  • the target redo log is used to record that part of the data on a data page in the storage node 1021 is modified to new data (the new data can be empty, in which case that part of the data on the data page is deleted).
  • the new data is referred to as target data below.
  • the target redo log generated by the target application may be a group of files named ib_logfile, for example, a group of files respectively named ib_logfile0 and ib_logfile1, etc.
  • the master computing node 1011 can feedback to the client that the data is written/modified successfully. Since the speed of writing data to the buffer pool is usually higher than the speed of persistently storing the data, this can speed up the master computing node 1011 to respond to access requests.
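  • The write path described above can be summarized with the following sketch: modify the page in the buffer pool, record a logical binlog entry and a physical redo record, and acknowledge the client before the page itself is persisted; the record structures are illustrative assumptions, with only the ib_logfile naming taken from the example above:

        buffer_pool = {}   # page_id -> bytearray (in-memory copies of data pages)
        binlog = []        # logical log generated by the service layer: database statements
        redo_log = {"file": "ib_logfile0", "records": []}  # physical log (storage engine layer)

        def handle_update(page_id, offset, new_bytes, sql_text):
            """Process one access request on the master computing node (simplified)."""
            # 1. Read the data page into the buffer pool (here: start from an empty page).
            page = buffer_pool.setdefault(page_id, bytearray(16 * 1024))
            # 2. Modify the data page in the buffer pool.
            page[offset:offset + len(new_bytes)] = new_bytes
            # 3. The service layer records the logical change; the storage engine layer
            #    records the physical change (page, offset, new data) in the redo log.
            binlog.append(sql_text)
            redo_log["records"].append(
                {"page_id": page_id, "offset": offset, "bytes": new_bytes})
            # 4. The client can be told that the write/modification succeeded now,
            #    without waiting for the modified page to be persisted.
            return "OK"

        print(handle_update(3, 64, b"new value", "UPDATE t SET c = 'new value' WHERE id = 3"))
        print(binlog)
        print(redo_log["records"])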
  • the target application can write data to the storage cluster 102, and the written data may include redo logs, modified data pages, etc.
  • the data processing device 201 can monitor, on the storage side, the data written by the main computing node 1011 to the storage cluster 102, and identify whether the written data includes the target redo log generated by the target application (the data written by the primary computing node 1011 may also be other types of data, such as a configuration file for the storage node 1021).
  • specifically, the data processing device 201 can identify the target redo log according to the naming format of the redo log (for example, the naming rule of the redo logs generated by the target application is known in advance, and the target redo log is identified according to that rule), or according to the configuration file of the target application (the configuration file may record information for distinguishing the target redo logs generated by the target application, such as the file names ib_logfile0 and ib_logfile1, and the target redo log is identified according to this information). The data processing device 201 then controls the storage node 1021 to replay the target redo log on the storage side, so as to update the target data recorded in the target redo log into the data persistently stored by the storage node 1021, thereby improving the reliability of data storage.
  • the above-mentioned technology of updating the persistently stored data according to the target redo log can be called page materialization technology, or log is data technology.
  • the data processing device 201 can identify the format of the data page of the storage node 1021. For example, the data processing device 201 can determine the format of the data page by identifying the data page in the data written by the primary computing node 1011; or, the data processing device 201 can obtain the format of the data page from the configuration file in the storage node 1021. Then, the data processing device 201 can notify the storage node 1021 of the format of the data page.
  • when the storage node 1021 plays back the target redo log, it can restore the data page where the target data is located according to the format of the data page, modify the corresponding data on the data page to the target data according to the modification operation recorded in the target redo log, and persistently store the modified data page.
  • when a file system (FS) is deployed in the storage node 1021, the data processing device 201 can instruct the FS in the storage node 1021 to replay the target redo log to update the target data recorded in the redo log to the data persistently stored in the storage node 1021.
  • the function of updating the data persistently stored in the storage node 1021 according to the redo log can be unloaded from the computing side (originally implemented by the main computing node 1011 according to the binlog) to the storage side (now implemented by the data processing device 201 and the storage node 1021), without the need for the main computing node 1011 to perceive and execute the operation of replaying the target redo log.
  • the slave computing node 1012 is deployed with the same target application as the master computing node 1011, and serves as a disaster recovery for the master computing node 1011, which can be a hot backup or a cold backup.
  • when the slave computing node 1012 is used as a hot backup, the slave computing node 1012 and the master computing node 1011 are continuously in operation; thus, when the master computing node 1011 fails, the slave computing node 1012 can use the data stored in the storage node 1022 to immediately take over the business on the master computing node 1011, specifically, to process the access requests that were not processed by the master computing node 1011 at the time of the failure.
  • when the slave computing node 1012 is used as a cold backup, during the normal operation of the master computing node 1011, the slave computing node 1012 may not run (such as being in a dormant state, etc.), or the slave computing node 1012 may release the computing resources thereon and use the released computing resources to process other businesses, such as offline computing businesses, etc.
  • when the main computing node 1011 fails, the slave computing node 1012 starts to run or reclaims computing resources, and uses the data stored in the storage node 1022 to take over the business on the main computing node 1011.
  • the storage node 1022 serves as a disaster recovery for the storage node 1021 and is used to back up the data persistently stored in the storage node 1021.
  • the main computing node 1011 can have multiple slave computing nodes as disaster recovery nodes, so that some slave computing nodes can be used as cold backup for the main computing node 1011, and another part of the slave computing nodes can be used as hot backup for the main computing node 1011.
  • the data processing device 201 not only controls the storage node 1021 to replay the target redo log to update data, but also controls the storage node 1022 to replay the target redo log to update the data persistently stored in the storage node 1022, thereby achieving consistency of data between the storage node 1021 and the storage node 1022.
  • the data processing device 201 can control the storage node 1021 to send the generated target redo log to the storage node 1022 through the network card or communication interface in the storage node 1021, and instruct the storage node 1022 to update its own persistently stored data according to the received target redo log, or the storage node 1022 can automatically update its persistently stored data according to the received target redo log.
  • the data processing device 201 can establish a wired or wireless connection with the storage node 1022, so that the data processing device 201 can send the target redo log to the storage node 1022 through the connection with the storage node 1022.
  • without the data processing device 201 instructing the storage node 1022 again, the storage node 1022 can also automatically update its own persistently stored data according to the received target redo log.
  • the storage node 1022 can immediately execute the playback process for the target redo log, so that the storage node 1022 and the storage node 1021 can maintain data consistency in real time.
  • when the data processing device 201 (or the storage node 1021 under its control) sends the target redo log to the storage node 1022, it can send, to the storage node 1022, the log records newly added in the target redo log relative to the redo log sent last time. For example, when the data processing device 201 (or the storage node 1021 under its control) sent the redo log last time, it can record the location of the latest log record in that redo log, which is used to identify the log records that have already been sent to the storage node 1022.
  • the data processing device 201 can determine, according to the recorded location, the newly added log records in the target redo log that have not been sent to the storage node 1022, and further send the newly added log records to the storage node 1022. Accordingly, the storage node 1022 can replay the received log records to update its persistently stored data.
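  • a minimal sketch of this incremental shipping idea is given below, assuming each redo-log record is a (LSN, bytes) pair and that a callable delivers records to the standby storage node; both are placeholders introduced only for illustration.

```python
class RedoLogShipper:
    """Track the position (LSN) of the last record already sent to the standby
    storage node, and ship only the newly added records on the next call."""

    def __init__(self, send_fn):
        self.sent_upto_lsn = 0      # position of the latest record already sent
        self.send_fn = send_fn      # callable that delivers records to the standby

    def ship(self, redo_log):
        """redo_log: list of (lsn, record_bytes) pairs, ordered by LSN."""
        new_records = [(lsn, rec) for lsn, rec in redo_log if lsn > self.sent_upto_lsn]
        if new_records:
            self.send_fn(new_records)
            self.sent_upto_lsn = new_records[-1][0]   # remember what was sent
        return len(new_records)

if __name__ == "__main__":
    received = []
    shipper = RedoLogShipper(received.extend)
    log = [(1, b"r1"), (2, b"r2")]
    shipper.ship(log)               # ships both records
    log.append((3, b"r3"))
    shipper.ship(log)               # ships only the new record (LSN 3)
    print([lsn for lsn, _ in received])   # [1, 2, 3]
```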
  • the data processing device 201 may send the entire log file of the target redo log to the storage node 1022.
  • the storage node 1022 may search for the position of the currently played back log record in the redo log file, thereby determining the log record in the target redo log that has not yet been played back based on the position, and starting from the position, play back each newly added log record in the target redo log one by one in sequence.
  • each time a log record is updated in the redo log, the data processing device 201 (or the storage node 1021 under its control) can send the log record or the entire log file of the redo log to the storage node 1022.
  • when the data processing device 201 monitors that the redo log is updated (such as when the LSN of the redo log changes), it can determine whether the number of updated log records in the redo log reaches a preset number (such as 10), and when the number of updated log records reaches the preset number, it determines to send the newly added log records or the entire log file of the redo log to the storage node 1022, so as to reduce the input/output (IO) required for synchronizing the redo log between the storage node 1021 and the storage node 1022.
  • when the data processing device 201 monitors that the redo log is updated, it can determine whether the storage space occupied by the updated log records in the redo log reaches a preset threshold, and when the occupied storage space reaches the preset threshold, it determines to send the newly added log records or the entire log file of the redo log to the storage node 1022.
  • the data processing device 201 may start timing when monitoring that the redo log is updated, and when the timing duration reaches a preset duration, all log records added in the redo log within the timing period may be sent to the storage node 1022, or the entire log file of the redo log may be directly sent to the storage node 1022, etc.
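  • the three triggering conditions above (a preset number of records, a preset amount of occupied storage space, and a preset duration) can be combined into a single shipping policy; the sketch below is one hypothetical way to express that combination, with the threshold values chosen arbitrarily for illustration.

```python
import time

class ShipPolicy:
    """Decide when updated redo-log records should be sent to the standby:
    after a preset record count, a preset number of buffered bytes, or a
    preset time since the first unsent update."""

    def __init__(self, max_records=10, max_bytes=1 << 20, max_seconds=1.0):
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.pending_records = 0
        self.pending_bytes = 0
        self.first_update_ts = None

    def on_update(self, record_size):
        """Called whenever the monitored redo log gains a record (its LSN advances).
        Returns True when the accumulated updates should be shipped now."""
        if self.first_update_ts is None:
            self.first_update_ts = time.monotonic()
        self.pending_records += 1
        self.pending_bytes += record_size
        due = (self.pending_records >= self.max_records
               or self.pending_bytes >= self.max_bytes
               or time.monotonic() - self.first_update_ts >= self.max_seconds)
        if due:
            self.pending_records = self.pending_bytes = 0
            self.first_update_ts = None
        return due

if __name__ == "__main__":
    policy = ShipPolicy(max_records=3)
    print([policy.on_update(128) for _ in range(4)])  # [False, False, True, False]
```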
  • before replaying the target redo log, the storage node 1022 can obtain the format of the data page in the storage node 1022. In this way, the storage node 1022 can restore the data page that needs to be modified in the storage node 1022 according to the format of the data page, and modify the corresponding data on the data page to the target data according to the modification operation recorded in the target redo log. Then, the storage node 1022 performs persistent storage on the modified data page.
  • the format of the data page can be pre-configured in the storage node 1022 by a technician, such as statically configured in the code program of the storage node 1022, or recorded in a configuration file in the storage node 1022, so that the storage node 1022 can read the format of the data page from the configuration file.
  • the configuration file can record a variety of data page formats, and different data page formats correspond to different applications, or to different versions of the same application, so that the storage node 1022 can query the data page format corresponding to the target application from the configuration file according to the target application currently running on the main computing node 1011 or the slave computing node 1012.
  • the data processing device 201 can determine the format of the data page by identifying the data page in the data written by the main computing node 1011, and notify the storage node 1022 of the format of the data page so that the storage node 1022 knows the format of the data page.
  • the slave computing node 1012 can directly continue to process, for the user, the unfinished access requests of the master computing node 1011 based on the data persistently stored in the storage node 1022, without executing the process of replaying the target redo log to update the data of the storage node 1022 to the latest state, which can effectively reduce the delay for the data processing system 200 to restore the business, that is, it can effectively reduce the RTO of the data processing system 200.
  • when the data processing device 201 is deployed on the storage node 1021, the data processing device 201 can control the storage node 1021 and at least one other storage node serving as its disaster recovery to replay the target redo log, which enables a single storage node to have the management capability of a cluster (that is, a cluster composed of multiple storage nodes).
  • storage node 1021 and storage node 1022 can also use the binlog generated by the primary computing node 1011 to verify the correctness of the target redo log.
  • storage node 1021 may generate an erroneous redo log due to program operation errors or other reasons. For example, for a binlog that has been rolled back in the primary computing node 1011, storage node 1021 still generates a redo log for the modified data indicated by the binlog; or the generation of part of the redo logs may fail, leaving missing redo log records, so that storage node 1021 may fail to replay the target redo log correctly.
  • the data processing device 201 can also identify and obtain the binlog corresponding to the target data in the master computing node 1011, and send the binlog and the target redo log in the storage node 1021 to the storage node 1022.
  • the data processing device 201 can send the binlog to the storage node 1022 through the connection with the storage node 1022.
  • the binlog can be sent by the master computing node 1011 to the slave computing node 1012, and then sent to the storage node 1022 by the slave computing node 1012. Then, the storage node 1022 can use the received binlog to verify the target redo log, such as verifying whether the identifiers of the committed transactions recorded in the binlog and the target redo log are consistent.
  • when the verification passes, the storage node 1022 replays the target redo log to update the data persistently stored in the storage node 1022.
  • when the verification fails, the storage node 1022 may feed back the verification failure result to the data processing device 201.
  • the data processing device 201 may instruct the slave computing node 1012 to update the persistently stored data in the storage node 1022 by performing the operation of replaying the binlog, so as to ensure the correctness of the data stored in the storage node 1022.
  • the data processing device 201 can also instruct the storage node 1021 to verify the target redo log using the binlog, and when the target redo log passes the verification, the data persistently stored in the storage node 1021 is updated by replaying the target redo log.
  • when the verification fails, the data processing device 201 or the storage node 1021 can instruct the primary computing node 1011 to update the data persistently stored in the storage node 1021 by executing the operation of replaying the binlog, so as to ensure the correctness of the data stored in the storage node 1021.
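  • to illustrate the verification step described above, the sketch below compares the committed-transaction identifiers recorded in the binlog and in the redo log, replays the redo log on the storage side when they match, and falls back to binlog replay otherwise; the data structures and callbacks are hypothetical placeholders, not the actual verification procedure of the system.

```python
def committed_txns_match(binlog_txn_ids, redo_txn_ids):
    """Verify the target redo log against the binlog by checking that both
    record the same set of committed transaction identifiers."""
    return set(binlog_txn_ids) == set(redo_txn_ids)

def apply_with_verification(binlog_txn_ids, redo_txn_ids, replay_redo, replay_binlog):
    """If verification passes, update the persistent data by replaying the redo
    log on the storage side; otherwise report the failure and update the data
    by replaying the binlog instead."""
    if committed_txns_match(binlog_txn_ids, redo_txn_ids):
        replay_redo()
        return "verification passed: redo log replayed"
    replay_binlog()
    return "verification failed: binlog replayed"

if __name__ == "__main__":
    ok = apply_with_verification([101, 102], [102, 101], lambda: None, lambda: None)
    bad = apply_with_verification([101, 102], [101], lambda: None, lambda: None)
    print(ok)    # verification passed: redo log replayed
    print(bad)   # verification failed: binlog replayed
```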
  • the data in the storage node 1021 and the storage node 1022 can be persistently stored in the format of a file.
  • the storage node 1021 and the storage node 1022 can be respectively deployed with a corresponding file system (FS), and the FS is used to manage the persistently stored files, such as the process of replaying the target redo log by the FS to update the persistently stored data.
  • the data in the storage node 1021 and the storage node 1022 can also be persistently stored in the data block format, that is, when the storage node 1021 and the storage node 1022 store data, the data is divided into blocks of a fixed size, and the size of each block can be, for example, 512 bytes or 4 kilobytes (KB).
  • the data in the storage node 1021 and the storage node 1022 can also be stored in the object format.
  • the object can be the basic unit of the storage node to store data, and each object can include a combination of data and the attributes of the data, wherein the attributes of the data can be set according to the needs of the application in the computing node, including data distribution, service quality, etc.
  • the storage format of data is not limited.
  • storage node 1021 and storage node 1022 may be deployed in the same physical area, as shown in FIG. 5.
  • storage node 1021 and storage node 1022 may be deployed in the same data center, or in the same availability zone (AZ), etc.
  • the data processing device 201 can use the storage volume in the storage node 1021 as the master volume, logically create a slave volume, and assign the slave volume to the storage node 1022; then, the data processing device 201 can record the value of the log sequence number (LSN) of the redo log at the current time (assuming it is time t), control the storage node 1021 to copy the data persistently stored in the master volume at the current time to the slave volume, and copy the redo logs in the storage node 1021 whose LSN is not greater than that value to the storage node 1022.
  • by respectively replaying the redo logs, the storage node 1021 and the storage node 1022 can achieve data synchronization between them at time t.
  • for the redo logs newly generated after time t, the data processing device 201 copies the new redo logs to the storage node 1022 to ensure data consistency between the two storage nodes.
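  • a simplified sketch of this baseline-then-incremental replication between a master volume and a slave volume is shown below; the in-memory dictionaries and (LSN, record) pairs are stand-ins chosen only for illustration and do not model real volumes.

```python
def create_replica(master_volume, master_redo, baseline_lsn):
    """Create a logical slave volume at a baseline: copy the persistently
    stored data and every redo-log record whose LSN is not greater than the
    recorded baseline LSN."""
    slave_volume = dict(master_volume)                       # data at time t
    slave_redo = [(lsn, rec) for lsn, rec in master_redo if lsn <= baseline_lsn]
    return slave_volume, slave_redo

def sync_new_logs(master_redo, slave_redo, baseline_lsn):
    """After the baseline, keep the slave consistent by copying over every
    redo-log record generated with an LSN greater than the baseline LSN."""
    already_have = {lsn for lsn, _ in slave_redo}
    slave_redo.extend((lsn, rec) for lsn, rec in master_redo
                      if lsn > baseline_lsn and lsn not in already_have)

if __name__ == "__main__":
    master_vol = {"page-1": b"a"}
    master_log = [(1, b"r1"), (2, b"r2")]
    slave_vol, slave_log = create_replica(master_vol, master_log, baseline_lsn=2)
    master_log.append((3, b"r3"))                 # new log generated after time t
    sync_new_logs(master_log, slave_log, baseline_lsn=2)
    print(len(slave_log))                         # 3
```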
  • the data processing device 201 may create replicas based on other storage objects in addition to using storage volumes as objects, and this embodiment does not limit this.
  • the data in the master volume can be copied to the slave volume by data cloning.
  • the slave volume can also be called a clone volume.
  • the data in the master volume can be copied to the slave volume by snapshot. In this way, on the basis of realizing data backup, it can not only improve the efficiency of data backup, but also reduce the overhead required for data backup and reduce the storage space of the copy.
  • the data processing system 200 can also support reading and writing data of the data processing system 200 at a certain moment through cascade cloning. For example, when it is necessary to perform corresponding analysis or testing services based on the data of the data processing system 200 at time T, the data processing system 200 can create a new storage node 1023, and clone the data in the secondary volume of the storage node 1022 to generate a clone volume, as shown in Figure 5. Then, the data processing device 201 can assign the clone volume to the newly created storage node 1023, and copy the redo log generated before time T to the newly created storage node 1023.
  • the data processing system 200 can support users to use the computing node 1013 to read and write data in the analysis/testing business based on the data of the clone volume in the newly created storage node.
  • the data processing device 201 can create a replication group, which includes a master volume and at least one slave volume, and the data in the slave volume can be obtained by copying or snapshotting the master volume.
  • the data processing device 201 can be used to handle data synchronization between different storage volumes in the replication group, so that the master computing node 1011 on the computing side does not need to maintain the replication group, thereby simplifying the data processing logic of the master computing node 1011 and helping to improve the performance of the master computing node 1011 in providing business services.
  • the slave computing node 1012 can detect in real time or periodically whether the master computing node 1011 fails. For example, the slave computing node 1012 can determine that the master computing node 1011 fails when it does not receive a heartbeat message sent by the master computing node 1011. When it is determined that the master computing node 1011 fails, the slave computing node 1012 is upgraded to the master computing node, and the storage node 1022 is instructed to upgrade to the master storage node. At this time, the slave computing node 1012 continues to take over the business on the master computing node 1011 directly based on the data in the storage node 1022.
  • subsequently, the slave computing node 1012 can write data to the buffer in the slave computing node 1012 in the aforementioned manner, generate a redo log, and write it to the storage node 1022 (which has now been upgraded to the master storage node). Then, the data processing device 201 can instruct the storage node 1022 to replay the redo log to update the persistently stored data. In addition, the data processing device 201 can also control the copying of the redo log to the storage node 1021, so that the storage node 1021 can maintain data synchronization with the storage node 1022 by replaying the redo log.
  • after recovering from the failure, the master computing node 1011 can be used as a disaster recovery for the slave computing node 1012, that is, the slave computing node 1012 remains the master node and the master computing node 1011 serves as a slave node.
  • the master computing node 1011 can trigger the master-slave switching process so as to restore the master computing node 1011 to the master node again, and restore the slave computing node 1012 to the slave node again.
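  • the failure detection and promotion sequence described above can be pictured with the following sketch, in which a missing heartbeat within an assumed timeout triggers the promotion of the slave computing node and of its storage node; the timeout value and the promotion callbacks are illustrative assumptions only.

```python
import time

class FailoverMonitor:
    """Detect a master computing node failure from missing heartbeats and
    promote the slave computing node together with its storage node."""

    def __init__(self, timeout_s=3.0, promote_compute=None, promote_storage=None):
        self.timeout_s = timeout_s
        self.last_heartbeat = time.monotonic()
        self.promote_compute = promote_compute or (lambda: None)
        self.promote_storage = promote_storage or (lambda: None)
        self.is_master = False

    def on_heartbeat(self):
        """Called whenever a heartbeat message from the master arrives."""
        self.last_heartbeat = time.monotonic()

    def check(self):
        """Periodic check: if no heartbeat arrived within the timeout, the master
        is considered failed and the slave takes over immediately, using the data
        already kept up to date on its storage node."""
        if not self.is_master and time.monotonic() - self.last_heartbeat > self.timeout_s:
            self.promote_compute()     # slave computing node becomes the master
            self.promote_storage()     # its storage node becomes the master volume
            self.is_master = True
        return self.is_master
```
  • in terms of the system described above, promote_compute and promote_storage would correspond to upgrading the slave computing node 1012 and the storage node 1022 respectively; the sketch only captures the timing logic, not the actual take-over of the business.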
  • storage node 1021 and storage node 1022 may be deployed in different physical areas, such as physical area 1 and physical area 2 shown in FIG. 6.
  • storage node 1021 is deployed in data center A, and storage node 1022 is deployed in data center B; or, storage node 1021 is deployed in AZ1, and storage node 1022 is deployed in AZ2, etc.
  • cross-data center or cross-AZ disaster recovery can be achieved, thereby improving the reliability of data storage in a different location.
  • the data processing device 201 can use the storage volume in the storage node 1021 as the primary volume, and create a logical secondary volume in the remote storage node 1022. Then, when copying the data in the storage node 1021 to the storage node 1022, the data processing device 201 can first record the LSN value of the redo log at the current moment (assuming it is time t), and control the storage node 1021 to remotely send the data on the primary volume and the redo logs in the storage node 1021 whose LSN is not greater than that value to the storage node 1022, so as to achieve data synchronization between the storage node 1021 and the storage node 1022 at time t.
  • the data processing device 201 can control the storage node 1021 to remotely send the redo log generated after time t to the storage node 1022, so as to ensure the data consistency between the two storage nodes.
  • the above description takes the case where the data processing device 201 creates a copy based on a storage volume as an example.
  • the data processing device 201 can also create a copy based on other types of storage objects, which is not limited in this embodiment.
  • the data processing device 201 can create a replication group, which includes a master volume and at least one slave volume located in other physical areas, and the data in the slave volume can be obtained by replicating the master volume.
  • the data processing device 201 can be used to handle data synchronization between different storage volumes in the replication group, so that the master computing node 1011 on the computing side does not need to maintain the replication group.
  • the slave computing node 1012 can detect the failure of the master computing node 1011 in real time or periodically, and when it is determined that the master computing node 1011 fails, the slave computing node 1012 is upgraded to the master computing node, and the storage node 1022 is upgraded to the master storage node, and the slave computing node 1012 continues to take over the business on the master computing node 1011 directly based on the data in the storage node 1022.
  • the slave computing node 1012 can write the data newly written by the user through the client into the buffer in the aforementioned manner and generate a redo log, and then the data processing device 201 can copy the newly generated redo log to the storage node 1021 by controlling the storage node 1022 to achieve data synchronization between the two storage nodes.
  • the master computing node 1011 can be used as a disaster recovery for the slave computing node 1012; or, the master computing node 1011 can be restored to the master node again through master-slave switching.
  • the storage node 1021 may have multiple disaster recovery nodes.
  • some of the multiple disaster recovery nodes may be deployed in the same physical area as the storage node 1021, and another part of the multiple disaster recovery nodes may be deployed in a different physical area from the storage node 1021, etc., so as to improve the reliability of data storage locally and remotely.
  • storage node 1021 (and storage node 1022) can persistently store data based on a storage array, and can further improve the reliability of persistent data storage in storage node 1021 (and storage node 1022) based on technologies such as redundant array of independent disks (RAID), erasure coding (EC), deduplication and compression, and data backup on the storage array.
  • the data consistency between the storage node 1021 and the storage node 1022 can be maintained based on a variety of data replication methods.
  • the data replication process between the storage node 1021 and the storage node 1022 is described in detail below.
  • the data processing device 201 may first determine whether a baseline replication is performed between the storage node 1021 and the storage node 1022.
  • Baseline replication refers to sending all the data persistently stored by the storage node 1021 at a certain time point (such as the current time) and the generated logs to the storage node 1022. For example, when the data replication operation is performed for the first time between the storage node 1021 and the storage node 1022, the data processing device 201 may determine to perform a baseline replication.
  • when some redo logs in storage node 1021 are not sent to storage node 1022 in time and are deleted in storage node 1021 (for example, the life cycle of some redo logs in storage node 1021 exceeds a preset time and they are deleted, or some of the earliest generated redo logs are deleted because the available area for storing redo logs in storage node 1021 is insufficient), storage node 1022 cannot obtain the deleted redo logs to maintain data synchronization with storage node 1021. Therefore, data processing device 201 can determine to perform baseline replication on storage node 1021 and storage node 1022 so as to synchronize data in the two storage nodes based on the baseline.
  • the data processing device 201 can determine the first moment (such as the current moment) as the moment corresponding to the baseline, and determine the baseline data based on the moment.
  • the baseline data includes the data persistently stored in the current storage node 1021 and the latest redo log generated at this moment.
  • the data processing device 201 can record the LSN of the redo log in the baseline data, which is hereinafter referred to as the baseline LSN, and control the storage node 1021 to remotely send the baseline data to the storage node 1022, as shown in Figure 7, so that the storage node 1022 stores the baseline data, which can be specifically persistently storing the data on multiple data pages in the baseline data, and updating the current persistently stored data according to the redo log in the baseline data.
  • the data processing device 201 may instruct the storage node 1021 to write the newly generated redo log (i.e., the redo log whose LSN is greater than the baseline LSN) into the cache area 701 for temporary storage.
  • the storage node 1021 may recycle the cached redo logs in the cache area 701 when the remaining storage space of the cache area 701 is less than a threshold.
  • the data processing device 201 can set a lower limit value of the storage space of the cache area 701 in the storage node 1021, that is, set the storage space of the cache area 701 created by the storage node 1021 to be not less than the lower limit value.
  • the data processing device 201 can instruct the storage node 1021 and the storage node 1022 to replay the redo logs in the baseline data so that the data persistently stored in the storage node 1021 and the storage node 1022 are consistent with the baseline.
  • the data processing device 201 can check whether there is a redo log with an LSN greater than the baseline LSN that has been recycled. For example, the storage node 1021 will record the LSN of the redo log when recycling the redo log. Therefore, when the LSN of the recycled redo log is greater than the baseline LSN, it is determined that there is a redo log with an LSN greater than the baseline LSN that has been recycled. Otherwise, it is determined that there is no redo log with an LSN greater than the baseline LSN that has been recycled.
  • when such a recycled redo log exists, the data processing device 201 instructs the storage node 1021 to redetermine the baseline, such as determining the baseline based on a second moment (such as the current moment), and re-executes the baseline replication process based on the redetermined baseline to ensure data consistency between the two storage nodes (to prevent the storage node 1022 from losing part of the data in the storage node 1021).
  • the new baseline replication process can be an incremental replication relative to the previous baseline replication process, so as to reduce the amount of data required to be transmitted during the baseline replication process; or the new baseline replication process can also be a full replication, etc.
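  • the decision between continuing the log replication and re-running the baseline replication can be summarized by the sketch below, which checks whether any recycled redo-log record carries an LSN greater than the baseline LSN; the callbacks are placeholders for the operations described in the text.

```python
def needs_new_baseline(recycled_lsns, baseline_lsn):
    """After the baseline replication completes, check whether any recycled
    redo-log record has an LSN greater than the baseline LSN; if so, the
    standby may have lost records and a new baseline must be taken."""
    return any(lsn > baseline_lsn for lsn in recycled_lsns)

def continue_replication(recycled_lsns, baseline_lsn, redo_baseline, ship_logs):
    """Either re-run the baseline replication from a fresh baseline, or proceed
    with ordinary log replication of the cached new redo logs."""
    if needs_new_baseline(recycled_lsns, baseline_lsn):
        redo_baseline()     # possibly incremental relative to the previous baseline
        return "re-baseline"
    ship_logs()             # send cached redo logs (LSN > baseline LSN) to the standby
    return "log-replication"

if __name__ == "__main__":
    print(continue_replication([5, 9], 10, lambda: None, lambda: None))   # log-replication
    print(continue_replication([5, 12], 10, lambda: None, lambda: None))  # re-baseline
```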
  • otherwise, the data processing device 201 instructs the storage node 1021 to send the newly generated redo logs in the cache area 701 to the storage node 1022; this process is the log replication process.
  • the storage node 1021 can also send the newly added log records in the newly generated redo log relative to the redo log in the baseline data to the storage node 1022, and it is not necessary to send all the log records in the newly generated redo log.
  • the data processing device 201 can instruct the storage node 1022 to replay the newly generated redo log to update the data stored persistently.
  • the storage node 1021 can perform the process of replaying the redo log simultaneously with the storage node 1022, or the storage node 1021 and the storage node 1022 can perform the process of replaying the redo log asynchronously.
  • for a redo log in the cache area 701 that has been sent to the storage node 1022, the storage node 1021 can recycle it. Furthermore, during the log replication process, if the storage node 1021 generates a new redo log, it is still written into the cache area 701 so as to be subsequently sent to the storage node 1022.
  • in this way, the recovery point objective (RPO) of the data processing system 200 can be 0, where the RPO can be used to measure the maximum amount of data loss that occurs during disaster recovery of the data processing system 200.
  • the data processing device 201 can control the storage node 1021 to perform the mirroring function of the redo log, and send the mirror of the redo log to the storage node 1022, so that the storage node 1022 can replay the mirror file of the redo log.
  • the data processing device 201 can control the storage node 1021 to replay the redo log, and control the storage node 1022 to replay the mirror file of the redo log, so as to achieve data synchronization between the storage node 1021 and the storage node 1022.
  • for a newly generated redo log, storage node 1021 can add it to the cache area 701, and the data processing device 201 can instruct storage node 1021 to replay the redo log and send the redo log in the cache area 701 to storage node 1022, so that storage node 1022 can achieve data synchronization with storage node 1021 by replaying the received redo log.
  • when the two storage nodes are in a to-be-synchronized state, the storage node 1021 and the storage node 1022 can continue to perform the log replication process, and if part of the redo logs that have not been replicated are to be recycled in the storage node 1021, the data processing device 201 instructs the storage node 1021 to redetermine the baseline, and re-executes the baseline replication process based on the redetermined baseline to ensure data consistency between the two storage nodes.
  • the baseline replication process and the log replication process may be executed serially between storage node 1021 and storage node 1022, while in other possible implementations, the baseline replication process and the log replication process may be executed in parallel between storage node 1021 and storage node 1022.
  • the data processing device 201 may first determine whether a baseline replication is performed between the storage node 1021 and the storage node 1022. If it is determined that baseline replication is adopted, the data processing device 201 may record the baseline LSN of the redo log corresponding to the baseline (assuming that it is the LSN corresponding to the latest redo log), and control the storage node 1021 to remotely send the persistent data corresponding to the baseline and the redo logs whose LSN is not greater than the baseline LSN to the storage node 1022, as shown in FIG. 8.
  • the data processing device 201 may control the storage node 1021 to mirror the redo log, and send the mirror file of the redo log to the storage node 1022. In this process, the data processing device 201 may control the storage node 1021 to replay the newly generated redo log to update the data in the storage node 1021.
  • the data processing device 201 can control the storage node 1022 to first replay the redo logs included in the baseline whose LSN is not greater than the baseline LSN, and then replay the mirror files of the received redo logs in the order of log reception, until the storage node 1022 completes the replay of the mirror file of the most recently received redo log.
  • the data persistently stored in the storage node 1021 and the storage node 1022 can be consistent, and the RPO of the data processing system 200 can be 0.
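  • the replay order on the standby storage node in this parallel mode can be illustrated as follows: baseline records up to the baseline LSN are applied first, and the mirror files of the newly generated redo logs are applied afterwards in the order in which they were received; the (LSN, record) representation is an assumption for illustration.

```python
def standby_replay_order(baseline_records, mirrored_records, baseline_lsn):
    """Order in which the standby applies logs when baseline replication and
    log (mirror) replication run in parallel: first every baseline record with
    LSN <= baseline LSN, then the mirrored records in the order received."""
    plan = [rec for lsn, rec in baseline_records if lsn <= baseline_lsn]
    plan.extend(rec for _, rec in mirrored_records)   # reception order preserved
    return plan

if __name__ == "__main__":
    baseline = [(1, "r1"), (2, "r2")]
    mirrors = [(3, "r3"), (4, "r4")]
    print(standby_replay_order(baseline, mirrors, baseline_lsn=2))
    # ['r1', 'r2', 'r3', 'r4']
```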
  • the master computing node 1011 and the slave computing node 1012 may also share the same storage node. As shown in FIG. 3, both the master computing node 1011 and the slave computing node 1012 can access the storage node 301. At this time, for the redo log sent by the master computing node 1011, the data processing device 201 controls a single storage node (i.e., the storage node 301) to complete the replay of the redo log.
  • in the scenario where the master computing node and the slave computing node use different storage nodes, the data processing device 201 can control multiple storage nodes to respectively replay the redo logs, thereby realizing data synchronization between the master and slave storage nodes.
  • referring to FIG. 9, which is a flow chart of a data processing method in an embodiment of the present application, the method can be applied to the data processing system 200 shown in FIG. 2. In actual application, the method can also be applied to other applicable data processing systems.
  • the following is an exemplary description using the data processing system 200 shown in FIG2 as an example, and the method can specifically include:
  • the master computing node 1011 receives an access request and writes data to the storage cluster 102 .
  • the main computing node 1011 can receive an access request sent by a client or other device on the user side.
  • the access request can be used to request to read the data persistently stored in the data processing system 200, or can be used to request to modify the data persistently stored in the data processing system 200, or can be used to request to write new data to the data processing system 200, etc.
  • the master computing node 1011 can cache the modified data or the newly written data in the process of responding to the access request, and generate a redo log for the data.
  • the data processing device 201 monitors the data written by the master computing node 1011 to the storage cluster 102 .
  • the data written by the master computing node 1011 to the storage cluster 102 may include, for example, redo logs, data pages, and other data, or other types of data.
  • the data processing device 201 recognizes that the data written by the primary computing node 1011 to the storage cluster 102 includes a redo log.
  • the data processing device 201 determines whether there is any redo log in the storage node 1021 that has not been copied to the storage node 1022 and needs to be recycled. If so, execute step S905; if not, execute step S907.
  • the data processing device 201 controls the storage node 1021 and the storage node 1022 to perform baseline replication.
  • the data processing device 201 can record the LSN of the most recently generated redo log (such as the redo log identified in step S903), and, taking the redo log generated at the current moment and the data persistently stored in the storage node 1021 as the baseline, control the storage node 1021 to remotely send the data and redo logs included in the baseline to the storage node 1022.
  • the data processing device 201 may also control the storage node 1022 to replay the redo log in the baseline so that the data persistently stored by the storage node 1021 and the storage node 1022 based on the baseline remain consistent.
  • the data processing device 201 copies the new redo log generated by the storage node 1021 during the baseline copy process to the storage node 1022, and returns to execute step S902.
  • the baseline copy process and the log copy process can be executed serially between the storage node 1021 and the storage node 1022. Then, the data processing device 201 can instruct the storage node 1021 to cache the new redo log generated during the baseline copy process, so that after the baseline copy is completed, the storage node 1021 is controlled to send the cached redo log (the LSN of the redo log is greater than the LSN of the redo log included in the baseline) to the storage node 1022, and the storage node 1022 is controlled to replay the received redo log.
  • the baseline copy process and the log copy process can be executed in parallel between storage node 1021 and storage node 1022.
  • data processing device 201 can instruct storage node 1021 to mirror the new redo log generated during the baseline copy process, and send the mirror file of the new redo log to storage node 1022, and control storage node 1022 to play back the mirror file of the redo log it received after playing back all the redo logs in the baseline.
  • the data processing device 201 controls the storage node 1021 to copy the identified redo log to the storage node 1022 .
  • the redo log generated by storage node 1021 is written into the cache area, so that the data processing device 201 can instruct storage node 1021 to send the redo log in the cache area to storage node 1022 .
  • the data processing device 201 may instruct the storage node 1021 to mirror the generated redo log and send the mirror file to the storage node 1022 .
  • the data processing device 201 controls the storage node 1021 and the storage node 1022 to replay the redo logs thereon respectively to update the data persistently stored in each storage node, and returns to execute step S902.
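  • the overall control flow of the method above can be condensed into the following sketch, where the callbacks stand in for the monitoring, identification, baseline replication, log replication and replay steps; they are placeholders only and do not reflect the actual interfaces of the data processing device 201.

```python
def data_processing_loop(monitor_writes, is_redo_log, recycled_unsent,
                         do_baseline_copy, copy_log, replay_on_both):
    """High-level sketch of the flow in FIG. 9: monitor the writes of the master
    computing node; when a redo log is identified, either run a baseline
    replication (if unsent redo logs were recycled) or copy the log to the
    standby, then have both storage nodes replay it."""
    for written in monitor_writes():          # monitor storage-side writes
        if not is_redo_log(written):          # identify the target redo log
            continue
        if recycled_unsent():                 # unsent redo logs recycled?
            do_baseline_copy()                # baseline replication (plus logs generated meanwhile)
        else:
            copy_log(written)                 # ordinary log replication
            replay_on_both(written)           # update persistent data on both nodes

if __name__ == "__main__":
    events = iter([{"name": "ib_logfile0"}, {"name": "node.conf"}])
    data_processing_loop(lambda: events,
                         lambda w: w["name"].startswith("ib_logfile"),
                         lambda: False,
                         lambda: print("baseline copy"),
                         lambda w: print("copy", w["name"]),
                         lambda w: print("replay", w["name"]))
```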
  • in the process in which storage node 1021 and storage node 1022 update the persistently stored data according to the redo log, the data is modified directly on the data pages of the storage nodes, which can effectively reduce the amount of resources required to update the persistently stored data in the storage nodes.
  • for the slave computing node 1012, since the data processing device 201 located on the storage side has controlled the storage node 1022 to update the persistently stored data according to the redo log in the storage node 1021, when the slave computing node 1012 is upgraded to the master computing node (for example, when the original master computing node 1011 fails or the slave computing node 1012 receives an upgrade instruction for master-slave switching), the slave computing node 1012 does not need to execute the process of updating the data according to the redo log, but can continue to provide data reading and writing services directly according to the data persistently stored in the storage node 1022, thereby effectively shortening the RTO of the data processing system 200 and improving the fault recovery performance of the data processing system 200.
  • the data synchronization process between multiple storage nodes is introduced by taking the data processing system shown in FIG. 2 as an example.
  • the master computing node 1011 and the slave computing node 1012 share the same storage node.
  • the data processing device 201 instructs the storage node 301 to replay the generated redo log to update the persistently stored data.
  • the slave computing node 1012 does not need to perform the process of replaying the redo log, but can directly restore the business operation based on the current persistently stored data, thereby reducing the RTO of the data processing system 200.
  • the data processing method embodiment shown in FIG9 corresponds to the data processing system 200 embodiment shown in FIG2 to FIG8 . Therefore, the specific implementation process of the data processing method shown in FIG9 can be found in the relevant description of the aforementioned embodiments and will not be repeated here.
  • the data processing device 201 uses the redo log to control the storage node 1021 and the storage node 1022 to maintain data consistency. In other possible embodiments, the data processing device 201 can also use binlog to control multiple storage nodes to maintain data consistency. The following is an exemplary description in conjunction with Figure 10.
  • the computing cluster in the data processing system 500 includes a master computing node 5011, a slave computing node 5012, and a slave computing node 5013, and the slave computing node 5012 and the slave computing node 5013 are both used as disaster recovery for the master computing node 5011, for example, they can be hot backups.
  • the storage cluster in the data processing system 500 includes a storage node 5021, a storage node 5022, a storage node 5023, and a data processing device 503, and the storage node 5022 and the storage node 5023 are both used as disaster recovery for the storage node 5021.
  • the main computing node 5011 and the storage node 5021 can constitute the main center, the slave computing node 5012 and the storage node 5022 can constitute the disaster recovery center 1, and the slave computing node 5013 and the storage node 5023 can constitute the disaster recovery center 2.
  • the main center and the disaster recovery center 1 are deployed in the same physical area, while the main center and the disaster recovery center 2 are deployed in different physical areas, such as in different data centers.
  • when the main computing node 5011 modifies data, a binlog (a logical log) is usually generated.
  • the binlog can record database statements, such as SQL statements, used to update the data persistently stored in the storage node 5021.
  • the data processing device 503 can sense and identify the binlog in the main computing node 5011, for example, it can identify the binlog according to the log format of the binlog, and obtain the binlog from the main computing node 5011.
  • the data processing device 503 can control the storage node 5021 to send the binlog to the disaster recovery center 1 and the disaster recovery center 2 respectively, and specifically, it can send the binlog to the storage node 5022 and the storage node 5023 respectively.
  • when the data processing device 503 is deployed independently of each storage node, the data processing device 503 can be configured with a network card, so that the data processing device 503 can send the binlog to each disaster recovery center through the network card. Then, the data processing device 503 can instruct the slave computing node 5012 and the slave computing node 5013 to replay the binlogs they each received.
  • the slave computing node 5012 can update the data persistently stored in the storage node 5022 by executing the database statements recorded in the binlog, so that the data in the storage node 5022 is consistent with the data in the storage node 5021.
  • the slave computing node 5013 can also update the data persistently stored in the storage node 5023 by executing the database statements recorded in the binlog, so that the data in the storage node 5023 is consistent with the data in the storage node 5021.
  • each disaster recovery center can receive binlog through the storage node without the need for the main computing node 5011 to send binlog to each disaster recovery center.
  • the slave computing nodes in each disaster recovery center can synchronize data with the main center by replaying the binlog, which ensures that the RPO of the data processing system 500 can be 0, and avoids the situation where an excessive load on the main computing node 5011 affects the replication of the binlog between the main center and the disaster recovery centers, causing the data in the disaster recovery centers to be inconsistent with the data in the main center and affecting the RPO of the data processing system 500.
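  • purely as an illustration of this binlog-based scheme, the sketch below fans the same binlog out to several disaster recovery centers and replays the recorded database statements there, with sqlite3 standing in for the actual database engine; the statement list and connections are illustrative assumptions.

```python
import sqlite3

def replay_binlog(statements, connection):
    """Replay a binlog at a disaster recovery center by executing the database
    statements it records, bringing the local data in line with the main center."""
    cur = connection.cursor()
    for stmt in statements:
        cur.execute(stmt)           # each binlog entry records a DB statement
    connection.commit()

def fan_out(binlog, dr_connections):
    """Send the same binlog to every disaster recovery center and replay it there,
    so each standby stays consistent with the main center."""
    for conn in dr_connections:
        replay_binlog(binlog, conn)

if __name__ == "__main__":
    binlog = ["CREATE TABLE t (id INTEGER, v TEXT)",
              "INSERT INTO t VALUES (1, 'a')"]
    dr1, dr2 = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    fan_out(binlog, [dr1, dr2])
    print(dr1.execute("SELECT * FROM t").fetchall())   # [(1, 'a')]
```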
  • the data processing device 503 can also determine whether baseline replication is required between the storage node 5021 and each storage node used as its disaster recovery. For example, when the data replication process is performed for the first time between the main center and the disaster recovery center, the data processing device 503 can determine to perform baseline replication.
  • when the data processing device 503 determines that baseline replication is required between the storage node 5021 and each storage node, the data of the primary volume in the storage node 5021 is replicated to the secondary volume of the storage node 5022 and the secondary volume of the storage node 5023 based on the baseline.
  • the specific implementation of the data processing device 503 controlling the storage node 5021 to perform baseline replication with each storage node can be referred to the description of the relevant parts of the data processing device 201 controlling the storage node 1021 to perform baseline replication with the storage node 1022 in the above embodiment, which will not be repeated here.
  • the data processing system 500 shown in Figure 10 is only used as an exemplary description.
  • the number of disaster recovery centers (and of the storage nodes used as disaster recovery) may be smaller or larger, and this embodiment does not limit this.
  • referring to FIG. 11, a schematic diagram of a data processing device provided by an embodiment of the present application is shown.
  • the data processing device 1100 shown in FIG. 11 is located in a data processing system, such as the data processing system 200 shown in FIG. 2 , etc.
  • the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node and the data processing device, and the slave computing node serves as a disaster recovery for the master computing node.
  • the data processing device 1100 includes:
  • a monitoring module 1101, used for monitoring the data written by the main computing node to the storage cluster;
  • the control module 1102 is used to control the first storage node among the at least one storage node to replay the target redo log when it is identified that the written data includes the target redo log, so as to update the target data recorded in the target redo log to the data persistently stored in the at least one storage node, and the updated persistently stored data in the at least one storage node is used by the slave computing node to take over the access request on the master computing node.
  • At least one storage node further includes a second storage node, and the first storage node serves as a disaster recovery for the second storage node;
  • the control module 1102 is also used to control the second storage node to send the target redo log to the first storage node when it is recognized that the written data includes the target redo log, so that the first storage node replays the target redo log to update the data persistently stored in the first storage node.
  • a target application is running on the primary computing node, and a target redo log is generated by the target application during its running.
  • control module 1102 is configured to identify, according to a configuration file of a target application or a naming format of a redo log of a target application, that the written data includes a target redo log.
  • control module 1102 is specifically configured to control the first storage node to replay the target redo log according to the format of the data page corresponding to the target application, so as to update the data page on the first storage node.
  • control module 1102 is further configured to control the first storage node to obtain the format of the data page corresponding to the target application before playing back the target redo log.
  • control module 1102 is further used to control the second storage node to send a binary log corresponding to the target data to the first storage node, where the binary log is used to record database statements;
  • the control module 1102 is specifically used to control the first storage node to verify the target redo log using the binary log, so that the first storage node plays back the target redo log when determining that the verification passes.
  • both the master computing node and the slave computing node can access the first storage node, and the first storage node includes a read cache area;
  • the control module 1102 is also used to control the first storage node to cache the target data into the read cache area before playing back the target redo log, and the target data in the read cache area can be read by the primary computing node.
  • control module 1102 is further configured to control the first storage node to eliminate target data from the read cache area after replaying the target redo log.
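  • the behavior of the read cache area described in the two items above can be sketched as follows, assuming a simple dictionary-backed cache and persistent store introduced only for illustration.

```python
class ReadCache:
    """Sketch of the read cache area in the first storage node: target data is
    cached before the redo log is replayed so the computing node can read it
    immediately, and evicted once the replay has persisted it."""

    def __init__(self):
        self._cache = {}

    def stage(self, key, value):
        """Cache the target data before the target redo log is replayed."""
        self._cache[key] = value

    def read(self, key, persistent_store):
        """Reads hit the cache first, then fall back to persistent storage."""
        return self._cache.get(key, persistent_store.get(key))

    def evict_after_replay(self, key):
        """Eliminate the target data from the read cache after the replay."""
        self._cache.pop(key, None)

if __name__ == "__main__":
    cache, store = ReadCache(), {}
    cache.stage("row:1", b"new-value")
    print(cache.read("row:1", store))      # b'new-value' (before replay persists it)
    store["row:1"] = b"new-value"          # the replay persists the data
    cache.evict_after_replay("row:1")
    print(cache.read("row:1", store))      # b'new-value' (now from persistent store)
```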
  • At least one storage node includes a storage array, and the storage array is used to persistently store data.
  • control module 1102 is further configured to set a lower limit value of a storage space in the second storage node for caching redo logs before controlling the second storage node to send the baseline data.
  • the main computing node is deployed with multiple applications, and the multiple applications include the above-mentioned target application.
  • the first storage node and the second storage node are deployed in the same physical area, and the control module is further used to create the first storage node in the physical area based on the second storage node, and the data in the first storage node is obtained by taking a snapshot or cloning of the data in the second storage node.
  • the binary log is sent from the master computing node to the slave computing node, and the slave computing node sends the binary log to the first storage node.
  • the data processing device 1100 provided in this embodiment corresponds to the data processing system in the above-mentioned embodiments, and is used to implement the functions of the data processing device 201 in the above-mentioned embodiments or the data processing method executed by the data processing device 201. Therefore, the functions of each module in this embodiment and the technical effects thereof can be found in the relevant descriptions in the above-mentioned embodiments and will not be elaborated here.
  • the embodiment of the present application further provides a data processing device.
  • the data processing device 1200 may include a communication interface 1210 and a processor 1220.
  • the data processing device 1200 may also include a memory 1230.
  • the memory 1230 may be arranged inside the data processing device 1200, or may be arranged outside the data processing device 1200.
  • each action performed by the data processing device 201 in the above embodiment may be implemented by the processor 1220.
  • each step of the processing flow may be completed by the integrated logic circuit of hardware or the software instructions in the processor 1220, so as to implement the method performed by the data processing device 201 in the above embodiment.
  • the program code executed by the processor 1220 to implement the above method may be stored in the memory 1230.
  • the memory 1230 is connected to the processor 1220, such as a coupling connection.
  • Some features of the embodiments of the present application may be completed/supported by the processor 1220 executing program instructions or software codes in the memory 1230.
  • the software components loaded on the memory 1230 may be summarized functionally or logically, for example, the monitoring module 1101 and the control module 1102 shown in FIG. 11.
  • the function of the monitoring module 1101 shown in FIG. 11 may be implemented by the communication interface 1210.
  • Any communication interface involved in the embodiments of the present application may be a circuit, a bus, a transceiver or any other device that can be used for information exchange.
  • taking the communication interface 1210 in the data processing device 1200 as an example, illustratively, the other device may be a device connected to the data processing device 1200, etc.
  • the embodiments of the present application further provide a chip, including a power supply circuit and a processing circuit, wherein the power supply circuit is used to supply power to the processing circuit, and the processing circuit is used to:
  • the first storage node among the at least one storage node is controlled to replay the target redo log to update the target data recorded in the target redo log to the data persistently stored by the at least one storage node, and the updated persistently stored data in the at least one storage node is used by the slave computing node to take over the access request on the master computing node, wherein the chip is applied to a data processing device in a data processing system, the data processing system includes a computing cluster and a storage cluster, the computing cluster is connected to the storage cluster through a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node and a data processing device, and the slave computing node serves as a disaster recovery for the master computing node.
  • the power supply circuit includes, but is not limited to, at least one of the following: a power supply subsystem, a power management chip, a power consumption management processor, or a power consumption management control circuit.
  • At least one storage node further includes a second storage node, the first storage node serves as a disaster recovery for the second storage node, and the processing circuit is used to:
  • the second storage node is controlled to send the target redo log to the first storage node, so that the first storage node replays the target redo log to update the data persistently stored in the first storage node.
  • the second storage node and the first storage node are deployed in the same physical area; or, the second storage node and the first storage node are deployed in different physical areas.
  • a target application is running on the primary computing node, and a target redo log is generated by the target application during its running.
  • the main computing node is deployed with multiple applications, and the multiple applications include the above-mentioned target application.
  • the processing circuit is specifically configured to:
  • the written data includes a target redo log.
  • the first storage node and the second storage node are deployed in the same physical area, and the processing circuit is further configured to:
  • a first storage node is created in the physical area according to the second storage node, and data in the first storage node is obtained by taking a snapshot or cloning of the data in the second storage node.
  • the processing circuit is specifically configured to:
  • the first storage node is controlled to replay the target redo log according to the format of the data page corresponding to the target application to update the data page on the first storage node.
  • the processing circuit is further configured to:
  • the first storage node is controlled to obtain the format of the data page corresponding to the target application before replaying the target redo log.
  • the processing circuit is further configured to:
  • Control the second storage node to send the binary log corresponding to the target data to the first storage node, where the binary log is used to record database statements;
  • the processing circuit is specifically used to: control the first storage node to verify the target redo log using the binary log, so that the first storage node plays back the target redo log when determining that the verification is passed.
  • both the master computing node and the slave computing node can access the first storage node, and the first storage node includes a read cache area;
  • the processing circuit is further used to: control the first storage node to cache the target data to the read cache area before playing back the target redo log, and the target data in the read cache area can be read by the main computing node.
  • the processing circuit is further configured to:
  • the first storage node is controlled to eliminate the target data from the read cache area after replaying the target redo log.
  • the target application includes a relational database management system RDBMS
  • the RDBMS includes at least one of MySQL, PostgreSQL, openGauss, and Oracle.
  • the first storage node and the second storage node are deployed in different physical areas; the processing circuit is further configured to:
  • the second storage node is controlled to send the baseline data to the first storage node before sending the target redo log to the first storage node; the data processing device controls the first storage node to store the baseline data before playing back the target redo log.
  • the processing circuit is further configured to:
  • a lower limit value of a storage space in the second storage node for caching redo logs is set.
  • At least one storage node includes a storage array, and the storage array is used to persistently store data.
  • the binary log is sent from the master computing node to the slave computing node, and the slave computing node sends the binary log to the first storage node.
  • the processor involved in the embodiments of the present application may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware processor, or may be executed by a combination of hardware and software modules in the processor.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, modules or modules, which can be electrical, mechanical or other forms, and is used for information exchange between devices, modules or modules.
  • the processor may operate in conjunction with a memory.
  • the memory may be a non-volatile memory, such as a hard disk or a solid-state drive, or a volatile memory, such as a random access memory.
  • the memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the connection medium between the communication interface, the processor and the memory is not limited in the embodiments of the present application.
  • the memory, processor and communication interface may be connected via a bus.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • the embodiments of the present application further provide a computer storage medium, in which a software program is stored, and when the software program is read and executed by one or more computing devices, the method performed by the data processing device 201 provided in any one or more of the above embodiments can be implemented.
  • the computer storage medium may include: a U disk, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk, and other media that can store program codes.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented in one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that include computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

提供一种数据处理系统,包括计算集群、存储集群,计算集群包括主计算节点以及从计算节点,存储集群包括至少一个存储节点以及数据处理装置;主计算节点,用于接收访问请求,并向该存储集群写入数据;数据处理装置,用于监控写入的数据,并在识别到写入的数据包括目标重做日志时,控制至少一个存储节点中的第一存储节点回放目标重做日志,以将该目标重做日志中记录的目标数据更新至持久化存储的数据中;从计算节点,用于根据更新后的持久化存储的数据,接管主计算节点上的访问请求。如此,从计算节点接管主计算节点上的访问请求时,无需执行回放重做日志的过程,从而有效缩短数据处理系统的RTO。此外,本申请还公开相应的数据处理方法、装置及相关设备。

Description

数据处理系统、数据处理方法、装置及相关设备
本申请要求于2022年10月13日提交中国国家知识产权局、申请号为202211253915.4、申请名称为“数据处理系统、数据处理方法、装置及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及数据库技术领域,尤其涉及一种数据处理系统、数据处理方法、装置及相关设备。
背景技术
随着信息化技术的发展,数据处理系统,如MySQL,PostgreSQL,openGauss等,在金融、通讯、医疗、物流、电子商务等领域广泛应用,用于在各个领域中对业务数据进行持久化存储。
目前,数据处理系统通常会部署有主中心(或者称为生产中心)以及至少一个灾备中心。其中,正常情况下,主中心对外提供数据读写的服务,灾备中心负责对主中心存储的数据进行备份。这样,当主中心发生故障时,灾备中心即可利用备份的数据继续对外提供数据读写服务,避免数据发生丢失,以此保证数据存储的可靠性。
主中心通常是通过向灾备中心发送binlog文件(一种二进制日志文件)的方式,将主中心的数据复制至灾备中心。具体地,主中心发送的binlog文件中记录了主中心上用于更新数据的数据库语句,从而灾备中心在接收到binlog文件后,通过执行该binlog中的数据库语句来更新灾备中心的数据,以此实现对主中心数据的复制。但是,这种数据复制的方式,会导致数据处理系统的恢复时间目标(recovery time objective,RTO)通常较长,影响数据处理系统的故障恢复性能。
发明内容
提供一种数据处理系统、数据处理方法、数据处理装置、计算设备、计算机可读存储介质以及计算机程序产品,以缩短数据处理系统的RTO,提高数据处理系统的故障恢复性能。
第一方面,本申请实施例提供一种数据处理系统,该数据处理系统包括计算集群、存储集群,该计算集群与存储集群通过网络进行连接,例如可以通过有线网络或者无线网络进行连接等,该计算集群包括主计算节点以及从计算节点,该从计算节点作为主计算节点的灾备,存储集群包括至少一个存储节点,比如,当存储集群包括一个存储节点时,主计算节点与从计算节点可以共享该存储节点,而当存储集群包括多个存储节点时,部分存储节点可以作为另一部分存储节点的灾备,主计算节点与从计算节点分别访问不同的存储节点;其中,主计算节点,用于接收访问请求,如用于请求向数据处理系统写入新数据的访问请求,或者用于请求对数据处理系统中已持久化存储的数据进行修改或者删除的访问请求等,并向该存储集群写入数据,所写入的数据例如可以是响应该访问请求的过程中所生成的重做日志、数据页或者其它类型的数据等;部署于存储侧的数据处理装置,用于监控主计算节点向存储集群写入的数据,并在识别到写入的数据包括目标重做日志时,控制至少一个存储节点中第一存储节点回放该目标重做日志,以将该目标重做日志中记录的目标数据更新至该至少一个存储节点所持久化存储的数据中;从计算节点,用于根据该至少一个存储节点中更新后的持久化存储的数据,接管主计算节点上的访问请求,例如可以接管主计算节点在故障时未完成处理的访问请求等。
如此,在从计算节点升级为主计算节点时(如原主计算节点发生故障或者从计算节点接收到主从切换的升级指令等),由于位于存储侧的数据处理装置已经控制存储节点通过回放重做日志来更新持久化存储的数据,这使得从计算节点无需执行回放重做日志的过程,而能够直接根据存储节点中持久化存储的数据接管主计算节点上的访问请求,以继续为客户端或者其它设备提供数据读写服务,从而可以有效缩短数据处理系统的RTO,提高数据处理系统的故障恢复性能。并且,存储节点根据重做日志(其属于物理日志)对持久化存储的数据进行更新,相比于通过binlog(其属于逻辑日志)更新数据的方式,因为无需重复执行数据库语句,而是直接在物理的数据页上修改数据,这可以有效降低数据更新所需消耗的资源量。
在一种可能的实施方式中,至少一个存储节点中除了包括第一存储节点之外,还包括第二存储节点,并且,第一存储节点作为第二存储节点的灾备,如第二存储节点支持主计算节点对于数据的持久化存储, 第一存储节点用于对第二存储节点所持久化存储的数据进行备份等;该第二存储节点,用于存储主计算节点写入的数据;该数据处理装置,用于在识别到主计算节点写入的数据中包括目标重做日志时,控制第二存储节点将目标重做日志发送至第一存储节点,以使得第一存储节点回放该目标重做日志,以更新该第一存储节点中持久化存储的数据。如此,在从计算节点接管主计算节点上的访问请求时,无需控制第一存储节点执行回放重做日志的过程,而能够直接根据第一存储节点中持久化存储的数据接管主计算节点上的访问请求,从而可以有效缩短数据处理系统的RTO,提高数据处理系统的故障恢复性能。
在一种可能的实施方式中,第二存储节点与第一存储节点部署于同一物理区域,如部署于同一数据中心等。如此,可以实现数据在本地存储的可靠性。或者,第二存储节点与第一存储节点部署于不同的物理区域,如第一存储节点部署于AZ1,而第二存储节点部署于AZ2等。如此,可以提高数据在异地存储的可靠性。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于同一物理区域,数据处理装置,还用于根据第二存储节点在该物理区域内创建第一存储节点,该第一存储节点中的数据通过对第二存储节点中的数据进行快照或者克隆得到。如此,可以通过快照或者克隆的方式,实现作为灾备的存储节点的快速创建。
在一种可能的实施方式中,主计算节点上运行有目标应用,如MySQL等,该目标重做日志由该目标应用在运行过程中产生。实际应用时,目标应用可以包括服务层以及存储引擎层,其中,在响应访问请求的过程中,服务层可以生成相应的binlog,存储引擎层可以生成相应的重做日志,如上述目标重做日志。
可选地,主计算节点可以部署有多个应用,该多个应用包括目标应用。
在一种可能的实施方式中,数据处理装置,具体用于根据目标应用的配置文件或目标应用的重做日志的命名格式,识别目标重做日志。比如,目标应用为开源的应用时,可以预先了解目标应用生成的重做日志的命名规则,从而数据处理装置可以根据该命名规则来识别目标重做日志;或者,配置文件中可以记录有用于区分目标应用所生成的目标重做日志的信息,如目标重做日志的名称等,从而数据处理装置可以根据目标应用的配置文件识别出目标重做日志。
在一种可能的实施方式中,第一存储节点,用于根据目标应用对应的数据页的格式,回放目标重做日志,以更新第一存储节点上的数据页。比如,第一存储节点可以根据该数据页的格式,恢复第一存储节点中需要修改的数据页,并根据该目标重做日志中所记录的修改操作,对该数据页上的数据进行相应的修改,并将修改后的数据页再进行持久化存储。如此,可以利用目标重做日志实现对存储节点上的数据页的更新。
在一种可能的实施方式中,第一存储节点,还用于在回放目标重做日志之前,获取目标应用对应的数据页的格式,以便于根据该数据页的格式,恢复第一存储节点上需要修改的数据页。示例性地,数据页的格式,可以由技术人员预先配置于第一存储节点的代码程序中;或者,数据页的格式,可以被配置于第一存储节点中的配置文件,从而第一存储节点可以从该配置文件中读取该数据页的格式;或者,数据处理装置可以通过识别主计算节点所写入的数据中的数据页,确定数据页的格式,并将数据页的格式通知给第一存储节点,以便第一存储节点获知数据页的格式。
在一种可能的实施方式中,主计算节点上的目标应用可以包括关系数据库管理系统(RDBMS),该关系数据库管理系统可以是MySQL、PostgreSQL、openGauss、oracle中的至少一种,或者可以是其它类型的应用。如此,当主计算节点上的应用为开源的MySQL、PostgreSQL、openGauss等数据库应用时,由于重做日志的回放操作,是由存储侧的数据处理装置控制完成,无需计算侧的主计算节点上的应用干预,对该应用透明,这使得在将该应用部署于主计算节点时,无需将该应用改造成具有控制存储节点回放重做日志的能力,从而可以有效降低在单机上部署应用的难度、提高应用部署效率。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于不同物理区域,该第二存储节点,还用于在将目标重做日志发送至第一存储节点之前,将基线数据发送至第一存储节点,该基线数据可以是第二存储节点在某个时刻所持久化存储的数据以及所具有的重做日志;第一存储节点,还用于在回放目标重做日志之前,存储该基线数据。如此,第一存储节点与第二存储节点之间可以预先通过基线复制的过程实现数据的初始同步,以便后续根据目标重做日志保持两个存储节点之间数据的实时同步。
在一种可能的实施方式中,数据处理装置还用于在控制第二存储节点发送基线数据之前,设置第二存储节点中用于缓存重做日志的存储空间的下限值。如此,可以尽可能避免基线复制过程中所生成的部分重做日志在未及时复制至第一存储节点的情况下被回收,从而可以尽可能避免短期内第二次执行基线复制过程。
在一种可能的实施方式中,数据处理装置还用于控制第二存储节点将目标数据对应的二进制日志(binlog)发送至第一存储节点,该二进制日志用于记录数据库语句;第一存储节点,具体用于采用二进制日志对目标重做日志进行验证,并在验证通过的情况下,回放该目标重做日志,以更新该第一存储节点中持久化存储的数据。如此,可以利用binlog来校验第一存储节点所要回放的重做日志的正确性,以保证数据在第一存储节点存储的正确性。
可选地,当目标重做日志未通过验证的情况下,第一存储节点可以拒绝对该目标重做日志进行回放,并可以指示从计算节点执行回放binlog的过程,以保证数据在第一存储节点存储的正确性。
可选地,二进制日志由主计算节点发送至从计算节点,并由从计算节点将所述二进制日志下发至第一存储节点。
在一种可能的实施方式中,主计算节点与从计算节点均能够访问第一存储节点,该第一存储节点包括读缓存区域,则,第一存储节点,还用于在回放目标重做日志之前,将目标数据缓存至读缓存区域;主计算节点,还用于从读缓存区域中读取该目标数据。如此,当主计算节点需要读取该目标数据时,相比于从第一存储节点的持久化存储中读取数据的方式,从读缓存区域中直接读取该目标数据,可以有效提高数据读取效率。
在一种可能的实施方式中,第一存储节点,还用于在回放目标重做日志之后,从读缓存区域中淘汰该目标数据。如此,可以释放目标数据在读缓存区域中的存储空间的占用,以便利用所释放的存储空间继续缓存主计算节点新写入的其它数据页,实现该读缓存区域的可持续性利用。
在一种可能的实施方式中,存储集群中的至少一个存储节点包括存储阵列,该存储阵列用于持久化存储数据,以此可以提高数据在存储阵列存储的可靠性。
第二方面,本申请实施例提供一种数据处理系统,该数据处理系统包括计算集群、存储集群,该计算集群与存储集群通过网络进行连接,例如可以通过有线网络或者无线网络进行连接等,该计算集群包括主计算节点以及从计算节点,该从计算节点作为主计算节点的灾备,存储集群包括第一存储节点以及第二存储节点,第二存储节点作为第一存储节点的灾备;主计算节点,用于接收访问请求,并生成目标数据对应的二进制日志,该二进制日志中记录数据库语句,并向存储集群写入数据;数据处理装置,用于监控主计算节点写入的数据,并在识别到该写入的数据包括二进制日志时,控制第一存储节点将二进制日志发送至第二存储节点,或,向第二存储节点发送所述二进制日志;从计算节点,用于回放二进制日志,以将目标数据更新至第二存储节点中持久化存储的数据,并根据所述更新后的第二存储节点中持久化存储的数据,接管主计算节点上的访问请求。
由于binlog是在存储侧完成从主中心到各个灾备中心的复制,这使得即使主中心的主计算节点的负荷较大,各个灾备中心也能通过存储节点接收到binlog,而无需由主计算节点再执行将binlog发送至各个灾备中心的过程。如此,各个灾备中心中的从计算节点通过回放该binlog,可以实现各个灾备中心与主中心之间的数据同步,保证数据处理系统的RPO能够为0,避免主计算节点因为负荷过大影响binlog在主中心与灾备中心之间的复制,而导致灾备中心的数据与主中心的数据不一致,影响数据处理系统的RPO。
第三方面,本申请实施例提供了一种数据处理方法,该方法应用于数据处理系统,数据处理系统包括计算集群、存储集群,计算集群与存储集群通过网络进行连接,计算集群包括主计算节点以及从计算节点,存储集群包括至少一个存储节点以及数据处理装置,从计算节点作为主计算节点的灾备,方法包括:数据处理装置监控主计算节点向存储集群写入的数据;数据处理装置在识别到写入的数据包括目标重做日志时,控制至少一个存储节点中的第一存储节点回放目标重做日志,以将目标重做日志中记录的目标数据更新至至少一个存储节点所持久化存储的数据中,至少一个存储节点中更新后的持久化存储的数据被从计算节点用于接管主计算节点上的访问请求。
在一种可能的实施方式中,至少一个存储节点还包括第二存储节点,第一存储节点作为第二存储节点的灾备,该方法还包括:数据处理装置在识别到写入的数据包括目标重做日志时,控制第二存储节点 将目标重做日志发送至第一存储节点,以使得第一存储节点回放目标重做日志,以更新第一存储节点中持久化存储的数据。
在一种可能的实施方式中,第二存储节点与第一存储节点部署于同一物理区域;或者,第二存储节点与第一存储节点部署于不同物理区域。
在一种可能的实施方式中,主计算节点上运行有目标应用,目标重做日志由目标应用在运行过程中产生。
在一种可能的实施方式中,主计算节点部署有多个应用,该多个应用包括上述目标应用。
在一种可能的实施方式中,数据处理装置识别到写入的数据包括目标重做日志,包括:数据处理装置根据目标应用的配置文件或目标应用的重做日志的命名格式,识别到写入的数据包括目标重做日志。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于同一物理区域,该方法还包括:数据处理装置根据第二存储节点在该物理区域内创建第一存储节点,该第一存储节点中的数据通过对第二存储节点中的数据进行快照或者克隆得到。
在一种可能的实施方式中,数据处理装置控制至少一个存储节点中的第一存储节点回放目标重做日志,包括:数据处理装置控制第一存储节点根据目标应用对应的数据页的格式回放目标重做日志,以更新第一存储节点上的数据页。
在一种可能的实施方式中,该方法还包括:数据处理装置控制第一存储节点在回放目标重做日志之前,获取目标应用对应的数据页的格式。
在一种可能的实施方式中,方法还包括:数据处理装置控制第二存储节点将目标数据对应的二进制日志发送至第一存储节点,二进制日志用于记录数据库语句;数据处理装置控制至少一个存储节点中的第一存储节点回放目标重做日志,包括:数据处理装置控制第一存储节点采用二进制日志对目标重做日志进行验证,以使得第一存储节点在确定验证通过的情况下,回放目标重做日志。
在一种可能的实施方式中,主计算节点与从计算节点均能够访问第一存储节点,第一存储节点包括读缓存区域;方法还包括:数据处理装置控制第一存储节点在回放目标重做日志之前,将目标数据缓存至读缓存区域,读缓存区域中的目标数据能够被主计算节点读取。
在一种可能的实施方式中,方法还包括:数据处理装置控制第一存储节点在回放目标重做日志之后,从读缓存区域淘汰目标数据。
在一种可能的实施方式中,目标应用包括关系数据库管理系统RDBMS,RDBMS包括MySQL、PostgreSQL、openGauss、oracle中的至少一种。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于不同物理区域;该方法还包括:数据处理装置控制第二存储节点在将目标重做日志发送至第一存储节点之前,将基线数据发送至第一存储节点;数据处理装置控制第一存储节点在回放目标重做日志之前,存储基线数据。
在一种可能的实施方式中,该方法还包括:数据处理装置在控制第二存储节点发送基线数据之前,设置第二存储节点中用于缓存重做日志的存储空间的下限值。
在一种可能的实施方式中,至少一个存储节点包括存储阵列,存储阵列用于持久化存储数据。
在一种可能的实施方式中,二进制日志由主计算节点发送至从计算节点,并由从计算节点将该二进制日志下发至第一存储节点。
由于第三方面提供的数据处理方法,对应于第一方面提供的数据处理系统,因此,第三方面以及第三方面中各实施方式所具有的技术效果,可以参见相应的第一方面以及第一方面中各实施方式所具有的技术效果,在此不做赘述。
第四方面,本申请实施例提供一种数据处理装置,该数据处理装置应用于数据处理系统,数据处理系统包括计算集群、存储集群,计算集群与存储集群通过网络进行连接,计算集群包括主计算节点以及从计算节点,存储集群包括至少一个存储节点以及数据处理装置,从计算节点作为主计算节点的灾备,数据处理装置包括:监控模块,用于监控主计算节点向存储集群写入的数据;控制模块,用于在识别到写入的数据包括目标重做日志时,控制至少一个存储节点中的第一存储节点回放目标重做日志,以将目标重做日志中记录的目标数据更新至至少一个存储节点所持久化存储的数据中,至少一个存储节点中更新后的持久化存储的数据被从计算节点用于接管主计算节点上的访问请求。
在一种可能的实施方式中,至少一个存储节点还包括第二存储节点,第一存储节点作为第二存储节点的灾备;控制模块,还用于在识别到写入的数据包括目标重做日志时,控制第二存储节点将目标重做日志发送至第一存储节点,以使得第一存储节点回放目标重做日志,以更新第一存储节点中持久化存储的数据。
在一种可能的实施方式中,主计算节点上运行有目标应用,目标重做日志由目标应用在运行过程中产生。
在一种可能的实施方式中,控制模块,用于根据目标应用的配置文件或目标应用的重做日志的命名格式,识别到写入的数据包括目标重做日志。
在一种可能的实施方式中,控制模块,具体用于控制第一存储节点根据目标应用对应的数据页的格式回放目标重做日志,以更新第一存储节点上的数据页。
在一种可能的实施方式中,控制模块,还用于控制第一存储节点在回放目标重做日志之前,获取目标应用对应的数据页的格式。
在一种可能的实施方式中,控制模块,还用于控制第二存储节点将目标数据对应的二进制日志发送至第一存储节点,二进制日志用于记录数据库语句;控制模块,具体用于控制第一存储节点采用二进制日志对目标重做日志进行验证,以使得第一存储节点在确定验证通过的情况下,回放目标重做日志。
在一种可能的实施方式中,主计算节点与从计算节点均能够访问第一存储节点,第一存储节点包括读缓存区域;
控制模块,还用于控制第一存储节点在回放目标重做日志之前,将目标数据缓存至读缓存区域,读缓存区域中的目标数据能够被主计算节点读取。
在一种可能的实施方式中,控制模块,还用于控制第一存储节点在回放目标重做日志之后,从读缓存区域淘汰目标数据。
在一种可能的实施方式中,至少一个存储节点包括存储阵列,存储阵列用于持久化存储数据。
在一种可能的实施方式中,控制模块,还用于在控制第二存储节点发送基线数据之前,设置第二存储节点中用于缓存重做日志的存储空间的下限值。
在一种可能的实施方式中,主计算节点部署有多个应用,该多个应用包括上述目标应用。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于同一物理区域,控制模块,还用于根据第二存储节点在该物理区域内创建第一存储节点,该第一存储节点中的数据通过对第二存储节点中的数据进行快照或者克隆得到。
在一种可能的实施方式中,二进制日志由主计算节点发送至从计算节点,并由从计算节点将该二进制日志下发至第一存储节点。
第五方面,本申请实施例还提供一种数据处理方法,该方法应用于数据处理系统,数据处理系统包括计算集群、存储集群,计算集群与存储集群通过网络进行连接,计算集群包括主计算节点以及从计算节点,存储集群包括至少一个存储节点以及数据处理装置,从计算节点作为主计算节点的灾备,方法包括:至少一个存储节点中的第一存储节点获取目标重做日志;第一存储节点获取该第一存储节点中的数据页的格式;第一存储节点根据该数据页的格式,回放目标重做日志,以将目标重做日志中记录的目标数据更新至至少一个存储节点所持久化存储的数据中,第一存储节点中更新后的持久化存储的数据被从计算节点用于接管主计算节点上的访问请求。
在一种可能的实施方式中,第一存储节点从配置文件中读取数据页的格式。
在一种可能的实施方式中,第一存储节点接收数据处理装置发送的数据页的格式。
可选地,数据页的格式,也可以是被预先配置于第一存储节点的代码程序中。
第六方面,本申请实施例提供一种数据处理装置,该数据处理装置应用于数据处理系统中的第一存储节点,数据处理系统包括计算集群、存储集群,计算集群与存储集群通过网络进行连接,计算集群包括主计算节点以及从计算节点,存储集群包括至少一个存储节点以及数据处理装置,从计算节点作为主计算节点的灾备,该至少一个存储节点包括该第一存储节点,数据处理装置包括:获取模块,用于获取目标重做日志,并获取该第一存储节点中的数据页的格式;回放模块,用于根据该数据页的格式,回放目标重做日志,以将目标重做日志中记录的目标数据更新至至少一个存储节点所持久化存储的数据中, 第一存储节点中更新后的持久化存储的数据被从计算节点用于接管主计算节点上的访问请求。
在一种可能的实施方式中,获取模块,用于从配置文件中读取数据页的格式。
在一种可能的实施方式中,获取模块,用于接收数据处理装置发送的数据页的格式。
可选地,数据页的格式,也可以是被预先配置于第一存储节点的代码程序中。
第七方面,本申请实施例提供一种数据处理设备,包括:处理器和存储器;该存储器用于存储指令,当该数据处理设备运行时,该处理器执行该存储器存储的该指令,以使该数据处理设备执行上述第三方面或第三方面的任一实现方式中所述的数据处理方法。需要说明的是,该存储器可以集成于处理器中,也可以是独立于处理器之外。计算设备还可以包括总线。其中,处理器通过总线连接存储器。其中,存储器可以包括可读存储器以及随机存取存储器。
第八方面,本申请实施例提供一种芯片,包括供电电路以及处理电路,所述供电电路用于对所述处理电路进行供电,所述处理电路执行如上述第三方面或第三方面的任一实现方式中所述的数据处理方法。
第九方面,本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有程序或指令,当其在计算机上运行时,使得上述第三方面或第三方面的任一实现方式中所述的数据处理方法被执行。
第十方面,本申请实施例还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第三方面或第三方面的任一实现方式中所述的数据处理方法。
另外,第三方面至第十方面中任一种实现方式所带来的技术效果可参见第一方面以及第二方面中不同实现方式所带来的技术效果,此处不再赘述。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请中记载的一些实施例,对于本领域普通技术人员来讲,还可以根据这些附图获得其他的附图。
图1为一数据处理系统的结构示意图;
图2为本申请实施例提供的一示例性数据处理系统的结构示意图;
图3为本申请实施例提供的另一示例性数据处理系统的结构示意图;
图4为本申请实施例提供的又一示例性数据处理系统的结构示意图;
图5为存储节点1021与存储节点1022可以部署于同一物理区域的示意图;
图6为存储节点1021与存储节点1022可以部署于不同物理区域的示意图;
图7为一示例性两个存储节点之间执行基线复制以及日志复制的示意图;
图8为又一示例性两个存储节点之间执行基线复制以及日志复制的示意图;
图9为本申请实施例提供的一种数据处理方法的流程示意图;
图10为本申请实施例提供的再一示例性数据处理系统的结构示意图;
图11为本申请实施例提供的一种数据处理装置的结构示意图;
图12为本申请实施例提供的一种数据处理设备的硬件结构示意图。
具体实施方式
为使本申请的上述目的、特征和优点能够更加明显易懂,下面将结合附图对本申请实施例中的各种非限定性实施方式进行示例性说明。显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例所获得的所有其它实施例,都属于本申请保护的范围。
参见图1,为一示例性数据处理系统100的结构示意图,数据处理系统100可以采用存算分离架构。如图1所示,数据处理系统100中包括计算集群101、存储集群102,并且,计算集群101与存储集群102之间可以通过网络进行通信,例如可以通过有线网络或者无线网络进行通信等。
其中,计算集群101包括多个计算节点,不同计算节点之间可以相互通信,并且,每个计算节点可以是一种包括处理器的计算设备,如服务器、台式计算机等。在计算集群101中,部分计算节点可以作为另一部分计算节点的灾备。为便于说明,图1中以计算集群101包括主计算节点1011以及从计算节点1012为例进行示例性说明,并且,从计算节点1012作为主计算节点1011的灾备。示例性地,从计算节点 1012具体可以作为主计算节点1011的冷备,即在主计算节点1011运行时,从计算节点1012可以不运行,或者,从计算节点1012可以利用其上的计算资源处理其它业务,而当主计算节点1011发生故障时,从计算节点1012运行/收回计算资源,利用已备份的数据接管主计算节点1011上的业务。
存储集群102可以包括一个或者多个存储节点,每个存储节点可以是包括持久化存储介质的设备,如网络附属存储器(network attached storage,NAS)、存储服务器等,可用于对数据进行持久化存储。其中,存储节点中的持久化存储介质,例如可以是硬盘,如固态硬盘或者叠瓦式磁记录硬盘等。当存储集群102包括多个存储节点时,部分存储节点可以作为另一部分存储节点的灾备。为便于说明,图1中以存储集群102包括存储节点1021以及存储节点1022为例,并且,存储节点1022作为存储节点1021的灾备。主计算节点1011利用存储节点1021提供数据读写服务,并且在主计算节点1011发生故障后,从计算节点1012利用存储节点1022上备份的数据接管主计算节点1011上的业务。实际应用场景中,主计算节点1011与存储节点1021可以构成主中心(通常属于生产站点),从计算节点1012与存储节点1022可以构成灾备中心(通常属于容灾站点)。或者,存储集群102也可以包括一个存储节点,此时,主计算节点1011与从计算节点1012可以共享该存储节点,以便在主计算节点1011发生故障后,从计算节点1012可以利用该存储节点中存储的数据继续提供数据读写服务。
主计算节点1011上可以部署有一个或者多个应用(图1中未示出),所部署的应用例如可以是数据库应用或者其它应用等。示例性地,数据库应用例如可以是关系数据库管理系统(relational database management system,RDBMS)等,该RDBMS可以包括MySQL、PostgreSQL、openGauss、oracle中的至少一种,或者可以是其它类型的数据库系统等。在应用运行的过程中,主计算节点1011通常会接收到用户侧的客户端或者其它设备发送的访问请求,如接收到用户侧的客户端发送的用于读取或者修改存储节点1021中数据的访问请求等。此时,主计算节点1011上的应用可以响应该访问请求,为客户端或者其它设备提供相应的数据读写服务。其中,当主计算节点1011所接收到的访问请求,用于请求向数据处理系统100写入新数据,或者用于请求对数据处理系统100中已持久化存储的数据进行修改,又或者请求删除数据处理系统100中已持久化存储的数据时,主计算节点1011上的应用会生成二进制日志(binlog),该binlog为逻辑日志,用于记录更新存储节点1021中持久化存储的数据的数据库语句,如SQL语句等。实际场景中,该应用可以包括服务层以及存储引擎层,并且可以由服务层生成binlog。然后,主计算节点1011会将生成的binlog发送至从计算节点1012,并由从计算节点1012通过执行该binlog中的数据库语句来更新存储节点1022中的数据,以此使得存储节点1022中的数据与存储节点1021中的数据保持一致,也即实现将存储节点1021中的数据复制至存储节点1022中。
在主计算节点1011发生故障后,从计算节点1012需要运行/回收计算资源,并利用该计算资源启动运行从计算节点1012上的应用。然后,从计算节点1012上的应用执行主计算节点1011在故障之前发送的binlog中所记录的数据库语句,以使得存储节点1022中的数据与存储节点1021在故障之前的数据保持一致。这样,从计算节点1012能够根据存储节点1022中存储的数据,接管主计算节点1011上未被完成的访问请求。
但是,从计算节点1012上的应用执行binlog中的数据库语句的过程所需的耗时通常较长,这会拖慢接管主计算节点1011上的访问请求的速度,从而导致数据处理系统100的RTO较长,影响数据处理系统100的故障恢复性能。其中,RTO是指灾难发生后,从数据处理系统100业务停顿之刻开始,到数据处理系统100恢复业务结束,这两个时刻之间的时间间隔。另外,从计算节点1012上的应用因为需要重新执行binlog中的数据库语句来同步存储节点1021与存储节点1022中存储的数据,这也需要消耗较多的资源。
基于此,本申请提供了一种数据处理系统200,在图1所示的数据处理系统100的基础上,在存储侧增设了数据处理装置201,该数据处理装置201例如可以部署于存储节点1021,用于控制将存储节点1021中的数据复制至存储节点1022中。
具体地,在主计算节点1011上的应用响应访问请求的过程中,不仅该应用的服务层可以针对数据的写入、修改或者删除生成binlog,而且,该应用的存储引擎层还能够生成相应的重做日志(redolog),该重做日志属于物理日志,能够记录客户端或者其它设备所请求写入的新数据或者请求修改后的数据或者请求对数据进行删除。这样,数据处理装置201监控主计算节点1011向存储集群102写入的数据,所写入的数据可能包括重做日志,也可能包括发生更新的数据页(data page)或者其它类型的数据等,从而 数据处理装置201在识别到主计算节点1011所写入的数据包括重做日志时,控制存储节点1021将该重做日志发送至存储节点1022中,以使得存储节点1022通过回放该重做日志,将写入的新数据或者修改后的数据更新至存储节点1022所持久化存储的数据中,或者对存储节点1022所持久化存储的部分数据进行删除。并且,存储节点1021也可以通过回放该重做日志,实现对自身持久化存储的数据进行更新,以此实现将存储节点1021与存储节点1022中持久化存储的数据保持一致。
如此,在从计算节点1012升级为主计算节点时(如原主计算节点1011发生故障或者从计算节点1012接收到主从切换的升级指令等),由于位于存储侧的数据处理装置201已经控制存储节点1022通过回放重做日志来更新持久化存储的数据,这使得从计算节点1012无需执行回放重做日志的过程,而能够直接根据存储节点1022中持久化存储的数据接管主计算节点1011上的访问请求,以继续为客户端或者其它设备提供数据读写服务,从而可以有效缩短数据处理系统200的RTO,提高数据处理系统200的故障恢复性能。
并且,存储节点1022根据重做日志(其属于物理日志)对持久化存储的数据进行更新,相比于通过binlog(其属于逻辑日志)更新数据的方式,因为无需重复执行数据库语句,而是直接在物理的数据页上修改数据,这可以有效降低数据更新所需消耗的资源量。其中,利用重做日志中记录的数据更新存储节点中的数据页的过程,可以称之为日志加固(log consolidation)。
另外,当主计算节点1011上的应用为开源的MySQL、PostgreSQL、openGauss等数据库应用时,由于重做日志的回放操作,是由存储侧的数据处理装置201控制完成,无需计算侧的主计算节点1011上的应用干预,对该应用透明,这使得在将该应用部署于主计算节点1011时,无需将该应用改造成具有控制存储节点1021回放重做日志的能力,从而可以有效降低在单机上部署应用的难度、提高应用部署效率。
示例性地,上述部署于存储侧的数据处理装置201可以通过软件实现,如可以是部署于硬件设备上的程序代码等。实际应用时,数据处理装置201例如可以是作为插件、组件或者应用等软件形式部署于存储节点1021中(例如,部署在存储节点1021的控制器中)。或者,上述数据处理装置201可以通过物理设备实现,其中,该物理设备例如可以是CPU,或者可以是专用集成电路(application-specific integrated circuit,ASIC)、可编程逻辑器件(programmable logic device,PLD)、复杂程序逻辑器件(complex programmable logical device,CPLD)、现场可编程门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)、片上系统(system on chip,SoC)、软件定义架构(software-defined infrastructure,SDI)芯片、人工智能(artificial intelligence,AI)芯片、数据处理单元(data processing unit,DPU)等任意一种处理器或其任意组合。
值得注意的是,图2所示的数据处理系统200仅作为一种示例性说明,实际应用时,数据处理系统200也可以采用其它方式实现。为便于理解,本实施例提供了以下几种实现示例。
在第一种实现示例中,数据处理装置201也可以部署于存储节点1021外部,如在存储节点1022中进行部署,或者,数据处理装置201可以独立于存储节点1021以及存储节点1022进行部署等。
在第二种实现示例中,主计算节点1011与从计算节点1012可以共享同一存储节点301,并且,数据处理装置201部署于该存储节点301中,如图3所示。此时,主计算节点1011与从计算节点1012均可以访问存储节点301,从而在主计算节点1011故障的情况下,从计算节点1012可以根据存储节点301中的数据接管主计算节点1011上的业务。由于从计算节点1012在接管主计算节点1011上的业务时,位于存储侧的数据处理装置201已经控制存储节点301根据重做日志完成数据更新的过程,这使得从计算节点1012能够直接根据存储节点301中持久化存储的数据继续提供数据读写服务,以此可以有效缩短数据处理系统200的RTO,从而提高数据处理系统200的故障恢复性能。
实际应用时,存储节点301中还可以配置有读缓存区域202,示例性地,读缓存区域202可以通过动态随机存取内存(dynamic random access memory,DRAM)等存储介质实现。读缓存区域202,用于对主计算节点1011写入的数据页进行缓存。示例性地,读缓存区域202中所缓存的数据页,例如可以是主计算节点1011生成的新的数据页。比如,主计算节点1011在完成对存储节点1021中的部分数据页的修改后,可以将修改后的数据页写入存储节点1021等。此时,读缓存区域202可以缓存该数据页。这样,当主计算节点1011需要读取该数据页时,可以直接从存储节点1021中的读缓存区域202中读取数据,相比于从存储节点1021的持久化存储介质(如硬盘等)中读取数据的方式,可以有效提高主计算节点1011读 取数据的效率,降低数据访问时延。进一步地,当该数据页对应的重做日志完成回放,或者,存储节点1021完成对于该数据页的持久化存储,或者读缓存区域202的剩余存储空间少于阈值时,可以将该数据从读缓存区域202中进行淘汰,以释放该数据页所占用的缓存空间,从而支持读缓存区域202缓存主计算节点1011新写入的数据页,本实施例对此并不进行限定。
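为便于理解读缓存区域202的工作方式,下面给出一段示意性的代码草图(仅作理解参考,其中的类名、方法名均为本文为说明而假设的,并非实际产品接口):缓存主计算节点写入的数据页、按页号读取,并在对应重做日志完成回放或缓存空间不足时淘汰数据页。

```python
from collections import OrderedDict

class ReadCache:
    """示意性的读缓存区域:缓存数据页,支持按页号读取与淘汰(假设的简化模型)。"""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page_no -> 数据页内容,按写入顺序记录

    def put(self, page_no: int, page: bytes) -> None:
        # 缓存新写入的数据页;剩余空间不足时先淘汰最早缓存的数据页
        if page_no in self.pages:
            self.pages.pop(page_no)
        while len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        self.pages[page_no] = page

    def get(self, page_no: int):
        # 主计算节点优先从读缓存区域读取,未命中时再从持久化存储介质读取
        return self.pages.get(page_no)

    def evict(self, page_no: int) -> None:
        # 对应重做日志完成回放或数据页已持久化后,可将其从读缓存区域淘汰
        self.pages.pop(page_no, None)

cache = ReadCache(capacity_pages=2)
cache.put(1, b"page-1")
cache.put(2, b"page-2")
assert cache.get(1) == b"page-1"
cache.evict(1)              # 回放完成后淘汰,释放缓存空间
assert cache.get(1) is None
```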
在第三种实现示例中,数据处理系统200中的计算集群以及存储集群中,均可以包括3个或者3个以上的节点,如图4所示。具体地,计算集群包括多个计算节点410,各个计算节点410之间可以相互通信,并且部分计算节点410可以作为另一部分计算节点410的灾备。每个计算节点410是一种包括处理器的计算设备,如服务器、台式计算机等。在硬件上,如图4所示,计算节点410至少包括处理器412、内存413、网卡414和存储介质415。其中,处理器412是一个中央处理器(central processing unit,CPU),用于处理来自计算节点410外部的数据访问请求,或者计算节点410内部生成的请求。处理器412从内存413中读取数据,或者,当内存413中的数据总量达到一定阈值时,处理器412将内存413中存储的数据发送给存储节点400进行持久化存储。图4中仅示出了一个CPU 412,在实际应用中,CPU 412的数量往往有多个,其中,一个CPU 412又具有一个或多个CPU核。本实施例不对CPU的数量、CPU核的数量进行限定。
内存413是指与处理器直接交换数据的内部存储器,它可以随时读写数据,而且速度很快,作为操作系统或其他正在运行中的程序的临时数据存储器。内存包括至少两种存储器,例如内存既可以是随机存取存储器,也可以是只读存储器(Read Only Memory,ROM)。实际应用中,计算节点410中可配置多个内存413,以及不同类型的内存413。本实施例不对内存413的数量和类型进行限定。
网卡414用于与存储节点400通信。例如,当内存413中的数据总量达到一定阈值时,计算节点410可通过网卡414向存储节点400发送请求以对所述数据进行持久化存储。另外,计算节点410还可以包括总线,用于计算节点410内部各组件之间的通信。在实际实现中,计算节点410也可以内置少量的硬盘,或者外接少量硬盘。
每个计算节点410可通过网络访问存储集群中的存储节点400。存储集群包括多个存储节点400,并且,部分存储节点400可以作为另一部分存储节点400的灾备。一个存储节点400包括一个或多个控制器401、网卡404与多个硬盘405。网卡404用于与计算节点410通信。硬盘405用于持久化存储数据,可以是磁盘或者其他类型的存储介质,例如固态硬盘或者叠瓦式磁记录硬盘等。控制器401用于根据计算节点410发送的读/写数据请求,往硬盘405中写入数据或者从硬盘405中读取数据。在读写数据的过程中,控制器401需要将读/写数据请求中携带的地址转换为硬盘能够识别的地址。并且,部分控制器401还可以用于实现上述数据处理装置201的功能,以实现该存储节点400与其它存储节点400之间的数据同步。
为便于理解与说明,下面基于图2所示的数据处理系统200,对数据处理装置201控制存储节点1021与存储节点1022之间进行数据同步的过程进行详细介绍。
通常情况下,主计算节点1011上运行有一个或者多个应用(如MySQL等),为便于理解,下面以主计算节点1011上运行目标应用为例进行示例性说明,该目标应用在运行时,能够支持主计算节点1011为用户提供数据读写服务。以用户请求修改数据为例,在主计算节点1011接收到用户通过客户端发送的用于修改数据的访问请求后,目标应用可以先将该访问请求所请求修改的数据所在的数据页从存储节点1021读取至主计算节点1011中的缓冲池(buffer pool),并根据该访问请求完成对该缓冲池中数据页的修改。此时,目标应用会为针对该数据页的修改内容分别生成binlog以及目标重做日志(redolog),该binlog用于记录针对该数据页进行修改的数据库语句,该目标重做日志用于记录存储节点1021中该数据页上部分数据被修改为新数据(新数据可以为空,此时是对该数据页上的部分数据进行删除),为便于区分和描述,以下将该新数据称之为目标数据。示例性地,目标应用所生成的目标重做日志可以是一组命名为ib_logfile的文件,例如可以是分别命名为ib_logfile0和ib_logfile1的一组文件等。
在完成对于缓冲池中数据页的修改并生成目标重做日志后,主计算节点1011可以向客户端反馈数据写入/修改成功。由于将数据写入缓冲池的速度通常高于对数据进行持久化存储的速度,如此,可以加快主计算节点1011响应访问请求。并且,目标应用可以向存储集群102写入数据,所写入的数据可以包括重做日志、修改后的数据页等。
数据处理装置201在存储侧可以监控主计算节点1011向存储集群102写入的数据,并识别所写入的数 据中是否包括目标应用生成的目标重做日志(主计算节点1011所写入的数据也可能是针对存储节点1021的配置文件等其它类型的数据)。例如,数据处理装置201可以根据重做日志的命名格式识别出目标重做日志(例如,预先了解目标应用生成的重做日志的命名规则,并根据该命名规则来识别目标重做日志),或者,数据处理装置201可以根据目标应用的配置文件识别出目标重做日志(该配置文件中可以记录有用于区分该目标应用所生成的目标重做日志的信息,如目标重做日志的文件名称:ib_logfile0和ib_logfile1等,并根据该信息识别目标重做日志),并在存储侧控制存储节点1021回放该目标重做日志,以将目标重做日志中记录的目标数据更新至存储节点1021所持久化存储的数据中,从而提高数据存储的可靠性。实际应用时,上述根据目标重做日志更新持久化存储的数据的技术,可以称之为页面物化(page materialization)技术,或者称之为日志即数据(log is data)技术。
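下面用一段示意性代码说明“按命名规则或配置文件识别目标重做日志”的基本思路(仅为草图,其中的文件名、配置项结构均为本文假设,实际识别逻辑以具体实现为准):

```python
import fnmatch

# 假设的配置内容:记录目标应用所生成重做日志的文件名及命名模式
EXAMPLE_CONFIG = {
    "redo_log_names": ["ib_logfile0", "ib_logfile1"],
    "redo_log_pattern": "ib_logfile*",
}

def is_target_redo_log(file_name: str, config: dict) -> bool:
    """根据配置文件中登记的名称或命名格式,判断写入的文件是否为目标重做日志。"""
    if file_name in config.get("redo_log_names", []):
        return True
    pattern = config.get("redo_log_pattern")
    return bool(pattern and fnmatch.fnmatch(file_name, pattern))

# 主计算节点写入存储集群的对象可能是重做日志、数据页或其它文件
for name in ["ib_logfile0", "ibdata1", "my.cnf"]:
    print(name, "->", is_target_redo_log(name, EXAMPLE_CONFIG))
```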
具体实现时,数据处理装置201可以识别存储节点1021的数据页的格式。比如,数据处理装置201可以通过识别主计算节点1011所写入的数据中的数据页,确定数据页的格式;或者,数据处理装置201可以从存储节点1021中的配置文件中获取数据页的格式等。然后,数据处理装置201可以将数据页的格式通知给存储节点1021。这样,存储节点1021在回放该目标重做日志时,可以根据数据页的格式,恢复存储节点1021中目标数据所在的数据页,并根据该目标重做日志中所记录的修改操作,将该数据页上的相应数据修改为目标数据,并对修改后的数据页进行持久化存储。实际应用时,当目标应用具体为MySQL时,存储节点1021中可以部署有文件系统(file system,FS),从而数据处理装置201可以指示存储节点1021中的FS回放目标重做日志,以将重做日志所记录的目标数据更新至存储节点1021持久化存储的数据中。
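“页面物化”的核心在于:存储侧按数据页的格式,把重做日志中记录的修改直接应用到物理数据页上。下面给出一个高度简化的示意性草图(数据页格式与日志记录结构均为本文假设,真实数据库的页格式远比此复杂):

```python
from dataclasses import dataclass

@dataclass
class RedoRecord:
    lsn: int        # 日志序列号
    page_no: int    # 被修改的数据页号
    offset: int     # 页内偏移
    data: bytes     # 修改后的目标数据

def replay(pages: dict, records: list, page_size: int = 16) -> None:
    """按LSN顺序回放重做日志记录,直接在数据页上修改数据(示意)。"""
    for rec in sorted(records, key=lambda r: r.lsn):
        page = bytearray(pages.get(rec.page_no, bytes(page_size)))
        page[rec.offset:rec.offset + len(rec.data)] = rec.data
        pages[rec.page_no] = bytes(page)   # 修改后的数据页再进行持久化存储

pages = {7: b"\x00" * 16}
replay(pages, [RedoRecord(lsn=101, page_no=7, offset=4, data=b"AB")])
print(pages[7])
```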
如此,可以实现将根据重做日志更新存储节点1021中持久化存储的数据的功能,由计算侧(原先由主计算节点1011根据binlog实现)卸载至存储侧(现在由数据处理装置201以及存储节点1021实现该功能),无需主计算节点1011感知和执行回放目标重做日志的操作。
从计算节点1012,部署有与主计算节点1011相同的目标应用,并作为主计算节点1011的灾备,具体可以是热备或者冷备。其中,从计算节点1012作为热备时,从计算节点1012与主计算节点1011持续处于运行状态;这样,当主计算节点1011发生故障时,从计算节点1012可以利用存储节点1022所存储的数据立即接管主计算节点1011上的业务,具体是处理主计算节点1011在故障时未处理完成的访问请求。从计算节点1012作为冷备时,在主计算节点1011正常运行期间,从计算节点1012可以不运行(如处于休眠状态等),或者从计算节点1012可以释放其上的计算资源并利用所释放的计算资源处理其它业务,如离线计算业务等。当主计算节点1011发生故障时,从计算节点1012启动运行/收回计算资源,并利用存储节点1022所存储的数据实现接管主计算节点1011上的业务,存储节点1022作为存储节点1021的灾备,用于对存储节点1021中持久化存储的数据进行备份。实际应用时,主计算节点1011可以具有多个作为灾备节点的从计算节点,从而部分从计算节点可以作为主计算节点1011的冷备,而另一部分从计算节点可以作为主计算节点1011的热备等。
本实施例中,数据处理装置201不仅控制存储节点1021回放目标重做日志更新数据,还会控制存储节点1022回放该目标重做日志,以对存储节点1022中持久化存储的数据进行数据更新,从而实现存储节点1021与存储节点1022之间的数据保持一致。
具体实现时,数据处理装置201可以控制存储节点1021,通过存储节点1021中的网卡或者通信接口将生成的目标重做日志发送给存储节点1022,并指示存储节点1022根据接收到的目标重做日志对自身持久化存储的数据进行更新,也可以由存储节点1022根据接收到的目标重做日志自动对持久化存储的数据进行更新。或者,数据处理装置201可以与存储节点1022建立有线或者无线连接,从而数据处理装置201可以通过与存储节点1022之间的连接,将目标重做日志发送给存储节点1022。然后,数据处理装置201再指示存储节点1022,也可以由存储节点1022自动根据接收到的目标重做日志对自身持久化存储的数据进行更新。实际应用场景中,存储节点1022在接收到目标重做日志后,可以立即执行针对该目标重做日志的回放过程,以此可以使得存储节点1022与存储节点1021之间能够实时保持数据一致。
作为一种实现示例,数据处理装置201(或者控制存储节点1021)在将目标重做日志发送至存储节点1022时,可以将目标重做日志中相对于上一次发送的重做日志的新增日志记录发送给存储节点1022。比如,数据处理装置201(或者控制存储节点1021)在上一次发送重做日志之前,可以记录该重做日志 中最新的日志记录所在的位置,该位置能够用于标识已经发送给存储节点1022的日志记录。这样,当需要将目标重做日志发送至存储节点1022时,由于日志记录通常顺序保存在重做日志的文件中,因此,数据处理装置201可以根据所记录的位置,确定目标重做日志中尚未发送给存储节点1022的新增的日志记录,并进一步将该部分新增的日志记录发送至存储节点1022。相应地,数据处理装置201可以回放所接收到的日志记录,实现对持久化存储的数据更新。
在另一种实现示例中,数据处理装置201(或者控制存储节点1021)可以将目标重做日志的整个日志文件发送给存储节点1022。存储节点1022在接收到目标重做日志后,可以查找当前已回放的日志记录在重做日志的文件中的位置,从而可以根据该位置确定目标重做日志中尚未完成回放的日志记录,并从该位置开始,按照顺序逐个回放该目标重做日志中新增的各条日志记录。
实际应用时,重做日志中每更新一条日志记录,数据处理装置201(或者控制存储节点1021)可以将该条日志记录或者重做日志的整个日志文件发送给存储节点1022。或者,数据处理装置201在监控到重做日志发生更新(如重做日志的LSN发生变化等)时,可以判断该重做日志中的发生更新的日志记录的数量是否达到预设数量(如10条等),并且,当更新的日志记录的数量达到预设数量时,确定将该新增的日志记录或者重做日志的整个日志文件发送给存储节点1022,以此可以减少存储节点1021与存储节点1022之间同步重做日志的输入输出(input/output,IO)。又或者,数据处理装置201在监控到重做日志发生更新时,可以判断该重做日志中的发生更新的日志记录所占用的存储空间达到预设阈值,并且,当占用的存储空间达到预设阈值时,确定将该新增的日志记录或者重做日志的整个日志文件发送给存储节点1022。再或者,数据处理装置201在监控到重做日志发生更新时可以启动计时,并且,当计时时长达到预设时长后,可以将重做日志中在计时时间段内新增的所有日志记录发送给存储节点1022,或者直接将该重做日志的整个日志文件发送给存储节点1022等。值得注意的是,上述各种触发数据处理装置201发送日志记录/重做日志的日志文件的实现方式仅作为示例性说明,在其它实施例中,还可以对上述各示例进行组合/变形,又或者可以是采用其它方式实现。并且,无论数据处理装置201发送的是日志记录,还是整个日志文件,均属于将重做日志发送至存储节点1022的理解范畴之内。
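上述几种触发方式可以概括为“按已发送位置取增量,并按条数、容量或时间触发发送”。下面给出一个示意性的草图(阈值与接口均为本文假设):

```python
import time

class LogShipper:
    """示意:记录已发送到灾备存储节点的日志位置,按条数、字节数或时间触发增量发送。"""

    def __init__(self, send_fn, max_records=10, max_bytes=4096, max_delay_s=1.0):
        self.send_fn = send_fn
        self.max_records, self.max_bytes, self.max_delay_s = max_records, max_bytes, max_delay_s
        self.pending = []           # 尚未发送的新增日志记录
        self.pending_bytes = 0
        self.first_pending_ts = None
        self.sent_upto_lsn = 0      # 已发送给对端的最新LSN(即已发送位置)

    def on_new_record(self, lsn: int, record: bytes) -> None:
        if lsn <= self.sent_upto_lsn:
            return                  # 已发送过的日志记录,跳过
        if not self.pending:
            self.first_pending_ts = time.monotonic()
        self.pending.append((lsn, record))
        self.pending_bytes += len(record)
        if (len(self.pending) >= self.max_records
                or self.pending_bytes >= self.max_bytes
                or time.monotonic() - self.first_pending_ts >= self.max_delay_s):
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        self.send_fn(self.pending)                  # 发送新增的日志记录
        self.sent_upto_lsn = self.pending[-1][0]    # 记录已发送位置
        self.pending, self.pending_bytes, self.first_pending_ts = [], 0, None

shipper = LogShipper(send_fn=lambda batch: print("send", [lsn for lsn, _ in batch]), max_records=3)
for lsn in range(1, 8):
    shipper.on_new_record(lsn, b"record")
shipper.flush()
```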
其中,存储节点1022在回放目标重做日志之前,可以获取存储节点1022中的数据页的格式。这样,存储节点1022可以根据数据页的格式,恢复存储节点1022中需要被修改的数据页,并根据该目标重做日志中所记录的修改操作,将该数据页上的相应数据修改为目标数据。然后,存储节点1022再对修改后的数据页进行持久化存储。
在一种示例中,数据页的格式,可以由技术人员预先配置于存储节点1022中,如静态配置于存储节点1022的代码程序中,或者记录于存储节点1022中的配置文件,从而存储节点1022可以从该配置文件中读取该数据页的格式。实际应用时,配置文件中可以记录有多种数据页的格式,不同数据页的格式对应于不同的应用,或对应于同一应用的不同版本,从而存储节点1022可以根据当前在主计算节点1011或者从计算节点1012运行的目标应用,从该配置文件中查询得到该目标应用所对应的数据页格式。在另一种示例中,数据处理装置201可以通过识别主计算节点1011所写入的数据中的数据页,确定数据页的格式,并将数据页的格式通知给存储节点1022,以便存储节点1022获知数据页的格式。
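配置文件中按应用及版本登记多种数据页格式、再按当前运行的目标应用进行查询的过程,可以用如下草图示意(配置内容与格式参数均为本文假设):

```python
PAGE_FORMATS = {                      # 假设的配置内容:(应用, 版本) -> 数据页格式参数
    ("MySQL", "8.0"): {"page_size": 16384, "checksum": "crc32"},
    ("PostgreSQL", "14"): {"page_size": 8192, "checksum": "pg"},
}

def lookup_page_format(app: str, version: str) -> dict:
    """按当前在主/从计算节点上运行的目标应用,从配置中查询对应的数据页格式。"""
    try:
        return PAGE_FORMATS[(app, version)]
    except KeyError as exc:
        raise ValueError(f"未登记的数据页格式: {app} {version}") from exc

print(lookup_page_format("MySQL", "8.0"))
```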
当主计算节点1011发生故障并由从计算节点1012接管业务时,由于数据处理装置201已经在存储侧控制存储节点1022完成目标重做日志的回放,使得存储节点1021与存储节点1022之间的数据保持一致,因此,从计算节点1012可以直接基于存储节点1022中持久化存储的数据继续为用户处理主计算节点1011未完成的访问请求,无需执行回放目标重做日志以将存储节点1022的数据更新为最新状态的过程,这可以有效降低数据处理系统200恢复业务的时延,也即可以有效降低数据处理系统200的RTO。并且,当数据处理装置201部署于存储节点1021时,数据处理装置201可以控制存储节点1021与至少一个作为其灾备的其它存储节点回放目标重做日志,这使得单个存储节点可以具备集群的管理能力(该集群即为多个存储节点构成的集群)。
在进一步可能的实施方式中,存储节点1021与存储节点1022在回放目标重做日志之前,还可以利用主计算节点1011生成的binlog校验目标重做日志的正确性。具体地,实际应用场景中,存储节点1021可能会因为程序运行错误或者其它原因,生成错误的重做日志,如对于主计算节点1011中已经完成回滚的binlog,存储节点1021仍然针对该binlog所指示修改的数据生成重做日志,或者,存储节点1021中因为部 分重做日志生成失败而存在重做日志的缺漏等。为此,数据处理装置201还可以识别以及获取主计算节点1011中目标数据对应的binlog,并将binlog与存储节点1021中的目标重做日志一并发送至存储节点1022。或者,数据处理装置201可以通过与存储节点1022之间的连接,将该binlog发送给存储节点1022。又或者,binlog可以由主计算节点1011将binlog发送至从计算节点1012,再由从计算节点1012下发给存储节点1022等。然后,存储节点1022可以利用接收到的binlog对目标重做日志进行验证,如验证binlog与目标重做日志中所记录的已提交事务的标识是否一致等。并且,在目标重做日志通过验证的情况下(如binlog与目标重做日志中所记录的已提交事务的标识一致等),存储节点1022回放目标重做日志,以更新存储节点1022中持久化存储的数据。当目标重做日志未通过验证时,存储节点1022可以向数据处理装置201反馈验证失败的结果。数据处理装置201可以指示从计算节点1012通过执行回放该binlog的操作,来更新存储节点1022中持久化存储的数据,以此保证存储节点1022中存储的数据的正确性。
而且,数据处理装置201还可以指示存储节点1021利用binlog对目标重做日志进行验证,并在目标重做日志通过验证的情况下,通过回放目标重做日志来更新存储节点1021中持久化存储的数据。而当目标重做日志未通过验证时,数据处理装置201或者存储节点1021可以指示主计算节点1011通过执行回放该binlog的操作,来更新存储节点1021中持久化存储的数据,以此保证存储节点1021中存储的数据的正确性。
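采用binlog校验重做日志的一个直观做法,是比对两种日志所记录的已提交事务的标识是否一致。以下为示意性草图(事务标识的组织方式为本文假设):

```python
def committed_txn_ids(log_records):
    """从日志记录中提取已提交事务的标识集合(示意:每条记录是 (txn_id, committed) 元组)。"""
    return {txn_id for txn_id, committed in log_records if committed}

def verify_redo_with_binlog(redo_records, binlog_records) -> bool:
    # 验证通过:两种日志记录的已提交事务标识一致;否则拒绝回放重做日志,改由回放binlog兜底
    return committed_txn_ids(redo_records) == committed_txn_ids(binlog_records)

redo = [(1, True), (2, True), (3, False)]
binlog = [(1, True), (2, True)]
assert verify_redo_with_binlog(redo, binlog)                     # 一致,回放重做日志
assert not verify_redo_with_binlog(redo + [(4, True)], binlog)   # 不一致,改为回放binlog
```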
其中,数据在存储节点1021与存储节点1022中,可以是以文件的格式进行持久化存储,此时,存储节点1021与存储节点1022中可以分别部署有相应的文件系统(FS),该FS用于对持久化存储的文件进行管理,如由FS执行回放目标重做日志以更新持久化存储的数据的过程。或者,数据在存储节点1021与存储节点1022中,也可以是以数据块格式(data block format)进行持久化存储,即存储节点1021与存储节点1022在存储数据时,将数据按照固定大小的尺寸进行分块,每个分块的数据量例如可以是512字节或者4千字节(KB)等。或者,数据在存储节点1021与存储节点1022中,也可以是以对象(object)格式进行存储,此时,对象可以是存储节点存储数据的基本单位,每个对象可以包括数据和该数据的属性的综合体,其中,数据的属性可以根据计算节点中应用的需求进行设置,包括数据分布、服务质量等。本实施例中对于数据的存储格式并不进行限定。
实际应用时,数据处理系统中的存储节点1021与存储节点1022的部署方式,可以存在多种实现方式。
在第一种实现方式中,存储节点1021与存储节点1022可以部署于同一物理区域,如图5所示。例如,存储节点1021与存储节点1022可以部署于同一数据中心,或者部署于同一可用区(availability zones,AZ)等。此时,通过在同一物理区域内为数据创建多个副本,可以有效提高数据在本地存储的可靠性。
示例性地,数据处理装置201可以将存储节点1021中的存储卷作为主卷,在逻辑上创建从卷(即slave卷),并将该从卷分配给存储节点1022;然后,数据处理装置201可以记录当前时刻(假设为时刻t)的重做日志的日志序列号(log sequence number,LSN)的值,并控制存储节点1021将主卷在该当前时刻所持久化存储的数据复制至从卷,将存储节点1021中LSN不大于该值的重做日志复制至存储节点1022。如此,存储节点1021以及存储节点1022通过分别对重做日志进行回放,可以实现存储节点1021以及存储节点1022之间在时刻t的数据同步。当存储节点1021基于新写入的数据生成新的重做日志时,数据处理装置201通过将该新的重做日志复制至存储节点1022,以此保证两个存储节点之间的数据一致性。实际应用时,数据处理装置201除了以存储卷为对象创建副本之外,还可以基于其它存储对象创建副本,本实施例对此并不进行限定。
其中,主卷中的数据,可以通过数据克隆的方式,复制至从卷中,此时,从卷也可以称之为克隆卷。或者,主卷中的数据,可以通过快照的方式复制至从卷中,如此,在实现数据备份的基础上,不仅能够提高数据备份效率,而且,也能降低数据备份所需的开销、减少副本的存储空间。
进一步地,数据处理系统200还可以通过级联克隆的方式,支持对数据处理系统200某一时刻的数据进行读写。比如,当需要基于数据处理系统200在时刻T的数据执行相应的分析或者测试业务时,数据处理系统200可以创建新的存储节点1023,并对存储节点1022的从卷中的数据进行数据克隆,生成克隆卷,如图5所示。然后,数据处理装置201可以将该克隆卷分配给新创建的存储节点1023,并将时刻T之前所生成的重做日志复制至该新创建的存储节点1023中。如此,数据处理系统200可以基于该新创建的存储节点中克隆卷的数据,支持用户利用计算节点1013对分析/测试业务中的数据读写。实际应用时,数据处 理装置201可以创建复制组,该复制组中包括主卷以及至少一个从卷,并且,从卷中的数据可以通过对主卷进行复制或者快照的方式得到。并且,数据处理装置201能够用于处理该复制组内的不同存储卷之间的数据同步,这使得计算侧的主计算节点1011可以无需对复制组进行维护,从而可以简化主计算节点1011的数据处理逻辑、有助于提高主计算节点1011提供业务服务的性能。
相应地,从计算节点1012可以实时或者周期性的检测主计算节点1011是否发生故障,如从计算节点1012可以在未接收到主计算节点1011发送的心跳消息时确定主计算节点1011发生故障等。当确定主计算节点1011故障时,从计算节点1012升级为主计算节点,并指示存储节点1022升级为主存储节点,此时,从计算节点1012直接基于存储节点1022中的数据继续接管主计算节点1011上的业务。
对于用户通过客户端新写入的数据,从计算节点1012可以按照前述方式写入从计算节点1012中的缓冲区,并生成重做日志,并将其写入存储节点1022(当前已升级为主存储节点)。然后,数据处理装置201可以指示存储节点1022对该重做日志进行回放,以实现对持久化存储的数据进行更新。并且,数据处理装置201还可以控制将该重做日志复制至存储节点1021,以便存储节点1021通过回放该重做日志,保持与存储节点1022之间的数据同步。实际应用时,主计算节点1011在故障恢复后,可以作为从计算节点1012的灾备,即从计算节点1012保持为主节点,主计算节点1011作为从节点。或者,主计算节点1011可以触发主从切换过程,以便将主计算节点1011再次恢复为主节点、将从计算节点1012再次恢复为从节点等。
在第二种实现方式中,存储节点1021与存储节点1022可以部署于不同的物理区域,如图6所示的物理区域1以及物理区域2。例如,存储节点1021部署于数据中心A,存储节点1022部署于数据中心B;或者,存储节点1021部署于AZ1,存储节点1022部署于AZ2等。如此,可以实现跨数据中心或者跨AZ容灾,以此提高数据在异地存储的可靠性。
示例性地,数据处理装置201可以将存储节点1021中的存储卷作为主卷,并在远端的存储节点1022中创建逻辑上的从卷。然后,在将存储节点1021中的数据复制至存储节点1022时,数据处理装置201可以先记录当前时刻的重做日志的LSN的值,并控制存储节点1021将主卷上的数据、以及存储节点1021中LSN不大于该值的重做日志远程发送至存储节点1022,以此实现存储节点1021与存储节点1022之间在时刻t的数据同步。对于存储节点1021在时刻t之后所发生更新的数据,数据处理装置201可以控制存储节点1021将时刻t之后所生成的重做日志远程发送给存储节点1022,以此保证两个存储节点之间的数据一致性。在该示例中,是以数据处理装置201基于存储卷创建副本为例进行说明,在其它实施方式中,数据处理装置201也可以基于其它类型的存储对象创建副本,本实施例对此并不进行限定。实际应用时,数据处理装置201可以创建复制组,该复制组中包括主卷以及位于其它物理区域的至少一个从卷,并且,从卷中的数据可以通过对主卷进行复制的方式得到。并且,数据处理装置201能够用于处理该复制组内的不同存储卷之间的数据同步,这使得计算侧的主计算节点1011可以无需对复制组进行维护。
相应地,从计算节点1012可以实时或者周期性的检测主计算节点1011发生故障,并且,当确定主计算节点1011故障时,从计算节点1012升级为主计算节点,存储节点1022升级为主存储节点,并且,从计算节点1012直接基于存储节点1022中的数据继续接管主计算节点1011上的业务。相应地,从计算节点1012对于用户通过客户端新写入的数据,可以按照前述方式写入缓冲区,并生成重做日志,然后,数据处理装置201可以通过控制存储节点1022将新生成的重做日志复制至存储节点1021中,以实现两个存储节点之间的数据同步。主计算节点1011在故障恢复后,可以作为从计算节点1012的灾备;或者,主计算节点1011可以通过主从切换,将主计算节点1011再次恢复为主节点等。
上述两种实现方式,仅作为一些示例性说明。比如,在其它可能的实施方式中,存储节点1021可以存在多个灾备节点,此时,该多个灾备节点中的部分节点可以与存储节点1021部署于同一物理区域,该多个灾备节点中的另一部分节点可以与存储节点1021部署于不同的物理区域等,以此可以同时提高数据在本地存储以及异地存储的可靠性。
实际应用时,存储节点1021(以及存储节点1022)可以基于存储阵列(storage array)持久化存储数据,并且,可以在该存储阵列上基于独立磁盘冗余阵列(Redundant Arrays of Independent Disks,RAID)技术、纠删码(erasure coding,EC)技术、重删压缩、数据备份等技术,进一步提高数据在存储节点1021(以及存储节点1022)中持久化存储的可靠性。
在进一步可能的实施方式中,当存储节点1021与存储节点1022部署于不同的物理区域时,存储节点1021与存储节点1022之间可以基于多种数据复制方式保持两个存储节点的数据一致性。其中,下面对存储节点1021与存储节点1022之间数据复制过程进行详细介绍。
作为第一种实现示例,数据处理装置201可以先判断存储节点1021与存储节点1022之间是否进行基线(baseline)复制。其中,基线复制,是指将存储节点1021在某个时间点(如当前时刻)所持久化存储的数据以及所生成的日志全部发送至存储节点1022。比如,当存储节点1021与存储节点1022之间第一次执行数据复制操作时,数据处理装置201可以确定进行基线复制。或者,当存储节点1021中存在部分重做日志未被及时发送至存储节点1022而在存储节点1021中被删除,如该部分重做日志在存储节点1021中的生命周期超出预设时长而被删除,或者因为存储节点1021中用于存储重做日志的可用区域不足而删除部分最先生成的重做日志等,此时,存储节点1022无法获得该部分被删除的重做日志来保持与存储节点1021之间的数据同步,因此,数据处理装置201可以确定对存储节点1021与存储节点1022进行基线复制,以便基于该基线同步两个存储节点中的数据。
在确定采用基线复制时,数据处理装置201可以将第一时刻(如当前时刻)确定为基线对应的时刻,并基于该时刻确定基线数据。其中,基线数据包括当前存储节点1021中所持久化存储的数据以及该时刻所最新生成的重做日志。在确定基线数据后,数据处理装置201可以记录该基线数据中的重做日志的LSN,以下将该LSN称之为基线LSN,并控制存储节点1021将该基线数据远程发送至存储节点1022,如图7所示,以便存储节点1022存储该基线数据,具体可以是持久化存储该基线数据中的多个数据页上的数据,并根据该基线数据中的重做日志对当前持久化存储的数据进行更新。可以理解,由于存储节点1021远程发送基线数据的过程中(也即基线复制过程中)以及基线复制完成之后,用户可能会通过客户端向存储节点1021写入新的数据,从而基于该新写入的数据生成新的重做日志,如上述目标重做日志等。因此,数据处理装置201可以指示存储节点1021将新生成的重做日志(也即LSN大于该基线LSN的重做日志)写入缓存区域701中进行临时存储。
实际应用时,由于缓存区域701的存储空间有限,存储节点1021可能会在缓存区域701的剩余存储空间小于阈值时,回收缓存区域701中已缓存的重做日志。为避免基线复制过程中所生成的部分重做日志在未及时复制至存储节点1022的情况下被回收,数据处理装置201可以设置存储节点1021中缓存区域701的存储空间的下限值,也即设置存储节点1021所创建的缓存区域701的存储空间不小于该下限值。在基线复制过程中,数据处理装置201可以指示存储节点1021与存储节点1022对基线数据中的重做日志进行回放,以使得存储节点1021与存储节点1022中持久化存储的数据均与该基线保持一致。
在完成基线复制后,数据处理装置201可以校验是否存在LSN大于基线LSN的重做日志被回收,如存储节点1021在回收重做日志时会记录该重做日志的LSN,从而当回收的重做日志的LSN大于基线LSN时,确定存在LSN大于基线LSN的重做日志被回收,否则确定不存在LSN大于基线LSN的重做日志被回收。
若存在LSN大于基线LSN的重做日志被回收,则数据处理装置201指示存储节点1021重新确定基线,如基于第二时刻(如当前时刻)确定基线等,并基于重新确定的基线再次执行基线复制过程,以保证两个存储节点之间的数据一致性(避免存储节点1022丢失存储节点1021中的部分数据)。其中,新的基线复制过程可以是相对于上一次基线复制过程的增量复制,以此可以减少基线复制过程所需传输的数据量;或者,新的基线复制过程也可以是全量复制等。
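上述“记录基线LSN、执行基线复制、校验是否有LSN大于基线LSN的重做日志被回收、必要时重新基线复制”的判断逻辑,可以用如下草图表示(函数与参数均为本文假设):

```python
def check_after_baseline(baseline_lsn: int, recycled_lsns: list) -> str:
    """基线复制完成后的校验(示意):判断下一步执行日志复制还是重新基线复制。"""
    if any(lsn > baseline_lsn for lsn in recycled_lsns):
        # 存在LSN大于基线LSN的重做日志被回收,灾备节点会缺失数据,需重新确定基线
        return "rebaseline"
    # 否则将缓存区域中LSN大于基线LSN的新增重做日志发送至灾备节点回放
    return "ship_logs"

assert check_after_baseline(100, recycled_lsns=[80, 95]) == "ship_logs"
assert check_after_baseline(100, recycled_lsns=[101]) == "rebaseline"
```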
若不存在LSN大于基线LSN的重做日志被回收,则数据处理装置201指示存储节点1021将缓存区域701中的新生成重做日志发送至存储节点1022,该过程即为日志复制过程。实际应用时,存储节点1021也可以是将新生成的重做日志中相对于基线数据中的重做日志所新增的日志记录发送至存储节点1022,而可以不用发送该新生成的重做日志中的所有日志记录。并且,数据处理装置201可以指示存储节点1022对该新生成的重做日志进行回放,以更新自身持久化存储的数据。其中,存储节点1021可以与存储节点1022同时执行回放该重做日志的过程,或者存储节点1021与存储节点1022异步执行回放该重做日志的过程。对于缓存区域701中已经在存储节点1021完成回放并且已经被发送至存储节点1022的重做日志,存储节点1021可以对其进行回收。并且,在日志复制过程中,存储节点1021若生成新的重做日志,则仍然将其写入缓存区域701中,以便后续将其发送至存储节点1022中。
在完成日志复制过程时,也即缓存区域701中不存在未被发送至存储节点1022的重做日志,数据处理系统200的恢复点目标(recovery point objective,RPO)能够为0,该RPO可以用于衡量数据处理系统200灾难恢复时所发生的最大数据丢失量。此时,如果存储节点1021与存储节点1022之间的数据复制方式被配置为同步复制,则对于存储节点1021后续所生成的新的重做日志,数据处理装置201可以控制存储节点1021执行重做日志的镜像功能,并将该重做日志的镜像发送至存储节点1022,以便存储节点1022对该重做日志的镜像文件进行回放。在确定存储节点1022成功接收到镜像文件后,数据处理装置201可以控制存储节点1021回放该重做日志,并控制存储节点1022回放该重做日志的镜像文件,以此实现存储节点1021与存储节点1022之间的数据同步。而若存储节点1021与存储节点1022之间的数据复制方式被配置为异步复制,则对于存储节点1021新生成的重做日志,存储节点1021可以将其添加至缓存区域701中,并且,数据处理装置201可以指示存储节点1021回放该重做日志,并将缓存区域701中的重做日志发送至存储节点1022中,以便存储节点1022通过回放接收到的重做日志来实现存储节点1021与存储节点1022之间的数据同步。
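同步复制与异步复制在“新生成的重做日志何时发送与回放”上的差别,可以用如下示意性代码体现(函数与参数均为本文假设):

```python
def on_new_redo_log(mode: str, record: bytes, send_to_replica, local_replay, buffer: list):
    """示意:RPO为0之后,针对新生成的重做日志的两种复制方式。"""
    if mode == "sync":
        # 同步复制:先生成镜像并发送至灾备节点,确认对端成功接收后,两端再分别回放
        ok = send_to_replica(record)          # 发送重做日志的镜像
        if ok:
            local_replay(record)
    else:
        # 异步复制:本端先写入缓存区域并回放,再由后台将缓存中的日志发送至灾备节点
        buffer.append(record)
        local_replay(record)

sent, replayed, buf = [], [], []
on_new_redo_log("sync", b"r1", send_to_replica=lambda r: sent.append(r) or True,
                local_replay=replayed.append, buffer=buf)
on_new_redo_log("async", b"r2", send_to_replica=lambda r: True,
                local_replay=replayed.append, buffer=buf)
print(len(sent), len(replayed), buf)
```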
实际应用场景中,在进行日志复制过程的过程中,如果存储节点1021与存储节点1022之间断开连接,则这两个存储节点处于待同步的状态。当存储节点1021与存储节点1022之间恢复连接时,如果存储节点1021中不存在未复制的重做日志被回收,则存储节点1021与存储节点1022可以继续执行日志复制过程,而若存储节点1021中存在部分未复制的重做日志被回收,则数据处理装置201指示存储节点1021重新确定基线,并基于重新确定的基线再次执行基线复制过程,以保证两个存储节点之间的数据一致性。
上述第一种实施示例中,存储节点1021与存储节点1022之间可以串行执行基线复制过程以及日志复制过程,而在其它可能的实施方式中,存储节点1021与存储节点1022之间可以并行执行基线复制过程以及日志复制过程。
具体地,作为第二种实现示例,数据处理装置201可以先判断存储节点1021与存储节点1022之间是否进行基线(baseline)复制。若确定采用基线复制,则数据处理装置201可以记录该基线对应的重做日志的基线LSN(假设为最新的重做日志对应的LSN),并控制存储节点1021将该基线对应的持久化数据以及不大于该基线LSN的重做日志远程发送至存储节点1022,如图8所示。在基线复制过程中(以及完成基线复制后),对于存储节点1021生成的新的重做日志,数据处理装置201可以控制存储节点1021对该重做日志进行镜像,并将该重做日志的镜像文件发送至存储节点1022。在此过程中,数据处理装置201可以控制存储节点1021对新生成的重做日志进行回放,以在存储节点1021进行数据更新。
在完成基线复制后,数据处理装置201可以控制存储节点1022先回放基线中包括的LSN不大于基线LSN的重做日志,然后,再按照日志接收顺序对接收到的重做日志的镜像文件进行回放,直至存储节点1022完成对最新接收到的重做日志的镜像文件进行回放。此时,存储节点1021与存储节点1022中持久化存储的数据可以保持一致,数据处理系统200的RPO能够为0。
值得注意的是,上述是以数据处理装置201控制两个存储节点分别回放重做日志进行示例性说明,在其它可能的实施方式中,主计算节点1011与从计算节点1012之间也可以共享同一存储节点,如图3所示,主计算节点1011与从计算节点1012均可以访问存储节点301。此时,对于主计算节点1011下发的重做日志,数据处理装置201控制一个存储节点(即存储节点301)对该重做日志完成回放即可。又或者,当存储节点1021具有多个作为灾备的从存储节点时(即一主多从的灾备架构),数据处理装置201可以控制多个存储节点分别回放重做日志,以实现数据在主、从存储节点之间的数据同步。
如图9所示,为本申请实施例中一种数据处理方法的流程示意图,该方法可以应用于如图2所示的数据处理系统200中。实际应用时,该方法也可以应用于其它可适用的数据处理系统中。为便于理解与描述,下面以应用于图2所示的数据处理系统200为例进行示例性说明,该方法具体可以包括:
S901:主计算节点1011接收访问请求,并向存储集群102写入数据。
实际应用时,主计算节点1011可以接收用户侧的客户端或者其它设备发送的访问请求,该访问请求可以用于请求读取数据处理系统200中所持久化存储的数据,或者可以用于请求对数据处理系统200中所持久化存储的数据进行修改,或者可以用于请求向数据处理系统200写入新数据等。
其中,当主计算节点1011接收到的访问请求用于请求修改数据或者写入新数据时,主计算节点1011在响应该访问请求的过程中,可以对修改后的数据或者新写入的数据进行缓存,并为该数据生成重做日 志。
S902:数据处理装置201监控主计算节点1011向存储集群102写入的数据。
主计算节点1011向存储集群102写入的数据,例如可以包括重做日志、数据页等数据,还可以是其它类型的数据。
S903:数据处理装置201识别到主计算节点1011向存储集群102写入的数据包括重做日志。
S904:数据处理装置201判断存储节点1021中是否存在未复制至存储节点1022的重做日志发生回收,若是,则执行步骤S905;若否,则执行步骤S907。
S905:数据处理装置201控制存储节点1021与存储节点1022之间进行基线复制。
具体实现时,数据处理装置201可以记录最新生成的重做日志(如步骤S903中所识别到的重做日志)的LSN,并根据当前时刻已生成的重做日志以及存储节点1021中持久化存储的数据作为基线,控制存储节点1021将该基线包括的数据以及重做日志远程发送至存储节点1022。
其中,在基线复制过程中,数据处理装置201还可以控制存储节点1022对基线中的重做日志进行回放,以使得存储节点1021与存储节点1022基于该基线所持久化存储的数据保持一致。
S906:数据处理装置201将存储节点1021在基线复制过程中所生成的新的重做日志复制至存储节点1022,并返回执行步骤S902。
本实施例中,存储节点1021与存储节点1022之间可以串行执行基线复制过程以及日志复制过程,则,数据处理装置201可以指示存储节点1021对基线复制过程中所生成的新的重做日志进行缓存,从而在完成基线复制后,控制存储节点1021将缓存的重做日志(该重做日志的LSN大于基线包括的重做日志的LSN)发送至存储节点1022,并控制存储节点1022回放所接收到的重做日志。
或者,存储节点1021与存储节点1022之间可以并行执行基线复制过程以及日志复制过程,则,数据处理装置201可以指示存储节点1021对基线复制过程中所生成的新的重做日志进行镜像,并将该新的重做日志的镜像文件发送至存储节点1022,并控制存储节点1022在回放完成基线中的所有重做日志后,再回放其接收到的重做日志的镜像文件。
S907:数据处理装置201控制存储节点1021将识别到的重做日志复制至存储节点1022。
其中,当存储节点1021与存储节点1022之间采用异步复制时,存储节点1021所生成的重做日志被写入缓存区域中,从而数据处理装置201可以指示存储节点1021将缓存区域中的重做日志发送至存储节点1022。
当存储节点1021与存储节点1022之间采用同步复制时,数据处理装置201可以指示存储节点1021对生成的重做日志进行镜像,并将镜像文件发送至存储节点1022。
S908:数据处理装置201分别控制存储节点1021以及存储节点1022回放其上的重做日志,以更新各存储节点所持久化存储的数据,并返回执行步骤S902。
如此,存储节点1021与存储节点1022根据重做日志对持久化存储的数据进行更新,相比于通过binlog更新数据的方式,因为无需执行数据库语句,而是直接在存储节点的数据页上修改数据,这可以有效降低更新存储节点中持久化存储的数据所需消耗的资源量。
并且,由于位于存储侧的数据处理装置201已经控制存储节点1022根据存储节点1021中的重做日志更新持久化存储的数据,这使得在从计算节点1012升级为主计算节点时(原主计算节点1011发生故障或者从计算节点1012接收到主从切换的升级指令等),从计算节点1012无需执行根据重做日志更新数据的过程,而能够直接根据存储节点1022中持久化存储的数据继续提供数据读写服务,以此可以有效缩短数据处理系统200的RTO,提高数据处理系统200的故障恢复性能。
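将图9中S902~S908的处理流程串起来,大致相当于如下的“监控-识别-复制-回放”处理逻辑(仅为示意,所有函数名与参数均为本文假设):

```python
def handle_write(written_name: str, is_redo_log, has_recycled_unreplicated,
                 do_baseline_copy, ship_redo_log, replay_on_both):
    """示意:数据处理装置监控到一次写入后的处理逻辑(对应S903~S908)。"""
    if not is_redo_log(written_name):
        return "ignore"                       # 写入的不是重做日志,继续监控
    if has_recycled_unreplicated():
        do_baseline_copy()                    # S905/S906:基线复制,并补发期间新增的重做日志
    else:
        ship_redo_log(written_name)           # S907:将识别到的重做日志复制至灾备存储节点
    replay_on_both(written_name)              # S908:两个存储节点分别回放,更新持久化存储的数据
    return "replicated"

result = handle_write(
    "ib_logfile0",
    is_redo_log=lambda n: n.startswith("ib_logfile"),
    has_recycled_unreplicated=lambda: False,
    do_baseline_copy=lambda: None,
    ship_redo_log=lambda n: None,
    replay_on_both=lambda n: None,
)
print(result)
```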
值得注意的是,本实施例中是以应用于图2所示的数据处理系统为例,介绍多个存储节点之间的数据同步过程,当应用于图3所示的数据处理系统时,主计算节点1011与从计算节点1012共享同一存储节点,此时,数据处理装置201指示存储节点301对生成的重做日志进行回放以更新持久化存储的数据即可。这样,当主计算节点1011故障后,从计算节点1012无需再执行回放重做日志的过程,而可以直接基于当前持久化存储的数据恢复业务运行,以此可以降低数据处理系统200的RTO。
图9所示的数据处理方法实施例,对应于图2至图8所示的数据处理系统200实施例,故图9所示的数据处理方法中的具体实现过程,可参见前述实施例的相关之处描述,在此不做重述。
上述各实施例中,是以数据处理装置201利用重做日志控制存储节点1021与存储节点1022保持数据一致性为例进行示例性说明。在其它可能的实施例中,数据处理装置201也可以利用binlog控制多个存储节点之间保持数据一致性。下面结合图10进行示例性说明。
参见图10,示出了本申请实施例提供的另一种数据处理系统500。如图10所示,数据处理系统500中的计算集群包括主计算节点5011、从计算节点5012以及从计算节点5013,并且,从计算节点5012以及从计算节点5013均作为主计算节点5011的灾备,例如可以是热备等。数据处理系统500中的存储集群包括存储节点5021、存储节点5022、存储节点5023以及数据处理装置503,并且,存储节点5022以及存储节点5023均作为存储节点5021的灾备。其中,主计算节点5011与存储节点5021可以构成主中心,从计算节点5012与存储节点5022可以构成灾备中心1,从计算节点5013与存储节点5023可以构成灾备中心2,并且,主中心与灾备中心1部署于同一物理区域,主中心与灾备中心2部署于不同的物理区域,如部署于不同的数据中心等。
主计算节点5011在向存储节点5021写入新的数据,或者对存储节点5021中的数据进行修改的过程中,通常会生成binlog(一种逻辑日志),该binlog中可以记录用于更新存储节点5021中持久化存储的数据的数据库语句,如SQL语句等。
此时,数据处理装置503可以感知并识别主计算节点5011中的binlog,例如可以是根据binlog的日志格式识别出binlog等,并从主计算节点5011中获取该binlog。当数据处理装置503部署于存储节点5021时,数据处理装置503可以控制存储节点5021,将binlog分别发送至灾备中心1以及灾备中心2,具体可以是将binlog分别发送至存储节点5022以及存储节点5023。而当数据处理装置5031独立于各个存储节点进行部署时,数据处理装置503可以配置有网卡,从而数据处理装置503可以通过该网卡将binlog分别发送至各个灾备中心等。然后,数据处理装置503可以分别指示从计算节点5012以及从计算节点5013回放各自所接收到的binlog。
从计算节点5012在回放binlog时,通过执行binlog中所记录的数据库语句,可以对存储节点5022中持久化存储的数据进行更新,以使得存储节点5022中的数据与存储节点5021中的数据保持一致。类似地,从计算节点5013也可以通过执行binlog中所记录的数据库语句,对存储节点5023中持久化存储的数据进行更新,以使得存储节点5023中的数据与存储节点5021中的数据保持一致。
由于binlog是在存储侧完成从主中心到各个灾备中心的复制,这使得即使主中心的主计算节点5011的负荷较大,各个灾备中心也能通过存储节点接收到binlog,而无需由主计算节点5011再执行将binlog发送至各个灾备中心的过程。如此,各个灾备中心中的从计算节点通过回放该binlog,可以实现各个灾备中心与主中心之间的数据同步,保证数据处理系统500的RPO能够为0,避免主计算节点5011因为负荷过大影响binlog在主中心与灾备中心之间的复制,而导致灾备中心的数据与主中心的数据不一致,影响数据处理系统500的RPO。
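对于图10中“在存储侧将binlog分发至各灾备中心、由各从计算节点回放binlog中的数据库语句”的过程,可用如下草图示意(站点结构与“数据库语句”的表示均为高度简化的本文假设):

```python
def fan_out_binlog(binlog_statements, dr_slaves):
    """示意:数据处理装置503把binlog分发给各灾备中心,由各从计算节点回放数据库语句。"""
    for slave in dr_slaves:
        for stmt in binlog_statements:
            slave["apply"](slave["db"], stmt)   # 从计算节点执行binlog中记录的数据库语句

def apply_stmt(db: dict, stmt: tuple):
    op, key, value = stmt                       # 简化的"数据库语句":(操作, 键, 值)
    if op == "set":
        db[key] = value
    else:                                       # "delete"
        db.pop(key, None)

dr1 = {"db": {}, "apply": apply_stmt}           # 灾备中心1(从计算节点5012 + 存储节点5022)
dr2 = {"db": {}, "apply": apply_stmt}           # 灾备中心2(从计算节点5013 + 存储节点5023)
fan_out_binlog([("set", "k1", "v1"), ("set", "k2", "v2")], [dr1, dr2])
print(dr1["db"], dr2["db"])
```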
在进一步可能的实施方式中,数据处理装置503还可以判断存储节点5021与各个作为其灾备的存储节点之间是否需要进行基线复制。比如,当主中心与灾备中心之间第一次执行数据复制过程时,数据处理装置503可以确定进行基线复制。或者,当主计算节点5011生成的部分binlog未被及时发送至各个灾备中心而在主计算节点5011中被删除,如该部分binlog在主计算节点5011中的生命周期超出预设时长而被删除,或者因为主计算节点5011中用于存储binlog的可用区域不足而删除部分最先生成的binlog等,此时,存储节点5022以及存储节点5023难以获得该部分被删除的binlog,从而导致各灾备中心难以通过回放binlog来保持各灾备中心与主中心之间的数据同步。因此,数据处理装置503在确定存储节点5021与各个作为其灾备的存储节点之间需要进行基线复制时,基于该基线将存储节点5021中主卷的数据,分别复制至存储节点5022的从卷以及存储节点5023的从卷中。本实施例中,数据处理装置503控制存储节点5021分别与各个存储节点之间进行基线复制的具体实现,可参见前述实施例中数据处理装置201控制存储节点1021与存储节点1022之间进行基线复制的相关之处描述,在此不做重述。
值得注意的是,图10所示的数据处理系统500仅作为一种示例性说明。在其它可能的数据处理系统中,灾备中心(以及作为灾备的存储节点)的数量可以更少,或者可以更多等,本实施例对此并不进行限定。
上文中结合图1至图10,详细描述了本申请所提供的数据处理系统,下面将结合图11和图12,描述根据本申请所提供的数据处理装置、数据处理设备。
与上述方法同样的发明构思,本申请实施例还提供一种数据处理装置。参见图11,示出了本申请实施例提供的一种数据处理装置的示意图。其中,图11所示的数据处理装置1100位于数据处理系统,如图2所示的数据处理系统200等,该数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算节点以及从计算节点,所述存储集群包括至少一个存储节点以及所述数据处理装置,所述从计算节点作为所述主计算节点的灾备。
如图11所示,数据处理装置1100包括:
监控模块1101,用于监控所述主计算节点向所述存储集群写入的数据;
控制模块1102,用于在识别到所述写入的数据包括所述目标重做日志时,控制所述至少一个存储节点中的第一存储节点回放所述目标重做日志,以将所述目标重做日志中记录的目标数据更新至所述至少一个存储节点所持久化存储的数据中,所述至少一个存储节点中更新后的持久化存储的数据被所述从计算节点用于接管所述主计算节点上的访问请求。
在一种可能的实施方式中,至少一个存储节点还包括第二存储节点,第一存储节点作为第二存储节点的灾备;
控制模块1102,还用于在识别到写入的数据包括目标重做日志时,控制第二存储节点将目标重做日志发送至第一存储节点,以使得第一存储节点回放目标重做日志,以更新第一存储节点中持久化存储的数据。
在一种可能的实施方式中,主计算节点上运行有目标应用,目标重做日志由目标应用在运行过程中产生。
在一种可能的实施方式中,控制模块1102,用于根据目标应用的配置文件或目标应用的重做日志的命名格式,识别到写入的数据包括目标重做日志。
在一种可能的实施方式中,控制模块1102,具体用于控制第一存储节点根据目标应用对应的数据页的格式回放目标重做日志,以更新第一存储节点上的数据页。
在一种可能的实施方式中,控制模块1102,还用于控制第一存储节点在回放目标重做日志之前,获取目标应用对应的数据页的格式。
在一种可能的实施方式中,控制模块1102,还用于控制第二存储节点将目标数据对应的二进制日志发送至第一存储节点,二进制日志用于记录数据库语句;
控制模块1102,具体用于控制第一存储节点采用二进制日志对目标重做日志进行验证,以使得第一存储节点在确定验证通过的情况下,回放目标重做日志。
在一种可能的实施方式中,主计算节点与从计算节点均能够访问第一存储节点,第一存储节点包括读缓存区域;
控制模块1102,还用于控制第一存储节点在回放目标重做日志之前,将目标数据缓存至读缓存区域,读缓存区域中的目标数据能够被主计算节点读取。
在一种可能的实施方式中,控制模块1102,还用于控制第一存储节点在回放目标重做日志之后,从读缓存区域淘汰目标数据。
在一种可能的实施方式中,至少一个存储节点包括存储阵列,存储阵列用于持久化存储数据。
在一种可能的实施方式中,控制模块1102,还用于在控制第二存储节点发送基线数据之前,设置第二存储节点中用于缓存重做日志的存储空间的下限值。
在一种可能的实施方式中,主计算节点部署有多个应用,该多个应用包括上述目标应用。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于同一物理区域,控制模块,还用于根据第二存储节点在该物理区域内创建第一存储节点,该第一存储节点中的数据通过对第二存储节点中的数据进行快照或者克隆得到。
在一种可能的实施方式中,二进制日志由主计算节点发送至从计算节点,并由从计算节点将该二进制日志下发至第一存储节点。
本实施例提供的数据处理装置1100,对应于上述各实施例中的数据处理系统,用于实现上述各实施例中数据处理装置201的功能或者数据处理装置201所执行的数据处理方法,因此,本实施例中的各个模块的功能及其所具有的技术效果,可参见前述实施例中的相关之处描述,在此不做赘述。
此外,本申请实施例还提供一种数据处理设备,如图12所示,数据处理设备1200中可以包括通信接口1210、处理器1220。可选的,数据处理设备1200中还可以包括存储器1230。其中,存储器1230可以设置于数据处理设备1200内部,还可以设置于数据处理设备1200外部。示例性地,上述实施例中数据处理装置201执行的各个动作均可以由处理器1220实现。在实现过程中,处理流程的各步骤可以通过处理器1220中的硬件的集成逻辑电路或者软件形式的指令完成前述实施例中数据处理装置201执行的方法。为了简洁,在此不再赘述。处理器1220用于实现上述方法所执行的程序代码可以存储在存储器1230中。存储器1230和处理器1220连接,如耦合连接等。
本申请实施例的一些特征可以由处理器1220执行存储器1230中的程序指令或者软件代码来完成/支持。存储器1230上加载的软件组件可以从功能或者逻辑上进行概括,例如,图11所示的监控模块1101、控制模块1102,图11所示的监控模块1101的功能可以由通信接口1210实现。
本申请实施例中涉及到的任一通信接口可以是电路、总线、收发器或者其它任意可以用于进行信息交互的装置。比如数据处理设备1200中的通信接口1210,示例性地,该其它装置可以是与该数据处理设备1200相连的设备等。
基于以上实施例,本申请实施例还提供了一种芯片,包括供电电路以及处理电路,所述供电电路用于对所述处理电路进行供电,所述处理电路用于:
监控所述主计算节点向所述存储集群写入的数据;
在识别到所述写入的数据包括所述目标重做日志时,控制所述至少一个存储节点中的第一存储节点回放所述目标重做日志,以将所述目标重做日志中记录的目标数据更新至所述至少一个存储节点所持久化存储的数据中,所述至少一个存储节点中更新后的持久化存储的数据被所述从计算节点用于接管所述主计算节点上的访问请求,其中,该芯片应用于数据处理系统中的数据处理装置,所述数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算节点以及从计算节点,所述存储集群包括至少一个存储节点以及数据处理装置,所述从计算节点作为所述主计算节点的灾备。
示例性地,供电电路包括但不限于如下至少一个:供电子系统、电源管理芯片、功耗管理处理器或功耗管理控制电路。
在一种可能的实施方式中,至少一个存储节点还包括第二存储节点,第一存储节点作为第二存储节点的灾备,所述处理电路用于:
在识别到写入的数据包括目标重做日志时,控制第二存储节点将目标重做日志发送至第一存储节点,以使得第一存储节点回放目标重做日志,以更新第一存储节点中持久化存储的数据。
在一种可能的实施方式中,第二存储节点与第一存储节点部署于同一物理区域;或者,第二存储节点与第一存储节点部署于不同物理区域。
在一种可能的实施方式中,主计算节点上运行有目标应用,目标重做日志由目标应用在运行过程中产生。
在一种可能的实施方式中,主计算节点部署有多个应用,该多个应用包括上述目标应用。
在一种可能的实施方式中,所述处理电路具体用于:
根据目标应用的配置文件或目标应用的重做日志的命名格式,识别到写入的数据包括目标重做日志。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于同一物理区域,所述处理电路还用于:
根据第二存储节点在该物理区域内创建第一存储节点,该第一存储节点中的数据通过对第二存储节点中的数据进行快照或者克隆得到。
在一种可能的实施方式中,所述处理电路具体用于:
控制第一存储节点根据目标应用对应的数据页的格式回放目标重做日志,以更新第一存储节点上的数据页。
在一种可能的实施方式中,所述处理电路还用于:
控制第一存储节点在回放目标重做日志之前,获取目标应用对应的数据页的格式。
在一种可能的实施方式中,所述处理电路还用于:
控制第二存储节点将目标数据对应的二进制日志发送至第一存储节点,二进制日志用于记录数据库语句;
则,所述处理电路具体用于:控制第一存储节点采用二进制日志对目标重做日志进行验证,以使得第一存储节点在确定验证通过的情况下,回放目标重做日志。
在一种可能的实施方式中,主计算节点与从计算节点均能够访问第一存储节点,第一存储节点包括读缓存区域;
则,所述处理电路还用于:控制第一存储节点在回放目标重做日志之前,将目标数据缓存至读缓存区域,读缓存区域中的目标数据能够被主计算节点读取。
在一种可能的实施方式中,所述处理电路还用于:
控制第一存储节点在回放目标重做日志之后,从读缓存区域淘汰目标数据。
在一种可能的实施方式中,目标应用包括关系数据库管理系统RDBMS,RDBMS包括MySQL、PostgreSQL、openGauss、oracle中的至少一种。
在一种可能的实施方式中,第一存储节点与第二存储节点部署于不同物理区域;所述处理电路还用于:
控制第二存储节点在将目标重做日志发送至第一存储节点之前,将基线数据发送至第一存储节点;并控制第一存储节点在回放目标重做日志之前,存储基线数据。
在一种可能的实施方式中,所述处理电路还用于:
在控制第二存储节点发送基线数据之前,设置第二存储节点中用于缓存重做日志的存储空间的下限值。
在一种可能的实施方式中,至少一个存储节点包括存储阵列,存储阵列用于持久化存储数据。
在一种可能的实施方式中,二进制日志由主计算节点发送至从计算节点,并由从计算节点将该二进制日志下发至第一存储节点。
本申请实施例中涉及的处理器可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
本申请实施例中的耦合是装置、模块之间的间接耦合或通信连接,可以是电性、机械或其它的形式,用于装置、模块之间的信息交互。
处理器可能和存储器协同操作。存储器可以是非易失性存储器,比如硬盘或固态硬盘等,还可以是易失性存储器,例如随机存取存储器。存储器是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
本申请实施例中不限定上述通信接口、处理器以及存储器之间的具体连接介质。比如存储器、处理器以及通信接口之间可以通过总线连接。所述总线可以分为地址总线、数据总线、控制总线等。
基于以上实施例,本申请实施例还提供了一种计算机存储介质,该存储介质中存储软件程序,该软件程序在被一个或多个计算设备读取并执行时可实现上述任意一个或多个实施例提供的数据处理装置201执行的方法。所述计算机存储介质可以包括:U盘、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图 和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。
显然,本领域的技术人员可以对本申请实施例进行各种改动和变型而不脱离本申请实施例的范围。这样,倘若本申请实施例的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (33)

  1. 一种数据处理系统,其特征在于,所述数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算节点以及从计算节点,所述存储集群包括至少一个存储节点以及数据处理装置,所述从计算节点作为所述主计算节点的灾备;
    所述主计算节点,用于接收访问请求,并向所述存储集群写入数据;
    所述数据处理装置,用于监控所述主计算节点向所述存储集群写入的数据,并在识别到所述写入的数据包括所述目标重做日志时,控制所述至少一个存储节点中的第一存储节点回放所述目标重做日志,以将所述目标重做日志中记录的目标数据更新至所述至少一个存储节点所持久化存储的数据中;
    所述从计算节点,用于根据所述至少一个存储节点中更新后的持久化存储的数据,接管所述主计算节点上的访问请求。
  2. 根据权利要求1所述的数据处理系统,其特征在于,所述至少一个存储节点还包括第二存储节点,所述第一存储节点作为所述第二存储节点的灾备;
    所述第二存储节点,用于存储所述主计算节点写入的数据;
    所述数据处理装置,用于在识别到所述写入的数据包括所述目标重做日志时,控制所述第二存储节点将所述目标重做日志发送至所述第一存储节点,以使得所述第一存储节点回放所述目标重做日志,以更新所述第一存储节点中持久化存储的数据。
  3. 根据权利要求2所述的数据处理系统,其特征在于,所述第二存储节点与所述第一存储节点部署于同一物理区域;或者,所述第二存储节点与所述第一存储节点部署于不同物理区域。
  4. 根据权利要求2或3所述的数据处理系统,其特征在于,所述主计算节点上运行有目标应用,所述目标重做日志由所述目标应用在运行过程中产生。
  5. 根据权利要求4所述的数据处理系统,其特征在于,
    所述数据处理装置,具体用于根据所述目标应用的配置文件或所述目标应用的重做日志的命名格式,识别所述目标重做日志。
  6. 根据权利要求4或5所述的数据处理系统,其特征在于,
    所述第一存储节点,用于根据所述目标应用对应的数据页的格式,回放所述目标重做日志,以更新所述第一存储节点上的数据页。
  7. 根据权利要求6所述的数据处理系统,其特征在于,
    所述第一存储节点,还用于在回放所述目标重做日志之前,获取所述目标应用对应的数据页的格式。
  8. 根据权利要求4所述的数据处理系统,其特征在于,所述目标应用包括关系数据库管理系统RDBMS,所述RDBMS包括MySQL、PostgreSQL、openGauss、oracle中的至少一种。
  9. 根据权利要求3所述的数据处理系统,其特征在于,所述第一存储节点与所述第二存储节点部署于不同物理区域;
    所述第二存储节点,还用于在将所述目标重做日志发送至所述第一存储节点之前,将基线数据发送至所述第一存储节点;
    所述第一存储节点,还用于在回放所述目标重做日志之前,存储所述基线数据。
  10. 根据权利要求2至9任一项所述的数据处理系统,其特征在于,所述数据处理装置,还用于控制所述第二存储节点将所述目标数据对应的二进制日志发送至所述第一存储节点,所述二进制日志用于记录数据库语句;
    所述第一存储节点,具体用于采用所述二进制日志对所述目标重做日志进行验证,并在验证通过的情况下,回放所述目标重做日志,以更新所述第一存储节点中持久化存储的数据。
  11. 根据权利要求1所述的数据处理系统,其特征在于,所述主计算节点与所述从计算节点均能够访问所述第一存储节点,所述第一存储节点包括读缓存区域;
    所述第一存储节点,还用于在回放所述目标重做日志之前,将所述目标数据缓存至所述读缓存区域;
    所述主计算节点,还用于从所述读缓存区域中读取所述目标数据。
  12. 根据权利要求11所述的数据处理系统,其特征在于,所述第一存储节点还用于在回放所述目标重做日志之后,从所述读缓存区域淘汰所述目标数据。
  13. 根据权利要求1所述的数据处理系统,其特征在于,所述至少一个存储节点包括存储阵列,所述存储阵列用于持久化存储数据。
  14. 一种数据处理方法,其特征在于,所述方法应用于数据处理系统,所述数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算节点以及从计算节点,所述存储集群包括至少一个存储节点以及数据处理装置,所述从计算节点作为所述主计算节点的灾备,所述方法包括:
    所述数据处理装置监控所述主计算节点向所述存储集群写入的数据;
    所述数据处理装置在识别到所述写入的数据包括所述目标重做日志时,控制所述至少一个存储节点中的第一存储节点回放所述目标重做日志,以将所述目标重做日志中记录的目标数据更新至所述至少一个存储节点所持久化存储的数据中,所述至少一个存储节点中更新后的持久化存储的数据被所述从计算节点用于接管所述主计算节点上的访问请求。
  15. 根据权利要求14所述的方法,其特征在于,所述至少一个存储节点还包括第二存储节点,所述第一存储节点作为所述第二存储节点的灾备,所述方法还包括:
    所述数据处理装置在识别到所述写入的数据包括所述目标重做日志时,控制所述第二存储节点将所述目标重做日志发送至所述第一存储节点,以使得所述第一存储节点回放所述目标重做日志,以更新所述第一存储节点中持久化存储的数据。
  16. 根据权利要求15所述的方法,其特征在于,所述主计算节点上运行有目标应用,所述目标重做日志由所述目标应用在运行过程中产生。
  17. 根据权利要求16所述的方法,其特征在于,所述数据处理装置识别到所述写入的数据包括所述目标重做日志,包括:
    所述数据处理装置根据所述目标应用的配置文件或所述目标应用的重做日志的命名格式,识别到所述写入的数据包括所述目标重做日志。
  18. 根据权利要求16或17所述的方法,其特征在于,所述数据处理装置控制所述至少一个存储节点中的第一存储节点回放所述目标重做日志,包括:
    所述数据处理装置控制所述第一存储节点根据所述目标应用对应的数据页的格式回放所述目标重做日志,以更新所述第一存储节点上的数据页。
  19. 根据权利要求18所述的方法,其特征在于,所述方法还包括:
    所述数据处理装置控制所述第一存储节点在回放所述目标重做日志之前,获取所述目标应用对应的数据页的格式。
  20. 根据权利要求15至19任一项所述的方法,其特征在于,所述方法还包括:
    所述数据处理装置控制所述第二存储节点将所述目标数据对应的二进制日志发送至所述第一存储节点,所述二进制日志用于记录数据库语句;
    所述数据处理装置控制所述至少一个存储节点中的第一存储节点回放所述目标重做日志,包括:
    所述数据处理装置控制所述第一存储节点采用所述二进制日志对所述目标重做日志进行验证,以使得所述第一存储节点在确定所述验证通过的情况下,回放所述目标重做日志。
  21. 根据权利要求14所述的方法,其特征在于,所述主计算节点与所述从计算节点均能够访问所述第一存储节点,所述第一存储节点包括读缓存区域;
    所述方法还包括:
    所述数据处理装置控制所述第一存储节点在回放所述目标重做日志之前,将所述目标数据缓存至所述读缓存区域,所述读缓存区域中的所述目标数据能够被所述主计算节点读取。
  22. 根据权利要求21所述的方法,其特征在于,所述方法还包括:
    所述数据处理装置控制所述第一存储节点在回放所述目标重做日志之后,从所述读缓存区域淘汰所述目标数据。
  23. 一种数据处理装置,其特征在于,所述数据处理装置应用于数据处理系统,所述数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算节点以及从计算节点,所述存储集群包括至少一个存储节点以及所述数据处理装置,所述从计算节点作 为所述主计算节点的灾备,所述数据处理装置包括:
    监控模块,用于监控所述主计算节点向所述存储集群写入的数据;
    控制模块,用于在识别到所述写入的数据包括所述目标重做日志时,控制所述至少一个存储节点中的第一存储节点回放所述目标重做日志,以将所述目标重做日志中记录的目标数据更新至所述至少一个存储节点所持久化存储的数据中,所述至少一个存储节点中更新后的持久化存储的数据被所述从计算节点用于接管所述主计算节点上的访问请求。
  24. 根据权利要求23所述的数据处理装置,其特征在于,所述至少一个存储节点还包括第二存储节点,所述第一存储节点作为所述第二存储节点的灾备;
    所述控制模块,还用于在识别到所述写入的数据包括所述目标重做日志时,控制所述第二存储节点将所述目标重做日志发送至所述第一存储节点,以使得所述第一存储节点回放所述目标重做日志,以更新所述第一存储节点中持久化存储的数据。
  25. 根据权利要求24所述的数据处理装置,其特征在于,所述主计算节点上运行有目标应用,所述目标重做日志由所述目标应用在运行过程中产生。
  26. 根据权利要求25所述的数据处理装置,其特征在于,所述控制模块,用于根据所述目标应用的配置文件或所述目标应用的重做日志的命名格式,识别到所述写入的数据包括所述目标重做日志。
  27. 根据权利要求25或26所述的数据处理装置,其特征在于,所述控制模块,具体用于控制所述第一存储节点根据所述目标应用对应的数据页的格式回放所述目标重做日志,以更新所述第一存储节点上的数据页。
  28. 根据权利要求27所述的数据处理装置,其特征在于,所述控制模块,还用于控制所述第一存储节点在回放所述目标重做日志之前,获取所述目标应用对应的数据页的格式。
  29. 根据权利要求23至28任一项所述的数据处理装置,其特征在于,所述控制模块,还用于控制所述第二存储节点将所述目标数据对应的二进制日志发送至所述第一存储节点,所述二进制日志用于记录数据库语句;
    所述控制模块,具体用于控制所述第一存储节点采用所述二进制日志对所述目标重做日志进行验证,以使得所述第一存储节点在确定所述验证通过的情况下,回放所述目标重做日志。
  30. 一种数据处理设备,其特征在于,所述计算设备包括处理器和存储器;
    所述处理器用于执行所述存储器中存储的指令,以使得所述数据处理设备执行如权利要求14至22中任一项所述的方法。
  31. 一种芯片,其特征在于,包括供电电路以及处理电路,所述供电电路用于对所述处理电路进行供电,所述处理电路执行如权利要求14至22中任一项所述的方法。
  32. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当其在计算设备上运行时,使得所述计算设备执行如权利要求14至22任一项所述的方法。
  33. 一种包含指令的计算机程序产品,其特征在于,当其在至少一个计算设备上运行时,使得所述至少一个计算设备执行如权利要求14至22中任一项所述的方法。
PCT/CN2023/101476 2022-10-13 2023-06-20 数据处理系统、数据处理方法、装置及相关设备 WO2024078001A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211253915.4A CN117931831A (zh) 2022-10-13 2022-10-13 数据处理系统、数据处理方法、装置及相关设备
CN202211253915.4 2022-10-13

Publications (1)

Publication Number Publication Date
WO2024078001A1 true WO2024078001A1 (zh) 2024-04-18

Family

ID=90668640

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101476 WO2024078001A1 (zh) 2022-10-13 2023-06-20 数据处理系统、数据处理方法、装置及相关设备

Country Status (2)

Country Link
CN (1) CN117931831A (zh)
WO (1) WO2024078001A1 (zh)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073656A (zh) * 2016-11-17 2018-05-25 杭州华为数字技术有限公司 一种数据同步方法及相关设备
CN114610532A (zh) * 2022-01-26 2022-06-10 阿里云计算有限公司 数据库处理方法以及装置
CN114610533A (zh) * 2022-01-26 2022-06-10 阿里云计算有限公司 数据库处理方法以及装置

Also Published As

Publication number Publication date
CN117931831A (zh) 2024-04-26

Similar Documents

Publication Publication Date Title
US11734306B2 (en) Data replication method and storage system
JP4477950B2 (ja) リモートコピーシステム及び記憶装置システム
US9235481B1 (en) Continuous data replication
US9563517B1 (en) Cloud snapshots
US8521694B1 (en) Leveraging array snapshots for immediate continuous data protection
US8745004B1 (en) Reverting an old snapshot on a production volume without a full sweep
US7882286B1 (en) Synchronizing volumes for replication
US7308545B1 (en) Method and system of providing replication
JP4419884B2 (ja) データ複製装置、方法及びプログラム並びに記憶システム
US8732128B2 (en) Shadow copy bookmark generation
CN111078667B (zh) 一种数据迁移的方法以及相关装置
JPH07239799A (ja) 遠隔データ・シャドーイングを提供する方法および遠隔データ二重化システム
US11748215B2 (en) Log management method, server, and database system
CN115858236A (zh) 一种数据备份方法和数据库集群
WO2024103594A1 (zh) 容器容灾方法、系统、装置、设备及计算机可读存储介质
CN113535665A (zh) 一种主数据库与备数据库之间同步日志文件的方法及装置
WO2024078001A1 (zh) 数据处理系统、数据处理方法、装置及相关设备
WO2022033269A1 (zh) 数据处理的方法、设备及系统
CN115955488A (zh) 基于副本冗余的分布式存储副本跨机房放置方法与装置
CN115658245A (zh) 一种基于分布式数据库系统的事务提交系统、方法及装置
CN111813334B (zh) 一种Ceph的写性能优化和双控节点组合方法
WO2024093263A1 (zh) 数据处理系统、方法、装置及相关设备
CN111400098B (zh) 一种副本管理方法、装置、电子设备及存储介质
CN114518973A (zh) 分布式集群节点宕机重启恢复方法
KR100503899B1 (ko) 데이터베이스 복제시스템 및 그 복제방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876211

Country of ref document: EP

Kind code of ref document: A1