WO2024093263A1 - Data processing system, method and apparatus, and related device - Google Patents

Data processing system, method and apparatus, and related device

Info

Publication number
WO2024093263A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
node
computing node
data
binlog
Application number
PCT/CN2023/101428
Other languages
French (fr)
Chinese (zh)
Inventor
王伟 (WANG Wei)
任仁 (REN Ren)
曹宇 (CAO Yu)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2024093263A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14: Error detection or correction of the data by redundancy in operation
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23: Updating
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present application relates to the field of database technology, and in particular to a data processing system, method, apparatus and related equipment.
  • data processing systems are usually deployed with a main center (or production center) and at least one disaster recovery center.
  • the main center includes a main computing node and a main storage node, and the disaster recovery center includes a slave computing node and a slave storage node.
  • the main center uses the main computing node and the main storage node to provide data reading and writing services to the outside world;
  • the disaster recovery center is responsible for backing up the data stored in the main center; when the main center fails, the disaster recovery center can use the backup data to continue providing data read and write services to the outside world, avoiding data loss and thereby ensuring the reliability of data storage.
  • when the master computing node updates the persistently stored data in the master storage node, it sends a binary log (binlog) file to the slave computing node, so that the slave computing node can complete the corresponding data update in the slave storage node by replaying the binlog file, thereby achieving data synchronization between the main center and the disaster recovery center.
  • however, when the master center fails, the data stored in the disaster recovery center is often inconsistent with the data in the master center before the failure, which prevents the recovery point objective (RPO) of the data processing system from reaching 0 and affects the reliability of the data processing system.
  • RPO can be used to measure the maximum amount of data loss that occurs during disaster recovery of the data processing system.
  • a data processing system is provided to achieve consistency between the data stored in the disaster recovery center and the data in the main center before the failure, so as to improve the reliability of the data processing system and achieve an RPO of 0 for the data processing system.
  • corresponding data processing methods, devices, computing device clusters, chips, computer-readable storage media, and computer program products are also provided.
  • an embodiment of the present application provides a data processing system, which includes a computing cluster and a storage cluster, wherein the computing cluster and the storage cluster are connected through a network, such as through a wired network or a wireless network, and the computing cluster includes a master computing node and a slave computing node.
  • the slave computing node serves as a disaster recovery node for the master computing node
  • the storage cluster includes at least one storage node
  • the master computing node is used to generate a binlog (binary log) in response to a data update request, and the data update request can be sent to the master computing node by a user through a client, etc.
  • the master computing node is also used to send the binlog to the storage cluster for storage
  • the slave computing node is used to read the binlog stored in the storage cluster, and update the data persistently stored in the storage cluster by replaying the binlog (specifically replaying the database statements recorded in the binlog).
  • the slave computing node can synchronize data with the master computing node by timely replaying the binlog generated by the master computing node, or the slave computing node can replay the binlog when the master computing node fails to achieve data recovery, etc.
  • because the master computing node and the slave computing node transmit binlogs through the storage cluster, the master computing node does not need to send binlogs directly to the slave computing node; this avoids the situation where excessive load on the master computing node, or an unstable data transmission link between the master computing node and the slave computing node, causes the slave computing node to fail to obtain the binlogs generated by the master computing node.
  • the slave computing node can achieve data synchronization between the master computing node and the slave computing node by replaying the binlog that can be obtained, or achieve data recovery when the master computing node fails.
  • the slave computing node can take over the business on the master computing node based on the data of the master computing node at the time of failure, thereby achieving an RPO of 0 for the data processing system and improving the reliability of the data processing system.
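  • to make the flow above concrete, the following minimal sketch shows a master writing binlogs to a shared log storage area and a slave replaying them in LSN order; the file-based layout, path, and function names are illustrative assumptions, not part of the embodiment:

```python
import json
import os

SHARED_LOG_DIR = "/shared/binlog"  # hypothetical path into the storage cluster's log storage area

def master_write_binlog(lsn: int, sql: str) -> None:
    """Master computing node: persist a binlog record to the shared log
    storage area instead of sending it to the slave directly."""
    record = {"lsn": lsn, "sql": sql}
    with open(os.path.join(SHARED_LOG_DIR, f"{lsn:016d}.binlog"), "w") as f:
        json.dump(record, f)

def slave_replay_binlogs(applied_lsn: int, execute_sql) -> int:
    """Slave computing node: read binlogs from the shared log storage area
    and replay every record newer than the last applied LSN, in order."""
    for name in sorted(os.listdir(SHARED_LOG_DIR)):  # zero-padded names sort by LSN
        with open(os.path.join(SHARED_LOG_DIR, name)) as f:
            record = json.load(f)
        if record["lsn"] > applied_lsn:
            execute_sql(record["sql"])  # replay the recorded database statement
            applied_lsn = record["lsn"]
    return applied_lsn
```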
  • the storage cluster includes a log storage area that is accessible to both the master computing node and the slave computing node, that is, the two nodes share the log storage area; when the master computing node sends the binlog to the storage cluster, it specifically sends the binlog to the log storage area for storage, and correspondingly, the slave computing node specifically reads the binlog from the log storage area. The storage cluster also includes a data storage area, which is used to store business data, such as the business data processed during the normal operation of the master computing node; the data storage area can be accessed by both the master computing node and the slave computing node, or only by the master computing node (in which case the slave computing node can access the data storage area only after the master computing node fails).
  • binlog transmission can be achieved through a shared storage area in the storage cluster, so as to ensure that the slave computing node can obtain the binlog generated by the master computing node as much as possible, thereby improving the reliability of the data processing system.
  • the slave computing node may be specifically used to synchronize data with the master computing node by replaying the binlog, that is, during the normal operation of the master computing node, the slave computing node continuously replays the binlogs generated by the master computing node to stay synchronized; alternatively, the slave computing node may be used to recover, by replaying the binlog, the data lost when the master computing node fails, that is, the slave computing node performs no replay before the failure, and after the failure it replays the accumulated binlogs so that the RPO of the data processing system is 0 (see the sketch below). In either way, the reliability of the data processing system can be effectively improved.
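  • the two modes above differ only in when replay runs; a schematic sketch (class and function names are illustrative):

```python
from typing import Callable, List

class SlaveReplayer:
    """Replays binlog statements fetched from the shared log storage area."""

    def __init__(self, execute_sql: Callable[[str], None]):
        self.pending: List[str] = []  # binlog statements not yet replayed
        self.execute_sql = execute_sql

    def replay_all(self) -> None:
        while self.pending:
            self.execute_sql(self.pending.pop(0))

def hot_standby_tick(replayer: SlaveReplayer) -> None:
    # Continuous synchronization: called repeatedly while the master runs,
    # so the slave's data always tracks the master's.
    replayer.replay_all()

def cold_recovery(replayer: SlaveReplayer) -> None:
    # Recovery after failure: nothing is replayed beforehand; the accumulated
    # backlog is replayed once so that no logged update is lost (RPO = 0).
    replayer.replay_all()
```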
  • the storage nodes included in the storage cluster include a master storage node and a slave storage node, wherein the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in the same data center or the same availability zone.
  • the master storage node is used to provide read and write data services for the master computing node
  • the slave storage node is used to provide read and write data services for the slave computing node; thus, the slave computing node, during the normal operation of the master computing node, will read the binlog generated and sent by the master computing node from the log storage area of the storage cluster, and update the data persistently stored in the slave storage node by replaying the binlog.
  • the slave computing node can synchronize data with the master computing node by continuously replaying the binlog generated by the master computing node, so that when the master computing node fails, the slave computing node can take over the business on the master computing node based on the synchronized data, thereby achieving an RPO of 0 for the data processing system, thereby improving the reliability of the data processing system.
  • the storage nodes in the storage cluster include a target storage node, which is used to persistently store the data written by the master computing node; the slave computing node may perform no binlog replay during the normal operation of the master computing node, but when the master computing node fails, it reads the binlog from the log storage area and updates the data persistently stored in the target storage node by replaying the binlog, so as to recover the data the master computing node lost at the time of failure, thereby improving the reliability of the data processing system.
  • in other cases, the storage nodes in the storage cluster include a main storage node and a slave storage node, the slave storage node serves as a disaster recovery for the main storage node, and the main storage node and the slave storage node are deployed in different data centers or in different availability zones; usually, the main storage node is used to provide read and write data services to the main computing node, and the slave storage node is used to provide read and write data services to the slave computing node. In the process of transmitting the binlog, the main computing node specifically sends the binlog to the main storage node for storage, and the main storage node then sends the binlog to the slave storage node, so that the slave computing node can read the binlog stored in the slave storage node.
  • the master storage node is also used to send baseline data to the slave storage node before sending the binlog to the slave storage node.
  • the baseline data generally includes the data persistently stored by the master storage node at a certain moment (that is, the data stored on the data page) and the binlog generated by the master computing node before that moment.
  • the slave storage node stores the baseline data after receiving it. In this way, the slave computing node can update the baseline data stored in the slave storage node by replaying the binlog to achieve data synchronization with the master center.
  • a target application is running on the main computing node, and the binlog transmitted through the storage cluster is generated during the operation of the target application.
  • the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
  • the storage nodes in the storage cluster may specifically be storage arrays, which are used to persistently store data. Since storage arrays are usually configured with technologies such as redundant array of independent disks (RAID), erasure coding (EC), deduplication and compression, and data backup, the reliability of persistent data storage in the storage cluster can be further improved.
  • an embodiment of the present application provides a data processing method, the method is applied to a data processing system, the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected through a network, the computing cluster includes a master computing node and a slave computing node, and the storage cluster includes at least one storage node. The method comprises: the master computing node generates a binary log binlog in response to a data update request; the master computing node sends the binlog to the storage cluster for storage; the slave computing node reads the binlog stored in the storage cluster; and the slave computing node updates the data persistently stored in the storage cluster by replaying the binlog.
  • the storage cluster includes a log storage area, which is accessed by a master computing node and a slave computing node; the master computing node sends binlog to the storage cluster for storage, including: the master computing node sends binlog to the log storage area for storage; the slave computing node reads the binlog stored in the storage cluster, including: the slave computing node reads the binlog from the log storage area; wherein the storage cluster also includes a data storage area, which is used to store business data, and the data storage area is accessed by the master computing node and the slave computing node, or the data storage area is only accessed by the master computing node.
  • the slave computing node replaying the binlog includes: the slave computing node replaying the binlog to synchronize data with the master computing node, or replaying the binlog to recover the data lost when the master computing node fails.
  • At least one storage node includes a master storage node and a slave storage node, the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in the same data center or the same availability zone; the slave computing node reading the binlog stored in the storage cluster includes: the slave computing node reading the binlog from the log storage area during the normal operation of the master computing node; the slave computing node updating the data persistently stored in the storage cluster by replaying the binlog includes: the slave computing node updating the data persistently stored in the slave storage node by replaying the binlog.
  • At least one storage node includes a target storage node, and the target storage node is used to persistently store data written by the main computing node; the slave computing node reads the binlog stored in the storage cluster, including: when the main computing node fails, the slave computing node reads the binlog from the log storage area; the slave computing node updates the data persistently stored in the storage cluster by replaying the binlog, including: the slave computing node updates the data persistently stored in the target storage node by replaying the binlog.
  • At least one storage node includes a master storage node and a slave storage node
  • the slave storage node serves as a disaster recovery for the master storage node
  • the master storage node and the slave storage node are deployed in different data centers, or the master storage node and the slave storage node are deployed in different availability zones
  • the master computing node sends the binlog to the storage cluster for storage, including: the master computing node sends the binlog to the master storage node for storage
  • the slave computing node reads the binlog stored in the storage cluster, including: the slave computing node reads the binlog stored in the slave storage node, where the binlog in the slave storage node is sent by the master storage node.
  • the method further includes: before the master storage node sends the binlog to the slave storage node, the master storage node sends the baseline data to the slave storage node for storage; the slave computing node updates the data persistently stored in the storage cluster by replaying the binlog, including: the slave computing node updates the baseline data stored in the slave storage node by replaying the binlog.
  • a target application is running on the main computing node, binlog is generated during the running of the target application, the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
  • the storage node is a storage array, and the storage array is used to persistently store data.
  • an embodiment of the present application provides a data processing device, which is applied to a data processing system, wherein the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected through a network, the computing cluster includes a master computing node and a slave computing node, and the storage cluster includes at least one storage node; the data processing device includes: a storage module, used to instruct the master computing node to send the binary log binlog generated in response to the data update request to the storage cluster for storage; a reading module, used to instruct the slave computing node to read the binlog stored in the storage cluster; and a playback module, used to instruct the slave computing node to update the data persistently stored in the storage cluster by replaying the binlog.
  • the storage cluster includes a log storage area, which is accessed by the master computing node and the slave computing node; the storage module is specifically used to instruct the master computing node to send the binlog to the log storage area for storage; the reading module is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave computing node; wherein the storage cluster also includes a data storage area, which is used to store business data, and the data storage area is accessed by the master computing node and the slave computing node, or the data storage area is only accessed by the master computing node.
  • the playback module is specifically used to instruct the slave computing node to synchronize data with the master computing node by replaying the binlog, or is specifically used to instruct the slave computing node to recover data lost when the master computing node fails by replaying the binlog.
  • At least one storage node includes a master storage node and a slave storage node, wherein the slave storage node serves as a disaster recovery for the master storage node.
  • the master storage node and the slave storage node are deployed in the same data center or the same availability zone; the reading module is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave computing node during the normal operation of the master computing node; the playback module is specifically used to instruct the slave computing node to update the data persistently stored in the slave storage node by replaying the binlog.
  • At least one storage node includes a target storage node, which is used to persistently store data written by the main computing node; a reading module, which is specifically used to instruct the slave computing node to read binlog from the log storage area when the main computing node fails; and a playback module, which is specifically used to instruct the slave computing node to update the data persistently stored in the target storage node by replaying the binlog.
  • At least one storage node includes a master storage node and a slave storage node
  • the slave storage node serves as a disaster recovery for the master storage node
  • the master storage node and the slave storage node are deployed in different data centers, or the master storage node and the slave storage node are deployed in different availability zones
  • a storage module is specifically used to instruct the master computing node to send the binlog to the master storage node for storage
  • a reading module is specifically used to instruct the slave computing node to read the binlog stored in the slave storage node, and the binlog in the slave storage node is sent by the master storage node.
  • the storage module is further used to: instruct the master storage node to send the baseline data to the slave storage node for storage before sending the binlog to the slave storage node; the playback module is specifically used to instruct the slave computing node to update the baseline data stored in the slave storage node by replaying the binlog.
  • a target application is running on the main computing node, binlog is generated during the running of the target application, the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
  • the storage node is a storage array, and the storage array is used to persistently store data.
  • an embodiment of the present application provides a computing device cluster, the computing device cluster includes at least one computing device, each computing device in the at least one computing device includes: a processor and a memory; the memory is used to store instructions, and the processor executes the instructions stored in the memory so that the computing device cluster executes the data processing method described in the above second aspect or any implementation of the second aspect, or implements the data processing device described in the above third aspect or any implementation of the third aspect.
  • the memory can be integrated into the processor or can be independent of the processor.
  • the computing device may also include a bus.
  • the processor is connected to the memory via a bus.
  • the memory may include a read-only memory and a random access memory.
  • an embodiment of the present application provides a chip, comprising a power supply circuit and a processing circuit, wherein the power supply circuit is used to power the processing circuit, and the processing circuit implements the data processing device described in the above third aspect or any implementation method of the third aspect.
  • an embodiment of the present application further provides a computer-readable storage medium, in which a program or instruction is stored.
  • when the program or instructions are run on a computer, the data processing method described in the above second aspect or any implementation of the second aspect is executed.
  • an embodiment of the present application further provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to execute the data processing method described in the above second aspect or any implementation of the second aspect.
  • FIG. 1 is a schematic diagram of the structure of an exemplary data processing system provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of another exemplary data processing system provided in an embodiment of the present application.
  • FIG. 7 is a flow chart of an exemplary data processing method provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the structure of an exemplary data processing device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the structure of an exemplary computing device provided in an embodiment of the present application.
  • referring to FIG. 1, it is a schematic diagram of the structure of an exemplary data processing system 100, and the data processing system 100 may adopt a storage-computing separation architecture.
  • the data processing system 100 includes a computing cluster 101 and a storage cluster 102, and the computing cluster 101 and the storage cluster 102 may communicate with each other through a network, such as a wired network or a wireless network.
  • the computing cluster 101 includes multiple computing nodes, different computing nodes can communicate with each other, and each computing node can be a computing device including a processor, such as a server or a desktop computer.
  • some computing nodes can be used as disaster recovery for another part of the computing nodes.
  • FIG. 1 takes the computing cluster 101 including the master computing node 1011 and the slave computing node 1012 as an example, where the slave computing node 1012 serves as the disaster recovery for the master computing node 1011.
  • the slave computing node 1012 can be used as a hot standby or a cold standby for the main computing node 1011.
  • when the slave computing node 1012 is used as a hot standby, both the slave computing node 1012 and the master computing node 1011 are continuously running; in this way, when the master computing node 1011 fails, the slave computing node 1012 can use the backup data to immediately take over the business on the master computing node 1011, specifically, to process the requests that the master computing node 1011 had not completed when it failed.
  • when the slave computing node 1012 is used as a cold standby, the slave computing node 1012 may not be running during the normal operation of the master computing node 1011 (for example, it may be in a dormant state), or it may release its computing resources and use the released computing resources to process other services, such as offline computing services.
  • when the master computing node 1011 fails, the slave computing node 1012 starts running/reclaims computing resources, and uses the backed-up data to take over the services on the master computing node 1011.
  • the master computing node 1011 may have multiple slave computing nodes as disaster recovery nodes, so that some slave computing nodes can be used as cold standby for the master computing node 1011, and another part of the slave computing nodes can be used as hot standby for the master computing node 1011, etc.
  • the storage cluster 102 may include one or more storage nodes, each of which may be a device including a persistent storage medium, such as a network attached storage (NAS) device or a storage server, and can be used to persistently store data.
  • the persistent storage medium in the storage node may be, for example, a hard disk, such as a solid state disk or a shingled magnetic recording hard disk, etc.
  • each storage node may be constructed by one or more devices for persistently storing data.
  • some storage nodes may be used as disaster recovery for another part of the storage nodes.
  • FIG. 1 takes the storage cluster 102 including a master storage node 1021 and a slave storage node 1022 as an example, where the slave storage node 1022 serves as the disaster recovery for the master storage node 1021; the data storage areas on the master storage node 1021 and the slave storage node 1022 are respectively used to store business data, the data storage area on the master storage node 1021 is accessed by the master computing node 1011, and the data storage area on the slave storage node 1022 is accessed by the slave computing node 1012.
  • the master storage node 1021 and the slave storage node 1022 can be deployed in the same data center, or in the same availability zone (AZ), etc.
  • the reliability of data storage in the local area can be improved.
  • the master storage node 1021 and the slave storage node 1022 can be deployed in different data centers, for example, the master storage node 1021 is deployed in data center A and the slave storage node 1022 is deployed in data center B; or, the master storage node 1021 and the slave storage node 1022 can be deployed in different AZs, for example, the master storage node 1021 is deployed in AZ1 and the slave storage node 1022 is deployed in AZ2.
  • data disaster recovery can be achieved across data centers or across AZs, thereby improving the reliability of data storage in a different location.
  • during normal operation, the master computing node 1011 uses the master storage node 1021 to provide data read and write services; after the master computing node 1011 fails, the slave computing node 1012 takes over the business on the master computing node 1011 using the data backed up in the slave storage node 1022.
  • the master computing node 1011 and the master storage node 1021 can constitute a master center (usually belonging to a production site), and the slave computing node 1012 and the slave storage node 1022 can constitute a disaster recovery center (usually belonging to a disaster recovery site).
  • alternatively, the storage cluster 102 can also include a single storage node shared by the computing nodes.
  • the master computing node 1011 and the slave computing node 1012 can share the storage node, that is, the business data stored in the data storage area on the storage node can be accessed by the master computing node 1011 and the slave computing node 1012, so that after the master computing node 1011 fails, the slave computing node 1012 can continue to provide data read and write services using the data stored in the storage node.
  • One or more applications may be deployed on the main computing node 1011.
  • the deployed applications may be, for example, database applications or other applications.
  • the database application may be, for example, a relational database management system (RDBMS).
  • the main computing node 1011 usually receives a data update request sent by a client or other device on the user side, such as receiving a data update request sent by a client on the user side for reading or modifying data in the main storage node 1021.
  • the application on the main computing node 1011 can respond to the data update request and provide corresponding data read and write services for the client or other devices.
  • the application on the main computing node 1011 will generate a binary log (binlog) and save the binlog in a local storage area.
  • binlog is a logical log, which is used to record database statements, such as SQL statements, for updating the data persistently stored in the main storage node 1021.
  • the application may include a service layer and a storage engine layer, and the service layer may generate and save binlog. Then, the master computing node 1011 will send the generated binlog to the slave computing node 1012, and the slave computing node 1012 will update the data in the slave storage node 1022 by executing the database statements in the binlog, so that the data in the slave storage node 1022 is consistent with the data in the master storage node 1021, that is, the data in the master storage node 1021 is copied to the slave storage node 1022.
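  • since the binlog is a logical log of database statements, replay amounts to re-executing those statements against the slave's copy of the data; a self-contained sketch, using sqlite3 merely as a stand-in for the RDBMS:

```python
import sqlite3

master = sqlite3.connect(":memory:")  # stands in for the master storage node's data
slave = sqlite3.connect(":memory:")   # stands in for the slave storage node's data
for db in (master, slave):
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

binlog = []  # the service layer's logical log of executed statements

def master_update(sql: str) -> None:
    master.execute(sql)  # apply the statement on the master
    binlog.append(sql)   # record the statement in the binlog

def slave_replay() -> None:
    for sql in binlog:   # replaying = re-executing the recorded statements
        slave.execute(sql)

master_update("INSERT INTO t VALUES (1, 'old')")
master_update("UPDATE t SET v = 'new' WHERE id = 1")
slave_replay()
assert slave.execute("SELECT v FROM t WHERE id = 1").fetchone() == ("new",)
```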
  • after the failure of the master computing node 1011, the slave computing node 1012 needs to start running/reclaim computing resources and use them to start the application on the slave computing node 1012. Then, the application on the slave computing node 1012 executes the database statements recorded in the binlogs sent by the master computing node 1011 before the failure, so that the data in the slave storage node 1022 is consistent with the data in the master storage node 1021 before the failure. In this way, the slave computing node 1012 can take over the unfinished requests of the master computing node 1011 based on the data stored in the slave storage node 1022.
  • in practice, however, some binlogs may exist on the master computing node 1011 that fail to be successfully transmitted to the slave computing node 1012.
  • the binlog sent by the master computing node 1011 may be lost during the transmission process of the communication network, making it difficult for the slave computing node 1012 to receive the binlog.
  • for another example, the master computing node 1011 may have difficulty sending the multiple binlogs stored locally to the slave computing node 1012 in a timely manner while continuously generating new binlogs, and the storage space of the local storage area for storing binlogs in the master computing node 1011 is limited, which forces the master computing node 1011 to eliminate the earliest binlogs stored in the local storage area in order to store new ones.
  • since part of the binlogs on the master computing node 1011 were never sent to the slave computing node 1012, the slave computing node 1012 cannot replay them, and the data in the slave storage node 1022 cannot be synchronized with the data in the master storage node 1021; that is, the data in the master center and the disaster recovery center are inconsistent. In this way, when the master computing node 1011 of the master center fails, the disaster recovery center cannot restore the data to the state at the time of the failure, the RPO of the data processing system 100 cannot reach 0, some data is lost, and the reliability of the data processing system 100 is affected.
  • for this reason, in the data processing system 100 provided in this embodiment, after generating the binlog, the master computing node 1011 sends the binlog to the storage cluster 102, and the storage cluster 102 stores the binlog. Then, the slave computing node 1012 reads the binlog from the storage cluster 102 and updates the data persistently stored in the storage cluster 102 by replaying the binlog (i.e., replaying the database statements recorded in the binlog), specifically updating the data in the slave storage node 1022, so that the data in the slave storage node 1022 is consistent with the data in the master storage node 1021, or with the combination of the data in the master storage node 1021 and the data cached by the master computing node 1011.
  • since the binlog generated by the master computing node 1011 is transmitted to the slave computing node 1012 through the storage side, data synchronization problems caused by the slave computing node 1012 failing to obtain the binlog, whether due to overload of the master computing node 1011 or instability of the data transmission link between the master computing node 1011 and the slave computing node 1012, can be avoided.
  • the slave computing node 1012 can take over the business on the main computing node 1011 based on the data in the slave storage node 1022, thereby achieving an RPO of 0 for the data processing system 100 and improving the reliability of the data processing system 100.
  • a unified processing logic can be used between the master computing node 1011 and the slave computing node 1012 to copy the binlog generated by the master computing node 1011 to the slave computing node 1012, thereby improving the compatibility of the data processing system 100 with database applications and reducing the difficulty of deploying database applications on the computing cluster 101.
  • Fig. 1 is only an exemplary description, and in actual application, the data processing system 100 may also be implemented in other ways. For ease of understanding, this embodiment provides the following implementation examples.
  • the storage cluster 102 may include only one storage node for the master computing node 1011 and the slave computing node 1012 , so that the master computing node 1011 and the slave computing node 1012 can share access to data pages in the storage node.
  • a metadata management cluster may be included between the computing cluster 101 and the storage cluster 102, and the metadata management cluster is responsible for managing the metadata stored in the storage cluster 102; accordingly, the computing nodes in the computing cluster 101 may first access the metadata from the metadata management cluster, and then access the data stored in the storage cluster 102 based on the metadata.
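  • schematically, a read then becomes a two-step lookup; the class and parameter names below are illustrative only:

```python
class MetadataCluster:
    """Maps a data item to the storage node that holds it."""

    def __init__(self, locations: dict):
        self.locations = locations  # item name -> storage node id

    def lookup(self, name: str) -> str:
        return self.locations[name]

def read_item(meta: MetadataCluster, storage_cluster: dict, name: str):
    node_id = meta.lookup(name)            # step 1: resolve metadata first
    return storage_cluster[node_id][name]  # step 2: access the storage node

# usage: read_item(MetadataCluster({"x": "node-1"}), {"node-1": {"x": 42}}, "x")
```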
  • the computing cluster and the storage cluster in the data processing system 200 may include three or more nodes, as shown in FIG2 .
  • the computing cluster includes a plurality of computing nodes 410 that can communicate with each other, and some computing nodes 410 can serve as disaster recovery for other computing nodes 410.
  • Each computing node 410 is a computing device including a processor, such as a server, a desktop computer, etc.
  • the computing node 410 includes at least a processor 412, a memory 413, a network card 414, and a storage medium 415.
  • the processor 412 is a central processing unit (CPU) for processing data access requests from outside the computing node 410, or requests generated inside the computing node 410.
  • the processor 412 reads data from the memory 413, or, when the total amount of data in the memory 413 reaches a certain threshold, the processor 412 sends the data stored in the memory 413 to the storage node 400 for persistent storage.
  • FIG2 shows only one CPU 412. In actual applications, there are often multiple CPUs 412, wherein one CPU 412 has one or more CPU cores. This embodiment does not limit the number of CPUs or CPU cores.
  • processor 412 in the computing node 410 can also be used to implement the above-mentioned functions of writing binlog to the storage cluster and/or reading and replaying binlog from the storage cluster, so as to achieve data synchronization between different storage nodes 400 in the storage cluster.
  • Memory 413 refers to an internal memory that directly exchanges data with the processor. It can read and write data at any time and at a high speed, and serves as a temporary data storage for the operating system or other running programs. Memory includes at least two types of memory. For example, memory can be either a random access memory or a read-only memory (ROM). In actual applications, multiple memories 413 and different types of memories 413 can be configured in the computing node 410. This embodiment does not limit the number and type of memory 413.
  • the network card 414 is used to communicate with the storage node 400. For example, when the total amount of data in the memory 413 reaches a certain threshold, the computing node 410 can send a request to the storage node 400 through the network card 414 to store the data persistently.
  • the computing node 410 can also include a bus for communication between components inside the computing node 410.
  • the computing node 410 can also have a small number of hard disks built in, or a small number of hard disks connected externally.
  • Each computing node 410 can access the storage node 400 in the storage cluster through the network.
  • the storage cluster includes multiple storage nodes 400, and some storage nodes 400 can be used as disaster recovery for another part of the storage nodes 400.
  • a storage node 400 includes one or more controllers 401, a network card 404 and multiple hard disks 405.
  • the network card 404 is used to communicate with the computing node 410.
  • the hard disk 405 is used for persistent storage of data, and can be a disk or other types of storage media, such as a solid-state hard disk or a shingled magnetic recording hard disk.
  • the controller 401 is used to write data to the hard disk 405 or read data from the hard disk 405 according to the read/write data request sent by the computing node 410. In the process of reading and writing data, the controller 401 needs to convert the address carried in the read/write data request into an address that the hard disk can recognize.
  • one or more applications, such as MySQL, may be running on the master computing node 1011; the target application supports the master computing node 1011 in providing data read and write services for users.
  • the target application can first read the data page where the data requested to be modified by the data update request is located from the main storage node 1021 to the buffer pool in the main computing node 1011, and complete the modification of the data page in the buffer pool according to the data update request, specifically modifying the data on the data page to new data (the new data can be empty, in which case the data on the data page is deleted).
  • the target application will generate a binlog for the data modification content, which is used to record the database statement indicating the modification of the data.
  • the new data is referred to as target data below.
  • after the modification in the buffer pool is completed, the master computing node 1011 can feed back to the client that the data has been written/modified successfully. Since writing data to the buffer pool is usually faster than persistently storing it, this speeds up the master computing node 1011's response to data update requests. In actual application, when the amount of data accumulated in the buffer pool reaches a threshold, the master computing node 1011 sends the data in the buffer pool to the master storage node 1021 for persistent storage and can delete the binlogs corresponding to that data. Alternatively, the master computing node 1011 may feed back to the client that the data has been written/modified successfully only after successfully sending the binlog to the storage cluster 102 for storage.
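  • the write path described above can be sketched as follows; the threshold value and all names are assumptions for illustration:

```python
from typing import Dict, List

FLUSH_THRESHOLD = 4  # hypothetical: flush once this many pages are dirty

class BufferPoolWriter:
    """Sketch of the master's write path: modify the page in the buffer pool,
    persist the binlog, acknowledge the client, and flush dirty pages lazily."""

    def __init__(self, persistent_pages: Dict[int, str], log_area: List[str]):
        self.persistent_pages = persistent_pages  # pages on the master storage node
        self.log_area = log_area                  # log storage area in the storage cluster
        self.buffer_pool: Dict[int, str] = {}

    def handle_update(self, page_id: int, new_data: str, sql: str) -> str:
        self.buffer_pool[page_id] = new_data  # modify only the in-memory page
        self.log_area.append(sql)             # persist the binlog record first
        if len(self.buffer_pool) >= FLUSH_THRESHOLD:
            self.flush()
        return "OK"  # acknowledge without waiting for page persistence

    def flush(self) -> None:
        # Send the accumulated dirty pages to the master storage node.
        self.persistent_pages.update(self.buffer_pool)
        self.buffer_pool.clear()
```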
  • the data in the master storage node 1021 (and the slave storage node 1022) may be persistently stored in the format of a file.
  • a corresponding file system (FS) may be deployed in the master storage node 1021 (and the slave storage node 1022), and the FS is used to manage the persistently stored files.
  • the data in the master storage node 1021 (and the slave storage node 1022) may be persistently stored in the format of a data block. That is, when the master storage node 1021 (and the slave storage node 1022) stores data, the data is divided into blocks according to a fixed size.
  • the data volume of each block may be, for example, 512 bytes or 4 kilobytes (KB).
  • the data in the master storage node 1021 may be stored in the format of an object.
  • an object may be the basic unit for storing data in a storage node.
  • Each object may include a combination of data and the attributes of the data.
  • the attributes of the data may be set according to the requirements of the application in the computing node, including data distribution, quality of service, etc.
  • the storage format of data is not limited.
  • the master computing node 1011 sends the generated binlog to the storage cluster 102 and stores the binlog in the storage cluster 102 .
  • the master computing node 1011 may send the binlog to the master storage node 1021 in the storage cluster 102 so that the master storage node 1021 saves the binlog.
  • the master storage node 1021 may back up the written binlog and send the backed-up binlog to the slave storage node 1022 via wired or wireless means.
  • the master storage node 1021 may feedback to the master computing node 1011 that the binlog has been written successfully.
  • the master storage node 1021 and the slave storage node 1022 can be deployed in the same data center or the same AZ.
  • a wired or wireless connection can be established between the master storage node 1021 and the slave storage node 1022, and the backed-up binlog can be sent to the slave storage node 1022 for storage through the wired or wireless connection.
  • the master storage node 1021 and the slave storage node 1022 may be deployed in different AZs, for example, the master storage node 1021 is deployed in AZ1 and the slave storage node 1022 is deployed in AZ2.
  • the master storage node 1021 may send the backed-up binlog to the slave storage node 1022 for storage via a network card or a network interface.
  • the master storage node 1021 may have multiple slave storage nodes as disaster recovery.
  • some of the multiple slave storage nodes may be deployed in the same physical area as the master storage node 1021, such as in the same data center or the same AZ, and another part of the multiple slave storage nodes may be deployed in a different physical area from the master storage node 1021, such as in a different data center/AZ, etc., so as to improve the reliability of data storage both locally and remotely.
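  • the storage-side replication path can be sketched as follows; this is a simplification under assumed names, since a real storage array replicates at the block or file level:

```python
from typing import List

class MasterStorage:
    """Persists binlogs written by the master computing node and forwards a
    backup copy to every slave storage node, whether local or remote."""

    def __init__(self, replica_logs: List[list]):
        self.log: list = []
        self.replica_logs = replica_logs  # logs of the slave storage nodes

    def write_binlog(self, record: str) -> str:
        self.log.append(record)            # persist locally first
        for replica in self.replica_logs:  # then replicate for disaster recovery
            replica.append(record)
        return "ACK"  # feed back write success to the master computing node
```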
  • the slave computing node 1012 can read the binlog stored in the slave storage node 1022 and update the data stored in the slave storage node 1022 (specifically, the data on the data pages) by replaying the database statements recorded in the binlog, thereby realizing data synchronization between the slave storage node 1022 and the master storage node 1021, which can also be called data synchronization between the disaster recovery center and the master center.
  • in a specific implementation, an input/output (IO) thread and a database thread can be created in the slave computing node 1012. The slave computing node 1012 can use the IO thread to access the slave storage node 1022, read the binlog in the slave storage node 1022, and store the read binlog in a local storage area. Then, the slave computing node 1012 can use the database thread to perform the replay operation on each binlog in the local storage area in sequence. For example, each binlog in the local storage area can have a log sequence number (LSN), so that the database thread can replay the binlogs in ascending LSN order.
  • the database thread can parse the database statement to be executed from the binlog, such as an SQL statement, and perform semantic analysis and grammatical analysis on the database statement to determine the legitimacy of the database statement.
  • grammatical analysis checks the database statement against the grammatical rules of the database language to determine whether the statement has syntax errors, and semantic analysis analyzes whether the semantics of the database statement are legal.
  • the database thread can generate a plan tree for the database statement, which indicates the execution plan for processing the data.
  • the database thread can update the data from the storage node 1022 according to the optimized plan tree after completing the optimization of the plan tree.
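  • the two-thread replay pipeline can be sketched as follows; the queue-based hand-off and the function signatures are assumptions:

```python
import queue
from typing import Callable, Dict, Iterable

binlog_buffer: "queue.Queue[dict]" = queue.Queue()  # stand-in for the local storage area

def io_thread(records_from_slave_storage: Iterable[dict]) -> None:
    """IO thread: fetch binlog records from the slave storage node into the
    local buffer."""
    for record in records_from_slave_storage:
        binlog_buffer.put(record)

def database_thread(execute_sql: Callable[[str], None], start_lsn: int = 1) -> int:
    """Database thread: replay buffered binlogs strictly in ascending LSN order."""
    pending: Dict[int, str] = {}
    while not binlog_buffer.empty():
        record = binlog_buffer.get()
        pending[record["lsn"]] = record["sql"]
    next_lsn = start_lsn
    while next_lsn in pending:  # stop at the first gap in the LSN sequence
        execute_sql(pending.pop(next_lsn))
        next_lsn += 1
    return next_lsn  # the LSN to resume from on the next pass
```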
  • the master computing node 1011 and the slave computing node 1012 may have the same configuration.
  • specifically, the configuration files in the master computing node 1011 and the master storage node 1021 may be backed up, and the backed-up configuration files may be sent to the slave computing node 1012 and the slave storage node 1022 in the disaster recovery center, so that, based on the received configuration files, the slave computing node 1012 has the same configuration as the master computing node 1011 and the slave storage node 1022 has the same configuration as the master storage node 1021.
  • the binlog in the master storage node 1021 can be mounted to the specified directory, so that after receiving the binlog sent by the master storage node 1021, the slave storage node 1022 can store the binlog to the storage location corresponding to the directory.
  • the slave computing node 1012 can read the binlog belonging to the directory stored in the slave storage node 1022 based on the directory uniformly configured with the master computing node 1011.
  • a log storage area 501 may be configured in the storage cluster 102, and the log storage area 501 can be accessed by the master computing node 1011 and the slave computing node 1012.
  • the log storage area 501 may be a partial storage area on the master storage node 1021 or the slave storage node 1022, or may be a storage area on other storage nodes independent of the master storage node 1021 and the slave storage node 1022, etc., which is not limited in this embodiment.
  • the master computing node 1011 may send the binlog to the log storage area 501 in the storage cluster 102 for storage, for example, it may be sent to the log storage area 501 under the specified directory for storage.
  • the slave computing node 1012 may obtain the binlog generated by the master computing node 1011 by accessing the log storage area 501, for example, by accessing the corresponding log storage area 501 according to the specified directory. Furthermore, the slave computing node 1012 can achieve data synchronization between the master storage node 1021 and the slave storage node 1022 by replaying the binlog; for the specific replay process of the slave computing node 1012, reference may be made to the above related description, which is not repeated here.
  • slave computing node 1012 can also obtain the binlog generated by the master computing node 1011 from the storage cluster 102 in other ways.
  • baseline replication refers to sending all the data persistently stored by the master storage node 1021 at a certain point in time (such as the current moment) and the binlog that the master computing node 1011 has generated to the disaster recovery center.
  • the master storage node 1021 and the slave storage node 1022 can use the baseline replication method to achieve data synchronization.
  • the main storage node 1021 can determine the first moment (such as the current moment) as the moment corresponding to the baseline, and determine the baseline data based on the moment.
  • the baseline data includes the data persistently stored by the main storage node 1021 at the first moment (that is, the data stored on the data page), and the binlog generated by the main computing node 1011 before the first moment, that is, the binlog whose LSN is less than or equal to the LSN corresponding to the first moment.
  • at this time, the updated data indicated by the binlogs in the baseline data is still stored in the buffer pool and has not yet been flushed to the main storage node 1021 for persistent storage.
  • the main storage node 1021 can send the baseline data to the slave storage node 1022 by wired or wireless means.
  • the slave computing node 1012 can replay each binlog in the baseline data in ascending LSN order, so as to complete the update of the data on the data pages in the baseline data, thereby synchronizing the data stored in the main center at the first moment to the disaster recovery center.
  • subsequently, when the main computing node 1011 generates a binlog based on data newly written by the user, the main computing node 1011 sends the binlog to the slave computing node 1012 through the storage cluster 102, and the slave computing node 1012 replays the binlog to update the baseline data stored in the slave storage node 1022, thereby achieving timely data synchronization between the main center and the disaster recovery center.
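  • baseline replication followed by incremental binlog replay can be sketched as follows; the field names are illustrative:

```python
def make_baseline(pages: dict, binlogs: list, first_moment_lsn: int) -> dict:
    """Master storage node: the baseline is the data pages persisted at the
    first moment plus every binlog whose LSN is <= that moment's LSN."""
    return {
        "pages": dict(pages),
        "binlogs": [b for b in binlogs if b["lsn"] <= first_moment_lsn],
    }

def restore_baseline(baseline: dict, install_pages, execute_sql) -> int:
    """Disaster recovery side: install the copied data pages, then replay the
    baseline binlogs in ascending LSN order."""
    install_pages(baseline["pages"])
    last_lsn = 0
    for b in sorted(baseline["binlogs"], key=lambda r: r["lsn"]):
        execute_sql(b["sql"])
        last_lsn = b["lsn"]
    return last_lsn  # binlogs with LSN > last_lsn then arrive incrementally
```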
  • the slave computing node 1012 can detect in real time or periodically whether the master computing node 1011 has failed. For example, the slave computing node 1012 can determine that the master computing node 1011 has failed when it does not receive heartbeat messages sent by the master computing node 1011, or when it receives a failure notification sent by a third-party arbitration server. When it is determined that the master computing node 1011 has failed, the slave computing node 1012 is promoted to the master computing node, and the slave storage node 1022 is instructed to be promoted to the master storage node.
  • the slave computing node 1012 can first check whether there is any binlog that has not been fully replayed stored in the slave storage node 1022 or the log storage area 501. If so, the slave computing node 1012 will first read and replay the binlog to complete the update of the data persistently stored in the slave storage node 1022.
  • the updated data is the data stored in the main center at the moment the main computing node 1011 failed; then, the slave computing node 1012 takes over the business on the main computing node 1011 based on the data in the slave storage node 1022.
  • the slave computing node 1012 can generate the corresponding binlog and write it to the slave storage node 1022 or the log storage area 501 in the storage cluster 102, so that the master computing node 1011 can synchronize data according to the binlog stored in the storage cluster 102 after the failure recovery.
  • the master computing node 1011 can be used as a disaster recovery for the slave computing node 1012; or, the master computing node 1011 can be restored to the master node again through the master-slave switching, etc., which is not limited in this embodiment.
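  • failure detection and promotion can be sketched as follows; the timeout value and all names are assumptions:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # hypothetical: seconds of heartbeat silence that imply failure

class SlaveComputeNode:
    def __init__(self):
        self.role = "slave"
        self.unreplayed_binlogs: list = []  # backlog read from the log storage area

    def master_failed(self, last_heartbeat: float, arbitration_notice: bool) -> bool:
        # Failure is inferred from a missing heartbeat, or reported explicitly
        # by a third-party arbitration server.
        return arbitration_notice or (time.time() - last_heartbeat) > HEARTBEAT_TIMEOUT

    def failover(self, execute_sql) -> None:
        # 1. Replay any binlog not yet fully replayed, restoring the data the
        #    master center held at the moment of failure (RPO = 0).
        for sql in self.unreplayed_binlogs:
            execute_sql(sql)
        self.unreplayed_binlogs.clear()
        # 2. Promote to master; new binlogs generated from here on are written
        #    back to the storage cluster for later resynchronization.
        self.role = "master"
```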
  • since the binlog generated by the master computing node 1011 is transmitted to the slave computing node 1012 through the storage side, data synchronization problems caused by the slave computing node 1012 failing to obtain the binlog, whether due to overload of the master computing node 1011 or instability of the data transmission link between the master computing node 1011 and the slave computing node 1012, can be avoided.
  • the slave computing node 1012 can take over the business on the master computing node 1011 based on the data in the slave storage node 1022, thereby achieving an RPO of 0 for the data processing system 100, and improving the reliability of the data processing system 100.
  • a unified processing logic can be used between the master computing node 1011 and the slave computing node 1012 to copy the binlog generated by the master computing node 1011 to the slave computing node 1012, thereby improving the compatibility of the data processing system 100 with database applications and reducing the difficulty of deploying database applications on the computing cluster 101.
  • the master storage node 1021 and the slave storage node 1022 can be implemented by a storage array, or by a device including a storage array, so that the master storage node 1021 and the slave storage node 1022 can persistently store data based on the storage array; furthermore, based on technologies such as redundant arrays of independent disks (RAID), erasure coding (EC), deduplication and compression, and data backup, the reliability of persistent data storage in the master storage node 1021 and the slave storage node 1022 can be further improved.
  • the master computing node 1011 and the slave computing node 1012 are respectively configured with their own storage nodes, and the master storage node 1021 can only be accessed by the master computing node 1011, and the slave storage node 1022 can only be accessed by the slave computing node 1012.
  • in other possible embodiments, the master computing node 1011 and the slave computing node 1012 can also share the same storage node, that is, the data pages in the storage node are allowed to be accessed by both the master computing node 1011 and the slave computing node 1012.
  • such a data processing system is described in detail below in conjunction with FIG. 6.
  • FIG. 6 shows a schematic diagram of the structure of another data processing system provided by the present application.
  • the data processing system 600 still adopts a storage-computing separation structure, including a computing cluster 601 and a storage cluster 602, wherein the computing cluster 601 and the storage cluster 602 can communicate with each other through a network.
  • the computing cluster 601 includes multiple computing nodes.
  • FIG. 6 takes a computing cluster 601 including a master computing node 6011 and a slave computing node 6012 as an example for illustration; the slave computing node 6012 serves as a disaster recovery node for the master computing node 6011, which can be a hot backup or a cold backup.
  • the storage cluster 602 includes at least one storage node, and FIG. 6 takes the storage node 6021 as an example, and the storage cluster 602 also includes a log storage area 6022. As shown in FIG. 6, the log storage area 6022 can be deployed in the storage node 6021, or it can be deployed independently of the storage node 6021, such as deployed in other storage nodes in the storage cluster 602, etc., and this embodiment does not limit this.
  • the storage node 6021 also includes a data storage area (not shown in FIG. 6), and the data storage area and the log storage area 6022 can be accessed by the main computing node 6011 and the slave computing node 6012.
  • the storage node 6021 uses the data storage area to persistently store data, such as the data generated by the main computing node 6011 when processing business.
  • the log storage area 6022 is used to store the binlog generated by the main computing node 6011 during operation.
  • after receiving a data update request that requests updating the data persistently stored in the storage node 6021 (including adding, deleting or modifying data), the master computing node 6011 can read the data page on which the requested data resides from the storage node 6021 into the buffer pool of the master computing node 6011, complete the modification of the data page in the buffer pool according to the data update request, and generate a binlog recording the data modification.
  • then, the master computing node 6011 can feed back the data update success to the user side and send the generated binlog to the log storage area 6022 in the storage cluster 602 for storage (and send the updated data page to the storage node 6021 for persistent storage); alternatively, the master computing node 6011 can first send the generated binlog to the log storage area 6022 in the storage cluster 602 for storage and then feed back the data update success to the user side, which is not limited in this embodiment.
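  • the write path just described can be summarized by the following minimal sketch (showing one of the two permitted orderings: binlog persisted before the acknowledgement); the page, statement and LSN shapes are assumptions made purely for illustration:

        class BufferPool:
            def __init__(self):
                self.pages = {}  # page_id -> row data cached in memory

            def fetch(self, page_id, storage):
                # read the data page from the storage node into the buffer pool on first access
                if page_id not in self.pages:
                    self.pages[page_id] = dict(storage.get(page_id, {}))
                return self.pages[page_id]

        def handle_update(page_id, key, value, statement, pool, storage, log_area, lsn):
            page = pool.fetch(page_id, storage)               # read the page into the buffer pool
            page[key] = value                                 # complete the modification in memory
            log_area.append({"lsn": lsn, "stmt": statement})  # persist the binlog first
            return "update ok"                                # then feed back success to the user side

        storage, log_area = {"p1": {"a": 1}}, []
        pool = BufferPool()
        print(handle_update("p1", "a", 2, "UPDATE t SET a = 2", pool, storage, log_area, 101))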
  • since the master computing node 6011 and the slave computing node 6012 share the data pages in the storage node 6021, after the master computing node 6011 successfully writes the binlog into the log storage area 6022, the slave computing node 6012 does not need to read and replay the binlog in the log storage area 6022 as long as the master computing node 6011 operates normally. In actual applications, when the main computing node 6011 writes the data in the buffer pool to the storage node 6021, the binlog corresponding to that data can be eliminated from the log storage area 6022.
  • the slave computing node 6012 can detect in real time or periodically whether the master computing node 6011 has failed; when it is determined that the master computing node 6011 has failed, the slave computing node 6012 is upgraded to the master node and detects whether there is a binlog in the log storage area 6022. If there is, it indicates that when the master computing node 6011 failed, there may have been new data in its buffer pool that had not yet been persistently stored in the storage node 6021.
  • in that case, the slave computing node 6012 can read the binlog in the log storage area 6022 and update the data in the storage node 6021 by replaying the binlog, so as to restore the data that was cached in the buffer pool of the master computing node 6011 when the failure occurred; the data in the data processing system 600 is thus restored to the state at the time of the failure, that is, an RPO of 0 is achieved for the data processing system 600.
  • when the slave computing node 6012 plays back the binlog in the log storage area 6022, it can first read the binlog into the local storage space of the slave computing node 6012 and then perform the playback operation on each binlog in the local storage space in order of LSN from small to large.
  • alternatively, the slave computing node 6012 can directly read each binlog in the log storage area 6022 in order of LSN from small to large and perform the playback operation on each binlog as it is read; both strategies are sketched below.
  • in this way, the resource consumption required for playing back the binlog can be further reduced, the data recovery delay can be reduced, and the recovery time objective (RTO) of the data processing system 600 can be reduced.
  • RTO refers to the time interval between the moment when the business of the data processing system 600 is suspended due to a disaster and the moment when the business is resumed.
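  • the two playback strategies above can be contrasted with a small sketch; entries are again assumed to be dictionaries carrying an "lsn" field, and apply stands in for the actual playback of one binlog:

        def replay_copy_then_apply(log_area, apply):
            # strategy 1: copy the binlog to local storage first, then play it back in LSN order
            local_copy = list(log_area)
            for entry in sorted(local_copy, key=lambda e: e["lsn"]):
                apply(entry)

        def replay_streaming(log_area, apply):
            # strategy 2: read entries directly in LSN order and play each back immediately,
            # saving the local copy and thereby shortening recovery delay (lower RTO)
            for entry in sorted(log_area, key=lambda e: e["lsn"]):
                apply(entry)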
  • the operations performed by the master computing node and the slave computing node can be implemented by an application deployed on them, which can be, for example, one of the above-mentioned database applications such as MySQL, PostgreSQL, OpenGauss or Oracle, or another application.
  • the binlog can be transmitted from the master computing node to the slave computing node through the storage side, thereby realizing data synchronization between the main center and the disaster recovery center.
  • the operations performed by the master computing node and the slave computing node may also be controlled by a data processing device deployed separately in the computing cluster; that is, the master computing node may write the generated binlog to the storage cluster under the control of the data processing device, and the slave computing node may read the binlog from the storage cluster and replay it under the control of the data processing device.
  • the data processing device may be implemented by software or hardware.
  • when implemented by software, the data processing device can be, for example, program code deployed on a hardware device.
  • the data processing device can be, for example, deployed in the main computing node and/or the slave computing node in the form of software such as a plug-in, component or application (for example, deployed in the controller of the main computing node and/or the slave computing node).
  • in this way, the transmission of the binlog between the main computing node and the slave computing node can be completed through the storage side, which reduces or eliminates the need to modify the database application deployed on the main computing node and the slave computing node, lowering the difficulty of implementing the solution.
  • alternatively, the above-mentioned data processing device can be implemented by a physical device, which can be, for example, a CPU, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), a system on chip (SoC), a software-defined infrastructure (SDI) chip, an artificial intelligence (AI) chip, a data processing unit (DPU), any other processor, or any combination thereof, which is not limited in this embodiment.
  • FIG. 7 shows a flow chart of a data processing method provided by an embodiment of the present application.
  • FIG. 7 is described by taking the data processing system 100 shown in FIG. 1 as an example.
  • the method may specifically include:
  • S701. The main computing node 1011 receives a data update request, where the data update request is used to request the data processing system 100 to update persistently stored data.
  • the main computing node 1011 can receive a data update request sent by a client or other device on the user side.
  • the data update request can be used to request modification of data persistently stored in the data processing system 100, or can be used to request writing new data to the data processing system 100, etc.
  • S702. In response to the data update request, the main computing node 1011 completes the data update in the buffer pool and generates a corresponding binlog for the data update request.
  • S703. The main computing node 1011 sends the binlog to the storage cluster 102 for storage.
  • the storage cluster 102 includes a master storage node 1021 and a slave storage node 1022, wherein the master storage node 1021 supports the data reading and writing of the master computing node 1011, and the slave storage node 1022 supports the data reading and writing of the slave computing node 1012.
  • for example, the master computing node 1011 can write the binlog to the master storage node 1021; the master storage node 1021 backs up the binlog and then sends the backed-up binlog to the slave storage node 1022 for storage, as sketched below.
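  • a minimal sketch of this storage-side forwarding; the store and write_binlog interfaces are invented here purely for illustration and are not APIs defined by the embodiments:

        class SlaveStorageNode:
            def __init__(self):
                self.binlogs = []

            def store(self, entry):
                self.binlogs.append(entry)  # held here until the slave computing node reads it

        class MasterStorageNode:
            def __init__(self, replica):
                self.binlogs = []
                self.replica = replica      # the slave storage node acting as disaster recovery

            def write_binlog(self, entry):
                self.binlogs.append(entry)  # persist (back up) the binlog locally first
                self.replica.store(entry)   # then ship the backed-up binlog to the slave storage node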
  • the master storage node 1021 and the slave storage node 1022 can be deployed in the same physical area (such as the same data center/AZ), or can be deployed in different physical areas.
  • alternatively, a log storage area, such as the above-mentioned log storage area 501, can be deployed in the storage cluster 102, so that the master computing node 1011 can write the generated binlog into the log storage area; the log storage area can be accessed by both the master computing node 1011 and the slave computing node 1012.
  • S704. The slave computing node 1012 reads the binlog from the storage cluster 102.
  • for example, the slave computing node 1012 may read the binlog from the slave storage node 1022 or from the log storage area in the storage cluster 102.
  • the slave computing node 1012 may have the same configuration as the master computing node 1011.
  • for example, the master computing node 1011 may back up its own configuration files and other data and send the backed-up data to the slave computing node 1012, so that the slave computing node 1012 completes the corresponding configuration according to the received backup data, such as the logic for processing services, the application running on the slave computing node 1012, and the directory where the binlog in the storage cluster 102 is mounted.
  • S705. The slave computing node 1012 updates the data persistently stored in the storage cluster 102 by replaying the binlog. Specifically, the slave computing node 1012 can read the binlog from the slave storage node 1022 or the log storage area and, by replaying the binlog, keep the data in the slave storage node 1022 synchronized with the data in the main center.
  • for example, the slave computing node 1012 can parse database statements, such as SQL statements, from the binlog and perform syntax and semantic analysis on them to determine their legitimacy. After the legitimacy check passes, the slave computing node 1012 can generate a plan tree for the database statement, which indicates an execution plan for processing the data. Finally, after optimizing the plan tree, the slave computing node 1012 updates the data in the slave storage node 1022 according to the optimized plan tree, as sketched below.
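  • the replay pipeline described above (parse, check, plan, execute) could look roughly as follows; the legitimacy check and the plan tree are reduced to trivial stand-ins here, and execute is a hypothetical callable that applies the plan to the slave storage node:

        def replay_binlog_entry(entry, execute):
            statement = entry["stmt"]  # database statement recorded in the binlog
            # stand-in for the syntax and semantic analysis that checks legitimacy
            if not statement.strip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
                raise ValueError("illegal statement: " + statement)
            # stand-in for generating and optimizing a plan tree for the statement
            plan = ("apply", statement)
            # execute the (optimized) plan to update the data in the slave storage node
            execute(plan)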
  • in this way, the slave computing node 1012 can take over the business on the main computing node 1011 by using the data in the slave storage node 1022 that is synchronized with the main center, thereby realizing fault recovery of the data processing system 100.
  • the master computing node 1011 can control the master storage node 1021 to send baseline data to the slave storage node 1022 to complete the baseline replication.
  • the specific implementation process of the baseline replication can be found in the relevant description of the aforementioned embodiment and will not be repeated here.
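  • conceptually, baseline replication followed by incremental binlog shipping can be sketched as below; the dictionary-shaped nodes are illustrative assumptions only:

        def replicate_baseline(master_pages, master_binlogs, slave):
            # step 1: copy the baseline, i.e. the data pages persisted so far
            # plus the binlogs generated before this moment
            slave["pages"] = dict(master_pages)
            slave["binlogs"] = list(master_binlogs)

        def ship_incremental(slave, entry):
            # step 2: afterwards only new binlog entries are shipped; the slave
            # computing node updates the baseline data by replaying them in LSN order
            slave["binlogs"].append(entry)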
  • in other embodiments, the master computing node 1011 and the slave computing node 1012 may share the same storage node, which is referred to below as the target storage node. In that case, while the master computing node 1011 operates normally, the slave computing node 1012 does not need to read or replay the binlog written by the master computing node 1011 into the storage cluster 102.
  • the binlog stored in the storage cluster 102 is the binlog corresponding to the updated data stored in the buffer pool of the master computing node 1011; when the data in the buffer pool is written into the storage cluster 102, the binlog corresponding to the data in the buffer pool can be eliminated from the storage cluster 102.
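  • a sketch of that elimination rule, assuming each binlog entry records an LSN and that all changes up to a given LSN have been flushed:

        def flush_and_purge(buffer_pages, storage, log_area, flushed_up_to_lsn):
            # persist the cached pages, then eliminate the binlog entries whose effects
            # are now durable; only binlogs for not-yet-flushed data remain
            storage.update(buffer_pages)
            log_area[:] = [e for e in log_area if e["lsn"] > flushed_up_to_lsn]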
  • when the main computing node 1011 fails, the data cached in its buffer pool, which has not yet been persistently stored, may be lost.
  • therefore, the slave computing node 1012 reads the binlog from the storage cluster 102 and replays it to update the data persistently stored in the target storage node, thereby restoring, in the target storage node, the data in the buffer pool of the main computing node 1011 that had not yet been persistently stored.
  • steps S701 to S705 shown in FIG. 7 correspond to the system embodiments shown in FIG. 1 to FIG. 6 above. Therefore, the specific implementation process of steps S701 to S705 can be found in the relevant description of the aforementioned embodiments and will not be repeated here.
  • the embodiment of the present application also provides a data processing device.
  • the data processing device 800 shown in FIG. 8 is located in a data processing system, such as the data processing system 100 shown in FIG. 1 or the data processing system 600 shown in FIG. 6.
  • the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node, and usually, the slave computing node serves as a disaster recovery for the master computing node.
  • the data processing device 800 includes:
  • the storage module 801 is used to instruct the main computing node to send the binary log binlog generated in response to the data update request to the storage cluster for storage;
  • a reading module 802 is used to instruct the slave computing node to read the binlog stored in the storage cluster to the slave computing node;
  • the playback module 803 is used to instruct the slave computing node to update the persistently stored data in the storage cluster by replaying the binlog.
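  • the division of labor among the three modules above can be illustrated with a small skeleton; the node-side methods send_binlog, read_binlog and replay are hypothetical interfaces assumed for this sketch, not APIs defined by the embodiments:

        class DataProcessingDevice:
            def __init__(self, master_node, slave_node, storage_cluster):
                self.master = master_node
                self.slave = slave_node
                self.cluster = storage_cluster

            def store(self, binlog):
                # storage module 801: master sends the generated binlog to the storage cluster
                self.master.send_binlog(self.cluster, binlog)

            def read(self):
                # reading module 802: slave reads the binlog stored in the storage cluster
                return self.slave.read_binlog(self.cluster)

            def playback(self, binlogs):
                # playback module 803: slave updates persistently stored data by replaying
                for entry in binlogs:
                    self.slave.replay(entry)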
  • the storage cluster includes a log storage area, and the log storage area is accessed by the master computing node and the slave computing node;
  • the storage module 801 is specifically used to instruct the main computing node to send the binlog to the log storage area for storage;
  • the reading module 802 is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave computing node; wherein the storage cluster also includes a data storage area, the data storage area is used to store business data, and the data storage area is accessed by the master computing node and the slave computing node, or only by the master computing node.
  • the playback module 803 is specifically used to instruct the slave computing node to synchronize data with the master computing node by replaying the binlog, or to instruct the slave computing node to recover the data lost when the master computing node fails by replaying the binlog.
  • At least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in the same data center or the same availability zone;
  • the reading module 802 is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave computing node during the normal operation of the master computing node;
  • the playback module 803 is specifically used to instruct the slave computing node to update the data persistently stored in the slave storage node by replaying the binlog.
  • the at least one storage node includes a target storage node, and the target storage node is used to persistently store data written by the primary computing node;
  • the reading module 802 is specifically used to instruct the slave computing node to read the binlog from the log storage area when the master computing node fails;
  • the playback module 803 is specifically used to instruct the slave computing node to update the persistently stored data in the target storage node by replaying the binlog.
  • At least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in different data centers, or the primary storage node and the secondary storage node are deployed in different availability zones;
  • the storage module 801 is specifically used to instruct the main computing node to send the binlog to the main storage node for storage;
  • the reading module 802 is specifically used to instruct the slave computing node to read the binlog stored in the slave storage node, where the binlog in the slave storage node is sent by the master storage node.
  • the storage module 801 is further used to: instruct the master storage node to send the baseline data to the slave storage node for storage before sending the binlog to the slave storage node;
  • the playback module 803 is specifically used to instruct the slave computing node to update the baseline data stored in the slave storage node by replaying the binlog.
  • a target application is running on the main computing node, binlog is generated during the running of the target application, the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
  • the storage node is a storage array, and the storage array is used to persistently store data.
  • the data processing device 800 provided in this embodiment corresponds to the data processing system in the above-mentioned embodiments, and is used to implement the data processing process performed in the above-mentioned embodiments. Therefore, the functions of each module in this embodiment and the technical effects thereof can be found in the relevant descriptions in the above-mentioned embodiments, and will not be elaborated here.
  • an embodiment of the present application further provides a computing device.
  • the computing device 900 may include a communication interface 910 and a processor 920.
  • the computing device 900 may also include a memory 930.
  • the memory 930 may be disposed in the computing device 900, or part of it may also be arranged outside the computing device 900.
  • each action that the data processing device instructs the master computing node and the slave computing node (and the master storage node) to perform in the above-mentioned embodiment can be implemented by the processor 920.
  • each step of the processing flow can complete the method in the above-mentioned embodiments through an integrated logic circuit of hardware in the processor 920 or through instructions in the form of software.
  • the program code executed by the processor 920 to implement the above-mentioned method can be stored in the memory 930.
  • the memory 930 is connected to the processor 920, such as a coupling connection.
  • Some features of the embodiments of the present application may be completed/supported by the processor 920 executing program instructions or software codes in the memory 930.
  • the software components loaded in the memory 930 may be divided functionally or logically; for example, the functions of the storage module 801 and the playback module 803 shown in FIG. 8 may be implemented by the processor 920, and the function of the reading module 802 shown in FIG. 8 may be implemented by the communication interface 910.
  • Any communication interface involved in the embodiments of the present application may be a circuit, a bus, a transceiver or any other device that can be used for information exchange.
  • for the communication interface 910 in the computing device 900, illustratively, the other device may be a device connected to the computing device 900, etc.
  • An embodiment of the present application also provides a computing device cluster, which may include one or more computing devices, each of which may have a hardware structure of a computing device 900 as shown in FIG. 9 , and during operation, the computing device cluster can be used to implement the data processing method in the embodiment shown in FIG. 7 .
  • the embodiments of the present application further provide a chip, including a power supply circuit and a processing circuit, wherein the power supply circuit is used to supply power to the processing circuit, and the processing circuit is used to implement the functions of the data processing device 800 shown in FIG. 8 .
  • the processor involved in the embodiments of the present application may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware processor, or may be executed by a combination of hardware and software modules in the processor.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, modules or modules, which can be electrical, mechanical or other forms, and is used for information exchange between devices, modules or modules.
  • the processor may operate in conjunction with a memory.
  • the memory may be a non-volatile memory, such as a hard disk or a solid-state drive, or a volatile memory, such as a random access memory.
  • the memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the connection medium between the communication interface, the processor and the memory is not limited in the embodiments of the present application.
  • the memory, processor and communication interface may be connected via a bus.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • the embodiments of the present application further provide a computer storage medium, in which a software program is stored; when the software program is read and executed by one or more computing devices, the method performed by the data processing device provided in any one or more of the above embodiments can be implemented.
  • the computer storage medium may include: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, or other media that can store program code.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented in one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable device, so that a series of operation steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows in the flowchart and/or one or more blocks in the block diagram.


Abstract

A data processing system, comprising a computing cluster and a storage cluster (102), wherein the computing cluster comprises a master computing node (1011) and a slave computing node (1012), and the storage cluster (102) comprises at least one storage node. The master computing node (1011) is used for generating a binlog in response to a data update request, and sending the binlog to the storage cluster (102) for storage; and the slave computing node (1012) is used for reading the binlog stored in the storage cluster (102) and updating persistently stored data in the storage cluster (102) by means of playing back the binlog.

Description

Data processing system, method, apparatus and related device

This application claims priority to the Chinese patent application filed with the State Intellectual Property Office of China on November 2, 2022, with application number 202211363509.3 and titled "A method for managing binlog", and to the Chinese patent application filed with the State Intellectual Property Office of China on December 14, 2022, with application number 202211608424.7 and titled "Data processing system, method, apparatus and related device", both of which are incorporated herein by reference in their entirety.

Technical Field

The present application relates to the field of database technology, and in particular to a data processing system, method, apparatus and related device.

Background

With the development of information technology, data processing systems such as MySQL, PostgreSQL and openGauss are widely used in finance, communications, healthcare, logistics, e-commerce and other fields for the persistent storage of business data.

At present, a data processing system is usually deployed with a main center (also called a production center) and at least one disaster recovery center. The main center includes a main computing node and a main storage node, and the disaster recovery center includes a slave computing node and a slave storage node. During normal operation, the main center uses the main computing node and the main storage node to provide data read and write services; the disaster recovery center is responsible for backing up the data stored in the main center, and when the main center fails, the disaster recovery center can use the backed-up data to continue providing data read and write services, thereby avoiding data loss and ensuring the reliability of data storage.

Normally, when the master computing node updates the data persistently stored in the master storage node, it sends a binary log (binlog) file to the slave computing node, so that the slave computing node can complete the data update in the slave storage node by replaying the binlog file, thereby achieving data synchronization between the main center and the disaster recovery center. However, in actual application scenarios, after the main center fails, the data stored in the disaster recovery center is often inconsistent with the data held by the main center before the failure, so that the recovery point objective (RPO) of the data processing system cannot reach 0, which affects the reliability of the data processing system. RPO can be used to measure the maximum amount of data loss that occurs during disaster recovery of the data processing system.
Summary of the Invention

A data processing system is provided, so that the data stored in the disaster recovery center remains consistent with the data held by the main center before a failure, thereby improving the reliability of the data processing system and achieving an RPO of 0 for the data processing system. Corresponding data processing methods, apparatuses, computing device clusters, chips, computer-readable storage media and computer program products are also provided.

In a first aspect, an embodiment of the present application provides a data processing system, which includes a computing cluster and a storage cluster, where the computing cluster and the storage cluster are connected through a network, such as a wired or wireless network; the computing cluster includes a master computing node and a slave computing node, the slave computing node usually serving as a disaster recovery node for the master computing node, and the storage cluster includes at least one storage node. The master computing node is configured to generate a binlog (binary log) in response to a data update request, which may, for example, be sent to the master computing node by a user through a client, and to send the binlog to the storage cluster for storage. The slave computing node is configured to read the binlog stored in the storage cluster and to update the data persistently stored in the storage cluster by replaying the binlog (specifically, by replaying the database statements recorded in the binlog); for example, the slave computing node can synchronize data with the master computing node by replaying the binlog generated by the master computing node in a timely manner, or it can replay the binlog when the master computing node fails so as to achieve data recovery.

Since the master computing node and the slave computing node transmit the binlog through the storage cluster, the master computing node does not need to send the binlog directly to the slave computing node. This avoids situations in which the slave computing node fails to obtain the binlog generated by the master computing node because the master computing node is overloaded or because the data transmission link between the two nodes is unstable, so that the slave computing node can achieve data synchronization with the master computing node, or data recovery when the master computing node fails, by replaying the binlog it obtains. In this way, when the master computing node fails, the slave computing node can take over the business on the master computing node based on the data held by the master computing node at the time of the failure, thereby achieving an RPO of 0 for the data processing system and improving its reliability.

In a possible implementation, the storage cluster includes a log storage area that is accessed by the master computing node and the slave computing node, that is, the two nodes can share the log storage area. When the master computing node sends the binlog to the storage cluster, it specifically sends the binlog to the log storage area for storage; correspondingly, the slave computing node specifically reads the binlog from the log storage area. The storage cluster includes not only the log storage area but also a data storage area, which is used to store business data, such as the business data processed during the normal operation of the master computing node. The data storage area can be accessed by the master computing node and the slave computing node, or only by the master computing node (the slave computing node can access the data storage area only when the master computing node fails). In this way, the binlog can be transmitted between the master computing node and the slave computing node through a shared storage area in the storage cluster, which ensures as far as possible that the slave computing node can obtain the binlog generated by the master computing node, thereby improving the reliability of the data processing system.

In a possible implementation, the slave computing node is specifically configured to synchronize data with the master computing node by replaying the binlog; that is, during the normal operation of the master computing node, the slave computing node can continuously replay the binlog generated by the master computing node to keep its data synchronized with that of the master computing node. Alternatively, the slave computing node is configured to recover, by replaying the binlog, the data lost when the master computing node fails; that is, the slave computing node may not replay the binlog before the master computing node fails, and after the failure it recovers the lost data by replaying the binlog, so that the RPO of the data processing system is 0. In this way, the reliability of the data processing system can be effectively improved.

In a possible implementation, the storage nodes included in the storage cluster include a master storage node and a slave storage node, where the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in the same data center or the same availability zone. Normally, the master storage node provides data read and write services for the master computing node, and the slave storage node provides data read and write services for the slave computing node. During the normal operation of the master computing node, the slave computing node reads, from the log storage area of the storage cluster, the binlog generated and written by the master computing node, and updates the data persistently stored in the slave storage node by replaying the binlog. In this way, the slave computing node can keep its data synchronized with the master computing node by continuously replaying the binlog generated by the master computing node, so that when the master computing node fails, the slave computing node can take over the business on the master computing node based on the synchronized data, thereby achieving an RPO of 0 for the data processing system and improving its reliability.

In a possible implementation, the storage nodes included in the storage cluster include a target storage node, which is used to persistently store the data written by the master computing node. The slave computing node may not replay the binlog during the normal operation of the master computing node; instead, when the master computing node fails, it reads the binlog from the log storage area and updates the data persistently stored in the target storage node by replaying the binlog, so as to recover the data lost when the master computing node failed, thereby improving the reliability of the data processing system.

In a possible implementation, the storage nodes included in the storage cluster include a master storage node and a slave storage node, where the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in different data centers or in different availability zones. Normally, the master storage node provides data read and write services for the master computing node, and the slave storage node provides data read and write services for the slave computing node. In the process of transmitting the binlog, the master computing node specifically sends the binlog to the master storage node for storage, and the master storage node then sends the binlog to the slave storage node, so that the slave computing node can read the binlog stored in the slave storage node. In this way, by transmitting the binlog between the master storage node and the slave storage node, it is ensured as far as possible that the slave computing node can obtain the binlog generated by the master computing node, thereby improving the reliability of the data processing system.

In a possible implementation, the master storage node is further configured to send baseline data to the slave storage node before sending the binlog. The baseline data usually includes the data persistently stored in the master storage node at a certain moment (that is, the data stored on the data pages) and the binlog generated by the master computing node before that moment. The slave storage node stores the baseline data after receiving it; the slave computing node can then update the baseline data stored in the slave storage node by replaying the binlog, so as to achieve data synchronization with the main center.

In a possible implementation, a target application runs on the master computing node, and the binlog transmitted through the storage cluster is generated during the running of the target application. The target application includes a relational database management system (RDBMS), and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss and Oracle. In this way, for a master computing node running any type of application, the reliability of the data processing system can be guaranteed by transmitting the binlog through the storage cluster, which improves the compatibility and scalability of the data processing system with respect to database applications.

In a possible implementation, the storage nodes in the storage cluster are specifically storage arrays used to persistently store data. Since a storage array is usually configured with technologies such as redundant arrays of independent disks, erasure coding, deduplication and compression, and data backup, the reliability of persistent data storage in the storage cluster can be further improved.
In a second aspect, an embodiment of the present application provides a data processing method applied to a data processing system, where the data processing system includes a computing cluster and a storage cluster connected through a network, the computing cluster includes a master computing node and a slave computing node, and the storage cluster includes at least one storage node. The method includes: the master computing node generates a binary log (binlog) in response to a data update request; the master computing node sends the binlog to the storage cluster for storage; the slave computing node reads the binlog stored in the storage cluster; and the slave computing node updates the data persistently stored in the storage cluster by replaying the binlog.

In a possible implementation, the storage cluster includes a log storage area accessed by the master computing node and the slave computing node. The master computing node sending the binlog to the storage cluster for storage includes: the master computing node sending the binlog to the log storage area for storage. The slave computing node reading the binlog stored in the storage cluster includes: the slave computing node reading the binlog from the log storage area. The storage cluster further includes a data storage area used to store business data; the data storage area is accessed by the master computing node and the slave computing node, or only by the master computing node.

In a possible implementation, the slave computing node replaying the binlog includes: the slave computing node replaying the binlog to synchronize data with the master computing node, or replaying the binlog to recover the data lost when the master computing node fails.

In a possible implementation, the at least one storage node includes a master storage node and a slave storage node, the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in the same data center or the same availability zone. The slave computing node reading the binlog stored in the storage cluster includes: the slave computing node reading the binlog from the log storage area during the normal operation of the master computing node. The slave computing node updating the data persistently stored in the storage cluster by replaying the binlog includes: the slave computing node updating the data persistently stored in the slave storage node by replaying the binlog.

In a possible implementation, the at least one storage node includes a target storage node used to persistently store the data written by the master computing node. The slave computing node reading the binlog stored in the storage cluster includes: the slave computing node reading the binlog from the log storage area when the master computing node fails. The slave computing node updating the data persistently stored in the storage cluster by replaying the binlog includes: the slave computing node updating the data persistently stored in the target storage node by replaying the binlog.

In a possible implementation, the at least one storage node includes a master storage node and a slave storage node, the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in different data centers or in different availability zones. The master computing node sending the binlog to the storage cluster for storage includes: the master computing node sending the binlog to the master storage node for storage. The slave computing node reading the binlog stored in the storage cluster includes: the slave computing node reading the binlog stored in the slave storage node, where the binlog in the slave storage node is sent by the master storage node.

In a possible implementation, the method further includes: the master storage node sending baseline data to the slave storage node for storage before sending the binlog to the slave storage node. The slave computing node updating the data persistently stored in the storage cluster by replaying the binlog includes: the slave computing node updating the baseline data stored in the slave storage node by replaying the binlog.

In a possible implementation, a target application runs on the master computing node, the binlog is generated during the running of the target application, the target application includes a relational database management system (RDBMS), and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss and Oracle.

In a possible implementation, the storage node is a storage array used to persistently store data.
In a third aspect, an embodiment of the present application provides a data processing apparatus applied to a data processing system, where the data processing system includes a computing cluster and a storage cluster connected through a network, the computing cluster includes a master computing node and a slave computing node, and the storage cluster includes at least one storage node. The data processing apparatus includes: a storage module, configured to instruct the master computing node to send the binary log (binlog) generated in response to a data update request to the storage cluster for storage; a reading module, configured to instruct the slave computing node to read the binlog stored in the storage cluster to the slave computing node; and a playback module, configured to instruct the slave computing node to update the data persistently stored in the storage cluster by replaying the binlog.

In a possible implementation, the storage cluster includes a log storage area accessed by the master computing node and the slave computing node. The storage module is specifically configured to instruct the master computing node to send the binlog to the log storage area for storage; the reading module is specifically configured to instruct the slave computing node to read the binlog from the log storage area to the slave computing node. The storage cluster further includes a data storage area used to store business data; the data storage area is accessed by the master computing node and the slave computing node, or only by the master computing node.

In a possible implementation, the playback module is specifically configured to instruct the slave computing node to synchronize data with the master computing node by replaying the binlog, or to instruct the slave computing node to recover the data lost when the master computing node fails by replaying the binlog.

In a possible implementation, the at least one storage node includes a master storage node and a slave storage node, the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in the same data center or the same availability zone. The reading module is specifically configured to instruct the slave computing node to read the binlog from the log storage area to the slave computing node during the normal operation of the master computing node; the playback module is specifically configured to instruct the slave computing node to update the data persistently stored in the slave storage node by replaying the binlog.

In a possible implementation, the at least one storage node includes a target storage node used to persistently store the data written by the master computing node. The reading module is specifically configured to instruct the slave computing node to read the binlog from the log storage area when the master computing node fails; the playback module is specifically configured to instruct the slave computing node to update the data persistently stored in the target storage node by replaying the binlog.

In a possible implementation, the at least one storage node includes a master storage node and a slave storage node, the slave storage node serves as a disaster recovery for the master storage node, and the master storage node and the slave storage node are deployed in different data centers or in different availability zones. The storage module is specifically configured to instruct the master computing node to send the binlog to the master storage node for storage; the reading module is specifically configured to instruct the slave computing node to read the binlog stored in the slave storage node, where the binlog in the slave storage node is sent by the master storage node.

In a possible implementation, the storage module is further configured to instruct the master storage node to send baseline data to the slave storage node for storage before sending the binlog to the slave storage node; the playback module is specifically configured to instruct the slave computing node to update the baseline data stored in the slave storage node by replaying the binlog.

In a possible implementation, a target application runs on the master computing node, the binlog is generated during the running of the target application, the target application includes a relational database management system (RDBMS), and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss and Oracle. In a possible implementation, the storage node is a storage array used to persistently store data.
第四方面,本申请实施例提供一种计算设备集群,该计算设备集群包括至少一个计算设备,该至少一个计算设备中的每个计算设备包括:处理器和存储器;该存储器用于存储指令,该处理器执行该存储器存储的指令,以使该计算设备集群执行上述第二方面或第二方面的任一实现方式中所述的数据处理方法,或者,实现上述第三方面或第三方面的任一实现方式中所述的数据处理装置。需要说明的是,该存储器可以集成于处理器中,也可以是独立于处理器之外。计算设备还可以包括总线。其中,处理器通过总线连接存储器。其中,存储器可以包括可读存储器以及随机存取存储器。In a fourth aspect, an embodiment of the present application provides a computing device cluster, the computing device cluster includes at least one computing device, each computing device in the at least one computing device includes: a processor and a memory; the memory is used to store instructions, and the processor executes the instructions stored in the memory so that the computing device cluster executes the data processing method described in the above second aspect or any implementation of the second aspect, or implements the data processing device described in the above third aspect or any implementation of the third aspect. It should be noted that the memory can be integrated into the processor or can be independent of the processor. The computing device may also include a bus. The processor is connected to the memory via a bus. The memory may include a readable memory and a random access memory.
In a fifth aspect, an embodiment of the present application provides a chip including a power supply circuit and a processing circuit, where the power supply circuit is used to supply power to the processing circuit, and the processing circuit implements the data processing apparatus described in the third aspect or any implementation of the third aspect.
In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a program or instructions which, when run on a computer, cause the data processing method described in the second aspect or any implementation of the second aspect to be performed.
In a seventh aspect, an embodiment of the present application further provides a computer program product including instructions which, when run on a computer, cause the computer to perform the data processing method described in the second aspect or any implementation of the second aspect.
In addition, for the technical effects brought about by any implementation of the second through seventh aspects, reference may be made to the technical effects of the corresponding implementations of the first aspect, and details are not repeated here.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments. Clearly, the accompanying drawings described below are merely some embodiments recorded in the present application, and a person of ordinary skill in the art may further derive other drawings from these accompanying drawings.
FIG. 1 is a schematic structural diagram of an exemplary data processing system according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of another exemplary data processing system according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of yet another exemplary data processing system according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of still another exemplary data processing system according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of still another exemplary data processing system according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of still another exemplary data processing system according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of an exemplary data processing method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an exemplary data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an exemplary computing device according to an embodiment of the present application.
DETAILED DESCRIPTION OF EMBODIMENTS
To make the foregoing objectives, features, and advantages of the present application clearer and easier to understand, various non-limiting implementations in the embodiments of the present application are described below by way of example with reference to the accompanying drawings. Clearly, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained based on the embodiments in the present application fall within the protection scope of the present application.
Referring to FIG. 1, which is a schematic structural diagram of an exemplary data processing system 100, the data processing system 100 may adopt a storage-compute separation architecture. As shown in FIG. 1, the data processing system 100 includes a computing cluster 101 and a storage cluster 102, and the computing cluster 101 and the storage cluster 102 can communicate with each other over a network, for example, a wired or wireless network.
The computing cluster 101 includes multiple computing nodes, different computing nodes can communicate with each other, and each computing node may be a computing device including a processor, such as a server or a desktop computer. In the computing cluster 101, some computing nodes may serve as disaster recovery for other computing nodes. For ease of description, FIG. 1 takes as an example the computing cluster 101 including a master computing node 1011 and a slave computing node 1012, where the slave computing node 1012 serves as disaster recovery for the master computing node 1011. For example, the slave computing node 1012 may serve as a hot standby or a cold standby for the master computing node 1011. When the slave computing node 1012 serves as a hot standby, the slave computing node 1012 and the master computing node 1011 both remain in a running state; in this way, when the master computing node 1011 fails, the slave computing node 1012 can immediately take over the services on the master computing node 1011 using the backed-up data, specifically by processing the requests that the master computing node 1011 had not finished processing at the time of the failure. When the slave computing node 1012 serves as a cold standby, the slave computing node 1012 may not run (for example, it may be in a dormant state) while the master computing node 1011 operates normally, or the slave computing node 1012 may release its computing resources and use the released computing resources to process other services, such as offline computing services. When the master computing node 1011 fails, the slave computing node 1012 starts running or reclaims its computing resources, and takes over the services on the master computing node 1011 using the backed-up data. In practice, the master computing node 1011 may have multiple slave computing nodes as disaster recovery nodes, so that some of the slave computing nodes serve as cold standbys for the master computing node 1011 while others serve as hot standbys.
The storage cluster 102 may include one or more storage nodes, and each storage node may be a device including a persistent storage medium, such as network attached storage (NAS) or a storage server, and can be used to persistently store data. The persistent storage medium in a storage node may be, for example, a hard disk, such as a solid-state drive or a shingled magnetic recording hard disk. In practice, each storage node may be built from one or more devices used to persistently store data. When the storage cluster 102 includes multiple storage nodes, some storage nodes may serve as disaster recovery for other storage nodes. For ease of description, FIG. 1 takes as an example the storage cluster 102 including a master storage node 1021 and a slave storage node 1022, where the slave storage node 1022 serves as disaster recovery for the master storage node 1021. The data storage areas on the master storage node 1021 and the slave storage node 1022 are each used to store service data, the data storage area on the master storage node 1021 is accessed by the master computing node 1011, and the data storage area on the slave storage node 1022 is accessed by the slave computing node 1012. The master storage node 1021 and the slave storage node 1022 may be deployed in the same data center, or in the same availability zone (AZ). In this case, by creating multiple copies of persistently stored data within the same data center or the same AZ, the reliability of locally stored data can be improved. Alternatively, the master storage node 1021 and the slave storage node 1022 may be deployed in different data centers, for example, the master storage node 1021 in data center A and the slave storage node 1022 in data center B; or they may be deployed in different AZs, for example, the master storage node 1021 in AZ1 and the slave storage node 1022 in AZ2. In this way, data disaster recovery can be achieved across data centers or across AZs, improving the reliability of data stored at a remote site.
The master computing node 1011 uses the master storage node 1021 to provide data read/write services, and after the master computing node 1011 fails, the slave computing node 1012 takes over the services on the master computing node 1011 using the data backed up on the slave storage node 1022. In actual application scenarios, the master computing node 1011 and the master storage node 1021 may form a main center (usually belonging to a production site), and the slave computing node 1012 and the slave storage node 1022 may form a disaster recovery center (usually belonging to a disaster recovery site). In other possible data processing systems, the storage cluster 102 may also include a single storage node shared by the master computing node 1011 and the slave computing node 1012; that is, the service data stored in the data storage area of that storage node can be accessed by both the master computing node 1011 and the slave computing node 1012, so that after the master computing node 1011 fails, the slave computing node 1012 can continue to provide data read/write services using the data stored in that storage node.
One or more applications (not shown in FIG. 1) may be deployed on the master computing node 1011, and the deployed applications may be, for example, database applications or other applications. For example, a database application may be a relational database management system (RDBMS), and the RDBMS may include at least one of MySQL, PostgreSQL, OpenGauss, and Oracle, or may be another type of database system. While an application is running, the master computing node 1011 typically receives data update requests sent by a user-side client or another device, for example, a data update request sent by a user-side client for reading or modifying data in the master storage node 1021. The application on the master computing node 1011 can respond to the data update request and provide the corresponding data read/write service for the client or the other device. When the data update request received by the master computing node 1011 requests that new data be written to the data processing system 100, that data already persistently stored in the data processing system 100 be modified, or that data already persistently stored in the data processing system 100 be deleted, the application on the master computing node 1011 generates a binary log (binlog) and saves the binlog in a local storage area. The binlog is a logical log used to record the database statements, such as SQL statements, that update the data persistently stored in the master storage node 1021. In actual scenarios, the application may include a service layer and a storage engine layer, and the binlog may be generated and saved by the service layer. The master computing node 1011 then sends the generated binlog to the slave computing node 1012, and the slave computing node 1012 updates the data in the slave storage node 1022 by executing the database statements in the binlog, so that the data in the slave storage node 1022 remains consistent with the data in the master storage node 1021; that is, the data in the master storage node 1021 is replicated to the slave storage node 1022.
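As an illustration of what such a logical log can contain, the following minimal sketch models a binlog record as a log sequence number (LSN) plus the database statement it records. The field names and structure are assumptions made for illustration, not the actual on-disk binlog format of any of the database systems named above.

```python
from dataclasses import dataclass

@dataclass
class BinlogRecord:
    """Illustrative logical-log record: the statement that updated the data,
    ordered by a monotonically increasing log sequence number (LSN)."""
    lsn: int
    statement: str  # database statement recorded by the log, e.g. an SQL statement

# The master generates one record per update; the slave applies them in LSN order.
log = [
    BinlogRecord(lsn=1, statement="INSERT INTO t (id, v) VALUES (1, 'a')"),
    BinlogRecord(lsn=2, statement="UPDATE t SET v = 'b' WHERE id = 1"),
]
print([r.lsn for r in log])  # [1, 2]
```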
After the master computing node 1011 fails, the slave computing node 1012 needs to start running or reclaim computing resources, and use those computing resources to start the application on the slave computing node 1012. The application on the slave computing node 1012 then executes the database statements recorded in the binlogs sent by the master computing node 1011 before the failure, so that the data in the slave storage node 1022 is consistent with the data in the master storage node 1021 before the failure. In this way, the slave computing node 1012 can take over the unfinished requests on the master computing node 1011 based on the data stored in the slave storage node 1022.
However, in actual application scenarios, some binlogs on the master computing node 1011 may fail to be transmitted to the slave computing node 1012. For example, when the data transmission link between the master computing node 1011 and the slave computing node 1012 is unstable (for example, data transmission jitter is high or the communication network is under heavy transmission pressure), a binlog sent by the master computing node 1011 may be lost during transmission over the communication network, making it difficult for the slave computing node 1012 to receive the binlog. For another example, when the service load on the master computing node 1011 is heavy, the master computing node 1011 may, while continuously generating new binlogs, be unable to send the locally saved binlogs to the slave computing node 1012 in time; because the storage space of the local storage area used for storing binlogs on the master computing node 1011 is limited, the master computing node 1011 evicts the earliest saved binlogs from the local storage area in order to store new ones. In this case, because some binlogs on the master computing node 1011 were never sent to the slave computing node 1012, the slave computing node 1012 cannot replay those binlogs to keep the data in the slave storage node 1022 synchronized with the data in the master storage node 1021; that is, the data of the main center and the disaster recovery center become inconsistent. As a result, when the master computing node 1011 of the main center fails, the disaster recovery center cannot restore the data to the state at the time of the failure, so the recovery point objective (RPO) of the data processing system 100 cannot reach 0; that is, some data is lost, affecting the reliability of the data processing system 100.
On this basis, in the data processing system 100 provided in the present application, after generating a binlog, the master computing node 1011 sends the binlog to the storage cluster 102, and the storage cluster 102 stores the binlog. The slave computing node 1012 then reads the binlog from the storage cluster 102 and updates the data persistently stored in the storage cluster 102 by replaying the binlog (that is, replaying the database statements recorded in the binlog), specifically by updating the data in the slave storage node 1022, so that the data in the slave storage node 1022 remains consistent with the data in the master storage node 1021, or with the data in the master storage node 1021 plus the data cached by the master computing node 1011. Because the binlog generated by the master computing node 1011 is transmitted to the slave computing node 1012 through the storage side, this avoids the data desynchronization that arises when the slave computing node 1012 fails to obtain the binlog generated by the master computing node 1011 because the master computing node 1011 is overloaded or the data transmission link between the two computing nodes is unstable. In this way, when the master computing node 1011 fails, because the data in the slave storage node 1022 is consistent with the data in the master storage node 1021, the slave computing node 1012 can take over the services on the master computing node 1011 based on the data in the slave storage node 1022, so that the RPO of the data processing system 100 is 0, improving the reliability of the data processing system 100.
In addition, when the master computing node 1011 and the slave computing node 1012 both run multiple database applications among MySQL, PostgreSQL, OpenGauss, and Oracle, unified processing logic can be used between the master computing node 1011 and the slave computing node 1012 to replicate the binlogs generated by the master computing node 1011 to the slave computing node 1012, which improves the compatibility of the data processing system 100 with database applications and reduces the difficulty of deploying database applications on the computing cluster 101.
It should be noted that the data processing system 100 shown in FIG. 1 is merely an example; in actual application, the data processing system 100 may also be implemented in other ways. For ease of understanding, this embodiment provides the following implementation examples.
In a first implementation example, the storage cluster 102 may include only one storage node for the master computing node 1011 and the slave computing node 1012, so that the master computing node 1011 and the slave computing node 1012 share access to the data pages in that storage node.
In a second implementation example, a metadata management cluster may further be included between the computing cluster 101 and the storage cluster 102, and the metadata management cluster is responsible for managing the metadata stored in the storage cluster 102; correspondingly, a computing node in the computing cluster 101 may first obtain the metadata from the metadata management cluster, and then access the data stored in the storage cluster 102 according to the metadata.
In a third implementation example, the computing cluster and the storage cluster in the data processing system 200 may each include three or more nodes, as shown in FIG. 2. Specifically, the computing cluster includes multiple computing nodes 410, the computing nodes 410 can communicate with each other, and some computing nodes 410 may serve as disaster recovery for other computing nodes 410. Each computing node 410 is a computing device including a processor, such as a server or a desktop computer. In terms of hardware, as shown in FIG. 2, the computing node 410 includes at least a processor 412, a memory 413, a network card 414, and a storage medium 415. The processor 412 is a central processing unit (CPU) used to process data access requests from outside the computing node 410, or requests generated inside the computing node 410. The processor 412 reads data from the memory 413, or, when the total amount of data in the memory 413 reaches a certain threshold, the processor 412 sends the data stored in the memory 413 to the storage node 400 for persistent storage. FIG. 2 shows only one CPU 412; in actual applications, there are often multiple CPUs 412, and one CPU 412 has one or more CPU cores. This embodiment does not limit the number of CPUs or CPU cores. In addition, the processor 412 in the computing node 410 may also be used to implement the above-described functions of writing binlogs to the storage cluster and/or reading and replaying binlogs from the storage cluster, so as to achieve data synchronization between different storage nodes 400 in the storage cluster.
The memory 413 is an internal memory that exchanges data directly with the processor; it can read and write data at any time at high speed, and serves as temporary data storage for the operating system or other running programs. The memory includes at least two types of memory; for example, the memory may be random access memory or read-only memory (ROM). In actual applications, multiple memories 413, as well as different types of memories 413, may be configured in the computing node 410. This embodiment does not limit the number or type of the memories 413.
The network card 414 is used to communicate with the storage node 400. For example, when the total amount of data in the memory 413 reaches a certain threshold, the computing node 410 may send a request to the storage node 400 through the network card 414 to persistently store the data. In addition, the computing node 410 may further include a bus used for communication between the components inside the computing node 410. In actual implementation, the computing node 410 may also have a small number of built-in or externally connected hard disks.
Each computing node 410 can access the storage nodes 400 in the storage cluster over a network. The storage cluster includes multiple storage nodes 400, and some storage nodes 400 may serve as disaster recovery for other storage nodes 400. A storage node 400 includes one or more controllers 401, a network card 404, and multiple hard disks 405. The network card 404 is used to communicate with the computing nodes 410. The hard disks 405 are used to persistently store data and may be magnetic disks or other types of storage media, such as solid-state drives or shingled magnetic recording hard disks. The controller 401 is used to write data to or read data from the hard disks 405 according to the read/write data requests sent by the computing nodes 410. In the process of reading and writing data, the controller 401 needs to convert the addresses carried in the read/write data requests into addresses that the hard disks can recognize.
For ease of understanding and description, the process of achieving data synchronization between the main center (including the master computing node 1011 and the master storage node 1021) and the disaster recovery center (including the slave computing node 1012 and the slave storage node 1022) is described in detail below based on the data processing system 100 shown in FIG. 1.
Normally, one or more applications (such as MySQL) run on the master computing node 1011. For ease of understanding, the following takes a target application running on the master computing node 1011 as an example; when running, the target application enables the master computing node 1011 to provide data read/write services for users. Taking a user request to modify data as an example, after the master computing node 1011 receives a data update request for modifying data sent by the user through a client, the target application may first read the data page containing the data to be modified from the master storage node 1021 into a buffer pool in the master computing node 1011, and complete the modification of the data page in the buffer pool according to the data update request, specifically by changing the data on the data page to new data (the new data may be empty, in which case the data on the data page is deleted). At this point, the target application generates a binlog for the data modification, and the binlog records the database statement that performs the modification. For ease of distinction and description, the new data is referred to as target data below.
After completing the modification of the data in the buffer pool and generating the binlog, the master computing node 1011 can report to the client that the data was written/modified successfully. Because writing data into the buffer pool is usually faster than persistently storing it, this speeds up the master computing node 1011's response to data update requests. In practice, when the amount of data accumulated in the buffer pool reaches a threshold, the master computing node 1011 sends the data in the buffer pool to the master storage node 1021 for persistent storage, and may delete the binlogs corresponding to that data. Alternatively, the master computing node 1011 may report to the client that the data was written/modified successfully only after the binlog has been successfully sent to the storage cluster 102 for storage.
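A minimal sketch of this write path follows, under the simplifying assumptions that pages and persistent storage are plain dictionaries and that the flush threshold, like the function names, is hypothetical. It shows the first ordering described above: modify the page in the buffer pool, record the binlog entry, acknowledge the client, and persist in bulk once the threshold is reached, at which point the corresponding binlog entries may be dropped.

```python
FLUSH_THRESHOLD = 4   # hypothetical number of buffered pages that triggers a flush

buffer_pool = {}      # page_id -> page data, in-memory on the master computing node
storage = {}          # page_id -> page data, persistent on the master storage node
binlog = []           # (lsn, statement) entries whose pages are not yet persisted

def handle_update(page_id, new_data, statement):
    """Apply an update in memory, log it, ack the client, and flush if needed."""
    buffer_pool[page_id] = new_data                 # fast in-memory modification
    binlog.append((len(binlog) + 1, statement))     # record the logical log entry
    print("data written/modified successfully")     # ack before pages are persisted
    if len(buffer_pool) >= FLUSH_THRESHOLD:
        flush()

def flush():
    """Persist the buffered pages; their binlog entries are then deletable."""
    storage.update(buffer_pool)
    buffer_pool.clear()
    binlog.clear()
```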
In the master storage node 1021 (and the slave storage node 1022), data may be persistently stored in a file format, in which case a corresponding file system (FS) may be deployed in the master storage node 1021 (and the slave storage node 1022) to manage the persistently stored files. Alternatively, data may be persistently stored in a data block format; that is, when storing data, the master storage node 1021 (and the slave storage node 1022) divides the data into blocks of a fixed size, where the amount of data in each block may be, for example, 512 bytes or 4 kilobytes (KB). Alternatively, data may be stored in an object format, in which case an object is the basic unit in which a storage node stores data, and each object may include data together with the attributes of that data, where the attributes may be set according to the requirements of the applications on the computing nodes, including data distribution, quality of service, and the like. The storage format of the data is not limited in this embodiment.
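For the block format mentioned above, the following short sketch splits a byte stream into fixed-size blocks; 4 KB is one of the block sizes named in the text, and the zero-padding of the final partial block is an assumption made for illustration.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB, one of the fixed block sizes mentioned above

def split_into_blocks(data: bytes):
    """Split a byte stream into fixed-size blocks for block-format storage;
    the final partial block is zero-padded here (an illustrative choice)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    if blocks and len(blocks[-1]) < BLOCK_SIZE:
        blocks[-1] = blocks[-1].ljust(BLOCK_SIZE, b"\x00")
    return blocks

print([len(b) for b in split_into_blocks(b"x" * 10000)])  # [4096, 4096, 4096]
```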
For the process in which the master computing node 1011 sends the generated binlog to the storage cluster 102 and the storage cluster 102 stores the binlog, this embodiment provides the following exemplary implementations.
In a first possible implementation, the master computing node 1011 may send the binlog to the master storage node 1021 in the storage cluster 102, so that the master storage node 1021 saves the binlog. In the storage cluster 102, because the slave storage node 1022 serves as disaster recovery for the master storage node 1021, after the master computing node 1011 writes the binlog to the master storage node 1021, the master storage node 1021 may back up the written binlog and send the backup to the slave storage node 1022 in a wired or wireless manner. When it is determined that the slave storage node 1022 has successfully written the backed-up binlog, the master storage node 1021 may report to the master computing node 1011 that the binlog was written successfully.
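The acknowledgement order described here, where the master storage node confirms the write back to the master computing node only after the slave storage node has stored its copy, can be sketched as below. The class and method names are illustrative stand-ins, not an actual storage-node API.

```python
class SlaveStorageNode:
    """Disaster-recovery copy of the binlog."""
    def __init__(self):
        self.binlog = []

    def write(self, record) -> bool:
        self.binlog.append(record)
        return True  # confirm the backup copy is stored

class MasterStorageNode:
    def __init__(self, slave: SlaveStorageNode):
        self.binlog = []
        self.slave = slave

    def write(self, record) -> bool:
        """Store locally, replicate to the slave (wired or wireless transfer),
        and only then report success back to the master computing node."""
        self.binlog.append(record)
        return self.slave.write(record)

master = MasterStorageNode(SlaveStorageNode())
assert master.write((1, "UPDATE t SET v = 'b' WHERE id = 1"))  # ack after replication
```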
For example, as shown in FIG. 3, the master storage node 1021 and the slave storage node 1022 may be deployed in the same data center or the same AZ. In this case, a wired or wireless connection may be established between the master storage node 1021 and the slave storage node 1022, and the backed-up binlog is sent to the slave storage node 1022 for storage over that connection.
Alternatively, as shown in FIG. 4, the master storage node 1021 and the slave storage node 1022 may be deployed in different AZs, for example, the master storage node 1021 in AZ1 and the slave storage node 1022 in AZ2. In this case, the master storage node 1021 may send the backed-up binlog to the slave storage node 1022 for storage through a network card or a network interface.
The implementations shown in FIG. 3 and FIG. 4 above are merely examples. For instance, in other possible implementations, the master storage node 1021 may have multiple slave storage nodes serving as disaster recovery; in this case, some of those slave storage nodes may be deployed in the same physical area as the master storage node 1021, such as in the same data center or the same AZ, while the others may be deployed in a different physical area, such as a different data center or AZ, thereby improving the reliability of both locally and remotely stored data at the same time.
In this way, the slave computing node 1012 can read the binlog stored in the slave storage node 1022 and, by replaying the database statements recorded in the binlog, update the data stored in the slave storage node 1022 (specifically, the data on the data pages), thereby achieving data synchronization between the slave storage node 1022 and the master storage node 1021, which may also be called data synchronization between the disaster recovery center and the main center.
As an example, an input/output (IO) thread and a database thread (such as an SQL thread) may be created on the slave computing node 1012. The slave computing node 1012 can then use the IO thread to access the slave storage node 1022, read the binlogs from the slave storage node 1022, and store the read binlogs in a local storage area. The slave computing node 1012 can then use the database thread to replay the binlogs in the local storage area one by one. For example, each binlog in the local storage area may have a log sequence number (LSN), so that the database thread replays the binlogs in ascending LSN order. Specifically, when replaying each binlog, the database thread may parse out the database statement to be executed, such as an SQL statement, from the binlog, and perform semantic analysis and syntactic analysis on the database statement to determine its validity. Syntactic analysis checks the database statement against the grammar rules of the database language for syntax errors; semantic analysis checks whether the semantics of the database statement are valid. After the statement passes the validity check, the database thread may generate a plan tree for the database statement, where the plan tree indicates the execution plan for processing the data. Finally, after optimizing the plan tree, the database thread updates the data in the slave storage node 1022 according to the optimized plan tree.
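A compressed sketch of this replay pipeline follows, with the IO thread and the database (SQL) thread reduced to two functions. The validity check and plan-tree construction are placeholders, since the text names those steps without specifying their internals; all function names are assumptions.

```python
def io_thread(remote_binlog, local_store):
    """IO thread: copy binlog records from the slave storage node to a local area."""
    local_store.extend(remote_binlog)

def sql_thread(local_store, apply_plan):
    """Database thread: replay records in ascending LSN order."""
    for lsn, statement in sorted(local_store):     # (lsn, statement) sorts by LSN
        validate(statement)                        # syntactic + semantic checks
        plan = build_plan(statement)               # plan tree (placeholder)
        apply_plan(plan)                           # execute the optimized plan

def validate(statement):
    if not statement.strip():
        raise ValueError("invalid database statement")

def build_plan(statement):
    return statement  # stand-in for a real, optimized plan tree

local = []
io_thread([(2, "UPDATE t SET v = 'b' WHERE id = 1"),
           (1, "INSERT INTO t (id, v) VALUES (1, 'a')")], local)
sql_thread(local, apply_plan=print)  # replays LSN 1, then LSN 2
```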
In practice, the master computing node 1011 and the slave computing node 1012 may have the same configuration. For example, when the disaster recovery center is created, the configuration files in the master computing node 1011 and the master storage node 1021 may be backed up, and the backed-up configuration files may be sent to the slave computing node 1012 and the slave storage node 1022 in the disaster recovery center respectively, so that the slave computing node 1012, based on the received configuration file, has the same configuration as the master computing node 1011, and the slave storage node 1022, based on the received configuration file, has the same configuration as the master storage node 1021. In this way, when the master computing node 1011 sends a binlog to the master storage node 1021, the binlog in the master storage node 1021 can be mounted under a specified directory, so that after receiving the binlog sent by the master storage node 1021, the slave storage node 1022 can store the binlog in the storage location corresponding to that directory. The slave computing node 1012 can then, based on the directory configured identically with the master computing node 1011, read the binlogs stored under that directory in the slave storage node 1022.
In a second possible implementation, as shown in FIG. 5, a log storage area 501 may be configured in the storage cluster 102, and the log storage area 501 can be accessed by both the master computing node 1011 and the slave computing node 1012. For example, as shown in FIG. 5, the log storage area 501 may be part of the storage area on the master storage node 1021 or the slave storage node 1022, or may be a storage area on another storage node independent of the master storage node 1021 and the slave storage node 1022; this is not limited in this embodiment. In this case, the master computing node 1011 may send the binlog to the log storage area 501 in the storage cluster 102 for storage, for example, to the log storage area 501 under a specified directory. The slave computing node 1012 can then obtain the binlog generated by the master computing node 1011 by accessing the log storage area 501, for example, by accessing the corresponding log storage area 501 according to the specified directory. Furthermore, the slave computing node 1012 can achieve data synchronization between the master storage node 1021 and the slave storage node 1022 by replaying the binlog. For the specific process in which the slave computing node 1012 replays the binlog, reference may be made to the related description above, and details are not repeated here.
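Because the log storage area is addressed through an agreed directory, the mechanism reduces to the master appending records under a path both sides know and the slave listing the same path. A file-based sketch follows; the directory name is a hypothetical stand-in for a mounted shared log storage area 501.

```python
import os

LOG_DIR = "shared_binlog"  # hypothetical mount point of the shared log storage area

def master_write_binlog(lsn: int, statement: str) -> None:
    """Master computing node: persist one binlog record into the shared log area."""
    os.makedirs(LOG_DIR, exist_ok=True)
    with open(os.path.join(LOG_DIR, f"{lsn:016d}.binlog"), "w") as f:
        f.write(statement)

def slave_read_binlog():
    """Slave computing node: read records from the same directory in LSN order."""
    if not os.path.isdir(LOG_DIR):
        return []
    records = []
    for name in sorted(os.listdir(LOG_DIR)):  # zero-padded names sort by LSN
        with open(os.path.join(LOG_DIR, name)) as f:
            records.append((int(name.split(".")[0]), f.read()))
    return records

master_write_binlog(1, "INSERT INTO t (id, v) VALUES (1, 'a')")
print(slave_read_binlog())  # [(1, "INSERT INTO t (id, v) VALUES (1, 'a')")]
```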
It should be noted that the above two implementations are merely examples; in actual application, the slave computing node 1012 may also obtain the binlog generated by the master computing node 1011 from the storage cluster 102 in other ways.
In a further possible implementation, before the master computing node 1011 synchronizes a newly generated binlog to the slave computing node 1012 through the storage cluster 102, baseline replication may first be completed between the master storage node 1021 and the slave storage node 1022. Baseline replication means sending all the data persistently stored by the master storage node 1021 at a certain point in time (such as the current moment), together with all the binlogs already generated by the master computing node 1011, to the disaster recovery center. For example, when the slave storage node 1022 is created and the data synchronization process between the master storage node 1021 and the slave storage node 1022 is performed for the first time, baseline replication may be used to achieve data synchronization between them.
In a specific implementation, the master storage node 1021 may determine a first moment (such as the current moment) as the moment corresponding to the baseline, and determine the baseline data based on that moment. The baseline data includes the data persistently stored by the master storage node 1021 at the first moment (that is, the data stored on the data pages), and the binlogs generated by the master computing node 1011 before the first moment, that is, the binlogs whose LSNs are less than or equal to the LSN corresponding to the first moment. Normally, the updated data indicated by the binlogs in the baseline data is held in the buffer pool and has not yet been written down to the master storage node 1021 for persistent storage. The master storage node 1021 may then send the baseline data to the slave storage node 1022 in a wired or wireless manner. After the slave storage node 1022 successfully stores the baseline data, the slave computing node 1012 can replay each binlog in the baseline data in ascending LSN order, thereby updating the data-page data in the baseline data and synchronizing the data stored by the main center at the first moment to the disaster recovery center. Subsequently, when the master computing node 1011 generates a binlog for data newly written by a user, the master computing node 1011 sends the binlog to the slave computing node 1012 through the storage cluster 102, and the slave computing node 1012 replays the binlog to update the baseline data stored in the slave storage node 1022, thereby keeping the data of the main center and the disaster recovery center synchronized in a timely manner.
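The baseline/incremental split described above can be sketched as two steps: package the pages persisted at the baseline moment together with every binlog whose LSN is at or below the baseline LSN, then have the slave install the pages and replay that binlog in LSN order. The data structures and names below are illustrative assumptions.

```python
def make_baseline(persisted_pages: dict, binlog: list, baseline_lsn: int) -> dict:
    """Baseline data = pages persisted at the baseline moment, plus all binlog
    records with LSN <= baseline_lsn (whose effects are typically still buffered)."""
    return {
        "pages": dict(persisted_pages),
        "binlog": [(lsn, stmt) for lsn, stmt in binlog if lsn <= baseline_lsn],
    }

def restore_from_baseline(baseline: dict, replay) -> dict:
    """Slave side: install the pages, then replay the baseline binlog in LSN order."""
    pages = dict(baseline["pages"])
    for lsn, statement in sorted(baseline["binlog"]):
        replay(pages, statement)
    return pages

# Binlogs with LSN > baseline_lsn are then shipped and replayed incrementally.
```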
In practice, some binlogs in the master storage node 1021 may be deleted before being sent to the slave storage node 1022 in time, for example, deleted because their lifetime in the master storage node 1021 exceeds a preset duration. In this case, the slave storage node 1022 cannot obtain the deleted binlogs to stay synchronized with the master storage node 1021, so data synchronization between the master storage node 1021 and the slave storage node 1022 can be achieved by performing baseline replication again.
While the data processing system 100 is running, the slave computing node 1012 can detect in real time or periodically whether the master computing node 1011 has failed; for example, the slave computing node 1012 may determine that the master computing node 1011 has failed when it does not receive a heartbeat message sent by the master computing node 1011, or when it receives a failure notification sent by a third-party arbitration server. When it determines that the master computing node 1011 has failed, the slave computing node 1012 is promoted to master computing node and instructs the slave storage node 1022 to be promoted to master storage node. At this point, the slave computing node 1012 may first check whether any binlog that has not yet been replayed is stored in the slave storage node 1022 or the log storage area 501; if so, the slave computing node 1012 first reads and replays that binlog to finish updating the data persistently stored in the slave storage node 1022, and the updated data is the data stored by the main center at the time the master computing node 1011 failed. The slave computing node 1012 then continues to take over the services on the master computing node 1011 based on the data in the slave storage node 1022.
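A sketch of the failover sequence in this paragraph: declare a failure on heartbeat timeout or an arbitration notice, drain any binlog that has not yet been replayed, and only then take over the services. The timeout value and all names are assumptions.

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds; hypothetical failure-detection threshold

def master_failed(last_heartbeat: float, arbitration_notice: bool) -> bool:
    """Failure is declared on a missing heartbeat or an external arbitration notice."""
    return arbitration_notice or (time.time() - last_heartbeat > HEARTBEAT_TIMEOUT)

def failover(pending_binlog: list, pages: dict, replay) -> str:
    """Promote the slave: replay unfinished binlog first, then serve requests."""
    for lsn, statement in sorted(pending_binlog):  # ascending LSN order
        replay(pages, statement)                   # bring slave storage up to date
    return "slave promoted; taking over read/write services"
```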
In the process of taking over the services, when a user requests modification of the data persistently stored in the data processing system 100, the slave computing node 1012 can generate the corresponding binlog and write it to the slave storage node 1022 or the log storage area 501 in the storage cluster 102, so that after the master computing node 1011 recovers from the failure, it can achieve data synchronization based on the binlogs stored in the storage cluster 102. After recovering from the failure, the master computing node 1011 may serve as disaster recovery for the slave computing node 1012; alternatively, the master computing node 1011 may be restored to the master role again through a master-slave switchover, which is not limited in this embodiment.
In this embodiment, because the binlog generated by the master computing node 1011 is transmitted to the slave computing node 1012 through the storage side, this avoids the data desynchronization that arises when the slave computing node 1012 fails to obtain the binlog generated by the master computing node 1011 because the master computing node 1011 is overloaded or the data transmission link between the master computing node 1011 and the slave computing node 1012 is unstable. In this way, when the master computing node 1011 fails, because the data in the slave storage node 1022 is consistent with the data in the master storage node 1021, the slave computing node 1012 can take over the services on the master computing node 1011 based on the data in the slave storage node 1022, so that the RPO of the data processing system 100 is 0, improving the reliability of the data processing system 100.
In addition, when the master computing node 1011 and the slave computing node 1012 both run multiple database applications, unified processing logic can be used between them to replicate the binlogs generated by the master computing node 1011 to the slave computing node 1012, which improves the compatibility of the data processing system 100 with database applications and reduces the difficulty of deploying database applications on the computing cluster 101.
In practice, the master storage node 1021 and the slave storage node 1022 may be implemented by storage arrays, or by devices including such storage arrays, so that the master storage node 1021 and the slave storage node 1022 can persistently store data based on the storage arrays; moreover, technologies such as redundant arrays of independent disks (RAID), erasure coding (EC), deduplication and compression, and data backup can be applied on the storage arrays to further improve the reliability of the data persistently stored in the master storage node 1021 and the slave storage node 1022.
In the data processing system 100 shown in FIG. 1 to FIG. 5 above, the master computing node 1011 and the slave computing node 1012 are each configured with their own storage node, the master storage node 1021 can be accessed only by the master computing node 1011, and the slave storage node 1022 can be accessed only by the slave computing node 1012. In other possible data processing systems, the master computing node 1011 and the slave computing node 1012 may also share the same storage node; that is, the data on the data pages in that storage node may be accessed by both the master computing node 1011 and the slave computing node 1012. Such a data processing system is described in detail below with reference to FIG. 6.
Referring to FIG. 6, which shows a schematic structural diagram of another data processing system provided in the present application. As shown in FIG. 6, the data processing system 600 still adopts a storage-compute separation architecture and includes a computing cluster 601 and a storage cluster 602, where the computing cluster 601 and the storage cluster 602 can communicate over a network.
The computing cluster 601 includes multiple computing nodes. For ease of description, FIG. 6 takes as an example the computing cluster 601 including a master computing node 6011 and a slave computing node 6012, where the slave computing node 6012 serves as disaster recovery for the master computing node 6011, specifically as a hot standby or a cold standby.
The storage cluster 602 includes at least one storage node; FIG. 6 takes a storage node 6021 as an example, and the storage cluster 602 further includes a log storage area 6022. As shown in FIG. 6, the log storage area 6022 may be deployed in the storage node 6021, or may be deployed independently of the storage node 6021, for example, on another storage node in the storage cluster 602; this is not limited in this embodiment. The storage node 6021 further includes a data storage area (not shown in FIG. 6), and both the data storage area and the log storage area 6022 can be accessed by the master computing node 6011 and the slave computing node 6012. The storage node 6021 uses the data storage area to persistently store data, for example, the data generated by the master computing node 6011 when processing services. The log storage area 6022 is used to store the binlogs generated by the master computing node 6011 during operation.
After receiving a data update request for updating the data persistently stored in the storage node 6021 (including adding, deleting, and modifying data), the master computing node 6011 may read the data page containing the data to be modified from the storage node 6021 into the buffer pool in the master computing node 6011, complete the modification of the data page in the buffer pool according to the data update request, and generate a binlog for the data modification. The master computing node 6011 may then report the successful data update to the user side and send the generated binlog to the log storage area 6022 in the storage cluster 602 for storage (and send the updated data page to the storage node 6021 for persistent storage); alternatively, the master computing node 6011 may report the successful data update to the user side only after the generated binlog has been sent to the log storage area 6022 in the storage cluster 602 for storage, which is not limited in this embodiment.
值得注意的是，由于主计算节点6011与从计算节点6012共享存储节点6021中的数据页面，因此，在主计算节点6011将binlog成功写入日志存储区域6022后，若主计算节点6011正常运行，则从计算节点6012可以不用读取以及回放日志存储区域6022中的binlog。实际应用时，当主计算节点6011将缓冲池中的数据写入存储节点6021，则该缓冲池中的数据所对应的binlog可以从日志存储区域6022中进行淘汰。It is worth noting that since the master computing node 6011 and the slave computing node 6012 share the data pages in the storage node 6021, after the master computing node 6011 successfully writes the binlog into the log storage area 6022, the slave computing node 6012 does not need to read and replay the binlog in the log storage area 6022 as long as the master computing node 6011 operates normally. In actual application, when the main computing node 6011 writes the data in the buffer pool to the storage node 6021, the binlog corresponding to the data in the buffer pool can be eliminated from the log storage area 6022.
从计算节点6012可以实时或者周期性的检测主计算节点6011是否发生故障,并且,当确定主计算节点6011故障时,从计算节点6012升级为主节点,并检测日志存储区域6022中是否存在binlog。若存在,表明主计算节点6011在故障时,主计算节点6011的缓冲池中可能存在新数据未被持久化存储至存储节点6021,此时,从计算节点6012可以读取日志存储区域6022中的binlog,并通过回放该binlog的方式,对存储节点6021中的数据进行更新,以恢复主计算节点6011在故障时缓存于缓冲池中的数据,从而实现将数据处理系统600中的数据恢复至主计算节点6011故障时的状态,也即实现数据处理系统600的RPO为0。The slave computing node 6012 can detect in real time or periodically whether the master computing node 6011 fails, and when it is determined that the master computing node 6011 fails, the slave computing node 6012 is upgraded to the master node, and detects whether there is a binlog in the log storage area 6022. If there is, it indicates that when the master computing node 6011 fails, there may be new data in the buffer pool of the master computing node 6011 that has not been persistently stored in the storage node 6021. At this time, the slave computing node 6012 can read the binlog in the log storage area 6022, and update the data in the storage node 6021 by replaying the binlog to restore the data cached in the buffer pool of the master computing node 6011 when the failure occurs, thereby realizing the restoration of the data in the data processing system 600 to the state when the master computing node 6011 fails, that is, realizing the RPO of the data processing system 600 to be 0.
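The failover behavior can be sketched in the same vein, under the assumption that the slave node can poll the master's health and replay whatever binlog records remain in the shared log storage area; SlaveNode and its methods are hypothetical stand-ins:

```python
import time

class SlaveNode:
    """Hypothetical slave node; the methods are illustrative stubs."""
    def __init__(self):
        self.is_master = False
        self.data = {}

    def promote_to_master(self):
        self.is_master = True

    def replay(self, record):
        self.data[record.lsn] = record.statement  # re-apply the lost update

def failover_loop(slave, log_store, master_alive, poll_interval_s=1.0):
    # master_alive is a callable standing in for the health check.
    while master_alive():
        time.sleep(poll_interval_s)               # periodic failure detection
    slave.promote_to_master()                     # slave upgrades to master
    # Any binlog left in the log storage area means the failed master's
    # buffer pool held updates that were never flushed to the storage node.
    for record in sorted(log_store, key=lambda r: r.lsn):
        slave.replay(record)
    log_store.clear()                             # data fully recovered: RPO = 0
```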
值得注意的是，从计算节点6012在回放日志存储区域6022中的binlog的过程中，可以先将日志存储区域6022中的binlog读取至从计算节点6012的本地存储空间，然后再按照LSN由小到大的顺序，对本地存储空间中的各个binlog依次执行回放操作。或者，从计算节点6012也可以是直接按照LSN由小到大的顺序，依次读取日志存储区域6022中的各个binlog，并直接对读取的binlog执行回放操作。如此，可以进一步减少回放binlog所需的资源消耗，减小数据恢复时延，从而降低数据处理系统600的恢复时间目标(recovery time objective,RTO)。其中，RTO是指灾难发生后，从数据处理系统600业务停顿之刻开始，到数据处理系统600恢复业务结束，这两个时刻之间的时间间隔。It is worth noting that, when the slave computing node 6012 plays back the binlog in the log storage area 6022, it can first read the binlog in the log storage area 6022 to the local storage space of the slave computing node 6012, and then perform the playback operation on each binlog in the local storage space in order of LSN from small to large. Alternatively, the slave computing node 6012 can also directly read each binlog in the log storage area 6022 in order of LSN from small to large, and directly perform the playback operation on the read binlog. In this way, the resource consumption required for playing back the binlog can be further reduced and the data recovery delay can be shortened, thereby reducing the recovery time objective (RTO) of the data processing system 600. Here, RTO refers to the time interval between the moment when the services of the data processing system 600 are suspended after a disaster occurs and the moment when the data processing system 600 finishes resuming its services.
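The two replay strategies can be contrasted in a short sketch; the log storage area is modeled as a plain list of records, which is an assumption (in practice it would be a remote store with its own read interface):

```python
def replay_via_local_copy(log_store, slave):
    # Strategy (a): first copy every binlog to the slave's local storage
    # space, then replay them in ascending LSN order.
    local = list(log_store)
    for record in sorted(local, key=lambda r: r.lsn):
        slave.replay(record)

def replay_streaming(log_store, slave):
    # Strategy (b): read the binlogs from the log storage area one by one
    # in ascending LSN order and replay each immediately, with no local copy.
    for record in sorted(log_store, key=lambda r: r.lsn):
        slave.replay(record)
```

Strategy (b) avoids the local staging copy, which is the saving in resources and recovery delay that the paragraph above refers to.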
需要说明的是,上述图1至图6所示的数据处理系统中,主计算节点以及从计算节点所执行的操作,可以由部署于其上的应用实现,该应用例如可以是上述MySQL、PostgreSQL、OpenGauss、或Oracle等数据库应用,或者可以是其他应用。It should be noted that in the data processing system shown in Figures 1 to 6 above, the operations performed by the master computing node and the slave computing node can be implemented by an application deployed thereon, which application can be, for example, the above-mentioned database application such as MySQL, PostgreSQL, OpenGauss, or Oracle, or can be other applications.
这样,通过对主计算节点以及从计算节点上部署的现有应用进行版本更新,即可实现binlog通过存储侧由主计算节点传输至从计算节点,从而实现主中心与灾备中心之间的数据同步。In this way, by updating the versions of existing applications deployed on the master computing node and the slave computing node, the binlog can be transmitted from the master computing node to the slave computing node through the storage side, thereby realizing data synchronization between the main center and the disaster recovery center.
或者,主计算节点以及从计算节点所执行的操作,也可以是在计算集群中单独部署的数据处理装置执行,即主计算节点可以是在该数据处理装置的控制下,将生成的binlog写入存储集群,而从计算节点可以在该数据处理装置的控制下,从存储集群中读取binlog并回放该binlog。Alternatively, the operations performed by the master computing node and the slave computing node may also be performed by a data processing device deployed separately in the computing cluster, that is, the master computing node may write the generated binlog to the storage cluster under the control of the data processing device, and the slave computing node may read the binlog from the storage cluster and replay the binlog under the control of the data processing device.
示例性地,数据处理装置可以通过软件或者硬件实现。By way of example, the data processing device may be implemented by software or hardware.
其中,当通过软件实现时,数据处理装置例如可以是部署于硬件设备上的程序代码等。实际应用时,数据处理装置例如可以是作为插件、组件或者应用等软件形式部署于主计算节点和/或从计算节点(例如,部署在主计算节点和/或从计算节点的控制器中)。此时,通过在主计算节点和/或从计算节点上部署该数据处理装置,即可实现在主计算节点与从计算节点之间通过存储侧完成binlog的传输,这可以减少或者无需对部署于主计算节点以及从计算节点上的数据库应用进行修改,降低方案实施的难度。Among them, when implemented by software, the data processing device can be, for example, a program code deployed on a hardware device. In actual application, the data processing device can be, for example, deployed in the main computing node and/or the slave computing node in the form of software such as a plug-in, component or application (for example, deployed in the controller of the main computing node and/or the slave computing node). At this time, by deploying the data processing device on the main computing node and/or the slave computing node, the transmission of binlog can be completed between the main computing node and the slave computing node through the storage side, which can reduce or eliminate the need to modify the database application deployed on the main computing node and the slave computing node, reducing the difficulty of implementing the solution.
或者，上述数据处理装置可以通过物理设备实现，其中，该物理设备例如可以是CPU，或者可以是专用集成电路(application-specific integrated circuit,ASIC)、可编程逻辑器件(programmable logic device,PLD)、复杂可编程逻辑器件(complex programmable logic device,CPLD)、现场可编程门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)、片上系统(system on chip,SoC)、软件定义架构(software-defined infrastructure,SDI)芯片、人工智能(artificial intelligence,AI)芯片、数据处理单元(data processing unit,DPU)等任意一种处理器或其任意组合，本实施例对此并不进行限定。Alternatively, the above-mentioned data processing device can be implemented by a physical device, wherein the physical device can be, for example, a CPU, or can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), a system on chip (SoC), a software-defined infrastructure (SDI) chip, an artificial intelligence (AI) chip, a data processing unit (DPU), or any other processor or any combination thereof, and this embodiment does not limit this.
上述结合图1至图6,介绍了通过在存储侧传输binlog实现数据处理系统中的主中心与灾备中心之间实现数据同步的过程。下面,结合图7,从方法流程的角度,对主中心与灾备中心实现数据同步的流程进行示例性描述。参见图7,示出了本申请实施例提供的一种数据处理方法的流程图。为便于理解,图7中以应用于图1所示的数据处理系统100为例进行说明,如图7所示,该方法具体可以包括:The above, in combination with Figures 1 to 6, introduces the process of realizing data synchronization between the main center and the disaster recovery center in the data processing system by transmitting binlog on the storage side. Below, in combination with Figure 7, an exemplary description of the process of realizing data synchronization between the main center and the disaster recovery center is given from the perspective of the method flow. Referring to Figure 7, a flow chart of a data processing method provided by an embodiment of the present application is shown. For ease of understanding, Figure 7 is used as an example to illustrate the data processing system 100 shown in Figure 1. As shown in Figure 7, the method may specifically include:
S701:主计算节点1011接收数据更新请求,该数据更新请求用于请求数据处理系统100对持久化存储的数据进行更新。S701: The main computing node 1011 receives a data update request, where the data update request is used to request the data processing system 100 to update persistently stored data.
实际应用时,主计算节点1011可以接收用户侧的客户端或者其它设备发送的数据更新请求,该数据更新请求可以用于请求对数据处理系统100中所持久化存储的数据进行修改,或者可以用于请求向数据处理系统100写入新数据等。 In actual application, the main computing node 1011 can receive a data update request sent by a client or other device on the user side. The data update request can be used to request modification of data persistently stored in the data processing system 100, or can be used to request writing new data to the data processing system 100, etc.
S702:主计算节点1011响应该数据更新请求,在缓冲池中完成数据更新,并针对该数据更新请求生成相应的binlog。S702: The main computing node 1011 responds to the data update request, completes the data update in the buffer pool, and generates a corresponding binlog for the data update request.
S703:主计算节点1011将binlog发送至存储集群102中进行存储。S703: The main computing node 1011 sends the binlog to the storage cluster 102 for storage.
在第一种实现示例中,存储集群102中包括主存储节点1021以及从存储节点1022,其中,主存储节点1021支持主计算节点1011的数据读写,从存储节点1022支持从计算节点1012的数据读写。这样,主计算节点1011可以将该binlog写入主存储节点1021,并由主存储节点1021对该binlog进行备份,然后将备份的binlog发送至从存储节点1022中进行存储。其中,主存储节点1021与从存储节点1022可以部署于同一物理区域(如同一数据中心/AZ),或者可以部署于不同物理区域。In the first implementation example, the storage cluster 102 includes a master storage node 1021 and a slave storage node 1022, wherein the master storage node 1021 supports the data reading and writing of the master computing node 1011, and the slave storage node 1022 supports the data reading and writing of the slave computing node 1012. In this way, the master computing node 1011 can write the binlog to the master storage node 1021, and the master storage node 1021 backs up the binlog, and then sends the backed-up binlog to the slave storage node 1022 for storage. The master storage node 1021 and the slave storage node 1022 can be deployed in the same physical area (such as the same data center/AZ), or can be deployed in different physical areas.
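A compact sketch of this first example, with both storage nodes modeled as simple lists (an assumption; real storage nodes would replicate the backup copy over the network between data centers or AZs):

```python
def replicate_binlog(master_store, slave_store, record):
    master_store.append(record)   # binlog written by the master computing node
    slave_store.append(record)    # master storage node forwards a backup copy
```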
在第二种实现示例中,存储集群102中可以部署有日志存储区域,如上述日志存储区域501等,从而主计算节点1011可以将生成的binlog写入该日志存储区域,该日志存储区域能够被主计算节点1011以及从计算节点1012访问。In the second implementation example, a log storage area, such as the above-mentioned log storage area 501, can be deployed in the storage cluster 102, so that the master computing node 1011 can write the generated binlog into the log storage area, and the log storage area can be accessed by the master computing node 1011 and the slave computing node 1012.
S704:从计算节点1012从存储集群102中读取binlog。S704 : The slave computing node 1012 reads the binlog from the storage cluster 102 .
具体实现时,从计算节点1012可以从存储集群102中的从存储节点1022或者日志存储区域中读取binlog等。In specific implementation, the slave computing node 1012 may read binlogs etc. from the slave storage node 1022 or the log storage area in the storage cluster 102 .
其中,从计算节点1012可以与主计算节点1011具有相同的配置。例如,在创建从计算节点1012的过程中,主计算节点1011可以对自身的配置文件等数据进行备份,并将备份的数据发送至从计算节点1012,从而从计算节点1012根据接收到的备份数据完成相应的配置,如配置处理业务的逻辑、运行在从计算节点1012上的应用、存储集群102中的binlog所挂载的目录等。The slave computing node 1012 may have the same configuration as the master computing node 1011. For example, in the process of creating the slave computing node 1012, the master computing node 1011 may back up its own configuration files and other data, and send the backed-up data to the slave computing node 1012, so that the slave computing node 1012 completes the corresponding configuration according to the received backup data, such as configuring the logic of processing services, the application running on the slave computing node 1012, and the directory where the binlog in the storage cluster 102 is mounted.
S705:从计算节点1012回放所读取到的binlog,以更新存储集群102中的从存储节点1022中持久化存储的数据。S705 : Play back the read binlog from the computing node 1012 to update the data persistently stored in the slave storage node 1022 in the storage cluster 102 .
在图1所示的数据处理系统100中,从计算节点1012具体可以是从存储节点1022或者日志存储区域中读取该binlog,并通过回放该binlog,实现将从存储节点1022中的数据保持与主中心中的数据处于同步状态。具体实现时,从计算节点1012可以从binlog中解析出数据库语句,如SQL语句等,并对该数据库语句进行语义分析和语法分析,以确定该数据库语句的合法性。在通过合法性校验后,从计算节点1012可以针对该数据库语句生成计划树,该计划树指示了针对数据进行处理的执行计划。最后,从计算节点1012可以在完成对该计划树的优化后,根据优化后的计划树实现对从存储节点1022中的数据更新。In the data processing system 100 shown in Figure 1, the slave computing node 1012 can specifically read the binlog from the storage node 1022 or the log storage area, and by replaying the binlog, the data in the slave storage node 1022 can be kept in a synchronized state with the data in the main center. In specific implementation, the slave computing node 1012 can parse database statements, such as SQL statements, from the binlog, and perform semantic analysis and grammatical analysis on the database statements to determine the legitimacy of the database statements. After passing the legitimacy check, the slave computing node 1012 can generate a plan tree for the database statement, which indicates an execution plan for processing the data. Finally, after completing the optimization of the plan tree, the slave computing node 1012 can implement the update of the data in the slave storage node 1022 according to the optimized plan tree.
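That replay pipeline can be outlined as below; every helper is a hypothetical stub standing in for the real statement parser, legality checks, plan-tree builder, optimizer, and executor described above:

```python
def parse_statement(record):
    return record.statement                 # e.g. an SQL statement carried in the binlog

def check_legality(stmt):
    if not stmt.strip():                    # stands in for syntax + semantic analysis
        raise ValueError("illegal database statement")

def build_plan_tree(stmt):
    return ("exec", stmt)                   # stands in for a real plan tree

def optimize(plan):
    return plan                             # plan-tree optimization placeholder

def execute(plan, slave_store):
    slave_store.append(plan[1])             # apply the update to the slave storage node

def replay_binlog_record(record, slave_store):
    stmt = parse_statement(record)
    check_legality(stmt)                    # legality check before execution
    plan = optimize(build_plan_tree(stmt))  # generate and optimize the plan tree
    execute(plan, slave_store)              # update the slave storage node's data
```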
这样,在主计算节点1011发生故障后,从计算节点1012可以通过从存储节点1022中与主中心保持同步状态的数据,接管主计算节点1011上的业务,以此实现数据处理系统100的故障恢复。In this way, after the main computing node 1011 fails, the slave computing node 1012 can take over the business on the main computing node 1011 by using the data in the storage node 1022 that is synchronized with the main center, thereby realizing fault recovery of the data processing system 100.
其中,在从计算节点1012回放binlog之前,主计算节点1011可以控制主存储节点1021向从存储节点1022发送基线数据,以完成基线复制,其基线复制的具体实现过程,可参见前述实施例的相关之处描述,在此不做赘述。Among them, before replaying the binlog from the computing node 1012, the master computing node 1011 can control the master storage node 1021 to send baseline data to the slave storage node 1022 to complete the baseline replication. The specific implementation process of the baseline replication can be found in the relevant description of the aforementioned embodiment and will not be repeated here.
而在其他可能的数据处理系统,如图6所示的数据处理系统600等,主计算节点1011与从计算节点1012可以共享同一存储节点,以下称之为目标存储节点。则,在主计算节点1011处于正常运行的过程中,对于主计算节点1011写入存储集群102中的binlog,从计算节点1012可以不用执行从存储集群102中读取binlog并执行回放binlog的操作。此时,存储集群102中所存储的binlog,为主计算节点1011的缓冲池存储的更新后的数据所对应的binlog;当缓冲池中的数据被写入存储集群102时,该缓冲池中的数据所对应的binlog可以从存储集群102中淘汰。而当主计算节点1011故障时,主计算节点1011的缓冲池中所缓存的数据因为尚未完成持久化存储,从而可能会因为主计算节点1011的故障而发生丢失,此时,从计算节点1012通过从存储集群102中读取binlog,并执行回放binlog的操作,实现对目标存储节点中持久化存储的数据进行更新,以此实现在目标存储节点中恢复出主计算节点1011的缓冲池中尚未完成持久化存储的数据。In other possible data processing systems, such as the data processing system 600 shown in FIG6 , the master computing node 1011 and the slave computing node 1012 may share the same storage node, which is referred to as the target storage node below. Then, when the master computing node 1011 is in normal operation, for the binlog written by the master computing node 1011 into the storage cluster 102, the slave computing node 1012 does not need to perform the operation of reading the binlog from the storage cluster 102 and replaying the binlog. At this time, the binlog stored in the storage cluster 102 is the binlog corresponding to the updated data stored in the buffer pool of the master computing node 1011; when the data in the buffer pool is written into the storage cluster 102, the binlog corresponding to the data in the buffer pool can be eliminated from the storage cluster 102. When the main computing node 1011 fails, the data cached in the buffer pool of the main computing node 1011 may be lost due to the failure of the main computing node 1011 because it has not yet completed persistent storage. At this time, the slave computing node 1012 reads the binlog from the storage cluster 102 and performs the operation of replaying the binlog to update the data persistently stored in the target storage node, thereby restoring the data in the buffer pool of the main computing node 1011 that has not yet completed persistent storage in the target storage node.
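The retention rule in this shared-storage case can be summarized in a sketch (interfaces assumed): binlogs only need to cover updates still sitting in the master's buffer pool, so once those pages are flushed to the target storage node, the corresponding binlogs can be evicted.

```python
def flush_and_evict(buffer_pool, target_store, log_store, flushed_lsn):
    # Persist the buffered pages to the shared target storage node.
    target_store.update(buffer_pool)
    buffer_pool.clear()
    # Binlogs at or below the flushed LSN describe data that is now
    # persistently stored, so they can be evicted from the log storage area.
    log_store[:] = [r for r in log_store if r.lsn > flushed_lsn]
```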
需要说明的是,图7所示的步骤S701至步骤S705,对应于上述图1至图6所示的系统实施例,因此,步骤S701至步骤S705的具体实现过程,可参见前述实施例的相关之处描述,在此不做重述。It should be noted that steps S701 to S705 shown in FIG. 7 correspond to the system embodiments shown in FIG. 1 to FIG. 6 above. Therefore, the specific implementation process of steps S701 to S705 can be found in the relevant description of the aforementioned embodiments and will not be repeated here.
上文中结合图1至图7,详细描述了本申请所提供的数据处理系统,下面将结合图8和图9,描述根据本申请所提供的数据处理装置、数据处理设备。The data processing system provided by the present application is described in detail above in conjunction with Figures 1 to 7. The data processing apparatus and data processing device provided by the present application will be described below in conjunction with Figures 8 and 9.
与上述方法同样的发明构思，本申请实施例还提供一种数据处理装置。参见图8，示出了本申请实施例提供的一种数据处理装置的示意图。其中，图8所示的数据处理装置800位于数据处理系统，如图1所示的数据处理系统100、图6所示的数据处理系统600等，该数据处理系统包括计算集群、存储集群，所述计算集群与所述存储集群通过网络进行连接，所述计算集群包括主计算节点以及从计算节点，所述存储集群包括至少一个存储节点，通常情况下，从计算节点作为主计算节点的灾备。With the same inventive concept as the above method, the embodiment of the present application also provides a data processing device. Referring to FIG8, a schematic diagram of a data processing device provided by an embodiment of the present application is shown. The data processing device 800 shown in FIG8 is located in a data processing system, such as the data processing system 100 shown in FIG1, the data processing system 600 shown in FIG6, etc. The data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster includes a master computing node and a slave computing node, the storage cluster includes at least one storage node, and usually, the slave computing node serves as a disaster recovery for the master computing node.
如图8所示,数据处理装置800包括:As shown in FIG8 , the data processing device 800 includes:
存储模块801,用于指示主计算节点将响应于数据更新请求所生成二进制日志binlog发送至存储集群中进行存储;The storage module 801 is used to instruct the main computing node to send the binary log binlog generated in response to the data update request to the storage cluster for storage;
读取模块802,用于指示从计算节点读取存储集群中存储的binlog至从计算节点;A reading module 802 is used to instruct the slave computing node to read the binlog stored in the storage cluster to the slave computing node;
回放模块803,用于指示从计算节点通过回放binlog来更新存储集群中持久化存储的数据。The playback module 803 is used to instruct the slave computing node to update the persistently stored data in the storage cluster by replaying the binlog.
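Purely as a sketch of how the three modules of apparatus 800 might be composed in software (the class and method names are assumptions, not the patent's implementation):

```python
class DataProcessingApparatus:
    def __init__(self, master, slave, storage_cluster):
        self.master, self.slave, self.storage = master, slave, storage_cluster

    def store(self, binlog):          # storage module 801
        self.master.send_binlog(self.storage, binlog)

    def read(self):                   # reading module 802
        return self.slave.read_binlog(self.storage)

    def replay(self, binlog):         # playback module 803
        self.slave.replay_binlog(binlog)
```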
在一种可能的实施方式中,存储集群包括日志存储区域,日志存储区域被主计算节点以及从计算节点访问;In one possible implementation, the storage cluster includes a log storage area, and the log storage area is accessed by the master computing node and the slave computing node;
存储模块801,具体用于指示主计算节点将binlog发送至日志存储区域中进行存储;Storage module 801 is specifically used to instruct the main computing node to send the binlog to the log storage area for storage;
读取模块802,具体用于指示从计算节点从日志存储区域中读取binlog至从存储节点;其中,存储集群还包括数据存储区域,数据存储区域用于存储业务数据,数据存储区域被主计算节点以及从计算节点访问,或者,数据存储区域仅被主计算节点访问。The reading module 802 is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave storage node; wherein, the storage cluster also includes a data storage area, the data storage area is used to store business data, the data storage area is accessed by the master computing node and the slave computing node, or the data storage area is only accessed by the master computing node.
在一种可能的实施方式中，回放模块803，具体用于指示从计算节点通过回放binlog以与主计算节点同步数据，或者，具体用于指示从计算节点通过回放binlog以恢复主计算节点故障时丢失的数据。In a possible implementation, the playback module 803 is specifically used to instruct the slave computing node to synchronize data with the master computing node by replaying the binlog, or is specifically used to instruct the slave computing node to recover data lost when the master computing node fails by replaying the binlog.
在一种可能的实施方式中,至少一个存储节点包括主存储节点以及从存储节点,从存储节点作为主存储节点的灾备,主存储节点与从存储节点部署于同一数据中心或者同一可用区;In a possible implementation, at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in the same data center or the same availability zone;
读取模块802,具体用于在主计算节点正常运行的过程中,指示从计算节点从日志存储区域中读取binlog至从计算节点;The reading module 802 is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave computing node during the normal operation of the master computing node;
回放模块803,具体用于指示从计算节点通过回放binlog来更新从存储节点中持久化存储的数据。The playback module 803 is specifically used to instruct the slave computing node to update the data persistently stored in the slave storage node by replaying the binlog.
在一种可能的实施方式中,至少一个存储节点包括目标存储节点,目标存储节点用于持久化存储主计算节点所写入的数据;In a possible implementation, the at least one storage node includes a target storage node, and the target storage node is used to persistently store data written by the primary computing node;
读取模块802,具体用于当主计算节点发生故障,指示从计算节点从日志存储区域中读取binlog;The reading module 802 is specifically used to instruct the slave computing node to read the binlog from the log storage area when the master computing node fails;
回放模块803,具体用于指示从计算节点通过回放binlog来更新目标存储节点中持久化存储的数据。The playback module 803 is specifically used to instruct the slave computing node to update the persistently stored data in the target storage node by replaying the binlog.
在一种可能的实施方式中,至少一个存储节点包括主存储节点以及从存储节点,从存储节点作为主存储节点的灾备,主存储节点与从存储节点部署于不同的数据中心,或者,主存储节点与从存储节点部署于不同的可用区;In a possible implementation, at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in different data centers, or the primary storage node and the secondary storage node are deployed in different availability zones;
存储模块801,具体用于指示主计算节点将binlog发送至主存储节点中进行存储;Storage module 801 is specifically used to instruct the main computing node to send the binlog to the main storage node for storage;
读取模块802,具体用于指示从计算节点读取从存储节点存储的binlog,从存储节点中的binlog是由主存储节点发送的。The reading module 802 is specifically used to instruct the slave computing node to read the binlog stored in the slave storage node, where the binlog in the slave storage node is sent by the master storage node.
在一种可能的实施方式中,存储模块801,还用于:指示主存储节点在将binlog发送给从存储节点之前,将基线数据发送给从存储节点进行存储;In a possible implementation, the storage module 801 is further used to: instruct the master storage node to send the baseline data to the slave storage node for storage before sending the binlog to the slave storage node;
回放模块803,具体用于指示从计算节点通过回放binlog,以对从存储节点中存储的基线数据进行更新。The playback module 803 is specifically used to instruct the slave computing node to update the baseline data stored in the slave storage node by replaying the binlog.
在一种可能的实施方式中,主计算节点上运行有目标应用,binlog是在目标应用运行过程中产生,目标应用包括关系数据库管理系统RDBMS,RDBMS包括MySQL、PostgreSQL、OpenGauss、Oracle中的至少一种。In a possible implementation, a target application is running on the main computing node, binlog is generated during the running of the target application, the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
在一种可能的实施方式中,存储节点为存储阵列,存储阵列用于持久化存储数据。In a possible implementation, the storage node is a storage array, and the storage array is used to persistently store data.
本实施例提供的数据处理装置800,对应于上述各实施例中的数据处理系统,用于实现上述各实施例中所执行的数据处理过程,因此,本实施例中的各个模块的功能及其所具有的技术效果,可参见前述实施例中的相关之处描述,在此不做赘述。The data processing device 800 provided in this embodiment corresponds to the data processing system in the above-mentioned embodiments, and is used to implement the data processing process performed in the above-mentioned embodiments. Therefore, the functions of each module in this embodiment and the technical effects thereof can be found in the relevant descriptions in the above-mentioned embodiments, and will not be elaborated here.
此外，本申请实施例还提供一种计算设备，如图9所示，计算设备900中可以包括通信接口910、处理器920。可选的，计算设备900中还可以包括存储器930。其中，存储器930可以设置于计算设备900内部，还可以设置于计算设备900外部。示例性地，上述实施例中数据处理装置指示主计算节点、从计算节点(以及主存储节点)执行的各个动作均可以由处理器920实现。在实现过程中，处理流程的各步骤可以通过处理器920中的硬件的集成逻辑电路或者软件形式的指令完成前述实施例中的方法。为了简洁，在此不再赘述。处理器920用于实现上述方法所执行的程序代码可以存储在存储器930中。存储器930和处理器920连接，如耦合连接等。In addition, the present application embodiment further provides a computing device. As shown in FIG9, the computing device 900 may include a communication interface 910 and a processor 920. Optionally, the computing device 900 may also include a memory 930. The memory 930 may be disposed inside the computing device 900, or may be disposed outside the computing device 900. Exemplarily, each action that the data processing device instructs the master computing node and the slave computing node (and the master storage node) to perform in the above-mentioned embodiment can be implemented by the processor 920. In the implementation process, each step of the processing flow can be completed by an integrated logic circuit of hardware in the processor 920 or by instructions in the form of software, so as to carry out the methods in the foregoing embodiments. For the sake of brevity, it will not be repeated here. The program code executed by the processor 920 to implement the above-mentioned method can be stored in the memory 930. The memory 930 is connected to the processor 920, such as a coupling connection.
本申请实施例的一些特征可以由处理器920执行存储器930中的程序指令或者软件代码来完成/支持。存储器930上加载的软件组件可以从功能或者逻辑上进行概括，例如，图8所示的存储模块801、回放模块803，图8所示的读取模块802的功能可以由通信接口910实现。Some features of the embodiments of the present application may be completed/supported by the processor 920 executing program instructions or software codes in the memory 930. The software components loaded on the memory 930 may be summarized functionally or logically, for example, as the storage module 801 and the playback module 803 shown in FIG8, while the function of the reading module 802 shown in FIG8 may be implemented by the communication interface 910.
本申请实施例中涉及到的任一通信接口可以是电路、总线、收发器或者其它任意可以用于进行信息交互的装置。比如计算设备900中的通信接口910,示例性地,该其它装置可以是与该计算设备900相连的设备等。Any communication interface involved in the embodiments of the present application may be a circuit, a bus, a transceiver or any other device that can be used for information exchange. For example, the communication interface 910 in the computing device 900, illustratively, the other device may be a device connected to the computing device 900, etc.
本申请实施例还提供一种计算设备集群,该计算设备集群可以包括一个或者多个计算设备,每个计算设备可以具有如图9所示的计算设备900的硬件结构,并且,该计算设备集群在运行过程中,能够用于实现图7所示实施例中的数据处理方法。An embodiment of the present application also provides a computing device cluster, which may include one or more computing devices, each of which may have a hardware structure of a computing device 900 as shown in FIG. 9 , and during operation, the computing device cluster can be used to implement the data processing method in the embodiment shown in FIG. 7 .
基于以上实施例,本申请实施例还提供了一种芯片,包括供电电路以及处理电路,所述供电电路用于对所述处理电路进行供电,所述处理电路用于实现图8所示的数据处理装置800的功能。Based on the above embodiments, the embodiments of the present application further provide a chip, including a power supply circuit and a processing circuit, wherein the power supply circuit is used to supply power to the processing circuit, and the processing circuit is used to implement the functions of the data processing device 800 shown in FIG. 8 .
本申请实施例中涉及的处理器可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。The processor involved in the embodiments of the present application may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor, etc. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware processor, or may be executed by a combination of hardware and software modules in the processor.
本申请实施例中的耦合是装置、模块或模块之间的间接耦合或通信连接,可以是电性,机械或其它的形式,用于装置、模块或模块之间的信息交互。The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, modules or modules, which can be electrical, mechanical or other forms, and is used for information exchange between devices, modules or modules.
处理器可能和存储器协同操作。存储器可以是非易失性存储器,比如硬盘或固态硬盘等,还可以是易失性存储器,例如随机存取存储器。存储器是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。The processor may operate in conjunction with a memory. The memory may be a non-volatile memory, such as a hard disk or a solid-state drive, or a volatile memory, such as a random access memory. The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
本申请实施例中不限定上述通信接口、处理器以及存储器之间的具体连接介质。比如存储器、处理器以及通信接口之间可以通过总线连接。所述总线可以分为地址总线、数据总线、控制总线等。The specific connection medium between the communication interface, processor and memory is not limited in the embodiments of the present application. For example, the memory, processor and communication interface may be connected via a bus. The bus may be divided into an address bus, a data bus, a control bus, etc.
基于以上实施例，本申请实施例还提供了一种计算机存储介质，该存储介质中存储软件程序，该软件程序在被一个或多个计算设备读取并执行时可实现上述任意一个或多个实施例提供的数据处理装置800执行的方法。所述计算机存储介质可以包括：U盘、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。Based on the above embodiments, the embodiments of the present application further provide a computer storage medium, in which a software program is stored, and when the software program is read and executed by one or more computing devices, the method performed by the data processing device 800 provided in any one or more of the above embodiments can be implemented. The computer storage medium may include: a U disk, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk, and other media that can store program codes.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented in one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to the flowchart and/or block diagram of the method, device (system) and computer program product according to the embodiment of the present application. It should be understood that each process and/or box in the flowchart and/or block diagram, and the combination of the process and/or box in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processing machine or other programmable device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the function specified in one process or multiple processes in the flowchart and/or one box or multiple boxes in the block diagram.
这些计算机程序指令也可存储在能引导计算机或其他可编程设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
这些计算机程序指令也可装载到计算机或其他可编程设备上,使得在计算机或其他可编程设备上执 行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded into a computer or other programmable device so that the computer or other programmable device can execute the program. A series of operation steps are performed to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows in the flowchart and/or one or more blocks in the block diagram.
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。The terms "first", "second", etc. in the specification and claims of this application and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the terms used in this way can be interchangeable under appropriate circumstances. This is just a way of distinguishing objects with the same attributes when describing the embodiments of this application.
显然,本领域的技术人员可以对本申请实施例进行各种改动和变型而不脱离本申请实施例的范围。这样,倘若本申请实施例的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。 Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the scope of the embodiments of the present application. Thus, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.

Claims (30)

  1. 一种数据处理系统,其特征在于,所述数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算节点、从计算节点,所述存储集群包括至少一个存储节点;A data processing system, characterized in that the data processing system comprises a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster comprises a master computing node and a slave computing node, and the storage cluster comprises at least one storage node;
    所述主计算节点,用于响应于数据更新请求生成二进制日志binlog,并将所述binlog发送至所述存储集群中进行存储;The master computing node is used to generate a binary log binlog in response to a data update request, and send the binlog to the storage cluster for storage;
    所述从计算节点,用于读取所述存储集群中存储的所述binlog,并通过回放所述binlog来更新所述存储集群中持久化存储的数据。The slave computing node is used to read the binlog stored in the storage cluster, and update the data persistently stored in the storage cluster by replaying the binlog.
  2. 根据权利要求1所述的数据处理系统,其特征在于,所述存储集群包括日志存储区域,所述日志存储区域被所述主计算节点以及所述从计算节点访问;The data processing system according to claim 1, characterized in that the storage cluster includes a log storage area, and the log storage area is accessed by the master computing node and the slave computing node;
    所述主计算节点,具体用于将所述binlog发送至所述日志存储区域中进行存储;The main computing node is specifically used to send the binlog to the log storage area for storage;
    所述从计算节点,具体用于从所述日志存储区域中读取所述binlog;The slave computing node is specifically used to read the binlog from the log storage area;
    其中,所述存储集群还包括数据存储区域,所述数据存储区域用于存储业务数据,所述数据存储区域被所述主计算节点以及所述从计算节点访问,或者,所述数据存储区域仅被所述主计算节点访问。The storage cluster further includes a data storage area, which is used to store business data. The data storage area is accessed by the master computing node and the slave computing node, or the data storage area is only accessed by the master computing node.
  3. 根据权利要求2所述的数据处理系统,其特征在于,所述从计算节点,具体用于通过回放所述binlog以与主计算节点同步数据,或者,用于通过回放所述binlog以恢复所述主计算节点故障时丢失的数据。The data processing system according to claim 2 is characterized in that the slave computing node is specifically used to synchronize data with the master computing node by replaying the binlog, or to recover data lost when the master computing node fails by replaying the binlog.
  4. 根据权利要求3所述的数据处理系统,其特征在于,所述至少一个存储节点包括主存储节点以及从存储节点,所述从存储节点作为所述主存储节点的灾备,所述主存储节点与所述从存储节点部署于同一数据中心或者同一可用区;The data processing system according to claim 3, characterized in that the at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in the same data center or the same availability zone;
    所述从计算节点,具体用于在所述主计算节点正常运行的过程中,从所述日志存储区域中读取所述binlog,并通过回放所述binlog来更新所述从存储节点中持久化存储的数据。The slave computing node is specifically used to read the binlog from the log storage area during the normal operation of the master computing node, and update the data persistently stored in the slave storage node by replaying the binlog.
  5. 根据权利要求3所述的数据处理系统,其特征在于,所述至少一个存储节点包括目标存储节点,所述目标存储节点用于持久化存储所述主计算节点所写入的数据;The data processing system according to claim 3, characterized in that the at least one storage node includes a target storage node, and the target storage node is used to persistently store the data written by the main computing node;
    所述从计算节点，具体用于当所述主计算节点发生故障，从所述日志存储区域中读取所述binlog，并通过回放所述binlog来更新所述目标存储节点中持久化存储的数据。The slave computing node is specifically used to read the binlog from the log storage area when the master computing node fails, and to update the data persistently stored in the target storage node by replaying the binlog.
  6. 根据权利要求1所述的数据处理系统,其特征在于,所述至少一个存储节点包括主存储节点以及从存储节点,所述从存储节点作为所述主存储节点的灾备,所述主存储节点与所述从存储节点部署于不同的数据中心,或者,所述主存储节点与所述从存储节点部署于不同的可用区;The data processing system according to claim 1, characterized in that the at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in different data centers, or the primary storage node and the secondary storage node are deployed in different availability zones;
    所述主计算节点,具体用于将所述binlog发送至所述主存储节点中进行存储;The main computing node is specifically used to send the binlog to the main storage node for storage;
    所述主存储节点,用于将所述binlog发送给所述从存储节点;The master storage node is used to send the binlog to the slave storage node;
    所述从计算节点,具体用于读取所述从存储节点存储的所述binlog。The slave computing node is specifically used to read the binlog stored by the slave storage node.
  7. 根据权利要求6所述的数据处理系统,其特征在于,The data processing system according to claim 6, characterized in that
    所述主存储节点,还用于在将所述binlog发送给所述从存储节点之前,将基线数据发送给所述从存储节点;The master storage node is further configured to send the baseline data to the slave storage node before sending the binlog to the slave storage node;
    所述从存储节点,用于存储所述基线数据;The slave storage node is used to store the baseline data;
    所述从计算节点,具体用于通过回放所述binlog,以对所述从存储节点中存储的基线数据进行更新。The slave computing node is specifically used to update the baseline data stored in the slave storage node by replaying the binlog.
  8. 根据权利要求1至7任一项所述的数据处理系统,其特征在于,所述主计算节点上运行有目标应用,所述binlog是在所述目标应用运行过程中产生,所述目标应用包括关系数据库管理系统RDBMS,所述RDBMS包括MySQL、PostgreSQL、OpenGauss、Oracle中的至少一种。The data processing system according to any one of claims 1 to 7 is characterized in that a target application is running on the main computing node, the binlog is generated during the running of the target application, the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
  9. 根据权利要求1至8任一项所述的数据处理系统,其特征在于,所述存储节点为存储阵列,所述存储阵列用于持久化存储数据。The data processing system according to any one of claims 1 to 8, characterized in that the storage node is a storage array, and the storage array is used to persistently store data.
  10. 一种数据处理方法,其特征在于,所述方法应用于数据处理系统,所述数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算节点、从计算节点,所述存储集群包括至少一个存储节点;所述方法包括:A data processing method, characterized in that the method is applied to a data processing system, the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster includes a master computing node and a slave computing node, and the storage cluster includes at least one storage node; the method includes:
    所述主计算节点响应于数据更新请求生成二进制日志binlog;The main computing node generates a binary log binlog in response to the data update request;
    所述主计算节点将所述binlog发送至所述存储集群中进行存储; The main computing node sends the binlog to the storage cluster for storage;
    所述从计算节点读取所述存储集群中存储的所述binlog;The slave computing node reads the binlog stored in the storage cluster;
    所述从计算节点通过回放所述binlog来更新所述存储集群中持久化存储的数据。The slave computing node updates the data persistently stored in the storage cluster by replaying the binlog.
  11. 根据权利要求10所述的数据处理方法,其特征在于,所述存储集群包括日志存储区域,所述日志存储区域被所述主计算节点以及所述从计算节点访问;The data processing method according to claim 10, characterized in that the storage cluster includes a log storage area, and the log storage area is accessed by the master computing node and the slave computing node;
    所述主计算节点将所述binlog发送至所述存储集群中进行存储,包括:The main computing node sends the binlog to the storage cluster for storage, including:
    所述主计算节点将所述binlog发送至所述日志存储区域中进行存储;The main computing node sends the binlog to the log storage area for storage;
    所述从计算节点读取所述存储集群中存储的所述binlog,包括:The reading the binlog stored in the storage cluster from the computing node includes:
    所述从计算节点从所述日志存储区域中读取所述binlog;The slave computing node reads the binlog from the log storage area;
    其中,所述存储集群还包括数据存储区域,所述数据存储区域用于存储业务数据,所述数据存储区域被所述主计算节点以及所述从计算节点访问,或者,所述数据存储区域仅被所述主计算节点访问。The storage cluster further includes a data storage area, which is used to store business data. The data storage area is accessed by the master computing node and the slave computing node, or the data storage area is only accessed by the master computing node.
  12. 根据权利要求11所述的数据处理方法,其特征在于,所述从计算节点通过回放所述binlog,包括:The data processing method according to claim 11, characterized in that the slave computing node plays back the binlog, comprising:
    所述从计算节点通过回放所述binlog以与主计算节点同步数据,或者,用于通过回放所述binlog以恢复所述主计算节点故障时丢失的数据。The slave computing node synchronizes data with the master computing node by replaying the binlog, or is used to recover data lost when the master computing node fails by replaying the binlog.
  13. 根据权利要求12所述的数据处理方法,其特征在于,所述至少一个存储节点包括主存储节点以及从存储节点,所述从存储节点作为所述主存储节点的灾备,所述主存储节点与所述从存储节点部署于同一数据中心或者同一可用区;The data processing method according to claim 12, characterized in that the at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in the same data center or the same availability zone;
    所述从计算节点读取所述存储集群中存储的所述binlog,包括:The reading the binlog stored in the storage cluster from the computing node includes:
    所述从计算节点在所述主计算节点正常运行的过程中,从所述日志存储区域中读取所述binlog;The slave computing node reads the binlog from the log storage area during the normal operation of the master computing node;
    所述从计算节点通过回放所述binlog来更新所述存储集群中持久化存储的数据,包括:The slave computing node updates the persistently stored data in the storage cluster by replaying the binlog, including:
    所述从计算节点通过回放所述binlog来更新所述从存储节点中持久化存储的数据。The slave computing node updates the data persistently stored in the slave storage node by replaying the binlog.
  14. 根据权利要求12所述的数据处理方法,其特征在于,所述至少一个存储节点包括目标存储节点,所述目标存储节点用于持久化存储所述主计算节点所写入的数据;The data processing method according to claim 12, characterized in that the at least one storage node includes a target storage node, and the target storage node is used to persistently store the data written by the main computing node;
    所述从计算节点读取所述存储集群中存储的所述binlog,包括:The reading the binlog stored in the storage cluster from the computing node includes:
    所述从计算节点当所述主计算节点发生故障,从所述日志存储区域中读取所述binlog;When the master computing node fails, the slave computing node reads the binlog from the log storage area;
    所述从计算节点通过回放所述binlog来更新所述存储集群中持久化存储的数据,包括:The slave computing node updates the persistently stored data in the storage cluster by replaying the binlog, including:
    所述从计算节点通过回放所述binlog来更新所述目标存储节点中持久化存储的数据。The slave computing node updates the data persistently stored in the target storage node by replaying the binlog.
  15. 根据权利要求10所述的数据处理方法,其特征在于,所述至少一个存储节点包括主存储节点以及从存储节点,所述从存储节点作为所述主存储节点的灾备,所述主存储节点与所述从存储节点部署于不同的数据中心,或者,所述主存储节点与所述从存储节点部署于不同的可用区;The data processing method according to claim 10, characterized in that the at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in different data centers, or the primary storage node and the secondary storage node are deployed in different availability zones;
    所述主计算节点将所述binlog发送至所述存储集群中进行存储,包括:The main computing node sends the binlog to the storage cluster for storage, including:
    所述主计算节点将所述binlog发送至所述主存储节点中进行存储;The main computing node sends the binlog to the main storage node for storage;
    所述从计算节点读取所述存储集群中存储的所述binlog,包括:The reading the binlog stored in the storage cluster from the computing node includes:
    所述从计算节点,具体用于读取所述从存储节点存储的所述binlog,所述从存储节点中的所述binlog是由所述主存储节点发送的。The slave computing node is specifically used to read the binlog stored in the slave storage node, and the binlog in the slave storage node is sent by the master storage node.
  16. 根据权利要求15所述的数据处理方法,其特征在于,所述方法还包括:The data processing method according to claim 15, characterized in that the method further comprises:
    所述主存储节点在将所述binlog发送给所述从存储节点之前,将基线数据发送给所述从存储节点进行存储;The master storage node sends the baseline data to the slave storage node for storage before sending the binlog to the slave storage node;
    所述从计算节点通过回放所述binlog来更新所述存储集群中持久化存储的数据,包括:The slave computing node updates the persistently stored data in the storage cluster by replaying the binlog, including:
    所述从计算节点通过回放所述binlog,以对所述从存储节点中存储的基线数据进行更新。The slave computing node updates the baseline data stored in the slave storage node by replaying the binlog.
  17. 根据权利要求10至16任一项所述的数据处理方法,其特征在于,所述主计算节点上运行有目标应用,所述binlog是在所述目标应用运行过程中产生,所述目标应用包括关系数据库管理系统RDBMS,所述RDBMS包括MySQL、PostgreSQL、OpenGauss、Oracle中的至少一种。The data processing method according to any one of claims 10 to 16 is characterized in that a target application is running on the main computing node, the binlog is generated during the running of the target application, the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
  18. 根据权利要求10至17任一项所述的数据处理方法,其特征在于,所述存储节点为存储阵列,所述存储阵列用于持久化存储数据。The data processing method according to any one of claims 10 to 17 is characterized in that the storage node is a storage array, and the storage array is used to persistently store data.
  19. 一种数据处理装置,其特征在于,所述数据处理装置应用于数据处理系统,所述数据处理系统包括计算集群、存储集群,所述计算集群与所述存储集群通过网络进行连接,所述计算集群包括主计算 节点、从计算节点,所述存储集群包括至少一个存储节点;所述数据处理装置包括:A data processing device, characterized in that the data processing device is applied to a data processing system, the data processing system includes a computing cluster and a storage cluster, the computing cluster and the storage cluster are connected via a network, the computing cluster includes a main computing cluster and a storage cluster. Node, slave computing node, the storage cluster includes at least one storage node; the data processing device includes:
    存储模块,用于指示所述主计算节点将响应于数据更新请求所生成二进制日志binlog发送至所述存储集群中进行存储;A storage module, used to instruct the master computing node to send a binary log binlog generated in response to a data update request to the storage cluster for storage;
    读取模块,用于指示所述从计算节点读取所述存储集群中存储的所述binlog至所述从计算节点;A reading module, used for instructing the slave computing node to read the binlog stored in the storage cluster to the slave computing node;
    回放模块,用于指示所述从计算节点通过回放所述binlog来更新所述存储集群中持久化存储的数据。The playback module is used to instruct the slave computing node to update the data persistently stored in the storage cluster by playing back the binlog.
  20. 根据权利要求19所述的数据处理装置,其特征在于,所述存储集群包括日志存储区域,所述日志存储区域被所述主计算节点以及所述从计算节点访问;The data processing device according to claim 19, characterized in that the storage cluster includes a log storage area, and the log storage area is accessed by the master computing node and the slave computing node;
    所述存储模块,具体用于指示所述主计算节点将所述binlog发送至所述日志存储区域中进行存储;The storage module is specifically used to instruct the main computing node to send the binlog to the log storage area for storage;
    所述读取模块,具体用于指示所述从计算节点从所述日志存储区域中读取所述binlog至所述从存储节点;The reading module is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave storage node;
    其中,所述存储集群还包括数据存储区域,所述数据存储区域用于存储业务数据,所述数据存储区域被所述主计算节点以及所述从计算节点访问,或者,所述数据存储区域仅被所述主计算节点访问。The storage cluster further includes a data storage area, which is used to store business data. The data storage area is accessed by the master computing node and the slave computing node, or the data storage area is only accessed by the master computing node.
  21. 根据权利要求20所述的数据处理装置，其特征在于，所述回放模块，具体用于指示所述从计算节点通过回放所述binlog以与主计算节点同步数据，或者，具体用于指示所述从计算节点通过回放所述binlog以恢复所述主计算节点故障时丢失的数据。The data processing device according to claim 20 is characterized in that the playback module is specifically used to instruct the slave computing node to synchronize data with the master computing node by replaying the binlog, or is specifically used to instruct the slave computing node to recover data lost when the master computing node fails by replaying the binlog.
  22. 根据权利要求21所述的数据处理装置,其特征在于,所述至少一个存储节点包括主存储节点以及从存储节点,所述从存储节点作为所述主存储节点的灾备,所述主存储节点与所述从存储节点部署于同一数据中心或者同一可用区;The data processing device according to claim 21, characterized in that the at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in the same data center or the same availability zone;
    所述读取模块,具体用于在所述主计算节点正常运行的过程中,指示所述从计算节点从所述日志存储区域中读取所述binlog至所述从计算节点;The reading module is specifically used to instruct the slave computing node to read the binlog from the log storage area to the slave computing node during the normal operation of the master computing node;
    所述回放模块,具体用于指示所述从计算节点通过回放所述binlog来更新所述从存储节点中持久化存储的数据。The playback module is specifically used to instruct the slave computing node to update the data persistently stored in the slave storage node by replaying the binlog.
  23. 根据权利要求21所述的数据处理装置,其特征在于,所述至少一个存储节点包括目标存储节点,所述目标存储节点用于持久化存储所述主计算节点所写入的数据;The data processing device according to claim 21, characterized in that the at least one storage node includes a target storage node, and the target storage node is used to persistently store the data written by the main computing node;
    所述读取模块,具体用于当所述主计算节点发生故障,指示所述从计算节点从所述日志存储区域中读取所述binlog;The reading module is specifically used to instruct the slave computing node to read the binlog from the log storage area when the master computing node fails;
    所述回放模块,具体用于指示所述从计算节点通过回放所述binlog来更新所述目标存储节点中持久化存储的数据。The playback module is specifically used to instruct the slave computing node to update the data persistently stored in the target storage node by replaying the binlog.
  24. 根据权利要求19所述的数据处理装置,其特征在于,所述至少一个存储节点包括主存储节点以及从存储节点,所述从存储节点作为所述主存储节点的灾备,所述主存储节点与所述从存储节点部署于不同的数据中心,或者,所述主存储节点与所述从存储节点部署于不同的可用区;The data processing device according to claim 19, characterized in that the at least one storage node includes a primary storage node and a secondary storage node, the secondary storage node serves as a disaster recovery for the primary storage node, and the primary storage node and the secondary storage node are deployed in different data centers, or the primary storage node and the secondary storage node are deployed in different availability zones;
    所述存储模块,具体用于指示所述主计算节点将所述binlog发送至所述主存储节点中进行存储;The storage module is specifically used to instruct the main computing node to send the binlog to the main storage node for storage;
    所述读取模块,具体用于指示所述从计算节点读取所述从存储节点存储的所述binlog,所述从存储节点中的所述binlog是由所述主存储节点发送的。The reading module is specifically used to instruct the slave computing node to read the binlog stored in the slave storage node, where the binlog in the slave storage node is sent by the master storage node.
  25. 根据权利要求24所述的数据处理装置,其特征在于,所述存储模块,还用于:The data processing device according to claim 24, characterized in that the storage module is further used for:
    指示所述主存储节点在将所述binlog发送给所述从存储节点之前,将基线数据发送给所述从存储节点进行存储;Instructing the master storage node to send the baseline data to the slave storage node for storage before sending the binlog to the slave storage node;
    所述回放模块,具体用于指示所述从计算节点通过回放所述binlog,以对所述从存储节点中存储的基线数据进行更新。The playback module is specifically used to instruct the slave computing node to update the baseline data stored in the slave storage node by replaying the binlog.
  26. 根据权利要求19至25任一项所述的数据处理装置,其特征在于,所述主计算节点上运行有目标应用,所述binlog是在所述目标应用运行过程中产生,所述目标应用包括关系数据库管理系统RDBMS,所述RDBMS包括MySQL、PostgreSQL、OpenGauss、Oracle中的至少一种。The data processing device according to any one of claims 19 to 25 is characterized in that a target application is running on the main computing node, the binlog is generated during the running of the target application, the target application includes a relational database management system RDBMS, and the RDBMS includes at least one of MySQL, PostgreSQL, OpenGauss, and Oracle.
  27. 根据权利要求19至26任一项所述的数据处理装置,其特征在于,所述存储节点为存储阵列,所述存储阵列用于持久化存储数据。The data processing device according to any one of claims 19 to 26 is characterized in that the storage node is a storage array, and the storage array is used to persistently store data.
  28. 一种计算设备集群,其特征在于,所述计算设备集群包括至少一个计算设备,所述至少一个计算设备中的每个计算设备包括处理器和存储器; A computing device cluster, characterized in that the computing device cluster includes at least one computing device, and each computing device in the at least one computing device includes a processor and a memory;
    所述处理器用于执行所述存储器中存储的指令,以使得所述计算设备集群执行权利要求10至18中任一项所述的方法。The processor is configured to execute instructions stored in the memory, so that the computing device cluster executes the method according to any one of claims 10 to 18.
  29. 一种计算机可读存储介质,其特征在于,包括指令,所述指令用于实现权利要求10至18中任一项所述的方法。A computer-readable storage medium, characterized in that it includes instructions, wherein the instructions are used to implement the method according to any one of claims 10 to 18.
  30. 一种包含指令的计算机程序产品,其特征在于,当其在计算机上运行时,使得所述计算机执行如权利要求10至18中任一项所述的方法。 A computer program product comprising instructions, characterized in that when the computer program product is run on a computer, the computer is caused to perform the method according to any one of claims 10 to 18.
PCT/CN2023/101428 2022-11-02 2023-06-20 Data processing system, method and apparatus, and related device WO2024093263A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202211363509.3 2022-11-02
CN202211363509 2022-11-02
CN202211608424.7 2022-12-14
CN202211608424.7A CN117992467A (en) 2022-11-02 2022-12-14 Data processing system, method and device and related equipment

Publications (1)

Publication Number Publication Date
WO2024093263A1 true WO2024093263A1 (en) 2024-05-10

Family

ID=90901671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101428 WO2024093263A1 (en) 2022-11-02 2023-06-20 Data processing system, method and apparatus, and related device

Country Status (2)

Country Link
CN (1) CN117992467A (en)
WO (1) WO2024093263A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8069366B1 (en) * 2009-04-29 2011-11-29 Netapp, Inc. Global write-log device for managing write logs of nodes of a cluster storage system
CN108920637A (en) * 2018-07-02 2018-11-30 北京科东电力控制系统有限责任公司 Method for synchronizing data of database and device applied to synchronization subsystem
CN111966652A (en) * 2019-05-20 2020-11-20 阿里巴巴集团控股有限公司 Method, device, equipment, system and storage medium for sharing storage synchronous data
CN114281794A (en) * 2021-11-12 2022-04-05 上海瀚银信息技术有限公司 Database system based on binary log server

Also Published As

Publication number Publication date
CN117992467A (en) 2024-05-07

Similar Documents

Publication Publication Date Title
US11755415B2 (en) Variable data replication for storage implementing data backup
JP6522812B2 (en) Fast Crash Recovery for Distributed Database Systems
EP3663922B1 (en) Data replication method and storage system
US8015157B2 (en) File sharing system, file server, and method for managing files
US9996421B2 (en) Data storage method, data storage apparatus, and storage device
JP4477950B2 (en) Remote copy system and storage device system
JP2019101703A (en) Storage system and control software arrangement method
JP2017195004A (en) Distributed database with modular blocks and associated log files
US11093387B1 (en) Garbage collection based on transmission object models
CN106407040A (en) Remote data copy method and system
WO2023046042A1 (en) Data backup method and database cluster
JP2005242403A (en) Computer system
US7870095B2 (en) Apparatus, system, and method for replication of data management information
US10628298B1 (en) Resumable garbage collection
WO2024103594A1 (en) Container disaster recovery method, system, apparatus and device, and computer-readable storage medium
CN116204137B (en) Distributed storage system, control method, device and equipment based on DPU
WO2024093263A1 (en) Data processing system, method and apparatus, and related device
WO2023019953A1 (en) Data synchronization method and system, server, and storage medium
WO2022033269A1 (en) Data processing method, device and system
CN115955488A (en) Distributed storage copy cross-computer room placement method and device based on copy redundancy
JP6376626B2 (en) Data storage method, data storage device, and storage device
CN115563221A (en) Data synchronization method, storage system, device and storage medium
CN114518973A (en) Distributed cluster node downtime restarting recovery method
CN115470041A (en) Data disaster recovery management method and device
WO2024078001A1 (en) Data processing system, data processing method and apparatus, and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23884202

Country of ref document: EP

Kind code of ref document: A1