WO2023185934A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2023185934A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
request
data storage
data processing
preprocessing
Prior art date
Application number
PCT/CN2023/084738
Other languages
English (en)
French (fr)
Inventor
朱云锋
严祥光
赵帅
Original Assignee
Alibaba Cloud Computing Co., Ltd. (阿里云计算有限公司)
Priority date
Filing date
Publication date
Application filed by Alibaba Cloud Computing Co., Ltd. (阿里云计算有限公司)
Publication of WO2023185934A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • the embodiments of this specification relate to the field of computer technology, and in particular, to a data processing method.
  • Embodiments of this specification provide a data processing method. One or more embodiments of this specification simultaneously relate to a data processing device, a data processing system, a computing device, a computer-readable storage medium, and a computer program, so as to solve technical deficiencies existing in the prior art.
  • A data processing method is provided, including: receiving a data processing request, wherein the data processing request carries target data; generating a data preprocessing request for the target data according to the target data; sending the data preprocessing request to at least two data storage modules respectively; upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request, sending the data processing request to each data storage module; and receiving a data processing completion notification returned by each data storage module according to the data processing request.
  • a data processing device including:
  • the first receiving module is configured to receive a data processing request, wherein the data processing request carries target data;
  • a generation module configured to generate a data preprocessing request for the target data according to the target data
  • the first sending module is configured to send the data preprocessing request to at least two data storage modules respectively;
  • the second sending module is configured to, upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request, send the data processing request to each of the data storage modules;
  • the second receiving module is configured to receive a data processing completion notification returned by each data storage module according to the data processing request.
  • a data processing system including a request processing module and at least two data storage modules, wherein
  • the request processing module is configured to receive a data update request, wherein the data update request carries target data for updating initial data in a data storage unit contained in the data storage modules, generate a data preprocessing request for the target data according to the target data, and send the data preprocessing request to the at least two data storage modules respectively;
  • the at least two data storage modules are configured to set the data storage unit corresponding to the target data as inaccessible based on the data preprocessing request, and send a preprocessing completion notification to the request processing module;
  • the request processing module is further configured to, upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request, send the data update request to the at least two data storage modules;
  • the at least two data storage modules are configured to update the initial data in the data storage unit through the target data according to the data update request, and send a data processing completion notification to the request processing module.
  • a computing device including:
  • the memory is used to store computer-executable instructions
  • the processor is configured to execute the computer-executable instructions, wherein when the computer-executable instructions are executed by the processor, the steps of the data processing method are implemented.
  • A computer-readable storage medium is provided, which stores computer-executable instructions, wherein when the computer-executable instructions are executed by a processor, the steps of the data processing method are implemented.
  • a computer program is provided, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.
  • The data processing method provided in this specification receives a data processing request, wherein the data processing request carries target data; generates a data preprocessing request for the target data according to the target data; sends the data preprocessing request to at least two data storage modules respectively; upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request, sends the data processing request to each data storage module; and receives the data processing completion notification returned by each data storage module according to the data processing request.
  • In this way, a data preprocessing request is first generated based on the target data carried in the data processing request and sent to at least two data storage modules; only after each data storage module returns a preprocessing completion notification is the data processing request sent to each data storage module, so that each data storage module can obtain the target data. This ensures the data consistency of the data storage modules, avoids the data loss that would otherwise occur when any one of the at least two data storage modules fails, and ensures data security.
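The flow summarized above can be sketched in a few lines of Python. The class and method names below are illustrative assumptions, not identifiers from the specification:

```python
# Hypothetical sketch of the claimed flow: a request processor sends a
# preprocessing request to every storage module, waits until ALL modules
# acknowledge, and only then forwards the actual data processing request.

class StorageModule:
    def __init__(self):
        self.locked_keys = set()   # keys set "inaccessible" by preprocessing
        self.store = {}

    def preprocess(self, key):
        self.locked_keys.add(key)  # mark the data storage unit inaccessible
        return "prep-done"         # preprocessing completion notification

    def process(self, key, value):
        self.store[key] = value    # update initial data with target data
        self.locked_keys.discard(key)
        return "proc-done"         # data processing completion notification

def handle_request(modules, key, value):
    # Phase 1: send the data preprocessing request to every storage module.
    acks = [m.preprocess(key) for m in modules]
    assert all(a == "prep-done" for a in acks)
    # Phase 2: only after every module has replied, send the processing request.
    return [m.process(key, value) for m in modules]

modules = [StorageModule(), StorageModule()]
handle_request(modules, "Z", 5)
assert all(m.store["Z"] == 5 for m in modules)  # consistent across modules
```

The essential point is that phase two is gated on acknowledgements from every storage module, which is what keeps all modules consistent.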
  • Figure 1 is a schematic process diagram of a cross-cluster synchronous replication solution provided by an embodiment of this specification
  • Figure 2 is a schematic diagram of data update in a cross-cluster synchronous replication solution provided by an embodiment of this specification
  • Figure 3 is a schematic structural diagram of a data processing system provided by an embodiment of this specification.
  • Figure 4 is a flow chart of a data processing method provided by an embodiment of this specification.
  • Figure 5 is a schematic diagram of a consistency queue in a data processing method provided by an embodiment of this specification.
  • Figure 6 is a schematic diagram of the processing of reading data from a cluster in a data processing method provided by an embodiment of this specification;
  • Figure 7 is a processing flow chart of a data processing method provided by an embodiment of this specification.
  • Figure 8 is a schematic structural diagram of a data processing device provided by an embodiment of this specification.
  • Figure 9 is a structural block diagram of a computing device provided by an embodiment of this specification.
  • Although the terms first, second, etc. may be used to describe various information in one or more embodiments of this specification, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • the first may also be called the second, and similarly, the second may also be called the first.
  • Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • Cluster: multiple servers, even thousands, are gathered together and divided into multiple machine groups, where each machine group runs the same service and no single server is indispensable. The common purpose is to relieve the pressure of concurrent access, avoid single points of failure, and so on, so as to achieve a highly available, highly scalable, and low-cost distributed system.
  • Availability zone: a physical area within the same region whose power and network are independent of other zones, consisting of one or more IDC computer rooms. Network latency within the same availability zone is relatively small, and fault isolation can be achieved between different availability zones.
  • RPO: Recovery Point Objective, the maximum tolerable amount of data loss, measured as the time window of data that may be lost when recovering from a failure.
  • RTO: Recovery Time Objective, the maximum tolerable length of time that the service may remain unavailable before it is restored after a failure.
  • Synchronous replication: replication designed for the data-loss dimension of disaster recovery scenarios, usually implemented on top of a primary/standby mechanism.
  • Asynchronous replication: periodic remote replication of data from the primary to the standby, usually across availability zones or regions.
  • For asynchronous replication, the RPO generally ranges from seconds to minutes and the impact on user IO is small; synchronous replication replicates data remotely in real time with an RPO of zero, ensuring the consistency of user data to the maximum extent, at a corresponding cost in performance.
  • Remote multi-active (geo-redundant active-active): multiple different physical regions provide data access services at the same time, and the regions are not in an active/standby relationship. Multi-active is a higher-level requirement in a disaster recovery system: when the RPO is zero, the RTO is also required to be zero. That is, when a single-region failure occurs, system services can be restored immediately without any loss of data.
  • Consistency (consensus) protocol: a mechanism by which multiple nodes in a distributed system reach consensus on a proposed value; by continuously reaching consensus on successive rounds of proposed values, a distributed consensus system is formed. Typical consensus protocols include Paxos, Raft, and EPaxos.
  • data protection products can be divided into different disaster recovery levels based on RPO and RTO.
  • The smaller the RPO and RTO, the less data is lost and the faster the data recovery, but the higher the corresponding cost.
  • data protection products can be divided into asynchronous replication and synchronous replication.
  • Among them, asynchronous replication is periodic remote replication of data, usually across availability zones or regions; the RPO generally ranges from seconds to minutes, and the impact on user IO is small. Synchronous replication is real-time remote replication of data, usually across availability zones within the same region; the RPO is zero, ensuring the consistency of user data to the maximum extent, at a cost in performance.
  • Asynchronous replication and synchronous replication are important features of storage products, and many cloud storage and database products are investing heavily in building them. On this basis, Internet, financial, and other enterprises have further raised the need for higher-level disaster recovery capabilities that compress RTO/RPO to zero; that is, when a single-region (single-availability-zone) failure occurs, system services (the storage service provided by the distributed system) can be restored immediately with no loss of data. How storage products should support and deploy such higher-level disaster recovery capabilities has therefore become a problem that needs to be solved.
  • Figure 1 is a schematic process diagram of a cross-cluster synchronous replication solution, that is, a replicated state machine model, provided by an embodiment of this specification.
  • Technical solutions based on consensus protocols are widely used because of their natural ordering capability and their reliable guarantee of data consistency.
  • consensus based replication treats each cluster as a replicated state machine.
  • The client's write requests (i.e., log data) are synchronized across clusters by the cross-cluster log synchronization system. Each node in the synchronization system corresponds to the service state machine of a certain cluster, and each node applies the log data it synchronizes (that is, the write requests) to the corresponding service state machine. The service state machine of each cluster then provides read access services directly to clients.
  • Figure 2 is a schematic diagram of data update in a cross-cluster synchronous replication solution provided by an embodiment of this specification. The service state machines of the clusters necessarily apply the log entries one after another, so the data updates of the state machines cannot be strictly synchronized. As a result, when clients access different clusters, they may see different values for the same key, or even read old data. The consistency of data between different clusters therefore cannot be strictly guaranteed, and stored data may be lost when any one of the multiple clusters fails.
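The stale-read problem described above can be reproduced with a toy model (all names are hypothetical): two state machines share one ordered log but apply it at different speeds, so the lagging one still serves the old value.

```python
# Toy illustration (not from the specification) of why plain replicated
# state machines can expose stale reads: each cluster applies the shared
# log independently, so a slow cluster still serves the old value.

log = [("Z", 1), ("Z", 5)]            # ordered write requests

class ClusterStateMachine:
    def __init__(self):
        self.data = {}
        self.applied = 0              # index into the shared log

    def apply_next(self):
        key, value = log[self.applied]
        self.data[key] = value
        self.applied += 1

fast, slow = ClusterStateMachine(), ClusterStateMachine()
fast.apply_next(); fast.apply_next()  # fast cluster applied both entries
slow.apply_next()                     # slow cluster applied only the first

assert fast.data["Z"] == 5
assert slow.data["Z"] == 1            # a client reading here sees old data
```

A client that reads from `fast` and then from `slow` observes the data rolling back from 5 to 1, which is exactly the inconsistency the two-phase mechanism below is meant to prevent.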
  • This specification also relates to a data processing device, a data processing system, a computing device, a computer-readable storage medium, and a computer program, which will be described in detail one by one in the following embodiments.
  • Figure 3 shows a schematic structural diagram of a data processing system according to an embodiment of this specification, wherein the system includes a request processing module 302 and at least two data storage modules 304, where,
  • the request processing module 302 is configured to receive a data update request, wherein the data update request carries target data for updating initial data in a data storage unit included in the data storage modules 304, generate a data preprocessing request for the target data according to the target data, and send the data preprocessing request to the at least two data storage modules 304 respectively;
  • the at least two data storage modules 304 are configured to set the data storage unit corresponding to the target data as inaccessible based on the data preprocessing request, and send a preprocessing completion notification to the request processing module 302;
  • the request processing module 302 is further configured to, upon receiving a preprocessing completion notification returned by each data storage module 304 according to the data preprocessing request, send the data update request to the at least two data storage modules 304;
  • the at least two data storage modules 304 are configured to update the initial data in the data storage unit through the target data according to the data update request, and send a data processing completion notification to the request processing module 302.
  • The request processing module 302 can be understood as a module that processes received data requests, distributes them to each data storage module 304, and ensures the consistency of the target data across the data storage modules 304.
  • Specifically, the request processing module 302 can be understood as the cross-cluster log synchronization system (consensus-based replication) in the above embodiment.
  • The data storage module 304 can be understood as a module that stores target data. In practical applications, it may be a cluster, a service state machine in a cluster, a server, an availability zone, a data center, a physical disk in a computer, a memory in a computer, and so on; this specification does not specifically limit this. To avoid unnecessary repetition, the following explanation takes the data storage module 304 to be a cluster.
  • the initial data can be understood as data stored in the data storage module 304.
  • the initial data can be parameters, multimedia data, documents, applications, scripts, etc. This specification does not specifically limit this.
  • The target data can be understood as data used to update the initial data in the data storage module 304.
  • The target data can also be parameters, multimedia data, documents, applications, scripts, etc.; this specification does not specifically limit this. It should be noted that the data type of the initial data and the data type of the target data may be the same or different.
  • The data storage unit can be understood as a unit in the data storage module 304 that stores the initial data. For example, the data storage unit can be a physical storage medium in the cluster that stores the initial data, or a key corresponding to the initial data.
  • For example, the initial data is the value "1" and the data storage unit is the key "Z", i.e., the key to which that value corresponds.
  • a data update request can be understood as a request that needs to update the initial data in the data storage module 304 .
  • the data update request can be understood as a request to update the value "1" stored in the cluster to the value "5".
  • The data preprocessing request can be understood as a request that instructs the data storage module 304 to set the data storage unit as inaccessible before performing the data update. In practical applications, this is done to avoid users reading historical data from a cluster because multiple clusters, or the service state machines within them, perform the data update at different speeds. The cross-cluster log synchronization system (hereinafter, the synchronization system) therefore first sets the data storage unit holding the initial data to inaccessible while the clusters or their service state machines perform the update, and sets it back to accessible once the update is complete. For example, the service state machine in the cluster sets the state corresponding to the key "Z" to the "Prep" (preparation) state; when a user initiates a read request for the key "Z" while it is in the "Prep" state, the read request is blocked, that is, its execution is suspended. After the synchronization system updates the value corresponding to the key "Z" from "1" to "5", it clears the "Prep" state of the key "Z", avoiding the problem of users reading old data while the service state machines in the clusters are updating.
  • Specifically, the data preprocessing request can be a "Prep" request, which instructs the service state machine in the cluster to set the state of a specific key to "Prep"; the data update request can be the write request "Z←5".
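A minimal, single-threaded sketch of the "Prep" mechanism just described follows; names are illustrative, and a real system would suspend the read rather than return a sentinel value:

```python
# Sketch of a service state machine whose reads are blocked while the
# requested key is in the "Prep" state, as described above.

class StateMachine:
    def __init__(self, data):
        self.data = dict(data)
        self.prep = set()                 # keys currently in "Prep" state

    def prep_key(self, key):              # handle sub-request "Prep Z"
        self.prep.add(key)

    def write(self, key, value):          # handle sub-request "Z <- 5"
        self.data[key] = value
        self.prep.discard(key)            # clear the "Prep" state

    def read(self, key):
        if key in self.prep:
            return None                   # read is blocked (suspended)
        return self.data[key]

sm = StateMachine({"Z": 1})
sm.prep_key("Z")
assert sm.read("Z") is None               # blocked while updating
sm.write("Z", 5)
assert sm.read("Z") == 5                  # new value; stale "1" never seen
```

Because the read is blocked for the whole window between the "Prep" request and the write, a client can never observe the old value "1" mid-update.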
  • the preprocessing completion notification can be understood as a notification that the data storage module 304 indicates to the request processing module 302 that it has completed the operation of setting the data storage unit as inaccessible.
  • the data processing completion notification can be understood as a notification that the data storage module 304 instructs (ie informs) the request processing module 302 that it has completed the update operation of the initial data through the target data.
  • Specifically, the request processing module 302 can receive a data update request, where the data update request carries target data used to update the initial data in a data storage unit included in the data storage module 304.
  • After receiving the data update request, the request processing module 302 first generates a data preprocessing request for the target data based on the target data, and sends the data preprocessing request to the at least two data storage modules 304, so as to instruct them to perform the preprocessing work before the data update.
  • Subsequently, each data storage module 304 sets the data storage unit corresponding to the target data as inaccessible, and sends a preprocessing completion notification to the request processing module 302, prompting it to issue the data update request.
  • When the request processing module 302 determines that it has received the preprocessing completion notification returned by every data storage module 304 according to the data preprocessing request, it determines that all data storage modules 304 are prepared, and therefore sends the data update request to each data storage module 304.
  • After receiving the data update request sent by the request processing module 302, each data storage module 304 updates the initial data in the data storage unit with the target data carried in the request, and sends a data processing completion notification to the request processing module 302.
  • When the request processing module 302 receives the data processing completion notifications sent by the data storage modules 304, it can continue to send new data preprocessing requests and data update requests to the data storage modules 304, so that they can continue to perform data update operations.
  • For example, suppose the request processing module 302 is the synchronization system, the data storage module 304 is the service state machine in a cluster, the target data is the value "5", the initial data is the value "1", the data storage unit is the key "Z", the data preprocessing request is the "Prep Z" request for the key "Z", and the data update request is the data write request "Z←5".
  • When the synchronization system in the data processing system receives the data write request "Z←5", it splits the write request into two sub-requests, so the request "Z←5" is converted into the sub-requests "Z←5" and "Prep Z". It should be noted that when a write request is split in this way, the "Prep Z" sub-request generated from the write request instructs the service state machine in each cluster to change the state corresponding to the key "Z" to the "Prep" state.
  • the two sub-requests will be learned and applied by the state machines of each cluster in strict order and synchronously.
  • First, the sub-request "Prep Z" is exposed to the state machine of each cluster for learning; that is, the "Prep Z" request is sent to the service state machine in each cluster, instructing it to change the state corresponding to the key "Z" to the "Prep" state.
  • Only after the sub-request has been learned and explicitly acknowledged by each cluster state machine, that is, after receiving a state-change completion notification from the service state machine in each cluster, does the synchronization system further expose the sub-request "Z←5" for the state machines of each cluster to learn and apply, that is, to change the value corresponding to the key "Z" from "1" to "5". This ensures that the data is updated synchronously.
  • In other words, the data processing system provided in this specification implements cross-cluster synchronous replication based on the consistency replicated state machine.
  • Without this mechanism, the data presented by the cluster state machines is only eventually consistent; in a multi-active scenario, a client that successively accesses different cluster state machines may therefore observe the data rolling back (that is, it first sees the new version of the data and then the old version), so multi-active access cannot be effectively supported.
  • To solve this problem, the data processing system provided in this specification introduces two-phase commit.
  • In the first phase, the state machine modifies the state of the relevant key based on the "Prep" sub-request, marking the key as being modified; read requests for a key in the "Prep" state are then blocked until the state is cleared.
  • Figure 4 shows a flow chart of a data processing method according to an embodiment of this specification, which specifically includes the following steps.
  • Step 402 Receive a data processing request, where the data processing request carries target data.
  • The data processing request can be understood as a request for processing the target data.
  • Specifically, the data processing request can be understood as a data update request or a data storage request; for the data update request, refer to the description of the data processing system above.
  • the data storage request can be understood as a request to store target data into the data storage module.
  • Optionally, the data processing request is sent by a scheduler deployed in the cluster. In practical applications, the data processing request can be sent to the synchronization system through the scheduler deployed in the cluster, and the synchronization system then sends the data processing request to each cluster, thereby ensuring synchronous replication across multiple clusters.
  • Optionally, receiving the data processing request includes: receiving the data processing request forwarded by a request forwarding unit. The request forwarding unit can be understood as a unit capable of forwarding data processing requests to the request processing module, such as a scheduler deployed in a cluster. In practical applications, when any one of the multiple clusters receives a data update request and/or a data storage request, it can send the request to the request processing module through the scheduler deployed in that cluster.
  • In addition, the data processing request may be a data update request. On this basis, after the request processing module receives the data update request, it can subsequently send the data update request to the multiple data storage modules through a two-phase commit method, thereby ensuring data consistency across the data storage modules.
  • the specific implementation method is as follows.
  • Receiving the data processing request includes: receiving a data update request carrying target data. That is, the request processing platform can receive a data update request carrying target data; for the data update request, refer to the corresponding content in the description of the data processing system above.
  • Step 404 Generate a data preprocessing request for the target data according to the target data.
  • When the data processing request is a data update request, the data preprocessing request can be understood as a request instructing the data storage module to set the data storage unit as inaccessible before performing the data update; when the data processing request is a data storage request, the data preprocessing request can be understood as a request instructing the data storage module to determine the data storage unit to be used for storing the target data before performing the data storage, and to set that data storage unit as inaccessible, so that the target data can subsequently be stored in the data storage unit successfully.
  • Optionally, generating a data preprocessing request for the target data according to the target data includes: in a case where there are at least two data processing requests, determining the target data carried in each data processing request; determining the data storage unit corresponding to each target data in the at least two data storage modules; and generating a data preprocessing request for the data storage unit corresponding to each target data respectively.
  • When the data processing request is a data storage request, the target data can be understood as the data that needs to be stored in the data storage module, and correspondingly, the data storage unit can be understood as the unit in which the target data needs to be stored.
  • the data storage unit can be set according to the actual application scenario.
  • For example, when the data storage module is a cluster, the data storage unit can be a physical storage medium in the cluster or a key in the cluster.
  • In a case where there are at least two data processing requests, the request processing system needs to determine the target data carried in each data processing request, determine the data storage unit corresponding to each target data in the at least two data storage modules, and generate a data preprocessing request for the data storage unit corresponding to each target data respectively.
  • It should be noted that the main design ideas of the data processing method include three aspects: a consistency queue, two-phase commit, and read-write separation.
  • The consistency queue refers to decoupling the cluster service state machines from the cross-cluster log synchronization system, so that the synchronization system relies on a separate queue based on the consistency protocol, through which the data processing requests and the corresponding data preprocessing requests are delivered to the clusters' service state machines. The state machine of each cluster then actively learns from the queue and acknowledges, so as to update its local data.
  • Such a decoupled design can smoothly support cluster state machine expansion; the specific implementation method is as follows.
  • the data processing request and the corresponding data preprocessing request are stored in the request sending queue.
  • The request processing sequence information can be understood as the request reception time corresponding to each data processing request, where the request reception time is the time at which the request processing platform receives the data processing request. The request processing sequence information can also be a sequence number, serial number, ID, etc. assigned to the data processing request by the request processing platform based on the request reception time. For example, the request processing sequence information corresponding to the first data processing request received by the request processing platform can be the sequence number "1", and that of the second data processing request received can be the sequence number "2".
  • In practical applications, the request processing platform determines the request processing sequence information corresponding to each data processing request, and stores the data processing request and its corresponding data preprocessing request in the request sending queue according to the request processing sequence information; the request sending queue can be a consistency queue.
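The sequence-information bookkeeping described above might look like the following sketch, with hypothetical names; each arriving write request is paired with its generated "Prep" request and queued first-in, first-out:

```python
# Hypothetical sketch of the request sending (consistency) queue: each
# incoming data processing request gets monotonically increasing sequence
# information, and the request plus its generated preprocessing request
# are enqueued in arrival order.
from collections import deque

class RequestQueue:
    def __init__(self):
        self.seq = 0
        self.queue = deque()

    def enqueue(self, write_request):
        self.seq += 1                      # request processing sequence info
        key = write_request[0]
        prep_request = ("Prep", key)       # generated preprocessing request
        self.queue.append((self.seq, prep_request, write_request))

    def dequeue(self):
        return self.queue.popleft()        # strict first-in, first-out

q = RequestQueue()
q.enqueue(("Z", 5))
q.enqueue(("Y", 7))
seq, prep, write = q.dequeue()
assert (seq, prep, write) == (1, ("Prep", "Z"), ("Z", 5))
```

Because sequence numbers are assigned at reception time and the queue is consumed in order, every cluster observes the requests in the same order.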
  • Figure 5 is a schematic diagram of a consistency queue in a data processing method provided by an embodiment of this specification. The consistency queue stores multiple data write requests and the "Prep" sub-request (e.g., "Prep Z") corresponding to each data write request.
  • the two-phase apply in the main design idea of the data processing method means that, for each write request, the replication queue based on the consistency protocol splits it into two sub-requests. For example, the request "Z ← 1" is converted into the two sub-requests "Z ← 1" and "Prep Z". The two sub-requests are learned and applied by the state machines of each cluster in strict order and synchronously.
  • the replication queue first exposes the sub-request "Prep Z" for each cluster state machine to learn. Only after the sub-request has been learned and explicitly acknowledged by every cluster state machine does the replication queue further expose the sub-request "Z ← 1" for the state machines of each cluster to learn and apply, ensuring that data is updated synchronously. That is, only after the "Prep" request has been applied (run) by each cluster is the formal request (the data processing request) provided for each cluster to learn. See steps 406 to 408 for details.
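The splitting of a write request into its two ordered sub-requests can be sketched as follows; the names `SubRequest` and `split_write` are illustrative assumptions, not identifiers from the patent.

```python
# Sketch of two-phase apply: a write request such as "Z <- 1" is converted
# into a first-phase "Prep" sub-request and a second-phase apply sub-request.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubRequest:
    kind: str                  # "prep" (first phase) or "apply" (second phase)
    key: str
    value: Optional[int] = None

def split_write(key: str, value: int) -> list:
    """Convert one write request into its two strictly ordered sub-requests."""
    return [SubRequest("prep", key), SubRequest("apply", key, value)]

subs = split_write("Z", 1)
# subs[0] is "Prep Z"; it is exposed to the cluster state machines first.
# subs[1] is "Z <- 1"; it is exposed only after every cluster acknowledges.
```

The replication queue would hold both sub-requests and gate the second on acknowledgements of the first.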
  • Step 406 Send the data preprocessing requests to at least two data storage modules respectively.
  • the request processing platform can first send data preprocessing requests to at least two data storage modules respectively.
  • the request processing module stores the data processing request and the data preprocessing request in the request sending queue.
  • the data processing request and data preprocessing request that first entered the queue can be determined from the request sending queue in a first-in, first-out manner; the data preprocessing request is obtained and sent to the at least two data storage modules. For example, see Figure 5.
  • after the synchronization system stores the data write request and the "Prep" request in the consistency queue, it can first send the "Prep Z" request to the multiple clusters, and then, after receiving the clusters' replies to the "Prep Z" request, send the request "Z ← 1" to the multiple clusters.
  • based on the consistency queue, the synchronization system issues "Prep" requests and data write requests to the multiple clusters by processing one request at a time.
  • Although this ensures data consistency, the efficiency of request delivery is low. Therefore, the data processing method provided in this specification can simultaneously execute multiple data processing requests targeting different data storage units, thereby improving the efficiency of request issuance while ensuring data consistency; the specific implementation is as follows.
  • the identification information of a data storage unit can be understood as information that uniquely identifies the data storage unit. For example, when the data storage unit is a key value, the identification information is the name of the key; when the data storage unit is a storage area on a physical disk, the identification information can be the number of the storage area.
  • the consistency queue stores multiple data write requests ("X ← 4", "Y ← 7", "Y ← 5", "Z ← 1") and the corresponding "Prep" requests ("Prep X", "Prep Y", "Prep Z").
  • the synchronization system determines the target key values (X, Y, Z) from the multiple key values based on the key names; based on the position of each key value in the consistency queue, it determines the "Prep" request corresponding to each key value (i.e., "Prep X", "Prep Y", and "Prep Z"), and sends the "Prep X" request, "Prep Y" request, and "Prep Z" request to the multiple clusters.
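The per-key parallel issuance above can be sketched as follows: the earliest write for each distinct key can have its "Prep" issued concurrently, while a later write to the same key (here "Y ← 5") must wait behind the earlier one. The helper name `batch_preps` is an assumption for illustration.

```python
# Sketch: from an ordered consistency queue of (key, value) writes, pick the
# earliest write per distinct key; their "Prep" requests can be dispatched at
# the same time because they touch different data storage units.
from collections import OrderedDict

def batch_preps(queue):
    firsts = OrderedDict()
    for key, value in queue:
        if key not in firsts:          # later writes to the same key must wait
            firsts[key] = value
    return ["Prep %s" % key for key in firsts]

preps = batch_preps([("X", 4), ("Y", 7), ("Y", 5), ("Z", 1)])
# → ["Prep X", "Prep Y", "Prep Z"]; "Y <- 5" is held back until "Y <- 7"
#   completes its two phases.
```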
  • Step 408 After receiving the preprocessing completion notification returned by each data storage module according to the data preprocessing request, send the data processing request to each data storage module.
  • the preprocessing completion notification is a notification generated after each data storage module sets the data storage unit corresponding to the target data to be inaccessible based on the data preprocessing request.
  • the preprocessing completion notification can be a notification sent to the synchronization system by the service state machine in a cluster after it sets the state corresponding to the key value "Z" to the "Prep (prepare)" state. When the cluster then receives a read request for the key value "Z", the request is blocked (execution suspended).
  • the read-write separation in the main design idea of the data processing method provided in this specification means that write requests can be continuously submitted to the replication queue based on the consistency protocol, and both sub-requests are committed and persisted based on the consistency protocol. Persisting the two sub-requests means that the synchronization system can store them on a local disk, where the data is not lost when the synchronization system is powered off or shut down, thus ensuring that the two sub-requests persist.
  • the access performance of write requests can be fully guaranteed.
  • the data processing method provided in this specification can be used in remote multi-active scenarios, so read requests are all strongly consistent reads.
  • when a read request accesses the state machine, if the state corresponding to a certain key value is the Prep state, the read request is blocked until the relevant second-phase sub-request is learned and applied and the value corresponding to the key value is updated; only then does the read request return. That is, the architecture based on read-write separation ensures the throughput performance of writes on the one hand and the strong consistency of reads on the other.
  • Figure 6 is a schematic diagram of reading data from a cluster in a data processing method provided by an embodiment of this specification.
  • the data processing method provided in this specification adopts a read-write separation design.
  • a read request is blocked while the key value is in the "Prep" state.
  • As shown in Figure 6, the data processing method provided in this specification supports remote multi-active cross-cluster synchronous replication. Therefore, there are no so-called active and standby clusters in the system, and all clusters can directly provide services to the outside world.
  • Figure 6 shows an example of how to implement strong consistent reads in the solution.
  • the write request will be converted into two sub-requests, "Prep Z” and "Z ⁇ 1".
  • the state machine applies a sub-request “Prep Z”
  • the state of the relevant key value Z is marked with "Prep”, which means that the key value is in a modified state, and the read access request needs to block until the state machine is updated.
  • the state machine applies subrequest "Z ⁇ 1”, which removes the "Prep” status mark and changes the value corresponding to the key value to 1.
  • Read requests can directly return key values without a "Prep" status tag.
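The blocking read behavior walked through above can be sketched with a small state machine; the class and method names are illustrative assumptions, and threading is used only to emulate the blocked strongly consistent read.

```python
# Sketch of a cluster state machine: apply_prep marks a key as being modified,
# apply_write installs the value and clears the mark, and read blocks while
# the key carries the "Prep" mark, returning only the up-to-date value.
import threading

class StateMachine:
    def __init__(self):
        self._data = {}
        self._prep = set()
        self._cv = threading.Condition()

    def apply_prep(self, key):
        with self._cv:
            self._prep.add(key)        # first-phase sub-request "Prep <key>"

    def apply_write(self, key, value):
        with self._cv:
            self._data[key] = value    # second-phase sub-request "<key> <- v"
            self._prep.discard(key)    # remove the "Prep" status mark
            self._cv.notify_all()      # wake any blocked readers

    def read(self, key):
        with self._cv:
            # A read against a key in "Prep" state waits until the
            # second-phase sub-request has been learned and applied.
            self._cv.wait_for(lambda: key not in self._prep)
            return self._data.get(key)

sm = StateMachine()
sm.apply_prep("Z")
threading.Timer(0.05, sm.apply_write, args=("Z", 1)).start()
value = sm.read("Z")  # blocks until the write lands, then yields 1
```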
  • the method of sending the data processing request to each data storage module includes:
  • after the synchronization system receives the reply notification returned by each cluster for the "Prep Z" request, it can determine the request "Z ← 1" corresponding to the "Prep Z" request from the consistency queue, and send the request "Z ← 1" to each cluster.
  • sending the data processing request to each data storage module includes:
  • after the synchronization system receives the reply notifications returned by each cluster for the "Prep X" request, "Prep Y" request, or "Prep Z" request, it can determine from the consistency queue the data write request corresponding to the "Prep X" request, "Prep Y" request, or "Prep Z" request, and send that data write request to each cluster.
  • Step 410 Receive the data processing completion notification returned by each data storage module according to the data processing request.
  • the data processing completion notification can be understood as a notification generated after each data storage module stores the target data into the data storage unit. That is, after each cluster stores the value "1" into the corresponding physical storage area or key value, it returns a data storage completion notification to the synchronization system.
  • the synchronization system is thus informed that the cluster has completed data storage, which facilitates the synchronization system in determining that all clusters have completed data storage before proceeding with subsequent data processing requests.
  • the data processing completion notification is a notification generated by each data storage module after updating the initial data in the data storage unit with the target data according to the data processing request. That is, after each cluster updates the value "1" stored under the key value "Z" to the value "5", it returns a data update completion notification to the synchronization system, informing the synchronization system that the cluster has completed the data update; this makes it convenient for the synchronization system to continue executing subsequent data processing requests after confirming that all clusters have completed their data updates.
  • after a data storage module completes processing of the target data based on the data processing request, it can send a data processing completion notification to the request processing module.
  • the request processing module can receive the data processing completion notification returned by each data storage module according to the data processing request.
  • the request processing module can also receive read access requests sent by the scheduler deployed in a cluster; such a read access request can be a request that the client sent to the scheduler.
  • the request processing module can convert the read access request into a no-op write request (a no-operation write), treat the no-op write request as a data processing request and perform the above-mentioned operations on it; or store the no-op write request in the queue as a data processing request and send it to each cluster through the queue; or send the read access request directly to the cluster.
  • when the read access request is sent to the cluster, the state machine in the cluster, after waiting for the no-op write request to be learned and applied to that state machine, can directly read the data corresponding to the key value in the local state machine. Based on this, this method does not require the two-phase commit of the consistent replication queue; instead, the state machine converts each read request into a write request to synchronize the latest data, ensuring strong read consistency.
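The no-op write path can be sketched as follows: wrapping the read as a no-op entry in the replicated log forces the local state machine to apply every earlier write before answering. All names (`to_noop_write`, `handle_read`) are assumptions for illustration.

```python
# Sketch of the alternative strong-read path: the read is converted into a
# no-op write; once the state machine has learned the no-op, all writes that
# precede it in the log are guaranteed to be applied locally.
def to_noop_write(read_key):
    """Wrap a read access as a no-op write request for the replication queue."""
    return {"op": "noop", "key": read_key}

def handle_read(state, log, read_key):
    """Apply the log up to and including the no-op, then serve the read."""
    log.append(to_noop_write(read_key))
    for entry in log:                  # state machine learns the log in order
        if entry["op"] == "write":
            state[entry["key"]] = entry["value"]
    return state.get(read_key)

state, log = {}, [{"op": "write", "key": "Z", "value": 5}]
value = handle_read(state, log, "Z")  # → 5: the read sees the latest write
```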
  • when the data processing method provided in this specification receives a data processing request, it first generates a data preprocessing request based on the target data carried in the data processing request and sends it to at least two data storage modules; only after receiving the preprocessing completion notification returned by each data storage module is the data processing request sent to each data storage module, so that each data storage module can obtain the target data.
  • This ensures the data consistency of each data storage module, further avoids the data loss caused when any one of the at least two data storage modules fails, and guarantees data security.
  • the above is a schematic solution of a data processing method of this embodiment. It should be noted that the technical solution of this data processing method belongs to the same concept as the technical solution of the above-mentioned data processing system. For details not described in detail in the technical solution of the data processing method, refer to the description of the technical solution of the above-mentioned data processing system.
  • FIG. 7 shows a process flow chart of a data processing method provided by an embodiment of this specification, which specifically includes the following steps.
  • Step 702 The client sends a data write request to any one of the multiple clusters.
  • the data write request can be the request "Z ⁇ 1".
  • Step 704 After receiving the data write request, the cluster sends the data write request to the synchronization system through the scheduler.
  • the synchronization system is a cross-cluster log synchronization system.
  • Step 706 The synchronization system generates a "Prep Z" request for the data write request.
  • Step 708 The synchronization system stores the request "Z ⁇ 1" and the corresponding "Prep Z" request into the consistency queue.
  • Step 710 The synchronization system first sends the "Prep Z" request in the consistency queue to the state machine of each cluster.
  • Step 712 After receiving the "Prep Z" request, the state machine of each cluster modifies the status of the key value "Z" to "Prep” and replies to the synchronization system with a status modification notification.
  • Step 714 After receiving the status modification notification of each cluster, the synchronization system sends the request "Z ⁇ 1" to each cluster.
  • Step 716 Based on the request "Z ← 1", the state machine of each cluster modifies the value corresponding to the key value "Z" to "1", clears the "Prep" state of the key value "Z", and replies to the synchronization system that data writing is completed.
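The end-to-end flow of steps 702 to 716 can be sketched as follows; clusters are modelled as simple in-memory objects, and all names (`Cluster`, `replicate`) are illustrative assumptions rather than identifiers from the patent.

```python
# Sketch of the protocol flow: the synchronization system sends "Prep" to
# every cluster, waits until all acknowledge, then sends the write itself.
class Cluster:
    def __init__(self):
        self.data, self.prep = {}, set()

    def on_prep(self, key):
        self.prep.add(key)             # step 712: mark key "Prep", reply
        return "ack"

    def on_write(self, key, value):
        self.data[key] = value         # step 716: apply value, clear "Prep"
        self.prep.discard(key)
        return "done"

def replicate(clusters, key, value):
    # steps 706-710: generate the Prep sub-request and send it to each cluster
    acks = [c.on_prep(key) for c in clusters]
    if not all(a == "ack" for a in acks):
        raise RuntimeError("a cluster failed to acknowledge the Prep request")
    # step 714: only after every cluster acknowledged, send the write itself
    return [c.on_write(key, value) for c in clusters]

clusters = [Cluster(), Cluster(), Cluster()]
replicate(clusters, "Z", 1)
# every cluster now stores Z = 1 with no "Prep" mark remaining
```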
  • the problem with cross-cluster synchronous replication based on a consistency replicated state machine is that the data presented by each cluster state machine is only eventually consistent. Therefore, in a multi-active scenario, if the client successively accesses the state machines of different clusters, the data it sees may roll back (that is, it sees the new version of the data first and then the old version), so multi-activity cannot be effectively supported.
  • the data processing system provided in this specification therefore introduces two-phase commit.
  • the state machine modifies the status of the relevant key value based on the first-phase Prep sub-request, marking the key value as being modified; a read request facing a key value in the "Prep" state must then wait until the second-phase modification sub-request is learned and applied locally, after which it can read the latest data.
  • FIG. 8 shows a schematic structural diagram of a data processing device provided by an embodiment of this specification. As shown in Figure 8, the device includes:
  • the first receiving module 802 is configured to receive a data processing request, where the data processing request carries target data;
  • the generation module 804 is configured to generate a data preprocessing request for the target data according to the target data;
  • the first sending module 806 is configured to send the data preprocessing request to at least two data storage modules respectively;
  • the second sending module 808 is configured to send the data processing request to each data storage module upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request;
  • the second receiving module 810 is configured to receive a data processing completion notification returned by each data storage module according to the data processing request.
  • the generation module 804 is also configured to:
  • when there are at least two data processing requests, determine the target data carried in each data processing request;
  • a data preprocessing request for the data storage unit corresponding to each target data is generated respectively.
  • the data processing device also includes a storage module, configured as:
  • the data processing request and the corresponding data preprocessing request are stored in the request sending queue.
  • the first sending module 806 is also configured to:
  • the second sending module 808 is also configured to:
  • the first sending module 806 is also configured to:
  • the second sending module 808 is also configured to:
  • the preprocessing completion notification is a notification generated after each data storage module sets the data storage unit corresponding to the target data to be inaccessible based on the data preprocessing request.
  • the first receiving module 802 is also configured to:
  • the first receiving module 802 is also configured to:
  • the second receiving module 810 is also configured to:
  • the data processing completion notification is a notification generated by each data storage module after updating the initial data in the data storage unit through the target data according to the data processing request.
  • when the data processing system receives a data processing request, it first generates a data preprocessing request based on the target data carried in the data processing request and sends it to at least two data storage modules; only after receiving the preprocessing completion notification returned by each data storage module is the data processing request sent to each data storage module, so that each data storage module can obtain the target data. This ensures the data consistency of each data storage module, further avoids the data loss caused when any one of the at least two data storage modules fails, and guarantees data security.
  • the above is a schematic solution of a data processing device of this embodiment. It should be noted that the technical solution of the data processing device belongs to the same concept as the technical solution of the above-mentioned data processing method. For details not described in detail in the technical solution of the data processing device, refer to the description of the technical solution of the above-mentioned data processing method.
  • Figure 9 shows a structural block diagram of a computing device 900 provided according to an embodiment of this specification.
  • Components of the computing device 900 include, but are not limited to, memory 910 and processor 920 .
  • the processor 920 and the memory 910 are connected through a bus 930, and the database 950 is used to save data.
  • Computing device 900 also includes an access device 940 that enables computing device 900 to communicate via one or more networks 960 .
  • Examples of such networks include the Public Switched Telephone Network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet.
  • Access device 940 may include one or more of any type of network interface, wired or wireless (e.g., a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, a worldwide interoperability for microwave access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, etc.
  • the above-mentioned components of the computing device 900 and other components not shown in FIG. 9 may also be connected to each other, for example, through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 9 is for illustrative purposes only and does not limit the scope of this description. Those skilled in the art can add or replace other components as needed.
  • Computing device 900 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet computer, personal digital assistant, laptop computer, notebook computer, netbook, etc.), a mobile telephone (e.g., smartphone ), a wearable computing device (e.g., smart watch, smart glasses, etc.) or other type of mobile device, or a stationary computing device such as a desktop computer or PC.
  • Computing device 900 may also be a mobile or stationary server.
  • the processor 920 is configured to execute the following computer-executable instructions. When the computer-executable instructions are executed by the processor 920, the steps of the above data processing method are implemented.
  • the above is a schematic solution of a computing device in this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above-mentioned data processing method belong to the same concept. For details that are not described in detail in the technical solution of the computing device, please refer to the description of the technical solution of the above data processing method.
  • An embodiment of the present specification also provides a computer-readable storage medium that stores computer-executable instructions.
  • the computer-executable instructions are executed by a processor, the steps of the above data processing method are implemented.
  • An embodiment of the present specification also provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to perform the steps of the above data processing method.
  • the computer instructions include computer program code, which may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.
  • the content contained in the computer-readable medium can be appropriately added or deleted according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium excludes electrical carrier signals and telecommunications signals.

Abstract

Embodiments of this specification provide a data processing method and apparatus, where the data processing method includes: receiving a data processing request, where the data processing request carries target data; generating a data preprocessing request for the target data according to the target data; sending the data preprocessing request to at least two data storage modules respectively; upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request, sending the data processing request to each data storage module; and receiving a data processing completion notification returned by each data storage module according to the data processing request. This ensures the data consistency of each data storage module, further avoids the data loss caused when any one of the at least two data storage modules fails, and guarantees data security.

Description

Data processing method and apparatus
This application claims priority to Chinese patent application No. 202210333042.1, titled "Data processing method and apparatus", filed with the Chinese Patent Office on March 31, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of this specification relate to the field of computer technology, and in particular to a data processing method.
Background
With the advent of the big data era, data has become a core asset of many enterprises, and data disaster recovery has therefore become a universal need for numerous enterprises, especially finance, Internet, and other enterprises with high disaster recovery requirements. To further avoid data loss, such enterprises store data in multiple data storage nodes of a distributed system. However, when any data storage node in the distributed system fails, the entire distributed system may suffer data loss, data inconsistency, and other problems, seriously affecting data security.
Summary
In view of this, embodiments of this specification provide a data processing method. One or more embodiments of this specification also relate to a data processing apparatus, a data processing system, a computing device, a computer-readable storage medium, and a computer program, to solve the technical defects existing in the prior art.
According to a first aspect of the embodiments of this specification, a data processing method is provided, including:
receiving a data processing request, where the data processing request carries target data;
generating a data preprocessing request for the target data according to the target data;
sending the data preprocessing request to at least two data storage modules respectively;
upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request, sending the data processing request to each data storage module;
receiving a data processing completion notification returned by each data storage module according to the data processing request.
According to a second aspect of the embodiments of this specification, a data processing apparatus is provided, including:
a first receiving module, configured to receive a data processing request, where the data processing request carries target data;
a generation module, configured to generate a data preprocessing request for the target data according to the target data;
a first sending module, configured to send the data preprocessing request to at least two data storage modules respectively;
a second sending module, configured to send the data processing request to each data storage module upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request;
a second receiving module, configured to receive a data processing completion notification returned by each data storage module according to the data processing request.
According to a third aspect of the embodiments of this specification, a data processing system is provided, including a request processing module and at least two data storage modules, where:
the request processing module is configured to receive a data update request, where the data update request carries target data for updating initial data in a data storage unit included in a data storage module, generate a data preprocessing request for the target data according to the target data, and send the data preprocessing request to the at least two data storage modules respectively;
the at least two data storage modules are configured to set, based on the data preprocessing request, the data storage unit corresponding to the target data to be inaccessible, and send a preprocessing completion notification to the request processing module;
the request processing module is further configured to send the data update request to the at least two data storage modules upon receiving the preprocessing completion notification returned by each data storage module according to the data preprocessing request;
the at least two data storage modules are configured to update, according to the data update request, the initial data in the data storage unit with the target data, and send a data processing completion notification to the request processing module.
According to a fourth aspect of the embodiments of this specification, a computing device is provided, including:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the data processing method.
According to a fifth aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores computer-executable instructions that, when executed by a processor, implement the steps of the data processing method.
According to a sixth aspect of the embodiments of this specification, a computer program is provided, where, when the computer program is executed in a computer, the computer is caused to perform the steps of the data processing method.
The data processing method provided in this specification receives a data processing request, where the data processing request carries target data; generates a data preprocessing request for the target data according to the target data; sends the data preprocessing request to at least two data storage modules respectively; upon receiving a preprocessing completion notification returned by each data storage module according to the data preprocessing request, sends the data processing request to each data storage module; and receives a data processing completion notification returned by each data storage module according to the data processing request.
Specifically, upon receiving a data processing request, a data preprocessing request is first generated based on the target data carried in the data processing request and sent to at least two data storage modules; only after the preprocessing completion notification returned by each data storage module is received is the data processing request sent to each data storage module, so that each data storage module can obtain the target data. This ensures the data consistency of each data storage module, further avoids the data loss caused when any one of the at least two data storage modules fails, and guarantees data security.
Brief Description of the Drawings
Figure 1 is a schematic diagram of a cross-cluster synchronous replication solution provided by an embodiment of this specification;
Figure 2 is a schematic diagram of data updating in a cross-cluster synchronous replication solution provided by an embodiment of this specification;
Figure 3 is a schematic structural diagram of a data processing system provided by an embodiment of this specification;
Figure 4 is a flowchart of a data processing method provided by an embodiment of this specification;
Figure 5 is a schematic diagram of a consistency queue in a data processing method provided by an embodiment of this specification;
Figure 6 is a schematic diagram of reading data from a cluster in a data processing method provided by an embodiment of this specification;
Figure 7 is a process flowchart of a data processing method provided by an embodiment of this specification;
Figure 8 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this specification;
Figure 9 is a structural block diagram of a computing device provided by an embodiment of this specification.
Detailed Description
Many specific details are set forth in the following description to facilitate a full understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from the substance of this specification; therefore, this specification is not limited by the specific implementations disclosed below.
The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit one or more embodiments of this specification. The singular forms "a", "the", and "said" used in one or more embodiments of this specification and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
First, the terms involved in one or more embodiments of this specification are explained.
Cluster: multiple or even thousands of servers are gathered together and divided into multiple machine groups, each running the same service. No single server is indispensable; together they relieve concurrent access pressure and avoid single points of failure, thereby realizing a highly available, highly scalable, and low-cost distributed system.
Available Zone: an available zone refers to a physical area within the same region whose power and network are independent of each other, containing one or more IDC data centers. Network latency within the same available zone is relatively small, and fault isolation can be achieved between different available zones.
RPO (Recovery Point Objective): the data recovery point objective in a disaster recovery system, measured in time, i.e., the point in time to which the system and data must be restored after a disaster. RPO indicates the maximum amount of data loss the system can tolerate; the smaller the tolerable data loss, the smaller the RPO value.
RTO (Recovery Time Objective): the service recovery objective in a disaster recovery system, measured in time, i.e., the time within which system functions must be restored after they stop following a disaster. RTO indicates the longest service interruption the system can tolerate; the more urgent the service requirement, the smaller the RTO value.
Synchronous replication: a design addressing the data-loss dimension of disaster recovery scenarios, usually based on a primary-backup mechanism. Asynchronous replication periodically copies data from the primary to the backup remotely, usually across available zones or regions, with an RPO generally from seconds to minutes and little impact on user IO; synchronous replication copies data remotely in real time, with an RPO of zero, maximally guaranteeing user data consistency at a corresponding performance cost.
Remote multi-active: remote multi-active means that multiple different physical regions can provide data access services at the same time, with no primary-backup relationship among regions. Multi-active is a higher-level requirement in disaster recovery systems: both RPO and RTO are required to be zero, i.e., when a single-region failure occurs, system services can be restored immediately and no data is lost.
Consistency protocol: a mechanism by which multiple nodes in a distributed system reach consensus on a proposed value; by reaching consensus on successive rounds of proposed values, a distributed consistency system is formed. Typical consistency protocols include Paxos, Raft, EPaxos, etc.
With the advent of the big data era, data has become the core asset and lifeline of enterprises, and data disaster recovery has therefore become a universal need of many enterprise users, especially Internet and financial enterprises. Many enterprises can meet their disaster recovery needs through data protection products. Data protection products can be divided into different disaster recovery levels according to RPO and RTO; generally speaking, the smaller the RPO and RTO, the less data loss and the faster the data recovery, but the higher the corresponding cost. Ordered from small to large RPO and RTO values, data protection products can be divided into asynchronous replication and synchronous replication. Asynchronous replication copies data remotely and periodically, usually across available zones or regions, with an RPO generally from seconds to minutes and little impact on user IO; synchronous replication copies data remotely in real time, usually across available zones, with an RPO of zero, maximally guaranteeing user data consistency at a corresponding performance cost.
As an important feature of storage products, asynchronous and synchronous replication are being built out in many cloud storage and database products. On this basis, Internet, financial, and other enterprises have further proposed a higher-level disaster recovery requirement of compressing RTO/RPO to zero: when a single-region (single available zone) failure occurs, system services (the storage services provided by the distributed system) can be restored immediately while ensuring that no data is lost. How storage products should respond to and plan for this higher level of disaster recovery capability has therefore become a problem to be solved.
Based on this, considering the need for redundant disaster recovery technology, this specification provides a solution for data replication and consistency consensus, specifically a cross-cluster synchronous replication solution based on a consistency protocol. Figure 1 is a schematic diagram of a cross-cluster synchronous replication solution provided by an embodiment of this specification, i.e., the replicated state machine model. Referring to Figure 1, technical solutions based on consistency consensus are widely adopted because of their natural ordering capability and the reliability guarantee of data consistency. A synchronization system based on a consistency protocol (i.e., the cross-cluster log synchronization system in the figure, consensus based replication) is deployed across the clusters, and each cluster is regarded as a replicated state machine. Write requests from clients (i.e., "X←4", "Y←7", "Y←5", "Z←1" in the figure) are forwarded by the scheduler configured in each cluster to this cross-cluster synchronization system. Each node in the synchronization system corresponds to the service state machine of a certain cluster and applies the log data (i.e., write requests) it has synchronized to the corresponding service state machine. The service state machine of each cluster directly provides read access services to clients.
Referring to Figure 1, the failure of any one of the multiple clusters does not cause data loss. Moreover, the cross-cluster log synchronization system based on the consistency protocol not only supports strict global ordering but also provides data disaster recovery capability: the failure of some nodes does not affect overall service availability or data integrity. Therefore, synchronous replication with RPO = 0 can be achieved relatively easily based on a consistency protocol.
However, although synchronous replication based on a consistency protocol is simple to implement, it also has a shortcoming: the data update state is only eventually consistent across data centers. As shown in Figure 2, which is a schematic diagram of data updating in a cross-cluster synchronous replication solution provided by an embodiment of this specification, the logs applied by the service state machines of different clusters necessarily progress at different times, and the data updates of the corresponding state machines cannot be strictly synchronized. A client accessing different clusters may therefore see different values for the same key value, or even read stale data. Data consistency between clusters thus cannot be strictly guaranteed, and when any one of the multiple clusters fails, the problem of data loss still exists.
Based on this, this specification provides a data processing method, and also relates to a data processing apparatus, a data processing system, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Figure 3 shows a schematic structural diagram of a data processing system provided according to an embodiment of this specification, where the system includes a request processing module 302 and at least two data storage modules 304, where:
the request processing module 302 is configured to receive a data update request, where the data update request carries target data for updating initial data in a data storage unit included in a data storage module 304, generate a data preprocessing request for the target data according to the target data, and send the data preprocessing request to the at least two data storage modules 304 respectively;
the at least two data storage modules 304 are configured to set, based on the data preprocessing request, the data storage unit corresponding to the target data to be inaccessible, and send a preprocessing completion notification to the request processing module 302;
the request processing module 302 is further configured to send the data update request to the at least two data storage modules 304 upon receiving the preprocessing completion notification returned by each data storage module 304 according to the data preprocessing request;
the at least two data storage modules 304 are configured to update, according to the data update request, the initial data in the data storage unit with the target data, and send a data processing completion notification to the request processing module 302.
Here, the request processing module 302 can be understood as a module capable of sending a received data processing request to each data storage module 304 while guaranteeing the consistency of the target data across the data storage modules 304. For example, the request processing module 302 can be understood as the cross-cluster log synchronization system (consensus based replication) in the above embodiment.
The data storage module 304 can be understood as a module capable of storing the target data; in practical applications, the data storage module 304 can be understood as a cluster, a service state machine in a cluster, a server, an available zone, a data center, a physical disk in a computer, a memory in a computer, etc., which is not specifically limited in this specification. To avoid excessive repetition, the following explanation takes a cluster as the data storage module 304.
The initial data can be understood as data stored in the data storage module 304; for example, the initial data can be a parameter, multimedia data, a document, an application, a script, etc., which is not specifically limited in this specification. The target data can be understood as the data used to update the initial data in the data storage module 304, and can likewise be a parameter, multimedia data, a document, an application, a script, etc., which is not specifically limited in this specification. It should be noted that the data type of the initial data and the data type of the target data may be the same or different.
The data storage unit can be understood as the unit in the data storage module 304 that stores the initial data; for example, the data storage unit can be understood as a block of physical storage media in the cluster that stores the initial data, or the key value corresponding to the initial data. In practical applications, when the initial data is a numerical value, the data storage unit can be the key value corresponding to that value. For example, the initial data is the value "1" and the data storage unit is the key value "Z".
The data update request can be understood as a request to update the initial data in the data storage module 304. For example, when the initial data is the value "1" and the target data is the value "5", the data update request can be understood as a request to update the value "1" stored in the cluster to the value "5".
The data preprocessing request can be understood as a request instructing the data storage module 304 to set the data storage unit to be inaccessible before performing the data update. In practical applications, because multiple clusters, or the service state machines within clusters, execute the data update operation at different speeds, a user might otherwise read historical data from a cluster. Therefore, while the service state machines of the clusters are executing the data update operation, the cross-cluster log synchronization system (hereinafter referred to as the synchronization system) first sets the data storage unit holding the initial data to be inaccessible, and only after the data update is completed sets the current state of the data storage unit back to accessible. For example, the service state machine in a cluster sets the state corresponding to the key value "Z" to the "Prep (prepare)" state. When a user initiates a read request against the key value "Z" in the "Prep" state, the read request is blocked, i.e., its execution is suspended. After the synchronization system updates the value corresponding to the key value "Z" from "1" to "5", the "Prep" state of the key value "Z" is cleared, avoiding the problem of users reading old data while the service state machines are updating. Based on this, the data preprocessing request can be a "Prep" request, which instructs the service state machine in a cluster to set the state of a specific key value to "Prep". Correspondingly, the data update request can be the request "Z←1".
The preprocessing completion notification can be understood as a notification by which the data storage module 304 informs the request processing module 302 that it has completed the operation of setting the data storage unit to be inaccessible.
The data processing completion notification can be understood as a notification by which the data storage module 304 informs (i.e., tells) the request processing module 302 that it has completed updating the initial data with the target data.
Specifically, in the data processing system provided in this specification, the request processing module 302 can receive a data update request carrying target data for updating the initial data in a data storage unit included in a data storage module 304.
After receiving the data update request, the request processing module 302 first generates a data preprocessing request for the target data based on the target data, and sends the data preprocessing request to the at least two data storage modules 304 to instruct them to perform the preprocessing work before the data update.
After receiving the data preprocessing request, each data storage module 304 sets the data storage unit corresponding to the target data to be inaccessible, and sends a preprocessing completion notification to the request processing module 302 indicating that the data update request may be issued.
Then, upon determining that the preprocessing completion notifications returned by all data storage modules 304 according to the data preprocessing request have been received, the request processing module 302 determines that all data storage modules 304 are ready, and therefore sends the data processing request to each data storage module 304.
After receiving the data update request sent by the request processing module 302, each data storage module 304 updates the initial data in the data storage unit based on the target data carried in the data update request, and sends a data processing completion notification to the request processing module 302.
In practical applications, only after receiving the data processing completion notifications sent by the data storage modules 304 does the request processing module 302 continue to send new data update requests and data preprocessing requests to the data storage modules, so that each data storage module 304 can continue to perform data update operations.
The data processing system is further described below, taking its application in a cross-cluster synchronous replication scenario supporting remote multi-active as an example, where the request processing module 302 is the synchronization system, the data storage module 304 is the service state machine in a cluster, the target data is the value "5", the initial data is the value "1", the data storage unit is the key value "Z", the data preprocessing request is the "Prep Z" request for the key value "Z", and the data update request is a data write request (the request "Z←1").
Based on this, when the synchronization system in the data processing system provided in this specification receives a data write request (the request "Z←1"), it splits each write request into two sub-requests. The request "Z←1" is therefore converted into the two sub-requests "Z←1" and "Prep Z". It should be noted that splitting a write request into two sub-requests can mean generating, from the "Z←1" carried in the write request, a "Prep Z" request instructing the service state machine in each cluster to change the state corresponding to the key value "Z" to the "Prep" state.
Afterwards, the two sub-requests are learned and applied by the state machines of each cluster in strict order and synchronously. The sub-request "Prep Z" is first exposed for the cluster state machines to learn; that is, the "Prep Z" request is sent to the service state machine in each cluster, instructing it to change the state corresponding to the key value "Z" to the "Prep" state.
Moreover, only after the sub-request has been learned and explicitly acknowledged by each cluster state machine, that is, after receiving the state-change completion notifications sent by the service state machines of all clusters, does the synchronization system further expose the sub-request "Z←1" for the state machines of each cluster to learn and apply, i.e., to change the value "1" corresponding to the key value "Z" to the value "5", thereby ensuring that data is updated synchronously.
The problem with cross-cluster synchronous replication implemented via consistency-based replicated state machines is that the data presented by each cluster state machine is only eventually consistent; in a multi-active scenario, a client successively accessing the state machines of different clusters may therefore see data roll back (i.e., see a new version of the data first and then an old version), so multi-activity cannot be effectively supported. The data processing system provided in this specification introduces two-phase commit: the state machine modifies the state of the relevant key value based on the first-phase Prep sub-request, marking the key value as being modified, and a read request facing a key value in the "Prep" state must actively wait until the second-phase modification sub-request is learned and applied locally, after which it can read the latest data. This realizes cross-cluster synchronous replication with remote multi-active, guarantees the data consistency of each cluster, further avoids data loss when any of the multiple clusters fails, and ensures data security.
图4示出了根据本说明书一个实施例提供的一种数据处理方法的流程图,具体包括以下步骤。
步骤402:接收数据处理请求,其中,所述数据处理请求中携带目标数据。
其中,该数据处理请求可以理解为能够对目标数据进行处理的请求,在实际应用中,该数据处理请求可以理解为数据更新请求、数据存储请求;该数据更新请求可以参见上述对数据处理系统的描述中,对该数据更新请求的说明。该数据存储请求可以理解为需要将目标数据存储至数据存储模块的请求。
在本说明书提供的一实施例中,该数据处理请求是部署在集群中的调度程序(dispatcher)发送的,在实际应用中,当多个集群中任意一个集群接收到客户端(client)发送的数据处理请求的情况下,能够通过部署在该集群中的调度程序将该数据处理请求发送至同步系统。并通过该同步系统将该数据处理请求发送至每个集群,从而保证了多个集群的同步复制,具体实现方式如下。
所述接收数据处理请求,包括:
接收至少两个数据存储模块中目标数据存储模块发送的数据处理请求,其中,所述目标数据存储模块为接收到请求发送对象发送的所述数据处理请求的模块。
具体地,数据存储模块在接收到客户端发送的数据处理请求的情况下,能够通过自身配置的请求转发单元将该数据处理请求转发至请求处理模块,使得该请求处理模块能够接收到该数据处理请求。其中,该请求转发单元可以理解为能够将数据处理请求转发至请求处理模块的单元,例如部署在集群中的调度程序。
下面以本说明书提供的数据处理方法应用于支持异地多活的跨集群同步复制场景下为例,对接收数据处理请求做进一步说明。其中,多个集群中的任意一个集群在接收到数据更新请求和/或数据存储请求的情况下,能够通过部署在集群自身的调度程序,将该数据更新请求和/或数据存储请求发送至请求处理模块。
进一步地,在本说明书提供的实施例中,该数据处理请求可以为数据更新请求;基于 此,当请求处理模块在接收到该数据更新请求之后,后续能够通过两阶段提交的方式将该数据更新请求发送至多个数据存储模块,从而保证了数据存储模块的数据一致性。具体实现方式如下。
所述接收数据处理请求,包括:
接收携带有目标数据的数据更新请求,其中,所述目标数据为对数据存储单元中的初始数据进行更新的数据。
其中，初始数据以及数据存储单元可以参见上述对数据处理系统的描述中对应或相应的内容。
具体地,该请求处理平台能够接收到携带有目标数据的数据更新请求,对应的,针对该数据更新请求的描述可以参见上述对数据处理系统的描述中对应或相应的内容。
步骤404:根据所述目标数据,生成针对所述目标数据的数据预处理请求。
其中,该数据处理请求为数据更新请求的情况下,数据预处理请求可以理解为指示数据存储模块在进行数据更新之前,将数据存储单元设为不可访问的请求;在该数据处理请求为数据存储请求的情况下,数据预处理请求可以理解为指示数据存储模块在进行数据存储之前,确定用于存储该目标数据的数据存储单元,且将该数据存储单元设为不可访问的请求;从而便于后续能够顺利的将目标数据存储至数据存储单元。
在本说明书提供的一实施例中,所述根据所述目标数据,生成针对所述目标数据的数据预处理请求,包括:
在所述数据处理请求包括至少两个的情况下,确定每个数据处理请求中携带的目标数据;
根据每个目标数据在至少两个数据存储模块中对应的数据存储单元,分别生成针对所述每个目标数据对应的数据存储单元的数据预处理请求。
其中,在数据处理请求为数据存储请求的情况下,该目标数据可以理解为需要存储至数据存储模块的数据,对应的,该数据存储单元可以理解为该目标数据需要存储至的单元。该数据存储单元可以根据实际应用场景进行设置,在数据存储模块为集群的情况下,该数据存储单元可以为集群中的一个物理存储介质、或者集群中的一个键值。
具体地,该数据处理请求包括至少两个,在此情况下,请求处理系统需要确定每个数据处理请求中携带的目标数据;并确定该目标数据在至少两个数据存储模块中对应的数据存储单元;并分别生成针对每个目标数据对应的数据存储单元的数据预处理请求。
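根据目标数据生成对应数据预处理请求的过程，可以用如下草图示意（假设性实现，以键值名称充当数据存储单元，build_prep_requests为示例名称）：

```python
def build_prep_requests(requests):
    # requests 形如 [("Z", 1)]：目标数据 1 需写入数据存储单元（键值）"Z"；
    # 为每个目标数据对应的数据存储单元生成一个 "Prep" 数据预处理请求
    return [(f"Prep {key}", (key, value)) for key, value in requests]

pairs = build_prep_requests([("X", 4), ("Z", 1)])
print(pairs[0][0])  # 输出: Prep X
```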
在本说明书提供的一实施例中,该数据处理方法的主要设计思想包括三个方面:一致性队列、两阶段提交、读写分离。
其中，一致性队列是指：解耦集群服务状态机与跨集群日志同步系统，使得跨集群日志同步系统依赖单独的一个基于一致性协议的队列，通过该队列将数据处理请求以及对应的数据预处理请求发送至集群的服务状态机。而后续每个集群的状态机，需要通过主动向队列学习并确认的方式来更新本地数据，这样的解耦设计可以平滑支持集群状态机扩展；具体实现方式如下。
所述根据每个目标数据在至少两个数据存储模块中对应的数据存储单元,分别生成针对所述每个目标数据对应的数据存储单元的数据预处理请求之后,还包括:
确定所述每个数据处理请求对应的请求处理序列信息，其中，所述请求处理序列信息根据所述每个数据处理请求对应的请求接收时间确定；
根据所述请求处理序列信息,将所述数据处理请求以及对应的数据预处理请求存储至请求发送队列。
其中,请求处理序列信息可以理解为每个数据处理请求对应的请求接收时间,该请求接收时间可以理解为请求处理平台在接收到每个数据处理请求的时间,或者,该请求处理序列信息也可以为请求处理平台根据该请求接收时间为数据处理请求分配的序号、编号、ID等。例如,请求处理平台第一个接收到的数据处理请求所对应的请求处理序列信息,可以为序号“1”,对应的,第二个接收到的数据处理请求所对应的请求处理序列信息为序号“2”。
具体地，该请求处理平台在确定数据处理请求对应的数据预处理请求之后，确定每个数据处理请求对应的请求处理序列信息，并按照该请求处理序列信息，将数据处理请求以及对应的数据预处理请求存放在请求发送队列中，该请求发送队列可以为一致性队列。参见图5，图5是本说明书一个实施例提供的一种数据处理方法中一致性队列的示意图，该一致性队列存储有多个数据写请求，以及每个数据写请求对应的“Prep”子请求。
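上述按请求处理序列信息入队的过程可示意如下（假设性实现，使用先进先出队列模拟一致性队列，enqueue为示例名称）：

```python
from collections import deque

def enqueue(queue, requests):
    # 按请求接收顺序分配序号（请求处理序列信息），
    # 把 (序号, Prep 子请求, 写子请求) 成对存入请求发送队列
    for seq, (key, value) in enumerate(requests, start=1):
        queue.append((seq, f"Prep {key}", (key, value)))

q = deque()
enqueue(q, [("X", 4), ("Y", 7), ("Y", 5), ("Z", 1)])
print(q[0])  # 输出: (1, 'Prep X', ('X', 4))
```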
参见图5，在本说明书提供的一实施例中，该数据处理方法的主要设计思想中的两阶段提交（two-phase apply）是指，针对每一个写请求，基于一致性协议的复制队列都会将其拆成两个子请求。譬如请求“Z←1”，会被转换成两个子请求“Z←1”以及“Prep Z”，该两个子请求会严格按顺序、同步地被各个集群的状态机学习并应用。复制队列会首先把子请求“Prep Z”公开给各个集群状态机学习，只有等到该子请求被各个集群状态机学习并明确回复之后，复制队列才会进一步公开子请求“Z←1”，让各个集群的状态机学习并应用，从而保证数据同步更新。也即是，只有“Prep”子请求先被各集群apply（应用）之后，才会公开正式请求（数据处理请求）供各集群learn（学习）。具体参见步骤406至步骤408。
步骤406:将所述数据预处理请求分别发送给至少两个数据存储模块。
具体地，该请求处理平台首先需要将数据预处理请求分别发送给至少两个数据存储模块。
所述将所述数据预处理请求分别发送给至少两个数据存储模块,包括:
从所述请求发送队列中获取所述数据预处理请求,并将所述数据预处理请求分别发送给所述至少两个数据存储模块。
具体地，该请求处理模块在将数据处理请求和数据预处理请求存储至请求发送队列之后，可以通过先进先出的方式，从该请求发送队列中确定最先进入队列的数据处理请求和数据预处理请求；获取该数据预处理请求，并将该数据预处理请求分别发送给至少两个数据存储模块。例如，参见图5，同步系统在将数据写请求和“Prep”请求存储至一致性队列之后，能够先将“Prep Z”请求发送至多个集群，后续在接收到集群针对“Prep Z”请求的回复后，将该请求“Z←1”发送至多个集群。
在实际应用中，同步系统为了避免数据不一致的问题，会基于一致性队列，通过一次处理一个请求的方式，向多个集群下发“Prep”请求和数据写请求。这虽然保证了数据一致性，但是请求下发的效率较低。因此，本说明书提供的数据处理方法中，能够同时执行多个针对不同数据存储单元的数据处理请求，从而在保证数据一致性的情况下，提高了请求下发的效率；具体实现方式如下。
所述将所述数据预处理请求分别发送给至少两个数据存储模块,包括:
确定所述每个目标数据对应的数据存储单元的标识信息;
根据所述标识信息从所述数据存储单元中确定出目标数据存储单元,其中,所述目标数据存储单元的标识信息与其他数据存储单元不同;
将针对所述目标数据存储单元的数据预处理请求,确定为目标数据预处理请求;
从所述请求发送队列中获取所述目标数据预处理请求,并将所述目标数据预处理请求分别发送给至少两个数据存储模块。
其中,该数据存储单元的标识信息可以理解为唯一标识一个数据存储单元的信息,例如,在数据存储单元为键值的情况下,该标识信息为键值的名称;在数据存储单元为物理磁盘中的一个存储区域时,该标识信息可以为该存储区域的编号。
沿用上例，参见图5，该一致性队列中存储有多个数据写请求（“X←4”“Y←7”“Y←5”“Z←1”）以及对应的“Prep”请求（“Prep X”“Prep Y”“Prep Y”“Prep Z”）。基于此，该同步系统能够从每个数据写请求中确定目标数据所对应的键值的键值名称（即X、Y、Y、Z）。之后，该同步系统根据该键值名称，从多个键值中确定出目标键值（X、Y、Z）；根据该键值在一致性队列中的位置，确定该键值对应的“Prep”请求（即“Prep X”“Prep Y”“Prep Z”），并将该“Prep X”请求、“Prep Y”请求、“Prep Z”请求发送给多个集群。
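上述按数据存储单元的标识信息（此处为键值名称）筛选目标数据预处理请求的逻辑，可以用如下草图示意（假设性实现，select_target_preps为示例名称）：

```python
# 一致性队列中的条目：(序号, Prep 子请求, 写子请求)
queue = [(1, "Prep X", ("X", 4)), (2, "Prep Y", ("Y", 7)),
         (3, "Prep Y", ("Y", 5)), (4, "Prep Z", ("Z", 1))]

def select_target_preps(entries):
    # 从队首起，为每个首次出现的键值挑出其 Prep 子请求并行下发；
    # 标识信息重复的键值（如第二个 "Y"）需等前一请求完成后再处理
    seen, targets = set(), []
    for _seq, prep, (key, _value) in entries:
        if key not in seen:
            seen.add(key)
            targets.append(prep)
    return targets

print(select_target_preps(queue))  # 输出: ['Prep X', 'Prep Y', 'Prep Z']
```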
步骤408:在接收到每个数据存储模块根据所述数据预处理请求返回的预处理完成通知的情况下,将所述数据处理请求发送至所述每个数据存储模块。
其中，所述预处理完成通知为所述每个数据存储模块基于所述数据预处理请求、将所述目标数据对应的数据存储单元设为不可访问后生成的通知。例如，在数据处理请求为数据更新请求的情况下，该预处理完成通知可以为集群中的服务状态机将键值“Z”对应的状态设置为“Prep（准备）”状态之后，向同步系统发送的通知。其中，当集群接收到针对该键值“Z”的读请求后，该读请求会被block（暂停执行）。
在实际应用中,本说明书提供的数据处理方法的主要设计思想中的读写分离是指:写请求可以持续提交给基于一致性协议的复制队列,两个子请求均会被基于一致性协议提交、并持久化。其中,该两个子请求均会被持久化是指,该同步系统能够将两个子请求存储在本地磁盘中,该本地磁盘中存储的数据,在同步系统发生断电、停机等问题时也不会丢失,从而保证了两个子请求能够持久的存在。
因此，本说明书提供的数据处理方法中，写请求的访问性能可以得到充分保障。另一方面，本说明书提供的数据处理方法面向异地多活场景，因此读请求均为强一致性读：当读请求访问状态机时，如果某个键值对应的状态为Prep状态，那么该读请求会被block住，直到相关的第二阶段子请求被学习并应用、键值对应的值被更新，然后读请求才会返回。即基于读写分离的架构，一方面保障了写的吞吐性能，另一方面也保证了读的强一致性。
参见图6，图6是本说明书一个实施例提供的一种数据处理方法中从集群读取数据的处理示意图。参见图6可知，本说明书提供的数据处理方法采用读写分离设计，当读到“Prep”状态的键值时，则等待第二阶段日志apply（应用）。具体地，参见图6，本说明书提供的数据处理方法支持异地多活的跨集群同步复制，因此在整个系统中没有所谓主备集群，所有集群都可以直接对外提供服务。
在图6中给出了方案中如何实现强一致读的实例。以请求“Z←1”为例，该写请求会被转换成“Prep Z”以及“Z←1”两个子请求。其中，状态机应用子请求“Prep Z”的时候，将相关键值Z的状态加上“Prep”标记，即表示该键值处于修改状态，读访问请求需要block直至状态机更新；状态机应用子请求“Z←1”时，则去除掉“Prep”状态标记，并将键值对应的值修改为1。读请求访问没有“Prep”状态标记的键值时，可以直接返回。
继续以图6为例，首先，第一阶段的“Prep Z”在所有集群的状态机都被学习并应用，然后公开第二阶段的“Z←1”给各个集群学习并应用。假设集群1与集群3率先学习并应用了该子请求，因此其状态机中键值Z对应的值变成1；而集群2中当前键值Z的值还是9，但是因为第一阶段的Prep，该键值对应的状态为“Prep”，因此访问到该集群的读请求会持续等待，直到后续的“Z←1”请求应用到状态机并清理掉“Prep”状态。通过两阶段提交，本说明书提供的方案实现了强一致性读。
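上述读请求在“Prep”状态键值上阻塞等待的强一致读逻辑，可以用如下草图示意（假设性实现，以列表模拟集群2本地尚未应用的第二阶段子请求，strong_read为示例名称）：

```python
# 集群2 的键值 Z 仍是旧值 9，但处于 "Prep" 状态：
data, prep = {"Z": 9}, {"Z"}
pending = [("Z", 1)]  # 尚未学习并应用到本地的第二阶段子请求

def strong_read(key):
    while key in prep:          # 键值处于修改中：读请求主动等待
        k, v = pending.pop(0)   # 模拟本地状态机学习并应用第二阶段子请求
        data[k] = v
        prep.discard(k)         # 清理掉 "Prep" 状态标记
    return data[key]            # 此时读到的即为最新数据

print(strong_read("Z"))  # 输出: 1
```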
在本说明书提供的实施例中,在从所述请求发送队列中获取所述数据预处理请求,并将所述数据预处理请求分别发送给所述至少两个数据存储模块之后,相应地,所述将所述数据处理请求发送至所述每个数据存储模块,包括:
从所述请求发送队列中确定所述数据预处理请求对应的数据处理请求,并将所述数据处理请求发送给每个数据存储模块。
沿用上例,在同步系统在接收到每个集群针对“Prep Z”请求返回的回复通知后,能够从一致性队列中确定“Prep Z”请求对应的请求“Z←1”,并将请求“Z←1”发送至每个集群。
在本说明书提供的实施例中,所述将所述数据处理请求发送至所述每个数据存储模块,包括:
从所述请求发送队列中确定所述目标数据预处理请求对应的数据处理请求,并将所述数据处理请求发送给每个数据存储模块。
沿用上例,在同步系统在接收到每个集群针对“Prep X”请求、“Prep Y”请求或“Prep Z”请求返回的回复通知后,能够从一致性队列中确定与“Prep X”请求、“Prep Y”请求或“Prep Z”请求对应的数据写请求,并将数据写请求发送至每个集群。
步骤410:接收所述每个数据存储模块根据所述数据处理请求返回的数据处理完成通知。
其中，在数据处理请求为数据存储请求的情况下，该数据处理完成通知可以理解为每个数据存储模块将目标数据存储至数据存储单元后生成的通知。也即是，每个集群在将数值“1”存储至对应的物理存储区域或键值之后，会向该同步系统返回数据存储完成的通知，以告知同步系统该集群已经完成数据存储，便于后续同步系统在确定所有集群均完成数据存储之后，继续执行后续的数据处理请求。
其中，所述数据处理完成通知为所述每个数据存储模块根据所述数据处理请求，通过所述目标数据更新所述数据存储单元中的初始数据后生成的通知。也即是，每个集群基于数值“1”对存储在键值“Z”中的数值“9”进行更新之后，会向该同步系统返回数据更新完成的通知，以告知同步系统该集群已经完成数据更新，便于后续同步系统在确定所有集群均完成数据更新之后，继续执行后续的数据处理请求。
基于此,在数据存储模块基于数据处理请求完成对目标数据的处理后,能向请求处理模块发送数据处理完成通知。从而使得该请求处理模块能够接收到每个数据存储模块根据数据处理请求返回的数据处理完成通知。
此外,本说明书提供的数据处理方法支持异地多活的跨集群同步复制,因此,请求处理模块还能够接收集群中部署的调度程序发送的读访问请求;该读访问请求可以为客户端发送至调度系统的请求。
请求处理模块在接收到该读访问请求后，可以将读访问请求转换成no-op写请求（无操作写请求），将该no-op写请求作为数据处理请求并执行上述针对数据处理请求的操作；或者将该no-op写请求作为数据处理请求存放在队列中，并通过队列将该no-op写请求发送至每个集群，或发送该读访问请求的集群。发送该读访问请求的集群中的状态机，在该no-op写请求于本状态机上被学习并应用之后，可以直接读取本地状态机中键值对应的数据。基于此，通过此方式不需要一致性复制队列的两阶段提交，而是借助状态机将每个读请求转换成写请求来同步最新数据，保证读的强一致性。
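将读访问转换为no-op写请求以实现强一致读的思路，可示意如下（假设性实现，以列表模拟复制日志，read_via_noop为示例名称）：

```python
# 复制日志：读请求被转换成追加到队尾的 no-op 写请求；
# 当 no-op 在本地状态机被应用时，其之前的所有写请求也必然已应用
log = [("write", "Z", 1), ("noop", None, None)]

def read_via_noop(data, key):
    for op, k, v in log:   # 依次学习并应用日志，直到遇到该 no-op
        if op == "write":
            data[k] = v
        elif op == "noop":
            break
    return data[key]       # 本地数据已是最新，可直接读取

print(read_via_noop({"Z": 9}, "Z"))  # 输出: 1
```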
本说明书提供的数据处理方法，在接收到数据处理请求的情况下，先基于该数据处理请求中携带的目标数据生成数据预处理请求，并将其发送给至少两个数据存储模块；在接收到每个数据存储模块返回的预处理完成通知的情况下，再将数据处理请求发送给每个数据存储模块，从而使得每个数据存储模块能够获取该目标数据，保证了每个数据存储模块的数据一致性，进一步避免了当至少两个数据存储模块中任意数据存储模块发生故障时所造成的数据丢失问题，保证了数据的安全性。
上述为本实施例的一种数据处理方法的示意性方案。需要说明的是,该数据处理方法的技术方案与上述的数据处理系统的技术方案属于同一构思,数据处理方法的技术方案未详细描述的细节内容,均可以参见上述数据处理系统的技术方案的描述。
同理,该上述数据处理系统的技术方案与该数据处理方法的技术方案属于同一构思,数据处理系统的技术方案未详细描述的细节内容,同样可以参见数据处理方法的技术方案的描述。
下述结合附图7,以本说明书提供的数据处理方法在跨集群同步复制场景下的应用为例,对所述数据处理方法进行进一步说明。其中,图7示出了本说明书一个实施例提供的一种数据处理方法的处理过程流程图,具体包括以下步骤。
步骤702:客户端向多个集群中任意一个集群发送数据写请求。
其中,该数据写请求可以为请求“Z←1”。
步骤704:集群在接收到数据写请求后,通过调度程序将该数据写请求发送至同步系统。
其中,该同步系统为跨集群日志同步系统。
步骤706:同步系统为该数据写请求生成“Prep Z”请求。
步骤708:同步系统将请求“Z←1”以及对应的“Prep Z”请求存储至一致性队列。
步骤710:同步系统先将一致性队列中的“Prep Z”请求发送至每个集群的状态机。
步骤712:每个集群的状态机在接收到“Prep Z”请求后,将键值“Z”的状态修改为“Prep”,并向同步系统回复状态修改通知。
步骤714:同步系统在接收到每个集群的状态修改通知之后,将该请求“Z←1”发送至每个集群。
步骤716:每个集群的状态机基于请求“Z←1”,将键值“Z”对应的值修改为“1”,并将键值“Z”的“Prep”状态重置掉,且向同步系统回复数据写入完成。
本说明书提供的数据处理方法，解决了基于一致性的复制状态机实现的跨集群同步复制存在的问题：呈现在各个集群状态机的数据是最终一致的，因此在多活场景下，客户端先后访问不同集群的状态机时，看到的数据可能会出现回退（即先看到新版本数据，然后又看到了老版本数据），无法有效支持多活。本说明书提供的数据处理方法，通过引入两阶段提交，使状态机基于第一阶段的Prep子请求修改相关键值的状态，标记该键值处于修改中；进而读请求在面对“Prep”状态的键值时，需要主动等待，直至第二阶段的修改子请求被学习并应用到本地，然后才可以读到最新数据，从而实现了异地多活的跨集群同步复制功能，保证了每个集群的数据一致性，进一步避免了当多个集群中任意集群发生故障时造成数据丢失的问题，保证了数据的安全性。
与上述方法实施例相对应,本说明书还提供了数据处理装置实施例,图8示出了本说明书一个实施例提供的一种数据处理装置的结构示意图。如图8所示,该装置包括:
第一接收模块802,被配置为接收数据处理请求,其中,所述数据处理请求中携带目标数据;
生成模块804,被配置为根据所述目标数据,生成针对所述目标数据的数据预处理请求;
第一发送模块806,被配置为将所述数据预处理请求分别发送给至少两个数据存储模块;
第二发送模块808,被配置为在接收到每个数据存储模块根据所述数据预处理请求返回的预处理完成通知的情况下,将所述数据处理请求发送至所述每个数据存储模块;
第二接收模块810,被配置为接收所述每个数据存储模块根据所述数据处理请求返回的数据处理完成通知。
可选地,所述生成模块804,还被配置为:
在所述数据处理请求包括至少两个的情况下,确定每个数据处理请求中携带的目标数据;
根据每个目标数据在至少两个数据存储模块中对应的数据存储单元,分别生成针对所述每个目标数据对应的数据存储单元的数据预处理请求。
可选地,所述数据处理装置,还包括存储模块,被配置为:
确定所述每个数据处理请求对应的请求处理序列信息,其中,所述请求序列信息根据所述每个数据处理请求对应的请求接收时间确定;
根据所述请求处理序列信息,将所述数据处理请求以及对应的数据预处理请求存储至请求发送队列。
可选地,所述第一发送模块806,还被配置为:
从所述请求发送队列中获取所述数据预处理请求,并将所述数据预处理请求分别发送给所述至少两个数据存储模块。
相应地,所述第二发送模块808,还被配置为:
从所述请求发送队列中确定所述数据预处理请求对应的数据处理请求,并将所述数据处理请求发送给每个数据存储模块。
可选地,所述第一发送模块806,还被配置为:
确定所述每个目标数据对应的数据存储单元的标识信息;
根据所述标识信息从所述数据存储单元中确定出目标数据存储单元,其中,所述目标数据存储单元的标识信息与其他数据存储单元不同;
将针对所述目标数据存储单元的数据预处理请求,确定为目标数据预处理请求;
从所述请求发送队列中获取所述目标数据预处理请求,并将所述目标数据预处理请求分别发送给至少两个数据存储模块。
可选地,所述第二发送模块808,还被配置为:
从所述请求发送队列中确定所述目标数据预处理请求对应的数据处理请求,并将所述数据处理请求发送给每个数据存储模块。
可选地,所述预处理完成通知为所述每个数据存储模块基于所述数据预处理请求、将所述目标数据对应的数据存储单元设为不可访问后生成的通知。
可选地,所述第一接收模块802,还被配置为:
接收至少两个数据存储模块中目标数据存储模块发送的数据处理请求,其中,所述目标数据存储模块为接收到请求发送对象发送的所述数据处理请求的模块。
可选地,所述第一接收模块802,还被配置为:
接收携带有目标数据的数据更新请求,其中,所述目标数据为对数据存储单元中的初始数据进行更新的数据。
可选地,所述第二接收模块810,还被配置为:
所述数据处理完成通知为所述每个数据存储模块根据所述数据处理请求,通过所述目标数据更新所述数据存储单元中的初始数据后生成的通知。
本说明书提供的数据处理装置，在接收到数据处理请求的情况下，先基于该数据处理请求中携带的目标数据生成数据预处理请求，并将其发送给至少两个数据存储模块；在接收到每个数据存储模块返回的预处理完成通知的情况下，再将数据处理请求发送给每个数据存储模块，从而使得每个数据存储模块能够获取该目标数据，保证了每个数据存储模块的数据一致性，进一步避免了当至少两个数据存储模块中任意数据存储模块发生故障时所造成的数据丢失问题，保证了数据的安全性。
上述为本实施例的一种数据处理装置的示意性方案。需要说明的是,该数据处理装置的技术方案与上述的数据处理方法的技术方案属于同一构思,数据处理装置的技术方案未详细描述的细节内容,均可以参见上述数据处理方法的技术方案的描述。
图9示出了根据本说明书一个实施例提供的一种计算设备900的结构框图。该计算设备900的部件包括但不限于存储器910和处理器920。处理器920与存储器910通过总线930相连接,数据库950用于保存数据。
计算设备900还包括接入设备940，接入设备940使得计算设备900能够经由一个或多个网络960通信。这些网络的示例包括公用交换电话网（PSTN）、局域网（LAN）、广域网（WAN）、个域网（PAN）或诸如因特网的通信网络的组合。接入设备940可以包括有线或无线的任何类型的网络接口（例如，网络接口卡（NIC））中的一个或多个，诸如IEEE802.11无线局域网（WLAN）无线接口、全球微波互联接入（Wi-MAX）接口、以太网接口、通用串行总线（USB）接口、蜂窝网络接口、蓝牙接口、近场通信（NFC）接口，等等。
在本说明书的一个实施例中,计算设备900的上述部件以及图9中未示出的其他部件也可以彼此相连接,例如通过总线。应当理解,图9所示的计算设备结构框图仅仅是出于示例的目的,而不是对本说明书范围的限制。本领域技术人员可以根据需要,增添或替换其他部件。
计算设备900可以是任何类型的静止或移动计算设备,包括移动计算机或移动计算设备(例如,平板计算机、个人数字助理、膝上型计算机、笔记本计算机、上网本等)、移动电话(例如,智能手机)、可佩戴的计算设备(例如,智能手表、智能眼镜等)或其他类型的移动设备,或者诸如台式计算机或PC的静止计算设备。计算设备900还可以是移动式或静止式的服务器。
其中,处理器920用于执行如下计算机可执行指令,该计算机可执行指令被处理器920执行时实现上述数据处理方法的步骤。
上述为本实施例的一种计算设备的示意性方案。需要说明的是,该计算设备的技术方案与上述的数据处理方法的技术方案属于同一构思,计算设备的技术方案未详细描述的细节内容,均可以参见上述数据处理方法的技术方案的描述。
本说明书一实施例还提供一种计算机可读存储介质,其存储有计算机可执行指令,该计算机可执行指令被处理器执行时实现上述数据处理方法的步骤。
上述为本实施例的一种计算机可读存储介质的示意性方案。需要说明的是,该存储介质的技术方案与上述的数据处理方法的技术方案属于同一构思,存储介质的技术方案未详细描述的细节内容,均可以参见上述数据处理方法的技术方案的描述。
本说明书一实施例还提供一种计算机程序,其中,当所述计算机程序在计算机中执行时,令计算机执行上述数据处理方法的步骤。
上述为本实施例的一种计算机程序的示意性方案。需要说明的是,该计算机程序的技术方案与上述的数据处理方法的技术方案属于同一构思,计算机程序的技术方案未详细描述的细节内容,均可以参见上述数据处理方法的技术方案的描述。
上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。
所述计算机指令包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读介质不包括电载波信号和电信信号。
需要说明的是,对于前述的各方法实施例,为了简便描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本说明书实施例并不受所描述的动作顺序的限制,因为依据本说明书实施例,某些步骤可以采用其它顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定都是本说明书实施例所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。
以上公开的本说明书优选实施例只是用于帮助阐述本说明书。可选实施例并没有详尽叙述所有的细节,也不限制该发明仅为所述的具体实施方式。显然,根据本说明书实施例的内容,可作很多的修改和变化。本说明书选取并具体描述这些实施例,是为了更好地解释本说明书实施例的原理和实际应用,从而使所属技术领域技术人员能很好地理解和利用本说明书。本说明书仅受权利要求书及其全部范围和等效物的限制。

Claims (14)

  1. 一种数据处理方法,包括
    接收数据处理请求,其中,所述数据处理请求中携带目标数据;
    根据所述目标数据,生成针对所述目标数据的数据预处理请求;
    将所述数据预处理请求分别发送给至少两个数据存储模块;
    在接收到每个数据存储模块根据所述数据预处理请求返回的预处理完成通知的情况下,将所述数据处理请求发送至所述每个数据存储模块;
    接收所述每个数据存储模块根据所述数据处理请求返回的数据处理完成通知。
  2. 根据权利要求1所述的数据处理方法,所述根据所述目标数据,生成针对所述目标数据的数据预处理请求,包括:
    在所述数据处理请求包括至少两个的情况下,确定每个数据处理请求中携带的目标数据;
    根据每个目标数据在至少两个数据存储模块中对应的数据存储单元,分别生成针对所述每个目标数据对应的数据存储单元的数据预处理请求。
  3. 根据权利要求2所述的数据处理方法,所述根据每个目标数据在至少两个数据存储模块中对应的数据存储单元,分别生成针对所述每个目标数据对应的数据存储单元的数据预处理请求之后,还包括:
    确定所述每个数据处理请求对应的请求处理序列信息，其中，所述请求处理序列信息根据所述每个数据处理请求对应的请求接收时间确定；
    根据所述请求处理序列信息,将所述数据处理请求以及对应的数据预处理请求存储至请求发送队列。
  4. 根据权利要求3所述的数据处理方法,所述将所述数据预处理请求分别发送给至少两个数据存储模块,包括:
    从所述请求发送队列中获取所述数据预处理请求,并将所述数据预处理请求分别发送给所述至少两个数据存储模块;
    相应地,所述将所述数据处理请求发送至所述每个数据存储模块,包括:
    从所述请求发送队列中确定所述数据预处理请求对应的数据处理请求,并将所述数据处理请求发送给每个数据存储模块。
  5. 根据权利要求3所述的数据处理方法,所述将所述数据预处理请求分别发送给至少两个数据存储模块,包括:
    确定所述每个目标数据对应的数据存储单元的标识信息;
    根据所述标识信息从所述数据存储单元中确定出目标数据存储单元,其中,所述目标 数据存储单元的标识信息与其他数据存储单元不同;
    将针对所述目标数据存储单元的数据预处理请求,确定为目标数据预处理请求;
    从所述请求发送队列中获取所述目标数据预处理请求,并将所述目标数据预处理请求分别发送给至少两个数据存储模块。
  6. 根据权利要求5所述的数据处理方法,所述将所述数据处理请求发送至所述每个数据存储模块,包括:
    从所述请求发送队列中确定所述目标数据预处理请求对应的数据处理请求,并将所述数据处理请求发送给每个数据存储模块。
  7. 根据权利要求5所述的数据处理方法,所述预处理完成通知为所述每个数据存储模块基于所述数据预处理请求、将所述目标数据对应的数据存储单元设为不可访问后生成的通知。
  8. 根据权利要求1所述的数据处理方法,所述接收数据处理请求,包括:
    接收至少两个数据存储模块中目标数据存储模块发送的数据处理请求,其中,所述目标数据存储模块为接收到请求发送对象发送的所述数据处理请求的模块。
  9. 根据权利要求1至8任意一项所述的数据处理方法,所述接收数据处理请求,包括:
    接收携带有目标数据的数据更新请求,其中,所述目标数据为对数据存储单元中的初始数据进行更新的数据。
  10. 根据权利要求9所述的数据处理方法,所述数据处理完成通知为所述每个数据存储模块根据所述数据处理请求,通过所述目标数据更新所述数据存储单元中的初始数据后生成的通知。
  11. 一种数据处理装置,包括:
    第一接收模块,被配置为接收数据处理请求,其中,所述数据处理请求中携带目标数据;
    生成模块,被配置为根据所述目标数据,生成针对所述目标数据的数据预处理请求;
    第一发送模块,被配置为将所述数据预处理请求分别发送给至少两个数据存储模块;
    第二发送模块,被配置为在接收到每个数据存储模块根据所述数据预处理请求返回的预处理完成通知的情况下,将所述数据处理请求发送至所述每个数据存储模块;
    第二接收模块,被配置为接收所述每个数据存储模块根据所述数据处理请求返回的数据处理完成通知。
  12. 一种数据处理系统,包括请求处理模块以及至少两个数据存储模块,其中,
    所述请求处理模块，被配置为接收数据更新请求，其中，所述数据更新请求中携带对数据存储模块包含的数据存储单元中的初始数据进行更新的目标数据；根据所述目标数据，生成针对所述目标数据的数据预处理请求，将所述数据预处理请求分别发送给所述至少两个数据存储模块；
    所述至少两个数据存储模块,被配置为基于所述数据预处理请求,将所述目标数据对应的数据存储单元设为不可访问,并向所述请求处理模块发送预处理完成通知;
    所述请求处理模块,还被配置为在接收到每个数据存储模块根据所述数据预处理请求返回的预处理完成通知的情况下,将所述数据更新请求发送至所述至少两个数据存储模块;
    所述至少两个数据存储模块,被配置为根据所述数据更新请求,通过所述目标数据更新所述数据存储单元中的初始数据,并向所述请求处理模块发送数据处理完成通知。
  13. 一种计算设备,包括:
    存储器和处理器;
    所述存储器用于存储计算机可执行指令,所述处理器用于执行所述计算机可执行指令,该计算机可执行指令被处理器执行时实现权利要求1至10任意一项所述数据处理方法的步骤。
  14. 一种计算机可读存储介质,其存储有计算机可执行指令,该计算机可执行指令被处理器执行时实现权利要求1至10任意一项所述数据处理方法的步骤。
PCT/CN2023/084738 2022-03-31 2023-03-29 数据处理方法及装置 WO2023185934A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210333042.1 2022-03-31
CN202210333042.1A CN114415984B (zh) 2022-03-31 2022-03-31 数据处理方法及装置

Publications (1)

Publication Number Publication Date
WO2023185934A1 true WO2023185934A1 (zh) 2023-10-05

Family

ID=81264145

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084738 WO2023185934A1 (zh) 2022-03-31 2023-03-29 数据处理方法及装置

Country Status (2)

Country Link
CN (1) CN114415984B (zh)
WO (1) WO2023185934A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114415984B (zh) * 2022-03-31 2022-08-16 阿里云计算有限公司 数据处理方法及装置
CN115080582A (zh) * 2022-06-29 2022-09-20 中电金信软件有限公司 一种数据更新方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103036717A (zh) * 2012-12-12 2013-04-10 北京邮电大学 分布式数据的一致性维护系统和方法
CN105407117A (zh) * 2014-09-10 2016-03-16 腾讯科技(深圳)有限公司 分布式备份数据的方法、装置和系统
CN110399383A (zh) * 2019-07-29 2019-11-01 中国工商银行股份有限公司 应用于服务器的数据处理方法、装置、计算设备、介质
CN112988882A (zh) * 2019-12-12 2021-06-18 阿里巴巴集团控股有限公司 数据的异地灾备系统、方法及装置、计算设备
CN114116737A (zh) * 2021-10-22 2022-03-01 北京旷视科技有限公司 对象更新方法、装置和电子设备
CN114415984A (zh) * 2022-03-31 2022-04-29 阿里云计算有限公司 数据处理方法及装置

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8037476B1 (en) * 2005-09-15 2011-10-11 Oracle America, Inc. Address level log-based synchronization of shared data
US9519555B2 (en) * 2011-05-23 2016-12-13 Microsoft Technology Licensing, Llc Synchronous replication in a distributed storage environment
CN106980625B (zh) * 2016-01-18 2020-08-04 阿里巴巴集团控股有限公司 一种数据同步方法、装置及系统
CN110597467B (zh) * 2019-09-19 2023-04-07 中国工商银行股份有限公司 高可用数据零丢失存储系统及方法
CN111124301B (zh) * 2019-12-18 2024-02-23 深圳供电局有限公司 一种对象存储设备的数据一致性存储方法及系统
CN111343277B (zh) * 2020-03-04 2021-12-14 腾讯科技(深圳)有限公司 分布式数据存储方法、系统、计算机设备和存储介质
CN113918380A (zh) * 2020-07-09 2022-01-11 浙江宇视科技有限公司 分布式存储读取系统及方法
CN113204435B (zh) * 2021-07-01 2021-12-03 阿里云计算有限公司 数据处理方法以及系统
CN113312316B (zh) * 2021-07-28 2022-01-04 阿里云计算有限公司 数据处理方法及装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103036717A (zh) * 2012-12-12 2013-04-10 北京邮电大学 分布式数据的一致性维护系统和方法
CN105407117A (zh) * 2014-09-10 2016-03-16 腾讯科技(深圳)有限公司 分布式备份数据的方法、装置和系统
CN110399383A (zh) * 2019-07-29 2019-11-01 中国工商银行股份有限公司 应用于服务器的数据处理方法、装置、计算设备、介质
CN112988882A (zh) * 2019-12-12 2021-06-18 阿里巴巴集团控股有限公司 数据的异地灾备系统、方法及装置、计算设备
CN114116737A (zh) * 2021-10-22 2022-03-01 北京旷视科技有限公司 对象更新方法、装置和电子设备
CN114415984A (zh) * 2022-03-31 2022-04-29 阿里云计算有限公司 数据处理方法及装置

Also Published As

Publication number Publication date
CN114415984A (zh) 2022-04-29
CN114415984B (zh) 2022-08-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23778299

Country of ref document: EP

Kind code of ref document: A1