CN113297323A - Data processing system, method and device - Google Patents

Data processing system, method and device

Info

Publication number
CN113297323A
CN113297323A (application CN202110189207.8A)
Authority
CN
China
Prior art keywords
data
queue
log
database node
information
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN202110189207.8A
Other languages
Chinese (zh)
Inventor
张天雨
Current Assignee (listed assignees may be inaccurate)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202110189207.8A priority Critical patent/CN113297323A/en
Publication of CN113297323A publication Critical patent/CN113297323A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/546: Message passing systems or structures, e.g. queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/54: Indexing scheme relating to G06F9/54
    • G06F2209/547: Messaging middleware
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/54: Indexing scheme relating to G06F9/54
    • G06F2209/548: Queue

Abstract

Embodiments of this specification provide a data processing system, method, and device. The data processing system comprises a client, a database node, and an object storage service node. The database node is configured to receive a data write instruction sent by the client, add the data write instruction to a log synchronization queue, generate a log entry for the data write instruction, and concurrently write the log entry into a plurality of task queues based on a pre-established data write link; when the data volume of any one of the task queues is determined to be greater than a preset threshold, the database node generates queue overload information and propagates the task queue overload information back to the client along the data write link. The client is configured to store the data to be written to the object storage service node upon receiving the task queue overload information.

Description

Data processing system, method and device
Technical Field
Embodiments of this specification relate to the field of database technology, and in particular to a data processing system. One or more embodiments of this specification also relate to a data processing method, a data processing apparatus, a computing device, and a computer-readable storage medium.
Background
In today's era of cloud computing, massive data volumes place ever-increasing demands on the storage performance of distributed storage systems. The stability of a distributed storage system is critical: only a sufficiently stable system can provide users with a continuous, reliable storage service over the long term.
Because of application complexity, a large number of write operations sometimes flood into a storage system in an instant. If they are not processed in time, these writes accumulate in the memory of the distributed storage system, causing the cluster's memory usage to spike, slowing down storage performance, and in severe cases causing downtime. Such bursts of write operations therefore degrade the stability of the distributed storage system. How to handle a large volume of instantaneous writes and avoid these effects has become an important and urgent problem in the field.
Disclosure of Invention
In view of this, embodiments of this specification provide a data processing system. One or more embodiments of this specification also relate to a data processing method, a data processing apparatus, a computing device, and a computer-readable storage medium, to address the technical deficiencies in the prior art.
According to a first aspect of embodiments herein, there is provided a data processing system comprising:
the system comprises a client, a database node and an object storage service node;
the database node is configured to receive a data write instruction sent by the client, add the data write instruction to a log synchronization queue, generate a log entry for the data write instruction, and concurrently write the log entry into a plurality of task queues based on a pre-established data write link; when it is determined that the data volume of any one of the task queues is greater than a preset threshold, the database node generates queue overload information and propagates the task queue overload information back to the client along the data write link;
and the client is configured to store the data to be written to the object storage service node under the condition of determining to receive the overload information of the task queue.
Optionally, the database node is further configured to monitor load information of the log synchronization queue and the plurality of task queues, and if it is determined that a load of the log synchronization queue and/or any one of the plurality of task queues is greater than a preset load threshold, generate queue overload information.
Optionally, the database node is further configured to monitor load information of the log synchronization queue and the plurality of task queues, and if it is determined that a data amount of the log synchronization queue and/or any one of the plurality of task queues is greater than a preset data amount threshold, generate queue overload information, where the load information includes the data amount of the log synchronization queue and/or any one of the plurality of task queues.
Optionally, the database node is further configured to add the log entry to a log application queue, apply the log entry in the log application queue to a state machine, and store an execution result of the state machine to a message queue.
Optionally, the database node is further configured to monitor the log synchronization queue and the load information of the message queue, and if it is determined that a difference between the load of the log synchronization queue and the load of the message queue is greater than a preset threshold, generate queue overload information.
Optionally, the database nodes include a master database node and at least two slave database nodes;
the master database node is configured to receive a data writing instruction sent by the client, add the data writing instruction to a log synchronization queue, generate a log entry of the data writing instruction, and write the log entry into the at least two slave database nodes concurrently based on a pre-established data writing link.
Optionally, the master database node is further configured to concurrently write the log entry to a first log application queue of the at least two slave database nodes based on a pre-established data write link;
the at least two slave database nodes are configured to apply the log entry in the first log application queue to a first state machine, store the execution result of the first state machine in a first message queue, and send feedback information of a successful log commit to the master database node;
the master database node is further configured to commit the log entry upon receiving the feedback information.
Optionally, the master database node is further configured to, upon receiving the feedback information, add the log entry to a second log application queue, apply the log entry in the second log application queue to a second state machine, and store the execution result of the second state machine in a second message queue.
Optionally, the master database node is further configured to monitor load information of the log synchronization queue and the second message queue, and if it is determined that a difference between the load of the log synchronization queue and the load of the second message queue is greater than a preset threshold, generate queue overload information.
Optionally, the database node is further configured to lock any one of the task queues with a data volume greater than a preset threshold, monitor load update information of the task queue with the data volume greater than the preset threshold, and send queue unlocking information to the client if it is determined that the loads of the plurality of queues are all less than the preset threshold according to the load update information;
the client is further configured to receive the queue unlocking information, acquire the data to be written stored in the object storage service node, and send a data writing instruction to the database node based on the data to be written.
According to a second aspect of embodiments herein, there is provided a data processing method including:
sending a data writing instruction to a database node based on data to be written, wherein the data writing instruction carries data identification information of the data to be written;
and under the condition of receiving the overload information of the task queue returned by the database node, storing the data to be written into an object storage service node.
Optionally, the data processing method further includes:
and receiving queue unlocking information returned by the database node, acquiring the data to be written stored in the object storage service node, and sending a data writing instruction to the database node based on the data to be written.
According to a third aspect of embodiments herein, there is provided a data processing apparatus comprising:
the data writing module is configured to send a data writing instruction to a database node based on data to be written, wherein the data writing instruction carries data identification information of the data to be written;
and the storage module is configured to store the data to be written to the object storage service node under the condition of receiving the overload information of the task queue returned by the database node.
According to a fourth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions to implement the steps of the data processing method.
According to a fifth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data processing method.
In one embodiment of this specification, a database node receives a data write instruction sent by a client, adds the data write instruction to a log synchronization queue, generates a log entry for the data write instruction, and concurrently writes the log entry into a plurality of task queues based on a pre-established data write link; when the data volume of any one of the task queues is determined to be greater than a preset threshold, the database node generates queue overload information and propagates it back to the client along the data write link. Upon determining that the task queue overload information has been received, the client stores the data to be written to the object storage service node.
In the embodiments of this specification, the data volume of each task queue is bounded by a preset threshold so as to control the write speed, which helps improve the stability of the distributed database. When high concurrent write traffic blocks a task queue of the database node, a backpressure mechanism is triggered: the generated task queue overload information is propagated back to the client, so that the client uses the object storage service node as a data cache to temporarily store write traffic that cannot currently be written into the database node. This relieves the write pressure that high concurrent write traffic places on the database node and improves the database node's ability to absorb bursts of write traffic while preserving the stability of the distributed database.
Drawings
FIG. 1 is a block diagram of a data processing system, according to one embodiment of the present disclosure;
FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present specification;
FIG. 3 is an interaction diagram of a data processing method provided in an embodiment of the present specification;
FIG. 4 is a schematic diagram of a data processing apparatus provided in one embodiment of the present description;
fig. 5 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. This specification may, however, be embodied in many forms other than those described herein, and those skilled in the art can make similar extensions without departing from its spirit and scope; the specification is therefore not limited to the specific embodiments disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second" and, similarly, "second" may also be referred to as "first". The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
Back pressure: when a short-term load spike causes a system to receive data at a rate much higher than the rate at which it can process it, back pressure arises, and the system pushes back on upstream writes.
OSS: object storage service. Data can be stored and accessed from anywhere on the Internet via a RESTful API; capacity and processing power scale elastically, and multiple storage classes are available to optimize storage costs.
Queue: a queue is a special linear list that allows deletions only at its front end and insertions only at its rear end; like a stack, it is a linear list with restricted operations. The end at which insertions are performed is called the tail of the queue, and the end at which deletions are performed is called the head.
Consistency protocol: a distributed consensus protocol such as Raft or Paxos.
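The queue and back-pressure notions defined above can be illustrated with a minimal sketch. The class below is purely hypothetical (its names and threshold are not from the patent): a bounded FIFO queue that inserts at the tail, deletes at the head, and signals overload to the producer once a preset size is exceeded.

```python
from collections import deque

class BoundedQueue:
    """Illustrative bounded FIFO queue: insert at the tail, delete at the
    head, and report overload once a preset threshold is exceeded.
    Names and threshold are hypothetical, not taken from the patent."""

    def __init__(self, max_items):
        self._items = deque()
        self.max_items = max_items

    def overloaded(self):
        # Back-pressure condition: the producer is outpacing the consumer.
        return len(self._items) > self.max_items

    def enqueue(self, item):
        self._items.append(item)      # insert at the tail (rear)
        return not self.overloaded()  # False signals "slow down" upstream

    def dequeue(self):
        return self._items.popleft()  # delete at the head (front)

q = BoundedQueue(max_items=2)
assert q.enqueue("a") is True
assert q.enqueue("b") is True
assert q.enqueue("c") is False   # third item exceeds the threshold
assert q.dequeue() == "a"        # FIFO order
```

The `False` return plays the role of the overload information propagated back to the writer.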
In the present specification, a data processing system is provided, and the present specification relates to a data processing method, a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
To address the problem that, in existing distributed storage systems, a flood of write operations often cannot be processed in time and thus degrades system stability, the embodiments of this specification provide a data processing system as a solution. The data processing system provided by the embodiments of this specification is described in detail below with reference to the embodiments and the drawings.
Fig. 1 is a schematic structural diagram of a data processing system provided in accordance with an embodiment of the present specification, including:
a client 102, a database node 104, and an object storage service node 106;
the database node 104 is configured to receive a data write instruction sent by the client 102, add the data write instruction to a log synchronization queue, generate a log entry for the data write instruction, and concurrently write the log entry into a plurality of task queues based on a pre-established data write link; when it is determined that the data volume of any one of the task queues is greater than a preset threshold, the database node 104 generates queue overload information and propagates the task queue overload information back to the client 102 along the data write link;
the client 102 is configured to store the data to be written to the object storage service node 106 upon determining that the task queue overload information has been received.
With the rapid development of internet technology, the application range of the distributed database is wider and wider. Because the distributed database comprises a plurality of distributed nodes and can provide read/write services for data for the client, the client can access the data through any one distributed node in the distributed database.
Based on this, in the data processing system provided by the embodiments of this specification, the client 102 first sends a data write instruction to the distributed database (database node 104). After receiving the data write instruction, the database node 104 writes a log using a distributed consensus algorithm (e.g., Raft): it adds the data write instruction to a log synchronization queue to obtain a log entry for the data write instruction, and synchronizes the log entry to the other, slave, distributed nodes based on a pre-established write link.
Specifically, the database in the embodiments of this specification may be an LSM-tree-based database, including but not limited to X-DB, LevelDB, and RocksDB.
In addition, the data write instruction may be a database transaction write instruction, a write instruction for a database operation, or the like. The database node 104 may determine, from the data write instruction, the information of the corresponding database write task; this information includes, but is not limited to, write task information for a database transaction, such as log write information, write information for the transaction's log data, memory write information for the transaction, or commit information for the transaction. It should be understood that the above description is merely exemplary and does not limit the embodiments of this specification.
In some optional embodiments, when determining the information of the database writing task corresponding to the data writing instruction, the data writing instruction may be analyzed to obtain the information of the database writing task corresponding to the data writing instruction, and the information of the database writing task is added to the log synchronization queue. It should be understood that the above description is only exemplary, and the embodiments of the present disclosure are not limited thereto.
In addition, log entries are synchronized over a data write link, so the data write link needs to be established in advance. To improve data write efficiency, the embodiments of this specification synchronize log entries in a multi-threaded, asynchronous manner. When the data write link is established, the task types corresponding to the multiple task queues are determined, and task queues of the same task type are placed at the same link node level of the data write link. For example, the log synchronization queue is set as the first link node level and the log application queue as the second link node level; the first and second link node levels have an upstream-downstream relationship, and the second link node level contains at least two task queues.
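A two-level write link of this kind can be sketched as follows. This is a hypothetical model (class and field names are illustrative, not from the patent): a log synchronization queue at the first link node level fans out to several downstream task queues at the second level.

```python
from collections import deque

class WriteLink:
    """Hypothetical sketch of a two-level data write link: a log
    synchronization queue (level 1) feeds several downstream task
    queues such as log application queues (level 2)."""

    def __init__(self, fan_out):
        self.sync_queue = deque()                             # level 1
        self.task_queues = [deque() for _ in range(fan_out)]  # level 2

    def append(self, instruction):
        entry = {"op": instruction}   # "log entry" generated for the instruction
        self.sync_queue.append(entry)
        for tq in self.task_queues:   # written to all level-2 queues concurrently
            tq.append(entry)
        return entry

link = WriteLink(fan_out=3)
link.append("INSERT k1=v1")
assert len(link.sync_queue) == 1
assert all(len(tq) == 1 for tq in link.task_queues)
```

In a real system the per-level hand-off would be done by worker threads; the sketch only shows the queue topology.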
In addition, the write flow control and backpressure mechanism of the embodiments of this specification is designed around the multiple task queues. To prevent queue overload caused by a concentrated burst of log insertions, the queue load, i.e., the data volume, of each of the task queues is monitored while log entries are being written into them. If the data volume of a task queue exceeds the preset threshold, that queue is judged to be overloaded. To protect the data write efficiency of the database node, whenever the load of any task queue is determined to be greater than the preset threshold, that queue is locked and the task queue overload information is propagated back to the client 102 along the data write link. Locking any one task queue exerts back pressure on the data write link, and ultimately the client 102's writes to the database node 104 are locked (throttled).
In the embodiments of this specification, the data volume includes the number of data items and/or the size of the data. The number of data items is the count of pending tasks in the task queue, and the size of the data is the total size of those pending tasks. For each queue, both the number of pending tasks and their corresponding data size are monitored; when either measure, or the two combined, exceeds its preset threshold, the write flow control mechanism is triggered and write operations on that task queue are locked. Locking any one task queue exerts back pressure on the data write link and ultimately throttles the client's data writes.
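The two data-volume measures above can be checked together with a small predicate. The function and its thresholds are assumptions for illustration, not values prescribed by the patent:

```python
def queue_overloaded(tasks, max_count, max_bytes):
    """Return True if a queue of pending tasks exceeds either the
    item-count threshold or the total-size threshold.
    Hypothetical helper; thresholds are illustrative."""
    count = len(tasks)
    total = sum(len(t) for t in tasks)   # total size of pending tasks
    return count > max_count or total > max_bytes

pending = [b"write-1", b"write-22", b"write-333"]
assert queue_overloaded(pending, max_count=2, max_bytes=1024) is True    # count trips
assert queue_overloaded(pending, max_count=10, max_bytes=10) is True     # size trips
assert queue_overloaded(pending, max_count=10, max_bytes=1024) is False  # neither trips
```

When the predicate returns True for any queue, the flow-control mechanism described above would lock that queue and emit overload information.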
Several specific control strategies are possible when the data volume of the task queues is used to govern write processing. For example, if the data size or total item count of any task queue exceeds the preset threshold, no new pending tasks are added to any of the task queues on the data write link to which that queue belongs; if no task queue has reached the preset threshold, new pending tasks may continue to be added to all task queues.
Finally, upon receiving the task queue overload information returned by the database node 104, the client 102 may temporarily store the data to be written in an object storage service (OSS) node.
In the embodiments of this specification, when high concurrent write traffic blocks a task queue of a data storage node, the object storage service node serves as a data cache that temporarily holds write traffic that cannot currently be written into the database node. This relieves the write pressure that high concurrent write traffic places on the database node and improves the database node's ability to absorb bursts of write traffic.
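The client-side behavior, writing normally, parking data in OSS on overload, and replaying it after unlock, can be sketched in miniature. Everything here is a stand-in (in-memory lists instead of a real database node and OSS bucket; method names are invented for illustration):

```python
class Client:
    """Hypothetical client: on overload information, data to be written is
    parked in a stand-in for the OSS cache; on unlock information it is
    replayed to the database node."""

    def __init__(self):
        self.oss_cache = []        # stand-in for the object storage service node
        self.db_writes = []        # stand-in for writes accepted by the database node
        self.db_overloaded = False

    def write(self, data):
        if self.db_overloaded:
            self.oss_cache.append(data)   # temporarily store to OSS
        else:
            self.db_writes.append(data)   # normal write path

    def on_unlock_info(self):
        # Fetch cached data from OSS and re-issue the write instructions.
        self.db_overloaded = False
        while self.oss_cache:
            self.write(self.oss_cache.pop(0))

c = Client()
c.write("row-1")
c.db_overloaded = True            # overload information propagated back
c.write("row-2")
assert c.oss_cache == ["row-2"]
c.on_unlock_info()                # queue unlocking information received
assert c.db_writes == ["row-1", "row-2"]
```

This also mirrors the optional recovery step in the first aspect, where the client retrieves the cached data from the OSS node after receiving queue unlocking information.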
In a specific implementation, in addition to the data volumes of the multiple task queues, the load of the log synchronization queue may also be monitored. The database node 104 is therefore further configured to monitor the load information of the log synchronization queue and the multiple task queues, and to generate queue overload information if it determines that the load of the log synchronization queue and/or any one of the multiple task queues exceeds a preset load threshold.
Further, the load is the data volume carried by a queue. The database node 104 is further configured to monitor the load information of the log synchronization queue and the multiple task queues, and to generate queue overload information if the data volume of the log synchronization queue and/or any one of the task queues exceeds a preset data volume threshold, where the load information includes that data volume.
Specifically, while log entries are being written into the multiple task queues, the data volume of the log synchronization queue may be monitored in addition to that of each task queue. If the data volume of the log synchronization queue or of any task queue exceeds the preset threshold, a queue is judged to be overloaded. To protect the database node's write efficiency, the embodiments of this specification lock the overloaded queue and use the data write link to propagate the task queue overload information back to the client 102, so that the client 102's writes to the database node 104 are locked (throttled).
As mentioned before, the data amount also includes the number of data and/or the size of the data.
In the embodiment of the description, the data volume of each task queue is controlled by using the preset threshold value, so that the writing speed is controlled, and the stability of the distributed database is improved.
In a specific implementation, the database node 104 is further configured to add the log entry to a log application queue, apply the log entry in the log application queue to a state machine, and store an execution result of the state machine in a message queue.
Further, the database node 104 is further configured to monitor the log synchronization queue and the load information of the message queue, and if it is determined that a difference between the load of the log synchronization queue and the load of the message queue is greater than a preset threshold, generate queue overload information.
Specifically, after the database node adds the data write instruction to the log synchronization queue and generates a log entry of the data write instruction, the log entry is applied to a state machine of the database node, and an execution result of the state machine is stored in a message queue of the database node.
In addition, to prevent the processing speeds of the log synchronization queue and the message queue from drifting apart, the data write flow monitoring mechanism also tracks the number of pending tasks (pending data write instructions), i.e., the load, in both queues and computes the difference between them. If the difference between the number of pending tasks in the message queue and the number in the log synchronization queue exceeds a preset threshold, the message queue's processing speed is lagging far behind that of the log synchronization queue; write flow control is then applied to the log synchronization queue so that the data processing progress of each task queue in the database node stays in step.
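The lag check just described amounts to a simple comparison. A minimal sketch, with a hypothetical function name and threshold:

```python
def should_throttle(sync_pending, msg_pending, max_lag):
    """Return True if the message queue has fallen too far behind the log
    synchronization queue, in which case write flow control is applied to
    the log synchronization queue. Illustrative helper; the threshold is
    an assumed value, not one from the patent."""
    return (msg_pending - sync_pending) > max_lag

# Message queue 90 tasks behind the sync queue: throttle writes.
assert should_throttle(sync_pending=10, msg_pending=100, max_lag=50) is True
# Lag of 30 is within tolerance: keep writing normally.
assert should_throttle(sync_pending=10, msg_pending=40, max_lag=50) is False
```

A production system would sample these counts periodically rather than on every write.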
In specific implementation, the database nodes 104 include a master database node and at least two slave database nodes;
the master database node is configured to receive a data writing instruction sent by the client, add the data writing instruction to a log synchronization queue, generate a log entry of the data writing instruction, and write the log entry into the at least two slave database nodes concurrently based on a pre-established data writing link.
Further, the master database node is further configured to concurrently write the log entries to a first log application queue of the at least two slave database nodes based on a pre-established data write link;
the at least two slave database nodes are configured to apply the log entries in the first log application queue to a first state machine, store the execution result of the first state machine to a first message queue, and send feedback information of successful log submission to the master database node;
the master database node is further configured to commit the log entry upon receiving the feedback information.
Further, the master database node is further configured to, upon receiving the feedback information, add the log entry to a second log application queue, apply the log entry in the second log application queue to a second state machine, and store the execution result of the second state machine in a second message queue.
Specifically, after a distributed database is created, it may contain multiple nodes: one master database node, with all remaining nodes acting as slave database nodes. If the master database node fails, a new master can be elected from the remaining slave database nodes. To ensure that any slave database node can be elected master and then serve the client 102, every slave database node must store the same data as the master database node.
The master database node in the distributed database (database node 104) implements log synchronization through a distributed consensus algorithm. It is configured to receive a data write instruction sent by the client 102, add the instruction to a log synchronization queue, obtain a log entry for the instruction, and initiate a synchronization request to all slave database nodes, which then replicate the log entry, thereby keeping the data stored by each database node in the distributed database consistent.
Specifically, the master database node synchronizes log entries to the slave database nodes concurrently; that is, one master database node synchronizes log entries to at least two slave database nodes at the same time. Each slave database node adds the log entries to its log application queue and sends the master database node a notification that log replication succeeded. After the master database node receives such notifications from a majority (more than half) of the slave database nodes, it sends the slave database nodes prompt information for submitting the log.
After receiving the prompt information for submitting the log from the master database node, a slave database node submits the log: specifically, it applies the log entry to its first state machine, stores the execution result of the first state machine in its first message queue, and sends the master database node feedback information indicating that the log submission succeeded.
The master database node then applies the log entries submitted by the slave database nodes to its own second state machine. Specifically, the master database node adds the replicated log entries to its second log application queue; after receiving the feedback information of successful log submission returned by the slave database nodes, it applies those log entries to the second state machine and stores the execution result of the second state machine in its second message queue.
By using a distributed consensus algorithm (the Raft algorithm) and synchronizing logs from the master database node to each slave database node concurrently, the system not only helps ensure that all submitted log entries are persistent and eventually executed by every available state machine, but also helps improve the processing efficiency of data write instructions. Once the master database node that created a log entry has replicated it to a majority of the slave database nodes (i.e., the slave database nodes have safely replicated the log entry), the log entry is submitted, and the master database node applies the submitted log entry to its state machine, which helps keep the data stored by the master database node and each slave database node consistent.
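The majority-based replication and submission flow described above can be illustrated with a minimal sketch. This is not the patent's actual implementation: the class and method names (`MasterNode`, `SlaveNode`, `replicate`, `submit`) are hypothetical, and replication runs sequentially here for clarity where a real system would use concurrent I/O.

```python
from dataclasses import dataclass, field

@dataclass
class SlaveNode:
    """A slave database node (hypothetical names)."""
    log_queue: list = field(default_factory=list)   # first log application queue
    state: dict = field(default_factory=dict)       # first state machine
    results: list = field(default_factory=list)     # first message queue

    def replicate(self, entry):
        self.log_queue.append(entry)
        return True  # notification: log replication succeeded

    def submit(self, entry):
        key, value = entry
        self.state[key] = value            # apply to the first state machine
        self.results.append((key, value))  # store result in the first message queue
        return True  # feedback: log submission succeeded

class MasterNode:
    """Master node: replicate to all slaves, submit on majority acknowledgment."""
    def __init__(self, slaves):
        self.slaves = slaves
        self.state = {}    # second state machine
        self.results = []  # second message queue

    def write(self, entry):
        # Synchronize the log entry to the slaves (sequential here for clarity).
        acks = sum(1 for s in self.slaves if s.replicate(entry))
        if acks <= len(self.slaves) // 2:
            return False  # no majority: the entry cannot be submitted
        # Majority replicated: prompt slaves to submit, then apply locally.
        feedback = sum(1 for s in self.slaves if s.submit(entry))
        if feedback > len(self.slaves) // 2:
            key, value = entry
            self.state[key] = value
            self.results.append((key, value))
            return True
        return False

slaves = [SlaveNode() for _ in range(3)]
master = MasterNode(slaves)
assert master.write(("user:1", "alice"))
assert all(s.state.get("user:1") == "alice" for s in slaves)
assert master.state["user:1"] == "alice"
```

The sketch preserves the key invariant of the flow above: the master applies an entry to its own state machine only after a majority of slaves report successful submission.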
In specific implementation, the master database node is further configured to monitor load information of the log synchronization queue and the second message queue, and if it is determined that a difference between the load of the log synchronization queue and the load of the second message queue is greater than a preset threshold, generate queue overload information.
Specifically, in this embodiment of the specification, for the sake of modular design, the log application layer of the master database node does not store data itself but is designed as a set of interfaces: it sends the data write instructions to be processed to the second message queue of the storage layer, and increments a log apply value (apply index) by 1 after the data to be written corresponding to any data write instruction in the message queue is successfully written. The log apply value indicates which log entries have completed data storage.
The storage layer of the master database node uses checkpoint and replay (playback) mechanisms to guarantee that the data in the message queue is successfully applied. Specifically, after the master database node adds the execution result of the second state machine to the second message queue, a checkpoint may be taken so that, through the replay mechanism, the data in the second message queue is guaranteed to be written successfully.
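The checkpoint-and-replay guarantee can be sketched as follows. This is an illustrative model only: the class `StorageLayer` and its method names are assumptions, not the patent's API; the point is that entries past the last checkpoint are replayed on recovery, so every queued entry is eventually applied.

```python
class StorageLayer:
    """Illustrative storage layer with checkpoint and replay (hypothetical names)."""
    def __init__(self):
        self.queue = []        # second message queue
        self.applied = {}      # durable storage
        self.apply_index = 0   # log apply value: number of entries applied
        self.checkpoint = 0    # apply index recorded at the last checkpoint

    def enqueue(self, key, value):
        self.queue.append((key, value))

    def apply_next(self):
        key, value = self.queue[self.apply_index]
        self.applied[key] = value
        self.apply_index += 1  # increment the log apply value

    def take_checkpoint(self):
        self.checkpoint = self.apply_index

    def recover(self):
        # After a crash, replay every queued entry past the last checkpoint.
        self.apply_index = self.checkpoint
        while self.apply_index < len(self.queue):
            self.apply_next()

store = StorageLayer()
store.enqueue("a", 1)
store.enqueue("b", 2)
store.apply_next()
store.take_checkpoint()  # checkpoint covers entry "a"
store.enqueue("c", 3)
store.recover()          # replay applies "b" and "c"
assert store.applied == {"a": 1, "b": 2, "c": 3}
assert store.apply_index == 3
```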
In addition, to avoid a mismatch between the data processing speeds of the log synchronization queue and the second message queue, a data write flow monitoring mechanism may monitor the log apply value and the number (last index) of pending tasks (pending data write instructions) in the second message queue of the storage layer, and compute the difference (gap) between the number of pending tasks in the second message queue and the log apply value. If the difference is greater than a preset threshold, the task processing speed of the storage layer lags far behind that of the log synchronization queue, so write flow control may be applied to the log synchronization queue to keep the data processing progress of each task queue in the database node synchronized.
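The gap check described above reduces to a simple comparison. In this sketch the names `last_index`, `apply_index`, and the threshold value are illustrative assumptions, not values specified by the patent.

```python
GAP_THRESHOLD = 1000  # hypothetical preset threshold

def should_throttle(last_index: int, apply_index: int,
                    threshold: int = GAP_THRESHOLD) -> bool:
    """Return True when the storage layer lags too far behind the
    log synchronization queue and write flow control should kick in."""
    gap = last_index - apply_index  # pending entries not yet applied
    return gap > threshold

# Storage layer keeps up: no throttling needed.
assert should_throttle(last_index=5000, apply_index=4500) is False
# Storage layer lags by more than the threshold: throttle writes.
assert should_throttle(last_index=5000, apply_index=3000) is True
```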
In addition, the database node 104 is further configured to lock any one of the task queues with a data amount greater than a preset threshold, monitor load update information of the task queue with the data amount greater than the preset threshold, and send queue unlocking information to the client 102 if it is determined that the loads of all the task queues are less than the preset threshold according to the load update information;
the client 102 is further configured to receive the queue unlocking information, obtain the data to be written stored in the object storage service node 106, and send a data writing instruction to the database node 104 based on the data to be written.
Specifically, when the database node determines from the load monitoring result that the data volume of any one of the plurality of task queues is greater than a preset threshold, it locks that task queue (or locks all of the plurality of task queues), generates queue locking information, and propagates the queue locking information back to the client 102 along the data write link through a backpressure mechanism.
After the client 102 receives the queue locking information, its data write operations to the database node 104 are likewise blocked, so the client 102 may temporarily store the data to be written in the object storage service node.
The database node 104 continues to monitor the load update information of each task queue, and if it is determined that the loads of the queues are all smaller than the preset threshold according to the load update information, queue unlocking information is sent to the client 102, so that the client 102 rewrites the data to be written, which is temporarily stored in the object storage service node 106, into the database node 104.
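The lock, buffer, and unlock cycle on the client side can be sketched as below. The classes `ObjectStore` and `Client` and their method names are illustrative stand-ins, not the system's actual interfaces; the object store plays the role of the temporary data buffer described above.

```python
class ObjectStore:
    """Stand-in for the object storage service node (a temporary write buffer)."""
    def __init__(self):
        self.buffer = []

    def put(self, data):
        self.buffer.append(data)

    def drain(self):
        pending, self.buffer = self.buffer, []
        return pending

class Client:
    def __init__(self, store):
        self.store = store
        self.locked = False  # set when queue locking information arrives

    def on_queue_locked(self):
        self.locked = True

    def on_queue_unlocked(self, db_write):
        # Queue unlocking information received: re-send buffered writes.
        self.locked = False
        for data in self.store.drain():
            db_write(data)

    def write(self, data, db_write):
        if self.locked:
            self.store.put(data)  # buffer in object storage while locked
        else:
            db_write(data)

db = []
client = Client(ObjectStore())
client.write("row-1", db.append)     # normal write path
client.on_queue_locked()             # backpressure: task queues overloaded
client.write("row-2", db.append)     # buffered in the object store, not written
assert db == ["row-1"]
client.on_queue_unlocked(db.append)  # loads fell below the threshold
assert db == ["row-1", "row-2"]
```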
In the embodiment of the specification, a database node receives a data write instruction sent by a client, adds the instruction to a log synchronization queue, generates a log entry for the instruction, and writes the log entry concurrently into a plurality of task queues based on a pre-established data write link. When the database node determines that the data volume of any one of the task queues is greater than a preset threshold, it generates queue overload information and propagates the task queue overload information back to the client along the data write link; in a case where the client receives the task queue overload information, the client stores the data to be written in the object storage service node.
In the embodiment of the specification, a preset threshold is used to control the data volume of each task queue and thereby the write speed, which helps improve the stability of the distributed database. When high-concurrency write traffic blocks the task queues of a database node, a backpressure mechanism propagates the generated task queue overload information back to the client, so that the client uses the object storage service node as a data buffer to temporarily store write traffic that cannot be written to the database node normally. This relieves the write pressure that high-concurrency write traffic places on the database node and improves the node's ability to absorb bursts of write traffic while ensuring the stability of the distributed database.
Fig. 2 shows a process flow diagram of a data processing method provided according to an embodiment of the present specification; the method is applied to a client and includes steps 202 to 204.
Step 202, sending a data writing instruction to a database node based on data to be written, where the data writing instruction carries data identification information of the data to be written.
Step 204, in a case of receiving the task queue overload information returned by the database node, storing the data to be written into an object storage service node.
Specifically, the database nodes comprise a master database node and at least two slave database nodes. The client sends a data write instruction to the master database node; after receiving it, the master database node adds the instruction to a log synchronization queue, generates a log entry for the instruction, and writes the log entry concurrently into a first log application queue of the at least two slave database nodes based on a pre-established data write link;
the at least two slave database nodes apply the log entries in the first log application queue to a first state machine, store the execution result of the first state machine to a first message queue, and send feedback information of successful log submission to the master database node;
and the master database node adds the log entries to a second log application queue under the condition of receiving the feedback information, applies the log entries in the second log application queue to a second state machine, and stores the execution result of the second state machine to a second message queue.
In addition, the master database node may monitor the load information of the log synchronization queue and the second message queue, generate queue overload information if it determines that the difference between the load of the log synchronization queue and the load of the second message queue is greater than a preset threshold, and propagate the task queue overload information back to the client based on the data write link.
And the client stores the data to be written into the object storage service node after receiving the overload information of the task queue.
Further, the master database node may continue to monitor the load update information of each task queue; if it determines from the load update information that the loads of all queues are smaller than the preset threshold, it sends queue unlocking information to the client;
and after receiving queue unlocking information returned by the database node, the client acquires the data to be written stored in the object storage service node, and sends a data writing instruction to the database node based on the data to be written so as to rewrite the data to be written into the database.
In the embodiment of the specification, a preset threshold is used to control the data volume of each task queue and thereby the write speed, which helps improve the stability of the distributed database. When high-concurrency write traffic blocks the task queues of a database node, a backpressure mechanism propagates the generated task queue overload information back to the client, so that the client uses the object storage service node as a data buffer to temporarily store write traffic that cannot be written to the database node normally. This relieves the write pressure that high-concurrency write traffic places on the database node and improves the node's ability to absorb bursts of write traffic while ensuring the stability of the distributed database.
The foregoing is a schematic solution of a data processing method applied to a client according to this embodiment. It should be noted that the technical solution of the data processing method and the technical solution of the data processing system belong to the same concept, and details that are not described in detail in the technical solution of the data processing method can be referred to the description of the technical solution of the data processing system.
The data processing method is further described below with reference to fig. 3. Fig. 3 shows an interaction diagram of a data processing method provided in an embodiment of the present specification, and specific steps include steps 302 to 326.
Step 302, the client sends a first data write instruction to the master database node.
Step 304, the master database node adds the data writing instruction to a log synchronization queue, and generates a log entry of the data writing instruction.
The master database node concurrently synchronizes the log entries to the first log application queue of the slave database node based on the pre-established data write link, step 306.
Step 308, applying the log entry in the first log application queue from the database node to a first state machine, and storing the execution result of the first state machine to a first message queue.
Step 310, the slave database node sends feedback information that the log submission is successful to the master database node.
In step 312, the master database node adds the log entry to a second log application queue, applies the log entry in the second log application queue to a second state machine, and stores the execution result of the second state machine to a second message queue.
Step 314, the master database node monitors the load information of the log synchronization queue and the second message queue, and if it is determined that the difference value between the load of the log synchronization queue and the load of the second message queue is greater than a preset threshold value, queue locking information is generated.
In step 316, the master database node propagates the queue locking information back to the client based on the data write link.
Step 318, the client stores the data to be written into the object storage service node.
In step 320, the master database node monitors the load update information of the log synchronization queue and the second message queue.
Step 322, if it is determined according to the load update information that the difference between the load of the log synchronization queue and the load of the second message queue is less than or equal to a preset threshold, sending, by the master database node, queue unlocking information to the client.
In step 324, the client obtains the data to be written stored in the object storage service node.
In step 326, the client sends a second data writing instruction to the master database node based on the data to be written.
In the embodiment of the specification, a preset threshold is used to control the data volume of each task queue and thereby the write speed, which helps improve the stability of the distributed database. When high-concurrency write traffic blocks the task queues of a database node, a backpressure mechanism propagates the generated task queue overload information back to the client, so that the client uses the object storage service node as a data buffer to temporarily store write traffic that cannot be written to the database node normally. This relieves the write pressure that high-concurrency write traffic places on the database node and improves the node's ability to absorb bursts of write traffic while ensuring the stability of the distributed database.
Corresponding to the above method embodiment, the present specification further provides a data processing apparatus embodiment, and fig. 4 shows a schematic diagram of a data processing apparatus provided in an embodiment of the present specification. As shown in fig. 4, the apparatus includes:
a sending module 402, configured to send a data writing instruction to a database node based on data to be written, where the data writing instruction carries data identification information of the data to be written;
and the storage module 404 is configured to store the data to be written to the object storage service node when receiving the overload information of the task queue returned by the database node.
Optionally, the data processing apparatus further includes:
and the receiving module is configured to receive queue unlocking information returned by the database node, acquire the data to be written stored in the object storage service node, and send a data writing instruction to the database node based on the data to be written.
In the embodiment of the specification, a preset threshold is used to control the data volume of each task queue and thereby the write speed, which helps improve the stability of the distributed database. When high-concurrency write traffic blocks the task queues of a database node, a backpressure mechanism propagates the generated task queue overload information back to the client, so that the client uses the object storage service node as a data buffer to temporarily store write traffic that cannot be written to the database node normally. This relieves the write pressure that high-concurrency write traffic places on the database node and improves the node's ability to absorb bursts of write traffic while ensuring the stability of the distributed database.
The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.
FIG. 5 illustrates a block diagram of a computing device 500 provided in accordance with one embodiment of the present description. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.
Computing device 500 also includes access device 540, which enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of wired or wireless network interface, e.g., a Network Interface Card (NIC), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 5 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
Wherein the memory 510 is configured to store computer-executable instructions and the processor 520 is configured to execute the following computer-executable instructions to implement the steps of the data processing method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.
An embodiment of the present specification also provides a computer readable storage medium storing computer instructions which, when executed by a processor, are used for implementing the steps of the data processing method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (15)

1. A data processing system comprising:
the system comprises a client, a database node and an object storage service node;
the database node is configured to receive a data write-in instruction sent by the client, add the data write-in instruction to a log synchronization queue, generate a log entry of the data write-in instruction, write the log entry into a plurality of task queues concurrently based on a pre-established data write-in link, generate queue overload information when it is determined that the data volume of any one of the task queues is greater than a preset threshold, and reversely propagate the task queue overload information to the client based on the data write-in link;
and the client is configured to store the data to be written to the object storage service node under the condition of determining to receive the overload information of the task queue.
2. The data processing system of claim 1, wherein the database node is further configured to monitor load information of the log synchronization queue and the plurality of task queues, and generate queue overload information if it is determined that a load of the log synchronization queue and/or any one of the plurality of task queues is greater than a preset load threshold.
3. The data processing system according to claim 1 or 2, wherein the database node is further configured to monitor load information of the log synchronization queue and the plurality of task queues, and generate queue overload information if it is determined that a data amount of the log synchronization queue and/or any one of the plurality of task queues is greater than a preset data amount threshold, where the load information includes the data amount of the log synchronization queue and/or any one of the plurality of task queues.
4. The data processing system of claim 1 or 2, the database node further configured to add the log entry to a log application queue, apply the log entry in the log application queue to a state machine, and store the execution result of the state machine to a message queue.
5. The data processing system of claim 4, the database node further configured to monitor load information of the log synchronization queue and the message queue, and generate queue overload information if it is determined that a difference between the load of the log synchronization queue and the load of the message queue is greater than a preset threshold.
6. The data processing system of claim 1, the database nodes comprising a master database node and at least two slave database nodes;
the master database node is configured to receive a data writing instruction sent by the client, add the data writing instruction to a log synchronization queue, generate a log entry of the data writing instruction, and write the log entry into the at least two slave database nodes concurrently based on a pre-established data writing link.
7. The data processing system of claim 6, the master database node further configured to concurrently write the log entries to a first log application queue of the at least two slave database nodes based on a pre-established data write link;
the at least two slave database nodes are configured to apply the log entries in the first log application queue to a first state machine, store the execution result of the first state machine to a first message queue, and send feedback information of successful log submission to the master database node;
the master database node further configured to submit the log entry if the feedback information is received.
8. The data processing system of claim 7, the master database node further configured to, upon receipt of the feedback information, add the log entry to a second log application queue, apply the log entry in the second log application queue to a second state machine, and store the result of the execution of the second state machine to a second message queue.
9. The data processing system of claim 8, the master database node further configured to monitor load information of the log synchronization queue and the second message queue, and generate queue overload information if it is determined that a difference between the load of the log synchronization queue and the load of the second message queue is greater than a preset threshold.
10. The data processing system according to claim 1, wherein the database node is further configured to lock any one of the plurality of task queues having a data volume greater than a preset threshold, monitor load update information of the task queue having the data volume greater than the preset threshold, and send queue unlocking information to the client if it is determined according to the load update information that the loads of the plurality of queues are all less than the preset threshold;
the client is further configured to receive the queue unlocking information, acquire the data to be written stored in the object storage service node, and send a data writing instruction to the database node based on the data to be written.
11. A data processing method is applied to a client and comprises the following steps:
sending a data writing instruction to a database node based on data to be written, wherein the data writing instruction carries data identification information of the data to be written;
and under the condition of receiving the overload information of the task queue returned by the database node, storing the data to be written into an object storage service node.
12. The data processing method of claim 11, further comprising:
and receiving queue unlocking information returned by the database node, acquiring the data to be written stored in the object storage service node, and sending a data writing instruction to the database node based on the data to be written.
13. A data processing apparatus comprising:
the data writing module is configured to send a data writing instruction to a database node based on data to be written, wherein the data writing instruction carries data identification information of the data to be written;
and the storage module is configured to store the data to be written to the object storage service node in a case that task queue overload information returned by the database node is received.
14. A computing device, comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of the data processing method of any one of claims 11 to 12.
15. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 11 to 12.
CN202110189207.8A 2021-02-19 2021-02-19 Data processing system, method and device Pending CN113297323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110189207.8A CN113297323A (en) 2021-02-19 2021-02-19 Data processing system, method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110189207.8A CN113297323A (en) 2021-02-19 2021-02-19 Data processing system, method and device

Publications (1)

Publication Number Publication Date
CN113297323A true CN113297323A (en) 2021-08-24

Family

ID=77318971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110189207.8A Pending CN113297323A (en) 2021-02-19 2021-02-19 Data processing system, method and device

Country Status (1)

Country Link
CN (1) CN113297323A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722401A (en) * 2021-11-04 2021-11-30 树根互联股份有限公司 Data caching method and device, computer equipment and readable storage medium
CN114338560A (en) * 2021-12-30 2022-04-12 中国电信股份有限公司 Processing system, method, storage medium and electronic device of queue data
CN114116774A (en) * 2022-01-28 2022-03-01 北京安帝科技有限公司 Log data query method and device
CN115174366A (en) * 2022-05-30 2022-10-11 阿里巴巴(中国)有限公司 Data processing method and device
CN115174366B (en) * 2022-05-30 2023-10-20 浙江天猫技术有限公司 Data processing method and device

Similar Documents

Publication Publication Date Title
CN113297323A (en) Data processing system, method and device
US10735509B2 (en) Systems and methods for synchronizing microservice data stores
US20190235979A1 (en) Systems and methods for performing computing cluster node switchover
CN102521083B (en) Backup method and system of virtual machine in cloud computing system
JP6602866B2 (en) Message broker system with parallel persistence
CN105338095A (en) Conversation data processing method and device
CN113641511A (en) Message communication method and device
CN114637475A (en) Distributed storage system control method and device and readable storage medium
CN111651275A (en) MySQL cluster automatic deployment system and method
WO2023185934A1 (en) Data processing method and device
CN110825562A (en) Data backup method, device, system and storage medium
KR20180061493A (en) Recovery technique of data intergrity with non-stop database server redundancy
CN113553179A (en) Distributed key value storage load balancing method and system
JPH03206542A (en) Remote interruption system
CN113297159B (en) Data storage method and device
CN107465725B (en) Heterogeneous long transaction processing system and method based on client information control system
CN113110948A (en) Disaster tolerance data processing method and device
WO2019071801A1 (en) Data synchronization method
CN112364104A (en) Distributed database capacity expansion method, distributed database system and computer readable storage medium
WO2023240995A1 (en) Data recovery method and apparatus for dual-machine hot standby system, and medium
CN113297229A (en) Method for routing read request and feedback message, respective device and database
US20180309702A1 (en) Method and device for processing data after restart of node
CN115174596A (en) Equipment remote copying method, device and medium
CN111581023A (en) Bank memory data processing method and device
Shvidkiy et al. Evaluation of the impact the hyper-converged infrastructure storage subsystem synchronization on the overall performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40058618

Country of ref document: HK