CN115454613A - Distributed processing method, node and distributed database system of DDL task - Google Patents


Publication number
CN115454613A
Authority
CN
China
Prior art keywords
subtask
task
ddl
scheduling information
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211269138.2A
Other languages
Chinese (zh)
Inventor
李霞
莫航杰
黄潇
刘奇
黄东旭
崔秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingkai Star Beijing Technology Co ltd
Original Assignee
Pingkai Star Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingkai Star Beijing Technology Co ltd filed Critical Pingkai Star Beijing Technology Co ltd
Priority to CN202211269138.2A priority Critical patent/CN115454613A/en
Publication of CN115454613A publication Critical patent/CN115454613A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a distributed processing method, a node and a distributed database system for DDL tasks, and relates to the field of databases. The method comprises the following steps: obtaining a DDL task to be executed from a DDL task queue in a distributed database system; splitting the DDL task into a plurality of subtasks; and creating scheduling information of each subtask, wherein the scheduling information comprises execution state information, the initial execution state information is unexecuted, so that at least one execution node in each node determines an unexecuted target subtask according to the scheduling information of each subtask, and updates the scheduling information of the target subtask according to the condition of executing the target subtask. The embodiment of the application realizes that one DDL task is executed through a plurality of nodes, compared with the prior art, the decentralized processing mode is realized, and the problems of insufficient execution capacity, flexibility and efficiency of the existing DDL task are solved.

Description

Distributed processing method, node and distributed database system of DDL (Data Definition Language) task
Technical Field
The present application relates to the field of database technologies, and in particular, to a distributed processing method for DDL tasks, a node, a distributed database system, an electronic device, a computer-readable storage medium, and a computer program product.
Background
At present, a distributed scheduling system of Data Definition Language (DDL) statements is mainly implemented in the following centralized manner:
1. Traditional databases such as Oracle and MySQL generally implement a metadata lock (MDL) system, which coordinates the execution order of DDL statements by locking the objects to be changed by DDL, thereby achieving concurrency.
2. Distributed database systems such as OceanBase and TDSQL likewise achieve concurrent DDL statement scheduling by implementing a similar MDL lock in the distributed system.
In the related art, for one DDL task, only one master node in the distributed database system executes the task, and bottlenecks exist in the execution capacity, flexibility and efficiency.
Disclosure of Invention
Embodiments of the present application provide a distributed processing method, a node, a distributed database system, an electronic device, a computer-readable storage medium, and a computer program product for a DDL task, which can solve the above problems in the prior art. The technical scheme is as follows:
according to an aspect of the embodiments of the present application, a distributed processing method for a DDL task is provided, and is applied to at least one master node in each node of a distributed database system, where the method includes:
obtaining a DDL task to be executed from a DDL task queue in a distributed database system;
splitting the DDL task into a plurality of subtasks;
and creating scheduling information of each subtask, wherein the scheduling information comprises execution state information, the initial execution state information is unexecuted, so that at least one execution node in each node determines an unexecuted target subtask according to the scheduling information of each subtask, and the scheduling information of the target subtask is updated according to the condition of executing the target subtask.
As an alternative embodiment, splitting the DDL task into a plurality of subtasks includes:
splitting the DDL task in multiple rounds;
the number of subtasks split in each round is related to the number of callback threads of each node in the current round, and after each round of splitting, whether to adjust the number of callback threads of each node for the next round is determined according to the part of the DDL task that has not yet been split;
each callback thread is used to execute a subtask.
As an alternative embodiment, the updated execution state information includes completed;
the method further comprises the following steps:
and determining that execution of the DDL task is finished when the execution state information corresponding to all subtasks of the DDL task is completed.
As an optional embodiment, the scheduling information further includes a task identifier of the DDL task to which the subtask belongs;
the method further comprises the following steps:
determining a task identifier of a DDL task to be cancelled;
determining target scheduling information that includes the task identifier of the DDL task to be cancelled;
and updating the execution state information of the target scheduling information to indicate that execution is to be cancelled, so that the execution node executing the corresponding subtask cancels execution of the subtask upon finding this update, and, after the cancellation, updates the execution state information of the target scheduling information to indicate that execution has been cancelled.
As an optional embodiment, the scheduling information further includes at least one of a subtask identifier of the subtask, a start value, an end value, a currently processed value of the subtask, a thread identifier of the callback thread executing the subtask, lease information, a number of processed rows, or error information.
According to another aspect of the embodiments of the present application, a distributed processing method for a data definition language DDL task, applied to at least one execution node in nodes of a distributed database system, includes:
acquiring scheduling information of each subtask of the DDL task, wherein the scheduling information comprises execution state information, and the initial execution state information is unexecuted;
determining unexecuted target subtasks according to the scheduling information of each subtask;
initiating preemption of the target subtask, and executing the target subtask if the preemption succeeds;
updating the scheduling information of the target subtask according to the condition of executing the target subtask;
the subtasks are obtained by splitting a DDL task to be executed which is obtained by a master node in a distributed database system from a DDL task queue, and the scheduling information of the subtasks is created by the master node.
As an alternative embodiment, the execution node comprises at least one callback thread;
the initiating preemption of the target subtask, and executing the target subtask if the preemption succeeds, includes:
initiating preemption of the target subtask by a callback thread in an idle state, and if the preemption succeeds, executing the target subtask by the callback thread and updating the state of the callback thread to busy.
As an alternative embodiment, executing the target subtask further includes:
if the execution state information of the target subtask has been updated to indicate that execution is to be cancelled, stopping execution of the target subtask;
the updating the scheduling information of the target subtask according to the condition of executing the target subtask includes:
and updating the execution state information of the target subtask to cancelled according to the stopping of execution of the target subtask.
According to another aspect of an embodiment of the present application, there is provided a master node in a distributed database system, the master node including:
the task acquisition module is used for acquiring DDL tasks to be executed from a DDL task queue in the distributed database system;
the task splitting module is used for splitting the DDL task into a plurality of subtasks;
and the scheduling information creating module is used for creating scheduling information of each subtask, wherein the scheduling information comprises execution state information, the initial execution state information is unexecuted, so that at least one execution node in each node determines an unexecuted target subtask according to the scheduling information of each subtask, and the scheduling information of the target subtask is updated according to the condition of executing the target subtask.
According to another aspect of the embodiments of the present application, there is provided an executing node in a distributed database system, the executing node including:
the scheduling information acquisition module is used for acquiring scheduling information of each subtask of a DDL task, wherein the scheduling information comprises execution state information, and the initial execution state information is unexecuted;
the subtask determining module is used for determining unexecuted target subtasks according to the scheduling information of each subtask;
the subtask execution module is used for initiating preemption of the target subtask, and executing the target subtask if the preemption succeeds;
the scheduling information updating module is used for updating the scheduling information of the target subtask according to the condition of executing the target subtask;
the subtasks are obtained by splitting a DDL task to be executed which is obtained by a master node in a distributed database system from a DDL task queue, and the scheduling information of the subtasks is created by the master node.
According to another aspect of an embodiment of the present application, there is provided a distributed database system including the master node and the execution node of the above aspect.
According to another aspect of embodiments of the present application, there is provided an electronic device, which includes a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the distributed processing method of the DDL task.
According to still another aspect of embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the distributed processing method of the DDL task described above.
According to an aspect of embodiments of the present application, there is provided a computer program product comprising a computer program, which when executed by a processor, implements the steps of the distributed processing method of the DDL task described above.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
A DDL task to be executed is obtained from a DDL task queue; the DDL task is split into a plurality of subtasks, and scheduling information is created for each subtask. The scheduling information includes execution state information, whose initial value is unexecuted, and is made visible to all nodes in the distributed database system, so that at least one execution node among the nodes determines an unexecuted target subtask according to the scheduling information of each subtask and updates the scheduling information of the target subtask according to the condition of executing the target subtask.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a distributed processing method for a DDL task executed by a master node according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a distributed processing method for a DDL task executed by an execution node according to an embodiment of the present application;
fig. 3 is a schematic diagram of an architecture of distributed processing of a DDL task according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a distributed processing method for a DDL task according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a master node according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an execution node according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising," when used in this specification in connection with embodiments of the present application, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, as embodied in the art. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term, e.g., "A and/or B" can be implemented as "A", or as "B", or as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The present application provides a distributed processing method, a node, a distributed database system, an electronic device, a computer-readable storage medium, and a computer program product for DDL tasks, which aim to solve the above technical problems in the prior art.
The technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application will be described below through descriptions of several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps and the like in different embodiments is not repeated.
The embodiment of the application provides a distributed processing method of a DDL task, which is applied to at least one master node (owner node) among the nodes of a distributed database system. It should be noted that the number of master nodes in the embodiment of the present application is adjustable, and the operation mode of the distributed database system may be selected by setting a system parameter, namely the DDL_Parallel_Model parameter. The available values are as follows:
Single-MTasks: single-master model, in which a single master node is deployed. Note that even in this mode one DDL task is split into multiple subtasks and is therefore executed in a distributed manner across the whole cluster;
Mutli-Owner: multi-master model, in which multiple master nodes are deployed. In one embodiment, all nodes in the distributed database system may autonomously fetch and execute DDL tasks.
The operating mode in use is determined by this parameter. In general, the multi-master model is better suited to a distributed database cluster.
As shown in fig. 1, the method includes:
S101, obtaining a DDL task to be executed from a DDL task queue in the distributed database system.
It should be understood that each node in the distributed database system may receive a DDL change statement from a client and store it as a DDL task in the DDL task queue, and the master node may take a DDL task from the queue for execution. To improve the efficiency of executing DDL statements on a large table (i.e., a data table holding a large amount of data), the DDL task is processed in parallel, thereby achieving better performance.
S102, splitting the DDL task into a plurality of subtasks.
It should be noted that a DDL task specifies a piece of data to be processed in the database, and may therefore be split into multiple subtasks according to the number of rows of that data. For example, if the data has 1000 rows and every 100 rows are taken as one subtask, the task is split into 10 subtasks.
S103, creating scheduling information of each subtask, wherein the scheduling information comprises execution state information, the initial execution state information is unexecuted, so that at least one execution node in each node determines an unexecuted target subtask according to the scheduling information of each subtask, and the scheduling information of the target subtask is updated according to the situation of executing the target subtask.
In the embodiment of the present application, corresponding scheduling information is created for each subtask to record the key information produced while the subtask is executed. The scheduling information includes execution state information, and the initial execution state of every subtask is unexecuted. Each execution node in the distributed database system reads the scheduling information to find unexecuted subtasks and tries to preempt them; if the preemption succeeds, the node executes the preempted subtask and updates the scheduling information of that target subtask according to how its execution proceeds, so that the master node can determine from the scheduling information of all subtasks whether the DDL task has finished executing.
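Steps S102 and S103 can be illustrated with a minimal sketch. It assumes fixed-size row ranges and an in-memory list standing in for the stored scheduling information; the function name `split_and_schedule`, the dictionary keys, and the state label `"unexecuted"` are hypothetical names chosen for illustration and are not taken from the embodiment.

```python
UNEXECUTED = "unexecuted"  # assumed label for the initial execution state


def split_and_schedule(task_id, total_rows, rows_per_subtask):
    """Split a row range into fixed-size subtasks and create one
    scheduling record per subtask, each starting as unexecuted."""
    if rows_per_subtask <= 0:
        raise ValueError("rows_per_subtask must be positive")
    records = []
    for subtask_id, start in enumerate(range(0, total_rows, rows_per_subtask)):
        end = min(start + rows_per_subtask, total_rows)  # end is exclusive
        records.append({
            "task_id": task_id,        # DDL task the subtask belongs to
            "subtask_id": subtask_id,
            "start_row": start,
            "end_row": end,
            "state": UNEXECUTED,
        })
    return records


# The example from the text: 1000 rows, 100 rows per subtask.
records = split_and_schedule(task_id=1, total_rows=1000, rows_per_subtask=100)
print(len(records))  # 10
```

In a real system the records would be written to storage that every node can read, so that execution nodes can discover them.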
It should be noted that the master node in the embodiment of the present application may also act as an execution node, so the subtasks of a DDL task may be completed by the master node itself or by nodes other than the master node.
In one embodiment, at least one reorganization thread may be set in the master node, where each reorganization thread is configured to perform the following:
1) Obtaining a DDL task to be executed from a DDL task queue;
2) Periodically checking whether all subtasks of one DDL task are finished;
3) Setting an initial resource usage intent.
The distributed processing method of the embodiment of the application is applied to the master node. A DDL task to be executed is obtained from the DDL task queue; the DDL task is split into a plurality of subtasks, and scheduling information is created for each subtask. The scheduling information includes execution state information, whose initial value is unexecuted, and is made visible to all nodes in the distributed database system, so that at least one execution node among the nodes determines an unexecuted target subtask according to the scheduling information of each subtask and updates the scheduling information of the target subtask according to the condition of executing the target subtask.
On the basis of the foregoing embodiments, as an optional embodiment, splitting the DDL task into multiple subtasks includes:
splitting the DDL task in multiple rounds;
the number of subtasks split in each round is related to the number of callback threads of each node in the current round, and after each round of splitting, whether to adjust the number of callback threads of each node for the next round is determined according to the part of the DDL task that has not yet been split;
each callback thread is used to execute a subtask.
For example, suppose one DDL task covers 1000 rows of data, the distributed database system has 2 nodes, and each node has 10 callback threads in the initial state. In the first round of splitting, 20 subtasks are split off first, so that each node can process 10 subtasks with its current callback threads. If each subtask covers 10 rows of data, 800 rows remain to be processed in the DDL task after the first round. At this point the callback threads of each node can be expanded, for example to 20 callback threads per node, so that 40 subtasks can be split off in the second round, leaving 400 rows to be processed after the second round. Since these 400 rows can be split into 40 subtasks, and the current callback threads of the two nodes can process all of them in the next round, the callback threads of each node need not be expanded further.
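The arithmetic of this example can be reproduced with a short simulation. This is a sketch only: the rule of doubling the thread count (up to a cap) whenever the remainder exceeds what the current threads can absorb in one round is an assumption made for illustration, since the embodiment only states that the adjustment depends on the unsplit part. The names `split_round`, `plan_rounds`, and `max_threads` are hypothetical.

```python
def split_round(remaining_rows, nodes, threads_per_node, rows_per_subtask):
    """Split off one round of subtasks; return (subtasks_created, rows_left)."""
    capacity = nodes * threads_per_node               # one subtask per callback thread
    needed = -(-remaining_rows // rows_per_subtask)   # ceiling division
    created = min(capacity, needed)
    rows_left = max(0, remaining_rows - created * rows_per_subtask)
    return created, rows_left


def plan_rounds(total_rows, nodes, threads_per_node, rows_per_subtask, max_threads):
    """Simulate multi-round splitting.

    Assumed policy: after a round, if the unsplit remainder is larger than
    what the current threads can take in one more round, double the number
    of callback threads per node, capped at max_threads.
    """
    plan = []
    remaining = total_rows
    while remaining > 0:
        created, remaining = split_round(
            remaining, nodes, threads_per_node, rows_per_subtask)
        plan.append((threads_per_node, created, remaining))
        one_round_rows = nodes * threads_per_node * rows_per_subtask
        if remaining > one_round_rows and threads_per_node < max_threads:
            threads_per_node = min(threads_per_node * 2, max_threads)
    return plan


# (threads per node, subtasks split this round, rows left afterwards)
plan = plan_rounds(total_rows=1000, nodes=2, threads_per_node=10,
                   rows_per_subtask=10, max_threads=20)
print(plan)  # [(10, 20, 800), (20, 40, 400), (20, 40, 0)]
```

The three tuples correspond to the three rounds described above: 20 subtasks leaving 800 rows, 40 subtasks leaving 400 rows, and a final 40 subtasks with no further expansion.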
In the embodiment of the present application, splitting the DDL task over multiple rounds reduces the total amount of scheduling information that exists at any one time and improves the efficiency with which execution nodes acquire subtasks. It also avoids inserting a massive number of subtasks at once for a large table holding massive data, which improves the efficiency of preparing subtasks.
On the basis of the foregoing embodiments, as an alternative embodiment, the updated execution state information includes completed;
the method further comprises the following steps:
and determining that execution of the DDL task is finished when the execution state information corresponding to all the subtasks of the DDL task is completed.
In one embodiment, after the execution node completes a subtask, the scheduling information of the subtask may be deleted, so that the master node determines that the DDL task is completed according to the deletion of all scheduling information of the DDL task.
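Both ways of detecting completion can be sketched as follows, assuming the scheduling information is available as a list of dictionaries. The function names, the dictionary keys, and the state label `"completed"` are hypothetical and chosen for illustration.

```python
COMPLETED = "completed"  # assumed label for the finished state


def finished_by_state(schedule, task_id):
    """Variant 1: the task is done when every one of its records is completed."""
    states = [rec["state"] for rec in schedule if rec["task_id"] == task_id]
    return bool(states) and all(state == COMPLETED for state in states)


def finished_by_deletion(schedule, task_id):
    """Variant 2: execution nodes delete a record when its subtask finishes,
    so the task is done when no record for it is left."""
    return not any(rec["task_id"] == task_id for rec in schedule)


schedule = [
    {"task_id": 1, "subtask_id": 0, "state": "completed"},
    {"task_id": 1, "subtask_id": 1, "state": "executing"},
    {"task_id": 2, "subtask_id": 0, "state": "completed"},
]
print(finished_by_state(schedule, 1))  # False
print(finished_by_state(schedule, 2))  # True

# Variant 2: the execution node removes the record of the finished subtask.
schedule = [rec for rec in schedule if rec["task_id"] != 2]
print(finished_by_deletion(schedule, 2))  # True
```

When splitting is done in multiple rounds, the master node would presumably also need to confirm that no unsplit part of the task remains before treating an empty result as completion; that check is omitted here.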
The embodiment of the present application further supports a function of canceling the execution of the subtask, and specifically, the method includes:
determining a task identifier of a DDL task to be cancelled;
determining target scheduling information that includes the task identifier of the DDL task to be cancelled;
and updating the execution state information of the target scheduling information to indicate that execution is to be cancelled, so that the execution node executing the corresponding subtask cancels execution of the subtask upon finding this update, and, after the cancellation, updates the execution state information of the target scheduling information to indicate that execution has been cancelled.
When a user needs to cancel execution of a DDL task, the user sends a cancellation instruction to the master node, where the instruction includes the task identifier of the DDL task to be cancelled. The master node searches the scheduling information for this task identifier; each piece of scheduling information that contains it belongs to a subtask whose execution must be cancelled, and the master node updates its execution state information to indicate that execution is to be cancelled. The execution node checks the execution state information in the scheduling information at regular intervals; if it finds that the state has been updated in this way, it cancels execution of the corresponding subtask and, after the cancellation, updates the execution state information of the target scheduling information to indicate that execution has been cancelled.
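The two-step cancellation handshake can be sketched as below. The state labels `"cancelling"` and `"cancelled"`, the function names, and the choice to check the state before each batch are assumptions made for illustration; skipping records that are already finished is likewise an assumption, not something the embodiment specifies.

```python
EXECUTING = "executing"
CANCELLING = "cancelling"   # assumed label: master has requested cancellation
CANCELLED = "cancelled"     # assumed label: execution node has stopped
COMPLETED = "completed"


def request_cancel(schedule, task_id):
    """Master side: mark every unfinished record of the task as cancelling."""
    marked = 0
    for rec in schedule:
        if rec["task_id"] == task_id and rec["state"] not in (COMPLETED, CANCELLED):
            rec["state"] = CANCELLING
            marked += 1
    return marked


def run_subtask(rec, batches, after_batch=None):
    """Execution side: process batches, checking the state before each one.

    Returns True if the subtask completed, False if it was cancelled.
    """
    for i in range(batches):
        if rec["state"] == CANCELLING:
            rec["state"] = CANCELLED   # acknowledge the cancellation
            return False
        rec["processed_batches"] = i + 1
        if after_batch is not None:
            after_batch(i)             # hook used below to simulate the master
    rec["state"] = COMPLETED
    return True


schedule = [{"task_id": 7, "subtask_id": 0, "state": EXECUTING,
             "processed_batches": 0}]

# Simulate the master cancelling task 7 right after the second batch.
def cancel_after_second_batch(i):
    if i == 1:
        request_cancel(schedule, 7)

completed = run_subtask(schedule[0], batches=5,
                        after_batch=cancel_after_second_batch)
print(completed, schedule[0]["state"], schedule[0]["processed_batches"])
# False cancelled 2
```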
On the basis of the foregoing embodiments, as an optional embodiment, the scheduling information further includes at least one of a subtask identifier of the subtask, a start value, an end value, a currently processed value of the subtask, a thread identifier of the callback thread executing the subtask, lease information, a number of processed rows, or error information.
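One possible in-memory shape for such a scheduling record is sketched below. All field names and types are hypothetical; the embodiment lists the kinds of information but does not prescribe a schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubtaskSchedulingInfo:
    task_id: int                          # DDL task the subtask belongs to
    subtask_id: int
    state: str = "unexecuted"             # execution state information
    start_value: Optional[int] = None     # first value of the subtask's range
    end_value: Optional[int] = None       # last value of the subtask's range
    current_value: Optional[int] = None   # value processed most recently
    thread_id: Optional[str] = None       # callback thread executing the subtask
    lease: Optional[float] = None         # lease information, e.g. an expiry time
    processed_rows: int = 0               # number of rows processed so far
    error: Optional[str] = None           # error information, if any

    def record_progress(self, current_value: int, rows: int) -> None:
        """Update the progress fields after a batch of rows is processed."""
        self.current_value = current_value
        self.processed_rows += rows


info = SubtaskSchedulingInfo(task_id=1, subtask_id=3,
                             start_value=300, end_value=400)
info.record_progress(current_value=350, rows=50)
print(info.current_value, info.processed_rows)  # 350 50
```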
Referring to fig. 2, a flowchart of a distributed processing method for a DDL task executed by an execution node according to an embodiment of the present application is exemplarily shown, where the flowchart includes:
S201, obtaining scheduling information of each subtask of the DDL task, wherein the scheduling information comprises execution state information, and the initial execution state information is unexecuted.
As can be seen from the foregoing embodiments, the subtasks in the embodiments of the present application are obtained by splitting a DDL task to be executed, which is obtained from a DDL task queue by a master node in a distributed database system, and the scheduling information of the subtasks is created by the master node.
S202, determining unexecuted target subtasks according to the scheduling information of each subtask.
The execution node may identify a target subtask by checking that the execution state information in its scheduling information is unexecuted.
S203, initiating preemption of the target subtask, and executing the target subtask if the preemption succeeds.
Since there are many nodes in the distributed database system and one subtask can be executed by only one execution node, when multiple execution nodes find the same unexecuted target subtask they must compete to preempt it.
S204, updating the scheduling information of the target subtask according to the condition of executing the target subtask.
When the execution node executes the target subtask, it may update the execution state information in the scheduling information of the target subtask to executing, so that other nodes do not preempt the subtask. The currently processed value in the scheduling information may also be updated in real time.
According to the distributed processing method, the scheduling information of each subtask of the DDL task is obtained, the unexecuted target subtask is determined according to the scheduling information of each subtask, then the target subtask is preempted, if the target subtask is successfully preempted, the target subtask is executed, and the scheduling information of the target subtask is updated according to the situation of executing the target subtask.
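Steps S201 to S204 can be sketched with two threads standing in for two execution nodes. The sketch assumes an in-memory table guarded by a lock as a stand-in for the shared scheduling information, with preemption modelled as an atomic compare-and-set on the state; the class `SubtaskTable` and its methods are hypothetical, and a real system would rely on the atomicity of its storage layer instead of a local lock.

```python
import threading


class SubtaskTable:
    """In-memory stand-in for the shared subtask scheduling information."""

    def __init__(self, subtask_ids):
        self._lock = threading.Lock()
        self._rows = {sid: {"state": "unexecuted", "owner": None}
                      for sid in subtask_ids}

    def find_unexecuted(self):
        """S201/S202: read the scheduling info and pick an unexecuted subtask."""
        with self._lock:
            for sid, row in self._rows.items():
                if row["state"] == "unexecuted":
                    return sid
        return None

    def try_preempt(self, sid, owner):
        """S203: claim the subtask only if it is still unexecuted."""
        with self._lock:
            row = self._rows[sid]
            if row["state"] != "unexecuted":
                return False            # another node won the race
            row["state"], row["owner"] = "executing", owner
            return True

    def mark_completed(self, sid):
        """S204: update the scheduling info after execution."""
        with self._lock:
            self._rows[sid]["state"] = "completed"

    def snapshot(self):
        with self._lock:
            return {sid: dict(row) for sid, row in self._rows.items()}


def execution_node(name, table, executed):
    while True:
        sid = table.find_unexecuted()
        if sid is None:
            return
        if table.try_preempt(sid, name):
            executed.append((name, sid))   # placeholder for the real work
            table.mark_completed(sid)


table = SubtaskTable(range(10))
executed = []
nodes = [threading.Thread(target=execution_node,
                          args=(f"node-{i}", table, executed))
         for i in range(2)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()

# Every subtask ran exactly once, whichever node won each race.
print(sorted(sid for _, sid in executed) == list(range(10)))  # True
```

How the ten subtasks are divided between the two nodes varies from run to run; only the exactly-once property is guaranteed by the compare-and-set.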
On the basis of the foregoing embodiments, as an optional embodiment, the execution node includes at least one callback thread;
initiating preemption of the target subtask, and executing the target subtask if the preemption succeeds, includes:
initiating preemption of the target subtask by a callback thread in an idle state, and if the preemption succeeds, executing the target subtask by the callback thread and updating the state of the callback thread to busy.
Referring to fig. 3, which exemplarily shows an architecture diagram of distributed processing of DDL tasks according to an embodiment of the present application. As shown in the figure, the architecture mainly includes a reorganization thread pool, a subtask schedule table, and callback thread pools. The reorganization thread pool includes a plurality of reorganization threads, each corresponding to a master node. A reorganization thread splits a DDL task into a plurality of subtasks, creates the scheduling information of those subtasks, and inserts it into the subtask schedule table; each subtask in the subtask schedule table in fig. 3 is represented by a circle. Fig. 3 shows two reorganization threads, which obtain different DDL tasks, and the scheduling information of the subtasks of each DDL task is inserted into the subtask schedule table corresponding to that DDL task.
Each execution node corresponds to a callback thread pool, which includes at least one callback thread, each callback thread (represented by a triangle) for executing a subtask. Two callback thread pools are shown in fig. 3, and it can be found that the callback threads in each callback thread pool execute different subtasks of the DDL task, that is, each subtask of one DDL task is executed by two execution nodes in a distributed manner.
On the basis of the above embodiments, a callback thread that is executing a subtask needs to periodically update the lease information in the scheduling information, and an idle callback thread periodically checks whether a subtask is still active. If the lease information of a subtask has not been updated for a period of time, the idle callback thread preempts the subtask, and the callback thread that originally executed the subtask exits because its next update of the subtask fails.
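The lease mechanism described above may be illustrated by the following minimal sketch. It is an in-memory model only: the lease length `LEASE_SECONDS`, the field names, and the state labels are assumptions, timestamps are passed in explicitly so the behaviour is deterministic, and in a real system each function would have to run as one atomic update of the schedule table.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 10.0  # hypothetical lease length


@dataclass
class Subtask:
    state: str = "unexecuted"
    owner: Optional[str] = None
    lease_expires_at: float = 0.0


def try_preempt(sub: Subtask, thread_id: str, now: float) -> bool:
    """Claim the subtask if it is unexecuted or its lease has expired."""
    free = sub.state == "unexecuted"
    stale = sub.state == "running" and now >= sub.lease_expires_at
    if not (free or stale):
        return False
    sub.state, sub.owner = "running", thread_id
    sub.lease_expires_at = now + LEASE_SECONDS
    return True


def renew_lease(sub: Subtask, thread_id: str, now: float) -> bool:
    """Extend the lease; fails if another thread has taken over the subtask."""
    if sub.state != "running" or sub.owner != thread_id:
        return False
    sub.lease_expires_at = now + LEASE_SECONDS
    return True


sub = Subtask()
a_claimed = try_preempt(sub, "thread-A", now=0.0)      # A wins the free subtask
b_blocked = try_preempt(sub, "thread-B", now=5.0)      # lease still valid
b_took_over = try_preempt(sub, "thread-B", now=11.0)   # A stopped renewing
a_renewed = renew_lease(sub, "thread-A", now=12.0)     # fails, so A exits
```

In this run thread-A obtains the subtask, thread-B is refused while the lease is valid, thread-B takes over once the lease has expired, and the late renewal by thread-A fails, which is the signal for the original thread to exit.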
On the basis of the above embodiments, executing the target subtask further includes:
if the execution state information of the target subtask has been updated to indicate that cancellation is requested, stopping execution of the target subtask;
updating the scheduling information of the target subtask according to the situation of executing the target subtask includes:
updating the execution state information of the target subtask to cancelled once execution of the target subtask has stopped.
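A minimal sketch of this cancellation handling on the execution node is given below. It assumes that the cancellation request and the acknowledged cancellation are two distinct state values, here labelled "cancelling" and "cancelled"; these labels, the dictionary keys, and the `read_state` callable that stands in for re-reading the scheduling information are all hypothetical.

```python
from typing import Callable, Dict


def run_subtask(row: Dict, batch_size: int, read_state: Callable[[], str]) -> Dict:
    """Process row['current']..row['end'] in batches, re-reading the state before each batch."""
    while row["current"] < row["end"]:
        if read_state() == "cancelling":
            row["state"] = "cancelled"    # acknowledge the request and stop
            return row
        nxt = min(row["current"] + batch_size, row["end"])
        row["rows_processed"] += nxt - row["current"]
        row["current"] = nxt              # record progress after each batch
    row["state"] = "finished"
    return row


# Simulated state source: the third read reports that cancellation was requested.
reads = {"n": 0}


def read_state() -> str:
    reads["n"] += 1
    return "cancelling" if reads["n"] > 2 else "running"


row = {"current": 0, "end": 100, "rows_processed": 0, "state": "running"}
run_subtask(row, batch_size=10, read_state=read_state)
```

Because the check is made between batches, the subtask stops at a batch boundary: two batches are processed before the request is seen, and the recorded progress stays consistent with the rows actually handled.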
Referring to fig. 4, a schematic flowchart of a distributed processing method for a DDL task according to an embodiment of the present application is exemplarily shown, and as shown in the drawing, the method includes:
1. acquiring a DDL task to be executed from a DDL task queue;
2. judging whether the DDL task needs to be started, if so, turning to the step 3; if not, turning to the step 5;
3. starting a DDL task;
4. dividing data to be backfilled into a certain number of subtasks, and creating scheduling information of the subtasks;
5. the idle callback thread of the execution node acquires the executable subtasks according to the scheduling information of each subtask and starts execution;
6. the execution node prepares an execution environment for the start of the subtask;
7. during execution, the callback thread periodically checks whether execution of the DDL task is to be cancelled; if so, turning to the step 11, and if not, turning to the step 8;
8. completing the processing of a batch of data;
9. updating the execution state information of the subtasks;
10. judging whether the subtasks are finished, if not, returning to the step 7, and if so, entering the step 14;
11. judging whether the task is a main subtask, if not, entering the step 14, and if so, entering the step 12;
12. judging whether all subtasks of the DDL task are finished, if so, entering the step 15, and if not, entering the step 13;
13. sleeping for a period of time, and returning to the step 5;
14. after the subtask is finished, the callback thread returns to the step 5 to obtain the next subtask;
15. the DDL task ends.
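The completion check of steps 12, 13 and 15 can be illustrated with the simplified polling sketch below. It only models "check all subtasks, sleep, check again"; it does not return to step 5 to pick up further subtasks, and the names `all_finished`, `wait_until_ddl_ends`, `poll_interval`, and `max_polls` are assumptions introduced for the example.

```python
import time
from typing import Callable, List


def all_finished(states: List[str]) -> bool:
    """True when there is at least one subtask and every subtask is finished."""
    return bool(states) and all(s == "finished" for s in states)


def wait_until_ddl_ends(
    read_states: Callable[[], List[str]],
    poll_interval: float = 0.01,
    max_polls: int = 100,
) -> bool:
    """Poll the execution states; return True once all subtasks are finished."""
    for _ in range(max_polls):
        if all_finished(read_states()):
            return True               # step 15: the DDL task ends
        time.sleep(poll_interval)     # step 13: sleep, then check again
    return False


# Two successive snapshots of a hypothetical two-subtask schedule table.
snapshots = iter([
    ["finished", "running"],
    ["finished", "finished"],
])
ddl_ended = wait_until_ddl_ends(lambda: next(snapshots), poll_interval=0.0)
```

The `max_polls` bound exists only to keep the sketch from looping forever; the flow described above places no such limit on the waiting.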
Compared with the prior art, the beneficial effects of the embodiment of the application comprise the following points:
1, the implementation logic is simple, the extensibility is strong, the generality is good, and the correctness of the framework is easy to ensure;
2, by decoupling a parallel DDL task into a part executed by the master node and a part executed by the execution nodes, the scheme can still execute parallel computation efficiently in a complex distributed environment;
3, the method can adapt to different forms of database deployment:
a) Single-instance database deployment;
b) Cluster single-master node, full-cluster distributed DDL task scheduling
c) Cluster distributed DDL task scheduling (multiple master nodes)
4, communication between the task coordinator and the executors of distributed parallel task execution is essentially eliminated, so that parallel execution is more efficient;
5, the preemptive mode makes exception handling and fault recovery of the parallel execution framework simple and efficient, and the execution of the DDL task is not affected even if a task coordinator fails during execution;
6, the characteristics of tables and transactions are used to realize isolation and synchronization between concurrent threads, which provides a mature and reliable technical guarantee (other data structures may also be employed).
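Point 6 can be illustrated with a single conditional update on a table, sketched below. SQLite (Python's standard `sqlite3` module) is used here purely as a stand-in for the distributed database, and the table name `subtask_schedule` and its columns are hypothetical. The update only matches a row that is still unexecuted, so of several callback threads racing for the same subtask, at most one sees an affected-row count of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subtask_schedule ("
    " subtask_id INTEGER PRIMARY KEY,"
    " state TEXT NOT NULL,"
    " owner TEXT)"
)
conn.executemany(
    "INSERT INTO subtask_schedule (subtask_id, state) VALUES (?, 'unexecuted')",
    [(1,), (2,)],
)
conn.commit()


def preempt(conn: sqlite3.Connection, subtask_id: int, thread_id: str) -> bool:
    """Claim a subtask with one conditional UPDATE; True if this thread won."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE subtask_schedule SET state = 'running', owner = ?"
            " WHERE subtask_id = ? AND state = 'unexecuted'",
            (thread_id, subtask_id),
        )
        return cur.rowcount == 1


first = preempt(conn, 1, "thread-A")   # claims subtask 1
second = preempt(conn, 1, "thread-B")  # finds it already running
```

No message needs to pass between the two threads: the loser learns that it lost from the affected-row count alone, which is consistent with the reduced coordinator-to-executor communication noted in point 4.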
An embodiment of the present application provides a master node in a distributed database system, and as shown in fig. 5, the master node may include: a task obtaining module 501, a task splitting module 502, and a scheduling information creating module 503, wherein,
a task obtaining module 501, configured to obtain a DDL task to be executed from a DDL task queue in a distributed database system;
a task splitting module 502, configured to split the DDL task into multiple subtasks;
a scheduling information creating module 503, configured to create scheduling information of each subtask, where the scheduling information includes execution state information and the initial execution state information is unexecuted, so that at least one execution node among the nodes determines an unexecuted target subtask according to the scheduling information of each subtask, and updates the scheduling information of the target subtask according to the situation of executing the target subtask.
The master node in the embodiments of the present application may execute the method provided in the embodiments of the present application, and the implementation principle is similar. The actions executed by the modules of the master node correspond to the steps of the method in the embodiments of the present application; for a detailed functional description of the modules of the master node, reference may be made to the description of the corresponding method above, and details are not repeated here.
An embodiment of the present application provides an execution node in a distributed database system, and as shown in fig. 6, the execution node may include: a scheduling information obtaining module 601, a subtask determining module 602, a subtask executing module 603, and a scheduling information updating module 604, wherein,
a scheduling information obtaining module 601, configured to obtain scheduling information of each subtask of the DDL task, where the scheduling information includes execution state information, and the initial execution state information is unexecuted;
a subtask determining module 602, configured to determine an unexecuted target subtask according to scheduling information of each subtask;
a subtask execution module 603, configured to initiate preemption of the target subtask, and if preemption is successful, execute the target subtask;
a scheduling information updating module 604, configured to update the scheduling information of the target subtask according to a situation of executing the target subtask;
the subtasks are obtained by a master node in the distributed database system splitting a DDL task to be executed that the master node obtained from a DDL task queue, and the scheduling information of the subtasks is created by the master node.
The execution node in the embodiments of the present application may execute the method provided in the embodiments of the present application, and the implementation principle is similar. The actions executed by the modules of the execution node correspond to the steps of the method in the embodiments of the present application; for a detailed functional description of the modules of the execution node, reference may be made to the description of the corresponding method above, and details are not repeated here.
The embodiment of the application provides a distributed database system, which comprises at least one master node and at least one execution node.
The embodiment of the application provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the distributed processing method of the DDL task. Compared with the related art, the following can be achieved: a DDL task to be executed is obtained from a DDL task queue; the DDL task is split into a plurality of subtasks, and scheduling information of each subtask is created, where the scheduling information includes execution state information and the initial execution state information is unexecuted; the scheduling information is disclosed to all nodes in the distributed database system, so that at least one execution node among the nodes determines an unexecuted target subtask according to the scheduling information of each subtask, and updates the scheduling information of the target subtask according to the situation of executing the target subtask.
In an alternative embodiment, an electronic device is provided. As shown in fig. 7, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 4001 may also be a combination that performs a computing function, e.g., a combination comprising one or more microprocessors, a combination of a DSP and a microprocessor, etc.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 4003 may be a ROM (Read-Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store a computer program and that can be read by a computer, which is not limited herein.
The memory 4003 is used for storing the computer program for executing the embodiments of the present application, and execution is controlled by the processor 4001. The processor 4001 is used to execute the computer program stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, and when being executed by a processor, the computer program may implement the steps and corresponding contents of the foregoing method embodiments.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments may be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and claims of this application and in the preceding drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as needed, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in this application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of this application are also within the protection scope of the embodiments of this application without departing from the technical idea of this application.

Claims (13)

1. A distributed processing method of a Data Definition Language (DDL) task, applied to at least one master node among the nodes of a distributed database system, the method comprising:
obtaining a DDL task to be executed from a DDL task queue in a distributed database system;
splitting the DDL task into a plurality of subtasks;
and creating scheduling information of each subtask, wherein the scheduling information comprises execution state information, the initial execution state information is unexecuted, so that at least one execution node in each node determines an unexecuted target subtask according to the scheduling information of each subtask, and the scheduling information of the target subtask is updated according to the condition of executing the target subtask.
2. The method of claim 1, wherein the splitting the DDL task into a plurality of subtasks comprises:
splitting the DDL task in multiple rounds;
the number of subtasks split in each round is related to the number of callback threads of each node in the current round, and after each round of splitting, whether to adjust the number of callback threads of each node for the next round is determined according to the part of the DDL task that has not yet been split;
each callback thread is used to execute a subtask.
3. The method of claim 1, wherein the updated execution state information includes completed;
the method further comprises the following steps:
and determining that the execution of the DDL task is finished according to the fact that the execution state information corresponding to all subtasks of the DDL task is finished.
4. The method according to claim 1, wherein the scheduling information further comprises a task identifier of the DDL task to which the subtask belongs;
the method further comprises the following steps:
determining a task identifier of a DDL task whose execution is to be cancelled;
determining target scheduling information comprising the task identifier of the DDL task whose execution is to be cancelled;
and updating the execution state information of the target scheduling information to indicate that cancellation is requested, so that the execution node executing the corresponding subtask cancels the execution of the corresponding subtask upon finding that the execution state information of the target scheduling information has been so updated, and updates the execution state information of the target scheduling information to cancelled after the execution is cancelled.
5. The method of any of claims 1-4, wherein the scheduling information further comprises at least one of a subtask identification of a subtask, a start value of a subtask, an end value, a value currently being processed, a thread identification of a callback thread executing the subtask, lease information, a number of processing lines, or error information.
6. A distributed processing method of a Data Definition Language (DDL) task, which is applied to at least one execution node in nodes of a distributed database system, and comprises the following steps:
acquiring scheduling information of each subtask of the DDL task, wherein the scheduling information comprises execution state information, and the initial execution state information is unexecuted;
determining an unexecuted target subtask according to the scheduling information of each subtask;
initiating preemption of the target subtask, and if the preemption succeeds, executing the target subtask;
updating the scheduling information of the target subtask according to the condition of executing the target subtask;
the subtasks are obtained by a master node in the distributed database system splitting a DDL task to be executed that the master node obtained from a DDL task queue, and the scheduling information of the subtasks is created by the master node.
7. The method of claim 6, wherein the execution node comprises at least one callback thread;
the initiating preemption of the target subtask, and if the preemption succeeds, executing the target subtask comprises:
initiating preemption of the target subtask by a callback thread in the idle state; if the preemption succeeds, executing the target subtask by the callback thread, and updating the state of the callback thread to busy.
8. The method of claim 6, wherein the executing the target subtask further comprises:
if the execution state information of the target subtask has been updated to indicate that cancellation is requested, stopping execution of the target subtask;
the updating the scheduling information of the target subtask according to the condition of executing the target subtask includes:
updating the execution state information of the target subtask to cancelled once execution of the target subtask has stopped.
9. A master node in a distributed database system, comprising:
the task acquisition module is used for acquiring DDL tasks to be executed from a DDL task queue in the distributed database system;
the task splitting module is used for splitting the DDL task into a plurality of subtasks;
and the scheduling information creating module is used for creating scheduling information of each subtask, wherein the scheduling information comprises execution state information, the initial execution state information is unexecuted, so that at least one execution node in each node determines an unexecuted target subtask according to the scheduling information of each subtask, and the scheduling information of the target subtask is updated according to the condition of executing the target subtask.
10. An execution node in a distributed database system, comprising:
the scheduling information acquisition module is used for acquiring scheduling information of each subtask of a Data Definition Language (DDL) task, wherein the scheduling information comprises execution state information, and the initial execution state information is unexecuted;
the subtask determining module is used for determining an unexecuted target subtask according to the scheduling information of each subtask;
the subtask execution module is used for initiating preemption of the target subtask, and if the preemption succeeds, executing the target subtask;
the scheduling information updating module is used for updating the scheduling information of the target subtask according to the condition of executing the target subtask;
the subtasks are obtained by a master node in the distributed database system splitting a DDL task to be executed that the master node obtained from a DDL task queue, and the scheduling information of the subtasks is created by the master node.
11. A distributed database system comprising at least one master node according to claim 9 and at least one execution node according to claim 10.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the distributed processing method of DDL tasks according to any of the claims 1-8.
13. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the steps of a method for distributed processing of DDL tasks according to any of the claims 1-8.
CN202211269138.2A 2022-10-17 2022-10-17 Distributed processing method, node and distributed database system of DDL task Pending CN115454613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211269138.2A CN115454613A (en) 2022-10-17 2022-10-17 Distributed processing method, node and distributed database system of DDL task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211269138.2A CN115454613A (en) 2022-10-17 2022-10-17 Distributed processing method, node and distributed database system of DDL task

Publications (1)

Publication Number Publication Date
CN115454613A true CN115454613A (en) 2022-12-09

Family

ID=84310945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211269138.2A Pending CN115454613A (en) 2022-10-17 2022-10-17 Distributed processing method, node and distributed database system of DDL task

Country Status (1)

Country Link
CN (1) CN115454613A (en)

Similar Documents

Publication Publication Date Title
US10521268B2 (en) Job scheduling method, device, and distributed system
US9619430B2 (en) Active non-volatile memory post-processing
US7760743B2 (en) Effective high availability cluster management and effective state propagation for failure recovery in high availability clusters
US9086911B2 (en) Multiprocessing transaction recovery manager
US20160378785A1 (en) Distributed work flow using database replication
US8392920B2 (en) Parallel query engine with dynamic number of workers
US10942824B2 (en) Programming model and framework for providing resilient parallel tasks
US11176086B2 (en) Parallel copying database transaction processing
US10282457B1 (en) Distributed transactions across multiple consensus groups
US9400767B2 (en) Subgraph-based distributed graph processing
WO2011137672A1 (en) Method and device for task execution based on database
CN109063005B (en) Data migration method and system, storage medium and electronic device
CN110704112B (en) Method and apparatus for concurrently executing transactions in a blockchain
US11392414B2 (en) Cooperation-based node management protocol
CN111258985A (en) Data cluster migration method and device
US20140282625A1 (en) Asynchronous programming model for concurrent workflow scenarios
CN115454613A (en) Distributed processing method, node and distributed database system of DDL task
CN115687378A (en) DDL task parallel processing method, computing node and electronic equipment
CN113342499B (en) Distributed task calling method, device, equipment, storage medium and program product
CN115438025A (en) Data processing method and device
CN107608662B (en) MongoDB-based distributed timing system
US20200310870A1 (en) Method for managing a plurality of tasks by a multicore motor vehicle processor
CN113032131B (en) Redis-based distributed timing scheduling system and method
CN112991061B (en) Method and apparatus for concurrently executing transactions in blockchain
JPH11353284A (en) Job re-executing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination