CN115687378A - DDL task parallel processing method, computing node and electronic equipment - Google Patents

Publication number
CN115687378A
Authority
CN
China
Prior art keywords
task
ddl
target
job
parallel processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211268650.5A
Other languages
Chinese (zh)
Inventor
黄文俊
李霞
黄潇
刘奇
黄东旭
崔秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingkai Star Beijing Technology Co ltd
Original Assignee
Pingkai Star Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingkai Star Beijing Technology Co ltd filed Critical Pingkai Star Beijing Technology Co ltd
Priority to CN202211268650.5A priority Critical patent/CN115687378A/en
Publication of CN115687378A publication Critical patent/CN115687378A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a parallel processing method for DDL tasks, a computing node and an electronic device, and relate to the field of databases. The method comprises: acquiring a job task table, wherein the job task table records related information of DDL tasks that have not finished processing, and the related information includes the order in which the DDL tasks entered the database system and the data mode (schema) in which the change object of each DDL task is located; determining a first DDL task that is being executed, and determining from the job task table a second DDL task that satisfies a predetermined task parallel processing rule; and processing the second DDL task and the first DDL task in parallel. The embodiments have simple implementation logic and strong extensibility, can adapt to deployments of different data block forms, and are free of the prior-art limitation of relying on a metadata-lock-like mechanism that coordinates execution order in order to schedule DDL concurrently.

Description

DDL task parallel processing method, computing node and electronic equipment
Technical Field
The application relates to the technical field of databases, in particular to a parallel processing method of DDL tasks, a computing node, electronic equipment and a computer readable storage medium.
Background
At present, distributed scheduling of Data Definition Language (DDL) statements is mainly implemented in the following ways:
1. Traditional databases such as Oracle and MySQL generally implement a metadata lock (MDL) system, which coordinates the execution order of DDL statements by locking the objects to be changed by DDL, thereby achieving concurrency.
2. Distributed database systems such as OceanBase and TDSQL likewise achieve concurrent scheduling of DDL statements by implementing a similar MDL lock in the distributed system.
The problem to be solved is how to ensure that, when a plurality of DDL workers (also called task execution units) execute DDL tasks concurrently, neither the object definitions nor the data stored in the objects of a user database become inconsistent.
Disclosure of Invention
Embodiments of the present application provide a parallel processing method for DDL tasks, a computing node, an electronic device, and a computer-readable storage medium, which can solve the above problems in the prior art. The technical scheme is as follows:
according to an aspect of the embodiments of the present application, there is provided a parallel processing method for data definition language DDL tasks, which is executed by a target computing node in a database system, the method including:
acquiring a job task table, wherein the job task table is used for recording related information of DDL tasks that have not finished processing, and the related information includes the order in which the DDL tasks entered the database system and the data mode in which the change object corresponding to each DDL task is located;
determining a first DDL task that is being executed, and determining from the job task table a second DDL task that satisfies a predetermined task parallel processing rule, wherein the task parallel processing rule is related to the order in which DDL tasks enter the database system and to the data mode of the change object corresponding to each DDL task;
and processing the second DDL task and the first DDL task in parallel.
According to another aspect of an embodiment of the present application, there is provided a target computing node in a database system, the node including:
a job task table module, configured to obtain a job task table, wherein the job task table is used for recording related information of DDL tasks that have not finished processing, and the related information includes the order in which the DDL tasks entered the database system and the data mode of the change object corresponding to each DDL task;
a task determining module, configured to determine a first DDL task that is being executed, and to determine from the job task table a second DDL task that satisfies a predetermined task parallel processing rule, wherein the task parallel processing rule is related to the order in which DDL tasks enter the database system and to the data mode of the change object corresponding to each DDL task;
and a parallel processing module, configured to process the second DDL task and the first DDL task in parallel.
According to another aspect of embodiments of the present application, there is provided an electronic device, which includes a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the above-mentioned parallel processing method for DDL tasks.
According to still another aspect of embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the above-described parallel processing method for DDL tasks.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
a job task table recording related information of at least one DDL task is acquired; based on a task parallel processing rule, a second DDL task that can be processed in parallel with a first DDL task currently being processed is determined from the job task table; and the second DDL task and the first DDL task are processed in parallel. The scheme has simple logic and strong extensibility, can adapt to deployments of different data block forms, and is free of the prior-art limitation that DDL tasks can only be taken from a queue in order.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic diagram of the execution flow of a DDL task in the related art;
Fig. 2 is a schematic flowchart of a parallel processing method for DDL tasks according to an embodiment of the present application;
Fig. 3 shows a tree structure formed by three levels of change objects according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of a processing method of a DDL task according to another embodiment of the present application;
Fig. 5 is a schematic flowchart of a processing method of a DDL task according to yet another embodiment of the present application;
Fig. 6 is a schematic structural diagram of a target computing node according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term, e.g., "A and/or B" may be implemented as "A", or as "B", or as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The conventional solution for executing DDL statements is to register all DDL tasks in a distributed cluster in a queue, in the order in which they enter the cluster; the DDL tasks are registered in the queue mainly for the purpose of fault recovery. These tasks are fetched and executed by the task execution units that execute DDL tasks. As shown in fig. 1, every computing node can receive a DDL change statement from a client and store it in the DDL task queue, and a task execution unit takes one DDL task at a time from the DDL task queue and executes it. The database usually uses MDL locks to ensure correct execution between DDL statements.
The present application provides a parallel processing method for DDL tasks, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which are intended to solve the above technical problems in the prior art.
The technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application will be described below through descriptions of several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps, etc. in different embodiments is not repeated.
An embodiment of the present application provides a parallel processing method for DDL tasks, and as shown in fig. 2, the method includes:
S101, obtaining a job task table, wherein the job task table is used for recording related information of DDL tasks that have not finished processing, and the related information includes the order in which the DDL tasks entered the database system and the data mode (schema) in which the change object corresponding to each DDL task is located.
After the client sends the DDL statement to the database system, at least one DDL task can be obtained according to the DDL statement.
The relevant information of the DDL task in the embodiment of the present application may include a task identifier of the DDL task, meta information of the DDL task (that is, key information for executing the DDL task, which may ensure correct execution and exception handling of the DDL task), a computing node executed by the DDL task, and the like, which is not limited in the embodiment of the present application.
In one embodiment, the order in which the DDL tasks enter the database system is characterized by the task identifiers, for example, the task identifier of the DDL task entering the database system first is 1, the task identifier of the DDL task entering the database system second is 2, and so on.
S102, determining a first DDL task that is being executed, and determining from the job task table a second DDL task that satisfies a predetermined task parallel processing rule, wherein the task parallel processing rule is related to the order in which DDL tasks enter the database system and to the data mode of the change object corresponding to each DDL task.
In the embodiments of the application, the related information of the DDL tasks to be processed is recorded in the job task table, so that determining the second DDL task to execute next is not subject to the prior-art limitation that DDL tasks can only be taken from a queue in order.
The DDL task concurrency rules of the embodiments of the present application include, but are not limited to, the following:
1) DDL tasks corresponding to DDL statements on the same data table need to be executed according to the sequence of the DDL tasks entering a database system;
2) DDL tasks corresponding to DDL statements on different data tables can be executed concurrently;
3) DDL tasks corresponding to change objects at different levels in the database also need to be executed according to the order of the levels.
Referring to fig. 3, which exemplarily shows a tree structure formed by three levels of change objects according to an embodiment of the present application: the root node of the tree structure is the first-level change object, DB (schema); the second-level change objects include function, stored procedure, table and view; and the third-level change objects, under table, include index, column, trigger and constraint.
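The three rules and the level hierarchy can be illustrated with a small conflict check. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Job` record and the functions `conflicts` and `runnable` are hypothetical names, a `table_id` of `None` stands for a first-level (schema) change, third-level changes are assumed to carry the identifier of their table, and rule 3 is read here as "a schema-level task and a lower-level task in the same schema run in entry order".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    job_id: int                # order of entry into the database system
    schema_id: int             # schema in which the change object is located
    table_id: Optional[int]    # None marks a first-level (schema) change


def conflicts(earlier: Job, later: Job) -> bool:
    """Return True if `later` has to wait for `earlier`."""
    if earlier.schema_id != later.schema_id:
        return False           # unrelated schemas never block each other here
    if earlier.table_id is None or later.table_id is None:
        return True            # rule 3: different levels of one schema are ordered
    # rule 1 (same table: ordered) versus rule 2 (different tables: concurrent)
    return earlier.table_id == later.table_id


def runnable(pending: list) -> list:
    """Jobs that are not blocked by any earlier unfinished job."""
    ordered = sorted(pending, key=lambda j: j.job_id)
    return [job for i, job in enumerate(ordered)
            if not any(conflicts(prev, job) for prev in ordered[:i])]


jobs = [
    Job(1, 10, 100),   # table 100 in schema 10
    Job(2, 10, 100),   # same table: waits for job 1
    Job(3, 10, 101),   # other table: may run with job 1
    Job(4, 10, None),  # schema-level change: waits for jobs 1 to 3
    Job(5, 20, 200),   # other schema: may run
]
ready_ids = [j.job_id for j in runnable(jobs)]
```

With the sample jobs, tasks 1, 3 and 5 are selected for parallel execution while tasks 2 and 4 wait.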
In one embodiment, the SQL statement-based implementation of the task parallel processing rule has the advantages of being lock-free and light-weight, and not affecting the execution of the DML and other DDL tasks.
The task parallel processing rule realized based on the SQL statement in the embodiment of the application comprises the following steps:
Statement 1: select min(job_id) from tidb_ddl_job where job_id in and exec_owner = 0;
Statement 1 looks up the current second DDL task. If that task is a physical DDL task, then once it has been selected for execution its related information must also be recorded in a reorganization information table. exec_owner = 0 indicates that the DDL task has not yet been executed.
Statement 2: select * from tidb_ddl_job where is_drop_schema and schema_id = {job.schema_id} and job_id < {job.id} limit 1;
Statement 2 applies when the change object is at the second or third level: it checks whether a DDL task corresponding to the higher-level change object is being executed.
Statement 3: select * from mysql.tidb_ddl_job where schema_id = {job.schema_id} and job_id < {job.id} limit 1;
Statement 3 applies when the change object is at the first level: it checks whether a DDL task corresponding to a lower-level change object is being executed.
It should be noted that the three SQL statements above are implemented for TiDB, an open-source distributed relational database. For other databases, a corresponding implementation may use different SQL statements while the essence remains unchanged, which is not described in detail in the embodiments of the present application.
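The level checks expressed by statements 2 and 3 can be tried out against an in-memory SQLite table. The following is only a sketch of one reading of those statements, assuming a simplified `tidb_ddl_job` layout with the columns shown; the helper `is_blocked` is a hypothetical name, and the same-table ordering rule is deliberately left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table tidb_ddl_job (
    job_id integer primary key, schema_id integer, table_id integer,
    is_drop_schema integer, exec_owner integer)""")
conn.executemany("insert into tidb_ddl_job values (?, ?, ?, ?, ?)", [
    (1, 10, 100, 0, 0),    # table-level change in schema 10
    (2, 10, None, 1, 0),   # drop schema 10
    (3, 10, 101, 0, 0),    # table-level change queued after the drop
    (4, 20, 200, 0, 0),    # table-level change in another schema
])


def is_blocked(job_id: int) -> bool:
    """Apply the level check that matches the job's own level."""
    schema_id, is_drop = conn.execute(
        "select schema_id, is_drop_schema from tidb_ddl_job where job_id = ?",
        (job_id,)).fetchone()
    if is_drop:
        # First-level job: wait for any earlier job on the same schema.
        sql = ("select 1 from tidb_ddl_job "
               "where schema_id = ? and job_id < ? limit 1")
    else:
        # Lower-level job: wait for an earlier schema-level job on its schema.
        sql = ("select 1 from tidb_ddl_job where is_drop_schema "
               "and schema_id = ? and job_id < ? limit 1")
    return conn.execute(sql, (schema_id, job_id)).fetchone() is not None


blocked = {job_id: is_blocked(job_id) for job_id in (1, 2, 3, 4)}
```

Because the rule is a plain query over the job task table, no lock is taken on the change objects themselves, which matches the lock-free character described above.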
S103, the second DDL task and the first DDL task are processed in parallel.
After the second DDL task is determined, the second DDL task and the executing first DDL task can be processed in parallel. The number of the first DDL tasks in the embodiment of the present application is not particularly limited, and may be, for example, one or more.
In the embodiments of the application, a job task table recording related information of at least one DDL task that has not finished processing is obtained; based on the task parallel processing rule, a second DDL task that can be processed in parallel with the first DDL task currently being processed is determined from the job task table; and the second DDL task and the first DDL task are processed in parallel. The scheme has simple logic and strong extensibility, can adapt to deployments of different data block forms, and is free of the prior-art limitation that DDL tasks can only be taken from a queue in order.
On the basis of the foregoing embodiments, as an optional embodiment, the related information includes a computing node identifier for indicating a computing node currently executing the corresponding DDL task, and the job task table includes a first field for recording the computing node identifier.
Specifically, the embodiments of the application may set an exec_owner field (i.e., the first field) in the job task table, in which the computing node identifier of the computing node currently executing the corresponding DDL task is recorded. It should be understood that when the DDL task has not been executed, the first field may be empty or hold another flag (e.g., 0) indicating that the DDL task has not been executed. If the DDL task has been executed but did not succeed, this field still records the computing node identifier of the computing node that executed it.
Processing the second DDL task in parallel with the first DDL task, including:
S201, if it is determined according to the first field of the second DDL task in the job task table that the second DDL task is being executed for the first time, recording the target computing node identifier of the target computing node in the first field;
S202, calling a task execution unit in the database system to execute the second DDL task, wherein the type of the task execution unit includes at least one of a process, a thread and a coroutine.
When it is determined that the second DDL task is being executed for the first time, the second DDL task needs to be initialized and the target computing node identifier of the target computing node is recorded in the first field, which facilitates rollback if the second DDL task later fails.
After the target computing node identifier is recorded, the task execution unit can be called to execute the second DDL task in the embodiment of the application. It should be noted that, in the embodiment of the present application, the computing node has a plurality of task execution units, and an idle task execution unit may be selected from the plurality of task execution units to execute the second DDL task. The plurality of task execution units may execute the DDL task in parallel. In some embodiments, a DDL task may be split into multiple subtasks, with each task execution unit being configured to execute a subtask.
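Steps S201 and S202 can be sketched with a thread pool standing in for the task execution units. This is an illustrative sketch only: `NODE_ID`, `job_table`, `run_ddl` and `start_job` are hypothetical names, the job task table is modeled as an in-memory dictionary, and threads are just one of the unit types (process, thread, coroutine) named above.

```python
from concurrent.futures import ThreadPoolExecutor

NODE_ID = 7   # hypothetical identifier of the target computing node

# job_id -> row of the job task table (in-memory stand-in)
job_table = {
    1: {"exec_owner": 0, "state": "queued"},
    2: {"exec_owner": 0, "state": "queued"},
}


def run_ddl(job_id: int) -> int:
    """Placeholder for the real DDL work done by a task execution unit."""
    job_table[job_id]["state"] = "done"
    return job_id


def start_job(pool: ThreadPoolExecutor, job_id: int):
    row = job_table[job_id]
    if row["exec_owner"] == 0:            # S201: first execution, record the node
        row["exec_owner"] = NODE_ID
    return pool.submit(run_ddl, job_id)   # S202: hand the task to an idle unit


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [start_job(pool, job_id) for job_id in sorted(job_table)]
    finished = sorted(f.result() for f in futures)
```

The pool picks whichever worker is idle, so several DDL tasks selected by the rule run at the same time.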
On the basis of the foregoing embodiments, as an optional embodiment, the related information includes an object identifier of a change object corresponding to the DDL task and meta information of the DDL task.
Specifically, in the embodiments of the present application, a table_id field (i.e., the second field) may be set in the job task table to record the object identifier of the change object corresponding to the DDL task, a schema_id field (i.e., the third field) may be set to record the schema identifier of the schema in which the change object is located, and a job_meta field (i.e., the fourth field) may be set to record the meta information of the DDL task.
Calling a task execution unit in the database system to execute the second DDL task includes:
S301, calling the task execution unit to read the second to fourth fields of the second DDL task in the job task table, and obtaining a target object identifier of the target change object corresponding to the second DDL task, a target mode identifier of the target data mode in which the target change object is located, and target meta information of the second DDL task;
S302, determining the target change object from the target data mode of the database system according to the target object identifier and the target mode identifier;
S303, executing the second DDL task on the target change object according to the target meta information.
By reading the second to fourth fields of the second DDL task in the job task table, the task execution unit in this embodiment obtains the target object identifier of the target change object corresponding to the second DDL task, the target mode identifier of the target data mode in which the target change object is located, and the target meta information of the second DDL task. It can then determine the target change object in the database system based on the target object identifier and the target mode identifier, and finally execute the second DDL task on the target change object based on the target meta information, so that the target change object is accurately determined and processed.
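A toy version of steps S301 to S303 is sketched below. Everything concrete in it is assumed for illustration: the `catalog` dictionary stands in for the database's object definitions, the JSON layout of `job_meta` and the `rename_table` action are invented, and `execute_job` is a hypothetical helper.

```python
import json

# (schema_id, table_id) -> object definition; stand-in for the database catalog
catalog = {(10, 100): {"name": "orders"}}

job_row = {
    "job_id": 1,
    "table_id": 100,    # second field: object identifier of the change object
    "schema_id": 10,    # third field: identifier of the schema it lives in
    "job_meta": json.dumps({"action": "rename_table", "new_name": "orders_v2"}),
}


def execute_job(row: dict) -> dict:
    # S301: read the second to fourth fields of the task's row
    table_id, schema_id = row["table_id"], row["schema_id"]
    meta = json.loads(row["job_meta"])
    # S302: locate the target change object inside the target schema
    target = catalog[(schema_id, table_id)]
    # S303: apply the change described by the meta information
    if meta["action"] == "rename_table":
        target["name"] = meta["new_name"]
    else:
        raise ValueError(f"unsupported action: {meta['action']}")
    return target


renamed = execute_job(job_row)
```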
On the basis of the foregoing embodiments, as an optional embodiment, the parallel processing of the second DDL task further includes:
S401, if it is determined according to the first field of the second DDL task in the job task table that the second DDL task is not being executed for the first time, judging whether to resume execution of the second DDL task;
S402, if it is determined to resume execution of the second DDL task, executing the second DDL task; if it is determined not to resume execution, rolling back the data generated by the last execution of the second DDL task.
It should be noted that when the first field records the computing node identifier of a computing node other than the target computing node, the second DDL task is not being executed for the first time and its last execution failed, so it must be judged whether to resume its execution. (A DDL task that has been fully executed is archived in the task history table; a DDL task that is still in the job task table still needs to be executed.)
If it is determined that the execution of the second DDL task is resumed, the second DDL task is executed, and specifically, in the embodiment of the present application, the last execution progress of the second DDL task may be recorded in the job task table, so that the second DDL task is continuously executed according to the last execution progress. If it is determined that the execution of the second DDL task is not resumed, the data generated by the last execution of the second DDL task is rolled back, and it can be understood that the second DDL task is not executed any more after the rolling back.
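The decision in S401 and S402 can be sketched as a small function over one row of the job task table. The `progress` and `state` columns and the `resume` flag are assumptions made for this sketch; the text does not specify how the decision to resume is reached, so it is passed in as a parameter.

```python
def on_pickup(row: dict, node_id: int, resume: bool) -> str:
    """Decide how to handle a task picked from the job task table."""
    if row["exec_owner"] == 0:
        row["exec_owner"] = node_id    # first execution: record the node
        return "start"
    if resume:
        # S402, resume branch: continue from the progress saved last time
        return f"resume@{row['progress']}"
    # S402, rollback branch: discard what the last attempt produced
    row["progress"] = 0
    row["state"] = "rolled_back"
    return "rollback"


fresh = {"exec_owner": 0, "progress": 0, "state": "queued"}
failed = {"exec_owner": 3, "progress": 420, "state": "running"}
abandoned = {"exec_owner": 3, "progress": 420, "state": "running"}

decisions = [
    on_pickup(fresh, node_id=7, resume=False),
    on_pickup(failed, node_id=7, resume=True),
    on_pickup(abandoned, node_id=7, resume=False),
]
```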
On the basis of the foregoing embodiments, as an optional embodiment, the related information includes a type identifier for indicating whether the corresponding DDL task is a logical DDL task, and the job task table includes a fifth field for recording the type identifier;
in general, DDL tasks can be divided into logical DDLs and physical DDLs, and a logical DDL only needs to modify the definition of an object (data table, index, column, etc.) in a database (e.g., modify a table name). Physical DDL usually has a re-organizing process, which usually needs to scan the complete data of the data table once to synchronize the data on the data table to the newly added or modified object, so as to ensure the consistency of the data.
The job task table includes a reorg field (i.e., a fifth field) for recording the type identifier.
Processing the second DDL task in parallel with the first DDL task, including:
S501, if it is determined according to the fifth field of the second DDL task in the job task table that the second DDL task is not a logical DDL task, splitting the second DDL task into a plurality of subtasks;
S502, calling a corresponding number of task execution units, according to the number of subtasks, to execute the subtasks in parallel.
Specifically, in the embodiment of the present application, if it is determined that the second DDL task is a logical DDL task according to the fifth field of the second DDL task, the second DDL task is not split, that is, only one task execution unit is called to execute the second DDL task. If the second DDL task is not a logical DDL task, that is, a physical DDL task, the second DDL task is split into a plurality of sub-tasks, and the sub-tasks are executed in parallel by a corresponding number of task execution units.
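The split in S501 and the parallel dispatch in S502 can be sketched as follows. The sketch assumes that a truthy `reorg` value marks a physical DDL task and that the data to scan can be addressed as a half-open integer range; `split_range`, `plan_subtasks` and `backfill` are hypothetical helpers, and `backfill` only counts rows instead of copying them.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start: int, end: int, parts: int) -> list:
    """Split the half-open range [start, end) into contiguous pieces."""
    total = end - start
    parts = max(1, min(parts, total))   # never more pieces than rows
    base, extra = divmod(total, parts)
    ranges, lo = [], start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def plan_subtasks(job: dict, workers: int) -> list:
    if not job["reorg"]:                # logical DDL: one unit, no split
        return [(job["start"], job["end"])]
    return split_range(job["start"], job["end"], workers)   # S501


def backfill(scan_range: tuple) -> int:
    lo, hi = scan_range
    return hi - lo                      # stand-in for scanning and backfilling rows


physical = {"reorg": True, "start": 0, "end": 10}
logical = {"reorg": False, "start": 0, "end": 10}

ranges = plan_subtasks(physical, workers=3)
with ThreadPoolExecutor(max_workers=len(ranges)) as pool:   # S502
    processed = sum(pool.map(backfill, ranges))
```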
On the basis of the foregoing embodiments, as an optional embodiment, calling the corresponding number of task execution units to execute the subtasks further includes:
creating a reorganization information table for the second DDL task;
recording the meta information of the subtasks in the reorganization information table.
Calling the corresponding number of task execution units to execute the subtasks includes:
calling the task execution units, each of which executes its subtask according to the meta information of that subtask in the reorganization information table.
Specifically, the meta information of a subtask includes at least one of: the data positions at which the subtask starts and ends scanning, the node identifier of the target computing node, and the execution state of the subtask.
The reorganization information table of the embodiments of the application includes the following fields:
job_id, used for recording the task identifier of the DDL task;
reorg_obj_id, used for recording the object identifier of the change object that needs to be reorganized, such as the identifier of the index in an add index task;
physical_id, the identifier of the table that needs to be scanned;
dist_para, used for recording the meta information of each subtask for parallel scheduling.
In some embodiments, the dist_para field includes:
exec_dist, used to indicate whether the DDL task is executed in a distributed manner across the whole cluster;
is_cancelled, used to indicate whether execution of the DDL task is cancelled;
curr_reorg_type, used to indicate the task type of the reorg;
reorg_sub_tasks, which further includes:
sub_scan_start_point, used for recording the data position at which the subtask starts scanning;
sub_scan_end_point, used for recording the data position at which the subtask ends scanning;
sub_scan_executer, used for recording the computing node that executes the subtask;
sub_task_status, used for recording the execution state of the subtask.
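The layout of the reorganization information table can be pictured as nested records. The sketch below writes the field names from the list above with underscores; the Python types, the default values and the `all_done` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReorgSubTask:
    sub_scan_start_point: int          # data position where scanning starts
    sub_scan_end_point: int            # data position where scanning ends
    sub_scan_executer: int             # computing node that runs the subtask
    sub_task_status: str = "pending"   # execution state of the subtask


@dataclass
class DistPara:
    exec_dist: bool = False            # distribute the task across the cluster?
    is_cancelled: bool = False         # has execution been cancelled?
    curr_reorg_type: str = ""          # task type of the reorg
    reorg_sub_tasks: list = field(default_factory=list)


@dataclass
class ReorgInfo:
    job_id: int                        # task identifier of the DDL task
    reorg_obj_id: int                  # object to reorganize, e.g. the new index
    physical_id: int                   # table that has to be scanned
    dist_para: DistPara = field(default_factory=DistPara)

    def all_done(self) -> bool:
        """True once every recorded subtask reports the state 'done'."""
        return all(t.sub_task_status == "done"
                   for t in self.dist_para.reorg_sub_tasks)


info = ReorgInfo(job_id=1, reorg_obj_id=55, physical_id=100)
info.dist_para.reorg_sub_tasks = [
    ReorgSubTask(0, 500, sub_scan_executer=7),
    ReorgSubTask(500, 1000, sub_scan_executer=8),
]
before = info.all_done()
for sub_task in info.dist_para.reorg_sub_tasks:
    sub_task.sub_task_status = "done"
after = info.all_done()
```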
In an alternative embodiment, the task parallel processing rule is implemented based on a Structured Query Language (SQL) statement;
determining a second DDL task which accords with a predetermined task parallel processing rule from the job task table, wherein the step of determining the second DDL task comprises the following steps:
selecting a reference DDL task from the job task table, writing the task identifier and the mode identifier of the reference DDL task into the task parallel processing rule, and taking the reference DDL task as the second DDL task if the result of the task parallel processing rule meets a preset condition.
On the basis of the foregoing embodiments, as an optional embodiment, after the second DDL task is processed in parallel with the first DDL task, the method further includes:
moving the related information of the second DDL task from the job task table to a task history table after the second DDL task has been processed, wherein the task history table includes the fields of the job task table.
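Moving a finished task to the history table can be sketched with two SQLite tables that share the same columns. The table names `ddl_job` and `ddl_history`, the reduced column set and the `archive` helper are assumptions for this sketch; the two statements run in one transaction so that a task is never present in both tables or in neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = ("job_id integer primary key, schema_id integer, "
           "table_id integer, job_meta text")
conn.execute(f"create table ddl_job ({columns})")
conn.execute(f"create table ddl_history ({columns})")   # same fields as ddl_job
conn.executemany("insert into ddl_job values (?, ?, ?, ?)",
                 [(1, 10, 100, "{}"), (2, 10, 101, "{}")])
conn.commit()


def archive(job_id: int) -> None:
    """Move one finished task from the job task table to the history table."""
    with conn:   # commits both statements together, or rolls both back
        conn.execute(
            "insert into ddl_history select * from ddl_job where job_id = ?",
            (job_id,))
        conn.execute("delete from ddl_job where job_id = ?", (job_id,))


archive(1)
pending_ids = [r[0] for r in conn.execute("select job_id from ddl_job")]
history_ids = [r[0] for r in conn.execute("select job_id from ddl_history")]
```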
Referring to fig. 4, a schematic flowchart of a processing method of a DDL task according to another embodiment of the present application is exemplarily shown, and as shown, the processing method includes:
step S601, the target computing node starts to operate;
step S602, the target computing node selects a second DDL task from the job task table according to the task parallel processing rule;
step S603, a task execution unit is started to execute the logic of steps S603 to S608 concurrently, and it is judged whether the second DDL task is being started for the first time; if so, step S604 is entered, and if not, step S605 is entered;
step S604, initializing a second DDL task, updating a job task table, and then entering step S610;
step S605, judging whether the second DDL task is recovered, if so, entering step S606, otherwise, entering step S607;
step S606, restoring the previously stored execution progress of the second DDL task and continuing execution, and entering step S608 after the second DDL task is completed or an abnormal condition occurs;
step S607, rolling back the data generated by executing the second DDL task last time;
step S608, ending the execution of the second DDL task, and moving the related information of the second DDL task from the job task table to a task history table, wherein the task history table comprises fields in the job task table;
step S609, the computing node initiates a background task in parallel, continuously selects executable DDL tasks, if the DDL tasks exist, the step S602 is executed, and if the DDL tasks do not exist, the step S610 is executed;
step S610, judging whether the operation of the computing node is finished, if not, entering step S611, and if so, entering step S612;
step S611, sleeping for a period of time, for example 1 second, and then entering step S609;
step S612, the flow ends.
Referring to fig. 5, a schematic flowchart of a processing method of a DDL task according to another embodiment of the present application is exemplarily shown, and as shown in the drawing, the processing method includes:
S701, each physical DDL task has a main subtask, and the first subtask is used as the main subtask;
S702, judging whether a DDL task needs to be started; if so, entering step S703; if not, entering step S706;
S703, starting the DDL task;
S704, reading the prefer_rule field of the DDL task in the job task table, wherein prefer_rule sets the parallel distribution strategy used when the DDL task is executed in the cluster (for example, the number of task execution units running in parallel is set according to the computing resources of different computing stages, or different subtasks are placed on different computing nodes according to data position);
S705, dividing the data processing range of the subtasks according to the number of subtasks generated for the DDL task, wherein the number of subtasks is reduced accordingly if little data needs to be backfilled; initializing dist_para in the reorganization information table for each computing node, and setting exec_owner;
S706, each computing node acquires the current active executable DDL task according to the task parallel processing rule, checks whether the subtask needing to be executed by the computing node exists in the subtask, and if so, restores the context of the subtask;
s707, executing the subtasks, wherein the computing node starts the corresponding task executing node to execute the DDL task according to the self condition;
s708, each computing node starts a task execution node, whether execution of the DDL task is cancelled or not is checked, if yes, the step S712 is carried out, and if not, the step S709 is carried out;
s709, finishing data processing;
s710, updating the subtask state in the reorganization information table, wherein an optimistic transaction mode can be adopted, and because other subtasks update the same DDL task, the updating subtasks may fail, so that only the latest DDL task needs to be obtained again, only the subtasks related to the task are updated, and the task is submitted again until the updating is successful; a pessimistic transaction mode can also be adopted, and when the subtask state needs to be submitted each time, the latest DDL task is obtained first, the record is locked, and the transaction is submitted after the update is completed.
S711, judging whether the subtasks are completed, if not, entering step S708, and if so, entering step S715;
s712, judging whether the task is a main subtask, if not, going to step S715, and if so, going to step S713;
s713, judging whether all the subtasks are finished. Specifically, whether the subtasks are all finished is judged by judging the status field of each subtask. If not, go to step S714, if yes, go to step S716;
s714, sleeping for a period of time;
and S715, ending the subtasks.
And S716, finishing the DDL task.
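Steps S705 and S713 can be illustrated with a short sketch. It assumes, purely for illustration, that the data to be backfilled is a contiguous row range and that a subtask should cover at least `min_rows` rows; the `Subtask` type, the function names, and the threshold are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtask:
    index: int              # index 0 is treated as the main subtask (S701)
    start: int              # first row handled by this subtask (inclusive)
    end: int                # one past the last row handled (exclusive)
    status: str = "pending"


def split_into_subtasks(total_rows: int, max_subtasks: int, min_rows: int = 1000) -> List[Subtask]:
    """S705: divide the backfill range; use fewer subtasks when there is little data."""
    if total_rows <= 0:
        return []
    count = max(1, min(max_subtasks, total_rows // min_rows))
    base, extra = divmod(total_rows, count)
    subtasks, start = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first ranges
        subtasks.append(Subtask(i, start, start + size))
        start += size
    return subtasks


def all_subtasks_done(subtasks: List[Subtask]) -> bool:
    """S713: the main subtask checks the status field of every subtask."""
    return all(s.status == "done" for s in subtasks)
```

With these assumptions, a small backfill collapses to a single subtask, while a large one is spread evenly over up to `max_subtasks` ranges.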
It should be noted that, when execution of the DDL task needs to be cancelled, only the is_cancelled field needs to be set to true. Whenever any task updates a subtask state, it must first read the task record, and it can determine from this field whether the task has been cancelled. This also covers the abnormal exit of a subtask: the DDL task can likewise be stopped by setting the field, and error information is returned through the status of the subtask.
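The interaction between the cancellation field and the optimistic update of step S710 can be sketched as below. The in-memory `Store` class is a hypothetical stand-in for the reorganization information table, and the `version` counter is an assumption used here to detect conflicting writers; the application does not specify how conflicts are detected.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskRecord:
    version: int = 0                      # hypothetical counter for optimistic checks
    is_cancelled: bool = False
    subtask_status: Dict[int, str] = field(default_factory=dict)


class Store:
    """In-memory stand-in for the reorganization information table."""

    def __init__(self) -> None:
        self.record = TaskRecord()

    def read(self) -> TaskRecord:
        r = self.record
        return TaskRecord(r.version, r.is_cancelled, dict(r.subtask_status))

    def compare_and_write(self, expected_version: int, new: TaskRecord) -> bool:
        if self.record.version != expected_version:
            return False                  # another subtask committed first
        new.version = expected_version + 1
        self.record = new
        return True


def update_subtask_status(store: Store, subtask_id: int, status: str, max_retries: int = 10) -> bool:
    """Return True if the status was committed, False if the task was cancelled."""
    for _ in range(max_retries):
        latest = store.read()             # always re-read the latest task record
        if latest.is_cancelled:
            return False                  # stop: the DDL task has been cancelled
        latest.subtask_status[subtask_id] = status   # touch only this subtask
        if store.compare_and_write(latest.version, latest):
            return True
    raise RuntimeError("too many conflicting updates")
```

Because every attempt re-reads the record before writing, a cancellation set by any node is observed at the next status update without any extra signalling.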
For the abnormal case in which a computing node crashes and restarts, each task execution node of the computing node selects an inactive DDL task, checks the subtasks it was executing, and restores them. The computing nodes can be numbered, so that even if a computing node is restarted on another machine, as long as its number is unchanged it can still obtain its subtasks according to the number and then recover the corresponding subtasks to continue execution.
For the case in which a computing node cannot be started for a long time after a crash, the computing nodes are made aware of computing-node changes in the cluster. When the crashed node fails to rejoin the cluster within a period of time, the exec_owner of its subtasks in the reorganization information table is changed to another node in the cluster or to 0 (in which case any active node can obtain the modified DDL task and resume execution), and the task execution nodes of the other computing nodes then resume execution.
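The reassignment described for a node that stays down can be sketched as follows. The sketch assumes that each subtask row carries an owner field (called exec_owner here, as in step S705) and that the cluster keeps a last-seen timestamp per node number; the function name, the `last_seen` map, and the timeout parameter are hypothetical.

```python
from typing import Dict, List

ANY_NODE = 0  # per the text, 0 lets every active node pick up the subtask


def reassign_orphaned_subtasks(
    subtasks: List[Dict[str, int]],
    last_seen: Dict[int, float],
    now: float,
    timeout_seconds: float,
) -> List[int]:
    """Set exec_owner to ANY_NODE for subtasks whose owner has been absent too long.

    Returns the ids of the subtasks that were reassigned.
    """
    reassigned = []
    for sub in subtasks:
        owner = sub["exec_owner"]
        if owner == ANY_NODE:
            continue                      # already open to every active node
        seen = last_seen.get(owner)
        if seen is None or now - seen > timeout_seconds:
            sub["exec_owner"] = ANY_NODE  # release the subtask for recovery elsewhere
            reassigned.append(sub["id"])
    return reassigned
```

Releasing to 0 rather than to a specific node leaves the choice of the recovering node to whichever task execution node picks the subtask up first.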
The application has the following technical effects:
1, the scheme has simple implementation logic and strong extensibility;
2, the scheme can make full use of the overall resources of the cluster, automatically balance the task load, and has strong online scalability;
3, the method can adapt to the deployment of different database forms:
a) single-instance database deployment;
b) a cluster with a single master node, with cluster-wide distributed DDL task scheduling;
c) cluster distributed DDL task scheduling (multiple master nodes);
4, the method performs particularly well for fault recovery in a distributed system: it has neither the complexity of distributed deadlock detection found in lock-based schemes nor the performance bottleneck and single point of failure of a central single-point lock controller;
5, fault recovery is simple and efficient, and DDL task scheduling can return to normal with essentially no extra operations;
6, it is easy to implement specific distributed scheduling rules in this model and to schedule specific DDL tasks onto specific computing nodes, for example according to data location or the computing nodes available to each tenant.
An embodiment of the present application provides a target computing node in a database system, and as shown in fig. 6, the node may include: a job task table module 601, a task determination module 602, and a parallel processing module 603, wherein,
the job task table module 601 is configured to obtain a job task table, where the job task table is configured to record relevant information of unprocessed DDL tasks, and the relevant information includes the order in which the DDL tasks enter the database system and the data pattern in which the change object corresponding to each DDL task is located;
a task determining module 602, configured to determine a first DDL task being executed, and determine, from the job task table, a second DDL task that meets a predetermined task parallel processing rule, where the task parallel processing rule is related to an order in which the DDL task enters the database system and a data pattern in which a change object corresponding to the DDL task is located;
a parallel processing module 603, configured to process the second DDL task in parallel with the first DDL task.
The node of the embodiment of the present application may execute the method provided by the embodiment of the present application, and the implementation principle is similar, the actions executed by the modules in the node of the embodiments of the present application correspond to the steps in the method of the embodiments of the present application, and for the detailed functional description of the modules of the node, reference may be specifically made to the description in the corresponding method shown in the foregoing, and details are not repeated here.
The embodiment of the application provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the parallel processing method of DDL tasks. Compared with the related art, the following can be achieved: a job task table recording related information of at least one DDL task is acquired; the task parallel processing rule is implemented based on SQL statements; a second DDL task that can be processed in parallel with the DDL tasks currently being processed is determined from the job task table; and the second DDL task and the first DDL task are processed in parallel. The scheme therefore has simple implementation logic and strong extensibility, can adapt to the deployment of different database forms, and is not subject to the prior-art limitation that DDL tasks can only be selected from a queue in order.
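A task parallel processing rule expressed in SQL can be sketched as below, using Python's built-in sqlite3 module only as a convenient stand-in for the database system. The table names `job_task` and `job_history`, the column names, and the concrete rule are all assumptions made for illustration: the sketch treats an unprocessed task as eligible when no task that entered the system earlier targets the same data pattern (schema) and is still in the job task table. The application itself only states that the rule depends on entry order and on the data pattern of the change object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_task (job_id INTEGER PRIMARY KEY, schema_id INTEGER, processing INTEGER)"
)
conn.execute(
    "CREATE TABLE job_history (job_id INTEGER PRIMARY KEY, schema_id INTEGER, processing INTEGER)"
)
conn.executemany(
    "INSERT INTO job_task VALUES (?, ?, ?)",
    [(1, 10, 1),   # first DDL task, already running on schema 10
     (2, 10, 0),   # same schema as job 1, must wait
     (3, 20, 0),   # different schema, can run in parallel
     (4, 20, 0)],  # behind job 3 on schema 20
)

# Assumed rule: job_id reflects the order of entry into the database system.
ELIGIBLE_SQL = """
SELECT t.job_id FROM job_task AS t
WHERE t.processing = 0
  AND NOT EXISTS (
        SELECT 1 FROM job_task AS earlier
        WHERE earlier.schema_id = t.schema_id AND earlier.job_id < t.job_id)
ORDER BY t.job_id
"""


def eligible_job_ids(connection):
    """Return the ids of tasks that may start in parallel with the running ones."""
    return [row[0] for row in connection.execute(ELIGIBLE_SQL)]


def finish_job(connection, job_id):
    """Move a finished task from the job task table to the history table."""
    with connection:  # both statements commit together
        connection.execute(
            "INSERT INTO job_history SELECT * FROM job_task WHERE job_id = ?", (job_id,)
        )
        connection.execute("DELETE FROM job_task WHERE job_id = ?", (job_id,))
```

Under these assumptions, job 3 is the only task that can start while job 1 is running; once job 1 is moved to the history table, job 2 becomes eligible as well. Changing the scheduling policy then amounts to changing one SQL statement.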
In an alternative embodiment, an electronic device is provided. As shown in fig. 7, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. In practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 4001 may also be a combination that performs a computing function, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 4003 may be, without limitation, a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 4003 is used for storing computer programs for executing the embodiments of the present application, and execution is controlled by the processor 4001. The processor 4001 is used to execute computer programs stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, and when being executed by a processor, the computer program may implement the steps and corresponding contents of the foregoing method embodiments.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in the present application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of the present application are also within the protection scope of the embodiments of the present application without departing from the technical idea of the present application.

Claims (13)

1. A method for parallel processing of data definition language, DDL, tasks, performed by a target compute node in a database system, the method comprising:
acquiring a job task table, wherein the job task table is used for recording relevant information of an unprocessed DDL task, and the relevant information comprises the sequence of the DDL task entering a database system and a data mode of a change object corresponding to the DDL task;
determining a first DDL task which is being executed, and determining a second DDL task which accords with a predetermined task parallel processing rule from the job task table, wherein the task parallel processing rule is related to the sequence of the DDL task entering a database system and the data mode of a change object corresponding to the DDL task;
and processing the target second DDL task and the first DDL task in parallel.
2. The method of claim 1, wherein the related information comprises a compute node identifier indicating a compute node currently executing a corresponding DDL task, and wherein the job task table comprises a first field for recording the compute node identifier;
the processing the second DDL task in parallel with the first DDL task comprises:
if the second DDL task is determined to be executed for the first time according to the first field of the second DDL task in the job task table, recording the target computing node identification of the target computing node in the first field;
and calling a task execution unit in the database system to execute the second DDL task, wherein the type of the task execution unit comprises at least one of a process, a thread and a coroutine.
3. The method according to claim 2, wherein the related information includes object identification of a change object corresponding to the DDL task and meta information of the DDL task;
the job task table comprises a second field for recording the object identifier, a third field for recording the mode identifier of the data mode in which the changed object corresponding to the DDL task is positioned, and a fourth field for recording the meta information;
calling a task execution unit in the database system to execute the second DDL task, wherein the step of calling the task execution unit in the database system to execute the second DDL task comprises the following steps:
calling the task execution unit to read the second field to the fourth field of the second DDL task in the job task table, and obtaining a target object identifier of a target change object corresponding to the second DDL task, a target mode identifier of a target data mode in which the target change object is located, and target meta-information of the second DDL task;
determining the target change object from the target data mode of the database system according to the target object identifier and the target mode identifier;
and executing the second DDL task on the target change object according to the target meta-information.
4. The method of claim 2, wherein the processing the second DDL task in parallel with the first DDL task further comprises:
if it is determined, according to the first field of the second DDL task in the job task table, that the second DDL task is not being executed for the first time, judging whether to resume the execution of the second DDL task;
and if the execution of the second DDL task is determined to be recovered, executing the second DDL task, and if the execution of the second DDL task is determined not to be recovered, rolling back the data generated by executing the second DDL task last time.
5. The method according to claim 1, wherein the related information comprises a type identifier for indicating whether the corresponding DDL task is a logical DDL task, and the job task table comprises a fifth field for recording the type identifier;
the processing the second DDL task in parallel with the first DDL task comprises:
if the second DDL task is determined not to be a logical DDL task according to the fifth field of the second DDL task in the job task table, splitting the second DDL task into a plurality of subtasks;
and calling task execution units with corresponding quantity to execute the subtasks in parallel according to the quantity of the subtasks.
6. The method of claim 5, wherein invoking a corresponding number of task execution units to execute a subtask further comprises:
creating a reorganization information table of the second DDL task;
recording the meta-information of the subtasks in the reorganization information table;
the calling of the task execution units of the corresponding number to execute the subtasks includes:
and calling the task execution unit, and executing the corresponding subtask according to the meta-information of the corresponding subtask in the reorganization information table.
7. The method of claim 6, wherein the meta-information of the subtasks includes at least one of data locations where the subtasks start scanning and end scanning, node identification of the target computing node, and execution status of the subtasks.
8. The method of claim 1, wherein the task parallel processing rules are implemented based on Structured Query Language (SQL) statements;
the determining, from the job task table, a second DDL task that meets a predetermined task parallel processing rule includes:
and selecting a reference DDL task from the job task table, writing a task identifier and a mode identifier of the reference DDL task into the task parallel processing rule, and taking the reference DDL task as the target DDL task if the task parallel processing rule is determined to meet a preset condition.
9. The method of any of claims 1-8, wherein the processing the second DDL task in parallel with the first DDL task further comprises:
and after the second DDL task is processed, the related information of the second DDL task is moved from the job task table to a task history table, wherein the task history table comprises fields in the job task table.
10. The method of claim 1, wherein processing the second DDL task in parallel with the first DDL task further comprises:
configuring an execution policy of the second DDL task, the execution policy comprising: information of a target task execution unit for executing the first DDL task.
11. A target compute node in a database system, comprising:
the job task table module is used for obtaining a job task table, wherein the job task table is used for recording relevant information of unprocessed DDL tasks, and the relevant information comprises the sequence of the DDL tasks entering the database system and a data mode of a change object corresponding to the DDL tasks;
the task determining module is used for determining a first DDL task which is being executed, and determining a second DDL task which accords with a predetermined task parallel processing rule from the job task table, wherein the task parallel processing rule is related to the sequence of the DDL task entering the database system and the data mode of a change object corresponding to the DDL task;
and the parallel processing module is used for processing the second DDL task and the first DDL task in parallel.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the parallel processing method of any of claims 1-10.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the parallel processing method according to any one of claims 1 to 10.
CN202211268650.5A 2022-10-17 2022-10-17 DDL task parallel processing method, computing node and electronic equipment Pending CN115687378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211268650.5A CN115687378A (en) 2022-10-17 2022-10-17 DDL task parallel processing method, computing node and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211268650.5A CN115687378A (en) 2022-10-17 2022-10-17 DDL task parallel processing method, computing node and electronic equipment

Publications (1)

Publication Number Publication Date
CN115687378A true CN115687378A (en) 2023-02-03

Family

ID=85066210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211268650.5A Pending CN115687378A (en) 2022-10-17 2022-10-17 DDL task parallel processing method, computing node and electronic equipment

Country Status (1)

Country Link
CN (1) CN115687378A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573730A (en) * 2024-01-16 2024-02-20 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, readable storage medium, and program product
CN117573730B (en) * 2024-01-16 2024-04-05 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, readable storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination