WO2022262240A1 - Data processing method, electronic device, and storage medium - Google Patents

Data processing method, electronic device, and storage medium

Info

Publication number
WO2022262240A1
Authority
WO
WIPO (PCT)
Prior art keywords
script
task
identifier
data table
time
Prior art date
Application number
PCT/CN2021/140176
Other languages
English (en)
French (fr)
Inventor
邹宇
赵学亮
曾广锐
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022262240A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/74 Reverse engineering; Extracting design information from source code

Definitions

  • the present application relates to the field of computer technology, in particular to a data processing method, electronic equipment and storage media.
  • embodiments of the present application provide a data processing method, electronic equipment, and a storage medium.
  • the embodiment of the present application provides a data processing method, including:
  • the first task script and the second task script are different task scripts among the at least two task scripts; each adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection; each directed edge in the directed edge set represents the dependency between two task scripts.
  • the embodiment of the present application also provides an electronic device, including:
  • the extraction unit is configured to extract the first text described in Structured Query Language corresponding to each task script from the source code of each task script in the received at least two task scripts;
  • the first determination unit is configured to determine the input items and output items corresponding to each task script from the extracted abstract syntax tree corresponding to each first text;
  • the second determination unit is configured to determine the adjacency relationship set corresponding to the first task script based on the intersection between the input items corresponding to the first task script and the output items corresponding to each second task script in at least one second task script;
  • the output unit is configured to determine at least one directed edge set corresponding to the at least two task scripts based on the determined adjacency relationship set corresponding to each first task script, and to output the directed acyclic graph corresponding to each directed edge set in the at least one directed edge set;
  • the first task script and the second task script are different task scripts among the at least two task scripts; each adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection; each directed edge in the directed edge set represents the dependency between two task scripts.
  • An embodiment of the present application also provides an electronic device, including a processor and a memory configured to store a computer program that can run on the processor, wherein the processor is configured to execute the steps of the above data processing method when running the computer program.
  • the embodiment of the present application also provides a storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above data processing method are realized.
  • the input items and output items corresponding to each task script are determined; the adjacency relationship set corresponding to each first task script is determined based on the intersection between the input items of the first task script and the output items of the second task script; and, based on the determined adjacency relationship sets, at least one directed edge set corresponding to the batch of task scripts is determined and the directed acyclic graph corresponding to each directed edge set is output.
  • the dependencies between task scripts can be determined through the source code of the task scripts, without manual configuration of the dependencies between task scripts, which improves the efficiency of determining the dependencies between task scripts and reduces the error rate.
  • based on the directed acyclic graph, the electronic device can accurately determine the execution order of the batch of task scripts.
  • FIG. 1 is a schematic diagram of the implementation flow of the data processing method provided by the embodiment of the present application.
  • FIG. 2 is a schematic diagram of an implementation flow for determining an adjacency set in the data processing method provided in the embodiment of the present application;
  • FIG. 3 is a schematic diagram of a directed acyclic graph provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of determining the execution sequence of task scripts in the data processing method provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of an implementation flow for determining a directed acyclic graph in the data processing method provided in the embodiment of the present application;
  • FIG. 6 is a schematic diagram of an implementation flow of updating a dependent task set and a directed edge set in the data processing method provided by the embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a hardware composition structure of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an implementation flow of a data processing method provided by an embodiment of the present application, wherein the subject of execution of the flow is an electronic device such as a terminal device or a server.
  • the data processing method includes:
  • Step 101: From the source code of each task script in the at least two received task scripts, extract the first text, described in Structured Query Language, corresponding to each task script.
  • a batch of task scripts consists of at least two task scripts.
  • the electronic device converts the source code of each task script into a first text described in Structured Query Language (SQL).
  • the implementation process is as follows:
  • the electronic device judges whether each task script is written in SQL, and obtains a first judgment result.
  • if the file suffix of the task script is sql, the task script is a task script written in SQL.
  • when the first judgment result indicates that the first task script is written in SQL, the source code included in the first task script is read to obtain the first text corresponding to the first task script.
  • the first task script is any one of the at least two task scripts.
  • when the first judgment result indicates that the first task script is not written in SQL, the SQL text content is extracted from the source code included in the first task script through a set regular expression to obtain the first text; the set regular expression is used to extract the SQL text content.
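The extraction in Step 101 can be sketched as follows. This is a minimal illustration only: the patent does not disclose the regular expression, so the `SQL_BLOCK` pattern (SQL embedded as a triple-quoted string), the `SQL_START` keyword check, and the function name `extract_first_text` are all assumptions.

```python
import re

# Hypothetical pattern: SQL embedded as a triple-quoted string in a non-SQL script.
SQL_BLOCK = re.compile(r'"""(.*?)"""', re.DOTALL)
# Hypothetical heuristic for deciding that an embedded string is SQL.
SQL_START = re.compile(r"^\s*(select|insert|create|with)\b", re.IGNORECASE)


def extract_first_text(file_name: str, source: str) -> str:
    """Return the SQL text (the "first text") for one task script."""
    if file_name.lower().endswith(".sql"):
        # Script written in SQL: the source code itself is the first text.
        return source
    # Otherwise pull out embedded SQL with the set regular expression.
    blocks = [m.group(1).strip() for m in SQL_BLOCK.finditer(source)]
    return ";\n".join(b for b in blocks if SQL_START.match(b))
```

A script in another language would need its own pattern; the point is only that the branch is chosen by the file suffix.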
  • Step 102: Determine the input items and output items corresponding to each task script from the abstract syntax tree corresponding to each extracted first text.
  • the electronic device divides the content of the first text corresponding to each task script into multiple independent SQL statements according to a set separator; regularizes the format of the SQL statements obtained from each first text to obtain the corresponding regularized SQL statements, for example by using set characters to replace interfering characters such as format placeholders in the SQL statements; and converts the regularized SQL statements corresponding to each task script into an abstract syntax tree (AST), so as to obtain the abstract syntax tree corresponding to each task script.
  • AST is an abstract representation of the grammatical structure of the source code. The AST expresses the grammatical structure of the programming language in the form of a tree, and each node on the tree represents a grammatical structure in the source code.
  • the set separator can be a semicolon (;).
  • after determining the SQL statements, the electronic device filters out the SQL statements without data changes, so as to obtain the SQL statements with data changes, and updates the abstract syntax tree corresponding to the task script based on the SQL statements with data changes. A statement with data changes is one in which the input items and/or output items change.
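The splitting, regularizing, and filtering described above might look like the sketch below. The placeholder styles in `PLACEHOLDER`, the filler character, and the rule that only `INSERT` statements count as statements with data changes are assumptions made for illustration; the patent does not specify them.

```python
import re

# Hypothetical placeholder styles treated as interfering characters.
PLACEHOLDER = re.compile(r"\$\{\w+\}|%\(\w+\)s|%s")
# Assumption: only INSERT statements are treated as statements with data changes.
CHANGES_DATA = re.compile(r"^\s*insert\b", re.IGNORECASE)


def split_statements(first_text: str, separator: str = ";") -> list[str]:
    """Split the first text into independent SQL statements on the set separator."""
    return [s.strip() for s in first_text.split(separator) if s.strip()]


def regularize(statement: str, filler: str = "x") -> str:
    """Replace format placeholders with a set character so the statement parses."""
    return PLACEHOLDER.sub(filler, statement)


def statements_with_data_changes(first_text: str) -> list[str]:
    """Keep only the regularized statements that change data."""
    return [regularize(s) for s in split_statements(first_text) if CHANGES_DATA.match(s)]
```

A plain `split` breaks on semicolons inside string literals, so a production version would need a tokenizer-aware splitter.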
  • the electronic device identifies the first statement and the second statement from the abstract syntax tree corresponding to the task script, determines the input items corresponding to the task script from the identified first statement, and determines the output items corresponding to the task script from the identified second statement.
  • the input items of the task script represent the input parameters corresponding to the functions involved in the task script; the output items of the task script represent the output parameters corresponding to the functions involved in the task script.
  • the first statement is a query statement (Select Statement), and the second statement is an insert statement (Insert Statement).
  • the task script is used to perform database operations such as data query and data update; both input items and output items are data tables. That is to say, an input item is a data table included in the query statement, and an output item is a data table included in the insert statement.
  • the input items corresponding to the task script are data tables T1 and T2, and the output items are data table T3.
  • the SQL text content can be extracted from the corresponding task script in different ways according to the different programming language types of the task script, and the abstract syntax tree corresponding to the corresponding task script can be determined based on the extracted SQL text content. And based on the first statement and the second statement included in the abstract syntax tree corresponding to the task script, the input items and output items of the task script are respectively determined. Thus, the input items and output items corresponding to each task script can be extracted completely and accurately.
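To make the input/output determination concrete, the sketch below collects table names from query and insert clauses. It deliberately swaps the abstract syntax tree walk described in the text for two regular expressions, which is a simplification and not the patent's method; the patterns and the function name `input_and_output_items` are assumptions.

```python
import re

# Simplified stand-in for walking the AST: tables after INSERT INTO/OVERWRITE
# are output items, tables after FROM/JOIN are input items.
OUTPUT_TABLE = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
INPUT_TABLE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def input_and_output_items(statements):
    """Return (input tables, output tables) for the statements of one task script."""
    inputs, outputs = set(), set()
    for statement in statements:
        outputs.update(OUTPUT_TABLE.findall(statement))
        inputs.update(INPUT_TABLE.findall(statement))
    return inputs, outputs


sql = "INSERT INTO T3 SELECT a.id FROM T1 a JOIN T2 b ON a.id = b.id"
inputs, outputs = input_and_output_items([sql])
```

For this statement the input items are T1 and T2 and the output item is T3, matching the example in the text. Subqueries, CTEs, and quoted identifiers are exactly the cases where a real parser is needed instead.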
  • Step 103: Based on the intersection between the input items corresponding to the first task script and the output items corresponding to each second task script in at least one second task script, determine the adjacency relationship set corresponding to the first task script.
  • the first task script is different from the second task script, and each of them may be any task script among the at least two received task scripts.
  • each adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection.
  • the electronic device can compare the input items of any task script Jx with the output items of each task script Jy other than Jx, so as to judge whether the input items of Jx and the output items of Jy intersect, and obtain a judgment result.
  • when the judgment result indicates that the input items of task script Jx and the output items of task script Jy intersect, one adjacency relationship corresponding to task script Jx is determined based on the intersection.
  • in this way, the electronic device can determine all adjacency relationships corresponding to task script Jx, and obtain the adjacency relationship set corresponding to task script Jx.
  • the format of an adjacency relationship may be: script identifier of the adjacent dependent task of the first task script:corresponding intersection.
  • the adjacency relationship set corresponding to task script J1 is {J2:T1, J2:T2, J4:T1}.
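The intersection test of Step 103 can be sketched with plain set operations. The function name `adjacency_set` and the `items` mapping are illustrative assumptions; the three scripts and their tables are example values chosen so that the result for J1 reproduces the set given above.

```python
def adjacency_set(script_id, items):
    """Build the adjacency relationship set of one task script.

    `items` maps a script identifier to (input tables, output tables).
    Each relationship is formatted as "<adjacent dependent task>:<table>".
    """
    inputs, _ = items[script_id]
    relations = set()
    for other_id, (_, other_outputs) in items.items():
        if other_id == script_id:
            continue  # a script is never compared with itself
        for table in inputs & other_outputs:
            relations.add(f"{other_id}:{table}")
    return relations


# Example values only (assumed for illustration).
items = {
    "J1": ({"T1", "T2"}, {"T3", "T4"}),
    "J2": ({"T5", "T6"}, {"T1", "T2"}),
    "J4": ({"T9", "T10"}, {"T1", "T6"}),
}
j1_relations = adjacency_set("J1", items)
```

Here J1 reads T1 and T2, both written by J2, and T1 is also written by J4, which yields one relationship per shared table.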
  • a first data table can be used to store the input items and output items corresponding to each task script, and the adjacency relationships corresponding to each task script are stored in a second data table.
  • in some embodiments, determining the adjacency relationship set corresponding to the first task script based on the intersection between the input items corresponding to the first task script and the output items corresponding to each second task script in at least one second task script includes:
  • Step 201: Write the script identifier of each task script, the corresponding input item set, and the corresponding output item set into the first data table in association;
  • Step 202: Based on the intersection between the input item set corresponding to the first task script and the output item set corresponding to the second task script in the first data table, determine the adjacency relationship corresponding to the first task script;
  • Step 203: Write the script identifier of the first task script and the determined adjacency relationship into a second data table in association; wherein the second data table is used to store script identifiers and adjacency relationship sets in association.
  • once the input items and output items corresponding to each task script have been determined, the electronic device writes the script identifier of each task script, the corresponding input item set, and the corresponding output item set into the first data table in association.
  • the first data table includes at least a script identifier, and the input item set and output item set corresponding to the script identifier.
  • the first data table further includes a first time indicating when the input item set was changed and a second time indicating when the output item set was changed.
  • the first data table is as follows:
  • the input item set of task script J1 consists of T1 and T2, and its output item set consists of T3 and T4; the input item set of task script J2 consists of T5 and T6, and its output item set consists of T1 and T2; the input item set of task script J3 consists of T8, and its output item set consists of T5 and T10; the input item set of task script J4 consists of T9 and T10, and its output item set consists of T1 and T6; the input item set of task script J5 is an empty set, and its output item set consists of T2 and T9.
  • each time the electronic device determines an adjacency relationship it writes the determined adjacency relationship into the second data table.
  • the second data table is used for associatively storing the script ID and the adjacency relationship set.
  • the second data table includes script identifiers and the corresponding adjacency relationship sets. When the input items and/or output items of a task script change, the adjacency relationships corresponding to the task script need to be updated synchronously. Updating the adjacency relationships may in turn change the adjacent dependent tasks of the task script, and when the adjacent dependent tasks change, the corresponding DAG needs to be updated. Therefore, to make it easy to determine whether the corresponding DAG needs to be updated, the second data table also includes a third time indicating when the adjacency relationships were changed and a fourth time indicating when the adjacent dependent tasks were changed. In one embodiment, the second data table is as follows:
  • the adjacency relationship set corresponding to J1 in the second data table consists of three adjacency relationships.
  • the third time corresponds to the last change time of the adjacency relationship set, and the fourth time corresponds to the last change time of the adjacent dependent tasks.
  • the method further includes at least one of the following:
  • when the electronic device receives a delete instruction for a third task script, it deletes from the first data table the script identifier of the third task script and the corresponding input item set, output item set, first time, and second time; and it deletes from the second data table the script identifier of the third task script, the corresponding adjacency relationship set, third time, and fourth time, as well as every adjacency relationship that includes the script identifier of the third task script.
  • for example, if the script identifier of the third task script is J3, then J3, {T8}, {T5, T10}, and the first time and second time corresponding to J3 are deleted from the first data table; J3, {J3:T10, J5:T9}, and the third time and fourth time corresponding to J3 are deleted from the second data table, and J3:T5 and J3:T10 are also deleted from the second data table.
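The delete handling can be sketched with two dictionaries standing in for the data tables. The dictionary layout, the `"rel"` key, and the function name `delete_script` are assumptions; the patent does not describe how the tables are stored.

```python
def delete_script(script_id, first_table, second_table):
    """Remove one task script from both tables.

    first_table:  script id -> row with input/output item sets and times
    second_table: script id -> row whose "rel" entry is the adjacency relationship set
    """
    # Drop the script's own rows (sets and times go with the row).
    first_table.pop(script_id, None)
    second_table.pop(script_id, None)
    # Drop every adjacency relationship that names the deleted script.
    prefix = f"{script_id}:"
    for row in second_table.values():
        row["rel"] = {rel for rel in row["rel"] if not rel.startswith(prefix)}


# Example values only (assumed for illustration).
first_table = {
    "J2": {"in": {"T5", "T6"}, "out": {"T1", "T2"}},
    "J3": {"in": {"T8"}, "out": {"T5", "T10"}},
}
second_table = {
    "J2": {"rel": {"J3:T5", "J4:T6"}},
    "J3": {"rel": set()},
}
delete_script("J3", first_table, second_table)
```

After the call, J3 has no row in either table and J2 keeps only the relationship J4:T6.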
  • the electronic device determines all input items and all output items corresponding to the third task script according to the above method, and writes the script identifier of the third task script, all input items, and all output items into the first data table; it takes the time at which the last input item corresponding to the third task script was written as the corresponding first time and writes the first time into the first data table, and takes the time at which the last output item corresponding to the third task script was written as the corresponding second time and writes the second time into the first data table.
  • the electronic device determines the adjacency relationships corresponding to the third task script based on the above method, and writes all adjacency relationships corresponding to the script identifier of the third task script into the second data table; it takes the time at which the last adjacency relationship corresponding to the third task script was written as the corresponding third time and the corresponding fourth time, and writes the third time and the fourth time into the second data table.
  • the method further includes at least one of the following:
  • updating at least one of the input item set, the output item set, the first time, and the second time corresponding to the first task script in the first data table;
  • in the case that the third time corresponding to the first task script is earlier than the corresponding first time, updating at least one of the adjacency relationship set and the fourth time corresponding to the first task script in the second data table, based on the intersection between the corresponding updated input item set and the output item sets corresponding to task scripts other than the first task script;
  • in the case that the third time corresponding to the first task script is earlier than the corresponding second time, updating at least one of the adjacency relationship set and the fourth time corresponding to the adjacent dependent task in the second data table, based on the intersection between the corresponding updated output item set and the input item set corresponding to the adjacent dependent task of the first task script;
  • updating the third time corresponding to the first task script to the maximum of the corresponding first time and the corresponding second time;
  • updating the fourth time when the adjacent dependent tasks in the corresponding updated adjacency relationship set are changed.
  • the input item set is the set associated with the first time, and the output item set is the set associated with the second time.
  • the electronic device judges whether the corresponding input items in the first data table have changed by comparing the third time corresponding to the first task script with the corresponding first time, and judges whether the corresponding output items in the first data table have changed by comparing the third time corresponding to the first task script with the corresponding second time.
  • if the third time corresponding to the first task script is earlier than the corresponding first time, the corresponding input items in the first data table have changed; if the third time is equal to or later than the corresponding first time, the corresponding input items have not changed. Likewise, if the third time corresponding to the first task script is earlier than the corresponding second time, the corresponding output items in the first data table have changed; if the third time is equal to or later than the corresponding second time, the corresponding output items have not changed.
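The timestamp comparison above reduces to two inequalities, sketched below. The function name `detect_changes` and the use of `datetime` values are assumptions; any totally ordered timestamp type would work the same way.

```python
from datetime import datetime


def detect_changes(first_time, second_time, third_time):
    """Return (inputs_changed, outputs_changed) for one task script.

    first_time:  last change of the input item set
    second_time: last change of the output item set
    third_time:  last change of the adjacency relationship set
    """
    inputs_changed = third_time < first_time    # adjacency is older than the inputs
    outputs_changed = third_time < second_time  # adjacency is older than the outputs
    return inputs_changed, outputs_changed


# Inputs were modified after the adjacency set was last refreshed; outputs were not.
result = detect_changes(
    first_time=datetime(2021, 6, 2),
    second_time=datetime(2021, 6, 1),
    third_time=datetime(2021, 6, 1),
)
```

Equal timestamps count as unchanged, which is why strict comparison is used.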
  • when the electronic device detects that the source code of the first task script among the at least two task scripts has changed, it determines the input item set and output item set corresponding to the changed first task script according to the above method; it compares the determined input item set with the input item set corresponding to the first task script in the first data table, so as to determine whether the input items of the changed first task script have changed; and it compares the determined output item set with the output item set corresponding to the first task script in the first data table, so as to determine whether the output items of the changed first task script have changed.
  • when the input items and/or output items corresponding to the first task script have changed, at least one of the following is executed:
  • the input item set and the first time corresponding to the script identifier of the first task script in the first data table are updated.
  • the set of output items corresponding to the script identifier of the first task script in the first data table and the second time are updated.
  • according to the above method, the electronic device determines the adjacency relationship set corresponding to the updated first task script based on the intersection between the corresponding updated input item set and the output item sets corresponding to task scripts other than the first task script; it then compares the determined adjacency relationship set with the adjacency relationship set corresponding to the first task script in the second data table, and if the two differ, the adjacency relationship set corresponding to the first task script has changed.
  • the adjacency set corresponding to the first task script in the second data table is replaced with the determined adjacency set, and the third time corresponding to the first task script is updated; And based on the two adjacency sets corresponding to the first task script, it is judged whether the adjacent dependent task corresponding to the first task script has changed, and if the adjacent dependent task corresponding to the first task script has changed, update the second data table The fourth time corresponding to the first task script. Wherein, if the adjacency relationship in the adjacency relationship set corresponding to the first task script has not changed, the third time corresponding to the first task script in the second data table does not need to be updated. In the case that the adjacent dependent task corresponding to the first task script does not change, the fourth time corresponding to the first task script in the second data table does not need to be updated.
  • according to the above method, the electronic device determines whether the corresponding updated output item set intersects the input item set corresponding to each task script other than the first task script in the first data table. If the updated output item set corresponding to the first task script intersects the input item set corresponding to any task script, that task script is adjacently dependent on the first task script, that is, the first task script is an adjacent dependent task of that task script. In this case, the adjacency relationship formed by the first task script and the intersection is added to the adjacency relationship set corresponding to that task script, and when the adjacent dependent tasks corresponding to that task script change, the fourth time corresponding to that task script in the second data table is updated.
  • the third time corresponding to the first task script in the second data table is updated to the corresponding first time;
  • the third time corresponding to the first task script in the second data table is updated to the corresponding second time;
  • Jx.rel_upate_time represents the third time (rel_upate_time) corresponding to Jx, Jx.in_update_time represents the first time corresponding to Jx, and Jx.out_update_time represents the second time corresponding to Jx.
  • when Jx.rel_upate_time < Jx.in_update_time, the input item set (Cin) corresponding to Jx in the first data table has changed.
  • when the update of Jx.rel causes the adjacent dependent tasks of Jx to change, the fourth time corresponding to Jx is replaced with Jx.in_update_time. For example, when the adjacency relationship set corresponding to Jx changes from {J2:T1, J2:T2, J4:T1} to {J2:T1, J2:T2}, the adjacent dependent tasks corresponding to Jx have changed; when the adjacency relationship set corresponding to Jx changes from {J2:T1, J2:T2, J4:T1} to {J2:T1, J4:T1}, the adjacent dependent tasks corresponding to Jx have not changed.
  • Jx.rel_upate_time = MAX(Jx.in_update_time, Jx.out_update_time).
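The difference between a changed adjacency relationship set and changed adjacent dependent tasks can be sketched as below, together with the maximum rule for the third time. The function names are assumptions, and plain integers stand in for timestamps.

```python
def dependent_tasks(adjacency):
    """Script identifiers of the adjacent dependent tasks named in a relationship set."""
    return {relation.split(":", 1)[0] for relation in adjacency}


def dependent_tasks_changed(old_adjacency, new_adjacency):
    """True only if the set of adjacent dependent tasks differs, not merely the tables."""
    return dependent_tasks(old_adjacency) != dependent_tasks(new_adjacency)


def new_third_time(in_update_time, out_update_time):
    """Third time after a refresh: the later of the first and second times."""
    return max(in_update_time, out_update_time)


old = {"J2:T1", "J2:T2", "J4:T1"}
drops_j4 = dependent_tasks_changed(old, {"J2:T1", "J2:T2"})    # J4 disappears
keeps_both = dependent_tasks_changed(old, {"J2:T1", "J4:T1"})  # only a table disappears
```

Only the first case would require the fourth time, and therefore the DAG, to be refreshed.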
  • Step 104: Based on the determined adjacency relationship set corresponding to each first task script, determine at least one directed edge set corresponding to the at least two task scripts, and output the directed acyclic graph corresponding to each directed edge set in the at least one directed edge set.
  • the directed edges in the set of directed edges represent the dependency relationship between every two task scripts.
  • based on the adjacency relationships in the determined adjacency relationship set corresponding to each first task script, the electronic device determines a group of task scripts that have adjacent dependencies on one another, and obtains a first dependent task set; based on each adjacency relationship in the adjacency relationship set corresponding to each script identifier in the first dependent task set, it determines the corresponding directed edge, thereby obtaining a first directed edge set composed of the determined directed edges; it then takes the script identifier whose in-degree is zero in the first directed edge set as the starting point of the corresponding directed acyclic graph, connects the script identifiers in a directed manner according to the directed edges included in the first directed edge set to obtain the directed acyclic graph corresponding to the first directed edge set, and outputs that directed acyclic graph.
  • the first directed edge set represents every two task scripts with adjacent dependencies in the corresponding first dependent task set
  • An adjacency relationship determines a directed edge, and a directed edge is directed from a script identifier in the adjacency relationship (the script identifier of the adjacent dependent task) to the script identifier of the corresponding first task script.
  • the directed edges determined by the adjacency relationship set corresponding to J1 include <J2, J1> and <J4, J1>.
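Deriving directed edges from an adjacency relationship set is a small mapping, sketched here. The function name `directed_edges` and the tuple representation of an edge are assumptions made for illustration.

```python
def directed_edges(script_id, adjacency):
    """One edge per adjacent dependent task, pointing from that task to the script.

    Several relationships that name the same task (for example J2:T1 and J2:T2)
    collapse into a single edge.
    """
    return {(relation.split(":", 1)[0], script_id) for relation in adjacency}


j1_edges = directed_edges("J1", {"J2:T1", "J2:T2", "J4:T1"})
```

The result contains the two edges named in the text, (J2, J1) and (J4, J1), because the two relationships that involve J2 produce the same edge.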
  • the first dependent task set includes a first script identifier, a second script identifier, and a third script identifier; the first script identifier is the script identifier of the first task script in the second data table, the second script identifier is a script identifier included in the adjacency relationship set corresponding to the first script identifier, and the third script identifier is a script identifier included in the adjacency relationship set corresponding to the second script identifier.
  • the first set of directed edges includes a first subset, a second subset and a third subset.
  • the first subset is a directed edge set determined by the adjacency relationship in the adjacency relationship set corresponding to the first script identifier; the second subset is determined by the adjacency relationship in the adjacency relationship set corresponding to the second script identifier A set of directed edges; the third subset is a set of directed edges determined by the adjacency relationship in the adjacency relationship set corresponding to the corresponding third script identifier.
  • the number of directed acyclic graphs is the same as the number of first directed edge sets after deduplication.
  • a directed edge represents the dependency between two task scripts.
  • for example, the first dependent task set is {J1, J2, J3, J4, J5},
  • the first directed edge set is {<J2,J1>, <J4,J1>, <J3,J2>, <J4,J2>, <J5,J3>, <J3,J4>, <J5,J4>},
  • the first subset of the first directed edge set is {<J2,J1>, <J4,J1>},
  • the second subset is {<J3,J2>, <J4,J2>},
  • and the third subset is {<J5,J3>, <J3,J4>, <J5,J4>}.
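One way to collect a dependent task set and its directed edge set is to walk the adjacency data outward from a starting script, as sketched below. The `adjacent` mapping lists only script identifiers and is derived from the edge set in the example above; the function name `collect` and the full transitive traversal are assumptions, since the text describes the set in terms of first, second, and third script identifiers.

```python
def collect(start, adjacent):
    """Walk adjacent dependent tasks from `start`.

    `adjacent` maps a script id to the ids of its adjacent dependent tasks.
    Returns (dependent task set, directed edge set), with edges pointing
    from the dependency to the dependent script.
    """
    tasks, edges, stack = set(), set(), [start]
    while stack:
        current = stack.pop()
        if current in tasks:
            continue  # already expanded
        tasks.add(current)
        for dependency in adjacent.get(current, ()):
            edges.add((dependency, current))
            stack.append(dependency)
    return tasks, edges


# Adjacent dependent tasks implied by the example edge set.
adjacent = {
    "J1": {"J2", "J4"},
    "J2": {"J3", "J4"},
    "J3": {"J5"},
    "J4": {"J3", "J5"},
    "J5": set(),
}
tasks, edges = collect("J1", adjacent)
```

Starting from J1, the walk reaches all five scripts and reproduces the seven directed edges listed above.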
  • the corresponding directed acyclic graph output by the electronic device is shown in FIG. 3.
  • the method further includes:
  • an execution order of task scripts corresponding to the at least two task scripts is determined, and the at least two task scripts are executed according to the determined execution order.
  • the electronic device determines the execution sequence of the task scripts in each DAG based on the directed edges between every two task scripts in each DAG; and based on the determined execution sequence, Execute batch task scripts.
  • the electronic device may execute task scripts corresponding to different DAGs in parallel.
  • the electronic device determines the execution order of the task scripts corresponding to the directed acyclic graph in the following manner:
  • a schematic diagram of determining the execution order of task scripts is shown in FIG. 4, and the execution order of task scripts corresponding to the directed acyclic graph is .
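An execution order consistent with a directed acyclic graph can be obtained with a topological sort. The sketch below uses Kahn's algorithm, which is an assumption: the patent's own procedure for FIG. 4 is not reproduced in this text. The function name `execution_order` is likewise illustrative; the edges are those of the example directed edge set.

```python
from collections import defaultdict, deque


def execution_order(nodes, edges):
    """Topologically sort task scripts; each edge is (dependency, dependent)."""
    indegree = {node: 0 for node in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    # Scripts with in-degree zero are the starting points of the graph.
    ready = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("dependency cycle detected")
    return order


edges = [("J2", "J1"), ("J4", "J1"), ("J3", "J2"), ("J4", "J2"),
         ("J5", "J3"), ("J3", "J4"), ("J5", "J4")]
order = execution_order(["J1", "J2", "J3", "J4", "J5"], edges)
```

For this edge set the topological order is unique: J5, J3, J4, J2, J1. Python 3.9 and later also ship `graphlib.TopologicalSorter`, which could replace the hand-written loop.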
  • the input items and output items corresponding to each task script are determined; the adjacency relationship set corresponding to each first task script is determined based on the intersection between the input items of the first task script and the output items of the second task script; and, based on the determined adjacency relationship sets, at least one directed edge set corresponding to the batch of task scripts is determined and the directed acyclic graph corresponding to each directed edge set is output.
  • the dependencies between task scripts can be determined through the source code of the task scripts, without manual configuration of the dependencies between task scripts, which improves the efficiency of determining the dependencies between task scripts and reduces the error rate.
  • based on the directed acyclic graph, the electronic device can accurately determine the execution order of the batch of task scripts.
  • the electronic device can determine the execution order of task scripts based on the directed acyclic graph, which can improve the accuracy and efficiency of determining the execution order of task scripts , Execute task scripts based on the exact execution sequence, which can ensure the accuracy of the execution results.
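  • The intersection-based derivation of adjacency relationship sets and directed edges can be illustrated with a short Python sketch. It assumes the input items and output items of each script have already been extracted as sets of table names; the table names, the function names `build_adjacency` and `directed_edges`, and the edge direction (upstream script first) are assumptions made for illustration.

```python
def build_adjacency(inputs, outputs):
    """Map each script to {upstream script: shared items}.

    A second script is an adjacent dependency of a first script when the
    first script's inputs intersect the second script's outputs.
    """
    adjacency = {}
    for first, needed in inputs.items():
        adjacency[first] = {}
        for second, produced in outputs.items():
            if second == first:
                continue  # a script is not compared with itself
            shared = needed & produced
            if shared:
                adjacency[first][second] = shared
    return adjacency


def directed_edges(adjacency):
    """Return the set of (upstream, downstream) pairs."""
    return {(up, down) for down, ups in adjacency.items() for up in ups}


# Hypothetical input and output items for three scripts.
inputs = {"J1": {"t_b"}, "J2": {"t_c"}, "J3": {"t_src"}}
outputs = {"J1": {"t_a"}, "J2": {"t_b"}, "J3": {"t_c"}}

adjacency = build_adjacency(inputs, outputs)
edges = directed_edges(adjacency)
print(adjacency)
print(sorted(edges))
```

  • Keeping the shared items alongside each adjacent script mirrors the statement that an adjacency relationship records both the adjacent dependent task and the corresponding intersection.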
  • When the electronic device generates the directed acyclic graph corresponding to the first directed edge set, it can write the script identifier, the corresponding dependent task set, the corresponding directed edge set, and the fifth time representing a change of the directed acyclic graph into the third data table in association. In this way, when the dependent task set and directed edge set corresponding to a subsequently obtained batch of task scripts are the same as those stored in the third data table, the corresponding DAG in the database can be output without being regenerated, which improves the efficiency of outputting the DAG.
  • the fifth time is the time for generating or updating the directed acyclic graph corresponding to the set of directed edges.
  • the third data table obtained from the data in the second data table is as follows:
  • the dependent task set in the third data table can represent the execution sequence of the task script, that is, the dependent task set corresponding to J1 in the third data table can be [J5, J3, J4, J2, J1].
  • On the basis of writing the script identifier, the corresponding dependent task set, the corresponding directed edge set, and the fifth time representing a change of the directed acyclic graph into the third data table in association, as shown in FIG. 5, in some embodiments, determining at least one directed edge set corresponding to the at least two task scripts based on the determined adjacency relationship set corresponding to each of the first task scripts, and outputting the directed acyclic graph corresponding to each directed edge set in the at least one directed edge set, includes:
  • Step 501 Search the script identifiers stored in the third data table for the first script identifier among the script identifiers stored in the second data table; wherein the third data table is used for storing, in association, script identifiers, dependent task sets, directed edge sets, and fifth times representing changes of directed acyclic graphs.
  • After the electronic device determines the adjacency relationship set corresponding to each task script and writes the script identifier and adjacency relationship set corresponding to the task script into the second data table in association, it searches the script identifiers stored in the third data table for the first script identifier among the script identifiers stored in the second data table.
  • If the first script identifier in the second data table is found in the third data table, this indicates that a directed acyclic graph corresponding to the directed edge set corresponding to the first script identifier has been generated before, and step 502 is performed.
  • If the first script identifier in the second data table is not found in the third data table, this indicates that no directed acyclic graph corresponding to the directed edge set corresponding to the first script identifier has been generated before and that the third data table contains no dependent task set or directed edge set corresponding to the first script identifier, and step 504 is executed.
  • Step 502 When the first script identifier is found in the third data table, check whether the fifth time corresponding to the first script identifier in the third data table is equal to or later than the corresponding fourth time.
  • If the fifth time corresponding to the first script identifier in the third data table is equal to or later than the corresponding fourth time, this indicates that the directed acyclic graph corresponding to the directed edge set corresponding to the first script identifier was generated after the adjacency dependencies represented by the corresponding adjacency relationship set last changed; the DAG is the latest DAG, and step 503 is executed.
  • If the fifth time corresponding to the first script identifier is earlier than the corresponding fourth time, this indicates that the adjacency dependencies represented by the adjacency relationship set corresponding to the first script identifier changed after the directed acyclic graph stored in the database was generated, so the corresponding DAG needs to be regenerated; in this case, step 504 is performed.
  • Step 503 In the case that the fifth time corresponding to the first script identifier is equal to or later than the corresponding fourth time, output the directed acyclic graph stored in the database associated with the first script identifier.
  • the electronic device acquires the directed acyclic graph corresponding to the directed edge set corresponding to the first script identifier from the database, and outputs the acquired directed acyclic graph.
  • Step 504 If the first script identifier is not found in the third data table, or the fifth time corresponding to the first script identifier is earlier than the corresponding fourth time, update the dependent task set and the directed edge set corresponding to the first script identifier in the third data table based on the determined adjacency relationship set corresponding to each of the first task scripts, and output the corresponding directed acyclic graph based on the updated dependent task set and directed edge set corresponding to the first script identifier.
  • If the first script identifier is not found in the third data table, the corresponding dependent task set and directed edge set are determined based on the determined adjacency relationship set corresponding to each first task script, the corresponding directed acyclic graph is generated based on the determined dependent task set and directed edge set, and the first script identifier, the corresponding dependent task set, the corresponding directed edge set, and the fifth time at which the directed acyclic graph was generated are written into the third data table in association.
  • For the specific implementation process of generating the corresponding directed acyclic graph, please refer to the related description in step 104, which will not be repeated here.
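  • The decision made in steps 501 to 504 is essentially a freshness check on a cached result. A minimal Python sketch follows, assuming the third data table and the database of stored graphs are plain dictionaries and that times are comparable numbers; the names `resolve_dag`, `rebuild`, `dag_store` and `now` are hypothetical and do not appear in this application.

```python
def resolve_dag(script_id, fourth_time, third_table, dag_store, rebuild, now):
    """Return (graph, source) for script_id, reusing a stored graph if fresh."""
    row = third_table.get(script_id)  # step 501: look up the identifier
    # Step 502: the stored graph is fresh if it was generated at or after
    # the last change of the adjacent dependent tasks.
    if row is not None and row["fifth_time"] >= fourth_time[script_id]:
        return dag_store[script_id], "cached"  # step 503
    # Step 504: regenerate and record the new generation time.
    tasks, edges = rebuild(script_id)
    third_table[script_id] = {"tasks": tasks, "edges": edges, "fifth_time": now}
    dag_store[script_id] = edges
    return edges, "rebuilt"


def rebuild(script_id):
    # Stand-in for the real derivation of dependent tasks and directed edges.
    return ["J2", script_id], {("J2", script_id)}


fourth_time = {"J1": 10}
third_table, dag_store = {}, {}

first = resolve_dag("J1", fourth_time, third_table, dag_store, rebuild, now=12)
second = resolve_dag("J1", fourth_time, third_table, dag_store, rebuild, now=13)
fourth_time["J1"] = 20  # the adjacent dependent tasks change later
third = resolve_dag("J1", fourth_time, third_table, dag_store, rebuild, now=21)
print(first[1], second[1], third[1])
```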
  • The electronic device searches the script identifiers stored in the third data table for the script identifier J1 in the second data table. If J1 is found, this indicates that the DAG corresponding to the dependent task set corresponding to J1 has been generated before. The electronic device then judges whether the fifth time corresponding to J1 is equal to or later than the corresponding fourth time; if it is, the directed acyclic graph stored in the database in association with J1 is output.
  • If the script identifier J1 is not found among the script identifiers stored in the third data table, this indicates that the directed acyclic graph corresponding to the dependent task set corresponding to J1 has not been generated before. In this case, the dependent task set and directed edge set corresponding to J1 are determined based on the adjacency relationship sets corresponding to the script identifiers stored in the second data table, the corresponding directed acyclic graph is generated based on the dependent task set and directed edge set corresponding to J1, and J1, the corresponding dependent task set, the corresponding directed edge set, and the fifth time at which the corresponding directed acyclic graph was generated are written into the third data table in association.
  • In this embodiment, whether a corresponding DAG exists in the database can be judged from the script identifiers recorded in the third data table, and whether the DAG stored in the database is the latest can be judged from the fifth time recorded in the third data table. If there is no corresponding DAG in the database, or the corresponding DAG is not the latest, a corresponding directed acyclic graph is generated; when the latest directed acyclic graph is stored in the database, it can be output directly without being regenerated, which improves the efficiency of outputting the directed acyclic graph.
  • FIG. 6 is a schematic diagram of an implementation flow of updating a dependent task set and a directed edge set in the data processing method provided by an embodiment of the present application.
  • In step 504, updating the dependent task set and the directed edge set corresponding to the first script identifier in the third data table based on the determined adjacency relationship set corresponding to each of the first task scripts includes:
  • Step 601 In the case where the first script identifier, the corresponding first dependent task set and the corresponding first directed edge set are written in association in the third data table, add the directed edge set determined from the adjacency relationship set corresponding to the first script identifier to the corresponding first directed edge set and deduplicate; the first dependent task set includes the first script identifier.
  • the corresponding set of directed edges is determined based on the set of adjacency relationships corresponding to the first script identifier, and the set of directed edges corresponding to the first script identifier is added to the first set of directed edges.
  • For the implementation process of determining the corresponding directed edge set based on the adjacency relationship set, please refer to the relevant description in step 104; details are not repeated here.
  • When the first directed edge set is not an empty set, the electronic device adds the directed edge set determined from the adjacency relationship set corresponding to the first script identifier to the corresponding first directed edge set, and then performs deduplication processing on the first directed edge set.
  • When the first directed edge set is an empty set, the electronic device adds the directed edge set determined from the adjacency relationship set corresponding to the first script identifier to the corresponding first directed edge set, and does not need to perform deduplication processing on the first directed edge set.
  • the method further includes:
  • a first script identifier is added to the first set of dependent tasks.
  • The electronic device writes the first script identifier into the location for recording script identifiers in the third data table, and creates the corresponding first dependent task set and first directed edge set in the third data table. At this point both the first dependent task set and the first directed edge set are empty sets. The first script identifier is added to the created first dependent task set; the corresponding directed edge set is determined based on the adjacency relationship set corresponding to the first script identifier and is added to the first directed edge set.
  • After step 601, the electronic device processes each script identifier included in the adjacency relationship set corresponding to the first script identifier in the second data table according to steps 602 to 608.
  • Step 602 Determine whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier in the second data table exists in the first dependent task set.
  • i is a positive integer, and i is less than or equal to the total number of script identifiers included in the adjacency set where the i-th script identifier is located in the second data table.
  • When step 602 is executed for the first time, i is equal to 1.
  • If the i-th script identifier in the adjacency relationship set corresponding to the first script identifier in the second data table exists in the first dependent task set, it is judged whether i is smaller than the total number of script identifiers included in the adjacency relationship set where the i-th script identifier is located, and step 603 is executed.
  • If the i-th script identifier does not exist in the first dependent task set, step 604 is executed.
  • Step 603 If the i-th script identifier exists in the first dependent task set and i is less than the total number of script identifiers included in the adjacency relationship set where the i-th script identifier is located, assign i+1 to i and execute the step of judging whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set.
  • In this case, the directed edge set determined from the adjacency relationship set where the i-th script identifier is located has already been added to the corresponding first directed edge set; i+1 is assigned to i, and the process returns to step 602.
  • If i is equal to the total number of script identifiers included in the adjacency relationship set where the i-th script identifier is located, it is judged whether there is a next script identifier in the adjacency relationship set where the first script identifier corresponding to that adjacency relationship set is located. If there is no next script identifier, the loop is exited; if there is, step 602 is executed for that next script identifier.
  • Step 604 If the i-th script identifier does not exist in the first dependent task set, determine whether the i-th script identifier is included in the script identifiers stored in the third data table.
  • If the i-th script identifier corresponding to the first script identifier is included in the script identifiers stored in the third data table, step 605 is performed; if it is not included, this indicates that the dependent task set and directed edge set corresponding to that i-th script identifier have never been generated before, and steps 607 to 608 are executed.
  • Step 605 If the i-th script identifier is included in the script identifiers stored in the third data table, determine whether the fifth time corresponding to the i-th script identifier is equal to or later than the corresponding fourth time.
  • When the fifth time corresponding to the i-th script identifier corresponding to the first script identifier is equal to or later than the corresponding fourth time, this indicates that both the dependent task set and the directed edge set corresponding to that i-th script identifier in the third data table are up to date, and step 606 is executed.
  • When the fifth time corresponding to the i-th script identifier is earlier than the corresponding fourth time, steps 607 to 608 are performed.
  • Step 606 In the case that the fifth time corresponding to the i-th script identifier is equal to or later than the corresponding fourth time, add the i-th script identifier to the first dependent task set and deduplicate, and add the directed edge set corresponding to the i-th script identifier in the third data table to the first directed edge set and deduplicate; assign i+1 to i, and execute step 602.
  • The electronic device adds the i-th script identifier corresponding to the first script identifier in the second data table to the first dependent task set corresponding to the first script identifier in the third data table, and performs deduplication processing on the first dependent task set; it reads the directed edge set corresponding to that i-th script identifier from the third data table, adds the read directed edge set to the first directed edge set corresponding to the first script identifier, and performs deduplication processing on the first directed edge set.
  • If i is less than the total number of script identifiers included in the adjacency relationship set where the i-th script identifier is located, the directed edge set for the i-th script identifier has already been added to the corresponding first directed edge set; i+1 is assigned to i, and the process returns to step 602.
  • step 602 is executed.
  • Step 607 If the i-th script identifier is not included in the script identifiers stored in the third data table, or the fifth time corresponding to the i-th script identifier is earlier than the corresponding fourth time, add the i-th script identifier to the first dependent task set and deduplicate, and add the directed edge set determined from the adjacency relationship set corresponding to the i-th script identifier to the first directed edge set and deduplicate.
  • When the script identifiers stored in the third data table do not include the i-th script identifier corresponding to the first script identifier, or the fifth time corresponding to the i-th script identifier is earlier than the corresponding fourth time, the electronic device adds the i-th script identifier corresponding to the first script identifier in the second data table to the first dependent task set corresponding to the first script identifier in the third data table and deduplicates the first dependent task set; it then reads the adjacency relationship set corresponding to the i-th script identifier from the second data table, determines the corresponding directed edge set from that adjacency relationship set, adds the determined directed edge set to the first directed edge set, and performs deduplication processing on the first directed edge set.
  • Step 608 Take the i-th script identifier as the first script identifier, and execute the step of judging whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier in the second data table exists in the first dependent task set.
  • After executing step 607, the electronic device takes the i-th script identifier corresponding to the first script identifier as the first script identifier and returns to step 602, so that each script identifier included in the adjacency relationship set corresponding to that i-th script identifier in the second data table is processed according to steps 602 to 608.
  • step 602 is executed.
  • For the specific implementation process, please refer to the related description in step 603, which will not be repeated here.
  • The following describes, for the case where the first script identifier is not found in the third data table or the fifth time corresponding to the first script identifier is earlier than the corresponding fourth time, the implementation process of updating the dependent task set and the directed edge set corresponding to the first script identifier in the third data table based on the determined adjacency relationship set corresponding to each of the first task scripts:
  • the electronic device searches the script identifier J1 in the second data table from the script identifiers stored in the third data table.
  • When the script identifier J1 is found among the script identifiers stored in the third data table and the fifth time corresponding to J1 is earlier than the corresponding fourth time, this indicates that the directed acyclic graph corresponding to the dependent task set corresponding to J1 has been generated before. In this case, neither the first dependent task set Ls nor the first directed edge set E corresponding to J1 in the third data table is an empty set; the electronic device adds the directed edge set E1 corresponding to J1 to the E corresponding to J1 in the third data table and performs deduplication processing on E.
  • Whether the first script identifier is not found in the third data table or is found among the script identifiers stored in the third data table, the elements of the dependent task set and the directed edge set corresponding to the first script identifier in the third data table are determined in a similar way.
  • The following describes how, after E1 corresponding to J1 has been added to the corresponding E, the script identifiers J2 and J4 in the adjacency relationship set corresponding to J1 in the second data table are each processed according to steps 602 to 608 above:
  • Step 604 is executed to determine whether J2 is included in the script identifiers stored in the third data table, and a second judgment result is obtained.
  • step 607 is performed to add J2 to the Ls corresponding to J1, and according to the second data
  • When step 602 is executed, it is judged whether the first script identifier J3 in the adjacency relationship set corresponding to J2 exists in the Ls corresponding to J1, and a third judgment result is obtained.
  • Step 604 is executed to judge whether J3 is included in the script identifiers stored in the third data table, and a fourth judgment result is obtained.
  • When step 602 is executed, it is judged whether the first script identifier J5 in the adjacency relationship set corresponding to J3 exists in the Ls corresponding to J1, and a fifth judgment result is obtained.
  • The fifth judgment result indicates that J5 in the adjacency relationship set corresponding to J3 does not exist in the Ls corresponding to J1, and step 604 is executed to judge whether J5 is included in the script identifiers stored in the third data table.
  • the judgment result indicates that J5 is not included in the script identifier stored in the third data table, and step 607 is executed to add J5 to the Ls corresponding to J1 and deduplicate, and the adjacency set J5.
  • the next script identifier in the relationship set is processed according to step 602 to step 608 .
  • When step 602 is executed, it is determined whether the first script identifier J3 in the adjacency relationship set corresponding to J4 exists in the Ls corresponding to J1. Since J3 exists in the Ls corresponding to J1, step 603 is executed. Since the adjacency relationship set corresponding to J4 includes two script identifiers, i+1 is assigned to i, and step 602 is executed to determine whether the second script identifier J5 in the adjacency relationship set corresponding to J4 exists in the Ls corresponding to J1. Since J5 exists in the Ls corresponding to J1, step 603 is executed. Since i is equal to the total number of script identifiers included in the adjacency relationship set where J5 is located, step 602 is executed for the script identifier that follows J4, the first script identifier corresponding to the adjacency relationship set where J5 is located, in the adjacency relationship set J2.rel; since J4 is the last script identifier in J2.rel, the first script identifier J2
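  • The traversal in steps 601 to 608 can be read as a depth-first walk over the adjacency relationship sets that accumulates a dependent task set and a directed edge set. The Python sketch below is one possible reading under stated assumptions, not the implementation of this application: the adjacency data is reconstructed from the J1 to J5 example, edges are written as (dependency, dependent) pairs, and the names `collect` and `edges_of` are hypothetical.

```python
ADJACENCY = {  # assumed contents of the second data table
    "J1": ["J2", "J4"],
    "J2": ["J3", "J4"],
    "J3": ["J5"],
    "J4": ["J3", "J5"],
    "J5": [],
}


def edges_of(script, adjacency):
    """Directed edges implied by one script's adjacency relationship set."""
    return {(dep, script) for dep in adjacency[script]}


def collect(first_id, adjacency, third_table, fourth_time):
    """Return (dependent task list, directed edge set) for first_id."""
    tasks = [first_id]                        # the set includes first_id itself
    edges = set(edges_of(first_id, adjacency))  # step 601

    def visit(current):
        for dep in adjacency[current]:        # i = 1, 2, ... (steps 602, 603)
            if dep in tasks:
                continue                      # already handled, move to i+1
            tasks.append(dep)
            row = third_table.get(dep)        # step 604
            if row is not None and row["fifth_time"] >= fourth_time[dep]:
                # Steps 605 and 606: reuse the stored, up-to-date edge set.
                # As in the text, only the identifier and its stored edges
                # are added here.
                edges.update(row["edges"])
            else:
                edges.update(edges_of(dep, adjacency))  # step 607
                visit(dep)                    # step 608: dep becomes "first"

    visit(first_id)
    return tasks, edges


tasks, edges = collect("J1", ADJACENCY, third_table={}, fourth_time={})
print(tasks)
print(sorted(edges))
```

  • Using a list with a membership test keeps the visiting order visible; using a set for the edges makes the repeated "add and deduplicate" operations implicit.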
  • the embodiment of the present application also provides an electronic device, as shown in FIG. 7, the electronic device includes:
  • the extracting unit 71 is configured to extract the first text described in Structured Query Language corresponding to each task script from the source code of each task script in the received at least two task scripts;
  • the first determining unit 72 is configured to determine the input items and output items corresponding to each task script from the extracted abstract syntax tree corresponding to each first text;
  • the second determining unit 73 is configured to determine the adjacency relationship set corresponding to each first task script based on the intersection between the input items corresponding to the first task script and the output items corresponding to each second task script in at least one second task script;
  • the output unit 74 is configured to determine at least one directed edge set corresponding to the at least two task scripts based on the determined adjacency relationship set corresponding to each of the first task scripts, and to output the directed acyclic graph corresponding to each directed edge set in the at least one directed edge set; wherein,
  • the first task script and the second task script are different task scripts in the at least two task scripts; the adjacency relationship in the adjacency relationship set represents the adjacent dependent tasks of the first task script and the corresponding intersection; the The directed edges in the set of directed edges represent the dependencies between every two task scripts.
  • the first determining unit 72 is specifically configured as:
  • the script identifier of the first task script and the corresponding determined adjacency relationship are associated and written into a second data table; wherein the second data table is used to associate and store the script identifier and the adjacency relationship set.
  • the electronic device also includes a first update unit configured to perform at least one of the following:
  • the first data table further includes a first time representing a change of the input item set and a second time representing a change of the output item set;
  • the second data table further includes a third time representing a change of the adjacency relationship set and a fourth time representing a change of the adjacent dependent tasks.
  • the electronic device further includes a second update unit configured to perform at least one of the following:
  • updating at least one of the input item set, the output item set, the first time and the second time corresponding to the first task script in the first data table;
  • when the third time corresponding to the first task script is earlier than the corresponding first time, updating at least one of the adjacency relationship set and the fourth time corresponding to the first task script in the second data table, based on the intersection between the corresponding updated input item set and the output item sets corresponding to task scripts other than the first task script;
  • when the third time corresponding to the first task script is earlier than the corresponding second time, updating at least one of the adjacency relationship set and the fourth time corresponding to the task scripts in the second data table that adjacently depend on the first task script, based on the intersection between the corresponding updated output item set and the input item sets corresponding to the adjacent dependent tasks of the first task script;
  • updating the third time corresponding to the first task script to the maximum of the corresponding first time and the corresponding second time; wherein the fourth time is updated when the adjacent dependent tasks in the updated adjacency relationship set change.
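  • The timestamp comparisons performed by the second update unit can be sketched in Python as follows. This is an illustrative reading only: both data tables are modelled as dictionaries, times are plain numbers, the function name `refresh` and the parameter `now` are hypothetical, and for simplicity the second branch scans all other scripts rather than only those already recorded as adjacent.

```python
def refresh(script, table1, table2, now):
    """Re-derive adjacency data for `script` if its inputs or outputs changed."""
    own1, own2 = table1[script], table2[script]

    if own2["third_time"] < own1["first_time"]:
        # Input item set changed: recompute what this script depends on.
        fresh = {}
        for other, row in table1.items():
            shared = own1["inputs"] & row["outputs"]
            if other != script and shared:
                fresh[other] = shared
        if set(fresh) != set(own2["adjacency"]):
            own2["fourth_time"] = now  # adjacent dependent tasks changed
        own2["adjacency"] = fresh

    if own2["third_time"] < own1["second_time"]:
        # Output item set changed: recompute this script's entry in the
        # adjacency relationship sets of the scripts that consume its outputs.
        for other, row in table1.items():
            if other == script:
                continue
            adj = table2[other]["adjacency"]
            shared = row["inputs"] & own1["outputs"]
            if (script in adj) != bool(shared):
                table2[other]["fourth_time"] = now
            if shared:
                adj[script] = shared
            else:
                adj.pop(script, None)

    own2["third_time"] = max(own1["first_time"], own1["second_time"])


# Hypothetical table contents: J1's inputs changed at time 5.
table1 = {
    "J1": {"inputs": {"t_b"}, "outputs": {"t_a"}, "first_time": 5, "second_time": 1},
    "J2": {"inputs": {"t_src"}, "outputs": {"t_b"}, "first_time": 1, "second_time": 1},
}
table2 = {
    "J1": {"adjacency": {}, "third_time": 1, "fourth_time": 1},
    "J2": {"adjacency": {}, "third_time": 1, "fourth_time": 1},
}

refresh("J1", table1, table2, now=6)
print(table2["J1"])
```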
  • the output unit 74 is specifically configured as:
  • the third data table is used for storing, in association, script identifiers, dependent task sets, directed edge sets and fifth times representing changes of directed acyclic graphs;
  • outputting the directed acyclic graph stored in the database in association with the first script identifier;
  • when the first script identifier is not found in the third data table, or the fifth time corresponding to the first script identifier is earlier than the corresponding fourth time, updating the dependent task set and the directed edge set corresponding to the first script identifier in the third data table based on the determined adjacency relationship set corresponding to each first task script, and outputting the corresponding directed acyclic graph based on the updated dependent task set and directed edge set corresponding to the first script identifier.
  • the output unit 74 is specifically configured as:
  • adding the directed edge set determined from the adjacency relationship set corresponding to the first script identifier to the corresponding first directed edge set and deduplicating;
  • the first set of dependent tasks includes a first script identifier;
  • when the i-th script identifier is included in the script identifiers stored in the third data table, judging whether the fifth time corresponding to the i-th script identifier is equal to or later than the corresponding fourth time;
  • when the fifth time corresponding to the i-th script identifier is equal to or later than the corresponding fourth time, adding the i-th script identifier to the first dependent task set and deduplicating, and adding the directed edge set corresponding to the i-th script identifier in the third data table to the first directed edge set and deduplicating; assigning i+1 to i, and executing the step of judging whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set;
  • when the script identifiers stored in the third data table do not include the i-th script identifier, or the fifth time corresponding to the i-th script identifier is earlier than the corresponding fourth time, adding the i-th script identifier to the first dependent task set and deduplicating, and adding the directed edge set determined from the adjacency relationship set corresponding to the i-th script identifier to the first directed edge set and deduplicating;
  • when i is equal to the total number of script identifiers included in the adjacency relationship set where the i-th script identifier is located, or the adjacency relationship set corresponding to the i-th script identifier is an empty set, exiting the loop, or, for the next script identifier in the adjacency relationship set where the corresponding first script identifier is located, executing the step of judging whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set.
  • the output unit 74 is further configured to:
  • a first script identifier is added to the first set of dependent tasks.
  • In practical applications, each unit included in the electronic device can be implemented by a processor in the electronic device, such as a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), a microcontroller unit (MCU, Microcontroller Unit) or a field-programmable gate array (FPGA, Field-Programmable Gate Array).
  • It should be noted that when the electronic device provided in the above embodiment performs data processing, the division into the above program modules is only used as an example. In practical applications, the above processing can be assigned to different program modules as needed; that is, the internal structure of the device can be divided into different program modules to complete all or part of the processing described above.
  • the electronic device and the data processing method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments, and will not be repeated here.
  • FIG. 8 is a schematic diagram of the hardware composition structure of the electronic device provided by the embodiment of the present application. As shown in FIG. 8, the electronic device 8 includes:
  • Communication interface 81 capable of exchanging information with other devices such as network devices;
  • the processor 82 is connected to the communication interface 81 to exchange information with other devices, and is configured to execute the data processing method provided by one or more of the above technical solutions when running a computer program. The computer program is stored in the memory 83.
  • bus system 84 is used to realize connection and communication between these components.
  • in addition to a data bus, the bus system 84 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as bus system 84 in FIG. 8 for clarity of illustration.
  • the memory 83 in the embodiment of the present application is configured to store various types of data to support the operation of the electronic device 8 .
  • Examples of such data include: any computer program configured to operate on electronic device 8 .
  • the memory 83 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories.
  • the non-volatile memory can be read-only memory (ROM, Read Only Memory), programmable read-only memory (PROM, Programmable Read-Only Memory), erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), magnetic random access memory (FRAM, ferromagnetic random access memory), flash memory (Flash Memory), magnetic surface memory, optical disc, or compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be disk storage or tape storage.
  • the volatile memory may be random access memory (RAM, Random Access Memory), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), SyncLink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory).
  • the memory 83 described in the embodiments of the present application is intended to include, but not limited to, these and any other suitable types of memory.
  • the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 82 or implemented by the processor 82 .
  • the processor 82 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 82 or instructions in the form of software.
  • the aforementioned processor 82 may be a general-purpose processor, DSP, or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like.
  • the processor 82 may implement or execute various methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium, and the storage medium is located in the memory 83, and the processor 82 reads the program in the memory 83, and completes the steps of the foregoing method in combination with its hardware.
  • when the processor 82 executes the program, it implements the corresponding process implemented by the terminal in each method of the embodiments of the present application. For the sake of brevity, details are not repeated here.
  • the embodiment of the present application also provides a storage medium, that is, a computer storage medium, specifically a computer-readable storage medium, for example a first memory 83 storing a computer program; the computer program can be executed by the processor 82 of the terminal to complete the steps described in the foregoing method.
  • the computer-readable storage medium can be memories such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, or CD-ROM.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed to multiple network units; Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the functional units in each embodiment of the present application can all be integrated into one processing module, or each unit can serve as a separate unit, or two or more units can be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • the term "and/or" in the embodiments of the present application merely describes an association between associated objects and indicates that three kinds of relationships are possible; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone.
  • the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set formed by A, B, and C.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

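The present application discloses a data processing method, an electronic device, and a storage medium. The data processing method includes: extracting, from the source code of each of at least two received task scripts, a first text that corresponds to the task script and is described in Structured Query Language; determining the input items and output items of each task script from the abstract syntax tree corresponding to each extracted first text; determining the adjacency relationship set of a first task script based on the intersection between the input items of the first task script and the output items of each of at least one second task script; and determining, based on the adjacency relationship set determined for each first task script, at least one directed edge set corresponding to the at least two task scripts, and outputting the directed acyclic graph corresponding to each of the at least one directed edge set.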

Description

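Data processing method, electronic device, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202110671384.X, filed on June 17, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular to a data processing method, an electronic device, and a storage medium.
Background
With the development of computer technology, more and more technologies (for example, big data) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Because of the security and real-time requirements of the financial industry, however, Fintech also places higher demands on technology. In the Fintech field, when a server of a big data platform processes batch tasks, a terminal sends the tasks and their configuration files to the server. The configuration files specify the dependencies between tasks, so that the server can determine the execution order of the tasks in the batch from the received configuration files. In the related art, however, the configuration file of each task has to be set manually, which is inefficient and error-prone and may cause the server to determine an incorrect execution order for the batch tasks.
Summary
To solve the related technical problems, embodiments of the present application provide a data processing method, an electronic device, and a storage medium.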
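An embodiment of the present application provides a data processing method, including:
extracting, from the source code of each of at least two received task scripts, a first text that corresponds to the task script and is described in Structured Query Language;
determining the input items and output items of each task script from the abstract syntax tree corresponding to each extracted first text;
determining the adjacency relationship set of a first task script based on the intersection between the input items of the first task script and the output items of each of at least one second task script;
determining, based on the adjacency relationship set determined for each first task script, at least one directed edge set corresponding to the at least two task scripts, and outputting the directed acyclic graph corresponding to each of the at least one directed edge set; wherein
the first task script and the second task script are different task scripts among the at least two task scripts; an adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection; and a directed edge in the directed edge set represents the dependency between two task scripts.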
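An embodiment of the present application further provides an electronic device, including:
an extraction unit, configured to extract, from the source code of each of at least two received task scripts, a first text that corresponds to the task script and is described in Structured Query Language;
a first determining unit, configured to determine the input items and output items of each task script from the abstract syntax tree corresponding to each extracted first text;
a second determining unit, configured to determine the adjacency relationship set of a first task script based on the intersection between the input items of the first task script and the output items of each of at least one second task script;
an output unit, configured to determine, based on the adjacency relationship set determined for each first task script, at least one directed edge set corresponding to the at least two task scripts, and to output the directed acyclic graph corresponding to each of the at least one directed edge set; wherein
the first task script and the second task script are different task scripts among the at least two task scripts; an adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection; and a directed edge in the directed edge set represents the dependency between two task scripts.
An embodiment of the present application further provides an electronic device, including a processor and a memory configured to store a computer program runnable on the processor, wherein the processor is configured to perform the steps of the above data processing method when running the computer program.
An embodiment of the present application further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the above data processing method.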
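In the embodiments of the present application, the input items and output items of each task script are determined from the source code of each task script in a batch of task scripts; the adjacency relationship set of a first task script is determined based on the intersection between the input items of the first task script and the output items of a second task script; based on the adjacency relationship set determined for each first task script, at least one directed edge set corresponding to the batch of task scripts is determined, and the directed acyclic graph corresponding to each of the at least one directed edge set is output. In this solution, the dependencies between task scripts can be determined from the source code of the task scripts alone, without manual configuration, which improves the efficiency of determining the dependencies and reduces the error rate. In addition, because the directed edge between two task scripts in the directed acyclic graph represents the execution order of those two task scripts, the electronic device can accurately determine the execution order of the batch of task scripts from the directed acyclic graph.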
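Brief description of the drawings
FIG. 1 is a schematic flowchart of the data processing method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of determining the adjacency relationship set in the data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a directed acyclic graph provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of determining the execution order of task scripts in the data processing method provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of determining the directed acyclic graph in the data processing method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of updating the dependent task set and the directed edge set in the data processing method provided by an application embodiment of the present application;
FIG. 7 is a schematic structural diagram of the electronic device provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of the present application.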
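Detailed description
FIG. 1 is a schematic flowchart of the data processing method provided by an embodiment of the present application. The method is executed by an electronic device such as a terminal device or a server. As shown in FIG. 1, the data processing method includes:
Step 101: Extract, from the source code of each of at least two received task scripts, a first text that corresponds to the task script and is described in Structured Query Language.
In practical applications, before executing a batch of task scripts, the electronic device needs to determine the execution order of the batch. A batch of task scripts consists of at least two task scripts.
Because task scripts may be written in different programming languages in practice, the electronic device converts the source code of each task script into a first text described in Structured Query Language (SQL) so that the input items and output items of the task script can be extracted accurately. The process is as follows:
Based on the file extension of each task script, the electronic device judges whether the task script is written in SQL, obtaining a first judgment result. A task script whose extension is sql is a task script written in SQL.
When the first judgment result indicates that the first task script is written in SQL, the source code of the first task script is read to obtain the first text corresponding to the first task script. The first task script is any one of the at least two task scripts.
When the first judgment result indicates that the first task script is not written in SQL, the SQL text content is extracted from the source code of the first task script by means of a preset regular expression to obtain the first text; the preset regular expression is used to extract SQL text content.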
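The extension check and the regular-expression fallback of step 101 might be sketched in Python as follows. This is only an illustration: the function name `extract_first_text` and the pattern `SQL_IN_STRING` are hypothetical, and the pattern assumes that the non-SQL script embeds its SQL in triple-quoted strings that begin with a SQL keyword. The source does not disclose the actual preset regular expression.

```python
import re
from pathlib import Path

# Hypothetical preset pattern: SQL embedded in a triple-quoted string
# that starts with a SQL keyword. The real pattern is not disclosed.
SQL_IN_STRING = re.compile(
    r'"""\s*((?:select|insert|with|create|update|delete)\b.*?)"""',
    re.IGNORECASE | re.DOTALL,
)

def extract_first_text(filename: str, source: str) -> str:
    """Return the SQL text (the "first text") of one task script."""
    if Path(filename).suffix.lower() == ".sql":
        # A script with the sql extension is already SQL: use its source as is.
        return source
    # Otherwise collect the embedded SQL fragments and join them with semicolons.
    fragments = (m.strip().rstrip(";") for m in SQL_IN_STRING.findall(source))
    return ";\n".join(fragments)
```

For a script named `job.py` that calls a helper with `"""insert overwrite table T3 select * from T1;"""`, the function returns the statement without its trailing semicolon.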
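Step 102: Determine the input items and output items of each task script from the abstract syntax tree corresponding to each extracted first text.
Here, the electronic device splits the content of the first text of each task script into multiple independent SQL statements at a preset separator; normalizes the format of the SQL statements obtained from each first text to obtain the normalized SQL statements of that first text, for example by replacing interfering characters such as format placeholders in the SQL statements with preset characters; and converts the normalized SQL statements of each first text into an abstract syntax tree (AST, Abstract Syntax Tree), thereby obtaining the abstract syntax tree of each task script. An AST is an abstract representation of the syntactic structure of source code: it represents the syntactic structure of a programming language as a tree, and each node of the tree represents a construct in the source code.
In practical applications, the preset separator may be a semicolon (;).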
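The splitting and normalization described above could look roughly like the following Python sketch. The placeholder pattern, the filler character, and the function name `split_and_normalize` are assumptions made for illustration; the naive split on `;` also ignores semicolons inside string literals, which a production implementation would have to handle.

```python
import re

# Hypothetical placeholder forms: ${name}, %(name)s, %s, {name}
PLACEHOLDER = re.compile(r"\$\{[^}]*\}|%\(\w+\)s|%s|\{\w*\}")

def split_and_normalize(first_text: str, sep: str = ";", filler: str = "x") -> list[str]:
    """Split the first text at the separator and normalize each SQL statement."""
    statements = []
    for raw in first_text.split(sep):
        stmt = PLACEHOLDER.sub(filler, raw)   # replace format placeholders
        stmt = " ".join(stmt.split())         # collapse whitespace and newlines
        if stmt:                              # drop empty fragments
            statements.append(stmt)
    return statements
```

The normalized statements would then be handed to a SQL parser to build the abstract syntax tree; that parsing step is outside this sketch.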
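It should be noted that, when the source code of a task script changes, the electronic device, after determining the SQL statements, filters out the SQL statements without data changes, thereby obtaining the SQL statements with data changes, and updates the abstract syntax tree of the task script based on the SQL statements with data changes. A data change means that the input items and/or output items have changed.
Once the abstract syntax tree of each task script has been determined, the electronic device identifies first statements and second statements in the abstract syntax tree of the task script, determines the input items of the task script from the identified first statements, and determines the output items of the task script from the identified second statements. The input items of a task script represent the input parameters of the functions involved in the task script; the output items of a task script represent the output parameters of those functions.
In practical applications, the first statement is a query statement (Select Statement) and the second statement is an insert statement (Insert Statement).
In practical applications, task scripts perform database-related operations such as data queries and data updates; both the input items and the output items are data tables. That is, the input items are the data tables included in the query statements, and the output items are the data tables included in the insert statements. For example, if the first statement identified in the abstract syntax tree of a task script is select T1.XXX,T2.XXX from T1,T2 where T1.XXX=T2.XXX and the identified second statement is insert overwrite table T3, the input items of the task script are the data tables T1 and T2, and the output item is the data table T3.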
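To make the example concrete, the following Python sketch collects input and output tables from normalized statements. It deliberately uses regular expressions as a simplified stand-in for walking a real abstract syntax tree, so it is not the method described in the source; it assumes unaliased table names after `from` and `join`, and the function name `inputs_and_outputs` is hypothetical.

```python
import re

FROM_CLAUSE = re.compile(r"\bfrom\s+([\w.]+(?:\s*,\s*[\w.]+)*)", re.IGNORECASE)
JOIN_CLAUSE = re.compile(r"\bjoin\s+([\w.]+)", re.IGNORECASE)
INSERT_TARGET = re.compile(
    r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)

def inputs_and_outputs(statements):
    """Return (input tables, output tables) found in a list of SQL statements."""
    cin, cout = set(), set()
    for stmt in statements:
        for group in FROM_CLAUSE.findall(stmt):
            # "from T1,T2" lists several tables separated by commas
            cin.update(name.strip() for name in group.split(","))
        cin.update(JOIN_CLAUSE.findall(stmt))
        cout.update(INSERT_TARGET.findall(stmt))
    return cin, cout
```

Applied to the two statements of the example above, the sketch yields the input set {T1, T2} and the output set {T3}.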
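In this embodiment, depending on the programming language of a task script, different approaches can be used to extract the SQL text content from the task script; the abstract syntax tree of the task script is determined from the extracted SQL text content; and the input items and output items of the task script are determined from the first statements and second statements in that abstract syntax tree, respectively. In this way, the input items and output items of each task script can be extracted completely and accurately.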
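Step 103: Determine the adjacency relationship set of the first task script based on the intersection between the input items of the first task script and the output items of each of at least one second task script.
The first task script and the second task script are different, and each refers generically to any one of the at least two received task scripts. Any adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection.
In practical applications, for each received task script, the electronic device may compare the input items of any task script Jx with the output items of every task script Jy other than Jx, so as to judge whether the input items of Jx intersect the output items of Jy, obtaining a first judgment result.
When the first judgment result indicates that the input items of Jx and the output items of Jy do not intersect, Jy is not an adjacent dependent task of Jx, and there is no adjacency relationship between Jx and Jy.
When the first judgment result indicates that the input items of Jx and the output items of Jy intersect, Jy is an adjacent dependent task of Jx, and one of the adjacency relationships of Jx is determined based on the intersection.
In this way, the electronic device can determine all adjacency relationships of Jx, obtaining the adjacency relationship set of Jx.
In practical applications, the format of an adjacency relationship may be: script identifier of the adjacent dependent task of the first task script:corresponding intersection. When the input items of task script J1 and the output items of task script J2 have the intersection T1 and T2, and the input items of J1 and the output items of task script J4 have the intersection T1, the adjacency relationship set of J1 is {J2:T1,J2:T2,J4:T1}. To determine the adjacency relationships of each task script more quickly and accurately, and to determine the dependencies between task scripts from those adjacency relationships more quickly and accurately, a first data table may be used to store the input items and output items of each task script, and a second data table may be used to store the adjacency relationships of each task script. As shown in FIG. 2, in some embodiments, determining the adjacency relationship set of the first task script based on the intersection between the input items of the first task script and the output items of each of at least one second task script includes: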
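Step 201: Write the script identifier of each task script and the corresponding input item set and output item set, in association, into a first data table;
Step 202: Determine the adjacency relationships of the first task script based on the intersection between the input item set of the first task script and the output item set of the second task script in the first data table;
Step 203: Write the script identifier of the first task script and the adjacency relationships so determined, in association, into a second data table; the second data table is used to store script identifiers and adjacency relationship sets in association.
Here, once the input items and output items of each task script have been determined, the electronic device writes the script identifier of each task script and the corresponding input item set and output item set, in association, into the first data table. In practical applications, the first data table includes at least the script identifier and the input item set and output item set corresponding to the script identifier. Because the source code of a task script may change, and the adjacency relationships of the task script must be updated in step when its input items and/or output items change, the first data table further includes a first time representing a change to the input item set and a second time representing a change to the output item set, so that the adjacency relationships of the task script can be determined accurately.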
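In one embodiment, the first data table is as follows:
Script identifier   Input item set (Cin)   Output item set (Cout)   First time (in_update_time)   Second time (out_update_time)
J1                  {T1,T2}                {T3,T4}                  yyyymmdd:HH:MM:SS             yyyymmdd:HH:MM:SS
J2                  {T5,T6}                {T1,T2}
J3                  {T8}                   {T5,T10}
J4                  {T9,T10}               {T1,T6}
J5                  {}                     {T8,T9}
Jn
As the first data table shows, the input item set of task script J1 consists of T1 and T2, and its output item set consists of T3 and T4; the input item set of J2 consists of T5 and T6, and its output item set consists of T1 and T2; the input item set of J3 contains T8, and its output item set consists of T5 and T10; the input item set of J4 consists of T9 and T10, and its output item set consists of T1 and T6; the input item set of J5 is the empty set, and its output item set consists of T8 and T9.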
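After writing the script identifier of each task script and the corresponding input item set and output item set, in association, into the first data table, the electronic device proceeds as follows for any two task scripts Jx and Jy in the first data table: if Cin of Jx intersects Cout of Jy, Jy is an adjacent dependent task of Jx, and Jx.rel=Jy:Tz denotes that Jx adjacently depends on Jy through Tz; that is, Jy:Tz represents one adjacency relationship of task script Jx. Jx and the adjacency relationship so determined are written, in association, into the second data table. In this way, all adjacency relationships of each task script are determined and written into the second data table, where
Jx.rel={Jy:Tz} (Tz∈Jx.Cin∩Jy.Cout; x=1, 2, 3, …, n; y=1, 2, 3, …, n; x≠y); Jx.Cin∩Jy.Cout denotes the intersection of Cin of Jx and Cout of Jy.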
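The rule Jx.rel={Jy:Tz} with Tz∈Jx.Cin∩Jy.Cout can be illustrated with a short Python sketch over the example rows J1 to J5 of the first data table. The dictionary layout and the names `TABLE1` and `adjacency_relations` are illustrative assumptions; each adjacency relationship Jy:Tz is represented as the tuple `(Jy, Tz)`, and the time columns are omitted.

```python
# Example rows of the first data table (time columns omitted)
TABLE1 = {
    "J1": {"cin": {"T1", "T2"}, "cout": {"T3", "T4"}},
    "J2": {"cin": {"T5", "T6"}, "cout": {"T1", "T2"}},
    "J3": {"cin": {"T8"}, "cout": {"T5", "T10"}},
    "J4": {"cin": {"T9", "T10"}, "cout": {"T1", "T6"}},
    "J5": {"cin": set(), "cout": {"T8", "T9"}},
}

def adjacency_relations(table1):
    """For every Jx, collect (Jy, Tz) with Tz in Jx.Cin ∩ Jy.Cout and x != y."""
    rel = {}
    for jx, x in table1.items():
        rel[jx] = {
            (jy, tz)
            for jy, y in table1.items()
            if jy != jx
            for tz in x["cin"] & y["cout"]
        }
    return rel
```

With these rows, J1 obtains the adjacency relationship set {J2:T1,J2:T2,J4:T1} given in the text, and J5, whose input item set is empty, obtains the empty set.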
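In practical applications, each time the electronic device determines an adjacency relationship, it writes that adjacency relationship into the second data table. The second data table is used to store script identifiers and adjacency relationship sets in association, and includes the script identifiers and the corresponding adjacency relationship sets. When the input items and/or output items of a task script change, the adjacency relationships of the task script must be updated in step. Updating the adjacency relationships may change the adjacent dependent tasks of the task script, and when the adjacent dependent tasks change, the corresponding directed acyclic graph must be updated. Therefore, to make it easy to decide whether the corresponding directed acyclic graph needs to be updated, the second data table further includes a third time representing a change to the adjacency relationships and a fourth time representing a change to the adjacent dependent tasks. In one embodiment, the second data table is as follows: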
Figure PCTCN2021140176-appb-000001
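In the second data table, the adjacency relationship set of J1 consists of three adjacency relationships. The third time is the time at which the adjacency relationship set last changed, and the fourth time is the time at which the adjacent dependent tasks last changed. It should be noted that, even if the adjacency relationships in the adjacency relationship set of a task script change, the fourth time of the task script is not updated as long as the adjacent dependent tasks contained in that adjacency relationship set remain the same. For example, when the adjacency relationships of J1 change from {J2:T1,J2:T2,J4:T1} to {J2:T1,J2:T3,J4:T1}, the adjacent dependent tasks of J1 are still J2 and J4, so the fourth time of J1 is not updated.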
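In practical applications, a user may add task scripts to and/or delete task scripts from the batch of task scripts, in which case the relevant data in the first data table and/or the second data table must be updated to keep the determined adjacency relationship sets accurate. In some embodiments, after the data of the task scripts has been written into the first data table and the second data table, the method further includes at least one of the following:
upon receiving a delete instruction for a third task script, deleting the data corresponding to the script identifier of the third task script from the first data table and the second data table, and deleting the adjacency relationships that contain the script identifier of the third task script;
upon receiving a newly added third task script, writing the script identifier of the third task script and the corresponding input item set and output item set, in association, into the first data table, and writing the script identifier of the third task script and the corresponding adjacency relationship set, in association, into the second data table.
Here, upon receiving a delete instruction for the third task script, the electronic device deletes from the first data table the script identifier of the third task script and the corresponding input item set, output item set, first time, and second time; and deletes from the second data table the script identifier of the third task script and the corresponding adjacency relationship set, third time, and fourth time, as well as the adjacency relationships that contain the script identifier of the third task script. For example, when the script identifier of the third task script is J3, the electronic device deletes J3, {T8}, {T5,T10}, and the first time and second time of J3 from the first data table; deletes J3, {J5:T8}, and the third time and fourth time of J3 from the second data table; and also deletes J3:T5 and J3:T10 from the second data table.
When the third task script is a newly added task script, the electronic device determines all input items and all output items of the third task script in the manner described above; writes the script identifier of the third task script and all of its input items and output items into the first data table; takes the time at which the last input item of the third task script is written as the corresponding first time and writes that first time into the first data table; and takes the time at which the last output item of the third task script is written as the corresponding second time and writes that second time into the first data table. The electronic device determines the adjacency relationships of the third task script in the manner described above and writes all adjacency relationships corresponding to the script identifier of the third task script into the second data table; it takes the time at which the last adjacency relationship of the third task script is written as both the corresponding third time and the corresponding fourth time, and writes that third time and that fourth time into the second data table.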
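The deletion case might be sketched in Python as follows. The tables are modeled as plain dictionaries, the time columns are left out, and the function name `delete_script` is hypothetical; the sketch only shows that both the script's own rows and every adjacency relationship naming the script are removed.

```python
def delete_script(table1, table2, script_id):
    """Remove a task script and every adjacency relationship that names it."""
    table1.pop(script_id, None)          # row in the first data table
    table2.pop(script_id, None)          # row in the second data table
    for rel in table2.values():
        # drop relationships such as J3:T5 from the other scripts' sets
        rel -= {r for r in rel if r[0] == script_id}

# Example: part of the second data table derived from the rows J1 to J5
table1 = {"J2": {}, "J3": {}, "J4": {}}
table2 = {
    "J2": {("J3", "T5"), ("J4", "T6")},
    "J3": {("J5", "T8")},
    "J4": {("J3", "T10"), ("J5", "T9")},
}
delete_script(table1, table2, "J3")
```

After the call, J2 keeps only J4:T6 and J4 keeps only J5:T9, which corresponds to deleting J3:T5 and J3:T10 in the example above.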
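In practical applications, a user may modify the input items and/or output items of a task script that has already been submitted, in which case the relevant data in the first data table and/or the second data table must be updated to keep the determined adjacency relationship sets accurate. In some embodiments, after the data of the task scripts has been written into the first data table and the second data table, the method further includes at least one of the following:
when the source code of the first task script changes, updating at least one of the following groups for the first task script in the first data table: the input item set and the first time, or the output item set and the second time;
when the third time of the first task script is earlier than its first time, updating at least one of the adjacency relationship set and the fourth time of the first task script in the second data table, based on the intersection between the updated input item set and the output item sets of the task scripts other than the first task script;
when the third time of the first task script is earlier than its second time, updating at least one of the adjacency relationship set and the fourth time of the corresponding adjacent dependent task in the second data table, based on the intersection between the updated output item set and the input item set of the adjacent dependent task of the first task script;
updating the third time of the first task script to the larger of its first time and its second time; wherein
the fourth time is updated when the adjacent dependent tasks in the updated adjacency relationship set have changed.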
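Here, the input item set and the first time form one group, and the output item set and the second time form another group. The electronic device compares the third time of the first task script with its first time to judge whether the corresponding input items in the first data table have changed, and compares the third time of the first task script with its second time to judge whether the corresponding output items in the first data table have changed. If the third time of the first task script is earlier than its first time, the corresponding input items in the first data table have changed; if the third time is equal to or later than the first time, they have not changed. If the third time of the first task script is earlier than its second time, the corresponding output items in the first data table have changed; if the third time is equal to or later than the second time, they have not changed.
When the electronic device detects that the source code of a first task script among the at least two task scripts has changed, it determines the input item set and output item set of the changed first task script in the manner described above. It compares the determined input item set with the input item set of the first task script in the first data table to judge whether the input items of the changed first task script have changed, and compares the determined output item set with the output item set of the first task script in the first data table to judge whether the output items have changed. When the input items and/or output items of the first task script have changed, at least one of the following is performed:
When the input items of the first task script have changed, the input item set and the first time corresponding to the script identifier of the first task script in the first data table are updated.
When the output items of the first task script have changed, the output item set and the second time corresponding to the script identifier of the first task script in the first data table are updated.
The first time is compared with the third time corresponding to the script identifier of the first task script in the second data table. Because the input item set and the first time in the first data table have been updated, the third time of the first task script is earlier than its first time. The electronic device therefore determines, in the manner described above, the updated adjacency relationship set of the first task script based on the intersection between the updated input item set and the output item sets of the task scripts other than the first task script, and compares the determined adjacency relationship set with the adjacency relationship set of the first task script in the second data table. If the two differ, the adjacency relationship set of the first task script has changed; the adjacency relationship set of the first task script in the second data table is then replaced by the determined adjacency relationship set, and the third time of the first task script is updated. Based on the two adjacency relationship sets of the first task script, the electronic device also judges whether the adjacent dependent tasks of the first task script have changed, and if so updates the fourth time of the first task script in the second data table. If the adjacency relationships in the adjacency relationship set of the first task script have not changed, the third time of the first task script in the second data table need not be updated. If the adjacent dependent tasks of the first task script have not changed, the fourth time of the first task script in the second data table need not be updated.
The second time is compared with the third time corresponding to the script identifier of the first task script in the second data table. Because the output item set and the second time in the first data table have been updated, the third time of the first task script is earlier than its second time. The electronic device therefore determines, in the manner described above, whether the updated output item set intersects the input item set of each task script in the first data table other than the first task script. If the updated output item set of the first task script intersects the input item set of any task script, that task script adjacently depends on the first task script; in other words, the first task script is an adjacent dependent task of that task script. The adjacency relationship represented by the first task script and the intersection is then updated into the adjacency relationship set of that task script, and, if the adjacent dependent tasks of that task script have changed, the fourth time of that task script in the second data table is updated.
If the first time of the first task script in the first data table is later than its second time, the third time of the first task script in the second data table is updated to the first time; if the second time is later than the first time, the third time is updated to the second time; if the second time is equal to the first time, the third time is updated to the first time or the second time.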
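The following takes a change to the source code of the task script with script identifier Jx as an example to explain how the relevant data in the second data table is updated:
Judge whether Cin and Cout of Jx in the first data table have changed, that is, judge whether the two conditions (Jx.rel_update_time≥Jx.in_update_time) and (Jx.rel_update_time≥Jx.out_update_time) are satisfied. Here, Jx.rel_update_time denotes the third time (rel_update_time) of Jx; Jx.in_update_time denotes the first time of Jx; and Jx.out_update_time denotes the second time of Jx.
If both conditions (Jx.rel_update_time≥Jx.in_update_time) and (Jx.rel_update_time≥Jx.out_update_time) are satisfied, neither Cin nor Cout of Jx in the first data table has changed, and the second data table need not be updated;
If Jx.rel_update_time<Jx.in_update_time, Cin of Jx in the first data table has changed, and Jx.rel={Jy:Tz} (Tz∈Jx.Cin∩Jy.Cout; y=1, 2, 3, …, n; y≠x) is updated. If the update of Jx.rel changes the adjacent dependent tasks of Jx, the fourth time of Jx is replaced by Jx.in_update_time. For example, when the adjacency relationship set of Jx changes from {J2:T1,J2:T2,J4:T1} to {J2:T1,J2:T2}, the adjacent dependent tasks of Jx have changed; when the adjacency relationship set of Jx changes from {J2:T1,J2:T2,J4:T1} to {J2:T1,J4:T1}, the adjacent dependent tasks of Jx have not changed, and the corresponding fourth time in the second data table need not be updated.
If Jx.rel_update_time<Jx.out_update_time, Cout of Jx in the first data table has changed. The electronic device updates the adjacency relationships Jy.rel of all task scripts in the second data table that adjacently depend on Jx, Jy.rel={Jx:Tz} (Tz∈Jy.Cin∩Jx.Cout; y=1, 2, 3, …, n; y≠x), and, if the update of Jy.rel changes the adjacent dependent tasks of Jy, replaces the fourth time of Jy by Jx.out_update_time.
The third time of Jx is replaced by the larger of Jx.in_update_time and Jx.out_update_time, that is, Jx.rel_update_time=MAX(Jx.in_update_time,Jx.out_update_time).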
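The timestamp comparison above can be sketched in Python as follows. Several points are assumptions made for illustration: times are plain integers, the fourth time is stored under the hypothetical key `dep_update_time`, and when Cout of Jx changes the sketch rechecks every other script rather than only those already depending on Jx. It is a sketch under these assumptions, not the implementation in the source.

```python
def deps(rel):
    """Script identifiers of the adjacent dependent tasks in a relationship set."""
    return {script_id for script_id, _table in rel}

def refresh(jx, table1, table2):
    """Update the second data table after the source code of Jx has changed."""
    row1, row2 = table1[jx], table2[jx]
    t_in, t_out = row1["in_update_time"], row1["out_update_time"]
    t_rel = row2["rel_update_time"]
    if t_rel >= t_in and t_rel >= t_out:
        return                                   # Cin and Cout unchanged
    if t_rel < t_in:                             # Cin changed: rebuild Jx.rel
        new_rel = {(jy, tz) for jy, y in table1.items() if jy != jx
                   for tz in row1["cin"] & y["cout"]}
        if deps(new_rel) != deps(row2["rel"]):
            row2["dep_update_time"] = t_in       # fourth time of Jx
        row2["rel"] = new_rel
    if t_rel < t_out:                            # Cout changed: fix the Jy.rel sets
        for jy, y in table1.items():
            if jy == jx:
                continue
            old = table2[jy]["rel"]
            new = ({r for r in old if r[0] != jx}
                   | {(jx, tz) for tz in y["cin"] & row1["cout"]})
            if deps(new) != deps(old):
                table2[jy]["dep_update_time"] = t_out   # fourth time of Jy
            table2[jy]["rel"] = new
    row2["rel_update_time"] = max(t_in, t_out)   # third time of Jx
```

Comparing only the script identifiers in `deps` reflects the rule that the fourth time changes when the set of adjacent dependent tasks changes, not when only the shared tables change.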
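Step 104: Determine, based on the adjacency relationship set determined for each first task script, at least one directed edge set corresponding to the at least two task scripts, and output the directed acyclic graph corresponding to each of the at least one directed edge set; wherein
a directed edge in the directed edge set represents the dependency between two task scripts.
Here, based on the adjacency relationships in the adjacency relationship set determined for each first task script, the electronic device determines a group of task scripts that have adjacency dependencies on one another, obtaining a first dependent task set; determines the corresponding directed edge from each adjacency relationship in the adjacency relationship set of each script identifier in the first dependent task set, thereby obtaining a first directed edge set made up of the determined directed edges; takes the script identifier whose in-degree is zero in the first directed edge set as the starting point of the directed acyclic graph corresponding to the first directed edge set; connects the script identifiers in the first dependent task set with directed connections according to the directed edges in the first directed edge set, obtaining the directed acyclic graph corresponding to the first directed edge set; and outputs that directed acyclic graph. Here,
the first directed edge set represents every two task scripts in the corresponding first dependent task set that have an adjacency dependency;
one adjacency relationship determines one directed edge, which points from the script identifier in the adjacency relationship (the script identifier of the adjacent dependent task) to the script identifier of the corresponding first task script. For example, when the adjacency relationship set of J1 is {J2:T1,J2:T2,J4:T1}, the directed edges determined from the adjacency relationship set of J1 are <J2,J1> and <J4,J1>.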
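The mapping from adjacency relationships to directed edges is small enough to show directly. In this Python sketch an adjacency relationship Jy:Tz is again a tuple `(Jy, Tz)` and a directed edge <Jy,Jx> is a tuple `(Jy, Jx)`; the function name `edges_from_relations` is hypothetical.

```python
def edges_from_relations(script_id, relations):
    """One edge per adjacent dependent task, pointing from it to script_id."""
    # Using a set merges J2:T1 and J2:T2 into the single edge <J2,J1>.
    return {(dep, script_id) for dep, _table in relations}

j1_rel = {("J2", "T1"), ("J2", "T2"), ("J4", "T1")}
j1_edges = edges_from_relations("J1", j1_rel)
```

For the example set of J1, the result contains exactly the two edges <J2,J1> and <J4,J1>.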
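The first dependent task set includes a first script identifier, second script identifiers, and third script identifiers. The first script identifier is the script identifier of the first task script in the second data table; a second script identifier is a script identifier contained in the adjacency relationship set corresponding to the script identifier of the first task script; and a third script identifier is a script identifier contained in the adjacency relationship set corresponding to a second script identifier.
The first directed edge set includes a first subset, a second subset, and a third subset.
The first subset is the directed edge set determined from the adjacency relationships in the adjacency relationship set of the first script identifier; the second subset is the directed edge set determined from the adjacency relationships in the adjacency relationship sets of the second script identifiers; and the third subset is the directed edge set determined from the adjacency relationships in the adjacency relationship sets of the third script identifiers.
All determined first dependent task sets are deduplicated to remove repeated sets. For each first dependent task set that remains, the directed edges are determined from every two task scripts that have a dependency in the corresponding first directed edge set, and the directed acyclic graph of that first dependent task set is output based on the determined directed edges. The number of directed acyclic graphs equals the number of first dependent task sets after deduplication. One directed edge represents the dependency between two task scripts.
Taking the data in the second data table above as an example, one first dependent task set and the corresponding first directed edge set are determined: the first dependent task set is {J1, J2, J3, J4, J5}, and the first directed edge set is {<J2,J1>,<J4,J1>,<J3,J2>,<J4,J2>,<J5,J3>,<J3,J4>,<J5,J4>}, where the first subset of the first directed edge set is {<J2,J1>,<J4,J1>}, the second subset is {<J3,J2>,<J4,J2>}, and the third subset is {<J5,J3>,<J3,J4>,<J5,J4>}. The directed acyclic graph output by the electronic device is shown in FIG. 3.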
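As a rough illustration of how the first dependent task set and the first directed edge set follow from the second data table, the Python sketch below walks the adjacency relationship sets breadth-first starting from J1. The traversal strategy, the name `dependent_tasks_and_edges`, and the dictionary `TABLE2` (adjacency relationship sets only, time columns omitted) are assumptions for the example.

```python
from collections import deque

# Adjacency relationship sets derived from the example rows J1 to J5
TABLE2 = {
    "J1": {("J2", "T1"), ("J2", "T2"), ("J4", "T1")},
    "J2": {("J3", "T5"), ("J4", "T6")},
    "J3": {("J5", "T8")},
    "J4": {("J3", "T10"), ("J5", "T9")},
    "J5": set(),
}

def dependent_tasks_and_edges(start, table2):
    """Collect every script reachable from start and the edges between them."""
    tasks, edges = {start}, set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep, _table in table2.get(current, ()):
            edges.add((dep, current))        # edge <dep, current>
            if dep not in tasks:
                tasks.add(dep)
                queue.append(dep)
    return tasks, edges
```

Starting from J1, the sketch reaches all five scripts and collects the seven directed edges listed in the example.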
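In some embodiments, after step 104, the method further includes:
determining, based on the output directed acyclic graph, the execution order of the at least two task scripts, and executing the at least two task scripts in the determined execution order.
Here, the electronic device determines the execution order of the task scripts in each directed acyclic graph from the directed edges between every two task scripts in that graph, and executes the batch of task scripts in the determined execution order.
It should be noted that, when the batch of task scripts corresponds to at least two directed acyclic graphs, the electronic device may execute the task scripts of different directed acyclic graphs in parallel.
In practical applications, the electronic device determines the execution order of the task scripts of a directed acyclic graph as follows:
1. Find, in the directed acyclic graph, the k-th target task script that does not depend on any other task script, and output the identifier of the k-th target task script;
2. Delete the k-th target task script;
3. If k is less than the total number of task scripts in the directed acyclic graph, set k to k+1 and perform 1;
4. If k is equal to the total number of task scripts in the directed acyclic graph, end.
FIG. 4 is a schematic diagram of determining the execution order of the task scripts based on the directed acyclic graph shown in FIG. 3; the execution order of the task scripts of that directed acyclic graph is J5, J3, J4, J2, J1.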
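The four numbered steps amount to a topological sort that repeatedly removes a script with no remaining dependencies. A minimal Python sketch, assuming edges are tuples `(dependency, dependent)` and using the hypothetical name `execution_order`, is given below; ties are broken alphabetically, which is a choice of this sketch and not something the source specifies.

```python
def execution_order(tasks, edges):
    """Repeatedly output and delete a task script that depends on no other."""
    remaining = set(tasks)
    live_edges = set(edges)
    order = []
    while remaining:
        blocked = {dst for _src, dst in live_edges}   # scripts still waiting
        ready = sorted(remaining - blocked)
        if not ready:
            raise ValueError("the graph contains a cycle")
        target = ready[0]                             # step 1: k-th target
        order.append(target)
        remaining.discard(target)                     # step 2: delete it
        live_edges = {(s, d) for s, d in live_edges if s != target}
    return order

EDGES = {
    ("J2", "J1"), ("J4", "J1"), ("J3", "J2"), ("J4", "J2"),
    ("J5", "J3"), ("J3", "J4"), ("J5", "J4"),
}
ORDER = execution_order({"J1", "J2", "J3", "J4", "J5"}, EDGES)
```

For the seven example edges only one script is ready at each round, so the order is unique.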
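In the embodiments of the present application, the input items and output items of each task script are determined from the source code of each task script in a batch of task scripts; the adjacency relationship set of a first task script is determined based on the intersection between the input items of the first task script and the output items of a second task script; based on the adjacency relationship set determined for each first task script, at least one directed edge set corresponding to the batch of task scripts is determined, and the directed acyclic graph corresponding to each of the at least one directed edge set is output. In this solution, the dependencies between task scripts can be determined from the source code of the task scripts alone, without manual configuration, which improves the efficiency of determining the dependencies and reduces the error rate. In addition, because the directed edge between two task scripts in the directed acyclic graph represents the execution order of those two task scripts, the electronic device can accurately determine the execution order of the batch of task scripts from the directed acyclic graph.
Furthermore, because the directed acyclic graph accurately reflects the dependencies between task scripts, determining the execution order from the directed acyclic graph improves both the accuracy and the efficiency of determining the execution order, and executing the task scripts in an accurate order ensures the accuracy of the execution results.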
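In practical applications, when the electronic device generates the directed acyclic graph corresponding to the first directed edge set, it may write the script identifier, the corresponding dependent task set, the corresponding directed edge set, and a fifth time representing a change to the directed acyclic graph, in association, into a third data table. Then, when the dependent task set and directed edge set of a batch of task scripts obtained later are both identical to a dependent task set and the corresponding directed edge set stored in the third data table, the corresponding directed acyclic graph in the database can be output without being regenerated, which improves the efficiency of outputting directed acyclic graphs.
The fifth time is the time at which the directed acyclic graph corresponding to the directed edge set was generated or updated.
By way of example, the third data table obtained from the data of the second data table is as follows: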
Figure PCTCN2021140176-appb-000002
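It should be noted that, in some embodiments, the dependent task set in the third data table may represent the execution order of the task scripts; that is, the dependent task set of J1 in the third data table may be [J5, J3, J4, J2, J1].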
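On the basis of writing the script identifier, the corresponding dependent task set, the corresponding directed edge set, and the fifth time representing a change to the directed acyclic graph, in association, into the third data table, as shown in FIG. 5, in some embodiments, determining the at least one directed edge set corresponding to the at least two task scripts based on the adjacency relationship set determined for each first task script, and outputting the directed acyclic graph corresponding to each of the at least one directed edge set, includes:
Step 501: Search the script identifiers stored in the third data table for a first script identifier among the script identifiers stored in the second data table; the third data table is used to store, in association, script identifiers, dependent task sets, directed edge sets, and fifth times representing changes to the directed acyclic graphs.
Here, after determining the adjacency relationship set of each task script and writing the script identifier and adjacency relationships of each task script, in association, into the second data table, the electronic device searches the script identifiers stored in the third data table for the first script identifier among the script identifiers stored in the second data table.
If the first script identifier of the second data table is found in the third data table, the directed acyclic graph corresponding to the directed edge set of the first script identifier has already been generated, and step 502 is performed. If the first script identifier of the second data table is not found in the third data table, that directed acyclic graph has not yet been generated, the third data table contains no dependent task set or directed edge set for the first script identifier, and step 504 is performed.
Step 502: If the first script identifier is found in the third data table, check whether the fifth time of the first script identifier in the third data table is equal to or later than the corresponding fourth time.
Here, if the fifth time of the first script identifier in the third data table is equal to or later than the corresponding fourth time, the directed acyclic graph corresponding to the directed edge set of the first script identifier was generated after the adjacency dependencies represented by the adjacency relationship set of the first script identifier were last changed; the directed acyclic graph is up to date, and step 503 is performed.
If the fifth time of the first script identifier in the third data table is earlier than the corresponding fourth time, the directed acyclic graph stored in the database for the directed edge set of the first script identifier was generated before the adjacency dependencies represented by the adjacency relationship set of the first script identifier were changed; the directed acyclic graph must be regenerated, and step 504 is performed.
Step 503: If the fifth time of the first script identifier is equal to or later than the corresponding fourth time, output the directed acyclic graph stored in the database in association with the first script identifier.
Here, the electronic device obtains from the database the directed acyclic graph corresponding to the directed edge set of the first script identifier and outputs the obtained directed acyclic graph.
Step 504: If the first script identifier is not found in the third data table, or the fifth time of the first script identifier is earlier than the corresponding fourth time, update the dependent task set and directed edge set of the first script identifier in the third data table based on the adjacency relationship set determined for each first task script, and output the corresponding directed acyclic graph based on the updated dependent task set and directed edge set of the first script identifier.
Here, if the first script identifier is not found in the third data table, the corresponding dependent task set and directed edge set are determined in the manner described above from the adjacency relationship set determined for each first task script; the corresponding directed acyclic graph is generated from the determined dependent task set and directed edge set; and the first script identifier, the corresponding dependent task set, the corresponding directed edge set, and the time at which the directed acyclic graph was generated are written, in association, into the third data table. For the generation of the directed acyclic graph, refer to the description of step 104; details are not repeated here.
If the first script identifier is found in the third data table and its fifth time is earlier than the corresponding fourth time, the dependent task set and directed edge set of the first script identifier in the third data table are updated, the corresponding directed acyclic graph is regenerated from the updated dependent task set and directed edge set, and the fifth time of the first script identifier in the third data table is updated to the time at which the directed acyclic graph was regenerated.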
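The following takes J1 as the first script identifier to explain how the corresponding directed acyclic graph is output based on the second data table and the third data table:
The electronic device searches the script identifiers stored in the third data table for the script identifier J1 of the second data table. If J1 is found among the script identifiers stored in the third data table, the directed acyclic graph corresponding to the dependent task set of J1 has already been generated. The electronic device then judges whether the fifth time of J1 is equal to or later than the corresponding fourth time and, if it is, outputs the directed acyclic graph stored in the database in association with J1.
If the fifth time of J1 is earlier than the corresponding fourth time, the dependent task set and directed edge set of J1 in the third data table must be updated, and the corresponding directed acyclic graph is output based on the updated dependent task set and directed edge set of J1;
If J1 is not found among the script identifiers stored in the third data table, the directed acyclic graph corresponding to the dependent task set of J1 has not yet been generated. In this case, the dependent task set and directed edge set of J1 are determined from the adjacency relationship sets of the script identifiers stored in the second data table; the corresponding directed acyclic graph is generated from the dependent task set and directed edge set of J1; and J1, the corresponding dependent task set, the corresponding directed edge set, and the fifth time at which the directed acyclic graph was generated are written, in association, into the third data table.
In this embodiment, the script identifiers recorded in the third data table can be used to judge whether the corresponding directed acyclic graph already exists in the database, and the fifth time recorded in the third data table can be used to judge whether the directed acyclic graph stored in the database is up to date. When the database contains no corresponding directed acyclic graph, or the stored graph is not up to date, the directed acyclic graph is generated; when the database stores an up-to-date directed acyclic graph, it can be output directly without being regenerated, which improves the efficiency of outputting directed acyclic graphs.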
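The reuse-or-rebuild decision of steps 501 to 504 might be sketched in Python as follows. The key names `dep_update_time` (fourth time) and `dag_update_time` (fifth time), the `build` callback, and the explicit `now` argument are assumptions introduced for the example; the directed edge set stands in for the stored directed acyclic graph.

```python
def output_dag(script_id, table2, table3, build, now):
    """Return (edges, rebuilt): reuse the stored graph when it is still current."""
    row3 = table3.get(script_id)                               # step 501
    if row3 is not None and row3["dag_update_time"] >= table2[script_id]["dep_update_time"]:
        return row3["edges"], False                            # steps 502 and 503
    tasks, edges = build(script_id)                            # step 504
    table3[script_id] = {"tasks": tasks, "edges": edges, "dag_update_time": now}
    return edges, True

# Usage with a stub builder
table2 = {"J1": {"dep_update_time": 3}}
table3 = {}
stub = lambda script_id: ({"J1", "J2"}, {("J2", "J1")})
first = output_dag("J1", table2, table3, stub, now=4)    # not cached yet: rebuilt
second = output_dag("J1", table2, table3, stub, now=5)   # fifth time 4 >= fourth time 3
table2["J1"]["dep_update_time"] = 9                      # dependencies changed later
third = output_dag("J1", table2, table3, stub, now=10)   # stale: rebuilt
```

Passing the clock value in as `now` keeps the sketch deterministic; a real implementation would presumably read the current time when the graph is generated.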
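FIG. 6 is a schematic flowchart of updating the dependent task set and the directed edge set in the data processing method provided by an application embodiment of the present application. As shown in FIG. 6, in step 504, updating the dependent task set and directed edge set of the first script identifier in the third data table based on the adjacency relationship set determined for each first task script includes:
Step 601: When the first script identifier, the corresponding first dependent task set, and the corresponding first directed edge set have been written, in association, into the third data table, add the directed edge set determined from the adjacency relationship set of the first script identifier to the corresponding first directed edge set and deduplicate; the first dependent task set includes the first script identifier.
Here, the corresponding directed edge set is determined from the adjacency relationship set of the first script identifier, and the directed edge set of the first script identifier is added to the first directed edge set.
For how the directed edge set is determined from an adjacency relationship set, refer to the description of step 104; details are not repeated here.
If the first script identifier is found in the third data table and its fifth time is earlier than the corresponding fourth time, the electronic device deduplicates the first directed edge set after adding to it the directed edge set determined from the adjacency relationship set of the first script identifier.
If the first script identifier is not found in the third data table, the electronic device does not need to deduplicate the first directed edge set after adding to it the directed edge set determined from the adjacency relationship set of the first script identifier.
In practical applications, if the first script identifier is not found in the third data table, the method further includes, before step 601:
writing the first script identifier, the corresponding first dependent task set, and the corresponding first directed edge set, in association, into the third data table, where the first dependent task set and the first directed edge set are both empty sets;
adding the first script identifier to the first dependent task set.
Here, if the first script identifier is not found in the third data table, the electronic device writes the first script identifier into the position of the third data table used to record script identifiers and creates the corresponding first dependent task set and first directed edge set in the third data table; at this point both sets are empty. The first script identifier is added to the created first dependent task set; the corresponding directed edge set is determined from the adjacency relationship set of the first script identifier and added to the first directed edge set.
After performing step 601, the electronic device processes the script identifiers contained in the adjacency relationship set of the first script identifier in the second data table according to steps 602 to 608.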
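Step 602: Judge whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier in the second data table.
Here, i is a positive integer, and i is less than or equal to the total number of script identifiers in the adjacency relationship set of the second data table in which the i-th script identifier is located.
In practical applications, i is equal to 1 the first time step 602 is performed.
If the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier in the second data table, judge whether i is less than the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located.
If i is less than the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, step 603 is performed.
If the first dependent task set does not contain the i-th script identifier in the adjacency relationship set corresponding to the first script identifier in the second data table, step 604 is performed.
Step 603: If the first dependent task set contains the i-th script identifier and i is less than the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, set i to i+1 and perform the step of judging whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier.
Here, if the first dependent task set contains the i-th script identifier corresponding to the first script identifier, and i is less than the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, the directed edge set determined from the adjacency relationship set in which the i-th script identifier is located has already been added to the corresponding first directed edge set; i is then set to i+1, and the flow returns to step 602.
If i is equal to the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, or the adjacency relationship set corresponding to the i-th script identifier is an empty set, the loop is exited, or step 602 is performed for the next script identifier in the adjacency relationship set in which the corresponding first script identifier is located.
Specifically, if i is equal to the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, judge whether the adjacency relationship set of the second data table in which the corresponding first script identifier is located contains a next script identifier.
If the adjacency relationship set in which the corresponding first script identifier is located contains a next script identifier, step 602 is performed for that next script identifier. If it does not, judge whether the adjacency relationship set in which the first script identifier corresponding to that adjacency relationship set is located contains a next script identifier; if it does not, the loop is exited; if it does, step 602 is performed for that next script identifier.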
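Step 604: If the first dependent task set does not contain the i-th script identifier, judge whether the script identifiers stored in the third data table include the i-th script identifier.
Here, if the script identifiers stored in the third data table include the i-th script identifier corresponding to the first script identifier, the directed acyclic graph corresponding to the directed edge set of that i-th script identifier has already been generated, and step 605 is performed. If the script identifiers stored in the third data table do not include the i-th script identifier corresponding to the first script identifier, that directed acyclic graph has never been generated, and steps 607 to 608 are performed.
Step 605: If the script identifiers stored in the third data table include the i-th script identifier, judge whether the fifth time of the i-th script identifier is equal to or later than the corresponding fourth time.
If the fifth time of the i-th script identifier corresponding to the first script identifier is equal to or later than the corresponding fourth time, the dependent task set and directed edge set of that i-th script identifier in the third data table are both up to date, and step 606 is performed.
If the fifth time of the i-th script identifier corresponding to the first script identifier is earlier than the corresponding fourth time, the dependent task set and directed edge set of that i-th script identifier in the third data table must be updated so that the corresponding directed acyclic graph can be regenerated, and steps 607 to 608 are performed.
Step 606: If the fifth time of the i-th script identifier is equal to or later than the corresponding fourth time, add the i-th script identifier to the first dependent task set and deduplicate, and add the directed edge set of the i-th script identifier in the third data table to the first directed edge set and deduplicate; set i to i+1 and perform step 602.
Here, the electronic device adds the i-th script identifier corresponding to the first script identifier in the second data table to the first dependent task set of the first script identifier in the third data table and deduplicates the first dependent task set; it reads from the third data table the directed edge set of the i-th script identifier corresponding to the first script identifier, adds the directed edge set so read to the first directed edge set of the first script identifier, and deduplicates the first directed edge set.
If i is less than the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, the directed edge set determined from the adjacency relationship set in which the i-th script identifier is located has already been added to the corresponding first directed edge set; i is then set to i+1, and the flow returns to step 602.
If i is equal to the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, the loop is exited, or step 602 is performed for the next script identifier in the adjacency relationship set of the first script identifier that corresponds to the adjacency relationship set in which the i-th script identifier is located.
Step 607: If the script identifiers stored in the third data table do not include the i-th script identifier, or the fifth time of the i-th script identifier is earlier than the corresponding fourth time, add the i-th script identifier to the first dependent task set and deduplicate, and add the directed edge set determined from the adjacency relationship set of the i-th script identifier to the first directed edge set and deduplicate.
Here, if the script identifiers stored in the third data table do not include the i-th script identifier corresponding to the first script identifier, or the fifth time of the i-th script identifier is earlier than the corresponding fourth time, the i-th script identifier corresponding to the first script identifier in the second data table is added to the first dependent task set of the first script identifier in the third data table, and the first dependent task set is deduplicated; the adjacency relationship set of the i-th script identifier is read from the second data table, the directed edge set corresponding to that adjacency relationship set is determined, the determined directed edge set is added to the first directed edge set, and the first directed edge set is deduplicated.
Step 608: Treat the i-th script identifier as the first script identifier, and perform the step of judging whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier in the second data table.
After performing step 607, the electronic device treats the i-th script identifier corresponding to the first script identifier as the first script identifier and returns to step 602, so that each script identifier contained in the adjacency relationship set, in the second data table, of that i-th script identifier is processed according to steps 602 to 608.
It should be noted that, if i is equal to the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, or the adjacency relationship set corresponding to the i-th script identifier is an empty set, the loop is exited, or step 602 is performed for the next script identifier in the adjacency relationship set in which the corresponding first script identifier is located; for details, refer to the description of step 603, which is not repeated here.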
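The following takes J1 as the first script identifier to explain how, when the first script identifier is not found in the third data table or its fifth time is earlier than the corresponding fourth time, the dependent task set and directed edge set of the first script identifier in the third data table are updated based on the adjacency relationship set determined for each first task script:
The electronic device searches the script identifiers stored in the third data table for the script identifier J1 of the second data table.
If the first script identifier is not found in the third data table, J1 is taken from the second data table and written into the third data table, and a first dependent task set Ls and a first directed edge set E corresponding to J1 are created in the third data table; at this point both Ls and E of J1 are empty sets. J1 is added to Ls. From the adjacency relationship set J1.rel={J2:T1,J2:T2,J4:T1} of J1 in the second data table, the directed edge set E1={<J2,J1>,<J4,J1>} of J1 is determined, and E1 is added to E.
If J1 is found among the script identifiers stored in the third data table and the fifth time of J1 is earlier than the corresponding fourth time, the directed acyclic graph corresponding to the dependent task set of J1 has already been generated, and neither the first dependent task set Ls nor the first directed edge set E of J1 in the third data table is empty. The electronic device adds the directed edge set E1 of J1 to E of J1 in the third data table and deduplicates E.
The elements of the dependent task set and directed edge set of the first script identifier in the third data table are determined in a similar way whether or not the first script identifier is found in the third data table. For ease of description, the following assumes that J1 is not found in the third data table and explains how, after E1 of J1 has been added to the corresponding E, the script identifiers J2 and J4 in the adjacency relationship set of J1 in the second data table are each processed according to steps 602 to 608:
Judge whether the first dependent task set Ls of J1 contains J2, the first script identifier in the adjacency relationship set of J1, obtaining a first judgment result.
Because J1 was not found in the third data table, Ls of J1 in the third data table contains only J1, and the first judgment result indicates that Ls does not contain J2. Step 604 is therefore performed to judge whether the script identifiers stored in the third data table include J2, obtaining a second judgment result.
Because J1 was not found in the third data table, the second judgment result indicates that the third data table does not include J2. Step 607 is therefore performed: J2 is added to Ls of J1; from the adjacency relationship set J2.rel={J3:T5,J4:T6} of J2 in the second data table, the directed edge set E2={<J3,J2>,<J4,J2>} of J2 is determined; E2 is added to E of J1, and E of J1 is deduplicated. Step 608 is then performed, and J3 and J4 in the adjacency relationship set J2.rel of J2 are each processed according to steps 602 to 608.
Here, when step 602 is performed, it is judged whether Ls of J1 contains J3, the first script identifier in the adjacency relationship set of J2, obtaining a third judgment result.
Because J1 was not found in the third data table, Ls of J1 does not contain J3, and the third judgment result indicates this. Step 604 is therefore performed to judge whether the script identifiers stored in the third data table include J3, obtaining a fourth judgment result.
Because J1 was not found in the third data table, the third data table does not contain J3, and the fourth judgment result indicates that the script identifiers stored in the third data table do not include J3. Step 607 is performed: J3 is added to Ls of J1, and Ls of J1 is deduplicated; from the adjacency relationship set J3.rel={J5:T8} of J3 in the second data table, the directed edge set E3={<J5,J3>} of J3 is obtained; E3 is added to E of J1, and E of J1 is deduplicated. Step 608 is then performed, and J5 in the adjacency relationship set J3.rel of J3 is processed according to steps 602 to 608.
Here, when step 602 is performed, it is judged whether Ls of J1 contains J5, the first script identifier in the adjacency relationship set of J3, obtaining a fifth judgment result. The fifth judgment result indicates that Ls of J1 does not contain J5, so step 604 is performed to judge whether the script identifiers stored in the third data table include J5. The judgment result indicates that they do not, so step 607 is performed: J5 is added to Ls of J1 and Ls is deduplicated, and the directed edge set determined from the adjacency relationship set J5.rel={} of J5 in the second data table is the empty set. Because J5.rel is empty, the processing of J5 ends, and the next script identifier in the adjacency relationship set in which J3 (the first script identifier corresponding to the adjacency relationship set in which J5 is located) is located is processed according to steps 602 to 608.
Here, the adjacency relationship set in which J3 is located is J2.rel={J3:T5,J4:T6}, and the next script identifier in J2.rel is J4, so J4 is processed according to steps 602 to 608.
When step 602 is performed for J4, it is judged whether Ls of J1 contains J4. Because Ls of J1 does not contain J4, step 604 is performed to judge whether the script identifiers stored in the third data table include J4. Because they do not, step 607 is performed: J4 is added to Ls of J1 and Ls is deduplicated; from the adjacency relationship set J4.rel={J3:T10,J5:T9} of J4, the directed edge set E4={<J3,J4>,<J5,J4>} of J4 is determined; E4 is added to E of J1 and E is deduplicated. Step 608 is performed: J4 is treated as the first script identifier, and the flow returns to step 602 so that J3 and J5 in the adjacency relationship set of J4 in the second data table are processed according to steps 602 to 608.
When step 602 is performed, it is judged whether Ls of J1 contains J3, the first script identifier in the adjacency relationship set of J4. Because Ls of J1 contains J3, step 603 is performed. Because the adjacency relationship set of J4 contains two script identifiers, i is set to i+1 and step 602 is performed to judge whether Ls of J1 contains J5, the second script identifier in the adjacency relationship set of J4. Because Ls of J1 contains J5, step 603 is performed. Because i is equal to the total number of script identifiers in the adjacency relationship set in which J5 is located, step 602 is to be performed for the next script identifier in J2.rel, the adjacency relationship set in which J4 (the first script identifier corresponding to the adjacency relationship set in which J5 is located) is located. Because J4 is the last script identifier in J2.rel, step 602 is to be performed for the next script identifier in J1.rel, the adjacency relationship set in which J2 (the first script identifier corresponding to the adjacency relationship set in which J4 is located) is located. Because J4 is the last script identifier in J1.rel, the loop is exited, and Ls and E of J1 are output. At this point, Ls of J1 is [J5,J3,J4,J2,J1], and E of J1 is {<J2,J1>,<J4,J1>,<J3,J2>,<J4,J2>,<J5,J3>,<J3,J4>,<J5,J4>}.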
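The walk-through above is essentially a depth-first traversal with reuse of cached results. The Python sketch below is one possible reading of steps 601 to 608, not the source's implementation: it uses recursion in place of the explicit counter i, stores the fourth and fifth times under the hypothetical keys `dep_update_time` and `dag_update_time`, and, as an added assumption, merges the cached dependent task set together with the cached directed edge set in step 606. It does not reproduce the ordered list form of Ls.

```python
def collect(first_id, table2, table3, tasks=None, edges=None):
    """Build the dependent task set and directed edge set for first_id."""
    if tasks is None:
        tasks, edges = {first_id}, set()
    rel = table2[first_id]["rel"]
    edges |= {(dep, first_id) for dep, _table in rel}            # step 601
    for dep in sorted({d for d, _table in rel}):                 # i = 1, 2, ...
        if dep in tasks:                                         # steps 602 and 603
            continue
        tasks.add(dep)
        row3 = table3.get(dep)                                   # step 604
        if row3 is not None and row3["dag_update_time"] >= table2[dep]["dep_update_time"]:
            tasks |= row3["tasks"]      # assumption: also merge the cached task set
            edges |= row3["edges"]                               # step 606
        else:
            collect(dep, table2, table3, tasks, edges)           # steps 607 and 608
    return tasks, edges

TABLE2 = {
    "J1": {"rel": {("J2", "T1"), ("J2", "T2"), ("J4", "T1")}, "dep_update_time": 0},
    "J2": {"rel": {("J3", "T5"), ("J4", "T6")}, "dep_update_time": 0},
    "J3": {"rel": {("J5", "T8")}, "dep_update_time": 0},
    "J4": {"rel": {("J3", "T10"), ("J5", "T9")}, "dep_update_time": 0},
    "J5": {"rel": set(), "dep_update_time": 0},
}
LS, E = collect("J1", TABLE2, {})
```

With an empty third data table the sketch visits J2, J3, J5, and J4 in the same order as the walk-through and ends with the five scripts and seven directed edges given for J1.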
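To implement the method of the embodiments of the present application, an embodiment of the present application further provides an electronic device. As shown in FIG. 7, the electronic device includes:
an extraction unit 71, configured to extract, from the source code of each of at least two received task scripts, a first text that corresponds to the task script and is described in Structured Query Language;
a first determining unit 72, configured to determine the input items and output items of each task script from the abstract syntax tree corresponding to each extracted first text;
a second determining unit 73, configured to determine the adjacency relationship set of a first task script based on the intersection between the input items of the first task script and the output items of each of at least one second task script;
an output unit 74, configured to determine, based on the adjacency relationship set determined for each first task script, at least one directed edge set corresponding to the at least two task scripts, and to output the directed acyclic graph corresponding to each of the at least one directed edge set; wherein
the first task script and the second task script are different task scripts among the at least two task scripts; an adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection; and a directed edge in the directed edge set represents the dependency between two task scripts.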
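In some embodiments, the first determining unit 72 is specifically configured to:
write the script identifier of each task script and the corresponding input item set and output item set, in association, into a first data table;
determine the adjacency relationships of the first task script based on the intersection between the input item set of the first task script and the output item set of the second task script in the first data table;
write the script identifier of the first task script and the adjacency relationships so determined, in association, into a second data table; the second data table is used to store script identifiers and adjacency relationship sets in association.
In some embodiments, the electronic device further includes a first updating unit, configured to perform at least one of the following:
upon receiving a delete instruction for a third task script, deleting the data corresponding to the script identifier of the third task script from the first data table and the second data table, and deleting the adjacency relationships that contain the script identifier of the third task script;
upon receiving a newly added third task script, writing the script identifier of the third task script and the corresponding input item set and output item set, in association, into the first data table, and writing the script identifier of the third task script and the corresponding adjacency relationship set, in association, into the second data table.
In some embodiments, the first data table further includes a first time representing a change to the input item set and a second time representing a change to the output item set; the second data table further includes a third time representing a change to the adjacency relationships and a fourth time representing a change to the adjacent dependent tasks; and the electronic device further includes a second updating unit, configured to perform at least one of the following:
when the source code of the first task script changes, updating at least one of the following groups for the first task script in the first data table: the input item set and the first time, or the output item set and the second time;
when the third time of the first task script is earlier than its first time, updating at least one of the adjacency relationship set and the fourth time of the first task script in the second data table, based on the intersection between the updated input item set and the output item sets of the task scripts other than the first task script;
when the third time of the first task script is earlier than its second time, updating at least one of the adjacency relationship set and the fourth time, in the second data table, of the task scripts that adjacently depend on the first task script, based on the intersection between the updated output item set and the input item set of the adjacent dependent task of the first task script;
updating the third time of the first task script to the larger of its first time and its second time; wherein the fourth time is updated when the adjacent dependent tasks in the updated adjacency relationship set have changed.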
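In some embodiments, the output unit 74 is specifically configured to:
search the script identifiers stored in the third data table for a first script identifier among the script identifiers stored in the second data table; the third data table is used to store, in association, script identifiers, dependent task sets, directed edge sets, and fifth times representing changes to the directed acyclic graphs;
if the first script identifier is found in the third data table, check whether the fifth time of the first script identifier in the third data table is equal to or later than the corresponding fourth time;
if the fifth time of the first script identifier is equal to or later than the corresponding fourth time, output the directed acyclic graph stored in the database in association with the first script identifier;
if the first script identifier is not found in the third data table, or the fifth time of the first script identifier is earlier than the corresponding fourth time, update the dependent task set and directed edge set of the first script identifier in the third data table based on the adjacency relationship set determined for each first task script, and output the corresponding directed acyclic graph based on the updated dependent task set and directed edge set of the first script identifier.
In some embodiments, if the first script identifier is not found in the third data table, or the fifth time of the first script identifier is earlier than the corresponding fourth time, the output unit 74 is specifically configured to:
when the first script identifier, the corresponding first dependent task set, and the corresponding first directed edge set have been written, in association, into the third data table, add the directed edge set determined from the adjacency relationship set of the first script identifier to the corresponding first directed edge set and deduplicate; the first dependent task set includes the first script identifier;
judge whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier;
if the first dependent task set contains the i-th script identifier and i is less than the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, set i to i+1 and perform the step of judging whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier;
if the first dependent task set does not contain the i-th script identifier, judge whether the script identifiers stored in the third data table include the i-th script identifier;
if the script identifiers stored in the third data table include the i-th script identifier, judge whether the fifth time of the i-th script identifier is equal to or later than the corresponding fourth time;
if the fifth time of the i-th script identifier is equal to or later than the corresponding fourth time, add the i-th script identifier to the first dependent task set and deduplicate, and add the directed edge set of the i-th script identifier in the third data table to the first directed edge set and deduplicate; set i to i+1 and perform the step of judging whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier;
if the script identifiers stored in the third data table do not include the i-th script identifier, or the fifth time of the i-th script identifier is earlier than the corresponding fourth time, add the i-th script identifier to the first dependent task set and deduplicate, and add the directed edge set determined from the adjacency relationship set of the i-th script identifier to the first directed edge set and deduplicate;
treat the i-th script identifier as the first script identifier, and perform the step of judging whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier; wherein
if i is equal to the total number of script identifiers in the adjacency relationship set in which the i-th script identifier is located, or the adjacency relationship set corresponding to the i-th script identifier is an empty set, the loop is exited, or the step of judging whether the first dependent task set contains the i-th script identifier in the adjacency relationship set corresponding to the first script identifier is performed for the next script identifier in the adjacency relationship set in which the corresponding first script identifier is located.
In some embodiments, if the first script identifier is not found in the third data table, the output unit 74 is further configured to:
write the first script identifier, the corresponding first dependent task set, and the corresponding first directed edge set, in association, into the third data table, where the first dependent task set and the first directed edge set are both empty sets;
add the first script identifier to the first dependent task set.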
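In practical applications, each unit included in the electronic device can be implemented by a processor in the electronic device, such as a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), a microcontroller unit (MCU, Microcontroller Unit), or a field-programmable gate array (FPGA, Field-Programmable Gate Array).
It should be noted that, when the electronic device provided in the above embodiment performs data processing, the division into the above program modules is only an example. In practical applications, the processing can be assigned to different program modules as needed; that is, the internal structure of the device can be divided into different program modules to complete all or part of the processing described above. In addition, the electronic device provided in the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Based on the hardware implementation of the above program modules, and to implement the method of the embodiments of the present application, an embodiment of the present application further provides an electronic device. FIG. 8 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of the present application. As shown in FIG. 8, the electronic device 8 includes:
a communication interface 81, capable of exchanging information with other devices such as network devices;
a processor 82, connected to the communication interface 81 to exchange information with other devices, and configured to execute the data processing method provided by one or more of the above technical solutions when running a computer program. The computer program is stored in the memory 83.
Of course, in practical applications, the components of the electronic device 8 are coupled together through a bus system 84. It can be understood that the bus system 84 is used to realize connection and communication between these components. In addition to a data bus, the bus system 84 also includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 84 in FIG. 8.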
本申请实施例中的存储器83配置为存储各种类型的数据以支持电子设备8的操作。这些数据的示例包括:配置为在电子设备8上操作的任何计算机程序。
It can be understood that the memory 83 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory 83 described in the embodiments of this application is intended to include, without being limited to, these and any other suitable types of memory.
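The methods disclosed in the above embodiments of this application may be applied to, or implemented by, the processor 82. The processor 82 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 82 or by instructions in the form of software. The processor 82 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 82 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, the storage medium is located in the memory 83, and the processor 82 reads the program in the memory 83 and completes the steps of the foregoing method in combination with its hardware.
Optionally, when executing the program, the processor 82 implements the corresponding processes implemented by the terminal in the methods of the embodiments of this application; for brevity, details are not repeated here.
In an exemplary embodiment, an embodiment of this application further provides a storage medium, namely a computer storage medium, specifically a computer-readable storage medium, for example including the memory 83 storing a computer program, where the computer program can be executed by the processor 82 of the terminal to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM.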
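In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may all be integrated into one processing module, or each unit may serve separately as one unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.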
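It should be noted that "first", "second", and the like are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.
It should be noted that the technical solutions described in the embodiments of this application may be combined arbitrarily provided that they do not conflict.
It should be noted that the term "and/or" in the embodiments of this application merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B may mean three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality. For example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
The above is only a specific implementation of this application, but the protection scope of this application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.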

Claims (10)

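  1. A data processing method, comprising:
    extracting, from the source code of each task script among at least two received task scripts, a first text corresponding to each task script and described in structured query language;
    determining, from the abstract syntax tree corresponding to each extracted first text, the input items and output items corresponding to each task script;
    determining, based on the intersection between the input items corresponding to a first task script and the output items corresponding to each second task script of at least one second task script, the adjacency relationship set corresponding to the first task script;
    determining, based on the determined adjacency relationship set corresponding to each first task script, at least one directed edge set corresponding to the at least two task scripts, and outputting the directed acyclic graph corresponding to each directed edge set in the at least one directed edge set; wherein,
    the first task script and the second task script are different task scripts among the at least two task scripts; an adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection; and a directed edge in the directed edge set represents the dependency between two task scripts.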
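  2. The method according to claim 1, wherein the determining, based on the intersection between the input items corresponding to a first task script and the output items corresponding to each second task script of at least one second task script, the adjacency relationship set corresponding to the first task script comprises:
    writing the script identifier of each task script and the corresponding input item set and output item set in association into a first data table;
    determining, based on the intersection between the input item set corresponding to the first task script and the output item set corresponding to the second task script in the first data table, the adjacency relationship corresponding to the first task script;
    writing the script identifier of the first task script and the correspondingly determined adjacency relationship in association into a second data table; wherein the second data table is used to store script identifiers and adjacency relationship sets in association.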
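  3. The method according to claim 2, wherein the method further comprises at least one of the following:
    in the case where a deletion instruction for a third task script is received, deleting the data corresponding to the script identifier of the third task script from the first data table and the second data table, and deleting the adjacency relationships that contain the script identifier of the third task script;
    in the case where a newly added third task script is received, writing the script identifier of the third task script and the corresponding input item set and output item set in association into the first data table, and writing the script identifier of the third task script and the corresponding adjacency relationship set in association into the second data table.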
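  4. The method according to claim 2, wherein the first data table further includes a first time representing when the input item set was changed and a second time representing when the output item set was changed; the second data table further includes a third time representing when the adjacency relationship was changed and a fourth time representing when the adjacent dependent tasks were changed; and the method further comprises at least one of the following:
    in the case where the source code of the first task script is changed, updating at least one of the following groups corresponding to the first task script in the first data table: the input item set and the first time, or the output item set and the second time;
    in the case where the third time corresponding to the first task script is earlier than the corresponding first time, updating, based on the intersection between the corresponding updated input item set and the output item sets corresponding to the task scripts other than the first task script, at least one of the adjacency relationship set and the fourth time corresponding to the first task script in the second data table;
    in the case where the third time corresponding to the first task script is earlier than the corresponding second time, updating, based on the intersection between the corresponding updated output item set and the input item set corresponding to each adjacent dependent task of the first task script, at least one of the adjacency relationship set and the fourth time that correspond, in the second data table, to each task script adjacently dependent on the first task script;
    updating the third time corresponding to the first task script to the maximum of the corresponding first time and the corresponding second time; wherein,
    the fourth time is updated in the case where the adjacent dependent tasks in the corresponding updated adjacency relationship set have changed.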
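  5. The method according to claim 4, wherein the determining, based on the determined adjacency relationship set corresponding to each first task script, at least one directed edge set corresponding to the at least two task scripts, and outputting the directed acyclic graph corresponding to each directed edge set in the at least one directed edge set comprises:
    searching the script identifiers stored in a third data table for a first script identifier among the script identifiers stored in the second data table; wherein the third data table is used to store, in association, a script identifier, a dependent task set, a directed edge set, and a fifth time representing when the directed acyclic graph was changed;
    in the case where the first script identifier is found in the third data table, detecting whether the fifth time corresponding to the first script identifier in the third data table is equal to or later than the corresponding fourth time;
    in the case where the fifth time corresponding to the first script identifier is equal to or later than the corresponding fourth time, outputting the directed acyclic graph stored in the database in association with the first script identifier;
    in the case where the first script identifier is not found in the third data table, or the fifth time corresponding to the first script identifier is earlier than the corresponding fourth time, updating, based on the determined adjacency relationship set corresponding to each first task script, the dependent task set and the directed edge set corresponding to the first script identifier in the third data table, and outputting the corresponding directed acyclic graph based on the updated dependent task set and directed edge set corresponding to the first script identifier.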
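  6. The method according to claim 5, wherein the updating, based on the determined adjacency relationship set corresponding to each first task script, the dependent task set and the directed edge set corresponding to the first script identifier in the third data table comprises:
    in the case where the first script identifier, the corresponding first dependent task set, and the corresponding first directed edge set have been written in association into the third data table, adding the directed edge set determined from the adjacency relationship set corresponding to the first script identifier to the corresponding first directed edge set and deduplicating; the first dependent task set includes the first script identifier;
    determining whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set;
    in the case where the i-th script identifier exists in the first dependent task set, and i is less than the total number of script identifiers included in the adjacency relationship set in which the i-th script identifier is located, assigning i+1 to i and performing the determining whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set;
    in the case where the i-th script identifier does not exist in the first dependent task set, determining whether the script identifiers stored in the third data table include the i-th script identifier;
    in the case where the script identifiers stored in the third data table include the i-th script identifier, determining whether the fifth time corresponding to the i-th script identifier is equal to or later than the corresponding fourth time;
    in the case where the fifth time corresponding to the i-th script identifier is equal to or later than the corresponding fourth time, adding the i-th script identifier to the first dependent task set and deduplicating, and adding the directed edge set corresponding to the i-th script identifier in the third data table to the first directed edge set and deduplicating; assigning i+1 to i and performing the determining whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set;
    in the case where the script identifiers stored in the third data table do not include the i-th script identifier, or the fifth time corresponding to the i-th script identifier is earlier than the corresponding fourth time, adding the i-th script identifier to the first dependent task set and deduplicating, and adding the directed edge set determined from the adjacency relationship set corresponding to the i-th script identifier to the first directed edge set and deduplicating;
    taking the i-th script identifier as the first script identifier, and performing the determining whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set; wherein,
    in the case where i is equal to the total number of script identifiers included in the adjacency relationship set in which the i-th script identifier is located, or the adjacency relationship set corresponding to the i-th script identifier is an empty set, exiting the loop, or, for the next script identifier in the adjacency relationship set in which the corresponding first script identifier is located, performing the determining whether the i-th script identifier in the adjacency relationship set corresponding to the first script identifier exists in the first dependent task set.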
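  7. The method according to claim 6, wherein, in the case where the first script identifier is not found in the third data table, the updating, based on the determined adjacency relationship set corresponding to each first task script, the dependent task set and the directed edge set corresponding to the first script identifier in the third data table further comprises:
    writing the first script identifier, the corresponding first dependent task set, and the corresponding first directed edge set in association into the third data table; wherein the first dependent task set and the first directed edge set are both empty sets;
    adding the first script identifier to the first dependent task set.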
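  8. An electronic device, comprising:
    an extraction unit, configured to extract, from the source code of each task script among at least two received task scripts, a first text corresponding to each task script and described in structured query language;
    a first determining unit, configured to determine, from the abstract syntax tree corresponding to each extracted first text, the input items and output items corresponding to each task script;
    a second determining unit, configured to determine, based on the intersection between the input items corresponding to a first task script and the output items corresponding to each second task script of at least one second task script, the adjacency relationship set corresponding to the first task script;
    a third determining unit, configured to determine, based on the determined adjacency relationship set corresponding to each first task script, at least one directed edge set corresponding to the at least two task scripts, and to output the directed acyclic graph corresponding to each directed edge set in the at least one directed edge set; wherein,
    the first task script and the second task script are different task scripts among the at least two task scripts; an adjacency relationship in the adjacency relationship set represents an adjacent dependent task of the first task script and the corresponding intersection; and a directed edge in the directed edge set represents the dependency between two task scripts.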
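  9. An electronic device, comprising: a processor and a memory configured to store a computer program capable of running on the processor,
    wherein the processor is configured to execute, when running the computer program, the steps of the method according to any one of claims 1 to 7.
  10. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.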
PCT/CN2021/140176 2021-06-17 2021-12-21 Data processing method, electronic device and storage medium WO2022262240A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110671384.XA CN113326063B (zh) 2021-06-17 2021-06-17 Data processing method, electronic device and storage medium
CN202110671384.X 2021-06-17

Publications (1)

Publication Number Publication Date
WO2022262240A1 (zh) 2022-12-22

Family

ID=77423625

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140176 WO2022262240A1 (zh) 2021-06-17 2021-12-21 Data processing method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN113326063B (zh)
WO (1) WO2022262240A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326063B (zh) * 2021-06-17 2023-03-03 深圳前海微众银行股份有限公司 Data processing method, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160085584A1 (en) * 2014-09-18 2016-03-24 Robert D. Pedersen Distributed activity control systems and methods
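CN110727834A (zh) * 2019-09-30 2020-01-24 北京百度网讯科技有限公司 Method and apparatus for obtaining a directed acyclic graph, electronic device and storage medium
CN110795455A (zh) * 2019-09-06 2020-02-14 中国平安财产保险股份有限公司 Dependency relationship parsing method, electronic apparatus, computer device and readable storage medium
CN113326063A (zh) * 2021-06-17 2021-08-31 深圳前海微众银行股份有限公司 Data processing method, electronic device and storage medium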

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5231698A (en) * 1991-03-20 1993-07-27 Forcier Mitchell D Script/binary-encoded-character processing method and system
US9594845B2 (en) * 2010-09-24 2017-03-14 International Business Machines Corporation Automating web tasks based on web browsing histories and user actions
CN104216888B (zh) * 2013-05-30 2017-10-17 中国电信股份有限公司 Data processing task relationship setting method and system
US10782775B2 (en) * 2017-01-13 2020-09-22 Atheer, Inc. Methods and apparatus for providing procedure guidance
CN109445881A (zh) * 2018-11-02 2019-03-08 拉卡拉支付股份有限公司 Script running method and apparatus, electronic device and storage medium
CN109787858B (zh) * 2018-12-29 2021-01-26 福建天泉教育科技有限公司 Method and terminal for releasing services in batches

Also Published As

Publication number Publication date
CN113326063A (zh) 2021-08-31
CN113326063B (zh) 2023-03-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21945816

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 220324)