CN112035367B - Method and system for checking workflow correctness of big data platform


Info

Publication number
CN112035367B
Authority
CN
China
Prior art keywords
workflow
information
file
task
scheduling
Legal status
Active
Application number
CN202010908992.3A
Other languages
Chinese (zh)
Other versions
CN112035367A (en)
Inventor
于东东
于敛青
邢利菲
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN202010908992.3A
Publication of CN112035367A
Application granted
Publication of CN112035367B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method and a system for verifying the workflow correctness of a big data platform. The method comprises: acquiring a workflow XML code file and extracting the scheduling information and parameter information of each workflow; generating a first workflow file from the scheduling information and parameter information; acquiring the scheduling script code and extracting the data tables on which each script depends; generating a second workflow file from those data tables; and determining whether the workflow is correct by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and generating a workflow correctness check result. The application automatically checks the correctness of the workflow configuration along two dimensions, the developed workflow XML code and the scheduling scripts written by developers. It discovers workflow configuration problems in advance, ensures that the workflow configuration is consistent with the code logic, prevents adverse effects caused by human error, and provides a strong safeguard for normal batch operation.

Description

Method and system for checking workflow correctness of big data platform
Technical Field
The application relates to the technical field of big data processing, in particular to a method and a system for verifying the correctness of a workflow of a big data platform.
Background
At present, the workflow development function of existing big data suites only supports manual drag-and-drop development on a visual page and offers no way to develop workflows in batch. The upstream (pre-)dependencies of most code (Hive SQL, shell, etc.) are data tables in the platform. In practice, a developer requests a workflow by filling in a request document with the workflow name, steps, names of the upstream dependency tables, script names and similar content.
In existing big data platform scheduling systems, the system automatically triggers the execution of the tasks in each tenant's workflow according to that workflow's configuration. Execution generally consists of checking whether the data files used by the code already exist and then executing the script (Hive SQL, shell, etc.) of each task step in the workflow. In practice, a developer submits a workflow configuration file (Request), and the workflow owner develops and deploys the workflows centrally. However, because of code changes or other reasons, developers often make mistakes when filling in the request form: they omit or mis-enter the upstream dependencies (data table partitions), data table names and similar details on which each task instance depends, which causes batch exceptions during batch runs because the required data does not exist or is wrong. When the workflow owner develops the workflow from the request form, the developer's errors are inherited, and the owner may introduce further errors, such as misspelled data table names or missing dependency tables, which also cause batch errors. An incorrectly configured workflow may cause data processing errors, impair normal functions and adversely affect the development of the bank's normal business functions.
In summary, in existing big data suites each workflow configuration item is set through a flowchart-like interface. Because the interface is inconvenient, manual configuration is error-prone and items are often overlooked, so the integrity and correctness of workflow tasks cannot be fully ensured, which can lead to batch errors and other serious consequences.
Therefore, a technical solution is needed that overcomes these problems and prevents batch anomalies on the big data platform caused by human errors in workflow configuration.
Disclosure of Invention
To solve these problems in the prior art, the application provides a method and a system for verifying the workflow correctness of a big data platform. They automatically derive the upstream dependencies of each task in a workflow, automatically check the correctness of the workflow XML code developed by developers and the workflow owner, verify the workflow configuration structure across multiple dimensions, provide developers with an automated code review mechanism, prevent anomalies during batch runs on the big data platform, and keep the bank's systems running stably.
In a first aspect of the embodiments of the present application, a method for verifying the workflow correctness of a big data platform is provided. The method includes:
acquiring a workflow XML code file, and extracting scheduling information and parameter information of a workflow;
generating a first workflow file according to the scheduling information and the parameter information of the workflow;
acquiring the scheduling script code and extracting the data tables on which each script depends;
generating a second workflow file from the data tables;
and determining whether the workflow is correct by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and generating a workflow correctness check result.
Further, acquiring the workflow XML code file and extracting the scheduling information and parameter information of the workflow includes:
acquiring a workflow XML code file and reading a workflow information set;
judging whether unprocessed workflows exist in the workflow information set, and if unprocessed workflows exist, reading scheduling information and parameter information of the workflows.
Further, the scheduling information includes: data type, initial data time, self-dependence, scheduling time point and upstream (pre-)tables; the parameter information includes: script name, step size, task scheduling priority and number of retries.
Further, generating a first workflow file according to the scheduling information and the parameter information of the workflow includes:
parsing the scheduling information and parameter information of the workflow to obtain the workflow ID, task type, task state, cycle period, start time, dependency relationship and creator of the current task, together with the edge information of the workflow;
reading, from the edge information of the current workflow, the start task, end task and edge ID of every edge in turn;
finding the ancestor (root) tasks from the start tasks, end tasks and edge IDs obtained and marking them as level 1, then searching the edge information for edges whose start task is a level-1 task and marking their end tasks as level 2, and so on until all edge information has been processed, to obtain the hierarchical relationship among the tasks;
and writing the edge IDs and the hierarchical relationship to an EXCEL file to form the first workflow file.
Further, acquiring the scheduling script code and extracting the data tables on which the scripts depend includes:
reading the script information in the workflow XML code file, and obtaining the workflow information and task name from the comments (annotations) of each script;
reading the content of each scheduling script line by line according to the script information, and parsing out the data table names;
and identifying the dependency data tables among those table names by keyword matching against the tenant name configuration file.
Further, generating the second workflow file from the data tables includes:
obtaining the hierarchical relationship among the scheduling scripts from the dependency table names, the target table names and the workflow information;
and generating the second workflow file in EXCEL format from the workflow information, the task names, the hierarchical relationship and the dependency data tables.
In a second aspect of the embodiment of the present application, a system for verifying workflow correctness of a big data platform is provided, the system comprising:
the information extraction module is used for acquiring the XML code file of the workflow and extracting the scheduling information and the parameter information of the workflow;
the first workflow file generation module is used for generating a first workflow file according to the scheduling information and the parameter information of the workflow;
the data table extraction module is used for acquiring the scheduling script code and extracting the data tables on which the scripts depend;
the second workflow file generation module is used for generating a second workflow file according to the data table;
and the verification module is used for determining whether the workflow is correct by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and generating a workflow correctness verification result.
Further, the information extraction module includes:
the code file acquisition unit is used for acquiring the workflow XML code file and reading the workflow information set;
and the workflow information reading unit is used for judging whether unprocessed workflows exist in the workflow information set, and if the unprocessed workflows exist, reading the scheduling information and the parameter information of the workflows.
Further, the scheduling information includes: data type, initial data time, self-dependence, scheduling time point and upstream (pre-)tables; the parameter information includes: script name, step size, task scheduling priority and number of retries.
Further, the first workflow file generation module includes:
the information analysis unit is used for parsing the scheduling information and parameter information of the workflow to obtain the workflow ID, task type, task state, cycle period, start time, dependency relationship and creator of the current task, together with the edge information of the workflow;
the edge information reading unit is used for reading, from the edge information of the current workflow, the start task, end task and edge ID of every edge in turn;
the hierarchical relation calculating unit is used for finding the ancestor (root) tasks from the start tasks, end tasks and edge IDs obtained and marking them as level 1, then searching the edge information for edges whose start task is a level-1 task and marking their end tasks as level 2, and so on until all edge information has been processed, to obtain the hierarchical relationship among the tasks;
and the first workflow file generating unit is used for writing the edge IDs and the hierarchical relationship to an EXCEL file to form the first workflow file.
Further, the data table extraction module includes:
the script information reading unit is used for reading script information of the workflow XML code file and acquiring workflow information and task names according to comments of the script information;
the script analysis unit is used for reading the content of the scheduling script line by line according to the script information and analyzing to obtain the name of the data table;
and the information matching unit is used for identifying the dependency data tables among the data table names by keyword matching against the tenant name configuration file.
Further, the second workflow file generation module includes:
the hierarchical relation calculating unit is used for obtaining the hierarchical relation among the scheduling scripts according to the dependent data table names, the target data table names and the workflow information;
and the second workflow file generating unit is used for generating a second workflow file in the EXCEL format according to the workflow information, the task name, the hierarchical relationship and the dependency data table.
In a third aspect of the embodiments of the present application, a computer device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the above method for verifying the workflow correctness of a big data platform when executing the computer program.
In a fourth aspect of the embodiments of the present application, a computer readable storage medium is provided, storing a computer program that, when executed by a processor, implements the above method for verifying the workflow correctness of a big data platform.
The method and system for verifying the workflow correctness of a big data platform automatically verify the correctness of the workflow configuration along two dimensions, the developed workflow XML code and the scheduling scripts written by developers. They discover workflow configuration problems in advance, ensure that the workflow configuration conforms to the code logic, prevent adverse effects caused by human error, and provide a strong safeguard for normal batch operation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a method for verifying workflow correctness of a big data platform according to an embodiment of the present application.
FIG. 2 is a flow chart of a workflow correctness checking mechanism according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a system architecture for verifying workflow correctness of a big data platform according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The principles and spirit of the present application will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the application and are not intended to limit the scope of the application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the application may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the application, a method and a system for verifying the correctness of a workflow of a big data platform are provided.
Because visual development on the big data platform is supported only through an IDE, manual drag-and-drop workflow configuration carries great risks, such as omitted configuration items and writing errors, as business functions grow and large batches of functions go online. When developers submit requests, information on the workflow request form may be missing or misplaced, which can cause long batch interruptions, data structure errors and similar problems on the big data platform and poses a serious hidden danger to the stability of the banking system.
According to the method and system provided by the application, the task dependencies of each workflow are automatically extracted from the workflow XML code developed by the workflow owner and, together with information such as each task's script name, the export mode of the data table structure and the step level, form a workflow file (Tool). At the same time, information such as code dependencies and the data tables used is automatically analyzed from the Hive SQL, shell and other code that developers submit to the configuration library, forming a workflow file (Code). Whether the workflow is correct is then verified automatically by comparing the differences among the workflow configuration file (Request), the workflow file (Tool) and the workflow file (Code). The method and system thus provide a tool that automatically checks, across multiple dimensions, whether the workflow configuration structure is correct, and an automated code review mechanism for developers, which solves the problems above and effectively keeps the banking system running stably.
In the embodiments of the present application, the terms used are defined as follows:
Workflow: a combination of multiple task nodes whose configured code is executed in a defined scheduling order.
Workflow configuration file (Request): an EXCEL file, filled in by a developer in a prescribed format, that records all workflow configuration information under the current tenant.
Workflow XML code: the code, in XML form, that the workflow owner develops on the big data platform according to the workflow configuration file (Request).
First workflow file (Tool): the workflow file that the application automatically generates from the workflow XML code.
Second workflow file (Code): the workflow file that the application automatically generates from the scheduling scripts.
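As a minimal illustration of the workflow definition above (not the platform's actual data model), the Python sketch below represents a workflow as task nodes mapped to their upstream tasks and derives one valid scheduling order with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task node maps to the set of tasks that must
# finish before it runs (its upstream dependencies).
workflow = {
    "load_raw": set(),
    "clean": {"load_raw"},
    "aggregate": {"clean"},
    "report": {"clean", "aggregate"},
}

# static_order() yields every task only after all of its upstream tasks.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['load_raw', 'clean', 'aggregate', 'report']
```

If the dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is one way a misconfigured workflow can surface.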
The principles and spirit of the present application are explained in detail below with reference to several representative embodiments thereof.
FIG. 1 is a flowchart illustrating a method for verifying workflow correctness of a big data platform according to an embodiment of the present application. As shown in fig. 1, the method includes:
Step S1: acquiring a workflow XML code file, and extracting the scheduling information and parameter information of each workflow.
Step S2: generating a first workflow file from the scheduling information and parameter information of the workflow.
Step S3: acquiring the scheduling script code and extracting the data tables on which the scripts depend.
Step S4: generating a second workflow file from the data tables.
Step S5: determining whether the workflow is correct by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and generating a workflow correctness check result.
In order to more clearly explain the above-mentioned method for verifying the correctness of the workflow of the big data platform, a specific embodiment is described below.
Step S1:
Step S11: acquiring the workflow XML code file and reading the workflow information set.
Step S12: determining whether any unprocessed workflow remains in the workflow information set and, if so, reading its scheduling information and parameter information, where:
the scheduling information includes: data type, initial data time, self-dependence, scheduling time point and upstream (pre-)tables;
the parameter information includes: script name, step size, task scheduling priority and number of retries.
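The two groups of fields read in step S12 can be pictured as simple records. The Python dataclasses below are a hypothetical sketch: the class names, field names and types (including modeling self-dependence as a boolean flag) are assumptions, not definitions taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulingInfo:
    """Scheduling information read in step S12 (names and types are illustrative)."""
    data_type: str
    initial_data_time: str
    self_dependent: bool          # self-dependence, modeled as a flag (assumption)
    schedule_time: str            # scheduling time point
    pre_tables: List[str] = field(default_factory=list)  # upstream (pre-)tables


@dataclass
class ParameterInfo:
    """Parameter information read in step S12 (names and types are illustrative)."""
    script_name: str
    step_size: int
    priority: int                 # task scheduling priority
    retries: int                  # number of retries


sched = SchedulingInfo("daily", "2020-01-01", False, "02:00", ["tenant_a.orders"])
params = ParameterInfo("load_orders.sql", 1, 5, 3)
print(sched.pre_tables, params.retries)
```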
Step S2:
Step S21: parsing the scheduling information and parameter information of the workflow to obtain the workflow ID, task type, task state, cycle period, start time, dependency relationship and creator of the current task, together with the edge information of the workflow.
Step S22: reading, from the edge information of the current workflow, the start task, end task and edge ID of every edge in turn.
Step S23: finding the ancestor (root) tasks from the start tasks, end tasks and edge IDs obtained and marking them as level 1, then searching the edge information for edges whose start task is a level-1 task and marking their end tasks as level 2, and so on until all edge information has been processed, to obtain the hierarchical relationship among the tasks.
Step S24: writing the edge IDs and the hierarchical relationship to an EXCEL file to form the first workflow file.
Step S3:
Step S31: reading the script information in the workflow XML code file, and obtaining the workflow information and task name from the comments (annotations) of each script.
Step S32: reading the content of each scheduling script line by line according to the script information, and parsing out the data table names.
Step S33: identifying the dependency data tables among those table names by keyword matching against the tenant name configuration file.
Step S4:
Step S41: obtaining the hierarchical relationship among the scheduling scripts from the dependency table names, the target table names and the workflow information.
Step S42: generating a second workflow file in EXCEL format from the workflow information, the task names, the hierarchical relationship and the dependency data tables.
Step S5:
Determining whether the workflow is correct by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and generating a workflow correctness check result.
The method for verifying the workflow correctness of a big data platform provided by the application automatically verifies the correctness of the workflow configuration along two dimensions, the developed workflow XML code and the scheduling scripts written by developers. It discovers workflow configuration problems in advance, ensures that the workflow configuration conforms to the code logic, prevents adverse effects caused by human error, and provides a strong safeguard for normal batch operation.
It should be noted that although the operations of the method of the present application are described in a particular order in the above embodiments and the accompanying drawings, this does not require or imply that the operations must be performed in the particular order or that all of the illustrated operations be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
In order to more clearly explain the above-mentioned verification method for workflow correctness of the big data platform, a specific embodiment is described below, however, it should be noted that this embodiment is only for better explaining the present application and is not meant to limit the present application unduly.
Referring to FIG. 2, which shows a flow chart of the workflow correctness checking mechanism according to an embodiment of the present application: developers write code (Code 1, …, n), which is submitted to the tool to automatically generate the workflow file (Code); workflow developers write the workflow XML code, which is submitted to the tool to automatically generate the workflow file (Tool).
Specifically, the process of automatically generating a workflow file (Tool) according to workflow XML code is as follows:
Step S101: reading all workflow information (workflows) from the workflow XML code file. A single workflow XML code file holds the set of all workflows under the whole tenant, that is, multiple workflows.
Step S102: determining whether any unprocessed workflow remains; if so, continuing, otherwise ending.
Step S103: reading the basic information of the current workflow, such as the workflow ID, workflow name, owning tenant and creation time.
Step S104: processing all task information (task) of the current workflow, and automatically parsing each task's workflow ID, task type, task state, cycle period, start time, self-dependence, creator and similar information.
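Steps S101 to S104 amount to walking an XML tree. The patent does not publish the platform's XML schema, so the element and attribute names in this Python sketch (`workflow`, `task`, `edge`, `from`, `to`, `selfDepend` and so on) are hypothetical; only `xml.etree.ElementTree` from the standard library is used. Edges are returned as `(edge_id, start_task, end_task)` tuples, the form needed for the hierarchy analysis in step S105.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real platform schema is not given in the patent.
SAMPLE = """
<workflows>
  <workflow id="wf1" name="daily_batch" tenant="tenant_a" createTime="2020-09-01">
    <task id="t1" type="hive" state="enabled" cycle="day" startTime="02:00"
          selfDepend="false" creator="alice"/>
    <task id="t2" type="shell" state="enabled" cycle="day" startTime="03:00"
          selfDepend="true" creator="bob"/>
    <edge id="e1" from="t1" to="t2"/>
  </workflow>
</workflows>
"""

BASIC_FIELDS = ("id", "name", "tenant", "createTime")
TASK_FIELDS = ("type", "state", "cycle", "startTime", "selfDepend", "creator")


def read_workflows(xml_text):
    """Steps S101-S104: read every workflow with its basic info, tasks and edges."""
    root = ET.fromstring(xml_text.strip())            # S101: the tenant's workflow set
    workflows = []
    for wf in root.iter("workflow"):                  # S102: process each workflow once
        basic = {k: wf.get(k) for k in BASIC_FIELDS}  # S103: basic information
        tasks = {}
        for task in wf.findall("task"):               # S104: per-task information
            info = {k: task.get(k) for k in TASK_FIELDS}
            info["workflowId"] = basic["id"]
            tasks[task.get("id")] = info
        edges = [(e.get("id"), e.get("from"), e.get("to")) for e in wf.findall("edge")]
        workflows.append({"basic": basic, "tasks": tasks, "edges": edges})
    return workflows


result = read_workflows(SAMPLE)
print(result[0]["basic"]["name"], sorted(result[0]["tasks"]), result[0]["edges"])
```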
Step S105: processing the edge information (Edge) of the current workflow and analyzing the hierarchy among tasks, as follows:
Sequentially reading the start task, end task and edge ID of every edge in the current workflow.
Using all of the edge information obtained, examining the start task and end task of each edge in turn until the ancestor (root) tasks are found, and marking them as level 1.
Searching the edge information for edges whose start task is a level-1 task and marking their end tasks as level 2, and so on until all edge information has been processed, yielding the hierarchical relationship among the tasks.
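The level assignment in step S105 can be sketched as a breadth-first pass over the edges in the style of Kahn's topological sort. This Python sketch is one possible reading, not the patent's own code: each edge is an `(edge_id, start_task, end_task)` tuple, and tasks that never appear as an end task are the ancestor tasks at level 1. When a task has parents on different levels, the sketch places it one level below its deepest parent; the patent does not say how such cases are resolved. A cycle is reported as an error because no consistent level exists.

```python
from collections import defaultdict, deque


def task_levels(edges):
    """Step S105 sketch: map each task to its hierarchy level.

    edges: iterable of (edge_id, start_task, end_task) tuples.
    Tasks that never appear as an end task are ancestors (level 1); every other
    task is placed one level below its deepest parent.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    tasks = set()
    for _edge_id, start, end in edges:
        children[start].append(end)
        indegree[end] += 1
        tasks.update((start, end))

    level = {t: 1 for t in tasks if indegree[t] == 0}  # ancestor tasks
    queue = deque(sorted(level))
    processed = 0
    while queue:
        task = queue.popleft()
        processed += 1
        for child in children[task]:
            level[child] = max(level.get(child, 0), level[task] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:  # all parents of this child are placed
                queue.append(child)
    if processed != len(tasks):
        raise ValueError("edge information contains a cycle")
    return level


edges = [("e1", "t1", "t2"), ("e2", "t2", "t3"), ("e3", "t1", "t3"), ("e4", "t4", "t3")]
print(sorted(task_levels(edges).items()))  # [('t1', 1), ('t2', 2), ('t3', 3), ('t4', 1)]
```

Here `t3` lands on level 3 rather than 2 because one of its parents, `t2`, is itself on level 2.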
Step S106: writing the results to an EXCEL file to form the workflow file (Tool).
Specifically, the process of automatically generating a workflow file (Code) according to the scheduling script Code is as follows:
Step S201: reading the names of all scheduling scripts in the workflow XML code file and processing them one by one.
Step S202: reading the comment (annotation) information to obtain the workflow, task name, creator and similar information.
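Step S202 relies on some comment convention at the top of each script, which the patent does not specify. This Python sketch assumes hypothetical header lines of the form `-- workflow: name` (Hive SQL) or `# task: name` (shell) and keeps the first value seen for each key.

```python
import re

# Hypothetical header convention: "-- key: value" (Hive SQL) or "# key: value" (shell).
HEADER = re.compile(
    r"^\s*(?:--|#)\s*(workflow|task|creator)\s*:\s*(\S.*?)\s*$", re.IGNORECASE
)


def read_header(script_text):
    """Step S202 sketch: collect workflow, task name and creator from comment lines."""
    info = {}
    for line in script_text.splitlines():
        match = HEADER.match(line)
        if match:
            info.setdefault(match.group(1).lower(), match.group(2))
    return info


script = """-- workflow: daily_batch
-- task: load_orders
-- creator: alice
insert overwrite table tenant_a.orders select * from tenant_src.raw_orders;
"""
print(read_header(script))
```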
Step S203, reading the scheduling script according to the name of the scheduling script.
Step S204: reading the content of the scheduling script line by line and parsing out the data table names, ignoring comment lines, insert lines and similar content; then, by keyword matching against the tenant name configuration file, picking out the data table references that combine a tenant name and a table name, which gives the names of the dependency data tables.
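A possible reading of step S204 in Python: skip blank, comment and insert lines, then match table references that begin with a name from the tenant name configuration file. In this sketch the configuration file is reduced to a plain list of tenant names and a qualified reference is assumed to be written `tenant.table`; both are assumptions. Like the line-based rule it mirrors, the sketch misses a dependency written on the same line as an `INSERT` keyword.

```python
import re


def dependent_tables(script_text, tenants):
    """Step S204 sketch: list the tenant-qualified tables a script reads from."""
    if not tenants:
        return []
    # One alternation over the configured tenant names, e.g. tenant_a|tenant_b,
    # followed by ".table_name" (the qualified-name spelling is an assumption).
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, tenants)) + r")\.(\w+)", re.IGNORECASE
    )
    found = set()
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("--", "#")):
            continue  # blank or comment line
        if stripped.lower().startswith("insert"):
            continue  # insert line names the target table, not a dependency
        for tenant, table in pattern.findall(stripped):
            found.add(f"{tenant.lower()}.{table.lower()}")
    return sorted(found)


script = """-- task: agg_orders
insert overwrite table tenant_a.orders_agg
select o.id, c.name
from tenant_a.orders o
join tenant_b.customers c on o.cid = c.id
join other.lookup l on l.k = o.k;
"""
print(dependent_tables(script, ["tenant_a", "tenant_b"]))  # ['tenant_a.orders', 'tenant_b.customers']
```

The target table `tenant_a.orders_agg` is excluded because it appears only on the insert line, and `other.lookup` is excluded because `other` is not a configured tenant.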
Step S205: calculating the hierarchical relationship among the scheduling scripts from the dependency table names, the target table names, the workflow and other information.
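Step S205 can be sketched by linking scripts through tables: if one script writes a table that another script reads, the reader depends on the writer. In this Python sketch the input maps each hypothetical script name to a pair `(target_tables, dependency_tables)`. Tables produced outside the workflow impose no ordering, and a script sits one level below the deepest script it reads from. These rules are assumptions, since the patent does not spell out the calculation.

```python
def script_levels(scripts):
    """Step S205 sketch: assign each script a level from the tables it writes and reads."""
    producer = {}
    for name, (targets, _deps) in scripts.items():
        for table in targets:
            producer[table] = name  # which script writes each target table

    levels = {}
    visiting = set()

    def level(name):
        if name in levels:
            return levels[name]
        if name in visiting:
            raise ValueError(f"circular table dependency involving {name}")
        visiting.add(name)
        # Scripts that produce one of this script's input tables (ignoring itself).
        parents = {producer[t] for t in scripts[name][1] if producer.get(t, name) != name}
        levels[name] = 1 + max((level(p) for p in parents), default=0)
        visiting.discard(name)
        return levels[name]

    for name in scripts:
        level(name)
    return levels


scripts = {
    "load_orders.sql": (["tenant_a.orders"], ["tenant_src.raw_orders"]),
    "agg_orders.sql": (["tenant_a.orders_agg"], ["tenant_a.orders", "tenant_b.customers"]),
    "report.sh": ([], ["tenant_a.orders_agg"]),
}
print(script_levels(scripts))  # {'load_orders.sql': 1, 'agg_orders.sql': 2, 'report.sh': 3}
```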
Step S206: summarizing all of the calculated workflow information, task names, task levels, script dependency tables and so on to form the workflow file (Code) in EXCEL format.
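Step S206 only requires a tabular summary. Python's standard library has no Excel writer, so this sketch uses the `csv` module as a stand-in; a third-party library such as openpyxl could produce an .xlsx workbook instead. The column names and the `;` separator for multiple dependency tables are assumptions.

```python
import csv
import io


def write_code_file(rows, out):
    """Step S206 sketch: one row per script with workflow, task, level and dependency tables.

    CSV stands in for the EXCEL file named in the patent; columns are assumptions.
    """
    writer = csv.writer(out)
    writer.writerow(["workflow", "task", "level", "dependency_tables"])
    for workflow, task, level, deps in rows:
        writer.writerow([workflow, task, level, ";".join(deps)])


buf = io.StringIO()
write_code_file([("daily_batch", "agg_orders", 2, ["tenant_a.orders", "tenant_b.customers"])], buf)
print(buf.getvalue())
```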
By comparing the differences among the workflow file (Tool), the workflow file (Code) and the workflow configuration file (Request), the method further helps with the manual analysis of whether the workflow code contains omissions, errors or similar problems.
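A minimal Python sketch of the three-way comparison, assuming each file has already been loaded into a mapping from `(workflow, task)` to the set of upstream tables. Because the goal is for the configuration to conform to the code logic, the sketch treats the Code view as the reference and reports, for the Request and Tool views, which tables are missing or extra; this choice, and the restriction to upstream tables, are assumptions. The output is a list of findings for manual review rather than an automatic verdict.

```python
def compare_workflow_files(request, tool, code):
    """Report upstream-table differences of Request and Tool against Code.

    Each argument maps (workflow, task) -> set of upstream table names.
    Returns (key, view, missing, extra) tuples; an empty list means no differences.
    """
    findings = []
    for key in sorted(set(request) | set(tool) | set(code)):
        actual = code.get(key, set())  # tables the scripts actually read
        for view_name, view in (("Request", request), ("Tool", tool)):
            configured = view.get(key, set())
            missing = sorted(actual - configured)  # read by code, not configured
            extra = sorted(configured - actual)    # configured, not read by code
            if missing or extra:
                findings.append((key, view_name, missing, extra))
    return findings


request = {("wf1", "agg_orders"): {"tenant_a.orders"}}
tool = {("wf1", "agg_orders"): {"tenant_a.orders", "tenant_b.customers"}}
code = {("wf1", "agg_orders"): {"tenant_a.orders", "tenant_b.customers"}}
for finding in compare_workflow_files(request, tool, code):
    print(finding)  # (('wf1', 'agg_orders'), 'Request', ['tenant_b.customers'], [])
```

In this example the request form omitted `tenant_b.customers`, which is exactly the kind of missing upstream dependency described in the background section.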
Having described the method of an exemplary embodiment of the present application, a description is next made of a system for verifying workflow correctness of a big data platform according to an exemplary embodiment of the present application with reference to fig. 3.
The implementation of the system for verifying the workflow correctness of a big data platform may refer to the implementation of the method, and repeated details are omitted. The terms "module" and "unit" as used below may refer to a combination of software and/or hardware that implements the intended function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Based on the same inventive concept, the application also provides a system for verifying the workflow correctness of a big data platform. As shown in FIG. 3, the system comprises:
the information extraction module 310 is configured to obtain a workflow XML code file, and extract scheduling information and parameter information of a workflow;
a first workflow file generating module 320, configured to generate a first workflow file according to the scheduling information and the parameter information of the workflow;
a data table extracting module 330, configured to obtain a scheduling script code and extract a data table on which a script depends;
a second workflow file generation module 340, configured to generate a second workflow file according to the data table;
and the verification module 350 is configured to determine whether the workflow is correct by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and to generate a workflow correctness verification result.
Further, the information extraction module 310 includes:
the code file acquisition unit is used for acquiring the workflow XML code file and reading the workflow information set;
and the workflow information reading unit is used for judging whether unprocessed workflows exist in the workflow information set and, if so, reading the scheduling information and the parameter information of those workflows.
The scheduling information includes: data type, initial data time, self-dependence, scheduling time point and pre-tables (prerequisite tables); the parameter information includes: script name, step size, task scheduling priority and number of retries.
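The extraction steps above can be illustrated with Python's standard `xml.etree.ElementTree` module. The patent does not publish the workflow XML schema, so the element and attribute names below (`workflow`, `task`, `schedule`, `param`, `preTables` and so on) and the `TaskInfo` record are hypothetical; only the list of extracted fields follows the text.

```python
import xml.etree.ElementTree as ET
from collections import deque
from dataclasses import dataclass
from typing import List

# Hypothetical sample of a workflow XML code file.
SAMPLE_XML = """<workflows>
  <workflow id="wf_daily">
    <task name="clean">
      <schedule dataType="daily" startDate="2020-01-01" selfDepend="true"
                time="02:00" preTables="ods.t_a, ods.t_b"/>
      <param script="clean.sql" step="1" priority="5" retries="3"/>
    </task>
  </workflow>
</workflows>"""


@dataclass
class TaskInfo:
    workflow_id: str
    task_name: str
    # Scheduling information
    data_type: str
    start_data_time: str
    self_dependent: bool
    schedule_time: str
    pre_tables: List[str]
    # Parameter information
    script_name: str
    step: int
    priority: int
    retries: int


def extract_tasks(xml_text: str) -> List[TaskInfo]:
    root = ET.fromstring(xml_text)
    # The workflow information set, held as a queue of unprocessed workflows.
    pending = deque(root.findall("workflow"))
    tasks: List[TaskInfo] = []
    while pending:  # repeat while an unprocessed workflow remains
        wf = pending.popleft()
        for task in wf.findall("task"):
            sched, param = task.find("schedule"), task.find("param")
            if sched is None or param is None:
                continue  # incomplete task definition; skipped in this sketch
            pre = sched.get("preTables", "")
            tasks.append(TaskInfo(
                workflow_id=wf.get("id", ""),
                task_name=task.get("name", ""),
                data_type=sched.get("dataType", ""),
                start_data_time=sched.get("startDate", ""),
                self_dependent=sched.get("selfDepend", "false").lower() == "true",
                schedule_time=sched.get("time", ""),
                pre_tables=[t.strip() for t in pre.split(",") if t.strip()],
                script_name=param.get("script", ""),
                step=int(param.get("step", "1")),
                priority=int(param.get("priority", "0")),
                retries=int(param.get("retries", "0")),
            ))
    return tasks


if __name__ == "__main__":
    for info in extract_tasks(SAMPLE_XML):
        print(info.workflow_id, info.task_name, info.pre_tables, info.retries)
```

The comparison `sched is None` is deliberate: an `Element` with no children is falsy, so a plain truthiness test would wrongly skip empty `<schedule/>` elements.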
Further, the first workflow file generation module 320 includes:
the information analysis unit is used for analyzing the scheduling information and the parameter information of the workflow to obtain the edge information of the current task, including the workflow ID, task type, task state, cycle period, start time, dependency relationship and creator;
the edge information reading unit is used for sequentially reading, from the edge information of the current workflow, the start task, the end task and the edge ID of every edge in the current workflow;
the hierarchical relation calculating unit is used for finding the ancestor tasks from the obtained start tasks, end tasks and edge IDs and marking them as level one; then using the end tasks of the level-one tasks to look up the corresponding start tasks in the edge information and marking those tasks as level two; and so on until all edge information has been processed, thereby obtaining the hierarchical relationship among the tasks;
and the first workflow file generation unit is used for writing the edge IDs and the hierarchical relationship to an EXCEL file to form the first workflow file.
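The level calculation performed by the hierarchical relation calculating unit can be sketched as a topological layering of the edge list. The text leaves open how a task reachable along paths of different lengths is levelled; this sketch assumes it sits one level below its deepest predecessor (Kahn's algorithm with a longest-path rule) and rejects cyclic workflows. The function names and the `(edge_id, start_task, end_task)` tuple layout are hypothetical, and writing the rows to an EXCEL file is omitted.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Hypothetical edge record: (edge_id, start_task, end_task).
Edge = Tuple[str, str, str]


def compute_levels(edges: List[Edge]) -> Dict[str, int]:
    successors: Dict[str, List[str]] = defaultdict(list)
    indegree: Dict[str, int] = defaultdict(int)
    tasks = set()
    for _edge_id, start, end in edges:
        successors[start].append(end)
        indegree[end] += 1
        tasks.update((start, end))

    # Ancestor tasks never appear as an end point, so they form level 1.
    level = {t: 1 for t in tasks if indegree[t] == 0}
    queue = deque(sorted(level))
    remaining = dict(indegree)
    processed = 0
    while queue:
        task = queue.popleft()
        processed += 1
        for nxt in successors[task]:
            # Place each task one level below its deepest predecessor.
            level[nxt] = max(level.get(nxt, 0), level[task] + 1)
            remaining[nxt] -= 1
            if remaining[nxt] == 0:  # all predecessors have been levelled
                queue.append(nxt)

    if processed != len(tasks):
        raise ValueError("workflow edges contain a cycle")
    return level


def first_workflow_rows(edges: List[Edge]) -> List[Tuple[str, str, str, int, int]]:
    """One row per edge: edge ID, start task, end task and both tasks' levels."""
    level = compute_levels(edges)
    return [(eid, s, e, level[s], level[e]) for eid, s, e in edges]
```

For the edges A→B, B→C, A→C and D→C, this yields levels A=1, D=1, B=2, C=3; a plain breadth-first reading of the text would instead put C at level 2.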
Further, the data table extracting module 330 includes:
the script information reading unit is used for reading the script information in the workflow XML code file and obtaining the workflow information and task names from the comments in the script information;
the script analysis unit is used for reading the content of each scheduling script line by line according to the script information and parsing it to obtain the data table names;
and the information matching unit is used for identifying, among the parsed data table names, the dependent data tables by keyword matching against the tenant name configuration file.
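The script analysis and tenant matching steps can be sketched with regular expressions. Everything here is an assumption for illustration: the `-- workflow:` / `-- task:` header-comment convention, the SQL patterns (`FROM`/`JOIN` for tables read, `INSERT INTO`/`INSERT OVERWRITE [TABLE]` for tables written), and the reading of "keyword matching" as matching a table's schema prefix against the tenant names loaded from the configuration file. Production scripts would need a real SQL parser to handle subqueries, CTEs and quoted identifiers.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Set

# Hypothetical header comments naming the workflow and task.
HEADER = re.compile(r"^\s*--\s*(workflow|task)\s*:\s*(\S+)", re.IGNORECASE)
# Tables read by a statement.
SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)
# Tables written by a statement.
TARGET = re.compile(r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE)

SAMPLE_SCRIPT = """-- workflow: wf_daily
-- task: build_report
insert overwrite table dw.t_report
select a.id, b.name
from ods.t_orders a
join ods.t_customers b on a.cid = b.id
left join tmp_lookup c on a.k = c.k
""".splitlines()


@dataclass
class ScriptTables:
    workflow: str = ""
    task: str = ""
    dependent_tables: Set[str] = field(default_factory=set)
    target_tables: Set[str] = field(default_factory=set)


def parse_script(lines: Iterable[str], tenants: Set[str]) -> ScriptTables:
    result = ScriptTables()
    sources: Set[str] = set()
    for line in lines:  # read the script line by line
        header = HEADER.match(line)
        if header:
            key, value = header.group(1).lower(), header.group(2)
            if key == "workflow":
                result.workflow = value
            else:
                result.task = value
            continue
        if line.lstrip().startswith("--"):
            continue  # ordinary comment line
        result.target_tables.update(t.lower() for t in TARGET.findall(line))
        sources.update(t.lower() for t in SOURCE.findall(line))
    tenant_keys = {t.lower() for t in tenants}
    # Keep tables whose schema prefix is a configured tenant name,
    # excluding tables the script itself writes.
    result.dependent_tables = {
        t for t in sources
        if "." in t and t.split(".", 1)[0] in tenant_keys and t not in result.target_tables
    }
    return result


if __name__ == "__main__":
    print(parse_script(SAMPLE_SCRIPT, {"ods", "dw"}))
```

In the sample, `tmp_lookup` carries no tenant prefix and is therefore not reported as a dependent table.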
Further, the second workflow file generation module 340 includes:
the hierarchical relation calculating unit is used for obtaining the hierarchical relation among the scheduling scripts according to the dependent data table names, the target data table names and the workflow information;
and the second workflow file generating unit is used for generating the second workflow file in EXCEL format according to the workflow information, the task names, the hierarchical relationship and the dependent data tables.
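Deriving the hierarchy among scheduling scripts can be sketched by linking each script to the scripts that produce the tables it depends on. This assumes each script's dependent and target table names have already been extracted by the data table extracting module 330; the `Script` record and function names are hypothetical, and the returned rows merely stand in for the EXCEL second workflow file.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Script:
    """Hypothetical per-script record produced by the table-extraction step."""
    workflow: str
    task: str
    dependent_tables: FrozenSet[str]
    target_tables: FrozenSet[str]


def script_edges(scripts: List[Script]) -> Set[Tuple[str, str]]:
    """Link producer -> consumer when a script depends on a table another writes."""
    producers: Dict[str, Set[str]] = {}
    for s in scripts:
        for table in s.target_tables:
            producers.setdefault(table, set()).add(s.task)
    edges: Set[Tuple[str, str]] = set()
    for s in scripts:
        for table in s.dependent_tables:
            # Tables with no producer in this set (external sources) add no edge.
            for producer in producers.get(table, set()):
                if producer != s.task:
                    edges.add((producer, s.task))
    return edges


def script_levels(scripts: List[Script], edges: Set[Tuple[str, str]]) -> Dict[str, int]:
    """Level 1 for scripts with no upstream script, else 1 + deepest upstream level."""
    preds: Dict[str, Set[str]] = {s.task: set() for s in scripts}
    for up, down in edges:
        preds[down].add(up)
    levels: Dict[str, int] = {}
    visiting: Set[str] = set()

    def level(task: str) -> int:
        if task in levels:
            return levels[task]
        if task in visiting:
            raise ValueError(f"cyclic table dependency involving {task}")
        visiting.add(task)
        levels[task] = 1 + max((level(p) for p in preds[task]), default=0)
        visiting.discard(task)
        return levels[task]

    for task in preds:
        level(task)
    return levels


def second_workflow_rows(scripts: List[Script]) -> List[Tuple[str, str, int, str]]:
    """Rows standing in for the EXCEL sheet: workflow, task, level, dependent tables."""
    levels = script_levels(scripts, script_edges(scripts))
    ordered = sorted(scripts, key=lambda s: (levels[s.task], s.task))
    return [(s.workflow, s.task, levels[s.task], ",".join(sorted(s.dependent_tables)))
            for s in ordered]
```

Because these rows are derived from the tables the scripts actually read and write, they can be compared against the levels computed from the XML code to expose missing or superfluous dependencies.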
It should be noted that while several modules of the system for verifying the workflow correctness of a big data platform are mentioned in the foregoing detailed description, such partitioning is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present application. Conversely, the features and functions of one module described above may be further divided among a plurality of modules.
Based on the foregoing inventive concept, as shown in fig. 4, the present application further proposes a computer device 400, including a memory 410, a processor 420, and a computer program 430 stored in the memory 410 and capable of running on the processor 420, where the processor 420 implements the foregoing method for verifying the correctness of the workflow of the big data platform when executing the computer program 430.
Based on the foregoing inventive concept, the present application proposes a computer readable storage medium storing a computer program which, when executed by a processor, implements the foregoing method for verifying workflow correctness of a big data platform.
The method and system for verifying the workflow correctness of a big data platform automatically verify the correctness of the workflow configuration along two dimensions: the developed workflow XML code and the scheduling scripts written by developers. They detect workflow configuration problems in advance, ensure that the workflow configuration conforms to the code logic, prevent adverse effects caused by human error, and provide a strong guarantee for the normal running of batch jobs.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above examples are only specific embodiments of the present application and are not intended to limit its scope. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art should understand that any person skilled in the art may, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A method for verifying the correctness of a workflow of a big data platform is characterized by comprising the following steps:
acquiring a workflow XML code file, and extracting scheduling information and parameter information of a workflow;
generating a first workflow file according to the scheduling information and the parameter information of the workflow;
acquiring a scheduling script code and extracting a data table on which a script depends;
generating a second workflow file according to the data table;
judging the correctness of the workflow file by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and generating a workflow correctness checking result;
wherein generating a first workflow file according to the scheduling information and the parameter information of the workflow comprises:
analyzing the scheduling information and the parameter information of the workflow to obtain the edge information of the current task, including the workflow ID, the task type, the task state, the cycle period, the start time, the dependency relationship and the creator;
sequentially reading, from the edge information of the current workflow, the start task, the end task and the edge ID of every edge in the current workflow;
finding the ancestor tasks from the obtained start tasks, end tasks and edge IDs and marking them as the first level; using the end tasks of the first-level tasks to look up the corresponding start tasks in the edge information and marking those tasks as the second level; and so on until all edge information has been processed, thereby obtaining the hierarchical relationship among the tasks;
writing the edge IDs and the hierarchical relationship to an EXCEL file to form the first workflow file;
wherein acquiring the scheduling script code and extracting the data tables on which the script depends comprises:
reading the script information of the workflow XML code file, and obtaining the workflow information and the task name from the comments in the script information;
reading the content of the scheduling script line by line according to the script information, and parsing it to obtain the data table names;
identifying, among the data table names, the dependent data tables by keyword matching against a tenant name configuration file;
wherein generating a second workflow file according to the data table comprises:
obtaining a hierarchical relationship among the scheduling scripts according to the dependent data table names, the target data table names and the workflow information;
and generating the second workflow file in EXCEL format according to the workflow information, the task name, the hierarchical relationship and the dependent data tables.
2. The method for verifying the correctness of the workflow of the big data platform according to claim 1, wherein acquiring the workflow XML code file and extracting the scheduling information and the parameter information of the workflow comprises:
acquiring a workflow XML code file and reading a workflow information set;
judging whether unprocessed workflows exist in the workflow information set, and if unprocessed workflows exist, reading scheduling information and parameter information of the workflows.
3. The method for verifying workflow correctness of a big data platform according to claim 2, wherein the scheduling information comprises: data type, initial data time, self-dependence, scheduling time point and pre-table; the parameter information includes: script name, step size, task scheduling priority, and number of retries.
4. A system for verifying workflow correctness of a big data platform, the system comprising:
the information extraction module is used for acquiring the XML code file of the workflow and extracting the scheduling information and the parameter information of the workflow;
the first workflow file generation module is used for generating a first workflow file according to the scheduling information and the parameter information of the workflow;
the data table extraction module is used for acquiring the scheduling script code and extracting the data tables on which the scripts depend;
the second workflow file generation module is used for generating a second workflow file according to the data table;
the verification module is used for judging the correctness of the workflow file by comparing the differences among the first workflow file, the second workflow file and the workflow configuration file, and generating a workflow correctness verification result;
the first workflow file generation module is specifically configured to:
analyzing the scheduling information and the parameter information of the workflow to obtain the edge information of the current task, including the workflow ID, the task type, the task state, the cycle period, the start time, the dependency relationship and the creator;
sequentially reading, from the edge information of the current workflow, the start task, the end task and the edge ID of every edge in the current workflow;
finding the ancestor tasks from the obtained start tasks, end tasks and edge IDs and marking them as the first level; using the end tasks of the first-level tasks to look up the corresponding start tasks in the edge information and marking those tasks as the second level; and so on until all edge information has been processed, thereby obtaining the hierarchical relationship among the tasks;
writing the edge IDs and the hierarchical relationship to an EXCEL file to form the first workflow file;
the data table extraction module is specifically configured to:
reading the script information of the workflow XML code file, and obtaining the workflow information and the task name from the comments in the script information;
reading the content of the scheduling script line by line according to the script information, and parsing it to obtain the data table names;
identifying, among the data table names, the dependent data tables by keyword matching against a tenant name configuration file;
the second workflow file generation module is specifically configured to:
obtaining a hierarchical relationship among the scheduling scripts according to the dependent data table names, the target data table names and the workflow information;
and generating the second workflow file in EXCEL format according to the workflow information, the task name, the hierarchical relationship and the dependent data tables.
5. The system for verifying workflow correctness of a big data platform according to claim 4, wherein said information extraction module comprises:
the code file acquisition unit is used for acquiring the workflow XML code file and reading the workflow information set;
and the workflow information reading unit is used for judging whether unprocessed workflows exist in the workflow information set, and if the unprocessed workflows exist, reading the scheduling information and the parameter information of the workflows.
6. The system for verifying workflow correctness of a big data platform according to claim 5, wherein the scheduling information comprises: data type, initial data time, self-dependence, scheduling time point and pre-table; the parameter information includes: script name, step size, task scheduling priority, and number of retries.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 3 when executing the computer program.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 3.
CN202010908992.3A 2020-09-02 2020-09-02 Method and system for checking workflow correctness of big data platform Active CN112035367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010908992.3A CN112035367B (en) 2020-09-02 2020-09-02 Method and system for checking workflow correctness of big data platform

Publications (2)

Publication Number Publication Date
CN112035367A (en) 2020-12-04
CN112035367B (en) 2023-08-18

Family

ID=73591431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010908992.3A Active CN112035367B (en) 2020-09-02 2020-09-02 Method and system for checking workflow correctness of big data platform

Country Status (1)

Country Link
CN (1) CN112035367B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539861A (en) * 2009-05-04 2009-09-23 江西省电力信息通讯有限公司 Tool for graphical design and verification of general workflow
CN105225066A (en) * 2015-10-27 2016-01-06 东软集团股份有限公司 The verification method of workflow legitimacy and demo plant
CN111142933A (en) * 2019-05-29 2020-05-12 浙江大搜车软件技术有限公司 Workflow generation method and device, computer equipment and storage medium
CN111208992A (en) * 2020-01-10 2020-05-29 深圳壹账通智能科技有限公司 System scheduling workflow generation method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110225565A1 (en) * 2010-03-12 2011-09-15 Van Velzen Danny Optimal incremental workflow execution allowing meta-programming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant