CN112115192A - Efficient flow arrangement method and system for ETL system - Google Patents

Publication number
CN112115192A
Authority
CN
China
Prior art keywords
directed acyclic
acyclic graph
node
flow
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011068846.0A
Other languages
Chinese (zh)
Other versions
CN112115192B (en)
Inventor
张春林
李利军
李春青
常江波
尚雪松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dongfang tongwangxin Technology Co.,Ltd.
Beijing dongfangtong Software Co.,Ltd.
BEIJING TESTOR TECHNOLOGY Co.,Ltd.
Beijing Tongtech Co Ltd
Original Assignee
Beijing Microvision Technology Co ltd
Beijing Testor Technology Co ltd
Beijing Dongfangtong Software Co ltd
Beijing Tongtech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Microvision Technology Co ltd, Beijing Testor Technology Co ltd, Beijing Dongfangtong Software Co ltd, Beijing Tongtech Co Ltd
Priority to CN202011068846.0A
Publication of CN112115192A
Application granted
Publication of CN112115192B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F9/4451 User profiles; Roaming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides an efficient flow orchestration method and system for an ETL system. The method comprises the following steps: retrieving the metadata corresponding to the ETL tasks in an operation flow of the ETL system, and automatically compiling the metadata into a directed acyclic graph (DAG) that reflects the logical relationship of the operation flow; analyzing the directed acyclic graphs in batches to obtain the DAG flow corresponding to each directed acyclic graph, and configuring the DAG flow; and sending the DAG flow to a computing engine, which executes it. The system comprises modules corresponding to the steps of the method.

Description

Efficient flow arrangement method and system for ETL system
Technical Field
The invention provides an efficient flow arrangement method and system of an ETL system, and belongs to the technical field of data warehouses.
Background
ETL (extract, transform, load) is the process of extracting data from business systems, cleaning and transforming it, and loading it into a data warehouse. Its purpose is to integrate the scattered, disordered and inconsistent data within an enterprise and so provide a basis for analysis in enterprise decision-making. An ETL process can be implemented in any programming language, but because ETL is a very complex process and hand-written programs are hard to manage, more and more enterprises use tools to assist ETL development and rely on the tools' built-in metadata functions to store the source-to-destination mappings and the transformation rules. Such tools provide stronger connectivity between sources and destinations, so developers can work without being familiar with every platform and data structure involved. Existing ETL systems suffer from low flow-processing efficiency, large errors, and data lineage that is difficult to trace during flow processing.
Disclosure of Invention
The invention provides an efficient flow orchestration method and system for an ETL system, which solve the problems of low flow-processing efficiency, large errors and hard-to-trace data lineage in the flow processing of existing ETL systems, and adopt the following technical scheme:
a method for efficient flow orchestration for an ETL system, the method comprising:
calling metadata corresponding to ETL tasks in an operation flow of the ETL system, and automatically editing the metadata into a directed acyclic graph corresponding to a logical relation according to the logical relation of the operation flow;
analyzing the directed acyclic graph in batch to obtain a DAG flow corresponding to the directed acyclic graph, and configuring the DAG flow;
and sending the DAG flow to a computing engine, and executing the DAG flow through the computing engine.
Furthermore, calling metadata corresponding to the ETL task in the operation flow of the ETL system, and automatically editing the metadata into a directed acyclic graph corresponding to the logical relationship according to the logical relationship of the operation flow, including:
analyzing the operation flows of the ETL system, obtaining a node variable corresponding to each operation flow, and inputting the node variable into a database with indexes in the ETL system;
calling metadata corresponding to an ETL task in an operation flow of the ETL system in the database according to the node variables and the indexes;
traversing all metadata, identifying data contents of the metadata, and logically arranging the metadata according to a logical relationship embodied by the data contents of the metadata to obtain a data sequence with the logical relationship corresponding to the metadata;
sequencing according to the sequence of the data sequence, and tracing the nodes corresponding to the metadata; obtaining nodes corresponding to the metadata arranged in sequence;
and automatically combining the nodes according to the logical relationship corresponding to the data sequence to generate a corresponding directed acyclic graph.
Further, performing batch analysis on the directed acyclic graph to obtain a DAG flow corresponding to the directed acyclic graph, and configuring the DAG flow, including:
setting a plurality of analysis buffer areas and setting the extraction time interval of the directed acyclic graph;
extracting the directed acyclic graph into the analysis buffer areas according to a preset extraction time interval, wherein in an analysis process, each analysis buffer area corresponds to one directed acyclic graph;
analyzing the directed acyclic graph in each analysis buffer area to obtain task contents corresponding to each node contained in the directed acyclic graph, and task relationships and jump conditions among the nodes;
calculating the radiation weight between the node and each node of other directed acyclic graphs by using the number of nodes contained in the directed acyclic graph and the task relationship;
adding and summing radiation weights corresponding to all nodes contained in the directed acyclic graph to obtain a total radiation weight value corresponding to the directed acyclic graph;
and performing DAG flow configuration on the directed acyclic graph according to the magnitude of the total radiation weight value and a flow configuration rule.
Further, calculating the radiation weight between the node and each node of other directed acyclic graphs by using the number of nodes contained in the directed acyclic graph and the task relationship, including:
obtaining the radiation weight by using the following formula:
[Formula not reproduced: it appears only as an image in the original publication.]
where P represents the radiation weight between each node contained in the directed acyclic graph and the other directed acyclic graphs; G_i represents the radiation degree between the i-th node of the directed acyclic graph and the other directed acyclic graphs; n represents the number of other directed acyclic graphs that have a task relationship with each node in the directed acyclic graph; G_0 represents the average radiation degree over the nodes of the directed acyclic graph; G_max represents the maximum radiation degree over the nodes of the directed acyclic graph; G_min represents the minimum radiation degree over the nodes of the directed acyclic graph; h represents the number of nodes, in one other directed acyclic graph having a task relationship, that have a direct task relationship with the node in the directed acyclic graph; and two further symbols, shown only as images in the original publication, represent the number of nodes in the directed acyclic graph and the number of nodes in the other directed acyclic graphs, respectively.
Further, performing DAG flow configuration on the directed acyclic graph according to the total radiation weight value and a flow configuration rule, including:
sending the analyzed directed acyclic graph with the total radiation weight value to a configuration buffer area;
arranging the directed acyclic graphs in the configuration buffer zone according to the sequence of the total radiation weight values from high to low, and obtaining a directed acyclic graph list after finishing the arrangement;
setting a DAG process configuration time interval, wherein the DAG process configuration time interval meets the following conditions:
[Condition not reproduced: it appears only as an image in the original publication.]
where T represents the DAG flow configuration time interval; T_1 represents the extraction time interval of the directed acyclic graphs; and T_2 represents the average time taken to analyze each directed acyclic graph;
and sequentially configuring DAG flows for the directed acyclic graphs according to the sequence of the directed acyclic graph list according to the set DAG flow configuration time interval.
An efficient process orchestration system for an ETL system, the system comprising:
the automatic generation module is used for calling metadata corresponding to the ETL task in the operation flow of the ETL system and automatically editing the metadata into a directed acyclic graph corresponding to the logical relationship according to the logical relationship of the operation flow;
the analysis configuration module is used for carrying out batch analysis on the directed acyclic graph to obtain a DAG flow corresponding to the directed acyclic graph and configuring the DAG flow;
and the sending module is used for sending the DAG flow to a computing engine and executing the DAG flow through the computing engine.
Further, the automatic generation module includes:
the operation flow analysis module is used for analyzing the operation flows of the ETL system, obtaining node variables corresponding to each operation flow, and inputting the node variables into a database with indexes in the ETL system;
the retrieval module is used for retrieving metadata corresponding to the ETL task in the operation flow of the ETL system according to the node variable and the index in the database;
the identification and arrangement module is used for traversing all metadata, identifying the data content of the metadata, and logically arranging the metadata according to the logical relationship embodied by the data content of the metadata to obtain a data sequence with the logical relationship corresponding to the metadata;
the node acquisition module is used for sequencing according to the sequence of the data sequence and tracing the nodes corresponding to the metadata; obtaining nodes corresponding to the metadata arranged in sequence;
and the generating module is used for automatically combining the nodes to generate a corresponding directed acyclic graph according to the logical relationship corresponding to the data sequence.
Further, the parsing configuration module comprises:
the time interval setting module is used for setting a plurality of analysis buffer areas and setting the extraction time interval of the directed acyclic graph;
the extraction module is used for extracting the directed acyclic graph into the analysis buffer areas according to a preset extraction time interval, and in an analysis process, each analysis buffer area corresponds to one directed acyclic graph;
the analysis module is used for analyzing the directed acyclic graph in each analysis buffer area to obtain task contents corresponding to each node contained in the directed acyclic graph, and task relationships and jump conditions among the nodes;
the weight calculation module is used for calculating the radiation weight between the node and each node of other directed acyclic graphs by utilizing the number of the nodes contained in the directed acyclic graphs and the task relationship;
a total weight value obtaining module, configured to add and sum radiation weights corresponding to all nodes included in the directed acyclic graph, and obtain a total radiation weight value corresponding to the directed acyclic graph;
and the configuration module is used for carrying out DAG flow configuration on the directed acyclic graph according to the total radiation weight value and the flow configuration rule.
Further, the weight calculation module obtains the radiation weight by using the following formula:
[Formula not reproduced: it appears only as an image in the original publication.]
where P represents the radiation weight between each node contained in the directed acyclic graph and the other directed acyclic graphs; G_i represents the radiation degree between the i-th node of the directed acyclic graph and the other directed acyclic graphs; n represents the number of other directed acyclic graphs that have a task relationship with each node in the directed acyclic graph; G_0 represents the average radiation degree over the nodes of the directed acyclic graph; G_max represents the maximum radiation degree over the nodes of the directed acyclic graph; G_min represents the minimum radiation degree over the nodes of the directed acyclic graph; h represents the number of nodes, in one other directed acyclic graph having a task relationship, that have a direct task relationship with the node in the directed acyclic graph; and two further symbols, shown only as images in the original publication, represent the number of nodes in the directed acyclic graph and the number of nodes in the other directed acyclic graphs, respectively.
Further, the configuration module includes:
the directed acyclic graph sending module is used for sending the directed acyclic graph which is well analyzed and obtains the total radiation weight value to the configuration buffer area;
the sequencing module is used for sequencing the directed acyclic graphs in the configuration buffer zone according to the sequence of the total radiation weight values from high to low, and obtaining a directed acyclic graph list after sequencing is completed;
the configuration time setting module is used for setting DAG process configuration time intervals, and the DAG process configuration time intervals meet the following conditions:
[Condition not reproduced: it appears only as an image in the original publication.]
where T represents the DAG flow configuration time interval; T_1 represents the extraction time interval of the directed acyclic graphs; and T_2 represents the average time taken to analyze each directed acyclic graph;
and the flow configuration module is used for sequentially configuring the DAG flows for the directed acyclic graphs according to the sequence of the directed acyclic graph list according to the set DAG flow configuration time interval.
The invention has the beneficial effects that:
The invention provides an efficient flow orchestration method and system for an ETL system. By automatically generating directed acyclic graphs and adding radiation-weight analysis to their processing during flow orchestration, the flow-processing efficiency of the ETL system can be effectively improved. In addition, introducing the radiation weight during orchestration allows important task flows to be processed efficiently and with priority. The radiation weight also makes the data lineage between the directed acyclic graphs clear while the task flows are processed, which greatly reduces the time and error rate of subsequent data lineage analysis and lookup in the ETL system and improves their efficiency and accuracy.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic structural diagram of the system of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The invention provides an efficient flow orchestration method and system for an ETL system, which solve the problems of low flow-processing efficiency, large errors and hard-to-trace data lineage in the flow processing of existing ETL systems.
The embodiment of the invention provides an efficient flow arrangement method of an ETL system, which comprises the following steps of:
S1, calling metadata corresponding to ETL tasks in the operation process of the ETL system, and automatically editing the metadata into a directed acyclic graph corresponding to the logical relationship according to the logical relationship of the operation process;
S2, carrying out batch analysis on the directed acyclic graph to obtain a DAG flow corresponding to the directed acyclic graph, and configuring the DAG flow;
S3, sending the DAG flow to a calculation engine, and executing the DAG flow through the calculation engine.
The working principle of the technical scheme is as follows: firstly, calling metadata corresponding to ETL tasks in an operation flow of the ETL system, and automatically editing the metadata into a directed acyclic graph corresponding to a logical relation according to the logical relation of the operation flow; then, carrying out batch analysis on the directed acyclic graph to obtain a DAG flow corresponding to the directed acyclic graph, and configuring the DAG flow; and finally, sending the DAG flow to a computing engine, and executing the DAG flow through the computing engine.
The effect of the above technical scheme is as follows: by automatically generating directed acyclic graphs and adding radiation-weight analysis to their processing during flow orchestration, the flow-processing efficiency of the ETL system can be effectively improved. In addition, introducing the radiation weight during orchestration allows important task flows to be processed efficiently and with priority. The radiation weight also makes the data lineage between the directed acyclic graphs clear while the task flows are processed, which greatly reduces the time and error rate of subsequent data lineage analysis and lookup in the ETL system and improves their efficiency and accuracy.
In an embodiment of the present invention, retrieving metadata corresponding to an ETL task in an operation flow of the ETL system, and automatically editing the metadata into a directed acyclic graph corresponding to a logical relationship according to the logical relationship of the operation flow, includes:
S101, analyzing the operation flows of the ETL system to obtain a node variable corresponding to each operation flow, and inputting the node variable into a database with indexes in the ETL system;
S102, calling metadata corresponding to ETL tasks in the operation flow of the ETL system in the database according to the node variables and the indexes;
S103, traversing all metadata, identifying data contents of the metadata, logically arranging the metadata according to a logical relationship embodied by the data contents of the metadata, and obtaining a data sequence with the logical relationship corresponding to the metadata;
S104, sorting according to the sequence of the data sequence, and tracing the nodes corresponding to the metadata; obtaining nodes corresponding to the metadata arranged in sequence;
S105, automatically combining the nodes to generate a corresponding directed acyclic graph according to the logical relation corresponding to the data sequence.
The working principle of the technical scheme is as follows: first, the operation flows of the ETL system are analyzed to obtain the node variable corresponding to each operation flow, and the node variables are entered into an indexed database in the ETL system. Next, the metadata corresponding to the ETL tasks in the operation flow is retrieved from the database according to the node variables and the indexes. All of the metadata is then traversed, its data content is identified, and the metadata is arranged according to the logical relationship expressed by that content, giving a logically ordered data sequence. The nodes corresponding to the metadata are traced in the order of this data sequence, giving the nodes in sequence. Finally, the nodes are automatically combined according to the logical relationship of the data sequence to generate the corresponding directed acyclic graph.
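The five steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the metadata layout (an `upstream` list per node variable), the `METADATA_INDEX` store and the `build_dag` helper are all assumptions made for illustration. The standard-library `graphlib` module is used to order the nodes and to reject metadata that would not form an acyclic graph.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical indexed metadata store: node variable -> metadata record.
# The "upstream" field is an assumed way of encoding the logical relationship.
METADATA_INDEX = {
    "extract_orders": {"upstream": []},
    "clean_orders": {"upstream": ["extract_orders"]},
    "load_warehouse": {"upstream": ["clean_orders"]},
}


def build_dag(node_variables, index=METADATA_INDEX):
    """Look up the metadata of each node variable (S102) and combine the
    nodes into a graph mapping node -> set of predecessors (S103-S105)."""
    graph = {}
    for name in node_variables:
        record = index[name]                    # S102: indexed lookup
        graph[name] = set(record["upstream"])   # S103: logical relationship
    try:
        # S104: arrange the nodes in an order consistent with the relationships
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"metadata describes a cycle: {exc.args[1]}") from exc
    return graph, order


graph, order = build_dag(["load_warehouse", "extract_orders", "clean_orders"])
print(order)  # ['extract_orders', 'clean_orders', 'load_warehouse']
```

Storing predecessors rather than successors matches the input format that `TopologicalSorter` expects, so the same mapping serves as both the graph and the input to the ordering step.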
The effect of the above technical scheme is as follows: the efficiency of the operation flow processing of the ETL system can be effectively improved by automatically generating the directed acyclic graph, and the time and labor wasted by manual flow arrangement operation are saved. Meanwhile, the directed acyclic graph generated through metadata identification and logical relations has high accuracy, and errors generated during manual process arrangement are effectively avoided.
In an embodiment of the present invention, performing batch parsing on the directed acyclic graph to obtain a DAG flow corresponding to the directed acyclic graph, and configuring the DAG flow, includes:
S201, setting a plurality of analysis buffer areas and setting an extraction time interval of the directed acyclic graph;
S202, extracting the directed acyclic graph into the analysis buffer areas according to a preset extraction time interval, wherein in an analysis process, each analysis buffer area corresponds to one directed acyclic graph;
S203, analyzing the directed acyclic graph in each analysis buffer area to obtain task contents corresponding to each node contained in the directed acyclic graph, and task relationships and jumping conditions among the nodes;
S204, calculating the radiation weight between the node and each node of other directed acyclic graphs by using the number of the nodes contained in the directed acyclic graphs and the task relationship;
S205, adding and summing the radiation weights corresponding to all nodes contained in the directed acyclic graph to obtain a total radiation weight value corresponding to the directed acyclic graph;
S206, performing DAG flow configuration on the directed acyclic graph according to the total radiation weight value and a flow configuration rule.
The working principle of the technical scheme is as follows: in this embodiment, giving each directed acyclic graph its own analysis buffer isolates the analysis of each graph and prevents the analysis processes of different graphs from interfering with one another. The number of directed acyclic graphs extracted at each extraction time interval depends on the number of analysis buffers: each time the interval elapses, as many graphs as there are buffers are extracted and sent to the buffers, one graph per buffer. The radiation weight measures, for each node in a directed acyclic graph, the breadth and closeness of its radiation connections (that is, task relationships) with other directed acyclic graphs. The total radiation weight value measures the breadth and closeness of the task relationships between a whole directed acyclic graph and the other graphs: the higher the total radiation weight value, the more task relationships the graph has, the more complex it is, and the more closely it is tied to other tasks. The total radiation weight value then sets the priority for the subsequent flow processing of each graph: directed acyclic graphs with more complex task relationships and closer ties to other tasks are configured and processed first.
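The batch extraction described above (one directed acyclic graph per analysis buffer, one batch per extraction time interval) can be sketched as follows. This is an illustrative sketch only: `extract_batches` is a hypothetical helper, the graphs are represented by plain identifiers, and the waiting between intervals is deliberately left out.

```python
from collections import deque


def extract_batches(pending_dags, buffer_count):
    """Yield one batch per extraction interval. Each batch holds at most
    `buffer_count` graphs, so every analysis buffer receives exactly one."""
    queue = deque(pending_dags)
    while queue:
        batch = [queue.popleft() for _ in range(min(buffer_count, len(queue)))]
        yield batch


# Five pending graphs and two analysis buffers give three extraction rounds.
for round_number, batch in enumerate(extract_batches(["g1", "g2", "g3", "g4", "g5"], 2)):
    print(round_number, batch)
```

Because each batch is no larger than the number of buffers, the analysis of one graph never shares a buffer with another, which is the isolation property the paragraph above relies on.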
The effect of the above technical scheme is as follows: the total radiation weight value makes it possible to screen the task flows of the whole ETL system, identify the more important ones, and process the more important and complex task flows first, which effectively improves the flow-processing efficiency of the ETL system. Because this screening takes place early in flow configuration, and because the complexity and importance derived from the total radiation weight value are more accurate and objective than a manual assessment, judgment errors caused by subjective factors are avoided and the accuracy and efficiency of important-task screening are improved. In addition, when errors in the task data processing of the ETL system have to be investigated, the important and complex tasks identified in this way can be checked first, which avoids the labor, resources and time of checking a large volume of task data and improves the efficiency of troubleshooting and maintaining the ETL system.
In addition, introducing the radiation weight during orchestration allows important task flows to be processed efficiently and with priority. The radiation weight also makes the data lineage between the directed acyclic graphs clear while the task flows are processed, which greatly reduces the time and error rate of subsequent data lineage analysis and lookup in the ETL system and improves their efficiency and accuracy.
In an embodiment of the present invention, calculating the radiation weight between the node and each node of other directed acyclic graphs by using the number of nodes included in the directed acyclic graph and the task relationship includes:
obtaining the radiation weight by using the following formula:
[Formula not reproduced: it appears only as an image in the original publication.]
where P represents the radiation weight between each node contained in the directed acyclic graph and the other directed acyclic graphs; G_i represents the radiation degree between the i-th node of the directed acyclic graph and the other directed acyclic graphs; n represents the number of other directed acyclic graphs that have a task relationship with each node in the directed acyclic graph; G_0 represents the average radiation degree over the nodes of the directed acyclic graph; G_max represents the maximum radiation degree over the nodes of the directed acyclic graph; G_min represents the minimum radiation degree over the nodes of the directed acyclic graph; h represents the number of nodes, in one other directed acyclic graph having a task relationship, that have a direct task relationship with the node in the directed acyclic graph; and two further symbols, shown only as images in the original publication, represent the number of nodes in the directed acyclic graph and the number of nodes in the other directed acyclic graphs, respectively.
The working principle of the technical scheme is as follows: the radiation weight between a node and the nodes of other directed acyclic graphs is calculated from the number of nodes contained in the directed acyclic graph and the task relationship. The higher the radiation weight, the more other directed acyclic graphs the node has task relationships with; that is, the broader the node's task relationships, the closer its associations, and the more complex its task relationships.
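Since the radiation weight formula is available only as an image, the following Python sketch does not implement it. It uses a deliberately simple stand-in (an assumption, not the patent's formula) that shares the stated behavior: the weight grows with the number of related directed acyclic graphs and with h, the number of directly related nodes in each of them. The names `node_weight` and `total_weight` and the input layout are hypothetical. The second function shows the summation into a total radiation weight value.

```python
def node_weight(links, other_sizes):
    """Stand-in radiation weight for one node (NOT the patent's formula).

    links:       {other_dag_id: h}, where h is the number of nodes in that
                 graph with a direct task relationship to this node.
    other_sizes: {other_dag_id: number of nodes in that graph}.
    """
    # Each related graph contributes the fraction of its nodes tied to this node.
    return sum(h / other_sizes[dag_id] for dag_id, h in links.items())


def total_weight(dag_links, other_sizes):
    """Add up the weights of all nodes in one graph (the summation step)."""
    return sum(node_weight(links, other_sizes) for links in dag_links.values())


# Node "a" is tied to two other graphs, node "b" to none.
dag_links = {"a": {"g2": 1, "g3": 2}, "b": {}}
other_sizes = {"g2": 4, "g3": 4}
print(total_weight(dag_links, other_sizes))  # 0.75
```

Any function that increases with the number and closeness of cross-graph task relationships could be substituted for `node_weight` without changing the summation or the later priority ordering.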
The effect of the above technical scheme is as follows: the radiation weight obtained from the formula accurately reflects both how closely each node in a directed acyclic graph is tied to other directed acyclic graphs through task relationships and how broad the node's task relationships are. Important and complex tasks can therefore be screened accurately early in flow configuration. The complexity and importance derived from the total radiation weight value are more objective and accurate than a manual assessment, which avoids judgment errors caused by subjective factors and improves the accuracy and efficiency of important-task screening.
In an embodiment of the present invention, performing DAG flow configuration on the directed acyclic graph according to the magnitude of the total radiation weight value and a flow configuration rule, includes:
S2061, sending each parsed directed acyclic graph for which a total radiation weight value has been obtained to a configuration buffer;
S2062, arranging the directed acyclic graphs in the configuration buffer in descending order of total radiation weight value, and obtaining a directed acyclic graph list once the arrangement is complete;
S2063, setting a DAG flow configuration time interval, wherein the DAG flow configuration time interval satisfies the following condition:
Figure 847275DEST_PATH_IMAGE004
wherein T represents the DAG flow configuration time interval; T_1 represents the extraction time interval of the directed acyclic graphs; and T_2 represents the average time taken to parse each directed acyclic graph; in this embodiment, T = 0.682(T_1 + T_2) is preferred, and T = 0.724(T_1 + T_2) is the next preferred value;
S2064, performing DAG flow configuration on the directed acyclic graphs one by one, in the order of the directed acyclic graph list, at the set DAG flow configuration time interval.
The working principle of the technical scheme is as follows: firstly, sending the analyzed directed acyclic graph with the total radiation weight value to a configuration buffer area; then, arranging the directed acyclic graphs in the configuration buffer according to the sequence of the total radiation weight values from high to low, and obtaining a directed acyclic graph list after finishing arrangement; and finally, according to the set DAG flow configuration time interval, sequentially carrying out DAG flow configuration on the directed acyclic graph according to the directed acyclic graph list sequence.
The effect of the above technical solution is as follows: the directed acyclic graphs parsed in the analysis buffers are sent to the configuration buffer, and DAG flow configuration is performed on them according to their total radiation weight values, so that a directed acyclic graph with a larger total radiation weight value is configured first. In addition, setting the DAG flow configuration time interval coordinates the configuration time with the parsing time, and hence the configuration speed with the parsing speed, so that the directed acyclic graphs due for configuration in the current round are actually configured. This avoids the situation in which directed acyclic graphs with lower total radiation weight values are delayed indefinitely because too many newly arrived directed acyclic graphs with higher total radiation weight values take priority in the configuration buffer.
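Steps S2061 to S2064 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `ParsedDag`, `configuration_interval`, and `build_configuration_schedule` are hypothetical, and the factor 0.682 is simply the preferred value given in this embodiment. The exact condition on T appears only as a formula image in the source and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ParsedDag:
    """A parsed directed acyclic graph waiting in the configuration buffer (hypothetical type)."""
    name: str
    total_radiation_weight: float


def configuration_interval(t1, t2, factor=0.682):
    """Return T = factor * (T_1 + T_2); 0.682 is the preferred factor in the embodiment."""
    return factor * (t1 + t2)


def build_configuration_schedule(buffer, t1, t2, factor=0.682):
    """Order graphs by descending total radiation weight (S2062) and
    assign each one a configuration time spaced by T (S2063, S2064)."""
    interval = configuration_interval(t1, t2, factor)
    ordered = sorted(buffer, key=lambda d: d.total_radiation_weight, reverse=True)
    return [(position * interval, dag.name) for position, dag in enumerate(ordered)]
```

Calling `build_configuration_schedule` on the contents of the configuration buffer returns a list of `(start_time, name)` pairs, with the graph that has the largest total radiation weight value scheduled first.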
An efficient process orchestration system for an ETL system, as shown in Fig. 2, comprises:
the automatic generation module, configured to retrieve the metadata corresponding to the ETL tasks in an operation flow of the ETL system and to automatically arrange the metadata, according to the logical relationship of the operation flow, into a directed acyclic graph corresponding to that logical relationship;
the parsing configuration module, configured to parse the directed acyclic graphs in batches to obtain the DAG flow corresponding to each directed acyclic graph, and to configure the DAG flow;
and the sending module, configured to send the DAG flow to a computing engine, which executes the DAG flow.
The working principle of the above technical solution is as follows: first, the automatic generation module retrieves the metadata corresponding to the ETL tasks in an operation flow of the ETL system and automatically arranges the metadata, according to the logical relationship of the operation flow, into a directed acyclic graph corresponding to that logical relationship; next, the parsing configuration module parses the directed acyclic graphs in batches to obtain the DAG flow corresponding to each directed acyclic graph and configures the DAG flow; finally, the sending module sends the DAG flow to a computing engine, which executes it.
The effect of the above technical solution is as follows: on the one hand, automatically generating the directed acyclic graphs and adding radiation-weight analysis to their processing during flow orchestration improves the flow processing efficiency of the ETL system. On the other hand, introducing the radiation weight into orchestration allows important task flows to be processed first, and the radiation weight also makes the data lineage (blood relationship) between the directed acyclic graphs clear while the task flows are processed, which greatly reduces the time and error rate of subsequent lineage analysis and lookup in the ETL system and improves their efficiency and accuracy.
In one embodiment of the present invention, the automatic generation module includes:
the operation flow analysis module, configured to analyze the operation flows of the ETL system, obtain the node variable corresponding to each operation flow, and enter the node variables into an indexed database in the ETL system;
the retrieval module, configured to retrieve from the database, according to the node variables and the index, the metadata corresponding to the ETL tasks in the operation flow of the ETL system;
the identification and arrangement module, configured to traverse all the metadata, identify the data content of the metadata, and logically arrange the metadata according to the logical relationship embodied in that data content, so as to obtain a data sequence of the metadata that carries the logical relationship;
the node acquisition module, configured to trace, in the order of the data sequence, the nodes corresponding to the metadata, so as to obtain the nodes corresponding to the sequentially arranged metadata;
and the generating module, configured to automatically combine the nodes, according to the logical relationship of the data sequence, into the corresponding directed acyclic graph.
The working principle of the above technical solution is as follows: first, the operation flow analysis module analyzes the operation flows of the ETL system, obtains the node variable corresponding to each operation flow, and enters the node variables into an indexed database in the ETL system; next, the retrieval module retrieves from the database, according to the node variables and the index, the metadata corresponding to the ETL tasks in the operation flow of the ETL system; then the identification and arrangement module traverses all the metadata, identifies the data content of the metadata, and logically arranges the metadata according to the logical relationship embodied in that data content, obtaining a data sequence of the metadata that carries the logical relationship; the node acquisition module then traces, in the order of the data sequence, the nodes corresponding to the metadata, obtaining the nodes corresponding to the sequentially arranged metadata; finally, the generating module automatically combines the nodes, according to the logical relationship of the data sequence, into the corresponding directed acyclic graph.
The effect of the above technical solution is as follows: automatically generating the directed acyclic graphs improves the efficiency of operation flow processing in the ETL system and saves the time and labor otherwise spent on manual flow arrangement. In addition, a directed acyclic graph generated from metadata identification and logical relationships is highly accurate, which avoids the errors that arise when flows are arranged by hand.
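The generation of a directed acyclic graph from metadata can be illustrated with a short Python sketch. It assumes, purely for illustration, that each metadata record names its task and the upstream tasks it depends on; the field names `task` and `depends_on` and the function names are hypothetical and do not come from the source. The standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) is used to check acyclicity and to derive an execution order.

```python
from graphlib import CycleError, TopologicalSorter


def build_dag(metadata_records):
    """Combine metadata records into a graph that maps each task to the set
    of tasks it depends on (hypothetical record layout)."""
    dag = {}
    for record in metadata_records:
        upstream_tasks = record.get("depends_on", [])
        dag.setdefault(record["task"], set()).update(upstream_tasks)
        for upstream in upstream_tasks:
            # Make sure every referenced task also appears as a node.
            dag.setdefault(upstream, set())
    return dag


def is_acyclic(dag):
    """Return True when the graph contains no cycle."""
    try:
        TopologicalSorter(dag).prepare()
    except CycleError:
        return False
    return True


def execution_order(dag):
    """Return the tasks with every dependency placed before its dependents."""
    return list(TopologicalSorter(dag).static_order())
```

For three records in which a load task depends on a transform task and the transform task depends on an extract task, `execution_order` places the extract task first and the load task last.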
In an embodiment of the present invention, the parsing configuration module includes:
the time interval setting module, configured to set up a plurality of analysis buffers and to set the extraction time interval of the directed acyclic graphs;
the extraction module, configured to extract the directed acyclic graphs into the analysis buffers at the preset extraction time interval, wherein in one parsing pass each analysis buffer holds one directed acyclic graph;
the analysis module, configured to parse the directed acyclic graph in each analysis buffer to obtain the task content of each node contained in the directed acyclic graph, together with the task relationships and jump conditions between the nodes;
the weight calculation module, configured to calculate the radiation weight between each node and the nodes of the other directed acyclic graphs from the number of nodes contained in the directed acyclic graph and the task relationships;
the total weight value obtaining module, configured to sum the radiation weights of all nodes contained in the directed acyclic graph to obtain the total radiation weight value of the directed acyclic graph;
and the configuration module, configured to perform DAG flow configuration on the directed acyclic graph according to the total radiation weight value and a flow configuration rule.
The working principle of the above technical solution is as follows: first, the analysis module parses the directed acyclic graph in each analysis buffer to obtain the task content of each node contained in the directed acyclic graph, together with the task relationships and jump conditions between the nodes; next, the weight calculation module calculates the radiation weight between each node and the nodes of the other directed acyclic graphs from the number of nodes contained in the directed acyclic graph and the task relationships; then the total weight value obtaining module sums the radiation weights of all nodes contained in the directed acyclic graph to obtain the total radiation weight value of the directed acyclic graph; finally, the configuration module performs DAG flow configuration on the directed acyclic graph according to the total radiation weight value and a flow configuration rule.
The effect of the above technical solution is as follows: determining the total radiation weight value makes it possible to screen the whole ETL system for its more important task flows and to process the more important and complex task flows first, which improves the flow processing efficiency of the ETL system. This mode of processing also allows important and complex tasks to be screened at an early stage of flow configuration. Because the complexity and importance of a task flow are derived from the total radiation weight value, they are more accurate and objective; judgment errors caused by subjective factors in manual assessment are avoided, and the accuracy and efficiency of screening important tasks are improved. Furthermore, if an error occurs while the ETL system processes task data, the important and complex tasks that were screened out can be checked first, which avoids the manpower, material resources, and time consumed by troubleshooting a large volume of task data and improves the efficiency of troubleshooting and maintaining the ETL system.
In addition, introducing the radiation weight into orchestration allows important task flows to be processed first, and the radiation weight also makes the data lineage (blood relationship) between the directed acyclic graphs clear while the task flows are processed, which greatly reduces the time and error rate of subsequent lineage analysis and lookup in the ETL system and improves their efficiency and accuracy.
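The extraction into analysis buffers and the summation of per-node radiation weights can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the per-node radiation weights are taken as already computed inputs, because the formula for P is given only as an image in the source and is not reproduced here.

```python
def extract_rounds(pending_dags, buffer_count):
    """Split the pending graphs into parsing passes so that, in each pass,
    every analysis buffer holds at most one directed acyclic graph."""
    if buffer_count < 1:
        raise ValueError("at least one analysis buffer is required")
    return [
        pending_dags[start:start + buffer_count]
        for start in range(0, len(pending_dags), buffer_count)
    ]


def total_radiation_weight(node_weights):
    """Sum the radiation weights of all nodes in one directed acyclic graph.

    `node_weights` maps a node name to its precomputed radiation weight P.
    """
    return sum(node_weights.values())
```

With two analysis buffers and five pending graphs, `extract_rounds` yields three parsing passes, the last of which holds a single graph.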
In an embodiment of the present invention, the weight calculating module obtains the radiation weight by using the following formula:
Figure 105081DEST_PATH_IMAGE001
wherein P represents the radiation weight between each node contained in the directed acyclic graph and the other directed acyclic graphs; G_i represents the radiance between the i-th node of the directed acyclic graph and the other directed acyclic graphs; n represents the number of other directed acyclic graphs that have a task relationship with each node in the directed acyclic graph; G_0 represents the average radiance over the nodes of the directed acyclic graph; G_max represents the maximum radiance over the nodes of the directed acyclic graph; G_min represents the minimum radiance over the nodes of the directed acyclic graph; h represents the number of nodes, in one other directed acyclic graph that has a task relationship with a node of the directed acyclic graph, that have a direct task relationship with that node;
Figure 128401DEST_PATH_IMAGE002
represents the number of nodes in the directed acyclic graph; and
Figure 104447DEST_PATH_IMAGE003
represents the number of nodes in the other directed acyclic graphs.
The working principle of the above technical solution is as follows: the radiation weight between a node and the nodes of the other directed acyclic graphs is calculated from the number of nodes contained in the directed acyclic graph and the task relationships. A higher radiation weight indicates that the node has task relationships with more of the other directed acyclic graphs; that is, the node's task relationships are broader, its associations are tighter, and its task relationships are more complex.
The effect of the above technical solution is as follows: the radiation weight obtained by the formula reflects both how closely each node in a directed acyclic graph is tied, by task relationship, to the other directed acyclic graphs and how broad that node's task relationships are. Important and complex tasks can therefore be screened at an early stage of flow configuration. Because the complexity and importance of a task flow are derived from the total radiation weight value, they are more objective and accurate; judgment errors caused by subjective factors in manual assessment are avoided, and the accuracy and efficiency of screening important tasks are improved.
In one embodiment of the present invention, the configuration module includes:
the directed acyclic graph sending module, configured to send each parsed directed acyclic graph for which a total radiation weight value has been obtained to a configuration buffer;
the sequencing module, configured to arrange the directed acyclic graphs in the configuration buffer in descending order of total radiation weight value, and to obtain a directed acyclic graph list once the arrangement is complete;
the configuration time setting module, configured to set a DAG flow configuration time interval, wherein the DAG flow configuration time interval satisfies the following condition:
Figure 113992DEST_PATH_IMAGE004
wherein T represents the DAG flow configuration time interval; T_1 represents the extraction time interval of the directed acyclic graphs; and T_2 represents the average time taken to parse each directed acyclic graph;
and the flow configuration module, configured to perform DAG flow configuration on the directed acyclic graphs one by one, in the order of the directed acyclic graph list, at the set DAG flow configuration time interval.
The working principle of the above technical solution is as follows: first, the directed acyclic graph sending module sends each parsed directed acyclic graph for which a total radiation weight value has been obtained to the configuration buffer; next, the sequencing module arranges the directed acyclic graphs in the configuration buffer in descending order of total radiation weight value and obtains a directed acyclic graph list once the arrangement is complete; then the configuration time setting module sets the DAG flow configuration time interval; finally, the flow configuration module performs DAG flow configuration on the directed acyclic graphs one by one, in the order of the directed acyclic graph list, at the set DAG flow configuration time interval.
The effect of the above technical solution is as follows: the directed acyclic graphs parsed in the analysis buffers are sent to the configuration buffer, and DAG flow configuration is performed on them according to their total radiation weight values, so that a directed acyclic graph with a larger total radiation weight value is configured first. In addition, setting the DAG flow configuration time interval coordinates the configuration time with the parsing time, and hence the configuration speed with the parsing speed, so that the directed acyclic graphs due for configuration in the current round are actually configured. This avoids the situation in which directed acyclic graphs with lower total radiation weight values are delayed indefinitely because too many newly arrived directed acyclic graphs with higher total radiation weight values take priority in the configuration buffer.
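One possible reading of the delay effect described above can be illustrated with a small simulation. It rests on assumptions that are not stated in the source: a single low-weight graph waits in the configuration buffer, one new higher-weight graph arrives every `arrival_period`, and one graph is configured every `config_interval`. The function name and parameters are hypothetical.

```python
import heapq


def time_until_configured(low_weight, high_weight, arrival_period,
                          config_interval, horizon):
    """Return the time at which the waiting low-weight graph is configured,
    or None if it is still waiting when the simulation reaches `horizon`."""
    # Max-heap by weight, implemented by negating the weight.
    heap = [(-low_weight, 0, "low")]
    sequence = 1
    next_arrival = 0.0
    next_config = 0.0
    while min(next_arrival, next_config) <= horizon:
        if next_arrival <= next_config:
            # A new higher-weight graph enters the configuration buffer.
            heapq.heappush(heap, (-high_weight, sequence, "high"))
            sequence += 1
            next_arrival += arrival_period
        else:
            # Configure the graph with the highest total radiation weight.
            if heap:
                _, _, label = heapq.heappop(heap)
                if label == "low":
                    return next_config
            next_config += config_interval
    return None
```

Under these assumptions, a configuration interval shorter than the arrival period lets the buffer drain, so the low-weight graph is eventually configured; a configuration interval longer than the arrival period leaves it waiting for the whole simulated horizon.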
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for efficient process orchestration for an ETL system, the method comprising:
retrieving metadata corresponding to ETL tasks in an operation flow of the ETL system, and automatically arranging the metadata, according to the logical relationship of the operation flow, into a directed acyclic graph corresponding to the logical relationship;
parsing the directed acyclic graph in batches to obtain a DAG flow corresponding to the directed acyclic graph, and configuring the DAG flow;
and sending the DAG flow to a computing engine, and executing the DAG flow through the computing engine.
2. The method of claim 1, wherein retrieving metadata corresponding to ETL tasks in an operation flow of the ETL system, and automatically arranging the metadata, according to the logical relationship of the operation flow, into a directed acyclic graph corresponding to the logical relationship comprises:
analyzing the operation flows of the ETL system, obtaining a node variable corresponding to each operation flow, and entering the node variables into an indexed database in the ETL system;
retrieving from the database, according to the node variables and the index, the metadata corresponding to the ETL tasks in the operation flow of the ETL system;
traversing all the metadata, identifying the data content of the metadata, and logically arranging the metadata according to the logical relationship embodied in the data content of the metadata, to obtain a data sequence of the metadata that carries the logical relationship;
tracing, in the order of the data sequence, the nodes corresponding to the metadata, to obtain the nodes corresponding to the sequentially arranged metadata;
and automatically combining the nodes, according to the logical relationship of the data sequence, to generate the corresponding directed acyclic graph.
3. The method of claim 1, wherein parsing the directed acyclic graph in batches to obtain a DAG flow corresponding to the directed acyclic graph, and configuring the DAG flow comprises:
setting up a plurality of analysis buffers and setting an extraction time interval of the directed acyclic graph;
extracting the directed acyclic graph into the analysis buffers at the preset extraction time interval, wherein in one parsing pass each analysis buffer holds one directed acyclic graph;
parsing the directed acyclic graph in each analysis buffer to obtain the task content of each node contained in the directed acyclic graph, and the task relationships and jump conditions between the nodes;
calculating the radiation weight between the node and each node of other directed acyclic graphs by using the number of nodes contained in the directed acyclic graph and the task relationships;
summing the radiation weights of all nodes contained in the directed acyclic graph to obtain a total radiation weight value of the directed acyclic graph;
and performing DAG flow configuration on the directed acyclic graph according to the magnitude of the total radiation weight value and a flow configuration rule.
4. The method according to claim 3, wherein calculating the radiation weight between the node and each node of other directed acyclic graphs by using the number of nodes contained in the directed acyclic graph and the task relationships comprises:
obtaining the radiation weight by using the following formula:
Figure 240475DEST_PATH_IMAGE001
wherein P represents the radiation weight between each node contained in the directed acyclic graph and the other directed acyclic graphs; G_i represents the radiance between the i-th node of the directed acyclic graph and the other directed acyclic graphs; n represents the number of other directed acyclic graphs that have a task relationship with each node in the directed acyclic graph; G_0 represents the average radiance over the nodes of the directed acyclic graph; G_max represents the maximum radiance over the nodes of the directed acyclic graph; G_min represents the minimum radiance over the nodes of the directed acyclic graph; h represents the number of nodes, in one other directed acyclic graph that has a task relationship with a node of the directed acyclic graph, that have a direct task relationship with that node;
Figure 423195DEST_PATH_IMAGE002
represents the number of nodes in the directed acyclic graph; and
Figure 468511DEST_PATH_IMAGE003
represents the number of nodes in the other directed acyclic graphs.
5. The method of claim 3, wherein performing DAG flow configuration on the directed acyclic graph according to the magnitude of the total radiation weight value and a flow configuration rule comprises:
sending each parsed directed acyclic graph for which a total radiation weight value has been obtained to a configuration buffer;
arranging the directed acyclic graphs in the configuration buffer in descending order of total radiation weight value, and obtaining a directed acyclic graph list once the arrangement is complete;
setting a DAG flow configuration time interval, wherein the DAG flow configuration time interval satisfies the following condition:
Figure 129299DEST_PATH_IMAGE004
wherein T represents the DAG flow configuration time interval; T_1 represents the extraction time interval of the directed acyclic graph; and T_2 represents the average time taken to parse each directed acyclic graph;
and performing DAG flow configuration on the directed acyclic graphs one by one, in the order of the directed acyclic graph list, at the set DAG flow configuration time interval.
6. An efficient process orchestration system for an ETL system, the system comprising:
the automatic generation module, configured to retrieve metadata corresponding to ETL tasks in an operation flow of the ETL system and to automatically arrange the metadata, according to the logical relationship of the operation flow, into a directed acyclic graph corresponding to the logical relationship;
the parsing configuration module, configured to parse the directed acyclic graph in batches to obtain a DAG flow corresponding to the directed acyclic graph, and to configure the DAG flow;
and the sending module, configured to send the DAG flow to a computing engine and to execute the DAG flow through the computing engine.
7. The system of claim 6, wherein the automatic generation module comprises:
the operation flow analysis module, configured to analyze the operation flows of the ETL system, obtain a node variable corresponding to each operation flow, and enter the node variables into an indexed database in the ETL system;
the retrieval module, configured to retrieve from the database, according to the node variables and the index, the metadata corresponding to the ETL tasks in the operation flow of the ETL system;
the identification and arrangement module, configured to traverse all the metadata, identify the data content of the metadata, and logically arrange the metadata according to the logical relationship embodied in the data content of the metadata, to obtain a data sequence of the metadata that carries the logical relationship;
the node acquisition module, configured to trace, in the order of the data sequence, the nodes corresponding to the metadata, to obtain the nodes corresponding to the sequentially arranged metadata;
and the generating module, configured to automatically combine the nodes, according to the logical relationship of the data sequence, to generate the corresponding directed acyclic graph.
8. The system of claim 6, wherein the parsing configuration module comprises:
the time interval setting module, configured to set up a plurality of analysis buffers and to set an extraction time interval of the directed acyclic graph;
the extraction module, configured to extract the directed acyclic graph into the analysis buffers at the preset extraction time interval, wherein in one parsing pass each analysis buffer holds one directed acyclic graph;
the analysis module, configured to parse the directed acyclic graph in each analysis buffer to obtain the task content of each node contained in the directed acyclic graph, and the task relationships and jump conditions between the nodes;
the weight calculation module, configured to calculate the radiation weight between the node and each node of other directed acyclic graphs by using the number of nodes contained in the directed acyclic graph and the task relationships;
the total weight value obtaining module, configured to sum the radiation weights of all nodes contained in the directed acyclic graph to obtain a total radiation weight value of the directed acyclic graph;
and the configuration module, configured to perform DAG flow configuration on the directed acyclic graph according to the total radiation weight value and a flow configuration rule.
9. The system of claim 8, wherein the weight calculation module obtains the radiation weight using the following formula:
Figure 209251DEST_PATH_IMAGE001
wherein P represents the radiation weight between each node contained in the directed acyclic graph and the other directed acyclic graphs; G_i represents the radiance between the i-th node of the directed acyclic graph and the other directed acyclic graphs; n represents the number of other directed acyclic graphs that have a task relationship with each node in the directed acyclic graph; G_0 represents the average radiance over the nodes of the directed acyclic graph; G_max represents the maximum radiance over the nodes of the directed acyclic graph; G_min represents the minimum radiance over the nodes of the directed acyclic graph; h represents the number of nodes, in one other directed acyclic graph that has a task relationship with a node of the directed acyclic graph, that have a direct task relationship with that node;
Figure 297293DEST_PATH_IMAGE002
represents the number of nodes in the directed acyclic graph; and
Figure 829905DEST_PATH_IMAGE003
represents the number of nodes in the other directed acyclic graphs.
10. The system of claim 8, wherein the configuration module comprises:
the directed acyclic graph sending module, configured to send each parsed directed acyclic graph for which a total radiation weight value has been obtained to a configuration buffer;
the sequencing module, configured to arrange the directed acyclic graphs in the configuration buffer in descending order of total radiation weight value, and to obtain a directed acyclic graph list once the arrangement is complete;
the configuration time setting module, configured to set a DAG flow configuration time interval, wherein the DAG flow configuration time interval satisfies the following condition:
Figure 294384DEST_PATH_IMAGE004
wherein T represents the DAG flow configuration time interval; T_1 represents the extraction time interval of the directed acyclic graph; and T_2 represents the average time taken to parse each directed acyclic graph;
and the flow configuration module, configured to perform DAG flow configuration on the directed acyclic graphs one by one, in the order of the directed acyclic graph list, at the set DAG flow configuration time interval.
CN202011068846.0A 2020-10-09 2020-10-09 Efficient flow arrangement method and system for ETL system Active CN112115192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011068846.0A CN112115192B (en) 2020-10-09 2020-10-09 Efficient flow arrangement method and system for ETL system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011068846.0A CN112115192B (en) 2020-10-09 2020-10-09 Efficient flow arrangement method and system for ETL system

Publications (2)

Publication Number Publication Date
CN112115192A true CN112115192A (en) 2020-12-22
CN112115192B CN112115192B (en) 2021-07-02

Family

ID=73797897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011068846.0A Active CN112115192B (en) 2020-10-09 2020-10-09 Efficient flow arrangement method and system for ETL system

Country Status (1)

Country Link
CN (1) CN112115192B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022143045A1 (en) * 2020-12-30 2022-07-07 中兴通讯股份有限公司 Method and apparatus for determining data blood relationship, and storage medium and electronic apparatus
CN114880385A (en) * 2021-07-27 2022-08-09 云南省地质环境监测院(云南省环境地质研究院) Method and device for accessing geological disaster data through automatic combined flow

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
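蔡玉宝 (Cai Yubao), et al.: "数据处理平台的研究与实现" (Research and Implementation of a Data Processing Platform), 《计算机工程与设计》 (Computer Engineering and Design) *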

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022143045A1 (en) * 2020-12-30 2022-07-07 中兴通讯股份有限公司 Method and apparatus for determining data blood relationship, and storage medium and electronic apparatus
CN114880385A (en) * 2021-07-27 2022-08-09 云南省地质环境监测院(云南省环境地质研究院) Method and device for accessing geological disaster data through automatic combined flow
CN114880385B (en) * 2021-07-27 2022-11-22 云南省地质环境监测院(云南省环境地质研究院) Method and device for accessing geological disaster data through automatic combination process

Also Published As

Publication number Publication date
CN112115192B (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN112115192B (en) Efficient flow arrangement method and system for ETL system
CN108052394B (en) Resource allocation method based on SQL statement running time and computer equipment
CN106951925A (en) Data processing method, device, server and system
JP6694447B2 (en) Big data calculation method and system, program, and recording medium
CN107844414A (en) A kind of spanned item mesh based on defect report analysis, parallelization defect positioning method
CN110297847A (en) A kind of intelligent information retrieval method based on big data principle
CN103645961B (en) The method for detecting abnormality of computation-intensive parallel task and system
CN109523157A (en) A kind of processing method and system of operation flow
CN105654240A (en) Machine tool manufacturing system energy efficiency analysis method
CN109872052A (en) A kind of law court's case intelligence division householder method and system
CN105630797B (en) Data processing method and system
US20240036841A1 (en) Method and Apparatus for Compatibility Detection, Device and Non-transitory computer-readable storage medium
CN110895506A (en) Construction method and construction system of test data
CN102289408A (en) regression test case sequencing method based on error propagation network
CN104484375B (en) Establish the method and system of database automatically in project analysis flow
Zhang et al. Tuning performance of Spark programs
CN108897678A (en) Static code detection method and static code detection system, storage equipment
CN104572029A (en) Combinability and combination rule judgment method and device of finite state machine
CN112016636A (en) Crop spectral clustering analysis processing method based on Hadoop frame
CN114996331B (en) Data mining control method and system
CN107957944B (en) User data coverage rate oriented test case automatic generation method
CN113641654B (en) Marketing treatment rule engine method based on real-time event
CN105654106A (en) Decision tree generation method and system thereof
Bansal et al. An investigation of strategies for finding test order during integration testing of object oriented applications
JP4972997B2 (en) Program analysis method for asset diagnosis

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 1201, 12 / F, building 2, No. 10, KEGU 1st Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Patentee after: Beijing dongfangtong Software Co.,Ltd.

Patentee after: BEIJING TONGTECH Co.,Ltd.

Patentee after: Beijing Dongfang tongwangxin Technology Co.,Ltd.

Patentee after: BEIJING TESTOR TECHNOLOGY Co.,Ltd.

Address before: 1201, 12 / F, building 2, No. 10, KEGU 1st Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Patentee before: Beijing dongfangtong Software Co.,Ltd.

Patentee before: BEIJING TONGTECH Co.,Ltd.

Patentee before: BEIJING MICROVISION TECHNOLOGY CO.,LTD.

Patentee before: BEIJING TESTOR TECHNOLOGY Co.,Ltd.