CN110580265B - ETL task processing method, device, equipment and storage medium - Google Patents

ETL task processing method, device, equipment and storage medium

Info

Publication number
CN110580265B
CN110580265B CN201910872609.0A CN201910872609A CN110580265B CN 110580265 B CN110580265 B CN 110580265B CN 201910872609 A CN201910872609 A CN 201910872609A CN 110580265 B CN110580265 B CN 110580265B
Authority
CN
China
Prior art keywords
etl task
evaluation
etl
parameter
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910872609.0A
Other languages
Chinese (zh)
Other versions
CN110580265A (en)
Inventor
朱林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201910872609.0A priority Critical patent/CN110580265B/en
Publication of CN110580265A publication Critical patent/CN110580265A/en
Application granted granted Critical
Publication of CN110580265B publication Critical patent/CN110580265B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The application discloses a processing method, a processing device, processing equipment and a storage medium of an ETL task. The method comprises the following steps: acquiring metadata information of an ETL task; determining an evaluation parameter of the ETL task according to the metadata information, wherein the evaluation parameter is used for representing the value of the ETL task; and processing the ETL task according to the evaluation parameters. In the technical scheme provided by the embodiment of the application, the evaluation parameter of the ETL task is determined according to the metadata information by acquiring the metadata information of the ETL task, and then the ETL task is processed based on the evaluation parameter; therefore, quantitative representation of the value of the ETL task is achieved by quantitatively evaluating the ETL task, automatic management of the ETL task is further achieved, and compared with a manual management mode, the method is higher in efficiency and lower in cost.

Description

ETL task processing method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers and internet, in particular to a processing method, a processing device, processing equipment and a storage medium of an ETL task.
Background
An ETL (Extract-Transform-Load) task is a task for implementing processing of data, and is commonly used in database systems and data warehouses.
In the related art, an ETL task that has been released online needs to be checked manually to decide whether the ETL task should be taken offline.
Disclosure of Invention
The embodiment of the application provides a processing method, a processing device, processing equipment and a storage medium of an ETL task. The technical scheme is as follows:
In one aspect, an embodiment of the present application provides a method for processing an ETL task, where the method includes:
acquiring metadata information of an ETL task;
determining an evaluation parameter of the ETL task according to the metadata information, wherein the evaluation parameter is used to characterize the value of the ETL task; and
processing the ETL task according to the evaluation parameter.
In another aspect, an embodiment of the present application provides an ETL task processing apparatus, where the apparatus includes:
an information acquisition module, configured to acquire metadata information of the ETL task;
a parameter determining module, configured to determine an evaluation parameter of the ETL task according to the metadata information, wherein the evaluation parameter is used to characterize the value of the ETL task; and
a task processing module, configured to process the ETL task according to the evaluation parameter.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the processing method of the ETL task.
In still another aspect, the present application provides a non-transitory computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the processing method of the ETL task.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
Metadata information of the ETL task is acquired, an evaluation parameter of the ETL task is determined according to the metadata information, and the ETL task is then processed based on the evaluation parameter. The value of the ETL task is thereby represented quantitatively, which in turn enables automatic management of ETL tasks; compared with manual management, this is more efficient and less costly.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flow chart of a method for processing an ETL task provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for processing ETL tasks according to another embodiment of the present application;
FIG. 3 is a schematic diagram of evaluation parameters of an ETL task provided by one embodiment of the present application;
FIG. 4 is a block diagram of a processing device for ETL tasks provided by one embodiment of the present application;
FIG. 5 is a block diagram of a processing device for ETL tasks provided by another embodiment of the present application;
FIG. 6 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the method provided by the embodiment of the present application, the execution subject of each step may be a computer device, and the computer device may be any electronic device with data processing and storage functions, such as a PC (Personal Computer) or a server. For convenience of explanation, the following method embodiments describe each step with a computer device as the execution subject.
Referring to fig. 1, a flowchart of a processing method of an ETL task according to an embodiment of the present application is shown, where the method may include the following steps (101-103):
Step 101, obtaining metadata information of an ETL task.
The metadata information of the ETL task refers to information related to the ETL task, such as information that reflects the characteristics or attributes of the ETL task in multiple different dimensions, for example freshness, complexity, cost, dependency, and query degree.
Step 102, determining an evaluation parameter of the ETL task according to the metadata information.
The evaluation parameter of the ETL task is used to characterize the value of the ETL task. Optionally, the evaluation parameter is positively correlated with the value of the ETL task: a higher evaluation parameter indicates a higher value of the ETL task, and a lower evaluation parameter indicates a lower value. Optionally, the evaluation parameter may be a score.
In a possible implementation manner, information of multiple dimensions can be extracted from metadata information of an ETL task, then sub-evaluation parameters of the ETL task in each dimension are calculated according to the information of each dimension, and the sub-evaluation parameters of the ETL task in each dimension are integrated to obtain the evaluation parameters of the ETL task.
In another possible implementation, a machine learning model may be used to process the metadata information of the ETL task and output the evaluation parameter of the ETL task. The machine learning model may be referred to as an ETL scoring model, which may be obtained by training a neural network using a machine learning algorithm. The input parameters of the machine learning model may include information of multiple dimensions extracted from the metadata information of the ETL task; the machine learning model performs processing such as feature extraction, recombination, and abstraction on this information and finally outputs the evaluation parameter of the ETL task.
Step 103, processing the ETL task according to the evaluation parameter.
Because the evaluation parameter of the ETL task represents the value of the ETL task, the ETL task can be processed once its evaluation parameter is determined. For example, an ETL task with a lower evaluation parameter (i.e., lower value) is taken offline, while an ETL task with a higher evaluation parameter (i.e., higher value) is kept running online.
To sum up, in the technical solution provided in the embodiment of the present application, by obtaining metadata information of an ETL task, an evaluation parameter of the ETL task is determined according to the metadata information, and then the ETL task is processed based on the evaluation parameter; therefore, quantitative representation of the value of the ETL task is achieved by quantitatively evaluating the ETL task, automatic management of the ETL task is further achieved, and compared with a manual management mode, the method is higher in efficiency and lower in cost.
Referring to fig. 2, a flowchart of a processing method of an ETL task according to another embodiment of the present application is shown, where the method may include the following steps (201-204):
Step 201, obtaining metadata information of an ETL task.
The metadata information of the ETL task refers to information related to the ETL task, such as information that reflects the characteristics or attributes of the ETL task in multiple different dimensions, for example freshness, complexity, cost, dependency, and query degree.
Step 202, determining n sub-evaluation parameters of the ETL task according to the metadata information, where the n sub-evaluation parameters include a complexity evaluation parameter, a cost evaluation parameter, and a dependency evaluation parameter, and n is a positive integer.
In the embodiment of the application, value evaluation can be performed on the ETL task from 3 different dimensions of complexity, cost and dependency.
Optionally, the n sub-evaluation parameters further include a freshness evaluation parameter and/or a query degree evaluation parameter. In one possible implementation, the value of the ETL task can be evaluated from 5 different dimensions: freshness, complexity, cost, dependency, and query degree.
The freshness evaluation parameter is used to characterize the freshness of the ETL task. For example, the freshness evaluation parameter may be positively correlated with freshness: a higher freshness evaluation parameter indicates a fresher ETL task, and a lower freshness evaluation parameter indicates a less fresh ETL task.
Illustratively, the freshness evaluation parameter may be determined as follows:
1. determining the online duration of the ETL task according to the metadata information of the ETL task;
2. determining the freshness evaluation parameter of the ETL task according to the online duration of the ETL task.
The online duration of the ETL task can be calculated from the time at which the ETL task went online and the current time. Optionally, the freshness evaluation parameter is negatively correlated with the online duration. That is, the shorter the online duration, the higher the freshness evaluation parameter, indicating a fresher ETL task; conversely, the longer the online duration, the lower the freshness evaluation parameter, indicating a less fresh ETL task.
In one example, a preset constant may be divided by the online duration of the ETL task to obtain the freshness evaluation parameter of the ETL task. Of course, this example is only illustrative; other calculation formulas may be set, or a mapping relation table may be queried, to determine the freshness evaluation parameter from the online duration of the ETL task, which is not limited in the embodiment of the present application.
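As a non-limiting illustration, the division of a preset constant by the online duration might be sketched as follows. The constant value, the use of days as the unit, the one-day lower bound (to avoid division by zero), and the cap at 10 are all assumptions made for this sketch; the embodiment itself only states that a preset constant is divided by the online duration.

```python
from datetime import datetime

# Hypothetical preset constant; the embodiment does not give a value.
FRESHNESS_CONSTANT = 100.0
# Assumed cap, chosen to match the [0, 10] score range used in the FIG. 3 example.
MAX_SCORE = 10.0


def freshness_score(online_since: datetime, now: datetime) -> float:
    """Return a freshness score that decreases as the online duration grows."""
    # Online duration in days, computed from the go-online time and the current time.
    online_days = (now - online_since).total_seconds() / 86400.0
    # Assumption: treat anything shorter than one day as one day.
    online_days = max(online_days, 1.0)
    return min(FRESHNESS_CONSTANT / online_days, MAX_SCORE)
```

With these assumed values, a task that has been online for 20 days scores 5.0, and a task that has been online for 200 days scores 0.5.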
The complexity evaluation parameter is used to characterize the complexity of the ETL task. For example, the complexity evaluation parameter may be positively correlated with complexity: a higher complexity evaluation parameter indicates a more complex ETL task, and a lower complexity evaluation parameter indicates a less complex ETL task.
Illustratively, the complexity evaluation parameter may be determined as follows:
1. determining the complexity level of the ETL task according to the metadata information of the ETL task;
2. determining the complexity evaluation parameter of the ETL task according to the complexity level of the ETL task.
The complexity levels of the ETL task may be preset, for example as 5 levels: very simple, relatively simple, medium, relatively complex, and very complex, with each level corresponding to a different complexity evaluation parameter. Optionally, the complexity evaluation parameter is positively correlated with the complexity level. That is, the higher the complexity level, the higher the complexity evaluation parameter, indicating a more complex ETL task; conversely, the lower the complexity level, the lower the complexity evaluation parameter, indicating a less complex ETL task.
In one example, complexity information may be extracted from the metadata information of the ETL task, and the complexity level of the ETL task is determined based on the complexity information. The complexity information refers to information used to characterize the complexity of the ETL task, such as the logical plan depth, the number of tables read, and the amount of data in the tables read. Based on the complexity information, the complexity level of the ETL task may be determined by querying a mapping relation table, by using a machine learning model, or in other manners. For example, the computer device queries a first mapping relation table and determines the complexity level corresponding to the complexity information extracted from the metadata information of the ETL task as the complexity level of the ETL task; the first mapping relation table includes at least one set of mapping relations between complexity information and complexity levels.
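As a non-limiting illustration, the lookup in the first mapping relation table might be sketched as follows. The table layout (upper bounds per row), every threshold, and the level-to-score values are hypothetical; only the three kinds of complexity information and the five level names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityInfo:
    plan_depth: int   # logical plan depth
    tables_read: int  # number of tables read
    bytes_read: int   # amount of data in the tables read


# Hypothetical "first mapping relation table": each row holds upper bounds on the
# complexity information and the complexity level that those bounds map to.
FIRST_MAPPING_TABLE = [
    (2, 1, 10**8, "very simple"),
    (4, 3, 10**9, "relatively simple"),
    (8, 6, 10**10, "medium"),
    (16, 12, 10**11, "relatively complex"),
]

# Hypothetical scores per level, positively correlated with the level.
LEVEL_TO_SCORE = {
    "very simple": 2,
    "relatively simple": 4,
    "medium": 6,
    "relatively complex": 8,
    "very complex": 10,
}


def complexity_level(info: ComplexityInfo) -> str:
    """Return the first level whose bounds cover all three measurements."""
    for max_depth, max_tables, max_bytes, level in FIRST_MAPPING_TABLE:
        if (info.plan_depth <= max_depth
                and info.tables_read <= max_tables
                and info.bytes_read <= max_bytes):
            return level
    # Anything beyond the last row is treated as the highest level.
    return "very complex"


def complexity_score(info: ComplexityInfo) -> int:
    """Map the complexity level to its complexity evaluation parameter."""
    return LEVEL_TO_SCORE[complexity_level(info)]
```

Keeping the level lookup separate from the level-to-score mapping mirrors the two steps listed above, so either table can be adjusted without touching the other.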
The cost evaluation parameter is used to characterize the cost of the ETL task. For example, the cost evaluation parameter may be positively correlated with cost: a higher cost evaluation parameter indicates a higher cost of the ETL task, and a lower cost evaluation parameter indicates a lower cost.
For example, the cost evaluation parameter may be determined as follows:
1. determining cost information of the ETL task according to the metadata information of the ETL task, wherein the cost information may include information of at least one of the following dimensions: computation cost, storage cost, and time cost;
2. determining the cost evaluation parameter of the ETL task according to the cost information of the ETL task.
The computation cost refers to the amount of computing and processing resources required to execute the ETL task, such as CPU occupancy; the storage cost refers to the amount of storage resources required to store the ETL task, such as the amount of data the ETL task occupies in the storage device; and the time cost refers to the time required to execute the ETL task.
In addition, the computer device may determine the cost evaluation parameter of the ETL task by querying the mapping relationship table, by using a machine learning model, or by using another method based on the cost information of the ETL task.
In addition, when calculating the cost evaluation parameter of the ETL task, the same or different weights may be assigned to cost information of different dimensions. For example, the computation cost, the storage cost, and the time cost are weighted by 0.25, 0.25, and 0.5, respectively. The weights can be preset and adjusted according to the actual situation, which is not limited in the embodiment of the present application. The computer device calculates the cost evaluation parameter of the ETL task according to the cost information of each dimension and the corresponding weight. In this way, the relative influence of each cost dimension on the cost of the ETL task can be flexibly adjusted, which improves the flexibility and accuracy of cost evaluation.
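As a non-limiting illustration, the weighted combination of cost dimensions might be sketched as follows. The sketch assumes weights of 0.25, 0.25, and 0.5 for computation, storage, and time cost, and assumes that each dimension has already been converted to a score on a common scale; that conversion, the dictionary keys, and the function name are hypothetical.

```python
# Assumed weights for the three cost dimensions; they sum to 1.
COST_WEIGHTS = {"computation": 0.25, "storage": 0.25, "time": 0.5}


def cost_score(cost_info: dict, weights: dict = COST_WEIGHTS) -> float:
    """Weighted sum of per-dimension cost scores.

    cost_info maps each dimension name to a score on a common scale
    (assumed here to be [0, 10]); how raw CPU, storage, and runtime
    figures are turned into such scores is outside this sketch.
    """
    return sum(weights[dim] * cost_info[dim] for dim in weights)
```

For instance, per-dimension scores of 8, 4, and 6 give 0.25 × 8 + 0.25 × 4 + 0.5 × 6 = 6.0. Passing a different `weights` dictionary changes the relative influence of each dimension without changing the function.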
The dependency evaluation parameter is used to characterize the dependency of the ETL task. For example, the dependency evaluation parameter may be positively correlated with dependency: a higher dependency evaluation parameter indicates a higher dependency of the ETL task, and a lower dependency evaluation parameter indicates a lower dependency.
Illustratively, the dependency evaluation parameter may be determined as follows:
1. determining the number of downstream dependent tasks of the ETL task according to the metadata information of the ETL task;
2. determining the dependency evaluation parameter of the ETL task according to the number of downstream dependent tasks of the ETL task.
A downstream dependent task of an ETL task is another ETL task whose execution depends on the execution result of that ETL task. Optionally, the dependency evaluation parameter is positively correlated with the number of downstream dependent tasks. That is, the greater the number of downstream dependent tasks, the higher the dependency evaluation parameter, indicating a higher dependency of the ETL task; conversely, the smaller the number of downstream dependent tasks, the lower the dependency evaluation parameter, indicating a lower dependency.
In addition, the computer device may determine the dependency evaluation parameter of the ETL task based on the number of downstream dependent tasks by querying a mapping relation table, by formula calculation, or in other manners. In one example, the computer device queries a second mapping relation table and determines the dependency evaluation parameter corresponding to the number of downstream dependent tasks of the ETL task as the dependency evaluation parameter of the ETL task; the second mapping relation table includes at least one set of mapping relations between the number of downstream dependent tasks and the dependency evaluation parameter.
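As a non-limiting illustration, the lookup in the second mapping relation table might be sketched as follows. The bucket boundaries and scores are hypothetical; the embodiment only requires that the table map numbers of downstream dependent tasks to dependency evaluation parameters.

```python
from bisect import bisect_right

# Hypothetical "second mapping relation table", stored as two parallel lists:
# a task with at least DOWNSTREAM_LOWER_BOUNDS[i] downstream dependent tasks
# (and fewer than the next bound) receives DEPENDENCY_SCORES[i].
DOWNSTREAM_LOWER_BOUNDS = [0, 1, 5, 20, 100]
DEPENDENCY_SCORES = [0, 3, 5, 8, 10]


def dependency_score(downstream_count: int) -> int:
    """Look up the dependency evaluation parameter for a downstream task count."""
    if downstream_count < 0:
        raise ValueError("downstream_count must be non-negative")
    # Index of the last lower bound that does not exceed the count.
    idx = bisect_right(DOWNSTREAM_LOWER_BOUNDS, downstream_count) - 1
    return DEPENDENCY_SCORES[idx]
```

Because the scores increase with the bounds, the result is positively correlated with the number of downstream dependent tasks, as described above.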
The query degree evaluation parameter is used to characterize the query degree of the ETL task. For example, the query degree evaluation parameter may be positively correlated with the query degree: a higher query degree evaluation parameter indicates a higher query degree of the ETL task, and a lower query degree evaluation parameter indicates a lower query degree.
Illustratively, the query degree evaluation parameter may be determined as follows:
1. determining the number of queries of the ETL task in a target period according to the metadata information of the ETL task;
2. determining the query degree evaluation parameter of the ETL task according to the number of queries of the ETL task in the target period.
The target period may be a preset period, for example a historical period ending at the current time. The duration of the target period may be preset according to the actual situation, for example 30 days, 60 days, 120 days, or the like. The number of queries of the ETL task in the target period refers to the total number of times the ETL task is executed in the target period. Optionally, the query degree evaluation parameter is positively correlated with the number of queries. That is, the greater the number of queries, the higher the query degree evaluation parameter, indicating a higher query degree of the ETL task; conversely, the smaller the number of queries, the lower the query degree evaluation parameter, indicating a lower query degree.
In addition, the computer device may determine the query degree evaluation parameter of the ETL task in a manner of querying the mapping relationship table, in a manner of formula calculation, or in another manner, based on the number of queries of the ETL task in the target time period.
In one example, the average number of queries of the ETL task in multiple different historical periods may be calculated, and the query degree evaluation parameter of the ETL task is then calculated as a weighted sum of these averages. For example, the average numbers of queries of the ETL task in the last 30 days, 60 days, 90 days, and 120 days are calculated, with corresponding weights of 0.4, 0.3, 0.2, and 0.1, respectively, and the query degree evaluation parameter of the ETL task is calculated by weighted summation. The closer a historical period is to the current time, the higher its weight may be. In this way, the relative influence of the average number of queries in each historical period on the query degree of the ETL task can be flexibly adjusted, which improves the flexibility and accuracy of the query degree evaluation.
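As a non-limiting illustration, the weighted summation over historical periods might be sketched as follows, using the weights 0.4, 0.3, 0.2, and 0.1 for the last 30, 60, 90, and 120 days. The sketch assumes that the average is a per-day average and that query counts are available as one number per day; neither detail is stated in the embodiment. Scaling the result to a fixed score range is left out.

```python
from typing import Sequence

# Weights per look-back window in days, taken from the example above.
WINDOW_WEIGHTS = {30: 0.4, 60: 0.3, 90: 0.2, 120: 0.1}


def query_degree(daily_queries: Sequence[int]) -> float:
    """Weighted sum of average daily query counts over several look-back windows.

    daily_queries holds one count per day, oldest first, so the last
    element is the most recent day (an assumed input format).
    """
    score = 0.0
    for days, weight in WINDOW_WEIGHTS.items():
        window = daily_queries[-days:]  # the most recent `days` entries
        average = sum(window) / len(window) if window else 0.0
        score += weight * average
    return score
```

Because the shorter windows carry the larger weights, 30 days of recent activity raise the result far more than the same activity 90 days earlier.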
It should be noted that, in the embodiment of the present application, quantitative evaluation of ETL tasks is only performed from 5 different dimensions, i.e., freshness, complexity, cost, dependency, and query degree, for example, and a description is provided. In practical application, the dimension of quantitative evaluation of the ETL task can be added, modified or deleted according to actual needs, which is not limited in the embodiment of the present application. In addition, the above-described sub-evaluation parameter calculation method for each dimension is only exemplary and explanatory, and can be flexibly designed and adjusted in practical application.
Step 203, calculating the evaluation parameter of the ETL task according to the n sub-evaluation parameters.
After determining the sub-evaluation parameters of the ETL task in a plurality of different dimensions, the computer device can further calculate the final evaluation parameters of the ETL task.
In one example, the n sub-evaluation parameters are added to obtain the evaluation parameter of the ETL task. For example, if the freshness evaluation parameter, the complexity evaluation parameter, the cost evaluation parameter, the dependency evaluation parameter, and the query degree evaluation parameter of the ETL task are 8, 6, 7, 8, and 10 in this order, the evaluation parameter of the ETL task is 8 + 6 + 7 + 8 + 10 = 39.
In another example, the n sub-evaluation parameters are weighted and summed to obtain the evaluation parameter of the ETL task. For example, if the freshness evaluation parameter, the complexity evaluation parameter, the cost evaluation parameter, the dependency evaluation parameter, and the query degree evaluation parameter of the ETL task are 8, 6, 7, 8, and 10 in this order, and the weights of the above 5 dimensions are 0.1, 0.1, 0.3, 0.3, and 0.2 in this order, the evaluation parameter of the ETL task is 8 × 0.1 + 6 × 0.1 + 7 × 0.3 + 8 × 0.3 + 10 × 0.2 = 7.9.
Of course, the evaluation parameter of the ETL task may also be calculated according to the n sub-evaluation parameters in other manners, for example, an average value of the n sub-evaluation parameters is used as the evaluation parameter of the ETL task, and the like, which is not limited in this embodiment of the present application.
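As a non-limiting illustration, the three ways of combining sub-evaluation parameters mentioned above (plain sum, weighted sum, and average) might be sketched as follows, using the sub-evaluation parameters 8, 6, 7, 8, and 10 and the weights from the weighted-sum arithmetic. The dictionary keys and function names are hypothetical.

```python
# Sub-evaluation parameters from the worked example.
SUB_SCORES = {
    "freshness": 8,
    "complexity": 6,
    "cost": 7,
    "dependency": 8,
    "query_degree": 10,
}

# Weights as they appear in the weighted-sum arithmetic; they sum to 1.
WEIGHTS = {
    "freshness": 0.1,
    "complexity": 0.1,
    "cost": 0.3,
    "dependency": 0.3,
    "query_degree": 0.2,
}


def total_by_sum(scores: dict) -> float:
    """Add the n sub-evaluation parameters."""
    return sum(scores.values())


def total_by_weighted_sum(scores: dict, weights: dict) -> float:
    """Weight each sub-evaluation parameter, then add."""
    return sum(weights[name] * value for name, value in scores.items())


def total_by_mean(scores: dict) -> float:
    """Use the average of the n sub-evaluation parameters."""
    return sum(scores.values()) / len(scores)
```

For the values above, the plain sum is 39, the weighted sum is 7.9 (up to floating-point rounding), and the average is 7.8.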
It should be noted that the score ranges of the sub-evaluation parameters of different dimensions, such as the freshness, complexity, cost, dependency, and query degree evaluation parameters, may be the same or different. For example, as shown in FIG. 3, the score range of the sub-evaluation parameter of each dimension is [0, 10]; if the evaluation parameter of the ETL task is obtained by adding the sub-evaluation parameters of the 5 dimensions, the range of the evaluation parameter of the ETL task is [0, 50].
Step 204, processing the ETL task according to the evaluation parameter.
Because the evaluation parameters of the ETL task represent the value of the ETL task, the ETL task can be processed after the evaluation parameters of the ETL task are determined.
In one example, if the evaluation parameter of the ETL task is in a first value interval, the ETL task is taken offline; if the evaluation parameter of the ETL task is in a second value interval, the ETL task is optimized; if the evaluation parameter of the ETL task is in a third value interval, the ETL task is kept running online. The values in the first value interval are smaller than those in the second value interval, and the values in the second value interval are smaller than those in the third value interval.
For example, assuming that the range of the evaluation parameter of the ETL task is [0, 50]: when the evaluation parameter of the ETL task is in [0, 10], the ETL task is taken offline; when the evaluation parameter of the ETL task is in [11, 30], the ETL task is optimized; when the evaluation parameter of the ETL task is in [31, 50], the ETL task is considered normal and is kept running online without optimization.
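As a non-limiting illustration, the interval-based decision might be sketched as follows for the range [0, 50] and the intervals [0, 10], [11, 30], and [31, 50]. The intervals are stated for integer scores; treating a fractional score between two intervals (for example 10.5) as belonging to the higher interval is an assumption of this sketch, as are the enum and function names.

```python
from enum import Enum


class Action(Enum):
    TAKE_OFFLINE = "take offline"
    OPTIMIZE = "optimize"
    KEEP_ONLINE = "keep online"


def decide(score: float) -> Action:
    """Choose how to process an ETL task from its evaluation parameter."""
    if not 0 <= score <= 50:
        raise ValueError("score must lie in [0, 50]")
    if score <= 10:   # first value interval: lowest value
        return Action.TAKE_OFFLINE
    if score <= 30:   # second value interval
        return Action.OPTIMIZE
    return Action.KEEP_ONLINE  # third value interval: highest value
```

Applied to the summed score of 39 from the earlier arithmetic, `decide(39)` returns `Action.KEEP_ONLINE`. Running the same function over a batch of tasks groups them into offline, optimize, and keep-online candidates in one pass.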
To sum up, in the technical solution provided in the embodiment of the present application, by obtaining metadata information of an ETL task, an evaluation parameter of the ETL task is determined according to the metadata information, and then the ETL task is processed based on the evaluation parameter; therefore, quantitative representation of the value of the ETL task is achieved by quantitatively evaluating the ETL task, automatic management of the ETL task is further achieved, and compared with a manual management mode, the method is higher in efficiency and lower in cost.
In addition, the ETL task is quantitatively evaluated from 5 different dimensions: freshness, complexity, cost, dependency, and query degree. The evaluation dimensions considered are more comprehensive, which improves the accuracy of evaluating the ETL task and, in turn, the accuracy of optimizing the ETL task or taking it offline.
In addition, with the technical solution provided by the embodiment of the present application, when a large batch of ETL tasks needs to be processed, most of the tasks that need to be taken offline and most of the tasks that need to be optimized can each be screened out in a single pass, which substantially improves the management efficiency of ETL tasks.
In addition, the technical scheme provided by the embodiment of the application can be applied to any scene needing ETL task management, such as the fields of data processing, data warehouse and the like.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 4, a block diagram of a processing device for an ETL task according to an embodiment of the present application is shown. The device has the functions of realizing the method examples, and the functions can be realized by hardware or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be provided in a computer device. As shown in fig. 4, the apparatus 400 may include: an information acquisition module 410, a parameter determination module 420, and a task processing module 430.
An information obtaining module 410, configured to obtain metadata information of the ETL task.
A parameter determining module 420, configured to determine an evaluation parameter of the ETL task according to the metadata information; wherein the evaluation parameter is used to characterize the value of the ETL task.
And a task processing module 430, configured to process the ETL task according to the evaluation parameter.
In an exemplary embodiment, as shown in fig. 5, the parameter determination module 420 includes: a parameter determination submodule 421 and a parameter calculation submodule 422.
The parameter determining sub-module 421 is configured to determine n sub-evaluation parameters of the ETL task according to the metadata information, where the n sub-evaluation parameters include a complexity evaluation parameter, a cost evaluation parameter, and a dependency evaluation parameter, and n is a positive integer.
A parameter calculating sub-module 422, configured to calculate the evaluation parameter of the ETL task according to the n sub-evaluation parameters.
Optionally, the n sub-evaluation parameters further include a freshness evaluation parameter and/or a query degree evaluation parameter.
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a freshness determining unit 421a for: determining the online duration of the ETL task according to the metadata information; and determining a freshness evaluation parameter of the ETL task according to the online duration of the ETL task; wherein the freshness evaluation parameter is negatively correlated with the online duration.
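As one possible illustration of the negative correlation described above, the following sketch lets the freshness evaluation parameter decay linearly with the time the task has been online. The linear form, the 365-day horizon, and the 0 to 10 scale are assumptions made for this example only; the embodiments do not prescribe a particular function.

```python
from datetime import date


def freshness_score(online_since: date, today: date,
                    max_score: float = 10.0, horizon_days: int = 365) -> float:
    """Return a score that shrinks as the task stays online longer.

    horizon_days and max_score are hypothetical tuning values.
    """
    days_online = max((today - online_since).days, 0)
    # Fraction of the horizon still remaining, clamped at zero.
    remaining = max(1.0 - days_online / horizon_days, 0.0)
    return max_score * remaining
```

Any other non-increasing function of the online duration, for example a stepwise lookup, would satisfy the same relationship.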
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a complexity determining unit 421b for: determining the complexity level of the ETL task according to the metadata information; determining a complexity evaluation parameter of the ETL task according to the complexity level of the ETL task; wherein the complexity evaluation parameter is positively correlated with the complexity level.
Optionally, the complexity determining unit 421b is configured to: extracting complexity information from the metadata information, wherein the complexity information is information for representing the complexity of the ETL task; querying a first mapping relation table, and determining the complexity level corresponding to the complexity information as the complexity level of the ETL task; wherein the first mapping relation table comprises at least one set of mapping relations between complexity information and complexity levels.
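The two-step lookup described above (complexity information to complexity level, then complexity level to complexity evaluation parameter) may be sketched as follows. The use of script line count as the complexity information, the bucket boundaries, and the score values are all hypothetical; only the table-lookup structure and the positive correlation follow the description.

```python
from bisect import bisect_right

# Hypothetical first mapping relation table: line-count boundaries between levels.
# Fewer than 50 lines -> level 1, 50..199 -> level 2, 200..999 -> level 3, else level 4.
LINE_COUNT_BOUNDS = [50, 200, 1000]

# Hypothetical level-to-score mapping; higher level gives a higher score.
LEVEL_TO_SCORE = {1: 2.0, 2: 4.0, 3: 7.0, 4: 10.0}


def complexity_level(script_lines: int) -> int:
    """Look up the complexity level for the given complexity information."""
    return bisect_right(LINE_COUNT_BOUNDS, script_lines) + 1


def complexity_score(script_lines: int) -> float:
    """Map the complexity level to the complexity evaluation parameter."""
    return LEVEL_TO_SCORE[complexity_level(script_lines)]
```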
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a cost determining unit 421c for: determining cost information of the ETL task according to the metadata information, wherein the cost information comprises information of at least one of the following dimensions: computing cost, storage cost and time cost; and determining a cost evaluation parameter of the ETL task according to the cost information of the ETL task.
Optionally, the cost determining unit 421c is configured to: calculating the cost evaluation parameter of the ETL task according to the cost information of each dimension of the ETL task and the corresponding weight.
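A minimal sketch of the weighted calculation is given below. The weights, the dimension names used as dictionary keys, and the assumption that each dimension has already been normalized to a common scale are illustrative only.

```python
from typing import Mapping

# Hypothetical weights for the three cost dimensions named in the description.
COST_WEIGHTS = {"computing": 0.5, "storage": 0.3, "time": 0.2}


def cost_score(cost_info: Mapping[str, float],
               weights: Mapping[str, float] = COST_WEIGHTS) -> float:
    """Weighted sum over whichever cost dimensions are present.

    cost_info values are assumed to be pre-normalized to a common scale.
    """
    return sum(weights[dimension] * value
               for dimension, value in cost_info.items())
```

Because the description requires at least one dimension rather than all three, the function sums only over the dimensions actually supplied.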
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a dependency determining unit 421d for: determining the number of downstream dependent tasks of the ETL task according to the metadata information; determining a dependency evaluation parameter of the ETL task according to the number of the downstream dependent tasks of the ETL task; wherein the dependency evaluation parameter has a positive correlation with the number of the downstream dependent tasks.
Optionally, the dependency determining unit 421d is configured to: querying a second mapping relation table, and determining the dependency evaluation parameter corresponding to the number of the downstream dependent tasks of the ETL task as the dependency evaluation parameter of the ETL task; wherein the second mapping relation table comprises at least one set of mapping relations between numbers of downstream dependent tasks and dependency evaluation parameters.
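The second mapping relation table may, for instance, be held as a list of count ranges, as in the sketch below. The ranges and scores are hypothetical; the description only requires that more downstream dependent tasks map to a larger dependency evaluation parameter.

```python
from typing import List, Optional, Tuple

# Hypothetical second mapping relation table:
# (minimum count, maximum count or None for "no upper limit", score).
DEPENDENCY_TABLE: List[Tuple[int, Optional[int], float]] = [
    (0, 0, 0.0),
    (1, 5, 3.0),
    (6, 20, 6.0),
    (21, None, 10.0),
]


def dependency_score(downstream_count: int) -> float:
    """Return the score of the table row whose range contains the count."""
    for low, high, score in DEPENDENCY_TABLE:
        if downstream_count >= low and (high is None or downstream_count <= high):
            return score
    raise ValueError(f"no table row covers count {downstream_count}")
```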
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a query degree determining unit 421e, configured to: determining the number of queries of the ETL task in a target time period according to the metadata information; and determining a query degree evaluation parameter of the ETL task according to the number of queries; wherein the query degree evaluation parameter is positively correlated with the number of queries.
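For illustration, the counting and scoring steps might look like the following. The assumption that the metadata exposes a list of query timestamps, the saturation value of 100 queries, and the 0 to 10 scale are hypothetical.

```python
from datetime import datetime
from typing import Iterable


def query_count(query_times: Iterable[datetime],
                start: datetime, end: datetime) -> int:
    """Count queries that fall inside the target time period [start, end)."""
    return sum(1 for t in query_times if start <= t < end)


def query_degree_score(count: int, saturation: int = 100,
                       max_score: float = 10.0) -> float:
    """Score grows with the query count and is capped at max_score."""
    return max_score * min(count, saturation) / saturation


# Example data: two of the three queries fall in September 2019.
example_log = [datetime(2019, 9, 1), datetime(2019, 9, 10), datetime(2019, 10, 5)]
september_queries = query_count(
    example_log, datetime(2019, 9, 1), datetime(2019, 10, 1))
```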
In an exemplary embodiment, as shown in fig. 5, the parameter calculation submodule 422 is configured to: adding the n sub-evaluation parameters to obtain the evaluation parameter of the ETL task; or, performing weighted summation on the n sub-evaluation parameters to obtain the evaluation parameter of the ETL task.
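Both alternatives, plain addition and weighted summation, can be expressed in one small function, sketched below. The parameter names and the example weights are hypothetical.

```python
from typing import Mapping, Optional


def evaluation_parameter(sub_scores: Mapping[str, float],
                         weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine the n sub-evaluation parameters into one evaluation parameter.

    With no weights the sub-scores are simply added; otherwise each sub-score
    is multiplied by its weight before summation.
    """
    if weights is None:
        return sum(sub_scores.values())
    return sum(weights[name] * value for name, value in sub_scores.items())


# Hypothetical sub-evaluation parameters and weights.
example_scores = {"complexity": 4.0, "cost": 5.0, "dependency": 3.0}
example_weights = {"complexity": 0.5, "cost": 0.25, "dependency": 0.25}
```

With the example values, the plain sum is 4.0 + 5.0 + 3.0 and the weighted sum is 0.5 × 4.0 + 0.25 × 5.0 + 0.25 × 3.0.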
In an exemplary embodiment, the task processing module 430 is configured to:
if the evaluation parameter is in a first value interval, taking the ETL task offline;
if the evaluation parameter is in a second value interval, optimizing the ETL task;
if the evaluation parameter is in a third value interval, keeping the ETL task;
wherein the first value interval lies below the second value interval, and the second value interval lies below the third value interval.
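The three-interval decision above may be sketched as follows. The boundary values 30 and 70 and the names `Action` and `choose_action` are hypothetical; the description fixes only the ordering of the intervals, not their limits.

```python
from enum import Enum


class Action(Enum):
    OFFLINE = "take offline"
    OPTIMIZE = "optimize"
    KEEP = "keep"


def choose_action(score: float, offline_below: float = 30.0,
                  keep_from: float = 70.0) -> Action:
    """Map the evaluation parameter to one of three handling actions.

    First interval:  score < offline_below               -> take offline
    Second interval: offline_below <= score < keep_from  -> optimize
    Third interval:  score >= keep_from                  -> keep
    """
    if score < offline_below:
        return Action.OFFLINE
    if score < keep_from:
        return Action.OPTIMIZE
    return Action.KEEP
```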
In summary, in the technical solution provided by the embodiments of the present application, metadata information of an ETL task is obtained, an evaluation parameter of the ETL task is determined according to the metadata information, and the ETL task is then processed based on the evaluation parameter. The value of the ETL task is thereby represented quantitatively, which in turn enables automatic management of ETL tasks; compared with manual management, this is more efficient and less costly.
It should be noted that when the apparatus provided by the above embodiment implements its functions, the division into the functional modules described above is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Referring to fig. 6, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be used to implement the processing method of the ETL task provided in the above embodiments. Specifically:
The computer device 600 includes a Central Processing Unit (CPU) 601, a system memory 604 including a Random Access Memory (RAM) 602 and a Read Only Memory (ROM) 603, and a system bus 605 connecting the system memory 604 and the central processing unit 601. The computer device 600 also includes a basic input/output system (I/O system) 606 for facilitating information transfer between various elements within the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 615.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609, such as a mouse or a keyboard, for a user to input information. The display 608 and the input device 609 are both connected to the central processing unit 601 through an input/output controller 610 connected to the system bus 605. The basic input/output system 606 may also include the input/output controller 610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 610 may also provide output to a display screen, a printer, or another type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the computer device 600. That is, mass storage device 607 may include a computer-readable medium (not shown), such as a hard disk or CD-ROM drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 604 and mass storage device 607 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 600 may also operate by being connected, through a network such as the Internet, to a remote computer on the network. That is, the computer device 600 may be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 611.
The memory also includes a computer program stored in the memory and configured to be executed by the one or more processors to implement the processing method of the ETL task described above.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the processing method of the ETL task described above.
In an exemplary embodiment, a computer program product is also provided, which, when executed by a processor, implements the processing method of the ETL task described above.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution sequence among the steps; in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers may be executed simultaneously, or two steps with different numbers may be executed in an order reverse to that shown in the figure, which is not limited by the embodiments of the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method for processing an ETL task, the method comprising:
acquiring metadata information of an ETL task;
determining n sub-evaluation parameters of the ETL task according to the metadata information, wherein the n sub-evaluation parameters comprise a complexity evaluation parameter, a cost evaluation parameter and a dependency evaluation parameter, and n is a positive integer; the complexity evaluation parameter and the complexity level of the ETL task are in a positive correlation relationship, the cost evaluation parameter and the cost information of the ETL task are in a positive correlation relationship, and the dependency evaluation parameter and the number of the downstream dependent tasks of the ETL task are in a positive correlation relationship;
calculating an evaluation parameter of the ETL task according to the n sub-evaluation parameters; wherein the evaluation parameter is used to characterize the value of the ETL task;
and processing the ETL task according to the evaluation parameter.
2. The method according to claim 1, wherein the n sub-evaluation parameters further comprise a freshness evaluation parameter and/or a query degree evaluation parameter.
3. The method of claim 1, wherein said determining n sub-evaluation parameters of said ETL task according to said metadata information comprises:
determining the complexity level of the ETL task according to the metadata information;
and determining a complexity evaluation parameter of the ETL task according to the complexity level of the ETL task.
4. The method of claim 3, wherein determining the complexity level of the ETL task according to the metadata information comprises:
extracting complexity information from the metadata information, wherein the complexity information is information for representing the complexity of the ETL task;
querying a first mapping relation table, and determining the complexity level corresponding to the complexity information as the complexity level of the ETL task;
wherein the first mapping relation table comprises at least one set of mapping relations between complexity information and complexity levels.
5. The method of claim 1, wherein said determining n sub-evaluation parameters of said ETL task according to said metadata information comprises:
determining cost information of the ETL task according to the metadata information, wherein the cost information comprises information of at least one of the following dimensions: computing cost, storage cost and time cost;
and determining a cost evaluation parameter of the ETL task according to the cost information of the ETL task.
6. The method of claim 5, wherein determining a cost evaluation parameter of the ETL task according to the cost information of the ETL task comprises:
calculating the cost evaluation parameter of the ETL task according to the cost information of each dimension of the ETL task and the corresponding weight.
7. The method of claim 1, wherein said determining n sub-evaluation parameters of said ETL task according to said metadata information comprises:
determining the number of downstream dependent tasks of the ETL task according to the metadata information;
and determining a dependency evaluation parameter of the ETL task according to the number of the downstream dependent tasks of the ETL task.
8. The method according to claim 7, wherein determining the dependency evaluation parameter of the ETL task according to the number of the downstream dependent tasks of the ETL task comprises:
querying a second mapping relation table, and determining the dependency evaluation parameter corresponding to the number of the downstream dependent tasks of the ETL task as the dependency evaluation parameter of the ETL task;
wherein the second mapping relation table comprises at least one set of mapping relations between numbers of downstream dependent tasks and dependency evaluation parameters.
9. The method according to claim 1, wherein said calculating said evaluation parameter of said ETL task from said n sub-evaluation parameters comprises:
adding the n sub-evaluation parameters to obtain the evaluation parameter of the ETL task;
or,
performing weighted summation on the n sub-evaluation parameters to obtain the evaluation parameter of the ETL task.
10. The method according to any of the claims 1 to 9, wherein said processing said ETL task according to said evaluation parameters comprises:
if the evaluation parameter is in a first value interval, taking the ETL task offline;
if the evaluation parameter is in a second value interval, optimizing the ETL task;
if the evaluation parameter is in a third value interval, keeping the ETL task;
wherein the first value interval lies below the second value interval, and the second value interval lies below the third value interval.
11. An apparatus for processing an ETL task, the apparatus comprising:
the information acquisition module is used for acquiring metadata information of the ETL task;
the parameter determining module is used for determining an evaluation parameter of the ETL task according to the metadata information; wherein the evaluation parameter is used to characterize the value of the ETL task;
the task processing module is used for processing the ETL task according to the evaluation parameters;
wherein the parameter determination module comprises: a parameter determining submodule and a parameter calculating submodule;
the parameter determining submodule is used for determining n sub-evaluation parameters of the ETL task according to the metadata information, wherein the n sub-evaluation parameters comprise a complexity evaluation parameter, a cost evaluation parameter and a dependency evaluation parameter, and n is a positive integer; the complexity evaluation parameter and the complexity level of the ETL task are in a positive correlation relationship, the cost evaluation parameter and the cost information of the ETL task are in a positive correlation relationship, and the dependency evaluation parameter and the number of the downstream dependent tasks of the ETL task are in a positive correlation relationship;
and the parameter calculation submodule is used for calculating the evaluation parameters of the ETL task according to the n sub-evaluation parameters.
12. A computer device, characterized in that the computer device comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the method according to any of claims 1 to 10.
13. A non-transitory computer-readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 10.
CN201910872609.0A 2019-09-16 2019-09-16 ETL task processing method, device, equipment and storage medium Active CN110580265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910872609.0A CN110580265B (en) 2019-09-16 2019-09-16 ETL task processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910872609.0A CN110580265B (en) 2019-09-16 2019-09-16 ETL task processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110580265A CN110580265A (en) 2019-12-17
CN110580265B true CN110580265B (en) 2020-11-20

Family

ID=68812096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910872609.0A Active CN110580265B (en) 2019-09-16 2019-09-16 ETL task processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110580265B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680085A (en) * 2020-05-07 2020-09-18 北京三快在线科技有限公司 Data processing task analysis method and device, electronic equipment and readable storage medium
CN112650661A (en) * 2020-12-29 2021-04-13 北京嘀嘀无限科技发展有限公司 Data processing quality control method, data processing quality control device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8200614B2 (en) * 2008-04-30 2012-06-12 SAP France S.A. Apparatus and method to transform an extract transform and load (ETL) task into a delta load task
US8719769B2 (en) * 2009-08-18 2014-05-06 Hewlett-Packard Development Company, L.P. Quality-driven ETL design optimization
CN102117306B (en) * 2010-01-04 2013-05-22 阿里巴巴集团控股有限公司 Method and system for monitoring ETL (extract-transform-load) data processing process
CN109947746B (en) * 2017-10-26 2023-12-26 亿阳信通股份有限公司 Data quality control method and system based on ETL flow
CN109902117B (en) * 2019-02-19 2021-07-06 新华三大数据技术有限公司 Business system analysis method and device

Also Published As

Publication number Publication date
CN110580265A (en) 2019-12-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant