CN110580265A - ETL task processing method, device, equipment and storage medium - Google Patents
- Publication number: CN110580265A
- Application number: CN201910872609.0A
- Authority: CN (China)
- Prior art keywords: ETL task, ETL, evaluation, determining, evaluation parameter
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Abstract
The application discloses a method, device, equipment and storage medium for processing an ETL task. The method comprises the following steps: acquiring metadata information of an ETL task; determining an evaluation parameter of the ETL task according to the metadata information, wherein the evaluation parameter characterizes the value of the ETL task; and processing the ETL task according to the evaluation parameter. In the technical solution provided by the embodiments of the application, the metadata information of the ETL task is acquired, the evaluation parameter of the ETL task is determined from that metadata information, and the ETL task is then processed based on the evaluation parameter. Quantitatively evaluating the ETL task in this way yields a quantitative representation of its value and enables automatic management of ETL tasks, which is more efficient and less costly than manual management.
Description
Technical Field
The embodiments of the present application relate to the technical field of computers and the internet, and in particular to a method, device, equipment and storage medium for processing an ETL task.
Background
An ETL (Extract-Transform-Load) task is a task that processes data, and is commonly used in database systems and data warehouses.
In the related art, an ETL task that has been released online must be checked manually to decide whether it should be taken offline.
Disclosure of Invention
The embodiments of the present application provide a method, device, equipment and storage medium for processing an ETL task.
The technical solution is as follows:
In one aspect, an embodiment of the present application provides a method for processing an ETL task, where the method includes:
acquiring metadata information of an ETL task;
determining an evaluation parameter of the ETL task according to the metadata information, wherein the evaluation parameter is used to characterize the value of the ETL task; and
processing the ETL task according to the evaluation parameter.
In another aspect, an embodiment of the present application provides an ETL task processing apparatus, where the apparatus includes:
an information acquisition module, configured to acquire metadata information of the ETL task;
a parameter determining module, configured to determine an evaluation parameter of the ETL task according to the metadata information, wherein the evaluation parameter is used to characterize the value of the ETL task; and
a task processing module, configured to process the ETL task according to the evaluation parameter.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the above method for processing an ETL task.
In still another aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the above method for processing an ETL task.
The technical solution provided by the embodiments of the present application can bring the following beneficial effects:
The metadata information of the ETL task is acquired, the evaluation parameter of the ETL task is determined according to the metadata information, and the ETL task is then processed based on the evaluation parameter. Quantitatively evaluating the ETL task in this way yields a quantitative representation of its value and enables automatic management of ETL tasks, which is more efficient and less costly than manual management.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for processing an ETL task provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for processing ETL tasks according to another embodiment of the present application;
FIG. 3 is a schematic diagram of evaluation parameters of an ETL task provided by one embodiment of the present application;
FIG. 4 is a block diagram of a processing device for ETL tasks provided by one embodiment of the present application;
FIG. 5 is a block diagram of a processing device for ETL tasks provided by another embodiment of the present application;
FIG. 6 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, embodiments of the present application are described in further detail below with reference to the accompanying drawings.
In the method provided by the embodiments of the present application, each step may be executed by a computer device, which may be any electronic device with data processing and storage functions, such as a PC (Personal Computer) or a server. For convenience of explanation, the following method embodiments simply describe each step as being executed by a computer device.
Referring to FIG. 1, a flowchart of a method for processing an ETL task according to an embodiment of the present application is shown. The method may include the following steps (101-103):
Step 101, obtaining metadata information of an ETL task.
The metadata information of the ETL task refers to information related to the ETL task, for example information that reflects the characteristics or attributes of the ETL task in multiple different dimensions, such as freshness, complexity, cost, dependency, and query degree.
Step 102, determining an evaluation parameter of the ETL task according to the metadata information.
The evaluation parameter of the ETL task is used to characterize the value of the ETL task. Optionally, the evaluation parameter is positively correlated with the value of the ETL task: a higher evaluation parameter indicates a higher value of the ETL task; conversely, a lower evaluation parameter indicates a lower value. Optionally, the evaluation parameter may be a score.
In one possible implementation, information of multiple dimensions is extracted from the metadata information of the ETL task, a sub-evaluation parameter of the ETL task is calculated for each dimension from the information of that dimension, and the sub-evaluation parameters of all dimensions are combined to obtain the evaluation parameter of the ETL task.
In another possible implementation, a machine learning model may be used to process the metadata information of the ETL task and output the evaluation parameter of the ETL task. The machine learning model may be referred to as an ETL scoring model, and may be obtained by training a neural network with a machine learning algorithm. The input parameters of the machine learning model may include information of multiple dimensions extracted from the metadata information of the ETL task; the model performs processing such as feature extraction, recombination, and abstraction on this information and finally outputs the evaluation parameter of the ETL task.
Step 103, processing the ETL task according to the evaluation parameter.
Because the evaluation parameter of the ETL task represents the value of the ETL task, the ETL task can be processed once its evaluation parameter is determined. For example, an ETL task with a lower evaluation parameter (i.e., lower value) is taken offline, while an ETL task with a higher evaluation parameter (i.e., higher value) is kept running online.
To sum up, in the technical solution provided in the embodiments of the present application, the metadata information of an ETL task is obtained, the evaluation parameter of the ETL task is determined according to the metadata information, and the ETL task is then processed based on the evaluation parameter. Quantitatively evaluating the ETL task in this way yields a quantitative representation of its value and enables automatic management of ETL tasks, which is more efficient and less costly than manual management.
Referring to FIG. 2, a flowchart of a method for processing an ETL task according to another embodiment of the present application is shown. The method may include the following steps (201-204):
Step 201, obtaining metadata information of an ETL task.
The metadata information of the ETL task refers to information related to the ETL task, for example information that reflects the characteristics or attributes of the ETL task in multiple different dimensions, such as freshness, complexity, cost, dependency, and query degree.
Step 202, determining n sub-evaluation parameters of the ETL task according to the metadata information, where the n sub-evaluation parameters include a complexity evaluation parameter, a cost evaluation parameter, and a dependency evaluation parameter, and n is a positive integer.
In the embodiments of the application, the value of the ETL task can be evaluated from 3 different dimensions: complexity, cost, and dependency.
Optionally, the n sub-evaluation parameters further include a freshness evaluation parameter and/or a query degree evaluation parameter. In one possible implementation, the value of the ETL task can be evaluated from 5 different dimensions: freshness, complexity, cost, dependency, and query degree.
The freshness evaluation parameter is used to characterize the freshness of the ETL task. For example, the freshness evaluation parameter may be positively correlated with freshness: the higher the freshness evaluation parameter of an ETL task, the higher the freshness of the ETL task; conversely, the lower the freshness evaluation parameter, the lower the freshness.
Illustratively, the freshness evaluation parameter may be determined as follows:
1. Determining the online duration of the ETL task according to the metadata information of the ETL task;
2. Determining the freshness evaluation parameter of the ETL task according to the online duration of the ETL task.
The online duration of the ETL task can be calculated from the time at which the ETL task went online and the current time. Optionally, the freshness evaluation parameter is negatively correlated with the online duration. That is, the shorter the online duration, the higher the freshness evaluation parameter, indicating higher freshness of the ETL task; conversely, the longer the online duration, the lower the freshness evaluation parameter, indicating lower freshness of the ETL task.
In one example, a preset constant may be divided by the online duration of the ETL task to obtain the freshness evaluation parameter of the ETL task. Of course, this example is only illustrative; other calculation formulas may be used, or a mapping relation table may be queried, to determine the freshness evaluation parameter from the online duration, which is not limited in the embodiments of the present application.
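The division example above can be sketched in Python as follows. This is a minimal illustration only: the constant value, the cap at 10, the one-day floor on the duration, and the name `freshness_score` are assumptions and are not given in this application.

```python
from datetime import datetime

# Hypothetical preset constant; the application does not specify a value.
FRESHNESS_CONSTANT = 300.0
# Assumed upper bound, chosen to match a [0, 10] score range.
MAX_SCORE = 10.0


def freshness_score(online_since: datetime, now: datetime) -> float:
    """Return a score that decreases as the online duration grows."""
    days_online = (now - online_since).total_seconds() / 86400.0
    # Floor the duration at one day (assumption) to avoid division by zero.
    days_online = max(days_online, 1.0)
    return min(MAX_SCORE, FRESHNESS_CONSTANT / days_online)
```

With these assumed values, a task that has been online for 100 days scores 3.0, and any task online for 30 days or less is capped at 10.0.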
The complexity evaluation parameter is used to characterize the complexity of the ETL task. For example, the complexity evaluation parameter may be positively correlated with complexity: the higher the complexity evaluation parameter of an ETL task, the higher the complexity of the ETL task; conversely, the lower the complexity evaluation parameter, the lower the complexity.
Illustratively, the complexity evaluation parameter may be determined as follows:
1. Determining the complexity level of the ETL task according to the metadata information of the ETL task;
2. Determining the complexity evaluation parameter of the ETL task according to the complexity level of the ETL task.
The complexity levels of the ETL task may be preset, for example as 5 levels: very simple, relatively simple, medium, relatively complex, and very complex, with each level corresponding to a different complexity evaluation parameter. Optionally, the complexity evaluation parameter is positively correlated with the complexity level. That is, the higher the complexity level, the higher the complexity evaluation parameter, indicating higher complexity of the ETL task; conversely, the lower the complexity level, the lower the complexity evaluation parameter, indicating lower complexity of the ETL task.
In one example, complexity information may be extracted from the metadata information of the ETL task. Complexity information refers to information that characterizes the complexity of the ETL task, such as the logical plan depth, the number of tables read, and the data volume of the tables read. The complexity level of the ETL task is then determined from the complexity information, for example by querying a mapping relation table, by using a machine learning model, or by other methods. For example, the computer device queries a first mapping relation table and takes the complexity level that corresponds to the complexity information extracted from the metadata information of the ETL task as the complexity level of the ETL task; the first mapping relation table comprises at least one set of mapping relations between complexity information and complexity levels.
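The table lookup described above might look like the following Python sketch. The five level names come from the text; every threshold, every score, and the names `ComplexityInfo`, `FIRST_MAPPING_TABLE`, and `complexity_score` are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityInfo:
    plan_depth: int    # depth of the logical plan
    tables_read: int   # number of tables read
    gb_read: float     # data volume of the tables read, in GB (assumed unit)


# Hypothetical first mapping relation table, checked from top to bottom:
# (max plan depth, max tables read, max GB read) -> complexity level.
FIRST_MAPPING_TABLE = [
    (2, 1, 1.0, "very simple"),
    (4, 3, 10.0, "relatively simple"),
    (8, 6, 100.0, "medium"),
    (16, 12, 1000.0, "relatively complex"),
]

# Hypothetical score per level; a higher level maps to a higher score.
LEVEL_SCORES = {
    "very simple": 2,
    "relatively simple": 4,
    "medium": 6,
    "relatively complex": 8,
    "very complex": 10,
}


def complexity_level(info: ComplexityInfo) -> str:
    """Return the first level whose limits all cover the given information."""
    for max_depth, max_tables, max_gb, level in FIRST_MAPPING_TABLE:
        if (info.plan_depth <= max_depth
                and info.tables_read <= max_tables
                and info.gb_read <= max_gb):
            return level
    return "very complex"


def complexity_score(info: ComplexityInfo) -> int:
    return LEVEL_SCORES[complexity_level(info)]
```

A task exceeding any single limit of a row falls through to the next row, so one large dimension is enough to raise the level.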
The cost evaluation parameter is used to characterize the cost of the ETL task. For example, the cost evaluation parameter may be positively correlated with cost: the higher the cost evaluation parameter of an ETL task, the higher the cost of the ETL task; conversely, the lower the cost evaluation parameter, the lower the cost.
Illustratively, the cost evaluation parameter may be determined as follows:
1. Determining cost information of the ETL task according to the metadata information of the ETL task, where the cost information may include information of at least one of the following dimensions: computation cost, storage cost, and time cost;
2. Determining the cost evaluation parameter of the ETL task according to the cost information of the ETL task.
The computation cost refers to the amount of computing and processing resources required to execute the ETL task, such as CPU occupancy; the storage cost refers to the amount of storage resources the ETL task requires, such as the amount of data of the ETL task in the storage device; and the time cost refers to the time required to execute the ETL task.
In addition, the computer device may determine the cost evaluation parameter of the ETL task from its cost information by querying a mapping relation table, by using a machine learning model, or by another method.
In addition, when calculating the cost evaluation parameter of the ETL task, the same or different weights may be assigned to cost information of different dimensions. For example, the computation cost, the storage cost, and the time cost are weighted by 0.25, 0.25, and 0.5, respectively. The weights can be preset and adjusted according to the actual situation, which is not limited in the embodiments of the present application. The computer device calculates the cost evaluation parameter of the ETL task from the cost information of each dimension and the corresponding weight. In this way, the relative influence of each cost dimension on the cost of the ETL task can be adjusted flexibly, which improves the flexibility and accuracy of cost evaluation.
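A weighted combination of the three cost dimensions can be sketched in Python as below. The sketch assumes each dimension has already been converted to a score on a common scale (for example [0, 10]), which the application does not spell out; the weights are illustrative values that sum to 1, and `cost_score` is a hypothetical name.

```python
from typing import Mapping

# Illustrative weights per cost dimension; they sum to 1.
COST_WEIGHTS = {"computation": 0.25, "storage": 0.25, "time": 0.5}


def cost_score(costs: Mapping[str, float],
               weights: Mapping[str, float] = COST_WEIGHTS) -> float:
    """Weighted sum of per-dimension cost scores (assumed pre-normalized)."""
    if set(costs) != set(weights):
        raise ValueError("costs and weights must cover the same dimensions")
    return sum(weights[dim] * costs[dim] for dim in weights)
```

For per-dimension scores of 8 (computation), 4 (storage), and 6 (time), the result is 0.25 × 8 + 0.25 × 4 + 0.5 × 6 = 6.0. Changing the weights shifts how strongly each dimension influences the final cost evaluation parameter.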
The dependency evaluation parameter is used to characterize the dependency of the ETL task. For example, the dependency evaluation parameter may be positively correlated with dependency: the higher the dependency evaluation parameter of an ETL task, the higher the dependency of the ETL task; conversely, the lower the dependency evaluation parameter, the lower the dependency.
Illustratively, the dependency evaluation parameter may be determined as follows:
1. Determining the number of downstream dependent tasks of the ETL task according to the metadata information of the ETL task;
2. Determining the dependency evaluation parameter of the ETL task according to the number of downstream dependent tasks of the ETL task.
A downstream dependent task of an ETL task is another ETL task whose execution depends on the execution result of that ETL task. Optionally, the dependency evaluation parameter is positively correlated with the number of downstream dependent tasks. That is, the more downstream dependent tasks there are, the higher the dependency evaluation parameter, indicating higher dependency of the ETL task; conversely, the fewer downstream dependent tasks there are, the lower the dependency evaluation parameter, indicating lower dependency of the ETL task.
In addition, the computer device may determine the dependency evaluation parameter of the ETL task from the number of its downstream dependent tasks by querying a mapping relation table, by formula calculation, or by other methods. In one example, the computer device queries a second mapping relation table and takes the dependency evaluation parameter that corresponds to the number of downstream dependent tasks of the ETL task as the dependency evaluation parameter of the ETL task; the second mapping relation table comprises at least one set of mapping relations between the number of downstream dependent tasks and the dependency evaluation parameter.
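One way to realize such a lookup is a range table searched with `bisect`, as in the Python sketch below. The ranges, the scores, and the name `dependency_score` are hypothetical; the application only states that the table maps task counts to dependency evaluation parameters.

```python
from bisect import bisect_right

# Hypothetical second mapping relation table. Each lower bound starts a range
# of downstream-task counts: 0, 1-2, 3-5, 6-10, and 11 or more.
COUNT_LOWER_BOUNDS = [0, 1, 3, 6, 11]
DEPENDENCY_SCORES = [0, 2, 5, 8, 10]


def dependency_score(downstream_count: int) -> int:
    """Look up the score of the range that contains downstream_count."""
    if downstream_count < 0:
        raise ValueError("downstream_count must be non-negative")
    index = bisect_right(COUNT_LOWER_BOUNDS, downstream_count) - 1
    return DEPENDENCY_SCORES[index]
```

Because the scores increase with the lower bounds, the lookup preserves the positive correlation between the number of downstream dependent tasks and the dependency evaluation parameter.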
The query degree evaluation parameter is used to characterize the query degree of the ETL task. For example, the query degree evaluation parameter may be positively correlated with the query degree: the higher the query degree evaluation parameter of an ETL task, the higher the query degree of the ETL task; conversely, the lower the query degree evaluation parameter, the lower the query degree.
Illustratively, the query degree evaluation parameter may be determined as follows:
1. Determining the number of queries of the ETL task in a target time period according to the metadata information of the ETL task;
2. Determining the query degree evaluation parameter of the ETL task according to the number of queries of the ETL task in the target time period.
The target time period may be a predetermined period, for example a historical period ending at the current time. The duration of the target time period may be preset according to the actual situation, for example 30 days, 60 days, or 120 days. The number of queries of the ETL task in the target time period refers to the total number of times the ETL task is executed in the target time period. Optionally, the query degree evaluation parameter is positively correlated with the number of queries. That is, the more queries there are, the higher the query degree evaluation parameter, indicating a higher query degree of the ETL task; conversely, the fewer queries there are, the lower the query degree evaluation parameter, indicating a lower query degree of the ETL task.
In addition, the computer device may determine the query degree evaluation parameter of the ETL task from the number of queries of the ETL task in the target time period by querying a mapping relation table, by formula calculation, or by another method.
In one example, the average number of queries of the ETL task in multiple different historical periods may be calculated, and the query degree evaluation parameter of the ETL task is then calculated as a weighted sum of these averages. For example, the average numbers of queries of the ETL task in the last 30 days, 60 days, 90 days, and 120 days are calculated, with corresponding weights of 0.4, 0.3, 0.2, and 0.1, respectively, and the query degree evaluation parameter of the ETL task is obtained by weighted summation. The closer a historical period is to the current time, the higher its weight may be. In this way, the relative influence of the average number of queries in each historical period on the query degree of the ETL task can be adjusted flexibly, which improves the flexibility and accuracy of query degree evaluation.
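The weighted summation over the four historical periods can be sketched in Python as follows. The window lengths and weights are those of the example; reading "average" as the mean number of queries per day, the input format, and the name `weighted_query_average` are assumptions. The sketch returns the weighted average itself, since the application does not state how that value is scaled into a score.

```python
from typing import Sequence

# Window length in days -> weight, as in the example above.
WINDOW_WEIGHTS = {30: 0.4, 60: 0.3, 90: 0.2, 120: 0.1}


def weighted_query_average(daily_counts: Sequence[int]) -> float:
    """Weighted sum of mean daily query counts over the recent windows.

    daily_counts holds one count per day, oldest first, most recent last.
    """
    longest = max(WINDOW_WEIGHTS)
    if len(daily_counts) < longest:
        raise ValueError(f"need at least {longest} days of history")
    total = 0.0
    for window_days, weight in WINDOW_WEIGHTS.items():
        recent = daily_counts[-window_days:]
        total += weight * (sum(recent) / window_days)
    return total
```

If a task was queried 12 times per day during the last 30 days and never in the 90 days before that, the window averages are 12, 6, 4, and 3, and the weighted result is 0.4 × 12 + 0.3 × 6 + 0.2 × 4 + 0.1 × 3 = 7.7. The same total activity placed 91 to 120 days ago yields only 0.3, which reflects the higher weight given to recent periods.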
It should be noted that the embodiments of the present application describe quantitative evaluation of ETL tasks from 5 different dimensions (freshness, complexity, cost, dependency, and query degree) only as an example. In practical applications, evaluation dimensions can be added, modified, or deleted according to actual needs, which is not limited in the embodiments of the present application. In addition, the calculation methods described above for the sub-evaluation parameter of each dimension are only illustrative and can be flexibly designed and adjusted in practical applications.
Step 203, calculating the evaluation parameter of the ETL task according to the n sub-evaluation parameters.
After determining the sub-evaluation parameters of the ETL task in multiple different dimensions, the computer device can further calculate the final evaluation parameter of the ETL task.
In one example, the n sub-evaluation parameters are added to obtain the evaluation parameter of the ETL task. For example, if the freshness evaluation parameter, the complexity evaluation parameter, the cost evaluation parameter, the dependency evaluation parameter, and the query degree evaluation parameter of the ETL task are 8, 6, 7, 8, and 10 in this order, the evaluation parameter of the ETL task is 8 + 6 + 7 + 8 + 10 = 39.
In another example, the n sub-evaluation parameters are weighted and summed to obtain the evaluation parameter of the ETL task. For example, if the freshness evaluation parameter, the complexity evaluation parameter, the cost evaluation parameter, the dependency evaluation parameter, and the query degree evaluation parameter of the ETL task are 8, 6, 7, 8, and 10 in this order, and the weights of these 5 dimensions are 0.1, 0.1, 0.3, 0.3, and 0.2 in this order, the evaluation parameter of the ETL task is 8 × 0.1 + 6 × 0.1 + 7 × 0.3 + 8 × 0.3 + 10 × 0.2 = 7.9.
Of course, the evaluation parameter of the ETL task may also be calculated from the n sub-evaluation parameters in other ways, for example by taking the average of the n sub-evaluation parameters as the evaluation parameter of the ETL task, which is not limited in the embodiments of the present application.
It should be noted that the score ranges of the sub-evaluation parameters of different dimensions, such as the freshness evaluation parameter, complexity evaluation parameter, cost evaluation parameter, dependency evaluation parameter, and query degree evaluation parameter, may be the same or different. For example, as shown in FIG. 3, the score range of the sub-evaluation parameter of each dimension is [0, 10]; if the evaluation parameter of the ETL task is obtained by adding the sub-evaluation parameters of 5 dimensions, the range of the evaluation parameter of the ETL task is [0, 50].
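The three combination methods mentioned above (sum, weighted sum, and average) can be checked with a short Python sketch. The sub-evaluation parameters 8, 6, 7, 8, 10 come from the examples, and the weights are the factors that appear in the weighted-sum arithmetic; the function and dictionary names are hypothetical.

```python
from typing import Mapping

# Sub-evaluation parameters from the examples in the text.
SCORES = {"freshness": 8, "complexity": 6, "cost": 7,
          "dependency": 8, "query": 10}
# Weights as they appear in the weighted-sum arithmetic.
WEIGHTS = {"freshness": 0.1, "complexity": 0.1, "cost": 0.3,
           "dependency": 0.3, "query": 0.2}


def total_by_sum(scores: Mapping[str, float]) -> float:
    return sum(scores.values())


def total_by_weighted_sum(scores: Mapping[str, float],
                          weights: Mapping[str, float]) -> float:
    return sum(weights[dim] * value for dim, value in scores.items())


def total_by_mean(scores: Mapping[str, float]) -> float:
    return sum(scores.values()) / len(scores)
```

For these inputs the plain sum is 39, the weighted sum is 7.9 (up to floating-point rounding), and the mean is 7.8. Note that the three methods produce values on different scales, so any thresholds applied afterwards have to match the method chosen.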
Step 204, processing the ETL task according to the evaluation parameter.
Because the evaluation parameter of the ETL task represents the value of the ETL task, the ETL task can be processed once its evaluation parameter is determined.
In one example, if the evaluation parameter of the ETL task is in a first value interval, the ETL task is taken offline; if the evaluation parameter is in a second value interval, the ETL task is optimized; if the evaluation parameter is in a third value interval, the ETL task is kept running online. The values in the first value interval are smaller than those in the second value interval, and the values in the second value interval are smaller than those in the third value interval.
For example, assuming that the range of the evaluation parameter of the ETL task is [0, 50]: when the evaluation parameter is in [0, 10], the ETL task is taken offline; when the evaluation parameter is in [11, 30], the ETL task is optimized; when the evaluation parameter is in [31, 50], the ETL task is considered normal and is kept running online without optimization.
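The interval-based decision can be sketched in Python as below. The boundaries follow the [0, 10], [11, 30], [31, 50] example; treating 10 and 30 as inclusive upper thresholds, so that non-integer scores such as 10.5 are also covered, is an assumption, as are the names `Action` and `choose_action`.

```python
from enum import Enum


class Action(Enum):
    TAKE_OFFLINE = "take offline"
    OPTIMIZE = "optimize"
    KEEP_ONLINE = "keep online"


def choose_action(score: float) -> Action:
    """Map an evaluation parameter in [0, 50] to a processing action."""
    if not 0 <= score <= 50:
        raise ValueError("score must be within [0, 50]")
    if score <= 10:      # first value interval
        return Action.TAKE_OFFLINE
    if score <= 30:      # second value interval
        return Action.OPTIMIZE
    return Action.KEEP_ONLINE  # third value interval
```

Under this sketch, the summed score of 39 from the earlier example falls in the third interval, so the task would be kept online.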
To sum up, in the technical solution provided in the embodiments of the present application, the metadata information of an ETL task is obtained, the evaluation parameter of the ETL task is determined according to the metadata information, and the ETL task is then processed based on the evaluation parameter. Quantitatively evaluating the ETL task in this way yields a quantitative representation of its value and enables automatic management of ETL tasks, which is more efficient and less costly than manual management.
In addition, the ETL task is quantitatively evaluated from 5 different dimensions (freshness, complexity, cost, dependency, and query degree). Considering more evaluation dimensions makes the evaluation of the ETL task more accurate, and in turn makes the decision to optimize the ETL task or take it offline more accurate.
In addition, with the technical solution provided by the embodiments of the application, when a large batch of ETL tasks must be processed, most of the tasks that need to be taken offline and most of the tasks that need to be optimized can each be screened out in a single pass, which greatly improves the efficiency of ETL task management.
In addition, the technical solution provided by the embodiments of the application can be applied to any scenario that requires ETL task management, such as data processing and data warehousing.
The following are apparatus embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, reference is made to the method embodiments of the present application.
Referring to FIG. 4, a block diagram of a processing device for an ETL task according to an embodiment of the present application is shown. The apparatus implements the functions of the method examples above; these functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be provided in a computer device. As shown in FIG. 4, the apparatus 400 may include: an information acquisition module 410, a parameter determination module 420, and a task processing module 430.
An information obtaining module 410, configured to obtain metadata information of the ETL task.
A parameter determining module 420, configured to determine an evaluation parameter of the ETL task according to the metadata information; wherein the evaluation parameters are used to characterize the value of the ETL job.
And a task processing module 430, configured to process the ETL task according to the evaluation parameter.
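The three-module structure of apparatus 400 can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the class name `EtlTaskProcessor`, the callable signatures, and the stand-in lambdas (including the `downstream` field and the threshold of 2) are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EtlTaskProcessor:
    """Mirrors apparatus 400: three pluggable modules run in sequence."""
    acquire_metadata: Callable[[str], Dict]        # role of module 410
    determine_parameter: Callable[[Dict], float]   # role of module 420
    process_task: Callable[[str, float], str]      # role of module 430

    def run(self, task_id: str) -> str:
        metadata = self.acquire_metadata(task_id)
        score = self.determine_parameter(metadata)
        return self.process_task(task_id, score)


# Hypothetical stand-ins for the three modules, for illustration only.
processor = EtlTaskProcessor(
    acquire_metadata=lambda task_id: {"task_id": task_id, "downstream": 3},
    determine_parameter=lambda meta: float(meta["downstream"]),
    process_task=lambda task_id, score: "keep" if score >= 2 else "offline",
)
result = processor.run("daily_orders")
```

Keeping the modules as injected callables reflects the later remark that the division into functional modules is only an example and may be rearranged.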
In an exemplary embodiment, as shown in fig. 5, the parameter determination module 420 includes: a parameter determination submodule 421 and a parameter calculation submodule 422.
The parameter determination submodule 421 is configured to determine n sub-evaluation parameters of the ETL task according to the metadata information, where the n sub-evaluation parameters include a complexity evaluation parameter, a cost evaluation parameter, and a dependency evaluation parameter, and n is a positive integer.
The parameter calculation submodule 422 is configured to calculate the evaluation parameter of the ETL task according to the n sub-evaluation parameters.
Optionally, the n sub-evaluation parameters further include a freshness evaluation parameter and/or a query degree evaluation parameter.
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a freshness determining unit 421a, configured to: determine the online duration of the ETL task according to the metadata information; and determine a freshness evaluation parameter of the ETL task according to the online duration of the ETL task; wherein the freshness evaluation parameter is negatively correlated with the online duration.
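One way to obtain a freshness evaluation parameter that falls as the online duration grows is a clamped linear decay. The following is a minimal sketch under stated assumptions: the function name `freshness_score`, the maximum score of 10, and the decay of 2 points per year are hypothetical and are not taken from the application.

```python
from datetime import date


def freshness_score(online_date: date, today: date,
                    max_score: float = 10.0,
                    decay_per_year: float = 2.0) -> float:
    """Return a score that decreases as the online duration grows.

    The score starts at max_score on the day the task goes online,
    loses decay_per_year points per 365 days, and never drops below 0.
    """
    online_days = max((today - online_date).days, 0)
    return max(max_score - decay_per_year * online_days / 365.0, 0.0)
```

Any non-increasing function of the online duration would serve the same purpose; the linear form is chosen here only because it is easy to inspect.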
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a complexity determining unit 421b for: determining the complexity level of the ETL task according to the metadata information; determining a complexity evaluation parameter of the ETL task according to the complexity level of the ETL task; wherein the complexity evaluation parameter is positively correlated with the complexity level.
Optionally, the complexity determining unit 421b is configured to: extract complexity information from the metadata information, wherein the complexity information is information characterizing the complexity of the ETL task; and query a first mapping relation table to determine the complexity level corresponding to the complexity information as the complexity level of the ETL task; wherein the first mapping relation table includes at least one set of mapping relations between complexity information and complexity levels.
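The two-step lookup described above (complexity information to complexity level, then level to evaluation parameter) can be illustrated as follows. This sketch assumes that the complexity information is the number of joined tables in the task; that choice, the bucket boundaries, and the level-to-score values are all hypothetical.

```python
import bisect

# Hypothetical first mapping relation table: upper bounds on the number of
# joined tables (the assumed complexity information) for levels 1 to 3.
# Anything above the last bound falls into level 4.
JOIN_COUNT_BOUNDS = [1, 3, 6]
COMPLEXITY_LEVELS = [1, 2, 3, 4]

# Hypothetical level-to-score table; scores rise with the level so that
# the evaluation parameter is positively correlated with the level.
LEVEL_TO_SCORE = {1: 2.0, 2: 4.0, 3: 7.0, 4: 10.0}


def complexity_level(join_count: int) -> int:
    """Look up the complexity level for a given join count."""
    return COMPLEXITY_LEVELS[bisect.bisect_left(JOIN_COUNT_BOUNDS, join_count)]


def complexity_score(join_count: int) -> float:
    """Map the complexity level to a complexity evaluation parameter."""
    return LEVEL_TO_SCORE[complexity_level(join_count)]
```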
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a cost determining unit 421c, configured to: determine cost information of the ETL task according to the metadata information, wherein the cost information includes information of at least one of the following dimensions: computation cost, storage cost, and time cost; and determine a cost evaluation parameter of the ETL task according to the cost information of the ETL task.
Optionally, the cost determining unit 421c is configured to: calculate the cost evaluation parameter of the ETL task according to the cost information of each dimension of the ETL task and the corresponding weight.
In an exemplary embodiment, as shown in fig. 5, the parameter determining submodule 421 includes a dependency determining unit 421d, configured to: determine the number of downstream dependent tasks of the ETL task according to the metadata information; and determine a dependency evaluation parameter of the ETL task according to the number of downstream dependent tasks of the ETL task; wherein the dependency evaluation parameter is positively correlated with the number of downstream dependent tasks.
Optionally, the dependency determining unit 421d is configured to: query a second mapping relation table to determine the dependency evaluation parameter corresponding to the number of downstream dependent tasks of the ETL task as the dependency evaluation parameter of the ETL task; wherein the second mapping relation table includes at least one set of mapping relations between the number of downstream dependent tasks and dependency evaluation parameters.
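Combining per-dimension cost information with per-dimension weights can be sketched as a weighted sum. The sketch assumes each dimension has already been normalized to a common 0 to 10 scale; the dimension keys, the sample values, and the weights are hypothetical.

```python
from typing import Dict


def cost_score(cost_info: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over whichever cost dimensions are present."""
    return sum(weights[dim] * value for dim, value in cost_info.items())


# Hypothetical normalized cost information and weights.
cost_info = {"compute": 8.0, "storage": 4.0, "time": 6.0}
weights = {"compute": 0.5, "storage": 0.3, "time": 0.2}
score = cost_score(cost_info, weights)
```

Because the cost information may cover only some of the three dimensions, the function iterates over the dimensions actually supplied rather than over a fixed list.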
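The second mapping relation table can be represented as a list of lower bounds on the downstream task count, each paired with a dependency evaluation parameter. The bucket boundaries and scores below are hypothetical; the only property carried over from the text is that the score does not decrease as the count increases.

```python
# Hypothetical second mapping relation table, sorted by lower bound:
# (minimum number of downstream dependent tasks, dependency score).
SECOND_MAPPING_TABLE = [
    (0, 0.0),    # no downstream tasks
    (1, 3.0),    # 1 to 4 downstream tasks
    (5, 6.0),    # 5 to 19 downstream tasks
    (20, 10.0),  # 20 or more downstream tasks
]


def dependency_score(downstream_count: int) -> float:
    """Return the score of the last bucket whose lower bound is reached."""
    score = SECOND_MAPPING_TABLE[0][1]
    for lower_bound, value in SECOND_MAPPING_TABLE:
        if downstream_count >= lower_bound:
            score = value
    return score
```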
In an exemplary embodiment, as shown in fig. 5, the parameter determining sub-module 421 includes a query degree determining unit 421e, configured to: determining the query times of the ETL task in a target time period according to the metadata information; determining a query degree evaluation parameter of the ETL task according to the query times; and the query degree evaluation parameter and the query times are in positive correlation.
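Counting queries inside a target time period and turning the count into a score might look like the sketch below. It assumes the metadata exposes a list of query timestamps, and it uses a capped logarithm so that very heavily queried tasks do not dominate; both the data shape and the scoring function are assumptions, not details from the application.

```python
import math
from datetime import datetime
from typing import Iterable


def query_count(query_times: Iterable[datetime],
                start: datetime, end: datetime) -> int:
    """Count queries whose timestamp falls in the half-open period [start, end)."""
    return sum(1 for t in query_times if start <= t < end)


def query_score(count: int, max_score: float = 10.0) -> float:
    """Non-decreasing in count: log2(1 + count), capped at max_score."""
    return min(math.log2(1 + count), max_score)
```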
In an exemplary embodiment, as shown in fig. 5, the parameter calculation sub-module 422 is configured to: adding the n sub-evaluation parameters to obtain the evaluation parameters of the ETL task; or, performing weighted summation on the n sub-evaluation parameters to obtain the evaluation parameters of the ETL task.
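The two aggregation options (plain sum or weighted sum of the n sub-evaluation parameters) fit in one small function. The function name, the dictionary keys, and the sample values are hypothetical.

```python
from typing import Dict, Optional


def evaluation_parameter(sub_scores: Dict[str, float],
                         weights: Optional[Dict[str, float]] = None) -> float:
    """Add the sub-evaluation parameters, or weight them first if weights are given."""
    if weights is None:
        return sum(sub_scores.values())
    return sum(weights[name] * value for name, value in sub_scores.items())


# Hypothetical sub-evaluation parameters for one task.
sub_scores = {"complexity": 4, "cost": 6, "dependency": 3}
plain = evaluation_parameter(sub_scores)
weighted = evaluation_parameter(
    sub_scores, {"complexity": 2, "cost": 1, "dependency": 1}
)
```

The plain sum is the special case of the weighted sum in which every weight equals 1.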
In an exemplary embodiment, the task processing module 430 is configured to:
If the evaluation parameter is in a first value interval, taking the ETL task offline;
If the evaluation parameter is in a second value interval, optimizing the ETL task;
If the evaluation parameter is in a third value interval, keeping the ETL task;
Wherein the values in the first value interval are smaller than the values in the second value interval, and the values in the second value interval are smaller than the values in the third value interval.
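The three-interval rule amounts to comparing the evaluation parameter against two thresholds, and the same rule can screen a whole batch of tasks in one pass. The threshold values of 10 and 20 and the task identifiers below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical interval boundaries: [.., 10) offline, [10, 20) optimize, [20, ..) keep.
OFFLINE_UPPER = 10.0
OPTIMIZE_UPPER = 20.0


def decide(score: float) -> str:
    """Map an evaluation parameter to one of the three processing actions."""
    if score < OFFLINE_UPPER:
        return "offline"
    if score < OPTIMIZE_UPPER:
        return "optimize"
    return "keep"


def screen(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group a batch of tasks by action in a single pass."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for task_id, score in scores.items():
        groups[decide(score)].append(task_id)
    return dict(groups)


batch = screen({"a": 3.0, "b": 12.5, "c": 27.0, "d": 9.9})
```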
To sum up, in the technical solution provided in the embodiments of the present application, metadata information of an ETL task is obtained, an evaluation parameter of the ETL task is determined according to the metadata information, and the ETL task is then processed based on the evaluation parameter. Quantitatively evaluating the ETL task in this way gives a quantitative representation of its value and enables automatic management of ETL tasks, which is more efficient and less costly than manual management.
It should be noted that when the apparatus provided in the above embodiments implements its functions, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for their specific implementation, refer to the method embodiments, which are not repeated here.
Referring to fig. 6, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be used to implement the processing method of the ETL task provided in the above embodiments. Specifically:
The computer device 600 includes a Central Processing Unit (CPU) 601, a system memory 604 including a Random Access Memory (RAM) 602 and a Read Only Memory (ROM) 603, and a system bus 605 connecting the system memory 604 and the central processing unit 601. The computer device 600 also includes a basic input/output system (I/O system) 606 for facilitating information transfer between various elements within the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 612.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609, such as a mouse or keyboard, for a user to input information. The display 608 and the input device 609 are both connected to the central processing unit 601 through an input/output controller 610 connected to the system bus 605. The basic input/output system 606 may also include the input/output controller 610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 610 may also provide output to a display screen, a printer, or another type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the computer device 600. That is, mass storage device 607 may include a computer-readable medium (not shown), such as a hard disk or CD-ROM drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 604 and mass storage device 607 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 600 may also operate by connecting, through a network such as the Internet, to a remote computer on that network. That is, the computer device 600 may be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or the network interface unit 611 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes a computer program stored in the memory and configured to be executed by one or more processors to implement the processing method of the ETL task described above.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the processing method of the ETL task described above.
In an exemplary embodiment, a computer program product is also provided which, when executed by a processor, implements the processing method of the ETL task described above.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, the step numbers described herein only show one possible execution order of the steps by way of example; in some other embodiments, the steps may be executed out of numerical order, for example, two steps with different numbers may be executed simultaneously, or in the reverse of the order shown in the figure, which is not limited by the embodiments of the present application.
The above description is only exemplary of the present application and should not be taken as limiting it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.
Claims (14)
1. A method for processing an ETL task, the method comprising:
Acquiring metadata information of an ETL task;
Determining an evaluation parameter of the ETL task according to the metadata information; wherein the evaluation parameter is used to characterize the value of the ETL task;
And processing the ETL task according to the evaluation parameter.
2. The method of claim 1, wherein determining the evaluation parameters of the ETL task according to the metadata information comprises:
Determining n sub-evaluation parameters of the ETL task according to the metadata information, wherein the n sub-evaluation parameters comprise a complexity evaluation parameter, a cost evaluation parameter and a dependency evaluation parameter, and n is a positive integer;
And calculating the evaluation parameters of the ETL task according to the n sub-evaluation parameters.
3. The method according to claim 2, wherein the n sub-evaluation parameters further comprise a freshness evaluation parameter and/or a query degree evaluation parameter.
4. The method of claim 2, wherein said determining n sub-evaluation parameters of said ETL task according to said metadata information comprises:
Determining the complexity level of the ETL task according to the metadata information;
Determining a complexity evaluation parameter of the ETL task according to the complexity level of the ETL task;
Wherein the complexity evaluation parameter is positively correlated with the complexity level.
5. The method of claim 4, wherein determining the complexity level of the ETL task according to the metadata information comprises:
Extracting complexity information from the metadata information, wherein the complexity information is information for representing the complexity of the ETL task;
Querying a first mapping relation table, and determining the complexity level corresponding to the complexity information as the complexity level of the ETL task;
Wherein the first mapping relation table comprises at least one set of mapping relation between complexity information and complexity levels.
6. The method of claim 2, wherein said determining n sub-evaluation parameters of said ETL task according to said metadata information comprises:
Determining cost information of the ETL task according to the metadata information, wherein the cost information comprises information of at least one of the following dimensions: computation cost, storage cost, and time cost;
And determining a cost evaluation parameter of the ETL task according to the cost information of the ETL task.
7. The method of claim 6, wherein determining the cost evaluation parameter of the ETL task according to the cost information of the ETL task comprises:
Calculating the cost evaluation parameter of the ETL task according to the cost information of each dimension of the ETL task and the corresponding weight.
8. The method of claim 2, wherein said determining n sub-evaluation parameters of said ETL task according to said metadata information comprises:
Determining the number of downstream dependent tasks of the ETL task according to the metadata information;
Determining a dependency evaluation parameter of the ETL task according to the number of the downstream dependent tasks of the ETL task;
Wherein the dependency evaluation parameter has a positive correlation with the number of the downstream dependent tasks.
9. The method according to claim 8, wherein determining the dependency evaluation parameter of the ETL task according to the number of the dependent tasks downstream of the ETL task comprises:
Querying a second mapping relation table, and determining the dependency evaluation parameter corresponding to the number of the downstream dependent tasks of the ETL task as the dependency evaluation parameter of the ETL task;
Wherein the second mapping relation table comprises at least one set of mapping relations between the number of downstream dependent tasks and dependency evaluation parameters.
10. The method according to claim 2, wherein said calculating said evaluation parameters of said ETL task from said n sub-evaluation parameters comprises:
Adding the n sub-evaluation parameters to obtain the evaluation parameters of the ETL task;
Or,
Performing weighted summation on the n sub-evaluation parameters to obtain the evaluation parameter of the ETL task.
11. The method according to any of claims 1 to 10, wherein said processing said ETL task according to said evaluation parameter comprises:
If the evaluation parameter is in a first value interval, taking the ETL task offline;
If the evaluation parameter is in a second value interval, optimizing the ETL task;
If the evaluation parameter is in a third value interval, keeping the ETL task;
Wherein the values in the first value interval are smaller than the values in the second value interval, and the values in the second value interval are smaller than the values in the third value interval.
12. An apparatus for processing an ETL task, the apparatus comprising:
The information acquisition module is used for acquiring metadata information of the ETL task;
The parameter determining module is used for determining the evaluation parameter of the ETL task according to the metadata information; wherein the evaluation parameter is used to characterize the value of the ETL task;
And the task processing module is used for processing the ETL task according to the evaluation parameters.
13. A computer device, characterized in that the computer device comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the method according to any of claims 1 to 11.
14. A non-transitory computer-readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910872609.0A CN110580265B (en) | 2019-09-16 | 2019-09-16 | ETL task processing method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110580265A (en) | 2019-12-17 |
CN110580265B (en) | 2020-11-20 |
Family
ID=68812096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910872609.0A Active CN110580265B (en) | 2019-09-16 | 2019-09-16 | ETL task processing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110580265B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8200614B2 (en) * | 2008-04-30 | 2012-06-12 | SAP France S.A. | Apparatus and method to transform an extract transform and load (ETL) task into a delta load task |
US8719769B2 (en) * | 2009-08-18 | 2014-05-06 | Hewlett-Packard Development Company, L.P. | Quality-driven ETL design optimization |
CN102117306A (en) * | 2010-01-04 | 2011-07-06 | 阿里巴巴集团控股有限公司 | Method and system for monitoring ETL (extract-transform-load) data processing process |
CN109947746A (en) * | 2017-10-26 | 2019-06-28 | 亿阳信通股份有限公司 | A kind of quality of data management-control method and system based on ETL process |
CN109902117A (en) * | 2019-02-19 | 2019-06-18 | 新华三大数据技术有限公司 | Operation system analysis method and device |
Non-Patent Citations (1)
Title |
---|
ALKIS SIMITSIS et al.: "QoX-Driven ETL Design: Reducing the Cost of ETL Consulting Engagements", 《PROCEEDINGS OF THE 2009 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111680085A (en) * | 2020-05-07 | 2020-09-18 | 北京三快在线科技有限公司 | Data processing task analysis method and device, electronic equipment and readable storage medium |
CN112650661A (en) * | 2020-12-29 | 2021-04-13 | 北京嘀嘀无限科技发展有限公司 | Data processing quality control method, data processing quality control device, computer equipment and storage medium |
CN112650661B (en) * | 2020-12-29 | 2024-07-09 | 北京嘀嘀无限科技发展有限公司 | Data processing quality control method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110580265B (en) | 2020-11-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||