CN111881143A - Method for managing ETL task - Google Patents
Method for managing ETL task Download PDFInfo
- Publication number
- CN111881143A CN111881143A CN202010750318.7A CN202010750318A CN111881143A CN 111881143 A CN111881143 A CN 111881143A CN 202010750318 A CN202010750318 A CN 202010750318A CN 111881143 A CN111881143 A CN 111881143A
- Authority
- CN
- China
- Prior art keywords
- execution
- task
- etl
- current
- etl task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of data warehouses and discloses a method for managing ETL tasks, which records and monitors the running state of the tasks, manages the ETL tasks, reduces the labor input and reduces the operation and maintenance cost. The invention establishes a basic information table and a record table of each execution condition of the ETL task, judges the execution state of the current ETL task and the latest execution result of the last time parameter of the current ETL task by using an SQL query instruction, and records the running state of the ETL task and the execution result of the ETL task by using an SQL write-in command record; the operation and maintenance personnel only need to inquire the basic information table of the ETL task and the record table of each execution condition of the ETL task, can visually and comprehensively know the execution condition of the ETL task, and do not need to check the log of each task one by one, so that the aims of reducing the labor input and reducing the operation and maintenance cost are fulfilled. The invention is suitable for ETL task management.
Description
Technical Field
The invention relates to the technical field of data warehouses, in particular to a method for managing ETL tasks.
Background
An ETL (Extract-Transform-Load) is a data warehouse technology, and a user can Extract required data from a data source through an ETL task, and Load the data to a destination (such as a predefined data warehouse) through data cleaning, inter-conversion, and other processes.
Currently in large data platforms, there are a large number of ETL timing tasks executing each day. The ETL timing task is managed, and whether the ETL task succeeds or not is judged mainly by checking a task execution log of each task, and then the failed task is rerun. In such a mode, operation and maintenance personnel are required to check the logs of each task, failed tasks need to be rerun in time, and effective monitoring is not carried out on some tasks which do not rerun after failure, so that data confusion and data loss can be caused in the past. The traditional method for managing the ETL task not only needs a great deal of manpower, but also is very easy to cause data loss. A method for managing an ETL task aims at the problems that the traditional ETL task lacks effective monitoring and consumes operation and maintenance manpower, and the operation execution state is monitored by automatically recording the execution result of each link of the ETL task, so that the management of the ETL task is realized, the manpower input is reduced, and the operation and maintenance cost is reduced.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the method for managing the ETL tasks is provided, the running states of the tasks are recorded and monitored, the ETL tasks are managed, the labor input is reduced, and the operation and maintenance cost is reduced.
In order to solve the problems, the invention adopts the technical scheme that: a method of managing ETL tasks, comprising the steps of:
establishing a basic information table and a recording table of each execution condition of the ETL task, and writing ETL task information to be executed into the basic information table; wherein, the basic information table comprises the following fields: the ETL task name, the current running state and the operation and maintenance responsible person, wherein the record table of the execution condition each time comprises the following fields: the ETL task name, the execution time of each execution of the ETL task, the time parameter of the execution of the ETL task, the execution result of the ETL task and remarks;
before starting the ETL task, inquiring field information corresponding to the current task in a basic information table, and judging whether the current ETL task is executed or not according to the returned field content of the current running state;
if the current ETL task is being executed, inquiring the information of the corresponding task in the basic information table again after waiting for a period of time; if the current ETL task is not in execution, inquiring an execution record corresponding to a last time parameter of the current ETL task in a record table of each execution condition, and judging whether the execution is successful or not when the last time parameter of the task is executed according to an inquiry result;
if the execution result of the current ETL task at the last time parameter is failure, adding a failure record in the execution condition record table of each time, wherein the failure record comprises the execution time of the time, the execution result of the time and first remark information, the first remark information is the execution failure of the task at the last time parameter, and then quitting the whole task flow; if the execution result of the current ETL task at the last time parameter is successful, modifying the current running state of the current ETL task in the basic information table to be executing, and then starting to execute the ETL task;
if the execution result is successful, adding a success record in the record table of each execution condition, wherein the achievement record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the current time and remark information II, and the remark information II is the success of the execution of the ETL task;
if the execution result is failure, adding a failure record in the record table of the execution condition of each time, wherein the failure record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the task of the current time and third remark information, and the third remark information is ETL task execution failure;
and after the current ETL task is executed, modifying the current running state field of the corresponding current ETL task into unexecuted field in the basic information table of the ETL task.
Furthermore, the basic information table and the record table of each execution condition can be inquired through an SQL inquiry command, and the basic information table and the record table of each execution condition can be modified through an SQL write-in command.
The invention has the beneficial effects that: the invention only needs to establish one basic information table and one record table of each execution condition, and records the operation condition of each ETL task by using the record table of each execution condition, and the operation and maintenance personnel only need to inquire the basic information table and the record table of each execution condition of the ETL task, so that the operation and maintenance personnel can intuitively and comprehensively know the execution condition of the ETL task without checking the logs of each task one by one, thereby achieving the aims of reducing the labor input and the operation and maintenance cost.
Drawings
FIG. 1 is an ETL task execution flow diagram of an embodiment.
Detailed Description
In order to record and monitor the task running state, realize the management of ETL task, reduce the human input, reduce the operation and maintenance cost, the invention provides a method for managing ETL task, comprising the following steps:
establishing a basic information table and a recording table of each execution condition of the ETL task, and writing ETL task information to be executed into the basic information table; wherein, the basic information table comprises the following fields: the ETL task name, the current running state and the operation and maintenance responsible person, wherein the record table of the execution condition each time comprises the following fields: the ETL task name, the execution time of each execution of the ETL task, the time parameter of the execution of the ETL task, the execution result of the ETL task and remarks;
before starting the ETL task, inquiring field information corresponding to the current task in a basic information table, and judging whether the current ETL task is executed or not according to the returned field content of the current running state;
if the current ETL task is being executed, inquiring the information of the corresponding task in the basic information table again after waiting for a period of time; if the current ETL task is not in execution, inquiring an execution record corresponding to a last time parameter of the current ETL task in a record table of each execution condition, and judging whether the execution is successful or not when the last time parameter of the task is executed according to an inquiry result;
if the execution result of the current ETL task at the last time parameter is failure, a failure record is newly added in a record table of each execution condition of the ETL task, wherein the failure record comprises the execution time of the current time, the execution result of the current time and remark information I, the remark information I is the execution failure of the task at the last time parameter, then the whole task flow is exited, and the task failure is returned;
if the execution result of the current ETL task at the last time parameter is successful, modifying the current running state of the current ETL task in the basic information table to be executing, and then starting to execute the ETL task;
if the execution result is successful, adding a success record in the record table of each execution condition of the ETL task, wherein the success record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the current time and remark information II, and the remark information II is the success of the execution of the ETL task;
if the execution result is failure, adding a failure record in a record table of each execution condition of the ETL task, wherein the failure record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the task of the current time and third remark information, and the third remark information is the execution failure of the ETL task;
and after the current ETL task is executed, modifying the current running state field of the corresponding current ETL task into unexecuted field in the basic information table of the ETL task.
The invention is further illustrated by the following examples.
An embodiment provides a method for managing ETL tasks, as shown in fig. 1, the scheme is as follows:
1. establishing a basic information table ETL _ jobof the ETL task, wherein the basic information table ETL _ jobcomprises the following fields: an ETL task name field ETL _ joba, a current operation state field status (where 1 represents operating and 0 represents not operating), and an operation and maintenance person in _ charge. Writing an ETL task record in the ETL _ jobtable: etl _ joba: job1, status: 0; aaa is the in _ charge.
2. Establishing an ETL task execution condition record table ETL _ joblog each time, wherein the ETL task execution condition record table ETL _ joblog each time comprises the following fields: an ETL task name ETL _ joba, an execution time modify _ time of each execution of the ETL task, a time parameter (fixed time intervals such as a month, a week, a day, an hour, a minute and the like are used as time units) tx _ date of the execution of the ETL task, an ETL task execution result log _ status (wherein 0 represents failure and 1 represents success) and remark remap;
3. when the ETL task is started, the time parameter of the current ETL task job1 is determined to be date1, the last time parameter of the joba 1 is determined to be date2, information of the joba 1 in the ETL _ jobtable is inquired by using an SQL inquiry command, and whether the current ETL task jobs 1 is executed or not is judged according to the content of the returned current running state field status.
4. If status is 1, it indicates that the current ETL task jobe 1 is executing, after waiting for 5 minutes, query information of jobe 1 in the ETL _ jobtable by using SQL again, and determine whether the current ETL task jobe 1 is executing according to the content of the returned current running status field status;
5. if the status is 0, indicating that the current ETL task job1 is not being executed, using SQL query command to query the last time parameter date2 of the jobb 1 in the ETL task execution case record table ETL _ jobstatus each time, and determining whether the last time parameter of the task is successfully executed according to the ETL task execution result log _ status;
6. if log _ status of ETL _ joblog is 0, which indicates that the execution result of a time parameter date2 on the current ETL task job1 is failure, a failure record is newly added in the record table ETL _ joblog of each execution case of ETL task by using an SQL write command, where the failure record includes: etl _ joba: job1, modify _ time: current time n1, tx _ date: date1, log _ status:0, remark: last job execution failure. And then exiting the whole task flow.
7. If the log _ status of the ETL _ joblog is 1, which indicates that the execution result of the previous time parameter date2 of the joba 1 is successful, the SQL write command is used to modify the current running state status field corresponding to the joba 1 in the basic information table ETL _ jobto 1 for execution, and then execute the ETL task corresponding to the joba 1.
8. If the execution result of the ETL task corresponding to the jobb 1 is successful, adding a successful record in an ETL task execution condition record table ETL _ joblog by using an SQL write command, wherein the successful record comprises ETL _ jobJob: job1, modification _ time: current time n2, tx _ date: date1, log _ status:1, and mark: ETL task execution success.
9. If the execution result of the ETL task corresponding to the jobb 1 is failure, adding a failure record in an ETL task execution case record table ETL _ joblog by using an SQL write command, wherein the failure record comprises ETL _ jobJob: job1, modification _ time: current time n3, tx _ date: date1, log _ status:0, and remark: ETL task execution failure.
10. And finally, modifying the current running state field corresponding to the joba 1 in the basic information table ETL _ jobof the ETL task to be 0 to be unexecuted by using an SQL write command.
Claims (2)
1. A method of managing ETL tasks, comprising the steps of:
establishing a basic information table and a recording table of each execution condition of the ETL task, and writing ETL task information to be executed into the basic information table; wherein, the basic information table comprises the following fields: the ETL task name, the current running state and the operation and maintenance responsible person, wherein the record table of the execution condition each time comprises the following fields: the ETL task name, the execution time of each execution of the ETL task, the time parameter of the execution of the ETL task, the execution result of the ETL task and remarks;
before starting the ETL task, inquiring field information corresponding to the current task in a basic information table, and judging whether the current ETL task is executed or not according to the returned field content of the current running state;
if the current ETL task is being executed, inquiring the information of the corresponding task in the basic information table again after waiting for a period of time; if the current ETL task is not in execution, inquiring an execution record corresponding to a last time parameter of the current ETL task in a record table of each execution condition, and judging whether the execution is successful or not when the last time parameter of the task is executed according to an inquiry result;
if the execution result of the current ETL task at the last time parameter is failure, adding a failure record in the execution condition record table of each time, wherein the failure record comprises the execution time of the time, the execution result of the time and first remark information, the first remark information is the execution failure of the task at the last time parameter, and then quitting the whole task flow; if the execution result of the current ETL task at the last time parameter is successful, modifying the current running state of the current ETL task in the basic information table to be executing, and then starting to execute the ETL task;
if the execution result is successful, adding a success record in the record table of each execution condition, wherein the achievement record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the current time and remark information II, and the remark information II is the success of the execution of the ETL task;
if the execution result is failure, adding a failure record in the record table of the execution condition of each time, wherein the failure record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the task of the current time and third remark information, and the third remark information is ETL task execution failure;
and after the current ETL task is executed, modifying the current running state field of the corresponding current ETL task into unexecuted field in the basic information table of the ETL task.
2. The method of claim 1, wherein the basic information table and the per-execution-case record table are queried by a SQL query command, and are modified by a SQL write command.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010750318.7A CN111881143A (en) | 2020-07-30 | 2020-07-30 | Method for managing ETL task |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010750318.7A CN111881143A (en) | 2020-07-30 | 2020-07-30 | Method for managing ETL task |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111881143A true CN111881143A (en) | 2020-11-03 |
Family
ID=73205569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010750318.7A Pending CN111881143A (en) | 2020-07-30 | 2020-07-30 | Method for managing ETL task |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111881143A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411599A (en) * | 2011-08-01 | 2012-04-11 | 中国民生银行股份有限公司 | Method for processing abnormal behaviors in data base and monitoring server |
CN102981904A (en) * | 2011-09-02 | 2013-03-20 | 阿里巴巴集团控股有限公司 | Task scheduling method and system |
US20170177309A1 (en) * | 2015-12-22 | 2017-06-22 | Opera Solutions U.S.A., Llc | System and Method for Rapid Development and Deployment of Reusable Analytic Code for Use in Computerized Data Modeling and Analysis |
CN110262945A (en) * | 2019-06-25 | 2019-09-20 | 苏宁消费金融有限公司 | A kind of method of intelligent monitoring data warehouse scheduling system |
-
2020
- 2020-07-30 CN CN202010750318.7A patent/CN111881143A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411599A (en) * | 2011-08-01 | 2012-04-11 | 中国民生银行股份有限公司 | Method for processing abnormal behaviors in data base and monitoring server |
CN102981904A (en) * | 2011-09-02 | 2013-03-20 | 阿里巴巴集团控股有限公司 | Task scheduling method and system |
US20170177309A1 (en) * | 2015-12-22 | 2017-06-22 | Opera Solutions U.S.A., Llc | System and Method for Rapid Development and Deployment of Reusable Analytic Code for Use in Computerized Data Modeling and Analysis |
CN110262945A (en) * | 2019-06-25 | 2019-09-20 | 苏宁消费金融有限公司 | A kind of method of intelligent monitoring data warehouse scheduling system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7898679B2 (en) | Method and system for scheduling jobs in a computer system | |
CN103034554B (en) | The ETL dispatching system that a kind of error correction is restarted and automatic decision starts and method | |
US7689989B2 (en) | Thread monitoring using shared memory | |
CN109271435B (en) | Data extraction method and system supporting breakpoint continuous transmission | |
US20120066684A1 (en) | Control server, virtual server distribution method | |
US20060143290A1 (en) | Session monitoring using shared memory | |
CN103677973A (en) | Distributed multi-task scheduling management system | |
US20100043004A1 (en) | Method and system for computer system diagnostic scheduling using service level objectives | |
US20120072582A1 (en) | Method, apparatus and program storage device for scheduling the performance of maintenance tasks to maintain a system environment | |
US20100023805A1 (en) | Method and system for disaster recovery based on journaling events pruning in a computing environment | |
WO2002029515A3 (en) | Workflow management software overview | |
JP4338126B2 (en) | Network system, server, device management method and program | |
CN102663543A (en) | Scheduling system used for enterprise data unification platform | |
US20040078628A1 (en) | Method, program and system for managing operation | |
KR100671146B1 (en) | Interface apparatus of Manufacturing Execution System | |
CN111881143A (en) | Method for managing ETL task | |
CN104778545A (en) | Work task management system based on mobile equipment | |
CN111124370A (en) | Data processing method and related equipment | |
US20060271929A1 (en) | Method and system for providing job forecasting in a computer system | |
CN111429114A (en) | Staff loss early warning pre-control mechanism system | |
JP2000155708A (en) | Automatic monitoring method for use state of journal file | |
KR101692964B1 (en) | Provisioning apparatus and method | |
CN113946543A (en) | Data archiving method, device, equipment and storage medium based on artificial intelligence | |
CN113868226A (en) | Database management method, database platform and readable storage medium | |
CN110888928B (en) | Visual control method based on ETL tool service assembly |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20201103 |