CN111881143A - Method for managing ETL task - Google Patents

Method for managing ETL task Download PDF

Info

Publication number
CN111881143A
CN111881143A CN202010750318.7A CN202010750318A CN111881143A CN 111881143 A CN111881143 A CN 111881143A CN 202010750318 A CN202010750318 A CN 202010750318A CN 111881143 A CN111881143 A CN 111881143A
Authority
CN
China
Prior art keywords
execution
task
etl
current
etl task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010750318.7A
Other languages
Chinese (zh)
Inventor
龚世宇
梁敏
杜科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN202010750318.7A priority Critical patent/CN111881143A/en
Publication of CN111881143A publication Critical patent/CN111881143A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data warehouses and discloses a method for managing ETL tasks, which records and monitors the running state of the tasks, manages the ETL tasks, reduces the labor input and reduces the operation and maintenance cost. The invention establishes a basic information table and a record table of each execution condition of the ETL task, judges the execution state of the current ETL task and the latest execution result of the last time parameter of the current ETL task by using an SQL query instruction, and records the running state of the ETL task and the execution result of the ETL task by using an SQL write-in command record; the operation and maintenance personnel only need to inquire the basic information table of the ETL task and the record table of each execution condition of the ETL task, can visually and comprehensively know the execution condition of the ETL task, and do not need to check the log of each task one by one, so that the aims of reducing the labor input and reducing the operation and maintenance cost are fulfilled. The invention is suitable for ETL task management.

Description

Method for managing ETL task
Technical Field
The invention relates to the technical field of data warehouses, in particular to a method for managing ETL tasks.
Background
An ETL (Extract-Transform-Load) is a data warehouse technology, and a user can Extract required data from a data source through an ETL task, and Load the data to a destination (such as a predefined data warehouse) through data cleaning, inter-conversion, and other processes.
Currently in large data platforms, there are a large number of ETL timing tasks executing each day. The ETL timing task is managed, and whether the ETL task succeeds or not is judged mainly by checking a task execution log of each task, and then the failed task is rerun. In such a mode, operation and maintenance personnel are required to check the logs of each task, failed tasks need to be rerun in time, and effective monitoring is not carried out on some tasks which do not rerun after failure, so that data confusion and data loss can be caused in the past. The traditional method for managing the ETL task not only needs a great deal of manpower, but also is very easy to cause data loss. A method for managing an ETL task aims at the problems that the traditional ETL task lacks effective monitoring and consumes operation and maintenance manpower, and the operation execution state is monitored by automatically recording the execution result of each link of the ETL task, so that the management of the ETL task is realized, the manpower input is reduced, and the operation and maintenance cost is reduced.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the method for managing the ETL tasks is provided, the running states of the tasks are recorded and monitored, the ETL tasks are managed, the labor input is reduced, and the operation and maintenance cost is reduced.
In order to solve the problems, the invention adopts the technical scheme that: a method of managing ETL tasks, comprising the steps of:
establishing a basic information table and a recording table of each execution condition of the ETL task, and writing ETL task information to be executed into the basic information table; wherein, the basic information table comprises the following fields: the ETL task name, the current running state and the operation and maintenance responsible person, wherein the record table of the execution condition each time comprises the following fields: the ETL task name, the execution time of each execution of the ETL task, the time parameter of the execution of the ETL task, the execution result of the ETL task and remarks;
before starting the ETL task, inquiring field information corresponding to the current task in a basic information table, and judging whether the current ETL task is executed or not according to the returned field content of the current running state;
if the current ETL task is being executed, inquiring the information of the corresponding task in the basic information table again after waiting for a period of time; if the current ETL task is not in execution, inquiring an execution record corresponding to a last time parameter of the current ETL task in a record table of each execution condition, and judging whether the execution is successful or not when the last time parameter of the task is executed according to an inquiry result;
if the execution result of the current ETL task at the last time parameter is failure, adding a failure record in the execution condition record table of each time, wherein the failure record comprises the execution time of the time, the execution result of the time and first remark information, the first remark information is the execution failure of the task at the last time parameter, and then quitting the whole task flow; if the execution result of the current ETL task at the last time parameter is successful, modifying the current running state of the current ETL task in the basic information table to be executing, and then starting to execute the ETL task;
if the execution result is successful, adding a success record in the record table of each execution condition, wherein the achievement record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the current time and remark information II, and the remark information II is the success of the execution of the ETL task;
if the execution result is failure, adding a failure record in the record table of the execution condition of each time, wherein the failure record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the task of the current time and third remark information, and the third remark information is ETL task execution failure;
and after the current ETL task is executed, modifying the current running state field of the corresponding current ETL task into unexecuted field in the basic information table of the ETL task.
Furthermore, the basic information table and the record table of each execution condition can be inquired through an SQL inquiry command, and the basic information table and the record table of each execution condition can be modified through an SQL write-in command.
The invention has the beneficial effects that: the invention only needs to establish one basic information table and one record table of each execution condition, and records the operation condition of each ETL task by using the record table of each execution condition, and the operation and maintenance personnel only need to inquire the basic information table and the record table of each execution condition of the ETL task, so that the operation and maintenance personnel can intuitively and comprehensively know the execution condition of the ETL task without checking the logs of each task one by one, thereby achieving the aims of reducing the labor input and the operation and maintenance cost.
Drawings
FIG. 1 is an ETL task execution flow diagram of an embodiment.
Detailed Description
In order to record and monitor the task running state, realize the management of ETL task, reduce the human input, reduce the operation and maintenance cost, the invention provides a method for managing ETL task, comprising the following steps:
establishing a basic information table and a recording table of each execution condition of the ETL task, and writing ETL task information to be executed into the basic information table; wherein, the basic information table comprises the following fields: the ETL task name, the current running state and the operation and maintenance responsible person, wherein the record table of the execution condition each time comprises the following fields: the ETL task name, the execution time of each execution of the ETL task, the time parameter of the execution of the ETL task, the execution result of the ETL task and remarks;
before starting the ETL task, inquiring field information corresponding to the current task in a basic information table, and judging whether the current ETL task is executed or not according to the returned field content of the current running state;
if the current ETL task is being executed, inquiring the information of the corresponding task in the basic information table again after waiting for a period of time; if the current ETL task is not in execution, inquiring an execution record corresponding to a last time parameter of the current ETL task in a record table of each execution condition, and judging whether the execution is successful or not when the last time parameter of the task is executed according to an inquiry result;
if the execution result of the current ETL task at the last time parameter is failure, a failure record is newly added in a record table of each execution condition of the ETL task, wherein the failure record comprises the execution time of the current time, the execution result of the current time and remark information I, the remark information I is the execution failure of the task at the last time parameter, then the whole task flow is exited, and the task failure is returned;
if the execution result of the current ETL task at the last time parameter is successful, modifying the current running state of the current ETL task in the basic information table to be executing, and then starting to execute the ETL task;
if the execution result is successful, adding a success record in the record table of each execution condition of the ETL task, wherein the success record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the current time and remark information II, and the remark information II is the success of the execution of the ETL task;
if the execution result is failure, adding a failure record in a record table of each execution condition of the ETL task, wherein the failure record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the task of the current time and third remark information, and the third remark information is the execution failure of the ETL task;
and after the current ETL task is executed, modifying the current running state field of the corresponding current ETL task into unexecuted field in the basic information table of the ETL task.
The invention is further illustrated by the following examples.
An embodiment provides a method for managing ETL tasks, as shown in fig. 1, the scheme is as follows:
1. establishing a basic information table ETL _ jobof the ETL task, wherein the basic information table ETL _ jobcomprises the following fields: an ETL task name field ETL _ joba, a current operation state field status (where 1 represents operating and 0 represents not operating), and an operation and maintenance person in _ charge. Writing an ETL task record in the ETL _ jobtable: etl _ joba: job1, status: 0; aaa is the in _ charge.
2. Establishing an ETL task execution condition record table ETL _ joblog each time, wherein the ETL task execution condition record table ETL _ joblog each time comprises the following fields: an ETL task name ETL _ joba, an execution time modify _ time of each execution of the ETL task, a time parameter (fixed time intervals such as a month, a week, a day, an hour, a minute and the like are used as time units) tx _ date of the execution of the ETL task, an ETL task execution result log _ status (wherein 0 represents failure and 1 represents success) and remark remap;
3. when the ETL task is started, the time parameter of the current ETL task job1 is determined to be date1, the last time parameter of the joba 1 is determined to be date2, information of the joba 1 in the ETL _ jobtable is inquired by using an SQL inquiry command, and whether the current ETL task jobs 1 is executed or not is judged according to the content of the returned current running state field status.
4. If status is 1, it indicates that the current ETL task jobe 1 is executing, after waiting for 5 minutes, query information of jobe 1 in the ETL _ jobtable by using SQL again, and determine whether the current ETL task jobe 1 is executing according to the content of the returned current running status field status;
5. if the status is 0, indicating that the current ETL task job1 is not being executed, using SQL query command to query the last time parameter date2 of the jobb 1 in the ETL task execution case record table ETL _ jobstatus each time, and determining whether the last time parameter of the task is successfully executed according to the ETL task execution result log _ status;
6. if log _ status of ETL _ joblog is 0, which indicates that the execution result of a time parameter date2 on the current ETL task job1 is failure, a failure record is newly added in the record table ETL _ joblog of each execution case of ETL task by using an SQL write command, where the failure record includes: etl _ joba: job1, modify _ time: current time n1, tx _ date: date1, log _ status:0, remark: last job execution failure. And then exiting the whole task flow.
7. If the log _ status of the ETL _ joblog is 1, which indicates that the execution result of the previous time parameter date2 of the joba 1 is successful, the SQL write command is used to modify the current running state status field corresponding to the joba 1 in the basic information table ETL _ jobto 1 for execution, and then execute the ETL task corresponding to the joba 1.
8. If the execution result of the ETL task corresponding to the jobb 1 is successful, adding a successful record in an ETL task execution condition record table ETL _ joblog by using an SQL write command, wherein the successful record comprises ETL _ jobJob: job1, modification _ time: current time n2, tx _ date: date1, log _ status:1, and mark: ETL task execution success.
9. If the execution result of the ETL task corresponding to the jobb 1 is failure, adding a failure record in an ETL task execution case record table ETL _ joblog by using an SQL write command, wherein the failure record comprises ETL _ jobJob: job1, modification _ time: current time n3, tx _ date: date1, log _ status:0, and remark: ETL task execution failure.
10. And finally, modifying the current running state field corresponding to the joba 1 in the basic information table ETL _ jobof the ETL task to be 0 to be unexecuted by using an SQL write command.

Claims (2)

1. A method of managing ETL tasks, comprising the steps of:
establishing a basic information table and a recording table of each execution condition of the ETL task, and writing ETL task information to be executed into the basic information table; wherein, the basic information table comprises the following fields: the ETL task name, the current running state and the operation and maintenance responsible person, wherein the record table of the execution condition each time comprises the following fields: the ETL task name, the execution time of each execution of the ETL task, the time parameter of the execution of the ETL task, the execution result of the ETL task and remarks;
before starting the ETL task, inquiring field information corresponding to the current task in a basic information table, and judging whether the current ETL task is executed or not according to the returned field content of the current running state;
if the current ETL task is being executed, inquiring the information of the corresponding task in the basic information table again after waiting for a period of time; if the current ETL task is not in execution, inquiring an execution record corresponding to a last time parameter of the current ETL task in a record table of each execution condition, and judging whether the execution is successful or not when the last time parameter of the task is executed according to an inquiry result;
if the execution result of the current ETL task at the last time parameter is failure, adding a failure record in the execution condition record table of each time, wherein the failure record comprises the execution time of the time, the execution result of the time and first remark information, the first remark information is the execution failure of the task at the last time parameter, and then quitting the whole task flow; if the execution result of the current ETL task at the last time parameter is successful, modifying the current running state of the current ETL task in the basic information table to be executing, and then starting to execute the ETL task;
if the execution result is successful, adding a success record in the record table of each execution condition, wherein the achievement record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the current time and remark information II, and the remark information II is the success of the execution of the ETL task;
if the execution result is failure, adding a failure record in the record table of the execution condition of each time, wherein the failure record comprises the execution time of the current time, the time parameter of the execution of the task of the current time, the execution result of the task of the current time and third remark information, and the third remark information is ETL task execution failure;
and after the current ETL task is executed, modifying the current running state field of the corresponding current ETL task into unexecuted field in the basic information table of the ETL task.
2. The method of claim 1, wherein the basic information table and the per-execution-case record table are queried by a SQL query command, and are modified by a SQL write command.
CN202010750318.7A 2020-07-30 2020-07-30 Method for managing ETL task Pending CN111881143A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010750318.7A CN111881143A (en) 2020-07-30 2020-07-30 Method for managing ETL task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010750318.7A CN111881143A (en) 2020-07-30 2020-07-30 Method for managing ETL task

Publications (1)

Publication Number Publication Date
CN111881143A true CN111881143A (en) 2020-11-03

Family

ID=73205569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010750318.7A Pending CN111881143A (en) 2020-07-30 2020-07-30 Method for managing ETL task

Country Status (1)

Country Link
CN (1) CN111881143A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411599A (en) * 2011-08-01 2012-04-11 中国民生银行股份有限公司 Method for processing abnormal behaviors in data base and monitoring server
CN102981904A (en) * 2011-09-02 2013-03-20 阿里巴巴集团控股有限公司 Task scheduling method and system
US20170177309A1 (en) * 2015-12-22 2017-06-22 Opera Solutions U.S.A., Llc System and Method for Rapid Development and Deployment of Reusable Analytic Code for Use in Computerized Data Modeling and Analysis
CN110262945A (en) * 2019-06-25 2019-09-20 苏宁消费金融有限公司 A kind of method of intelligent monitoring data warehouse scheduling system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411599A (en) * 2011-08-01 2012-04-11 中国民生银行股份有限公司 Method for processing abnormal behaviors in data base and monitoring server
CN102981904A (en) * 2011-09-02 2013-03-20 阿里巴巴集团控股有限公司 Task scheduling method and system
US20170177309A1 (en) * 2015-12-22 2017-06-22 Opera Solutions U.S.A., Llc System and Method for Rapid Development and Deployment of Reusable Analytic Code for Use in Computerized Data Modeling and Analysis
CN110262945A (en) * 2019-06-25 2019-09-20 苏宁消费金融有限公司 A kind of method of intelligent monitoring data warehouse scheduling system

Similar Documents

Publication Publication Date Title
US7898679B2 (en) Method and system for scheduling jobs in a computer system
CN103034554B (en) The ETL dispatching system that a kind of error correction is restarted and automatic decision starts and method
US7689989B2 (en) Thread monitoring using shared memory
CN109271435B (en) Data extraction method and system supporting breakpoint continuous transmission
US20120066684A1 (en) Control server, virtual server distribution method
US20060143290A1 (en) Session monitoring using shared memory
CN103677973A (en) Distributed multi-task scheduling management system
US20100043004A1 (en) Method and system for computer system diagnostic scheduling using service level objectives
US20120072582A1 (en) Method, apparatus and program storage device for scheduling the performance of maintenance tasks to maintain a system environment
US20100023805A1 (en) Method and system for disaster recovery based on journaling events pruning in a computing environment
WO2002029515A3 (en) Workflow management software overview
JP4338126B2 (en) Network system, server, device management method and program
CN102663543A (en) Scheduling system used for enterprise data unification platform
US20040078628A1 (en) Method, program and system for managing operation
KR100671146B1 (en) Interface apparatus of Manufacturing Execution System
CN111881143A (en) Method for managing ETL task
CN104778545A (en) Work task management system based on mobile equipment
CN111124370A (en) Data processing method and related equipment
US20060271929A1 (en) Method and system for providing job forecasting in a computer system
CN111429114A (en) Staff loss early warning pre-control mechanism system
JP2000155708A (en) Automatic monitoring method for use state of journal file
KR101692964B1 (en) Provisioning apparatus and method
CN113946543A (en) Data archiving method, device, equipment and storage medium based on artificial intelligence
CN113868226A (en) Database management method, database platform and readable storage medium
CN110888928B (en) Visual control method based on ETL tool service assembly

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20201103