WO2019223174A1 - Automatic task rerun method, system, computer device, and storage medium - Google Patents

Info

Publication number
WO2019223174A1
Authority
WO
WIPO (PCT)
Prior art keywords
error
polling
keywords
task
rerun
Prior art date
Application number
PCT/CN2018/104367
Other languages
English (en)
French (fr)
Inventor
刘斌
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019223174A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of data processing technology and, in particular, to a method, system, computer device, and storage medium for automatic task rerun.
  • BI: Business Intelligence
  • ETL: extraction, transformation, and loading (data warehouse technology)
  • ETL tasks are scheduled using dedicated scheduling tools.
  • When the scheduling tools currently on the market schedule ETL tasks, manual intervention is required to rerun a task that reported an error. In fact, certain specific errors can be cleared by an automatic rerun, without the intervention of operation and maintenance or development engineers.
  • A method for automatic task rerun includes:
  • S1 Create an error keyword configuration table in the scheduling platform database, receive manually collected error keywords and the rerun range, number of reruns, and retry intervals corresponding to the error keywords, and configure them in the error keyword configuration table;
  • S2 Create a timing polling script, which periodically polls the error keywords recorded in the error log from the task running log table of the scheduling platform, and extracts the error keywords recorded in the error log from the log table;
  • An automatic task rerun system includes:
  • a configuration unit, configured to select a scheduling platform, create an error keyword configuration table in the scheduling platform database, receive the manually collected error keywords together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, and store them in the error keyword configuration table;
  • an error keyword extraction unit, configured to create a timed polling script that periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table;
  • a judging unit, configured to change the error state of the current task to the ready state once the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword.
  • A computer device includes a memory and a processor.
  • The memory stores computer-readable instructions.
  • When executed by the processor, the computer-readable instructions cause the processor to perform the following steps:
  • S1 Create an error report keyword configuration table in the scheduling platform database, receive manually collected error report keywords and the rerun range, number of reruns, and retry intervals corresponding to the error report keywords, and configure them in the error report keyword configuration table;
  • S2 Create a timing polling script, which periodically polls the error keywords recorded in the error log from the task running log table of the scheduling platform, and extracts the error keywords recorded in the error log from the log table;
  • A storage medium stores computer-readable instructions.
  • When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
  • S1 Create an error report keyword configuration table in the scheduling platform database, receive manually collected error report keywords and the rerun range, number of reruns, and retry intervals corresponding to the error report keywords, and configure them in the error report keyword configuration table;
  • S2 Create a timing polling script, which periodically polls the error keywords recorded in the error log from the task running log table of the scheduling platform, and extracts the error keywords recorded in the error log from the log table;
  • In the above method, device, computer device, and storage medium for automatic task rerun, an error keyword configuration table is created in the scheduling platform database, and the error keywords and their corresponding rerun range, number of reruns, and rerun interval are configured manually.
  • A timed polling script periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts them from the log table.
  • When an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, the error state of the current task is changed to the ready state.
  • This technical solution reruns tasks automatically on the basis of the error keyword configuration table, which improves the running efficiency of ETL tasks and saves operation and maintenance time as well as the time development colleagues spend on manual intervention.
  • The rerun range, number of reruns, and rerun interval can also be configured flexibly, which improves the competitiveness of the scheduling platform product.
  • FIG. 1 is a flowchart of a method for automatically re-running a task in an embodiment of the present application
  • FIG. 2 is a flowchart of step S1 in FIG. 1;
  • FIG. 3 is a flowchart of steps S2 and S3 in FIG. 1;
  • FIG. 4 is a structural diagram of a task automatic rerun system in an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a configuration unit in FIG. 4;
  • FIG. 6 is a schematic diagram of a module for extracting an error reporting keyword unit in FIG. 4;
  • FIG. 7 is a schematic block diagram of a determination unit in FIG. 4.
  • FIG. 1 is a flowchart of an automatic task rerun method in an embodiment of the present application. As shown in FIG. 1, an automatic task rerun method includes the following steps:
  • Step S1: An error keyword configuration table is created in the scheduling platform database, and the manually collected error keywords, together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, are received and stored in the error keyword configuration table.
  • The ETL task process of this embodiment uses the Datastage data scheduling platform as the scheduling tool.
  • the Datastage data scheduling platform is a data integration software platform that can help developers obtain more value from the complex heterogeneous information scattered in various systems. Because the Datastage data scheduling platform provides a graphical framework, you can use this framework to design and run jobs for transforming and cleaning data, enabling developers to understand, clean, transform, and deliver trusted, context-rich information.
  • Step S2: A timed polling script is created. The timed polling script periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table.
  • While running scheduled tasks, the Datastage data scheduling platform generates the task running log table in real time.
  • When an error occurs during a run, the corresponding log record in the task running log table is an error log record, and the error log record contains an error keyword.
  • The timed polling script of this embodiment uses an Oracle job to execute tasks on a schedule. An Oracle job is a scheduled task based on a stored procedure. Using Oracle stored procedures can greatly reduce the amount of Java program code that has to be written, and because stored procedures execute in the database, they benefit from Oracle's performance, which greatly improves the efficiency and stability of program execution. Executing a stored procedure on a schedule requires an Oracle job. Once the polling parameters are set, the job can execute the task at a specified point in time or at a certain time every day.
  • The scheduled job periodically polls the task running log table of the Datastage data scheduling platform for error keywords and extracts them from the log table so that they can be analyzed and judged separately in the next step.
  • Step S3: Once the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, the error state of the current task is changed to the ready state.
  • The error keywords extracted in step S2 go through four checks: whether the keyword exists in the error keyword configuration table, whether the rerun range is satisfied, whether the number of reruns is satisfied, and whether the rerun interval is satisfied. When all four checks pass, the task is reset: the error log record in the current task running log table is changed from the error state to the ready state, and log records in the ready state are automatically picked up and executed again by the Datastage data scheduling platform.
  • The scheduled job polls the task running log table periodically. As soon as the error keyword of an error log record matches an error keyword in the error keyword configuration table and the requirements are met, the task is rerun automatically without manual intervention, which greatly improves the running efficiency of the Datastage data scheduling platform.
  • In one embodiment, as shown in FIG. 2, step S1 may include the following specific steps:
  • Step S101: A configuration input interface is set up on the scheduling platform. The interface carries field information including an error keyword field, a rerun range field, a number-of-reruns field, and a rerun interval field.
  • Error keywords are generally collected from operation and maintenance or development engineers; the keywords collected are those for which a rerun on the scheduling platform can succeed.
  • For example, the error keywords of the Datastage data scheduling platform that call for a rerun include ORA-00020, ORA-01555, ORA-03113, ORA-12170, and Connect failed.
  • The error keyword codes corresponding to these error keywords are shown in Table 1 below:
  • ORA-00020: maximum number of processes (%s) exceeded; ORA-01555: snapshot too old: rollback segment number string with name
  • Step S102: An error keyword configuration table is created in the database of the scheduling platform, the field information entered in the configuration input interface is received, and the field information is stored in the error keyword configuration table.
  • The error keyword configuration table is shown in Table 2 below:
  • The error keyword configuration table includes the error keyword, the rerun range, the number of reruns, and the rerun interval.
  • Rerun range: Many tasks run on the scheduling platform, and different types of tasks may produce the same error keyword. For a given error keyword, some task types need an automatic rerun, while for other task types a rerun is ineffective.
  • The rerun range is preset to control which tasks need to be rerun and which do not.
  • Number of reruns: An automatic rerun triggered by an error may itself report an error again, so a number of reruns is set to prevent endless reruns.
  • The number of reruns is the maximum number of automatic reruns after this error keyword appears in an error.
  • If a rerun succeeds before the maximum number of reruns is reached, the task also stops rerunning.
  • Rerun interval: In some cases a task that reported an error does not need to be rerun immediately, and rerunning it after a while improves the success rate. For example, each database can accept only a limited number of connections. Once the number of connections is saturated, a task that tries to connect cannot do so due to insufficient access permissions and reports an error again. Setting a delay, that is, rerunning the task after a certain period, effectively avoids connection conflicts with other tasks: by then some tasks have released their connections, which improves the success rate of the rerun.
  • The rerun interval is preferably set to a time between one minute and twenty minutes.
  • In this embodiment, the field information is entered through the configuration input interface and stored in the error keyword configuration table.
  • This entry method is simple and convenient and achieves automatic configuration. If the configuration needs to change, the error keyword configuration table can be modified directly, which is flexible.
  • In one embodiment, as shown in FIG. 3, step S2 may include the following specific steps:
  • Step S201: A timed polling script is created in the scheduling platform, and polling parameters including the polling time, the polling task name, and the number of error runs are set.
  • dbms_job.submit(:job1, 'MYPROC;', trunc(sysdate + 1), 'sysdate + 1'); -- first run at midnight tonight, then once a day
  • Specifically, a polling parameter input interface may be set up on the scheduling platform.
  • The polling parameter input interface carries field information including a polling time field and a polling task name field.
  • The scheduling platform receives the field information entered in the polling parameter input interface and stores it in the polling parameters of the timed polling script.
  • In this step, field information is entered through the polling parameter input interface and stored in the polling parameters.
  • This way of modifying the polling parameters is simple and convenient and provides good human-machine interaction.
  • Step S202: The timed polling script periodically polls the task running log table according to the polling parameters.
  • The timed polling script polls the task running log table corresponding to the preset polling task name at the preset polling time. Other polling parameters can also be set, such as running once a day or once on a certain day of each week.
  • The timed polling script then starts at the preset polling time on each subsequent day, or on that day of each week, and polls the task running log table.
  • Step S203: The error keywords of the error log records in the task running log table are extracted from the table one by one.
  • The timed polling script polls the task running log table from the beginning. The table contains a number of log records, and error log records are usually written in a specific format. The script visits the error log records one by one, extracts the error keyword of each, and passes it to the next step.
  • In one embodiment, as shown in FIG. 3, step S3 may include the following specific steps:
  • Step S301: The error keyword extracted in step S2 is compared with the error keywords in the error keyword configuration table from step S1. If the extracted keyword exists in the configuration table, proceed to the next step; otherwise return to S203 to continue polling.
  • The first check in deciding whether to rerun a task that reported an error is whether its error keyword exists in the error keyword configuration table. If it does not, the task is by default not rerun even though it reported an error; polling continues with the next error log record, whose error keyword is judged in turn.
  • When the error keyword extracted in step S2 is in the error keyword configuration table, the next check follows. For example, if the extracted keyword is "ORA-12170" and the configuration table also contains this keyword, the task is considered to need a rerun and moves on to the next check.
  • Step S302: The rerun range corresponding to the error keyword is read from the error keyword configuration table, and it is determined whether the polling task name of the current task is within that range. If it is, proceed to the next step; otherwise return to S203 to continue polling.
  • Because not every task that reports a given error keyword should be rerun, this embodiment sets a rerun range.
  • When the error keyword configuration table is configured, a corresponding rerun range, that is, a list of the tasks to be rerun, is set for each error keyword.
  • When the task name corresponding to the extracted error keyword is not in the rerun range, the task does not go through the rerun process even though it reported an error. Polling continues with the next error log record, and the rerun range is judged again for its error keyword.
  • Step S303: The number of reruns corresponding to the error keyword is read from the error keyword configuration table, and the number of error runs of the current task is compared with it. If the number of error runs has not yet reached the number of reruns, proceed to the next step; otherwise return to S203 to continue polling.
  • To prevent endless reruns, this embodiment sets a number of reruns.
  • When the error keyword configuration table is configured, a corresponding number of reruns is set for each error keyword.
  • When the number of error runs for the extracted error keyword equals the preset number of reruns, the task is by default not rerun even though it reported an error. Polling continues with the next error log record, and the number of reruns is judged again for its error keyword.
  • Step S304: The rerun interval corresponding to the error keyword is read from the error keyword configuration table, and it is determined whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform. If so, proceed to the next step; otherwise wait before proceeding, where the delay is the polling time plus the preset rerun interval minus the current time.
  • If a task is rerun immediately after an error, the rerun may fail.
  • For example, if the task corresponding to the error keyword "ORA-12170" is rerun immediately, it tries to connect to the database again.
  • If the number of database connections is still saturated, the task cannot connect due to insufficient access rights, reports an error again, and is recorded as an error log record again. Therefore, this embodiment sets a rerun interval.
  • When the error keyword configuration table is configured, a corresponding rerun interval is set for each error keyword, and tasks that need a rerun are rerun after the delay.
  • Step S305: The error log record in the current task running log table is changed from the error state to the ready state, and the number of error runs of the current task is increased by one.
  • Existing scheduling platforms generally pick up tasks to execute according to the state of the log records. For example, when the state in a log record is ready, the scheduler of the scheduling platform automatically picks up the task and executes it again. Therefore, in this embodiment, changing the error log record in the task running log table from the error state to the ready state is enough to implement automatic rerun without manual intervention.
  • Because this embodiment sets a number of reruns, the number of error runs recorded for the current task's error keyword is increased by one after the task is rerun. This count is the basis for the number-of-reruns check the next time the task fails and reports the error again.
  • This embodiment identifies the tasks that can be rerun by checking, in order, the error keyword, the rerun range, the number of reruns, and the rerun interval. By modifying the task state, it implements automatic rerun without manual intervention, which improves the running efficiency of the scheduling platform.
  • In one embodiment, as shown in FIG. 4, an automatic task rerun system includes a configuration unit, configured to select a scheduling platform, create an error keyword configuration table in the scheduling platform database, receive the manually collected error keywords together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, and store them in the error keyword configuration table.
  • The error keyword extraction unit is configured to create a timed polling script that periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table.
  • The judging unit is configured to change the error state of the current task to the ready state once the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword.
  • The configuration unit includes an input module, configured to set up a configuration input interface on the scheduling platform. The interface carries field information including an error keyword field, a rerun range field, a number-of-reruns field, and a rerun interval field.
  • The receiving and storage module is configured to create an error keyword configuration table in the database of the scheduling platform, receive the field information entered in the configuration input interface, and store the field information in the error keyword configuration table.
  • The error keyword extraction unit includes: a creation module, configured to create a timed polling script in the scheduling platform and set polling parameters including the polling time, the polling task name, and the number of error runs;
  • a polling module, configured so that the timed polling script periodically polls the task running log table according to the polling parameters;
  • and an extraction module, configured to extract the error keywords of the error log records from the task running log table one by one.
  • The error keyword extraction unit further includes: a polling parameter input interface module, configured to set up a polling parameter input interface on the scheduling platform, the interface carrying field information including a polling time field and a polling task name field; and a polling parameter storage module, configured so that the scheduling platform receives the field information entered in the polling parameter input interface and stores it in the polling parameters of the timed polling script.
  • The judging unit includes an error keyword comparison module, configured to compare the error keyword extracted by the error keyword extraction unit with the error keywords in the error keyword configuration table. If the extracted keyword exists in the configuration table, proceed to the next step; otherwise return to the extraction module to continue polling.
  • The rerun range judgment module is configured to read the rerun range corresponding to the error keyword from the error keyword configuration table and determine whether the polling task name of the current task is within that range. If it is, proceed to the next step; otherwise return to the extraction module to continue polling.
  • The number-of-reruns judgment module is configured to read the number of reruns corresponding to the error keyword from the error keyword configuration table and compare the number of error runs of the current task with it. If the number of error runs has not yet reached the number of reruns, proceed to the next step; otherwise return to the extraction module to continue polling.
  • The rerun interval judgment module is configured to read the rerun interval corresponding to the error keyword from the error keyword configuration table and determine whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform. If so, proceed to the next step; otherwise wait before proceeding.
  • The delay is the polling time plus the preset rerun interval minus the current time.
  • The state modification module is configured to change the error log record in the current task running log table from the error state to the ready state and increase the number of error runs of the current task by one.
  • The rerun interval is a time between one minute and twenty minutes.
  • In one embodiment, a computer device is provided, which includes a memory and a processor.
  • The memory stores computer-readable instructions that, when executed by the processor, cause the processor to implement the steps of the automatic task rerun method in the foregoing embodiments.
  • In one embodiment, a storage medium storing computer-readable instructions is provided.
  • When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps of the automatic task rerun method in the foregoing embodiments.
  • The storage medium may be a non-volatile storage medium.
  • The program may be stored in a computer-readable storage medium.
  • The storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.
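The judgment sequence of steps S301 to S305 described above can be sketched in a few lines of Python. This is only an illustrative sketch: the patent does not give an implementation, and the names `KeywordRule`, `ErrorRecord`, `decide_rerun`, and `mark_ready` are hypothetical. It assumes the rerun range is a set of task names and that the number-of-reruns check passes while the count is still below the configured maximum.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class KeywordRule:
    """One row of the error keyword configuration table (hypothetical layout)."""
    rerun_scope: frozenset      # task names that may be rerun for this keyword
    max_reruns: int             # maximum number of automatic reruns
    rerun_interval: timedelta   # wait time before a rerun


@dataclass
class ErrorRecord:
    """One error log record from the task running log table (hypothetical layout)."""
    task_name: str
    keyword: str
    polled_at: datetime
    error_runs: int = 0
    status: str = "ERROR"


def decide_rerun(record: ErrorRecord, rules: dict, now: datetime) -> Optional[timedelta]:
    """Return the delay before the rerun, or None if the record is skipped."""
    rule = rules.get(record.keyword)
    if rule is None:                               # S301: keyword not configured
        return None
    if record.task_name not in rule.rerun_scope:   # S302: task outside rerun range
        return None
    if record.error_runs >= rule.max_reruns:       # S303: rerun budget used up
        return None
    due = record.polled_at + rule.rerun_interval   # S304: polling time + interval
    return max(due - now, timedelta(0))            # delay = due time - current time


def mark_ready(record: ErrorRecord) -> None:
    """S305: flip the record to the ready state and count the rerun."""
    record.status = "READY"
    record.error_runs += 1
```

A caller would wait for the returned delay and then call `mark_ready`; a return value of `None` corresponds to "return to S203 and continue polling".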

Abstract

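An automatic task rerun method, system, computer device, and storage medium, relating to the field of data processing technology. The task rerun method includes: creating an error keyword configuration table in the scheduling platform database, receiving error keywords and their corresponding rerun range, number of reruns, and rerun interval, and storing them in the error keyword configuration table; creating a timed polling script that periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts the error keywords from the log table; and, when an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, changing the error state of the current task to the ready state. The method reruns tasks automatically on the basis of the error keyword configuration table, which improves the running efficiency of ETL tasks and saves operation and maintenance time as well as the time development colleagues spend on manual intervention.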

Description

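Automatic task rerun method, system, computer device, and storage medium
This application claims priority to Chinese patent application No. 201810486865.1, filed with the Chinese Patent Office on May 21, 2018 and entitled "Automatic task rerun method, system, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technology, and in particular to an automatic task rerun method, system, computer device, and storage medium.
Background
BI (Business Intelligence) extracts useful data from the data of many different enterprise operating systems and cleans it to ensure its correctness. The data then goes through extraction, transformation, and loading, that is, the ETL (data warehouse technology) process, and is merged into an enterprise-level data warehouse to obtain a global view of enterprise data. On this basis, suitable query and analysis tools, data mining tools, OLAP tools, and the like are used to analyze and process the data, and the resulting knowledge is presented to managers to support their decision-making.
In some larger BI projects, ETL tasks are scheduled with dedicated scheduling tools. When the scheduling tools currently on the market schedule ETL tasks, manual intervention is required to rerun a task that reported an error. In fact, certain specific errors can be cleared by an automatic rerun, without the intervention of operation and maintenance or development engineers.
The main drawbacks of manual intervention when a scheduling platform reruns tasks in the prior art are:
1. It increases the time cost for operation and maintenance or development engineers;
2. It reduces the running efficiency of ETL tasks.
Summary
In view of this, and to address the drawbacks of manual intervention when a scheduling platform reruns tasks in the prior art, it is necessary to provide an automatic task rerun method, system, computer device, and storage medium.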
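An automatic task rerun method includes:
S1: creating an error keyword configuration table in the scheduling platform database, receiving the manually collected error keywords together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, and storing them in the error keyword configuration table;
S2: creating a timed polling script that periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table;
S3: when an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, changing the error state of the current task to the ready state.
An automatic task rerun system includes:
a configuration unit, configured to select a scheduling platform, create an error keyword configuration table in the scheduling platform database, receive the manually collected error keywords together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, and store them in the error keyword configuration table;
an error keyword extraction unit, configured to create a timed polling script that periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table;
a judging unit, configured to change the error state of the current task to the ready state when an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword.
A computer device includes a memory and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
S1: creating an error keyword configuration table in the scheduling platform database, receiving the manually collected error keywords together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, and storing them in the error keyword configuration table;
S2: creating a timed polling script that periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table;
S3: when an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, changing the error state of the current task to the ready state.
A storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
S1: creating an error keyword configuration table in the scheduling platform database, receiving the manually collected error keywords together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, and storing them in the error keyword configuration table;
S2: creating a timed polling script that periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table;
S3: when an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, changing the error state of the current task to the ready state.
In the above automatic task rerun method, device, computer device, and storage medium, an error keyword configuration table is created in the scheduling platform database, and the error keywords and their corresponding rerun range, number of reruns, and rerun interval are configured manually. A timed polling script periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts them from the log table. When an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, the error state of the current task is changed to the ready state. This technical solution reruns tasks automatically on the basis of the error keyword configuration table, which improves the running efficiency of ETL tasks and saves operation and maintenance time as well as the time development colleagues spend on manual intervention. The rerun range, number of reruns, and rerun interval can also be configured flexibly, which improves the competitiveness of the scheduling platform product.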
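Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present application.
FIG. 1 is a flowchart of an automatic task rerun method in one embodiment of the present application;
FIG. 2 is a flowchart of step S1 in FIG. 1;
FIG. 3 is a flowchart of steps S2 and S3 in FIG. 1;
FIG. 4 is a structural diagram of an automatic task rerun system in one embodiment of the present application;
FIG. 5 is a schematic block diagram of the configuration unit in FIG. 4;
FIG. 6 is a schematic block diagram of the error keyword extraction unit in FIG. 4;
FIG. 7 is a schematic block diagram of the judging unit in FIG. 4.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used here may also include the plural. It should be further understood that the word "include" used in the specification of the present application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.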
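FIG. 1 is a flowchart of the automatic task rerun method in one embodiment of the present application. As shown in FIG. 1, an automatic task rerun method includes the following steps:
Step S1: Create an error keyword configuration table in the scheduling platform database, receive the manually collected error keywords together with the rerun range, number of reruns, and rerun interval corresponding to each error keyword, and store them in the error keyword configuration table.
The ETL task process of this embodiment uses the Datastage data scheduling platform as the scheduling tool. The Datastage data scheduling platform is a data integration software platform that helps developers obtain more value from the complex, heterogeneous information scattered across various systems. Because the platform provides a graphical framework, the framework can be used to design and run jobs that transform and clean data, enabling developers to understand, clean, transform, and deliver trusted, context-rich information.
Step S2: Create a timed polling script. The timed polling script periodically polls the task running log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table.
While running scheduled tasks, the Datastage data scheduling platform generates the task running log table in real time. When an error occurs during a run, the corresponding log record in the task running log table is an error log record, and the error log record contains an error keyword.
The timed polling script of this embodiment uses an Oracle job to execute tasks on a schedule. An Oracle job is a scheduled task based on a stored procedure. Using Oracle stored procedures can greatly reduce the amount of Java program code that has to be written, and because stored procedures execute in the database, they benefit from Oracle's performance, which greatly improves the efficiency and stability of program execution. Executing a stored procedure on a schedule requires an Oracle job. Once the polling parameters are set, the job can execute the task on its own at a specified point in time or at a certain time every day.
The scheduled job periodically polls the task running log table of the Datastage data scheduling platform for error keywords and extracts them from the log table so that they can be analyzed and judged separately in the next step.
Step S3: When an extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun range, number of reruns, and rerun interval corresponding to that keyword, change the error state of the current task to the ready state.
The error keywords extracted in step S2 go through four checks: whether the keyword exists in the error keyword configuration table, whether the rerun range is satisfied, whether the number of reruns is satisfied, and whether the rerun interval is satisfied. When all four checks pass, the task is reset: the error log record in the current task running log table is changed from the error state to the ready state, and log records in the ready state are automatically picked up and executed again by the Datastage data scheduling platform.
In this embodiment, the scheduled job polls the task running log table periodically on the basis of the error keyword configuration table. As soon as the error keyword of an error log record matches an error keyword in the configuration table and the requirements are met, the task is rerun automatically without manual intervention, which greatly improves the running efficiency of the Datastage data scheduling platform.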
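The reset described above, changing an error log record to the ready state so that the scheduler picks it up again, amounts to a single conditional update on the log table. The sketch below illustrates it with Python's built-in `sqlite3` module standing in for the scheduling platform database. The table name `task_run_log`, its columns, and the state values `'ERROR'` and `'READY'` are assumptions made for illustration; the patent does not specify the schema.

```python
import sqlite3

# In-memory stand-in for the scheduling platform database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_run_log ("
    " task_name TEXT, status TEXT, error_runs INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO task_run_log (task_name, status) VALUES (?, ?)",
    [("load_orders", "ERROR"), ("load_customers", "DONE")],
)


def reset_to_ready(conn, task_name):
    """Flip an errored record to READY and count the rerun.

    Returns the number of rows changed (0 if the task was not in the error state).
    """
    cur = conn.execute(
        "UPDATE task_run_log"
        " SET status = 'READY', error_runs = error_runs + 1"
        " WHERE task_name = ? AND status = 'ERROR'",
        (task_name,),
    )
    return cur.rowcount
```

Guarding the update with `status = 'ERROR'` keeps the reset from touching records that are running or already finished.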
在一个实施例中,如图2所示,步骤S1可包括如下具体步骤:
步骤S101,在调度平台上设置一配置输入界面,配置输入界面中设有包括报错关键字字段、重跑范围字段、重跑次数字段和重跑间隔字段的字段信息。
报错关键字一般是从运维或开发工程师处进行收集,收集调度平台可以重跑通过的报错关键字,如Datastage数据调度平台的需要重跑的报错关键字有:ORA-00020、ORA-01555、ORA-03113、ORA-12170和Connect failed等,这些报错关键字对应的报错关键在代码如下表1所示:
报错关键字 报错关键字代码
ORA-00020 ORA-00020:maximum number of processes(%s)exceeded
ORA-01555 Snapshot too old:rollback segment number string
  with name
ORA-03113 End-of-file on communication channel
ORA-12170 TNS:connect timeout occurred
Connect failed Connect failed
表1
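These collected rerunnable error keywords are entered one by one through the configuration input interface, and the rerun scope, rerun count, and rerun interval for each keyword are set based on development experience, which yields the field information.
Step S102: create the error keyword configuration table in the database of the scheduling platform, receive the field information entered in the configuration input interface, and store it in the error keyword configuration table.
The error keyword configuration table is shown in Table 2 below:
[Table 2 is reproduced as an image in the original publication: Figure PCTCN2018104367-appb-000001]
Table 2
The error keyword configuration table contains the error keyword, rerun scope, rerun count, and rerun interval. Rerun scope: many tasks run on the scheduling platform, and tasks of different types may produce the same error keyword. For a given keyword, some task types need an automatic rerun, while for others a rerun is pointless. The preset rerun scope controls which tasks are rerun and which are not.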
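As an illustration of the four columns just described, the following is a minimal sketch of one configuration row in plain Python rather than as an Oracle table. The class name `RerunRule`, its field names, the task names, and all sample values are assumptions made for the example; the source only names the four columns.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RerunRule:
    """One row of the error keyword configuration table (hypothetical layout)."""
    keyword: str            # error keyword, e.g. "ORA-12170"
    scope: frozenset        # names of the tasks that may rerun on this keyword
    max_reruns: int         # upper bound on automatic reruns
    interval_minutes: int   # wait before a rerun is attempted

# Configuration table keyed by error keyword; every value below is made up.
config_table = {
    rule.keyword: rule
    for rule in [
        RerunRule("ORA-12170", frozenset({"load_orders", "load_customers"}), 3, 5),
        RerunRule("Connect failed", frozenset({"load_orders"}), 2, 10),
    ]
}

print(config_table["ORA-12170"].max_reruns)  # 3
```

Keying the table by keyword keeps the later existence check a single dictionary lookup.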
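Rerun count: an automatic rerun triggered by an error may fail again, so a rerun count is set to prevent endless reruns. It is the maximum number of automatic reruns after this error keyword appears in an error. If a rerun succeeds before the maximum is reached, the task of course stops rerunning.
Rerun interval: in some cases a failed task should not be rerun immediately, and waiting a while improves the chance that the rerun succeeds. For example, each database accepts only a limited number of connections. Once the connections are saturated, a task that tries to connect is refused for lack of access permission and reports an error again. Setting a delay, that is, rerunning the task after a period of time, avoids connection conflicts with other tasks: by then some tasks have released their connections, which raises the success rate of the rerun.
Specifically, the rerun interval is preferably set to between one and twenty minutes. Different intervals can be set for different tasks according to their circumstances, so as to improve both the rerun success rate and the scheduling efficiency of the platform.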
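The preferred range can be enforced when a configuration row is entered. The sketch below is one possible check in Python; treating the one-to-twenty-minute range as a hard limit is an assumption of this example (the text only calls it preferred), and the function name `validate_interval` is hypothetical.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=1)
MAX_INTERVAL = timedelta(minutes=20)

def validate_interval(minutes: float) -> timedelta:
    """Return the interval as a timedelta, rejecting values outside 1-20 minutes."""
    interval = timedelta(minutes=minutes)
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError(f"rerun interval must be 1-20 minutes, got {minutes}")
    return interval
```

A platform that wants to allow longer waits for particular tasks would relax the upper bound rather than remove the check.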
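In this embodiment, field information is entered through the configuration input interface and stored in the error keyword configuration table. Entry is simple and the configuration is applied automatically. When the configuration needs to change, the error keyword configuration table can be modified directly, which keeps changes flexible.
In one embodiment, as shown in FIG. 3, step S2 may include the following specific steps:
Step S201: create the timed polling script in the scheduling platform and set polling parameters including the polling time, the polling task name, and the error run count.
When the timed polling script is implemented as an Oracle job, the job depends on a stored procedure, so the stored procedure is created first and the job second. For example, a stored procedure MYPROC and its job can be created as follows: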
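-- Stored procedure run by the job (assumes a table TEST with a single DATE column exists)
create or replace procedure MYPROC as
begin
  insert into TEST values (sysdate);
end;
/
-- SQL*Plus bind variable that receives the job number
variable job1 number;
begin
  -- first run at midnight tonight, then once a day
  dbms_job.submit(:job1, 'MYPROC;', trunc(sysdate + 1), 'sysdate+1');
  commit;  -- the job is only queued once the transaction commits
end;
/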
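When creating the job, specify the start time of the run (the polling time) and the PL/SQL block to execute (the polling task name). An error run count is also kept to count each rerun of a task; its initial value is 0 for every task.
Specifically, a polling parameter input interface can be provided on the scheduling platform, containing field information including a polling time field and a polling task name field. The scheduling platform receives the field information entered in this interface and stores it in the polling parameters of the timed polling script.
In this step, field information is entered through the polling parameter input interface and stored in the polling parameters, so the parameters are easy to modify and the human-machine interaction is convenient.
Step S202: the timed polling script periodically polls the task run log table according to the polling parameters.
At the preset polling time, the timed polling script polls the task run log table corresponding to the preset polling task name. Other polling parameters can also be set, such as running once a day or once on a certain day of the week, in which case the script starts at the preset polling time on each subsequent day, or on that day of each week, and polls the task run log table.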
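To make the daily and weekly schedules concrete, here is a small Python sketch that computes the next polling instant from a polling time. It stands in for the interval expression an Oracle job would use; the function name `next_poll` and the optional `weekday` parameter (0 = Monday) are assumptions of this example.

```python
from datetime import datetime, time, timedelta

def next_poll(now: datetime, poll_time: time, weekday=None) -> datetime:
    """Next polling instant at poll_time, daily or on a given weekday (0 = Monday)."""
    candidate = datetime.combine(now.date(), poll_time)
    if candidate <= now:
        # Today's slot has already passed, so start from tomorrow.
        candidate += timedelta(days=1)
    if weekday is not None:
        # Move forward to the requested day of the week (0 to 6 days ahead).
        candidate += timedelta(days=(weekday - candidate.weekday()) % 7)
    return candidate
```

For instance, called at 10:00 on Monday 2018-05-21 with a polling time of 02:00, the daily schedule yields 02:00 on 2018-05-22, and the weekly Monday schedule yields 02:00 on 2018-05-28.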
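Step S203: extract, one by one, the error keywords of the error log records from the task run log table.
The timed polling script polls the task run log table from the beginning. The table holds a number of log records, and error log records are usually written in a specific format. The script visits the error log records one by one, extracts the error keyword of each, and passes it to the next step.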
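The extraction step can be sketched in Python as a scan over log records. The record layout (`LogRecord`), the status value `"ERROR"`, and the regular expression are assumptions of this example; the source does not specify the log format beyond saying that error records follow a specific one.

```python
import re
from typing import NamedTuple

class LogRecord(NamedTuple):
    task_name: str
    status: str    # e.g. "ERROR" or "DONE"; the status names are assumed
    message: str

# Assumed keyword shapes: Oracle codes such as ORA-12170, or the literal "Connect failed".
KEYWORD_PATTERN = re.compile(r"ORA-\d{5}|Connect failed")

def extract_error_keywords(log_table):
    """Yield (task_name, keyword) for each error record, scanning from the first row."""
    for record in log_table:
        if record.status != "ERROR":
            continue
        match = KEYWORD_PATTERN.search(record.message)
        if match:
            yield record.task_name, match.group(0)
```

Records without a recognizable keyword are skipped here; a real script might instead pass them on for manual review.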
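In this embodiment, creating the timed polling script and setting its polling parameters makes it possible to poll specific task run log tables on a schedule and in a targeted way. Polling is automatic, operations or development engineers no longer need to step in frequently, and handling time is greatly reduced.
In one embodiment, as shown in FIG. 3, step S3 may include the following specific steps:
Step S301: compare the error keyword extracted in step S2 with the error keywords in the error keyword configuration table from step S1. If the configuration table contains the extracted keyword, go to the next step; otherwise return to S203 and continue polling.
The first step in deciding whether a failed task should be rerun is to check whether its error keyword exists in the error keyword configuration table. If it does not, the task is by default not rerun even though it failed; polling continues with the next error log record, whose error keyword is checked in turn.
If the error keyword extracted in step S2 is in the error keyword configuration table, the next check follows. For example, if the extracted keyword is "ORA-12170" and the configuration table also contains this keyword, the task is considered to need a rerun and proceeds to the next check.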
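The existence check of step S301 amounts to a membership test against the configuration table. A minimal Python sketch follows, with a made-up table and a hypothetical helper name `configured_keywords`; ORA-00942 is used only as an example of a keyword that has not been configured.

```python
# Made-up configuration table: keyword -> settings.
config_table = {"ORA-12170": {"max_reruns": 3}, "ORA-03113": {"max_reruns": 1}}

def configured_keywords(extracted, table):
    """Keep only the extracted keywords that have an entry in the configuration table."""
    return [kw for kw in extracted if kw in table]

print(configured_keywords(["ORA-12170", "ORA-00942"], config_table))  # ['ORA-12170']
```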
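Step S302: retrieve the rerun scope corresponding to the error keyword from the error keyword configuration table and determine whether the polling task name of the current task is within that scope. If it is, go to the next step; otherwise return to S203 and continue polling.
Different tasks may produce the same error keyword when they fail. For example, several tasks may fail to connect to the database and all produce the keyword "ORA-12170", yet some of them do not need to reconnect while others do. This embodiment therefore sets a rerun scope: when the configuration table is filled in, every error keyword is given a rerun scope that lists the names of the tasks to rerun. If the task name associated with the extracted keyword is not within the scope, the task is by default not rerun even though it failed; polling continues with the next error log record, and the rerun scope for its keyword is checked in turn.
If the task name associated with the extracted keyword is within the rerun scope, the task is considered to need a rerun and proceeds to the next check.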
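The scope check of step S302 can be pictured as a per-keyword set of task names. The following Python sketch assumes the scope is stored as an explicit list of names, as the text describes; the task names and the function name `in_rerun_scope` are hypothetical.

```python
# Rerun scope per error keyword; task names are hypothetical.
RERUN_SCOPE = {
    "ORA-12170": {"load_orders", "load_customers"},
    "Connect failed": {"load_orders"},
}

def in_rerun_scope(keyword: str, task_name: str) -> bool:
    """True if task_name is listed in the rerun scope configured for keyword."""
    return task_name in RERUN_SCOPE.get(keyword, set())
```

Here `load_customers` would rerun after "ORA-12170" but not after "Connect failed", which mirrors the case of two tasks sharing a keyword while only one of them benefits from a rerun.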
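Step S303: retrieve the rerun count corresponding to the error keyword from the error keyword configuration table and determine whether the error run count of the current task is less than or equal to the rerun count. If it is, go to the next step; otherwise return to S203 and continue polling.
A task that has been rerun may fail again. To prevent endless reruns, this embodiment sets a rerun count: when the configuration table is filled in, every error keyword is given a rerun count. When the error run count associated with the extracted keyword equals the preset rerun count, the task is by default not rerun even though it failed; polling continues with the next error log record, and the rerun count for its keyword is checked in turn.
If the error run count associated with the extracted keyword has not yet reached the preset rerun count, the task is considered to need a rerun and proceeds to the next check.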
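A short Python simulation shows how the counter bounds the number of reruns. It follows the explanation above, in which reruns stop once the error run count equals the configured rerun count, so the comparison is strict; note that the wording of step S303 itself states "less than or equal to", and with that comparison one more rerun would occur. The function name `may_rerun` is hypothetical.

```python
def may_rerun(error_run_count: int, max_reruns: int) -> bool:
    """Allow a rerun only while the count has not yet reached the configured maximum."""
    return error_run_count < max_reruns

# Simulate a task that fails on every attempt, with a configured rerun count of 3.
count, attempts = 0, 0
while may_rerun(count, 3):
    attempts += 1   # the task is rerun (and fails again in this simulation)
    count += 1      # the error run count starts at 0 and grows by one per rerun
print(attempts)  # 3
```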
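Step S304: retrieve the rerun interval corresponding to the error keyword from the error keyword configuration table and determine whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform. If it does, go to the next step immediately; otherwise go to the next step after a delay equal to the polling time plus the preset rerun interval minus the current time.
Rerunning a task immediately may fail. For example, if the task associated with the keyword "ORA-12170" is rerun at once, it tries to connect to the database; if the database connections are still saturated, the task cannot connect for lack of access permission, fails again, and is logged as an error once more. This embodiment therefore sets a rerun interval: when the configuration table is filled in, every error keyword is given a rerun interval, and tasks that need a rerun are rerun after the delay.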
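The delay in step S304 is the polling time plus the rerun interval minus the current time. A minimal Python sketch of that arithmetic follows; clamping the result to zero when the target time has already passed is an assumption of this example, since the source does not say what happens in that case, and the function name `rerun_delay` is hypothetical.

```python
from datetime import datetime, timedelta

def rerun_delay(poll_time: datetime, interval: timedelta, now: datetime) -> timedelta:
    """Delay before the rerun: poll_time + interval - now, never negative."""
    return max(poll_time + interval - now, timedelta(0))
```

With a polling time of 10:00, an interval of five minutes, and a current time of 10:02, the task waits three more minutes before its state is reset.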
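Step S305: change the error log record in the current task run log table from the error state to the ready state, and increase the error run count of the current task by one.
Existing scheduling platforms generally pick up tasks to execute based on the state of their log records. For example, when the state in a log record is ready, the platform's scheduler automatically picks the task up again and executes it. This embodiment therefore only needs to change the error log record in the task run log table from the error state to the ready state to achieve automatic rerun, with no manual intervention.
In addition, because this embodiment sets a rerun count, after the current task is rerun the error run count associated with its error keyword is increased by one, so that the rerun count check has a basis the next time the task fails and reports an error.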
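The state change and counter update of step S305 can be sketched as a single operation on a log record. The dictionary layout and the state names `"ERROR"` and `"READY"` are assumptions of this Python example; in the embodiment the same effect would be an update on the task run log table.

```python
def reset_for_rerun(record: dict) -> dict:
    """Return a copy of the log record in the ready state with its error run count incremented."""
    if record["status"] != "ERROR":
        raise ValueError("only error records can be reset for rerun")
    return {
        **record,
        "status": "READY",
        "error_run_count": record["error_run_count"] + 1,
    }
```

Doing both updates together keeps the counter in step with the number of reruns actually triggered.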
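In this embodiment, the error keyword, rerun scope, rerun count, and rerun interval are checked in sequence to determine which tasks can be rerun, and the task state is modified to achieve automatic rerun without manual intervention, which improves the operating efficiency of the scheduling platform.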
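Putting the four checks in sequence, the decision could look like the following Python sketch. It is an illustration under stated assumptions, not the stored procedure of the embodiment: the configuration layout, the task name, the strict comparison for the rerun count (following the explanation of S303 rather than its "less than or equal to" wording), and the zero clamp on the delay are all choices made for this example.

```python
from datetime import datetime, timedelta

# Made-up configuration: keyword -> rerun scope, rerun count, rerun interval.
CONFIG = {
    "ORA-12170": {
        "scope": {"load_orders"},
        "max_reruns": 3,
        "interval": timedelta(minutes=5),
    },
}

def decide(keyword, task_name, error_run_count, poll_time, now, config=CONFIG):
    """Return the delay before the rerun, or None if the task should not be rerun."""
    rule = config.get(keyword)
    if rule is None:                              # S301: keyword not configured
        return None
    if task_name not in rule["scope"]:            # S302: task outside the rerun scope
        return None
    if error_run_count >= rule["max_reruns"]:     # S303: rerun count exhausted
        return None
    # S304: wait until poll_time + interval, or proceed at once if that has passed
    return max(poll_time + rule["interval"] - now, timedelta(0))
```

A caller would wait for the returned delay and then apply step S305, that is, set the record to the ready state and increment the error run count.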
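In one embodiment, an automatic task rerun system is provided. As shown in FIG. 4, it includes: a configuration unit, configured to select a scheduling platform, create an error keyword configuration table in the scheduling platform database, receive manually collected error keywords together with the rerun scope, rerun count, and rerun interval corresponding to each keyword, and store them in the error keyword configuration table.
An error keyword extraction unit, configured to create a timed polling script, where the script periodically polls the task run log table of the scheduling platform for the error keywords of error log records and extracts those keywords from the log table.
A judging unit, configured to change the state of the current task from the error state to the ready state when the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun scope, rerun count, and rerun interval corresponding to that keyword.
In one embodiment, as shown in FIG. 5, the configuration unit includes: an input module, configured to provide a configuration input interface on the scheduling platform, the interface containing field information including an error keyword field, a rerun scope field, a rerun count field, and a rerun interval field; and a receiving and storing module, configured to create the error keyword configuration table in the database of the scheduling platform, receive the field information entered in the configuration input interface, and store it in the error keyword configuration table.
In one embodiment, as shown in FIG. 6, the error keyword extraction unit includes: a creation module, configured to create the timed polling script in the scheduling platform and set polling parameters including the polling time, the polling task name, and the error run count; a polling module, configured so that the timed polling script periodically polls the task run log table according to the polling parameters; and an extraction module, configured to extract, one by one, the error keywords of the error log records from the task run log table.
In one embodiment, the error keyword extraction unit further includes: a polling parameter input interface module, configured to provide a polling parameter input interface on the scheduling platform, the interface containing field information including a polling time field and a polling task name field; and a polling parameter storage module, configured so that the scheduling platform receives the field information entered in the polling parameter input interface and stores it in the polling parameters of the timed polling script.
In one embodiment, as shown in FIG. 7, the judging unit includes: an error keyword comparison module, configured to compare the error keyword extracted by the error keyword extraction unit with the error keywords in the error keyword configuration table and, if the configuration table contains the extracted keyword, go to the next step, otherwise return to the extraction module and continue polling. A rerun scope judging module, configured to retrieve the rerun scope corresponding to the error keyword from the configuration table and determine whether the polling task name of the current task is within that scope; if it is, go to the next step, otherwise return to the extraction module and continue polling. A rerun count judging module, configured to retrieve the rerun count corresponding to the error keyword from the configuration table and determine whether the error run count of the current task is less than or equal to the rerun count; if it is, go to the next step, otherwise return to the extraction module and continue polling. A rerun interval judging module, configured to retrieve the rerun interval corresponding to the error keyword from the configuration table and determine whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform; if it does, go to the next step immediately, otherwise go to the next step after a delay equal to the polling time plus the preset rerun interval minus the current time. A state modification module, configured to change the error log record in the current task run log table from the error state to the ready state and increase the error run count of the current task by one.
In one embodiment, the rerun interval is between one and twenty minutes.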
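In one embodiment, a computer device is provided, including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to carry out the steps of the automatic task rerun method of the embodiments above.
In one embodiment, a storage medium storing computer-readable instructions is provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to carry out the steps of the automatic task rerun method of the embodiments above. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the embodiments above can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the embodiments above can be combined arbitrarily. For brevity, not every possible combination of these features is described; however, as long as a combination of features involves no contradiction, it should be regarded as within the scope of this specification.
The embodiments above represent only some exemplary embodiments of the present application. Their description is relatively specific and detailed, but it should not be understood as limiting the patent scope of the application. It should be noted that those of ordinary skill in the art can make a number of variations and improvements without departing from the concept of the application, all of which fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.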

Claims (20)

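  1. An automatic task rerun method, comprising:
    S1: creating an error keyword configuration table in a scheduling platform database, receiving manually collected error keywords and the rerun scope, rerun count, and rerun interval corresponding to each error keyword, and storing them in the error keyword configuration table;
    S2: creating a timed polling script, the timed polling script periodically polling a task run log table of the scheduling platform for the error keywords of error log records and extracting the error keywords of the error log records from the log table;
    S3: when the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun scope, rerun count, and rerun interval corresponding to the error keyword, changing the current task from the error state to the ready state.
  2. The automatic task rerun method according to claim 1, wherein S1 comprises:
    S101: providing a configuration input interface on the scheduling platform, the configuration input interface containing field information including an error keyword field, a rerun scope field, a rerun count field, and a rerun interval field;
    S102: creating the error keyword configuration table in the database of the scheduling platform, receiving the field information entered in the configuration input interface, and storing the field information in the error keyword configuration table.
  3. The automatic task rerun method according to claim 1, wherein S2 comprises:
    S201: creating the timed polling script in the scheduling platform and setting polling parameters including a polling time, a polling task name, and an error run count;
    S202: the timed polling script periodically polling the task run log table according to the polling parameters;
    S203: extracting, one by one, the error keywords of the error log records from the task run log table.
  4. The automatic task rerun method according to claim 3, wherein S3 comprises:
    S301: comparing the error keyword extracted in step S2 with the error keywords in the error keyword configuration table of step S1, and if the error keyword configuration table contains the error keyword extracted in step S2, going to the next step, otherwise returning to S203 to continue polling;
    S302: retrieving the rerun scope corresponding to the error keyword from the error keyword configuration table and determining whether the polling task name of the current task is within the rerun scope, and if so, going to the next step, otherwise returning to S203 to continue polling;
    S303: retrieving the rerun count corresponding to the error keyword from the error keyword configuration table and determining whether the error run count of the current task is less than or equal to the rerun count, and if so, going to the next step, otherwise returning to S203 to continue polling;
    S304: retrieving the rerun interval corresponding to the error keyword from the error keyword configuration table and determining whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform, and if so, going to the next step, otherwise going to the next step after a delay equal to the polling time plus the preset rerun interval minus the current time;
    S305: changing the error log record in the current task run log table from the error state to the ready state and increasing the error run count of the current task by one.
  5. The automatic task rerun method according to claim 3, wherein a polling parameter input interface is provided on the scheduling platform, the polling parameter input interface containing field information including a polling time field and a polling task name field;
    the scheduling platform receives the field information entered in the polling parameter input interface and stores the field information in the polling parameters of the timed polling script.
  6. The automatic task rerun method according to claim 1, wherein the rerun interval is between one and twenty minutes.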
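  7. An automatic task rerun system, comprising:
    a configuration unit, configured to select a scheduling platform, create an error keyword configuration table in the scheduling platform database, receive manually collected error keywords and the rerun scope, rerun count, and rerun interval corresponding to each error keyword, and store them in the error keyword configuration table;
    an error keyword extraction unit, configured to create a timed polling script, the timed polling script periodically polling a task run log table of the scheduling platform for the error keywords of error log records and extracting the error keywords of the error log records from the log table;
    a judging unit, configured to change the current task from the error state to the ready state when the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun scope, rerun count, and rerun interval corresponding to the error keyword.
  8. The automatic task rerun system according to claim 7, wherein the configuration unit comprises:
    a configuration input interface setting module, configured to provide a configuration input interface on the scheduling platform, the configuration input interface containing field information including an error keyword field, a rerun scope field, a rerun count field, and a rerun interval field;
    a field information receiving and storing module, configured to create the error keyword configuration table in the database of the scheduling platform, receive the field information entered in the configuration input interface, and store the field information in the error keyword configuration table.
  9. The automatic task rerun system according to claim 7, wherein the error keyword extraction unit comprises:
    a creation module, configured to create the timed polling script in the scheduling platform and set polling parameters including a polling time, a polling task name, and an error run count;
    a polling module, configured so that the timed polling script periodically polls the task run log table according to the polling parameters;
    an extraction module, configured to extract, one by one, the error keywords of the error log records from the task run log table.
  10. The automatic task rerun system according to claim 7, wherein the judging unit comprises:
    an error keyword comparison module, configured to compare the error keyword extracted by the error keyword extraction unit with the error keywords in the error keyword configuration table, and if the error keyword configuration table contains the error keyword extracted by the error keyword extraction unit, go to the next step, otherwise return to the extraction module to continue polling;
    a rerun scope judging module, configured to retrieve the rerun scope corresponding to the error keyword from the error keyword configuration table and determine whether the polling task name of the current task is within the rerun scope, and if so, go to the next step, otherwise return to the extraction module to continue polling;
    a rerun count judging module, configured to retrieve the rerun count corresponding to the error keyword from the error keyword configuration table and determine whether the error run count of the current task is less than or equal to the rerun count, and if so, go to the next step, otherwise return to the extraction module to continue polling;
    a rerun interval judging module, configured to retrieve the rerun interval corresponding to the error keyword from the error keyword configuration table and determine whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform, and if so, go to the next step, otherwise go to the next step after a delay equal to the polling time plus the preset rerun interval minus the current time;
    a state modification module, configured to change the error log record in the current task run log table from the error state to the ready state and increase the error run count of the current task by one.
  11. The automatic task rerun system according to claim 9, wherein the error keyword extraction unit further comprises:
    a polling parameter input interface setting module, configured to provide a polling parameter input interface on the scheduling platform, the polling parameter input interface containing field information including a polling time field and a polling task name field;
    a polling parameter storage module, configured so that the scheduling platform receives the field information entered in the polling parameter input interface and stores the field information in the polling parameters of the timed polling script.
  12. The automatic task rerun system according to claim 7, wherein the rerun interval is between one and twenty minutes.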
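  13. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    S1: creating an error keyword configuration table in a scheduling platform database, receiving manually collected error keywords and the rerun scope, rerun count, and rerun interval corresponding to each error keyword, and storing them in the error keyword configuration table;
    S2: creating a timed polling script, the timed polling script periodically polling a task run log table of the scheduling platform for the error keywords of error log records and extracting the error keywords of the error log records from the log table;
    S3: when the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun scope, rerun count, and rerun interval corresponding to the error keyword, changing the current task from the error state to the ready state.
  14. The computer device according to claim 13, wherein S1 causes the processor to perform the following steps:
    S101: providing a configuration input interface on the scheduling platform, the configuration input interface containing field information including an error keyword field, a rerun scope field, a rerun count field, and a rerun interval field;
    S102: creating the error keyword configuration table in the database of the scheduling platform, receiving the field information entered in the configuration input interface, and storing the field information in the error keyword configuration table.
  15. The computer device according to claim 13, wherein S2 causes the processor to perform the following steps:
    S201: creating the timed polling script in the scheduling platform and setting polling parameters including a polling time, a polling task name, and an error run count;
    S202: the timed polling script periodically polling the task run log table according to the polling parameters;
    S203: extracting, one by one, the error keywords of the error log records from the task run log table.
  16. The computer device according to claim 13, wherein S3 causes the processor to perform the following steps:
    S301: comparing the error keyword extracted in step S2 with the error keywords in the error keyword configuration table of step S1, and if the error keyword configuration table contains the error keyword extracted in step S2, going to the next step, otherwise returning to S203 to continue polling;
    S302: retrieving the rerun scope corresponding to the error keyword from the error keyword configuration table and determining whether the polling task name of the current task is within the rerun scope, and if so, going to the next step, otherwise returning to S203 to continue polling;
    S303: retrieving the rerun count corresponding to the error keyword from the error keyword configuration table and determining whether the error run count of the current task is less than or equal to the rerun count, and if so, going to the next step, otherwise returning to S203 to continue polling;
    S304: retrieving the rerun interval corresponding to the error keyword from the error keyword configuration table and determining whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform, and if so, going to the next step, otherwise going to the next step after a delay equal to the polling time plus the preset rerun interval minus the current time;
    S305: changing the error log record in the current task run log table from the error state to the ready state and increasing the error run count of the current task by one.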
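  17. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    S1: creating an error keyword configuration table in a scheduling platform database, receiving manually collected error keywords and the rerun scope, rerun count, and rerun interval corresponding to each error keyword, and storing them in the error keyword configuration table;
    S2: creating a timed polling script, the timed polling script periodically polling a task run log table of the scheduling platform for the error keywords of error log records and extracting the error keywords of the error log records from the log table;
    S3: when the extracted error keyword exists in the error keyword configuration table and the current task satisfies the rerun scope, rerun count, and rerun interval corresponding to the error keyword, changing the current task from the error state to the ready state.
  18. The storage medium according to claim 17, wherein S1 causes the one or more processors to perform the following steps:
    S101: providing a configuration input interface on the scheduling platform, the configuration input interface containing field information including an error keyword field, a rerun scope field, a rerun count field, and a rerun interval field;
    S102: creating the error keyword configuration table in the database of the scheduling platform, receiving the field information entered in the configuration input interface, and storing the field information in the error keyword configuration table.
  19. The storage medium according to claim 17, wherein S2 causes the one or more processors to perform the following steps:
    S201: creating the timed polling script in the scheduling platform and setting polling parameters including a polling time, a polling task name, and an error run count;
    S202: the timed polling script periodically polling the task run log table according to the polling parameters;
    S203: extracting, one by one, the error keywords of the error log records from the task run log table.
  20. The storage medium according to claim 17, wherein S3 causes the one or more processors to perform the following steps:
    S301: comparing the error keyword extracted in step S2 with the error keywords in the error keyword configuration table of step S1, and if the error keyword configuration table contains the error keyword extracted in step S2, going to the next step, otherwise returning to S203 to continue polling;
    S302: retrieving the rerun scope corresponding to the error keyword from the error keyword configuration table and determining whether the polling task name of the current task is within the rerun scope, and if so, going to the next step, otherwise returning to S203 to continue polling;
    S303: retrieving the rerun count corresponding to the error keyword from the error keyword configuration table and determining whether the error run count of the current task is less than or equal to the rerun count, and if so, going to the next step, otherwise returning to S203 to continue polling;
    S304: retrieving the rerun interval corresponding to the error keyword from the error keyword configuration table and determining whether the polling time of the current task plus the preset rerun interval equals the current time of the scheduling platform, and if so, going to the next step, otherwise going to the next step after a delay equal to the polling time plus the preset rerun interval minus the current time;
    S305: changing the error log record in the current task run log table from the error state to the ready state and increasing the error run count of the current task by one.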
PCT/CN2018/104367 2018-05-21 2018-09-06 Method, system, computer device, and storage medium for automatic task rerun WO2019223174A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810486865.1A CN108681598B (zh) 2018-05-21 2018-05-21 Method, system, computer device, and storage medium for automatic task rerun
CN201810486865.1 2018-05-21

Publications (1)

Publication Number Publication Date
WO2019223174A1 true WO2019223174A1 (zh) 2019-11-28

Family

ID=63806868

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104367 WO2019223174A1 (zh) 2018-05-21 2018-09-06 Method, system, computer device, and storage medium for automatic task rerun

Country Status (2)

Country Link
CN (1) CN108681598B (zh)
WO (1) WO2019223174A1 (zh)

Also Published As

Publication number Publication date
CN108681598B (zh) 2023-06-02
CN108681598A (zh) 2018-10-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18919692

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18919692

Country of ref document: EP

Kind code of ref document: A1