WO2017045565A1 - Operation retry method and apparatus for a job - Google Patents

Operation retry method and apparatus for a job

Publication number
WO2017045565A1
Authority
WO
WIPO (PCT)
Prior art keywords
job
retry
module
progress
time
Prior art date
Application number
PCT/CN2016/098508
Other languages
English (en)
French (fr)
Inventor
李强
Original Assignee
阿里巴巴集团控股有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to JP2018513324A (patent JP6818014B2, ja)
Priority to EP16845679.6A (patent EP3352078B1, en)
Publication of WO2017045565A1 (zh)
Priority to US15/924,118 (patent US10866862B2, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/805 Real-time


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Retry When Errors Occur (AREA)
  • Debugging And Monitoring (AREA)

Abstract

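An operation retry method and apparatus for a job. The method comprises: detecting whether an operation in a job has failed (101); if so, measuring the progress of the job (102); calculating a retry time according to the progress of the job (103); and re-executing the operation after waiting for the retry time (104). When an operation of a job fails, the method adaptively calculates the retry time from the job's progress. For long jobs in particular, this greatly lengthens the retry time, so that retries are performed dynamically and longer service interruptions can be tolerated. It avoids the waste of device resources caused by re-executing a job after it fails, and greatly reduces the retry cost while maintaining the job success rate.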

Description

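Operation retry method and apparatus for a job
This application claims priority to Chinese Patent Application No. 201510601116.5, filed on September 18, 2015 and entitled "Operation retry method and apparatus for a job", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of computer processing, and in particular to an operation retry method for a job and an operation retry apparatus for a job.
Background
In a computer system, a running job interacts with other systems (services). When one of those systems (services) is briefly unavailable (a service interruption), the job fails.
In big-data scenarios, current products generally retry a job when it hits an error at run time; that is, they wait for the other system (service) to recover so that the job can continue to run, in order to keep the job success rate as high as possible.
Otherwise, if all retries fail, the whole job fails.
For example, if another system (service) can be unavailable for up to 10 minutes, the overall retry interval must exceed 10 minutes to ensure that the job can continue to run through such a brief outage.
Setting every retry to 10 minutes undoubtedly increases the retry cost, in two respects:
1. Short jobs. For example, for a job expected to run for only 30 minutes, the actual run time including retries may exceed the normal run time.
2. Manual stops. When an administrator needs to stop a job manually, especially because an external system (service) is unavailable, operations staff must stop the job and make some adjustments and deployments. If the job happens to be retrying, it cannot stop completely until the 10-minute retry has finished.
Therefore, the retry time chosen today is generally not the maximum retry time but an average, or a time that covers a certain percentage of cases.
For example, if an external system (service) is unavailable for 1 minute on average and for at most 10 minutes, a retry time of 2 minutes is typically chosen so that retries succeed in most cases.
However, for long jobs this approach may increase the retry cost.
For example, consider a job that runs for 10 hours. If a 10-minute service interruption occurs when the job is 80% complete, that is, after it has already run for 8 hours, the interruption exceeds the maximum retry time and the job fails. Re-executing the job means the previous 8 hours of work are wasted, which is a huge cost.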
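Summary
In view of the above problems, the embodiments of this application are proposed to provide an operation retry method for a job, and a corresponding operation retry apparatus for a job, that overcome or at least partially solve the above problems.
To solve the above problems, an embodiment of this application discloses an operation retry method for a job, comprising:
detecting whether an operation in a job has failed; if so, measuring the progress of the job;
calculating a retry time according to the progress of the job;
re-executing the operation after waiting for the retry time.
Optionally, the job is a data synchronization job, and the step of detecting whether an operation in the job has failed comprises:
when reading data from a source device fails or times out, determining that the operation in the data synchronization job has failed;
and/or,
when interaction with a data synchronization service fails or times out, determining that the operation in the data synchronization job has failed;
and/or,
when writing data to a destination device fails or times out, determining that the operation in the data synchronization job has failed.
Optionally, the step of calculating a retry time according to the progress of the job comprises:
configuring an increment factor according to the progress of the job;
calculating a retry time base according to a preset interval time;
calculating the retry time according to the increment factor and the retry time base.
Optionally, the step of calculating a retry time base according to a preset interval time comprises:
obtaining the current number of retries;
calculating the product of the preset interval time and the current number of retries as the time base;
or,
using the current number of retries as an exponent to increase the preset interval time, and taking the result as the time base.
Optionally, before the step of measuring the progress of the job, the method further comprises:
determining whether the operation needs to be re-executed;
if so, performing the step of measuring the progress of the job;
if not, exiting re-execution of the operation.
Optionally, the step of determining whether the operation needs to be re-executed comprises:
determining whether a count condition and/or a state condition is met;
if so, determining that the operation does not need to be re-executed;
if not, determining that the operation needs to be re-executed;
wherein the count condition is that the current number of retries exceeds a preset retry count threshold;
and the state condition is that the job has been stopped.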
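To solve the above problems, an embodiment of this application further discloses an operation retry apparatus for a job, comprising:
a job detection module, configured to detect whether an operation in a job has failed and, if so, to invoke a progress statistics module;
a progress statistics module, configured to measure the progress of the job;
a retry time calculation module, configured to calculate a retry time according to the progress of the job;
a job retry module, configured to re-execute the operation after waiting for the retry time.
Optionally, the job detection module comprises:
a first determining submodule, configured to determine that the operation in the data synchronization job has failed when reading data from a source device fails or times out;
and/or,
a second determining submodule, configured to determine that the operation in the data synchronization job has failed when interaction with a data synchronization service fails or times out;
and/or,
a third determining submodule, configured to determine that the operation in the data synchronization job has failed when writing data to a destination device fails or times out.
Optionally, the retry time calculation module comprises:
an increment factor calculation submodule, configured to configure an increment factor according to the progress of the job;
a retry time base calculation submodule, configured to calculate a retry time base according to a preset interval time;
an increment adjustment submodule, configured to calculate the retry time according to the increment factor and the retry time base.
Optionally, the retry time base calculation submodule comprises:
a retry count obtaining unit, configured to obtain the current number of retries;
a fixed calculation unit, configured to calculate the product of the preset interval time and the current number of retries as the time base;
or,
an exponential calculation unit, configured to use the current number of retries as an exponent to increase the preset interval time, and to take the result as the time base.
Optionally, the operation retry apparatus for a job further comprises:
a retry determination module, configured to determine whether the operation needs to be re-executed; if so, to invoke the progress statistics module, and if not, to invoke a retry exit module;
a retry exit module, configured to exit re-execution of the operation.
Optionally, the retry determination module comprises:
a condition judging submodule, configured to determine whether a count condition and/or a state condition is met; if so, to invoke a fourth determining submodule, and if not, to invoke a fifth determining submodule;
a fourth determining submodule, configured to determine that the operation does not need to be re-executed;
a fifth determining submodule, configured to determine that the operation needs to be re-executed;
wherein the count condition is that the current number of retries exceeds a preset retry count threshold;
and the state condition is that the job has been stopped.
The embodiments of this application have the following advantages:
When an operation of a job fails, the embodiments of this application adaptively calculate the retry time from the job's progress. For long jobs in particular, this greatly lengthens the retry time, so that retries are performed dynamically and longer service interruptions can be tolerated. It avoids the waste of device resources caused by re-executing a job after it fails, and greatly reduces the retry cost while maintaining the job success rate.
The retry strategy of the embodiments of this application adds detection of the job's state: when the job is stopped, retrying is terminated, so that the retry ends and exits quickly, which further reduces the waste of device resources and the retry cost.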
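Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of Embodiment 1 of an operation retry method for a job according to this application;
FIG. 2 is an architecture diagram of an offline synchronization tool according to an embodiment of this application;
FIG. 3 is a flowchart of the steps of Embodiment 2 of an operation retry method for a job according to this application;
FIG. 4 is a structural block diagram of Embodiment 1 of an operation retry apparatus for a job according to this application;
FIG. 5 is a structural block diagram of Embodiment 2 of an operation retry apparatus for a job according to this application.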
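Detailed Description
To make the above objects, features, and advantages of this application easier to understand, this application is described in further detail below with reference to the drawings and specific embodiments.
Referring to FIG. 1, a flowchart of the steps of Embodiment 1 of an operation retry method for a job according to this application is shown, which may specifically include the following steps:
Step 101: detect whether an operation in the job has failed; if so, perform step 102.
In some operating systems, a job is the unit of execution that a computer operator (or a program called a job scheduler) hands to the operating system.
For example, a job may be the running of an application, such as a payroll program that runs every week.
Jobs are usually run in batch mode.
The operator or job scheduler hands the operating system a batch of jobs to execute (payroll, expense analysis, employee file updates, and so on), and these jobs are executed when the operating system is not performing time-sensitive interactive operations.
In the embodiments of this application, a data synchronization job is used as one example of a job. A data synchronization job can be executed by a synchronization tool.
As shown in FIG. 2, the synchronization tool is a general-purpose tool for synchronizing data between many kinds of databases.
The synchronization tool comprises a series of workers (work devices) together with the datax Service (data synchronization service).
The datax Service accepts job commands (such as starting or stopping a job) and selects a worker to execute the job; the worker reports its status back to the datax Service.
During synchronization, the worker reads data from the source device and writes the data to the destination device.
The source device and the destination device may be any relational database (such as MySQL or PostgreSQL) or non-relational database (such as HBase).
For example, a worker can read data from MySQL and write it to HBase.
During execution, the job interacts with the source device, the destination device, and the datax Service.
When data is read from the source device (an operation), the data-reading API (Application Programming Interface) provided by the source device usually returns one of three results: success, failure, or timeout.
When reading data from the source device fails or times out, the operation in the data synchronization job can be determined to have failed.
When the job interacts with the datax Service (an operation), the interaction API provided by the datax Service usually returns one of three results: success, failure, or timeout.
When interaction with the data synchronization service fails or times out, the operation in the data synchronization job can be determined to have failed.
When data is written to the destination device (an operation), the data-writing API provided by the destination device usually returns one of three results: success, failure, or timeout.
When writing data to the destination device fails or times out, the operation in the data synchronization job can be determined to have failed.
In both the failure case and the timeout case, the operation is confirmed as failed and can be retried.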
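The failure test described above can be sketched as follows. This is a minimal illustration only: the `Result` enum and the `operation_failed` helper are hypothetical names, not an API of the synchronization tool, and the only assumption taken from the text is that each read, write, or service call yields one of three results.

```python
from enum import Enum


class Result(Enum):
    """The three results the text attributes to each read, write, or service call."""
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


def operation_failed(result: Result) -> bool:
    """Treat both failure and timeout as a failed, retryable operation."""
    return result in (Result.FAILURE, Result.TIMEOUT)
```

The same predicate would apply to all three interaction points (source device, datax Service, destination device), since the text gives them identical result sets.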
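Of course, the job and the failure criteria above are only examples. When implementing the embodiments of this application, other jobs and failure criteria may be set according to the actual situation, or adopted by those skilled in the art as needed; the embodiments of this application do not limit this.
Step 102: measure the progress of the job.
In practice, the worker can measure the progress of the job while executing it.
Taking batch-based measurement as an example, a complete job can be divided into many fragments, and the progress of the job can be obtained from the fraction of fragments completed.
For example, if a complete job is divided into 10,000 fragments, the progress of the job advances by 1% for every 100 fragments completed.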
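A minimal sketch of fragment-based progress, assuming progress is simply the fraction of completed fragments; the function name `job_progress` and its parameters are illustrative and do not come from the original text.

```python
def job_progress(completed_fragments: int, total_fragments: int) -> float:
    """Return job progress as a fraction between 0.0 and 1.0."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    if not 0 <= completed_fragments <= total_fragments:
        raise ValueError("completed_fragments must lie in [0, total_fragments]")
    return completed_fragments / total_fragments
```

With the figures from the text, 100 completed fragments out of 10,000 correspond to a progress of 1%.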
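Step 103: calculate a retry time according to the progress of the job.
In the embodiments of this application, the progress of the job is used as a factor in calculating the retry time. The retry time increases with the progress of the job, so that longer service interruptions can be tolerated.
For example, the retry time at 80% progress is noticeably longer than the retry time at 30% progress, so that the job has enough time to wait for the service to recover and the waste of re-executing the job is avoided.
In one embodiment of this application, step 103 may include the following sub-steps:
Sub-step S11: configure an increment factor according to the progress of the job.
In practice, the increment factor is determined mainly by the running state of the job and represents the influence of the job's progress on the retry time.
In general, the increment factor increases with the progress of the job: the greater the progress, the larger the increment factor; conversely, the smaller the progress, the smaller the increment factor.
For example, a job at 15% progress has an increment factor of 1.15, and a job at 80% progress has an increment factor of 1.80.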
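One mapping consistent with the two examples above (15% gives 1.15, 80% gives 1.80) is `1 + progress`. The text does not state the formula, so the sketch below is an assumption that merely reproduces those two data points; `increment_factor` is a hypothetical name.

```python
def increment_factor(progress: float) -> float:
    """Map progress (0.0 to 1.0) to an increment factor.

    Assumption: factor = 1 + progress, which matches the two examples
    in the text but is not stated there as the general rule.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0.0, 1.0]")
    return 1.0 + progress
```

Any other monotonically increasing mapping would also fit the description; this one is the simplest that agrees with the quoted values.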
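Sub-step S12: calculate a retry time base according to a preset interval time.
When applying the embodiments of this application, a retry strategy can be set in advance, covering for example whether to retry at a fixed time or with exponential growth, the interval time, and the maximum number of retries allowed.
Once the current number of retries has been obtained, the retry time base can be calculated according to the retry strategy.
If the retry strategy is a fixed-time strategy, a retry is made each time the interval elapses, for example 3 retries with an interval of 30 seconds between them.
In that case, the product of the preset interval time and the current number of retries can be calculated as the time base.
If the retry strategy is an exponential-growth strategy, the current number of retries can be used as an exponent to increase the preset interval time, and the result is taken as the time base.
In one example, time base = interval time × 2^(n-1), where n is the current number of retries.
For example, with 4 retries and an interval time of 10 s, the time for the first retry is 10 s, for the second retry 10 s × 2, for the third retry 10 s × 4, and for the fourth retry 10 s × 8.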
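The two time-base rules can be sketched as below. The function names are illustrative; the formulas follow the text (interval × n for the fixed-time strategy, interval × 2^(n-1) for the exponential one), with n counted from 1 as in the 10 s example.

```python
def fixed_time_base(interval_seconds: float, retry_number: int) -> float:
    """Fixed-time strategy: preset interval multiplied by the current retry number."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    return interval_seconds * retry_number


def exponential_time_base(interval_seconds: float, retry_number: int) -> float:
    """Exponential strategy: interval * 2^(n-1), where n is the current retry number."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    return interval_seconds * 2 ** (retry_number - 1)
```

For an interval of 10 s, the exponential rule yields 10, 20, 40 and 80 seconds for retries 1 through 4, matching the example in the text.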
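Sub-step S13: calculate the retry time according to the increment factor and the retry time base.
Typically, the product of the increment factor and the retry time base can be calculated directly and used as the retry time.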
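Putting sub-steps S11 to S13 together gives a single calculation. The sketch below assumes the `1 + progress` increment factor (an assumption that only reproduces the 1.15 and 1.80 examples) and uses a hypothetical `retry_time` function with an `exponential` flag to switch between the two time-base rules.

```python
def retry_time(progress: float, interval_seconds: float,
               retry_number: int, exponential: bool = True) -> float:
    """Retry time = increment factor * retry time base (sub-step S13)."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0.0, 1.0]")
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    factor = 1.0 + progress  # S11, assumed mapping
    if exponential:
        base = interval_seconds * 2 ** (retry_number - 1)  # S12, exponential growth
    else:
        base = interval_seconds * retry_number  # S12, fixed time
    return factor * base
```

For a job at 80% progress on its third retry with a 10 s interval, the base is 40 s and the factor 1.8, so the wait is about 72 s instead of 40 s.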
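Step 104: after waiting for the retry time, re-execute the operation.
In the embodiments of this application, the job can be retried after waiting for the retry time.
Taking a data synchronization job as one example of a job, when a failure occurs the worker can read the data from the source device again, interact with the datax Service again, write the data to the destination device again, and so on.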
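Steps 101 to 104 can be sketched as one loop. Everything here is illustrative: `run_with_retry`, the callables `operation` and `get_progress`, and the injectable `sleep` are hypothetical names; the increment factor is assumed to be `1 + progress`; and the exponential time base and `max_retries` limit stand in for whatever preset retry strategy is configured.

```python
import time
from typing import Callable


def run_with_retry(
    operation: Callable[[], bool],
    get_progress: Callable[[], float],
    interval_seconds: float = 10.0,
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `operation`; on failure wait a progress-scaled time, then re-execute."""
    if operation():  # step 101: True means the operation succeeded
        return True
    for retry_number in range(1, max_retries + 1):
        progress = get_progress()  # step 102: measure job progress
        base = interval_seconds * 2 ** (retry_number - 1)
        delay = (1.0 + progress) * base  # step 103: assumed factor * time base
        sleep(delay)  # step 104: wait for the retry time
        if operation():  # step 104: re-execute the failed operation
            return True
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in production the default `time.sleep` would be used.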
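In big-data scenarios, current products generally retry a job when it hits an error at run time; that is, they wait for the other system (service) to recover so that the job can continue to run, in order to keep the job success rate as high as possible.
For example, with offline synchronization through the datax Service, an e-commerce platform synchronizes roughly 300 TB of data per day across about 60,000 jobs.
The completion times of these jobs vary widely: some finish within 30 minutes, some within 2 hours, some take 10 hours, and some take even longer.
For a job with a short completion time (such as 30 minutes), that is, a short job, if an occasional error such as a network outage or a service restart occurs and the service has not recovered by the time the retries end, the cost of re-executing the whole job is generally acceptable.
However, for a job with a long completion time (such as 10 hours), that is, a long job, if the retries fail after the job has run for more than half of its time, the cost of re-executing the whole job is considerable.
Simply adjusting the number of retries or the retry interval makes the retry cost the same for all jobs.
For example, if the retry is adjusted to 10 minutes, the retry time of every short job may even exceed its actual working time, which is unreasonable and costly.
In particular, when a user wants to stop a job that happens to be retrying, the user has to wait until the retry ends before the job can finish.
Current retry strategies only mitigate the problem of a low job success rate. Adjusting the number of retries or the interval, whether under a fixed-time or an exponential-growth strategy, cannot solve the retry cost problem and instead adds extra retry cost.
When an operation of a job fails, the embodiments of this application adaptively calculate the retry time from the job's progress. For long jobs in particular, this greatly lengthens the retry time, so that retries are performed dynamically and longer service interruptions can be tolerated. It avoids the waste of device resources caused by re-executing a job after it fails, and greatly reduces the retry cost while maintaining the job success rate.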
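To make the effect concrete, the sketch below totals the waiting time a job would tolerate across all retries. It is only an arithmetic illustration: `total_wait_budget` is a hypothetical helper, the strategy is assumed to be exponential with a 10 s interval and 4 retries (the values from the earlier example), and the increment factor is assumed to be `1 + progress`.

```python
def total_wait_budget(progress: float, interval_seconds: float = 10.0,
                      max_retries: int = 4) -> float:
    """Sum of all retry waits under the assumed exponential strategy."""
    factor = 1.0 + progress  # assumed mapping from progress to increment factor
    return sum(
        factor * interval_seconds * 2 ** (n - 1)
        for n in range(1, max_retries + 1)
    )
```

Without scaling, the four waits add up to 10 + 20 + 40 + 80 = 150 s. Under these assumptions a job at 30% progress tolerates about 195 s of outage and a job at 80% progress about 270 s, so the job with more work at stake waits longer before giving up.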
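Referring to FIG. 3, a flowchart of the steps of Embodiment 2 of a job operation retry method of the present application is shown, which may specifically include the following steps:
Step 301: detect whether an operation in the job has failed; if so, perform step 302;
Step 302: determine whether the operation needs to be re-executed; if so, perform step 304; if not, perform step 303;
Step 303: exit re-execution of the operation;
With the embodiments of the present application, a retry policy can be set in advance; the job is retried when the retry policy is satisfied, and the retry is exited otherwise.
In one embodiment of the present application, step 302 may include the following sub-steps:
Sub-step S21: determine whether a count condition and/or a state condition is met; if so, perform sub-step S22; if not, perform sub-step S23;
Sub-step S22: determine that the operation does not need to be re-executed;
Sub-step S23: determine that the operation needs to be re-executed;
wherein the count condition is that the current retry count exceeds a preset retry count threshold;
and the state condition is that the job has been stopped.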
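Sub-steps S21 to S23 reduce to a small predicate. The sketch below takes the "or" reading of "and/or", so that meeting either condition ends the retry; the function and parameter names are hypothetical.

```python
def needs_retry(retry_count: int, max_retries: int, job_stopped: bool) -> bool:
    """Decide whether the failed operation should be re-executed."""
    count_condition = retry_count > max_retries  # current count exceeds the threshold
    state_condition = job_stopped                # the job has been stopped
    # S21: if either condition is met, go to S22 (no retry); otherwise S23 (retry).
    return not (count_condition or state_condition)


print(needs_retry(retry_count=2, max_retries=3, job_stopped=False))  # True
print(needs_retry(retry_count=4, max_retries=3, job_stopped=False))  # False
print(needs_retry(retry_count=1, max_retries=3, job_stopped=True))   # False
```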
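In a specific implementation, retry policy settings such as whether retries follow a fixed-time or an exponential-growth schedule and the maximum number of retries allowed (i.e., the retry count threshold) can all be specified when the job is configured; that is, when configuring a job, the user specifies how the job should be retried if it encounters an error.
Accordingly, when the job is dispatched to a working device, the working device retries according to this retry policy. Each time a retry is made, the current retry count can be recorded and compared with the maximum number of retries allowed (i.e., the retry count threshold); when the maximum is exceeded, retrying stops; otherwise, retrying continues.
Because the job is executed on the working device, the working device can learn the execution state of the job while executing it, such as whether it is running normally or has stopped, its elapsed execution time, its progress, and so on.
If the user wants to stop the job manually, especially when an external system (service) is unavailable and operations staff need to stop the job to make adjustments, redeploy, and so on, the working device should stop the job as soon as possible after receiving the job stop instruction.
If the job happens to be retrying, a traditional retry policy requires all of the retries to finish before the job can be stopped successfully.
For example, a user configures a job to retry 10 times and then discovers a configuration error that requires stopping the job, but the job has just entered its retries; the user then has to wait for all 10 retries to finish before the job stops properly.
The retry policy of the embodiments of the present application adds detection of the job state: when the job is stopped, retrying is terminated, so that the retry ends and exits quickly, further reducing the waste of device resources and the retry cost.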
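One way to let a stop instruction cut a retry short is to wait on an event instead of sleeping. The sketch below uses Python's standard `threading.Event` for this; the application does not prescribe any particular mechanism, and the function name, the return strings, the "1 + progress" factor and the exponential time base are assumptions made here.

```python
import threading
from typing import Callable


def retry_until_stopped(
    operation: Callable[[], None],
    get_progress: Callable[[], float],
    stop_event: threading.Event,
    interval_s: float = 10.0,
    max_retries: int = 10,
) -> str:
    """Return "succeeded", "exhausted" or "stopped"."""
    retry_count = 0
    while True:
        if stop_event.is_set():
            return "stopped"
        try:
            operation()
            return "succeeded"
        except Exception:
            retry_count += 1
            if retry_count > max_retries:  # count condition
                return "exhausted"
            wait_s = (1.0 + get_progress()) * interval_s * 2 ** (retry_count - 1)
            # Event.wait returns True as soon as the event is set, so a stop
            # request ends the wait at once instead of after wait_s seconds.
            if stop_event.wait(timeout=wait_s):  # state condition
                return "stopped"
```

A controller thread that receives the job stop instruction only needs to call `stop_event.set()`; the loop then exits without running the remaining retries.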
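Step 304: track the progress of the job;
Step 305: calculate the retry time according to the progress of the job;
Step 306: after waiting for the retry time, re-execute the operation.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in another order or simultaneously. Further, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.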
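Referring to FIG. 4, a structural block diagram of Embodiment 1 of a job operation retry apparatus of the present application is shown, which may specifically include the following modules:
a job detection module 401, configured to detect whether an operation in a job has failed and, if so, to invoke a progress tracking module 402;
the progress tracking module 402, configured to track the progress of the job;
a retry time calculation module 403, configured to calculate a retry time according to the progress of the job;
a job retry module 404, configured to re-execute the operation after waiting for the retry time.
In one embodiment of the present application, the job detection module 401 may include the following submodules:
a first determination submodule, configured to determine that the operation in the data synchronization job has failed when reading data from a source device fails or times out;
and/or,
a second determination submodule, configured to determine that the operation in the data synchronization job has failed when interaction with the data synchronization service fails or times out;
and/or,
a third determination submodule, configured to determine that the operation in the data synchronization job has failed when writing data to a destination device fails or times out.
In one embodiment of the present application, the retry time calculation module 403 may include the following submodules:
an increment factor calculation submodule, configured to configure an increment factor according to the progress of the job;
a retry time base calculation submodule, configured to calculate a retry time base according to a preset interval time;
an increment adjustment submodule, configured to calculate the retry time according to the increment factor and the retry time base.
In one embodiment of the present application, the retry time base calculation submodule may include the following units:
a retry count acquisition unit, configured to obtain the current retry count;
a fixed calculation unit, configured to calculate the product of the preset interval time and the current retry count as the time base;
or,
an exponential calculation unit, configured to use the current retry count as an exponent to increase the preset interval time, the result being taken as the time base.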
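The module structure of FIG. 4 can be mirrored in software by composing four callables. The sketch below is only one possible arrangement; the class `RetryApparatus`, its field names and the `handle` method are hypothetical, and the numbers in the comments refer to the modules listed above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetryApparatus:
    detect_failure: Callable[[], bool]                 # job detection module 401
    track_progress: Callable[[], float]                # progress tracking module 402
    compute_retry_time: Callable[[float, int], float]  # retry time calculation module 403
    wait_and_rerun: Callable[[float], None]            # job retry module 404

    def handle(self, retry_count: int) -> Optional[float]:
        """Run one detection/retry pass; return the retry time used, or None."""
        if not self.detect_failure():
            return None
        progress = self.track_progress()
        retry_time = self.compute_retry_time(progress, retry_count)
        self.wait_and_rerun(retry_time)
        return retry_time
```

Keeping each module behind a plain callable lets the fixed and exponential calculation units be swapped without touching the other modules.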
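Referring to FIG. 5, a structural block diagram of Embodiment 2 of a job operation retry apparatus of the present application is shown, which may specifically include the following modules:
a job detection module 501, configured to detect whether an operation in a job has failed and, if so, to invoke a retry determination module 502;
the retry determination module 502, configured to determine whether the operation needs to be re-executed and, if so, to invoke a progress tracking module 504, or, if not, to invoke a retry exit module 503;
the retry exit module 503, configured to exit re-execution of the operation;
the progress tracking module 504, configured to track the progress of the job;
a retry time calculation module 505, configured to calculate a retry time according to the progress of the job;
a job retry module 506, configured to re-execute the operation after waiting for the retry time.
In one embodiment of the present application, the retry determination module 502 may include the following submodules:
a condition judgment submodule, configured to determine whether a count condition and/or a state condition is met and, if so, to invoke a fourth determination submodule, or, if not, to invoke a fifth determination submodule;
the fourth determination submodule, configured to determine that the operation does not need to be re-executed;
the fifth determination submodule, configured to determine that the operation needs to be re-executed;
wherein the count condition is that the current retry count exceeds a preset retry count threshold;
and the state condition is that the job has been stopped.
Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.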
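Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The job operation retry method and the job operation retry apparatus provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.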

Claims (12)

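  1. A job operation retry method, characterized by comprising:
    detecting whether an operation in a job has failed; if so, tracking the progress of the job;
    calculating a retry time according to the progress of the job;
    re-executing the operation after waiting for the retry time.
  2. The method according to claim 1, characterized in that the job is a data synchronization job, and the step of detecting whether an operation in the job has failed comprises:
    when reading data from a source device fails or times out, determining that the operation in the data synchronization job has failed;
    and/or,
    when interaction with a data synchronization service fails or times out, determining that the operation in the data synchronization job has failed;
    and/or,
    when writing data to a destination device fails or times out, determining that the operation in the data synchronization job has failed.
  3. The method according to claim 1, characterized in that the step of calculating a retry time according to the progress of the job comprises:
    configuring an increment factor according to the progress of the job;
    calculating a retry time base according to a preset interval time;
    calculating the retry time according to the increment factor and the retry time base.
  4. The method according to claim 3, characterized in that the step of calculating a retry time base according to a preset interval time comprises:
    obtaining the current retry count;
    calculating the product of the preset interval time and the current retry count as the time base;
    or,
    using the current retry count as an exponent to increase the preset interval time, the result being taken as the time base.
  5. The method according to any one of claims 1 to 4, characterized in that, before the step of tracking the progress of the job, the method further comprises:
    determining whether the operation needs to be re-executed;
    if so, performing the step of tracking the progress of the job;
    if not, exiting re-execution of the operation.
  6. The method according to claim 5, characterized in that the step of determining whether the operation needs to be re-executed comprises:
    determining whether a count condition and/or a state condition is met;
    if so, determining that the operation does not need to be re-executed;
    if not, determining that the operation needs to be re-executed;
    wherein the count condition is that the current retry count exceeds a preset retry count threshold;
    and the state condition is that the job has been stopped.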
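  7. A job operation retry apparatus, characterized by comprising:
    a job detection module, configured to detect whether an operation in a job has failed and, if so, to invoke a progress tracking module;
    the progress tracking module, configured to track the progress of the job;
    a retry time calculation module, configured to calculate a retry time according to the progress of the job;
    a job retry module, configured to re-execute the operation after waiting for the retry time.
  8. The apparatus according to claim 7, characterized in that the job detection module comprises:
    a first determination submodule, configured to determine that the operation in the data synchronization job has failed when reading data from a source device fails or times out;
    and/or,
    a second determination submodule, configured to determine that the operation in the data synchronization job has failed when interaction with a data synchronization service fails or times out;
    and/or,
    a third determination submodule, configured to determine that the operation in the data synchronization job has failed when writing data to a destination device fails or times out.
  9. The apparatus according to claim 7, characterized in that the retry time calculation module comprises:
    an increment factor calculation submodule, configured to configure an increment factor according to the progress of the job;
    a retry time base calculation submodule, configured to calculate a retry time base according to a preset interval time;
    an increment adjustment submodule, configured to calculate the retry time according to the increment factor and the retry time base.
  10. The apparatus according to claim 9, characterized in that the retry time base calculation submodule comprises:
    a retry count acquisition unit, configured to obtain the current retry count;
    a fixed calculation unit, configured to calculate the product of the preset interval time and the current retry count as the time base;
    or,
    an exponential calculation unit, configured to use the current retry count as an exponent to increase the preset interval time, the result being taken as the time base.
  11. The apparatus according to any one of claims 7 to 10, characterized by further comprising:
    a retry determination module, configured to determine whether the operation needs to be re-executed and, if so, to invoke the progress tracking module, or, if not, to invoke a retry exit module;
    the retry exit module, configured to exit re-execution of the operation.
  12. The apparatus according to claim 11, characterized in that the retry determination module comprises:
    a condition judgment submodule, configured to determine whether a count condition and/or a state condition is met and, if so, to invoke a fourth determination submodule, or, if not, to invoke a fifth determination submodule;
    the fourth determination submodule, configured to determine that the operation does not need to be re-executed;
    the fifth determination submodule, configured to determine that the operation needs to be re-executed;
    wherein the count condition is that the current retry count exceeds a preset retry count threshold;
    and the state condition is that the job has been stopped.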
PCT/CN2016/098508 2015-09-18 2016-09-09 Method and apparatus for job operation retry WO2017045565A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2018513324A JP6818014B2 (ja) 2015-09-18 2016-09-09 ジョブ用の動作リトライ方法及び機器
EP16845679.6A EP3352078B1 (en) 2015-09-18 2016-09-09 Task operation retry method and device
US15/924,118 US10866862B2 (en) 2015-09-18 2018-03-16 Method and apparatus for job operation retry

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510601116.5 2015-09-18
CN201510601116.5A CN106547635B (zh) 2015-09-18 2015-09-18 Method and apparatus for job operation retry

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/924,118 Continuation US10866862B2 (en) 2015-09-18 2018-03-16 Method and apparatus for job operation retry

Publications (1)

Publication Number Publication Date
WO2017045565A1 true WO2017045565A1 (zh) 2017-03-23

Family

ID=58288136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/098508 WO2017045565A1 (zh) 2015-09-18 2016-09-09 Method and apparatus for job operation retry

Country Status (5)

Country Link
US (1) US10866862B2 (zh)
EP (1) EP3352078B1 (zh)
JP (1) JP6818014B2 (zh)
CN (1) CN106547635B (zh)
WO (1) WO2017045565A1 (zh)

Also Published As

Publication number Publication date
CN106547635A (zh) 2017-03-29
EP3352078A4 (en) 2018-10-31
EP3352078A1 (en) 2018-07-25
US20180203767A1 (en) 2018-07-19
US10866862B2 (en) 2020-12-15
CN106547635B (zh) 2020-10-09
JP2018529164A (ja) 2018-10-04
JP6818014B2 (ja) 2021-01-20
EP3352078B1 (en) 2021-03-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16845679

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018513324

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016845679

Country of ref document: EP