WO2021032056A1 - Method and apparatus for processing batch tasks, computing device and storage medium - Google Patents


Info

Publication number
WO2021032056A1
Authority
WO
WIPO (PCT)
Prior art keywords
batch
data
processing
task
data volume
Application number
PCT/CN2020/109572
Other languages
French (fr)
Chinese (zh)
Inventor
王磊
江旻
李斌
黄俏龙
席俊杰
Original Assignee
深圳前海微众银行股份有限公司
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021032056A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0715 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Definitions

  • This application relates to computer technology in the field of financial technology (Fintech), and in particular to a method, apparatus, computing device and storage medium for processing batch tasks.
  • a method for processing batch tasks includes:
  • the feature object whose correlation degree meets the preset screening condition is determined as the reference feature object.
  • determining a feature object whose correlation degree meets a preset screening condition as the reference feature object includes:
  • an apparatus for processing batch tasks includes:
  • the second determining module is configured to determine the actual data volume of the batch task when the batch processing trigger condition is satisfied;
  • the batch blocking module is configured to block the current batch processing of the batch task if the actual data volume is not within the data volume reference range;
  • the embodiment of the present application provides the following two implementation manners.
  • the type of data to be processed in the batch task can be parsed first.
  • for example, the type of data to be processed may be automatic deduction data, which is the data that is batch processed.
  • the corresponding data volume can then be understood as the total amount of the automatic batch deductions, and all feature objects associated with the type of data to be processed (for example, through a positive or negative influence) are then determined.
  • the feature objects are the independent-variable features from the foregoing example, such as the number of accounts and the number of loans.
  • the initial batch task prediction model (i.e., the aforementioned multiple linear regression model) can be trained on the data values of each reference feature object in each historical time period to obtain the trained batch task prediction model.
  • the above 6 sets of data can be substituted into the multiple linear regression equation as the values of the independent variables x1, x2, and x3, so that b1, b2, and b3, the weights of x1, x2, and x3, can be obtained.
  • model training process may generally include multiple rounds of iterative training.
  • b1, b2, and b3 can be calculated by the least squares method, which is not elaborated further here.
  • Step 302: determine the reference feature objects corresponding to the type of data to be processed in the historical batch data.
  • the target data value of a reference feature object characterizes its data value within a preset period of time. For example, when the historical batch processing data of the past month is used, the target data value of each reference feature object is its data value within that month.
  • the amount here refers to the threshold.
  • b0 is the average growth rate over 6 months, a known constant; x1, x2, and x3 respectively represent the number of new accounts per month, the number of new IOUs (borrowing notes) in the previous month, and the new loan balance in the previous month; the corresponding account weight, IOU weight, and loan balance weight are 0.4, 0.6, and 0.5.
  • the average values of the deduction data amounts can be added together to obtain the reference threshold of the data volume of this batch task, thereby accurately predicting the total deduction amount of the automatic batch deduction business.
  • the data volume reference range can be predicted dynamically and in real time by the algorithm.
  • the corresponding predetermined time can be set flexibly. For example, counting backward from the batch deduction time, the historical business processing data of the most recent 500 batch deductions is used as the basis for this forecast, so that all historical data used is the most recent. Batch processing close in time is thus fully taken into account, following the principle that the more recent the data, the greater its correlation, so this method can ensure prediction accuracy to a certain extent.
  • historical batch processing can be fully taken into consideration, that is, based on historical data, data features are mined from historical massive data, and batch decisions are output by analyzing these data features.
  • the accuracy of batch prediction can be improved, the batch operation can be monitored in real time, and the batch can be blocked in time if an abnormality is found to ensure the correctness of the data batch processing, thereby avoiding the loss caused by the batch exception.
  • the second determining module 402 is configured to determine the actual data volume of the batch task when the batch processing trigger condition is met;
  • the third determining module 403 is configured to determine the reference range of the data volume for batch processing of the batch task according to the historical batch processing data of the batch task;
  • the batch blocking module 404 is configured to block the current batch processing of the batch task if the actual data amount is not within the data amount reference range;
  • the device for processing batch tasks in the embodiment of the present application further includes a model training module 406, which is used to:
  • each feature object has an association relationship with the data volume corresponding to the batch processing of the data type to be processed;
  • the initial batch task prediction model is trained to obtain the trained batch task prediction model.
  • model training module 406 is used to:
  • the feature object whose correlation degree meets the preset filtering condition is determined as the reference feature object.
  • model training module 406 is used to:
  • All feature objects whose correlation degree is greater than a predetermined correlation degree threshold are determined as reference feature objects; or,
  • model training module 406 is used to:
  • a predetermined feature object is selected as the reference feature object.
  • model training module 406 is used to:
  • model training module 406 is used to:
  • the data volume reference threshold of the batch task is determined.
  • the actual data volume and the data volume reference range each include the number of tasks in the batch and the total amount corresponding to all of those tasks.
  • the division of modules in the embodiments of the present application is illustrative, and is only a logical function division. In actual implementation, there may be other division methods.
  • the functional modules in the embodiments of the present application may be integrated into one processing module, may each exist physically alone, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • the processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the basic input/output system 601 includes a display 606 for displaying information and an input device 607 such as a mouse and a keyboard for the user to input information. Both the display 606 and the input device 607 are connected to the processor 501 through a basic input/output system 601 connected to the system bus 500.
  • the basic input/output system 601 may also include an input and output controller for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the I/O controller also provides output to a display screen, printer or other type of output device.
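The regression example in the definitions above (b0 as the known six-month average growth, weights 0.4, 0.6 and 0.5 for new accounts, new IOUs and new loan balance, fitted from 6 sets of data by least squares) can be written out as below. This assumes the standard multiple linear regression form; the notation is a sketch, not a formula quoted from the patent.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3,
\qquad (b_1, b_2, b_3) = (0.4,\ 0.6,\ 0.5)

(b_1, b_2, b_3) = \arg\min_{b_1, b_2, b_3}
  \sum_{t=1}^{6} \left( y_t - b_0 - b_1 x_{1t} - b_2 x_{2t} - b_3 x_{3t} \right)^2
```

Here $y_t$ is the batch deduction volume of historical period $t$, and $x_{1t}, x_{2t}, x_{3t}$ are the three reference feature values in that period; $b_0$ is held fixed rather than fitted, since the source describes it as a known constant.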

Abstract

The present application relates to the technical field of financial technology, and discloses a method and apparatus for processing batch tasks, a computing device and a storage medium, which are used to effectively monitor abnormalities in batch tasks. The method comprises: determining the actual data volume of a batch task when the batch task satisfies a batch processing trigger condition; determining, according to historical batch processing data corresponding to the batch task, a reference data volume range for the batch processing of the batch task; if the actual data volume does not fall within the reference data volume range, blocking the current batch processing of the batch task; and if the actual data volume falls within the reference data volume range, performing batch processing on the target batch task according to the actual data volume. Because the solution is based on historical batch processing data, the accuracy of batch prediction can be improved, enabling accurate batch processing decisions.

Description

Method, apparatus, computing device and storage medium for processing batch tasks
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201910775666.7, filed with the Chinese Patent Office on August 21, 2019 and entitled "Method, apparatus, computing device and storage medium for processing batch tasks", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to computer technology in the field of financial technology (Fintech), and in particular to a method, apparatus, computing device and storage medium for processing batch tasks.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Batch processing technology is no exception; however, the security and real-time requirements of the financial industry place higher demands on it. As the number of batch tasks in the financial industry grows, batch systems process ever larger volumes of data, and their impact becomes ever wider. If a batch system behaves abnormally, data errors occur on a large scale, and most of these errors are irreversible.
However, there is currently no effective method for monitoring abnormalities in batch tasks, which is a problem that needs to be solved.
Summary
The embodiments of this application provide a method, apparatus, computing device and storage medium for processing batch tasks, which are used to effectively monitor abnormalities in batch tasks.
In a first aspect, a method for processing batch tasks is provided, the method including:
determining whether a batch task meets a preset batch processing trigger condition;
when the batch processing trigger condition is met, determining the actual data volume of the batch task;
determining, according to historical batch processing data corresponding to the batch task, a data volume reference range for batch processing of the batch task;
if the actual data volume is not within the data volume reference range, blocking the current batch processing of the batch task;
if the actual data volume is within the data volume reference range, performing batch processing on the target batch task according to the actual data volume.
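The decision at the core of the first aspect can be sketched as a small gate function. This is a minimal illustration, not the patented implementation: the names `ReferenceRange` and `process_batch`, the boolean `trigger_met` standing in for the unspecified trigger condition, and the `"wait"`/`"block"`/`"execute"` return values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    """Data volume reference range [low, high] (hypothetical container)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def process_batch(trigger_met: bool, actual_volume: float,
                  reference: ReferenceRange) -> str:
    """Decide what to do with one batch run.

    Returns "wait" when the trigger condition is not met yet, "block" when the
    actual data volume falls outside the reference range, and "execute" otherwise.
    """
    if not trigger_met:
        return "wait"
    if not reference.contains(actual_volume):
        return "block"  # anomaly suspected: stop this run before data is touched
    return "execute"


if __name__ == "__main__":
    ref = ReferenceRange(low=900.0, high=1100.0)
    print(process_batch(True, 1000.0, ref))   # execute
    print(process_batch(True, 5000.0, ref))   # block
    print(process_batch(False, 1000.0, ref))  # wait
```

Keeping the range check separate from the execution step means a blocked run never reaches the code that writes data, which is the property the method relies on to prevent irreversible errors.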
In a possible design, determining, according to the historical batch data corresponding to the batch task, the data volume reference range for batch processing of the batch task includes:
parsing out the type of data to be processed in the batch task;
determining, in the historical batch data, reference feature objects corresponding to the type of data to be processed, and retrieving the target data value of each reference feature object, the target data value representing the data value of that reference feature object within a preset duration;
determining the reference data volume of each reference feature object according to a pre-trained batch task prediction model corresponding to the batch task and the target data value of each reference feature object, and determining a data volume reference threshold of the batch task according to the reference data volumes of the reference feature objects, wherein the batch task prediction model is trained on the data values of the reference feature objects in the historical batch data;
determining the data volume reference range according to the data volume reference threshold of the batch task.
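The text does not say how the reference range is derived from the reference threshold. One simple possibility, assumed here purely for illustration, is a symmetric relative tolerance band around the threshold; the function name `reference_range` and the `tolerance` parameter are hypothetical.

```python
from typing import Tuple


def reference_range(threshold: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Turn a predicted data volume threshold into a [low, high] band.

    `tolerance` is a hypothetical relative margin; the source does not state
    how the range is derived from the threshold.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if not 0.0 <= tolerance < 1.0:
        raise ValueError("tolerance must be in [0, 1)")
    return threshold * (1.0 - tolerance), threshold * (1.0 + tolerance)


if __name__ == "__main__":
    low, high = reference_range(1000.0, tolerance=0.1)
    print(round(low, 6), round(high, 6))  # 900.0 1100.0
```

A tighter tolerance blocks more runs (more false alarms); a looser one lets more anomalies through, so the margin would in practice be tuned per batch task.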
In a possible design, the batch task prediction model is trained as follows:
determining the reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy, wherein each feature object is associated with the data volume of batch processing for the type of data to be processed;
determining multiple historical time periods from the historical batch data, and extracting the data value of each reference feature object in each historical time period;
training an initial batch task prediction model according to the data values of the reference feature objects in each historical time period, to obtain the trained batch task prediction model.
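Elsewhere the document describes the prediction model as a multiple linear regression fitted by least squares. A dependency-free sketch of that training step follows, solving the normal equations with Gaussian elimination. The function names, the optional `fixed_intercept` (for holding b0 at a known constant, as the source describes), and the six sample periods are all illustrative assumptions, not data from the patent.

```python
from typing import List, Optional, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: reference features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_regression(features: Sequence[Sequence[float]],
                          targets: Sequence[float],
                          fixed_intercept: Optional[float] = None) -> List[float]:
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations.

    If fixed_intercept is given, b0 is held at that value and only b1..bk are
    fitted. Returns [b0, b1, ..., bk].
    """
    if not features or len(features) != len(targets):
        raise ValueError("need one target per historical period")
    if fixed_intercept is None:
        rows = [[1.0] + [float(v) for v in f] for f in features]
        ys = [float(y) for y in targets]
    else:
        rows = [[float(v) for v in f] for f in features]
        ys = [float(y) - fixed_intercept for y in targets]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    return coeffs if fixed_intercept is None else [fixed_intercept] + coeffs


if __name__ == "__main__":
    # Six made-up historical periods: (new accounts, new IOUs, new loan balance).
    periods = [(10, 5, 100), (12, 7, 90), (15, 6, 120),
               (9, 8, 110), (14, 10, 95), (11, 4, 130)]
    volumes = [2 + 0.4 * a + 0.6 * b + 0.5 * c for a, b, c in periods]
    print([round(v, 6) for v in fit_linear_regression(periods, volumes)])
    print([round(v, 6) for v in fit_linear_regression(periods, volumes, fixed_intercept=2.0)])
```

With noise-free synthetic data, both fits recover approximately [2.0, 0.4, 0.6, 0.5]. For larger feature sets, a numerically stabler solver (for example, a QR-based least squares routine) would be preferable to the normal equations.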
In a possible design, determining the reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy includes:
determining the correlation between each feature object and the data volume of batch processing for the type of data to be processed;
determining the feature objects whose correlation meets a preset screening condition as the reference feature objects.
In a possible design, determining the feature objects whose correlation meets a preset screening condition as the reference feature objects includes:
determining all feature objects whose correlation is greater than a predetermined correlation threshold as the reference feature objects; or
sorting the feature objects in descending order of correlation and determining the first predetermined number of them as the reference feature objects.
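The two screening conditions (a correlation threshold, or the top N by correlation) can be sketched as follows. The source does not name the correlation measure; this sketch assumes the absolute Pearson correlation, since the document mentions both positive and negative influence. The function names and sample feature names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def select_reference_features(features: Dict[str, Sequence[float]],
                              volume: Sequence[float],
                              threshold: Optional[float] = None,
                              top_k: Optional[int] = None) -> List[str]:
    """Pick reference feature objects by |correlation| with the batch data volume.

    Exactly one of threshold (keep |r| > threshold) or top_k (keep the k most
    correlated) must be given, mirroring the two screening conditions.
    """
    if (threshold is None) == (top_k is None):
        raise ValueError("specify exactly one of threshold or top_k")
    scores = {name: abs(pearson(values, volume)) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    if threshold is not None:
        return [name for name in ranked if scores[name] > threshold]
    return ranked[:top_k]


if __name__ == "__main__":
    volume = [100.0, 120.0, 140.0, 160.0, 180.0]
    feats = {"accounts": [10, 12, 14, 16, 18], "loans": [9, 8, 7, 6, 5],
             "constant": [3, 3, 3, 3, 3]}
    print(sorted(select_reference_features(feats, volume, threshold=0.5)))
```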
In a possible design, determining the reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy includes:
selecting predetermined feature objects from all the feature objects as the reference feature objects.
In a possible design, determining the reference data volume corresponding to each reference feature object includes:
determining the object increment of each reference feature object within a first predetermined duration;
determining the data increment corresponding to each reference feature object according to the batch task prediction model and the object increment of each reference feature object within the first predetermined duration;
determining the reference data volume corresponding to each reference feature object according to its data volume within a second predetermined duration and its data increment within the first predetermined duration.
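The three steps above can be read as: data increment = model weight × object increment, then reference volume = base volume + data increment. The sketch below follows that reading, which is an assumption since the text does not give the exact formula. `reference_volumes` and its parameter names are hypothetical. The example weights 0.4, 0.6 and 0.5 come from the document's own regression example; the increments and base volumes are made up.

```python
from typing import Dict, Mapping


def reference_volumes(weights: Mapping[str, float],
                      object_increments: Mapping[str, float],
                      base_volumes: Mapping[str, float]) -> Dict[str, float]:
    """Per-feature reference data volume (assumed linear reading).

    weights:           model weight b_i per reference feature (from training)
    object_increments: growth of each feature in the first predetermined duration
    base_volumes:      data volume attributed to each feature in the second
                       predetermined duration
    """
    result: Dict[str, float] = {}
    for name, weight in weights.items():
        data_increment = weight * object_increments[name]  # model-predicted increment
        result[name] = base_volumes[name] + data_increment
    return result


if __name__ == "__main__":
    weights = {"accounts": 0.4, "ious": 0.6, "loan_balance": 0.5}
    increments = {"accounts": 1000, "ious": 500, "loan_balance": 2000}
    base = {"accounts": 10000, "ious": 8000, "loan_balance": 20000}
    print(reference_volumes(weights, increments, base))
```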
In a possible design, determining the data volume reference threshold of the batch task according to the reference data volumes of the reference feature objects includes:
determining the average growth of the data volume of the batch task within a third predetermined duration;
determining the data volume reference threshold of the batch task according to the average growth and the reference data volumes of the reference feature objects.
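A sketch of the threshold step follows, under two stated assumptions. First, "average growth" is taken as the mean period-over-period change across the historical window (for example, six months). Second, the threshold is taken as that growth plus the sum of the per-feature reference volumes, following the document's remark that the values "can be added together". `average_growth` and `reference_threshold` are hypothetical names.

```python
from typing import Mapping, Sequence


def average_growth(history: Sequence[float]) -> float:
    """Mean period-over-period change of the batch data volume over the window
    (for example, the last six months)."""
    if len(history) < 2:
        raise ValueError("need at least two historical periods")
    deltas = [later - earlier for earlier, later in zip(history, history[1:])]
    return sum(deltas) / len(deltas)


def reference_threshold(history: Sequence[float],
                        reference_volumes: Mapping[str, float]) -> float:
    """Assumed additive combination: average growth plus the per-feature
    reference data volumes."""
    return average_growth(history) + sum(reference_volumes.values())


if __name__ == "__main__":
    history = [100.0, 110.0, 125.0, 130.0]
    print(average_growth(history))  # 10.0
    print(reference_threshold(history, {"a": 200.0, "b": 300.0}))  # 510.0
```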
In a possible design, the actual data volume and the data volume reference range each include the number of tasks in the batch and the total amount corresponding to all of those tasks.
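Because the data volume has two dimensions here (task count and total amount), the range check has to cover both. The sketch below blocks a batch if either dimension is out of range, which is an assumption; the source does not say how the two are combined. The class and function names are hypothetical. It uses `decimal.Decimal` for the amounts, since binary floats can introduce rounding errors in monetary sums.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class BatchVolume:
    """Data volume of one batch: task count plus total monetary amount."""
    count: int
    total_amount: Decimal


@dataclass(frozen=True)
class VolumeRange:
    """Reference range checked on both dimensions (hypothetical structure)."""
    min_count: int
    max_count: int
    min_amount: Decimal
    max_amount: Decimal

    def contains(self, volume: BatchVolume) -> bool:
        # Both the count and the total amount must be in range; an anomaly in
        # either dimension is enough to block the batch.
        return (self.min_count <= volume.count <= self.max_count
                and self.min_amount <= volume.total_amount <= self.max_amount)


def summarize(amounts: Iterable[Decimal]) -> BatchVolume:
    """Compute the actual data volume of a batch from its per-task amounts."""
    items = list(amounts)
    return BatchVolume(count=len(items), total_amount=sum(items, Decimal("0")))


if __name__ == "__main__":
    actual = summarize([Decimal("10.50"), Decimal("20.25")])
    allowed = VolumeRange(1, 5, Decimal("0"), Decimal("100"))
    print(actual, allowed.contains(actual))
```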
In a second aspect, an apparatus for processing batch tasks is provided, the apparatus including:
a first determining module, configured to determine whether a batch task meets a preset batch processing trigger condition;
a second determining module, configured to determine the actual data volume of the batch task when the batch processing trigger condition is met;
a third determining module, configured to determine, according to historical batch processing data of the batch task, a data volume reference range for batch processing of the batch task;
a batch blocking module, configured to block the current batch processing of the batch task if the actual data volume is not within the data volume reference range;
a batch execution module, configured to perform batch processing on the target batch task according to the actual data volume if the actual data volume is within the data volume reference range.
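The module split of the second aspect maps naturally onto injected callables, so that each module can be replaced independently, consistent with the document's note that the module division is only logical. The class `BatchTaskApparatus`, its field names, and the `blocked` log are hypothetical illustrations, not the patent's structure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class BatchTaskApparatus:
    """Wires the five modules of the second aspect together (illustrative)."""
    is_triggered: Callable[[], bool]                     # first determining module
    actual_volume: Callable[[], float]                   # second determining module
    reference_range: Callable[[], Tuple[float, float]]   # third determining module
    execute: Callable[[float], None]                     # batch execution module
    blocked: List[float] = field(default_factory=list)   # log kept by the blocking step

    def run(self) -> str:
        if not self.is_triggered():
            return "not_triggered"
        volume = self.actual_volume()
        low, high = self.reference_range()
        if not low <= volume <= high:
            self.blocked.append(volume)  # batch blocking module: skip execution
            return "blocked"
        self.execute(volume)
        return "executed"


if __name__ == "__main__":
    done: List[float] = []
    app = BatchTaskApparatus(lambda: True, lambda: 1000.0,
                             lambda: (900.0, 1100.0), done.append)
    print(app.run(), done)  # executed [1000.0]
```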
In a possible design, the third determining module is configured to:
parse out the type of data to be processed in the batch task;
determine, in the historical batch data, reference feature objects corresponding to the type of data to be processed, and retrieve the target data value of each reference feature object, the target data value representing the data value of that reference feature object within a preset duration;
determine the reference data volume of each reference feature object according to a pre-trained batch task prediction model corresponding to the batch task and the target data value of each reference feature object, and determine the data volume reference threshold of the batch task according to the reference data volumes of the reference feature objects, wherein the batch task prediction model is trained on the data values of the reference feature objects in the historical batch data;
determine the data volume reference range according to the data volume reference threshold of the batch task.
In a possible design, the apparatus further includes a model training module, configured to:
determine the reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy, wherein each feature object is associated with the data volume of batch processing for the type of data to be processed;
determine multiple historical time periods from the historical batch data, and extract the data value of each reference feature object in each historical time period;
train an initial batch task prediction model according to the data values of the reference feature objects in each historical time period, to obtain the trained batch task prediction model.
In a possible design, the model training module is configured to:
determine the correlation between each feature object and the data volume of batch processing for the type of data to be processed;
determine the feature objects whose correlation meets a preset screening condition as the reference feature objects.
In a possible design, the model training module is configured to:
determine all feature objects whose correlation is greater than a predetermined correlation threshold as the reference feature objects; or
sort the feature objects in descending order of correlation and determine the first predetermined number of them as the reference feature objects.
In a possible design, the model training module is configured to:
select predetermined feature objects from all the feature objects as the reference feature objects.
In a possible design, the model training module is configured to:
determine the object increment of each reference feature object within a first predetermined duration;
determine the data increment corresponding to each reference feature object according to the batch task prediction model and the object increment of each reference feature object within the first predetermined duration;
determine the reference data volume corresponding to each reference feature object according to its data volume within a second predetermined duration and its data increment within the first predetermined duration.
In a possible design, the model training module is configured to:
determine the average growth of the data volume of the batch task within a third predetermined duration;
determine the data volume reference threshold of the batch task according to the average growth and the reference data volumes of the reference feature objects.
In a possible design, the actual data volume and the data volume reference range each include the number of tasks in the batch and the total amount corresponding to all of those tasks.
In a third aspect, an apparatus for processing batch tasks is provided, including at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of any method for processing batch tasks according to the first aspect.
In a fourth aspect, a storage medium is provided, storing computer instructions that, when run on a computer, cause the computer to perform the steps of any method for processing batch tasks according to the first aspect.
In the embodiments of this application, when the batch processing trigger condition for a batch task is met, the actual data volume of the batch task can be determined, and a data volume reference range for the current batch processing can be determined from the historical batch processing data of the batch task. A batch decision is then made by comparing the actual data volume with the data volume reference range. Specifically, when the actual data volume is within the reference range, the current batch task is considered similar to previous batch tasks and no abnormality is assumed, so batch processing can proceed directly according to the actual data volume, which keeps batch task processing timely. When the actual data volume is outside the reference range, the current batch task is considered inconsistent with historical processing and the batch process may be abnormal, so processing of the batch task can be blocked. In this way, using the historical batch processing data of batch tasks improves the accuracy of batch decisions and allows batch runs to be monitored in real time; if an abnormality is found, the batch is blocked promptly to ensure data correctness, avoiding losses caused by abnormal batches and improving the user experience.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the drawings needed for describing the embodiments or the prior art. Evidently, the drawings described below show merely embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the process of the method for processing batch tasks in an embodiment of this application;
FIG. 2 is a flowchart of the method for processing batch tasks in an embodiment of this application;
FIG. 3 is a flowchart of determining the data volume reference range by using the batch task prediction model in an embodiment of this application;
FIG. 4 is a structural block diagram of the apparatus for processing batch tasks in an embodiment of this application;
FIG. 5 is a schematic structural diagram of a computing device in an embodiment of this application;
FIG. 6 is another schematic structural diagram of a computing device in an embodiment of this application.
具体实施方式detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention. Where no conflict arises, the embodiments of the present invention and their features may be combined with each other arbitrarily. In addition, although the flowcharts show a logical order, in some cases the steps shown or described may be performed in a different order.
The terms "first" and "second" in the specification, claims, and drawings of the present invention distinguish different objects and do not describe a specific order. In addition, the term "including" and any variants of it are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units. It may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, product, or device.
In the embodiments of the present invention, "a plurality of" means at least two, for example two, three, or more. The embodiments of the present invention impose no limit on this number.
In addition, the term "and/or" herein only describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean A alone, both A and B, or B alone. Unless otherwise specified, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
The design concept of this application is described below.
As mentioned above, monitoring anomalies effectively across the many batch tasks of the financial industry is an urgent technical problem, and a batch monitoring system is essential for catching anomalies that arise during batch processing.

Taking banks as an example, anomaly monitoring in traditional bank batch-run systems mostly lags behind events: a batch anomaly is usually discovered only after it has already produced bad results. To address this lag, many bank batch systems have adopted a simple batch blocking mechanism, typically based on a threshold. Before the data is processed, a program obtains the volume of data to be processed that day, and if that volume exceeds a preset threshold, the batch is blocked.

A threshold does allow problems to be caught before the batch runs, but setting it accurately is difficult. Staff currently set a rough threshold from experience. Individual experience is limited, and different staff may bring their own subjective biases. A threshold set too high may fail to catch anomalies and lead to data errors. A threshold set too low causes false alarms: a blocked batch later turns out not to be abnormal, which reduces batch efficiency.
In view of this, an embodiment of this application provides a method for processing batch tasks that lets a batch system monitor abnormal processing of batch tasks effectively.

As shown in FIG. 1, the historical data of batch tasks can first be processed on BDP (Beagledata Platform, an enterprise-level big data middleware platform built on the Hadoop ecosystem) to produce forecast data for the day's batch processing. The business system then generates the batch deduction transaction results, which give the actual data volume of the current batch run, and compares this volume with the forecast data to reach a final batch processing decision.

In other words, the embodiment takes historical batch processing fully into account. Using historical data as the basis, it mines data features from large volumes of historical data and analyzes them to produce a batch decision. Using historical batch processing as the reference for predicting the current batch improves prediction accuracy and allows batch runs to be monitored in real time. If an anomaly is found, the batch is blocked promptly, which ensures that batch data is processed correctly and on time and avoids the losses caused by batch anomalies.
To further illustrate the technical solutions provided by the embodiments of this application, they are described in detail below with reference to the drawings and specific implementations. The embodiments of this application provide the method steps shown in the following embodiments or drawings, but the method may include more or fewer steps based on routine or non-creative work. Where steps have no necessary logical causal relationship, their execution order is not limited to the order given in the embodiments of this application. In actual processing, or when executed by an apparatus, the method may be executed sequentially or in parallel in the order shown in the embodiments or drawings.
Based on the foregoing, an embodiment of this application provides a method for processing batch tasks. The method can be deployed in systems that need to process batch tasks, such as those of banks and credit platforms. Referring to FIG. 2, the flow of the method for processing batch tasks in the embodiment of this application is described below.
Step 201: Determine whether the batch processing trigger condition of the batch task is met.
As mentioned above, the financial industry has many types of batch tasks, such as batch salary transfers and batch loan deductions. The batch task in the embodiments of this application may be of any such type.

Banks may run different types of batch tasks at different times. Online transaction processing, for example, usually requires the banking system to respond quickly and return results in real time. To avoid affecting online business, batch tasks can run in periods when online transactions are unlikely, such as at night or in the early morning. Accordingly, in one possible implementation, the trigger condition is that the preset processing time of the batch task has been reached. In other implementations, bank staff may trigger batch processing manually, so the trigger condition may also be that the batch processing system has received a processing request for the batch task, and so on.
If the batch task meets the trigger condition, it needs to be batch processed, and step 202 is performed next. If the trigger condition is not met, the batch task does not yet need processing, and the system continues to check whether the trigger condition is met.
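A minimal sketch of the step 201 check follows. It assumes the two trigger conditions named above: a preset daily processing time, or a manual processing request. The function name `batch_trigger_met` and the representation of the schedule as a daily `time` are hypothetical. A real scheduler would also need to record that today's batch has already run, so that it is not triggered twice.

```python
from datetime import datetime, time


def batch_trigger_met(now: datetime, scheduled_at: time, manual_request: bool = False) -> bool:
    """Hypothetical step 201 check: is the batch processing trigger condition met?"""
    # Condition 2: staff manually submitted a processing request for the batch task.
    if manual_request:
        return True
    # Condition 1: the preset processing time node (e.g. early morning) has been reached.
    return now.time() >= scheduled_at


# Example: a batch scheduled for 02:00, checked at 01:30 and at 02:15.
print(batch_trigger_met(datetime(2020, 8, 1, 1, 30), time(2, 0)))  # False
print(batch_trigger_met(datetime(2020, 8, 1, 2, 15), time(2, 0)))  # True
```

If the check returns False, the caller simply polls again later, which matches the "continue checking" branch described above.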
Step 202: Determine the actual data volume of the batch task.
Take batch deduction in the credit business as an example. When a batch deduction is due, the financial institution's batch processing system calculates the current repayment amount of each repayment from each user's loan amount, number of repayment periods, interest calculation rules, and similar inputs. Across all users due for deduction in this run, it then calculates the total number of deductions and the total deduction amount. In the embodiments of this application, these two totals can be understood as the actual data volume of a batch task such as batch deduction.

In other words, once it is determined that the batch task needs processing, the system first calculates the volume of data that this run of the batch task actually has to process. The actual data volume in the embodiments of this application is therefore the actual basis for batch processing, calculated by the batch processing system according to its existing rules.
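A minimal sketch of step 202 for batch deduction follows, computing the two totals described above: the number of deductions and the total deduction amount. The source does not specify the repayment formula. The equal-principal rule used here, a fixed principal share plus a flat monthly rate on the outstanding principal, is a hypothetical stand-in for the system's "existing rules", and the `Loan` fields are illustrative. `Decimal` is used so that money amounts are not subject to floating-point error.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


@dataclass
class Loan:
    loan_amount: Decimal   # original principal borrowed
    periods: int           # total number of repayment periods
    paid_periods: int      # periods already repaid
    monthly_rate: Decimal  # hypothetical flat monthly interest rate


def installment_due(loan: Loan) -> Decimal:
    """Hypothetical equal-principal rule: principal share plus interest on outstanding principal."""
    principal_share = loan.loan_amount / loan.periods
    outstanding = loan.loan_amount - principal_share * loan.paid_periods
    return (principal_share + outstanding * loan.monthly_rate).quantize(CENT, rounding=ROUND_HALF_UP)


def actual_data_volume(loans: list[Loan]) -> tuple[int, Decimal]:
    """Return (total number of deductions, total deduction amount) for this run."""
    amounts = [installment_due(loan) for loan in loans if loan.paid_periods < loan.periods]
    return len(amounts), sum(amounts, Decimal("0"))


loans = [
    Loan(Decimal("1200"), 12, 0, Decimal("0.01")),   # 100.00 principal + 12.00 interest
    Loan(Decimal("1200"), 12, 6, Decimal("0.01")),   # 100.00 principal + 6.00 interest
    Loan(Decimal("1200"), 12, 12, Decimal("0.01")),  # fully repaid: not deducted
]
print(actual_data_volume(loans))  # (2, Decimal('218.00'))
```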
Step 203: Based on the historical batch processing data of the batch task, determine the data volume reference range for processing the batch task.
Batch task processing involves a large amount of computation and usually takes a long time. An anomaly during that time can cause errors in the calculated deduction data. A calculated deduction larger than the amount the user actually owes may lead to user complaints. A calculated deduction smaller than the amount owed may cause losses for the bank. Moreover, because processing is batched, a calculation error for one user will generally recur for every other user in the same batch, so the more deductions a batch contains, the larger the total error.
In view of this, the embodiments of this application use historical batch processing data as a reference against which the current batch run is checked. The aim is to monitor possible anomalies in the batch processing system and to help ensure that batch tasks are processed correctly and effectively. Historical big data indicates, to some extent, how the batch task has been processed overall and how it has been trending over the recent period. The current run can therefore be predicted fairly accurately, and the batch processing system can make an accurate final decision on whether to block or execute the batch, which makes batch processing more effective.
To this end, for batch tasks of the same type, the embodiments of this application use the task's historical batch processing data to determine a data volume reference range for processing it. That range then serves as the basis for judging whether the current batch run is abnormal. For batch deduction, for example, the system can obtain the historical batch processing data of all loan users over the last month, or of all loan users covered by the last 100 deduction dates, and use it to predict the reference range for the current run.

Batch tasks that previously completed successfully and correctly were generally executed while the batch processing system was working normally. The processing data of a large number of such tasks is therefore a useful guide for predicting the next batch run. In other words, the data volume reference range in the embodiments of this application can be regarded as the approximate range of data volumes seen when the batch processing system is operating normally.
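The two historical windows mentioned above, the last month or the last 100 deduction dates, can be sketched as simple filters. Following the text's reasoning, only runs that completed successfully and correctly are kept as reference data. The `BatchRun` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BatchRun:
    run_date: date
    count: int        # number of deductions in that run
    amount: float     # total deduction amount in that run
    succeeded: bool   # completed successfully and correctly


def history_last_days(runs, today: date, days: int = 30):
    """Successful runs in the `days` days before `today` (e.g. the last month)."""
    start = today - timedelta(days=days)
    return [r for r in runs if r.succeeded and start <= r.run_date < today]


def history_last_runs(runs, n: int = 100):
    """The most recent `n` successful runs (e.g. the last 100 deduction dates)."""
    ok = sorted((r for r in runs if r.succeeded), key=lambda r: r.run_date)
    return ok[-n:]


# Toy history: 40 daily runs from 2020-07-01; the run with i == 35 failed.
runs = [BatchRun(date(2020, 7, 1) + timedelta(days=i), 100 + i, 1000.0 + i, i != 35) for i in range(40)]
print(len(history_last_days(runs, date(2020, 8, 10))))      # 29
print([r.count for r in history_last_runs(runs, 5)])        # [134, 136, 137, 138, 139]
```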
In a specific implementation, steps 202 and 203 may be executed in any order: step 202 before step 203, step 203 before step 202, or both at the same time. The embodiments of this application impose no limit on this.
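Because neither step depends on the other's output, one option is to run them concurrently. The sketch below does this with a thread pool from the Python standard library. The two callables stand in for whatever computes the actual data volume and the reference range; they, and the function name, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
R = TypeVar("R")


def run_steps_202_and_203(
    compute_actual: Callable[[], A],
    compute_reference: Callable[[], R],
) -> Tuple[A, R]:
    """Run step 202 and step 203 concurrently; neither depends on the other."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        actual_future = pool.submit(compute_actual)        # step 202
        reference_future = pool.submit(compute_reference)  # step 203
        # result() re-raises any exception from either step, so failures are not silent.
        return actual_future.result(), reference_future.result()


print(run_steps_202_and_203(lambda: (2, 218.0), lambda: ((250, 350), (120000.0, 180000.0))))
```

Threads suit this case if both steps are dominated by I/O, such as database queries. CPU-heavy model inference would be better served by `ProcessPoolExecutor`.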
Step 204: Compare the actual data volume with the data volume reference range to determine whether the actual data volume falls within that range.
Once the reference range is obtained, it serves as the basis for judging whether the actual data volume is within the normal range. In this way, historical batch processing data is used to predict and guide later batch runs.
In the embodiments of this application, both the actual data volume and the data volume reference range may include the number of business items in the batch task and the total amount of all those items. For batch deduction, both include the number of deductions to be executed and the total deduction amount across all of them. The totals then show, in aggregate, whether any deduction has gone wrong.

In general, the batch processing system calculates every deduction in the same way. If one deduction is calculated incorrectly, the other deductions in the same batch will show a similar error. A single user's error may not be obvious, for example if a user's interest is overstated by 5 yuan. If the batch contains 1,000 deductions, however, the total error across them may reach thousands or even tens of thousands of yuan. Checking the total amount therefore reveals calculation errors, and hence system anomalies, much more clearly and effectively, which improves anomaly monitoring.
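As a worked illustration of the 5-yuan example, assume that every one of the $n$ deductions in the batch carries the same systematic error $\delta$, since each deduction is calculated the same way:

$$\Delta_{\text{total}} = \sum_{i=1}^{n} \delta_i = n\,\delta = 1000 \times 5\ \text{yuan} = 5000\ \text{yuan}$$

The aggregate deviation grows linearly with batch size, even when $\delta$ is small relative to a single repayment. This is why the total amount is a more sensitive check than inspecting any single record.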
Step 205: If the actual data volume is within the data volume reference range, batch process the batch task according to the actual data volume.
If the actual data volume of the batch task is within the reference range, the batch processing data calculated by the existing batch processing system is within the normal range. This indicates that the system is working normally and no anomaly has occurred. The batch task can then be processed according to the determined actual data volume, for example by deducting, in one batch, the sum of the current-period repayments of 300 loan users.
Step 206: If the actual data volume is not within the data volume reference range, block the current batch processing of the batch task.
If the actual data volume is outside the reference range, the current batch run differs substantially from past processing. In principle, this could reflect a large, abrupt change in the deduction business itself. Such a change is generally inconsistent with normal, smooth business behavior, however, so it more likely means that the batch processing system has become abnormal. To ensure that batch tasks are executed accurately, the current batch execution can then be blocked promptly.

After the batch task is blocked, a blocking alarm can be issued to alert staff. Staff can then promptly confirm whether the batch system really is abnormal and, if it is, carry out maintenance and repair to clear the anomaly and restore normal batch operation. The calculation can also be run again as a second batch processing calculation, so that the batch task is still processed in time.
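Steps 204 to 206 can be sketched as a single decision function. It checks both dimensions of the data volume named in step 204, the deduction count and the total amount, against their reference ranges. If both are in range, it runs the batch. Otherwise it blocks the batch and passes the reasons to an alarm hook. How the reference ranges are produced is outside this sketch. `ReferenceRange`, `decide_and_run`, and the `process`/`alert` callbacks are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ReferenceRange:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def out_of_range_reasons(actual_count, actual_amount, count_range, amount_range):
    """Step 204: compare both dimensions of the actual data volume with their reference ranges."""
    reasons = []
    if not count_range.contains(actual_count):
        reasons.append(f"count {actual_count} outside [{count_range.low}, {count_range.high}]")
    if not amount_range.contains(actual_amount):
        reasons.append(f"amount {actual_amount} outside [{amount_range.low}, {amount_range.high}]")
    return reasons


def decide_and_run(actual_count, actual_amount, count_range, amount_range,
                   process: Callable[[], None],
                   alert: Callable[[Sequence[str]], None]) -> bool:
    """Return True if the batch ran (step 205), False if it was blocked (step 206)."""
    reasons = out_of_range_reasons(actual_count, actual_amount, count_range, amount_range)
    if not reasons:
        process()      # step 205: within range, process using the actual data volume
        return True
    alert(reasons)     # step 206: block this run and emit a blocking alarm for staff
    return False


alarms = []
ran = decide_and_run(300, 200_000.0, ReferenceRange(250, 350), ReferenceRange(120_000.0, 180_000.0),
                     process=lambda: print("processing batch"), alert=alarms.extend)
print(ran, alarms)  # False ['amount 200000.0 outside [120000.0, 180000.0]']
```

In this example the count looks normal, but the amount does not. Checking only the count would have let a systematic per-deduction error through.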
In the embodiments of this application, historical batch processing data serves as a reference that reflects, to some extent, how likely the current batch task is to be abnormal. Anomalies in the batch processing system can therefore be predicted effectively, so that batch tasks are executed accurately and effectively.
In a specific implementation, the embodiments of this application provide the following two ways of determining the data volume reference range in step 203 from the historical batch processing data of the batch task.
First determination method
In the first determination method, machine learning is used to analyze the historical batch processing data of the batch task. A batch task prediction model is trained through quantitative analysis of large volumes of data, and the trained model then predicts the data volume reference range for each batch run of the task. The prediction process based on the batch task prediction model is described below.
The process of training the batch task prediction model on historical batch processing data is described first.
Model training may comprise three parts, feature selection, model selection, and training the model on the selected features, described in detail below.
1) Feature selection. Feature selection is crucial when building a machine learning model. Good features improve model performance and help reveal the characteristics and underlying structure of the data, which matters for further improving the model and the algorithm. Using too many variables as training features, however, can make the model inaccurate, especially when some features have no effect on the output or strongly affect other variables.

In batch deduction, for example, the independent variables that influence the auto-deduction data include the number of accounts (all loan accounts), the number of IOUs (all loans), the number of overdue IOUs (loans with overdue repayments), loan type, installment method, user credit rating, number of installments, loan balance, and so on. With so many candidate independent variables, the question is which ones to use as training features. To limit the effect of too many variables on training accuracy, the embodiments of this application select training features by correlation. For example, reference feature objects can be selected according to a preset selection strategy that is based on correlation.
In one possible implementation, the type of data to be processed in the batch task is identified first. For batch deduction, this is the auto-deduction data, and the data volume produced by batch processing it can be understood as the total amount of automatic batch deductions. Next, all feature objects associated with that data type, whether positively or negatively, are determined. These are the independent variables mentioned above, such as the number of accounts and the number of IOUs. Then, for each feature object, the correlation is calculated between that feature object (the independent variable) and the data volume produced by batch processing that data type (the dependent variable). The Pearson correlation coefficient can be used for this, or any other method of measuring correlation between each independent variable and the dependent variable.
After the correlation of each feature object is obtained, the feature objects with higher correlation are selected as the input features for model training. These selected feature objects are called reference feature objects. Specifically, the feature objects whose correlation meets a preset screening condition are selected from all feature objects. For example, feature objects whose correlation is greater than or equal to a predetermined correlation threshold can be used as reference feature objects. To ensure that the selected reference feature objects are strongly correlated with the dependent variable, the threshold can be set relatively high, for example 80%. Alternatively, the feature objects can be ranked by correlation in descending order and a predetermined number of the top-ranked ones taken as reference feature objects. For example, if that number is set to 4, the 4 feature objects with the highest correlation become the final reference feature objects.
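The correlation-based screening described above can be sketched in Python. The sketch computes the Pearson coefficient of each candidate feature against the target, then keeps either the features at or above a threshold (e.g. 0.8, the 80% example) or the top-k ranked features. The feature names `x1`..`x3` and the toy history values are made up for illustration. `statistics.correlation` would also work on Python 3.10+, but it is written out here for clarity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sd_x * sd_y)


def select_reference_features(features, target, threshold=None, top_k=None):
    """Rank candidate feature objects by correlation with the target and keep the strongest.

    features: mapping of feature name -> historical values aligned with `target`.
    Uses the signed coefficient, matching the text's focus on positive correlation;
    rank by abs() instead if strongly negative features should also be kept.
    """
    scores = {name: pearson(values, target) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        return [name for name in ranked if scores[name] >= threshold]
    if top_k is not None:
        return ranked[:top_k]
    return ranked


# Toy history: five past runs, three candidate features.
target = [10, 20, 30, 40, 50]          # e.g. total auto-deduction amount per run
features = {
    "x1": [1, 2, 3, 4, 5],             # r close to 1.0
    "x2": [5, 4, 3, 2, 1],             # r close to -1.0
    "x3": [3, 1, 4, 1, 5],             # r about 0.35
}
print(select_reference_features(features, target, threshold=0.8))  # ['x1']
print(select_reference_features(features, target, top_k=2))        # ['x1', 'x3']
```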
In another possible implementation, the user can designate predetermined feature objects as reference feature objects in advance, based on experience. For example, experience shows that three predetermined feature objects, the number of accounts, the number of IOUs, and the number of overdue IOUs, are strongly correlated with the auto-deduction data. These three can therefore be selected directly from all feature objects as the final reference feature objects.
The higher the correlation, the stronger the positive correlation between the independent variable and the dependent variable, and the greater that variable's influence on the dependent variable. Training the model on a few highly correlated reference feature objects therefore keeps the number of training features as small as possible while preserving model accuracy, which further improves the accuracy of model training.
2) Model selection. For model selection, the embodiments of this application take a regression-fitting approach and choose a suitable regression function by observing how the dependent and independent variables are distributed. Analysis shows a clear linear relationship between the auto-deduction data and the selected, strongly correlated reference feature objects: the auto-deduction data volume grows linearly as the reference feature objects grow. This is consistent with a business such as batch deduction developing steadily and linearly. Because of this clear linear relationship, a multiple linear regression (MLR) model can be used for prediction and selected as the initial model for training. Multiple linear regression uses known data to find a linear equation that describes the relationship between two or more features (independent variables) and an output (dependent variable), then uses that equation to predict results.
The mathematical form of multiple linear regression is as follows:
y=b0+b1x1+b2x2+b3x3+……+bnxn。y=b0+b1x1+b2x2+b3x3+...+bnxn.
上述公式中,y表示因变量;x1、x2、x3表示自变量;b1、b2、b3为对应于x1、x2、x3的自变量系数,也可以将b1、b2、b3理解为对应于x1、x2、x3这些自变量的自变量权重,而权重可以反映自变量对因变量的影响大小,例如相关度越高的自变量的权重值越大,表明其对因变量的影响越大;b0可以理解为是一个自定义常数,根据因变量的类型不用,b0可以设置为不同的值,在具体实施过程中,b0也可以设置为0。In the above formula, y represents the dependent variable; x1, x2, x3 represent the independent variables; b1, b2, b3 are the independent variable coefficients corresponding to x1, x2, and x3, and b1, b2, and b3 can also be understood as corresponding to x1, x2 and x3 are the independent variable weights of these independent variables, and the weight can reflect the influence of the independent variable on the dependent variable. For example, the higher the correlation degree, the greater the weight of the independent variable, indicating that the greater the influence on the dependent variable; b0 can It is understood as a self-defined constant. It is not used according to the type of dependent variable. B0 can be set to a different value. In the specific implementation process, b0 can also be set to 0.
3) Model training. After the initial model is selected, it can be trained on the reference feature objects selected above to obtain a trained batch task prediction model.
First, multiple historical time periods are determined from the historical batch data. For example, if the historical batch data covers one month (30 days), the month can be divided at a fixed interval (for example, 5 days) into six equal-length historical time periods: days 1 to 5 form the first period, days 6 to 10 the second, days 11 to 15 the third, days 16 to 20 the fourth, days 21 to 25 the fifth, and days 26 to 30 the sixth.
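A minimal sketch of the period split in this example, assuming one value per day. Summing daily values into a period total is an assumption: stock-like features such as the number of accounts might instead be sampled at the end of each period. The function names are illustrative.

```python
def split_into_periods(total_days, period_len):
    """Split days 1..total_days into consecutive equal-length periods (inclusive bounds)."""
    if period_len <= 0 or total_days % period_len != 0:
        raise ValueError("total_days must be a positive multiple of period_len")
    return [(start, start + period_len - 1)
            for start in range(1, total_days + 1, period_len)]


def aggregate_by_period(daily_values, period_len):
    """Sum daily values (index 0 = day 1) into one total per period."""
    periods = split_into_periods(len(daily_values), period_len)
    return [sum(daily_values[first - 1:last]) for first, last in periods]
```

For a 30-day month and a 5-day interval, `split_into_periods(30, 5)` yields the six periods listed above.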
Then, the data value of each reference feature object is extracted for each historical time period. For example, extracting the number of accounts, the number of loan notes (IOUs), and the number of overdue loan notes in each period yields six sets of these three values.
Next, the initial batch task prediction model (the multiple linear regression model described above) is trained on the data values of the reference feature objects in each historical time period to obtain the trained batch task prediction model. Specifically, the six sets of data are substituted into the multiple linear regression equation as the independent variables $x_1, x_2, x_3$, each set paired with the self-deduction data volume $y$ observed in the same period, and the coefficients $b_1, b_2, b_3$ (the weights of $x_1, x_2, x_3$) are solved for. For example, if the computed $b_1, b_2, b_3$ are 0.4, 0.6 and 0.5 respectively, the trained batch task prediction model is $y = b_0 + 0.4x_1 + 0.6x_2 + 0.5x_3$.
Note that the above describes the training process in simplified terms. An actual training process may involve multiple rounds of iterative training; for example, $b_1, b_2, b_3$ can be computed by the least squares method, which is not described in detail here.
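The least-squares fit mentioned above can be sketched in plain Python by solving the normal equations $(X^\top X)b = X^\top y$, where each row of $X$ is one historical period with a leading 1 for $b_0$. This is a closed-form ordinary least squares sketch rather than the iterative training the text alludes to, and `fit_mlr` and `solve` are hypothetical names; a library routine such as `numpy.linalg.lstsq` is numerically safer in practice. The sketch estimates $b_0$ as an intercept; if $b_0$ is fixed in advance (for example to 0), drop the leading 1 column and subtract $b_0$ from $y$ instead.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few periods")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_mlr(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bn*xn by ordinary least squares.

    rows:    one feature vector [x1, ..., xn] per historical period
    targets: observed batch data volume y for each period
    Returns [b0, b1, ..., bn].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 -> intercept b0
    p = len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(p)]
         for i in range(p)]                                 # X^T X
    rhs = [sum(X[k][i] * targets[k] for k in range(len(X))) for i in range(p)]  # X^T y
    return solve(A, rhs)
```

Fitting needs at least one more period than there are features; six periods for three features plus $b_0$ satisfies this.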
In addition, in the trained batch task prediction model ($y = b_0 + 0.4x_1 + 0.6x_2 + 0.5x_3$), $x_1, x_2, x_3$ can represent either the actual counts of the reference feature objects or their increments; in a specific implementation, the meaning of $x_1, x_2, x_3$ can be defined according to actual business needs.
Once the batch task prediction model has been obtained as described above, it can be used to predict the data volume reference range for the current batch run of the batch task. The prediction process is described below with reference to the flow shown in FIG. 3.
Step 301: Parse the type of data to be processed in the batch task.
As mentioned above, the type of data to be processed depends on the type of batch task. For the automatic batch deduction business, for example, the data to be processed is self-deduction data.
Step 302: In the historical batch data, determine the reference feature objects corresponding to the type of data to be processed.
Continuing the batch deduction example, the reference feature objects corresponding to the self-deduction data include the number of accounts, the number of loan notes, the number of overdue loan notes, the loan balance, and so on, as mentioned in the foregoing embodiment.
Step 303: Retrieve the target data value of each reference feature object.
The target data value of a reference feature object represents its data value within a preset period. For example, if historical batch processing data from the past month is used, the target data value of each reference feature object is its data value within that month.
Step 304: Determine the reference data volume of each reference feature object from the pre-trained batch task prediction model and the target data value of each reference feature object.
Using the trained batch task prediction model $y = b_0 + 0.4x_1 + 0.6x_2 + 0.5x_3$, the target data value of each reference feature object is substituted for $x_1, x_2, x_3$ respectively, giving the values of $0.4x_1$, $0.6x_2$ and $0.5x_3$; these values are the reference data volumes of the respective reference feature objects.
Step 305: Determine the data volume reference threshold of the batch task from the reference data volumes of the reference feature objects, thereby obtaining the data volume reference range for the current batch run.
Since $b_0$ is a user-defined constant, the value of $y$ can be computed from $b_0$ and the reference data volumes obtained above; this value is the data volume reference threshold of the batch task. The data volume reference range of the batch task can then be determined from this threshold according to preset range-setting conditions.
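Steps 304 and 305 can be sketched as follows, using the example weights 0.4, 0.6 and 0.5. The text leaves the range-setting conditions open, so the symmetric tolerance band in `reference_range` (default plus or minus 10%) is purely a hypothetical choice, as are the function and feature names.

```python
def reference_amounts(weights, target_values):
    """Step 304 sketch: each feature's reference data volume is b_i * x_i."""
    return {name: weights[name] * target_values[name] for name in weights}


def reference_threshold(b0, amounts):
    """Step 305 sketch: y = b0 + sum of the per-feature reference data volumes."""
    return b0 + sum(amounts.values())


def reference_range(threshold, tolerance=0.1):
    """Hypothetical range rule: accept volumes within +/- tolerance of the threshold."""
    return (threshold * (1 - tolerance), threshold * (1 + tolerance))
```

With 1000 accounts, 2000 loan notes and 100 overdue loan notes, the per-feature volumes are 400, 1200 and 50, so with $b_0 = 0$ the threshold is 1650 and the hypothetical band is roughly 1485 to 1815.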
As mentioned above, the independent variables $x_1, x_2, x_3$ in the trained batch task prediction model can also represent increments of the corresponding reference feature objects. In that case, the increment of each reference feature object within a first predetermined period (for example, one month) is determined first. Then, from the batch task prediction model and these increments, the data increment corresponding to each reference feature object is determined. Finally, the reference data volume of each reference feature object is determined from its data volume within a second predetermined period (for example, the data volume of the last batch run before the current one, or the average over the last several runs) together with its data increment within the first predetermined period. Further, the average growth of the batch task's data volume within a third predetermined period (for example, six months) is determined, and the final data volume reference threshold of the batch task is determined from this average growth and the reference data volumes of the reference feature objects.
Following this incremental approach, the resulting prediction formula may be, for example: increment in the day's batch deduction data = average growth over the past six months + new accounts last month × account weight + new loan notes last month × loan-note weight + new loan balance last month × loan-balance weight. This maps onto the batch task prediction model $y = b_0 + 0.4x_1 + 0.6x_2 + 0.5x_3$ as follows: $b_0$ is the six-month average growth, a known constant; $x_1, x_2, x_3$ are the number of new accounts, the number of new loan notes, and the new loan balance in the previous month; and the account, loan-note and loan-balance weights are 0.4, 0.6 and 0.5 respectively. The formula predicts the day's increment in batch deduction data (that is, the data increment of the current batch task). Adding this increment to the actual deduction data volume of the previous batch run (or the average over the last several runs) gives the data volume reference threshold of the current batch task, allowing the total self-deducted amount of the automatic batch deduction business to be predicted accurately.
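The incremental formula above translates directly into code. This is a minimal sketch, assuming the example weights 0.4, 0.6 and 0.5 and a baseline equal to either the last run's actual volume or the mean of the last several runs; all names are hypothetical.

```python
def daily_increment(avg_growth, new_accounts, new_ious, new_loan_balance,
                    w_accounts=0.4, w_ious=0.6, w_balance=0.5):
    """Predicted increase in today's batch deduction volume (b0 = avg_growth)."""
    return (avg_growth
            + new_accounts * w_accounts
            + new_ious * w_ious
            + new_loan_balance * w_balance)


def volume_threshold(previous_volumes, increment):
    """Baseline plus predicted increment.

    previous_volumes: actual volumes of recent runs; a single value means
    "last run only", several values are averaged.
    """
    if not previous_volumes:
        raise ValueError("need at least one previous batch volume")
    baseline = sum(previous_volumes) / len(previous_volumes)
    return baseline + increment
```

For example, an average growth of 50 with 100 new accounts, 200 new loan notes and a new loan balance of 1000 gives an increment of 710; on a previous volume of 10000 the threshold is 10710.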
In the first determination method, machine learning is used to analyze, summarize and apply historical data, and the trained model is then used for prediction. This makes the batch processing system more intelligent, and because prediction with a trained model is fast, it improves prediction efficiency and, in turn, the efficiency and timeliness of batch task processing.
Second determination method
Historical batch processing statistics covering multiple batch runs within a predetermined period (for example, one month, 10 days, or 15 days) can be obtained and then processed in a predetermined manner, for example using the machine-learning approach described above, to dynamically compute the data volume reference range for the current batch task.
That is, in the second determination method, the data volume reference range is predicted dynamically, on the fly, by an algorithm, so the predetermined period can be set flexibly for each prediction. For example, the historical processing data of the 500 most recent batch deductions, taken in reverse chronological order of deduction time, can serve as the basis for the current prediction, so that the historical data used is always the latest available. This fully accounts for the batch runs closest in time; on the principle that the closer in time, the greater the correlation, this approach helps ensure prediction accuracy to a certain extent.
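For the second determination method, a minimal sketch of a sliding window over the 500 most recent batch runs is shown below. The text only says the range is computed "for example" with the machine-learning approach; to keep the sketch short, it substitutes a plainly different, simple statistical rule (mean plus or minus k sample standard deviations). Both that rule and the default k = 3 are assumptions, and timestamps can be any sortable values.

```python
import statistics


def recent_window(history, n=500):
    """The n most recent (timestamp, volume) records, newest first."""
    return sorted(history, key=lambda record: record[0], reverse=True)[:n]


def dynamic_range(history, n=500, k=3.0):
    """Stand-in rule (not the patent's): mean +/- k sample std devs of the window."""
    volumes = [volume for _, volume in recent_window(history, n)]
    if len(volumes) < 2:
        raise ValueError("need at least two historical batch runs")
    mean = statistics.mean(volumes)
    spread = statistics.stdev(volumes)
    return (mean - k * spread, mean + k * spread)
```

Because the window is recomputed on every call, each prediction automatically uses the latest runs, matching the "closer in time, greater correlation" reasoning above.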
The embodiments of this application take historical batch processing fully into account: using historical data as the basis, they mine data features from large volumes of historical data and analyze these features to output batch decisions. This improves the accuracy of batch prediction and allows batch runs to be monitored in real time; if an anomaly is detected, the batch is blocked promptly, which ensures that the data is processed correctly and avoids losses caused by batch anomalies.
Based on the same inventive concept, an embodiment of this application provides an apparatus for processing batch tasks that can implement the method for processing batch tasks in the foregoing embodiments. As shown in FIG. 4, the apparatus includes a first determining module 401, a second determining module 402, a third determining module 403, a batch blocking module 404, and a batch execution module 405, where:
the first determining module 401 is configured to determine whether the batch task meets a preset batch processing trigger condition;
the second determining module 402 is configured to determine the actual data volume of the batch task when the batch processing trigger condition is met;
the third determining module 403 is configured to determine, from the historical batch processing data of the batch task, the data volume reference range for batch processing of the batch task;
the batch blocking module 404 is configured to block the current batch run of the batch task if the actual data volume is not within the data volume reference range; and
the batch execution module 405 is configured to perform batch processing on the batch task according to the actual data volume if the actual data volume is within the data volume reference range.
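The interaction of modules 401 to 405 can be sketched as a single function. Passing each module in as a callable is an illustrative assumption rather than the patent's structure, and treating the range bounds as inclusive is also an assumption; `process_batch` and `BatchDecision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class BatchDecision:
    executed: bool
    reason: str


def process_batch(task: Any,
                  trigger_met: Callable[[Any], bool],
                  actual_volume: Callable[[Any], float],
                  reference_range: Callable[[Any], Tuple[float, float]],
                  execute: Callable[[Any, float], None]) -> BatchDecision:
    """Wire the five modules together; each callable is a hypothetical stand-in."""
    # Module 401: check the preset batch processing trigger condition.
    if not trigger_met(task):
        return BatchDecision(False, "trigger condition not met")
    # Module 402: determine the actual data volume of this run.
    actual = actual_volume(task)
    # Module 403: derive the reference range from historical batch data.
    low, high = reference_range(task)
    # Module 404: block the run when the actual volume is out of range.
    if not low <= actual <= high:
        return BatchDecision(False, f"blocked: {actual} outside [{low}, {high}]")
    # Module 405: process the batch according to the actual data volume.
    execute(task, actual)
    return BatchDecision(True, "executed")
```

Blocking returns before `execute` is ever called, so an anomalous run leaves no side effects.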
In a possible implementation, the third determining module 403 is configured to:
parse the type of data to be processed in the batch task;
determine, in the historical batch data, the reference feature objects corresponding to the type of data to be processed, and retrieve the target data value of each reference feature object, where the target data value represents the data value of the reference feature object within a preset period;
determine the reference data volume of each reference feature object from the pre-trained batch task prediction model corresponding to the batch task and the target data value of each reference feature object, and determine the data volume reference threshold of the batch task from these reference data volumes, where the batch task prediction model is trained on the data values of the reference feature objects in the historical batch data; and
determine the data volume reference range from the data volume reference threshold of the batch task.
In a possible design, the apparatus for processing batch tasks in the embodiments of this application further includes a model training module 406 configured to:
determine reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy, where each feature object is associated with the data volume of batch processing for the type of data to be processed;
determine multiple historical time periods from the historical batch data, and extract the data value of each reference feature object in each historical time period; and
train the initial batch task prediction model on the data values of the reference feature objects in each historical time period to obtain the trained batch task prediction model.
In a possible design, the model training module 406 is configured to:
determine the correlation between each feature object and the data volume of batch processing for the type of data to be processed; and
determine the feature objects whose correlation meets a preset filtering condition as the reference feature objects.
In a possible design, the model training module 406 is configured to:
determine all feature objects whose correlation is greater than a predetermined correlation threshold as reference feature objects; or
sort the feature objects in descending order of correlation and determine the first predetermined number of them as reference feature objects.
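The two selection strategies of module 406 (threshold or top-N) can be sketched as below, assuming correlations have already been computed into a `{feature_name: correlation}` dict. The function names and example feature names are hypothetical.

```python
def select_by_threshold(correlations, threshold):
    """Keep every feature whose correlation is strictly greater than the threshold."""
    return [name for name, r in correlations.items() if r > threshold]


def select_top_k(correlations, k):
    """Keep the k features with the highest correlation, in descending order."""
    ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

The threshold rule lets the number of selected features vary with the data, while the top-N rule fixes the model size in advance.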
In a possible design, the model training module 406 is configured to:
select predetermined feature objects from all feature objects as the reference feature objects.
In a possible design, the model training module 406 is configured to:
determine the increment of each reference feature object within a first predetermined period;
determine the data increment corresponding to each reference feature object from the batch task prediction model and the increment of each reference feature object within the first predetermined period; and
determine the reference data volume of each reference feature object from its data volume within a second predetermined period and its data increment within the first predetermined period.
In a possible design, the model training module 406 is configured to:
determine the average growth of the data volume of the batch task within a third predetermined period; and
determine the data volume reference threshold of the batch task from the average growth and the reference data volumes of the reference feature objects.
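Determining the average growth over the third predetermined period can be sketched as the mean of period-over-period changes in the batch data volume. Reading "average growth" as an absolute change rather than a percentage is an assumption, chosen because it is added as the constant $b_0$ in the incremental formula; the function name is hypothetical.

```python
def average_growth(volumes):
    """Mean period-over-period change of a chronologically ordered volume series."""
    if len(volumes) < 2:
        raise ValueError("need at least two periods of data")
    changes = [later - earlier for earlier, later in zip(volumes, volumes[1:])]
    return sum(changes) / len(changes)
```

The mean of consecutive differences telescopes to (last minus first) divided by the number of steps, so monthly volumes of 100, 110, 125, 130, 150 and 160 give (160 - 100) / 5 = 12.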
In a possible design, both the actual data volume and the data volume reference range include the number of items in the batch run and the total amount corresponding to all of those items.
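Because the actual data volume and the reference range each carry two quantities, the comparison can be sketched as a check on both. Inclusive bounds are an assumption, and the function names are hypothetical. For real money amounts, `decimal.Decimal` avoids floating-point rounding, and this check works unchanged with `Decimal` values because they compare correctly with ints.

```python
def in_range(value, bounds):
    """Inclusive range check on a (low, high) pair."""
    low, high = bounds
    return low <= value <= high


def volume_within_reference(count, total_amount, count_range, amount_range):
    """A batch run passes only if both its item count and its total amount are in range."""
    return in_range(count, count_range) and in_range(total_amount, amount_range)
```

A run with a normal item count but an abnormal total amount (or vice versa) is therefore still blocked.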
All relevant details of the steps in the foregoing method embodiments apply to the functional descriptions of the corresponding functional modules of the apparatus for processing batch tasks in the embodiments of this application, and are not repeated here.
The division into modules in the embodiments of this application is illustrative and is only a logical functional division; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may each exist physically separately, or two or more modules may be integrated into one module. An integrated module may be implemented in hardware or as a software functional module.
Based on the same inventive concept, an embodiment of this application further provides a computing device. As shown in FIG. 5, the computing device includes at least one processor 501, and a memory 502 and a communication interface 503 connected to the at least one processor 501. The embodiments of this application do not limit the specific connection medium between the processor 501 and the memory 502; FIG. 5 takes a connection through a bus 500 as an example. The bus 500 is drawn as a thick line in FIG. 5; the connections between other components are shown only schematically and are not limiting. The bus 500 may be divided into an address bus, a data bus, a control bus and so on; for ease of illustration, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus.
In the embodiments of this application, the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 can perform the steps of the foregoing method for processing batch tasks by executing the instructions stored in the memory 502.
The processor 501 is the control center of the computing device. It connects all parts of the computing device through various interfaces and lines, and performs the various functions of the computing device and processes data by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the computing device as a whole. Optionally, the processor 501 may include one or more processing modules and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 501. In some embodiments, the processor 501 and the memory 502 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
As a non-volatile computer-readable storage medium, the memory 502 can store non-volatile software programs, non-volatile computer-executable programs and modules. The memory 502 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory 502 may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, for storing program instructions and/or data.
The communication interface 503 is a transmission interface for communication; data can be received or sent through it to communicate with other devices.
Referring to the further structural diagram of the computing device shown in FIG. 6, the computing device further includes a basic input/output system (I/O system) 601 that helps transfer information between components within the computing device, and a mass storage device 605 for storing an operating system 602, application programs 603 and other program modules 604.
The basic input/output system 601 includes a display 606 for displaying information and an input device 607, such as a mouse or keyboard, for user input. Both the display 606 and the input device 607 are connected to the processor 501 through the basic input/output system 601, which is connected to the system bus 500. The basic input/output system 601 may also include an input/output controller for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller also provides output to a display screen, a printer or another type of output device.
The mass storage device 605 is connected to the processor 501 through a mass storage controller (not shown) connected to the system bus 500. The mass storage device 605 and its associated computer-readable media provide non-volatile storage for the computing device. That is, the mass storage device 605 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
According to various embodiments of the present invention, the computing device may also operate by connecting, through a network such as the Internet, to remote computers on the network. That is, the computing device may be connected to the network 606 through the communication interface 503 connected to the system bus 500, or the communication interface 503 may be used to connect to other types of networks or remote computer systems (not shown).
基于同一发明构思,本申请实施例还提供一种存储介质,该存储介质例如是计算机可读存储介质,该计算机可读存储介质存储有计算机指令,当该计算机指令在计算机上运行时,使得计算机执行如前述的处理批量任务的方法的步骤。Based on the same inventive concept, the embodiments of the present application also provide a storage medium. The storage medium is, for example, a computer-readable storage medium. The computer-readable storage medium stores computer instructions. When the computer instructions run on the computer, the computer Perform the steps of the method for processing batch tasks as described above.
在一些可能的实施方式中,本申请实施例提供的处理批量任务的方法的各个方面还可以实现为一种程序产品的形式,其包括程序代码,当所述程序产品在计算机上运行时,所述程序代码用于使所述计算机执行前文述描述的根据本发明各种示例性实施方式的处理批量任务的方法中的步骤。In some possible implementation manners, the various aspects of the method for processing batch tasks provided in the embodiments of the present application can also be implemented in the form of a program product, which includes program code. When the program product runs on a computer, The program code is used to make the computer execute the steps in the method for processing batch tasks according to various exemplary embodiments of the present invention described above.
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may be in the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program codes.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and any combination of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

  1. A method for processing batch tasks, characterized in that the method comprises:
    determining whether a batch task meets a preset batch processing trigger condition;
    when the batch processing trigger condition is met, determining an actual data volume of the batch task;
    determining, according to historical batch processing data corresponding to the batch task, a data volume reference range for batch processing of the batch task;
    if the actual data volume is not within the data volume reference range, blocking the current batch processing of the batch task;
    if the actual data volume is within the data volume reference range, performing batch processing on the target batch task according to the actual data volume.
  2. The method according to claim 1, wherein determining, according to the historical batch data corresponding to the batch task, the data volume reference range for batch processing of the batch task comprises:
    parsing out the type of data to be processed in the batch task;
    determining, in the historical batch data, the reference feature objects corresponding to the data type to be processed, and retrieving a target data value for each reference feature object, the target data value representing the data value of that reference feature object within a preset duration;
    determining a reference data volume for each reference feature object according to a pre-trained batch task prediction model corresponding to the batch task and the target data value of each reference feature object, and determining a data volume reference threshold of the batch task according to the reference data volumes of the reference feature objects, wherein the batch task prediction model is trained on the data values of the reference feature objects in the historical batch data;
    determining the data volume reference range according to the data volume reference threshold of the batch task.
  3. The method according to claim 2, wherein the batch task prediction model is trained as follows:
    determining the reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy, wherein each feature object is associated with the data volume for batch processing of the data type to be processed;
    determining a plurality of historical time periods from the historical batch data, and extracting, for each historical time period, the data value of each reference feature object;
    training an initial batch task prediction model on the data values of the reference feature objects in each historical time period to obtain the trained batch task prediction model.
  4. The method according to claim 3, wherein determining the reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy comprises:
    determining the degree of correlation between each feature object and the data volume for batch processing of the data type to be processed;
    determining the feature objects whose degree of correlation meets a preset screening condition as the reference feature objects.
  5. The method according to claim 3, wherein determining the reference feature objects from all feature objects included in the historical batch data according to a preset selection strategy comprises:
    selecting predetermined feature objects from all the feature objects as the reference feature objects.
  6. The method according to claim 2, wherein determining the reference data volume corresponding to each reference feature object comprises:
    determining an object increment of each reference feature object within a first predetermined duration;
    determining a data increment corresponding to each reference feature object according to the batch task prediction model and the object increment of each reference feature object within the first predetermined duration;
    determining the reference data volume corresponding to each reference feature object according to the data volume of each reference feature object within a second predetermined duration and its data increment within the first predetermined duration.
  7. The method according to claim 6, wherein determining the data volume reference threshold of the batch task according to the reference data volume corresponding to each reference feature object comprises:
    determining the average growth rate of the data volume of the batch task within a third predetermined duration;
    determining the data volume reference threshold of the batch task according to the average growth rate and the reference data volume corresponding to each reference feature object.
  8. An apparatus for processing batch tasks, characterized in that the apparatus comprises:
    a first determining module, configured to determine whether a batch task meets a preset batch processing trigger condition;
    a second determining module, configured to determine the actual data volume of the batch task when the batch processing trigger condition is met;
    a third determining module, configured to determine, according to the historical batch processing data of the batch task, a data volume reference range for batch processing of the batch task;
    a batch blocking module, configured to block the current batch processing of the batch task if the actual data volume is not within the data volume reference range;
    a batch execution module, configured to perform batch processing on the target batch task according to the actual data volume if the actual data volume is within the data volume reference range.
  9. A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1-7.
  10. A storage medium, characterized in that the storage medium stores computer instructions that, when run on a computer, cause the computer to perform the steps of the method according to any one of claims 1-7.
PCT/CN2020/109572 2019-08-21 2020-08-17 Method and apparatus for processing batch tasks, computing device and storage medium WO2021032056A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910775666.7 2019-08-21
CN201910775666.7A CN110457159A (en) 2019-08-21 2019-08-21 A kind of method, apparatus, calculating equipment and the storage medium of processing batch tasks

Publications (1)

Publication Number Publication Date
WO2021032056A1 true WO2021032056A1 (en) 2021-02-25

Family

ID=68488350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/109572 WO2021032056A1 (en) 2019-08-21 2020-08-17 Method and apparatus for processing batch tasks, computing device and storage medium

Country Status (2)

Country Link
CN (1) CN110457159A (en)
WO (1) WO2021032056A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448808A (en) * 2021-08-30 2021-09-28 北京必示科技有限公司 Method, system and storage medium for predicting single task time in batch processing task

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN110457159A (en) * 2019-08-21 2019-11-15 深圳前海微众银行股份有限公司 A kind of method, apparatus, calculating equipment and the storage medium of processing batch tasks
CN112288446B (en) * 2020-10-28 2023-06-06 中国联合网络通信集团有限公司 Calculation method and device for complaint and claim payment
CN113360265B (en) * 2021-06-18 2021-12-28 特斯联科技集团有限公司 Big data operation task scheduling and monitoring system and method
CN113807942B (en) * 2021-08-05 2024-03-01 福建省农村信用社联合社 Method and system for recovering bad loans of banks in real time
CN113673857A (en) * 2021-08-13 2021-11-19 南京理工大学 Service sensing and resource scheduling system and method for data center station

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101556678A (en) * 2009-05-21 2009-10-14 中国建设银行股份有限公司 Processing method of batch processing services, system and service processing control equipment
CN104778622A (en) * 2015-04-29 2015-07-15 清华大学 Method and system for predicting TPS transaction event threshold value
CN104811344A (en) * 2014-01-23 2015-07-29 阿里巴巴集团控股有限公司 Network dynamic service monitoring method and apparatus
CN107871190A (en) * 2016-09-23 2018-04-03 阿里巴巴集团控股有限公司 A kind of operational indicator monitoring method and device
US20180307740A1 (en) * 2017-04-20 2018-10-25 Microsoft Technology Licesning, LLC Clustering and labeling streamed data
CN110135856A (en) * 2019-05-16 2019-08-16 中国银联股份有限公司 A kind of repeat business risk monitoring method, device and computer readable storage medium
CN110457159A (en) * 2019-08-21 2019-11-15 深圳前海微众银行股份有限公司 A kind of method, apparatus, calculating equipment and the storage medium of processing batch tasks

Also Published As

Publication number Publication date
CN110457159A (en) 2019-11-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20853813

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20853813

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.08.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20853813

Country of ref document: EP

Kind code of ref document: A1