WO2021032056A1

WO2021032056A1 - Method and apparatus for processing batch tasks, computing device and storage medium

Info

Publication number: WO2021032056A1
Application number: PCT/CN2020/109572
Authority: WO
Inventors: 王磊; 江旻; 李斌; 黄俏龙; 席俊杰
Original assignee: 深圳前海微众银行股份有限公司
Priority date: 2019-08-21
Filing date: 2020-08-17
Publication date: 2021-02-25
Also published as: CN110457159A

Abstract

The present application relates to the technical field of financial technology, and disclosed therein are a method and apparatus for processing batch tasks, a computing device and a storage medium, which are used to effectively monitor abnormalities in batch tasks. The method comprises: determining the actual data volume of batch tasks when the batch tasks satisfy a batch processing trigger condition; according to historical batch processing data corresponding to the batch tasks, determining a reference data volume range for the batch processing of the batch tasks; if the actual data volume does not fall within the reference data volume range, then blocking the current batch processing of the batch tasks; and if the actual data volume falls within the reference data volume range, then performing batch processing on a target batch task according to the actual data volume. Historical batch processing data serves as the basis of the described solution, and the accuracy of batch prediction may be improved, thereby making an accurate batch processing decision.

Description

Method, device, computing equipment and storage medium for processing batch tasks

Cross references to related applications

This application claims priority to Chinese patent application No. 201910775666.7, entitled "A method, device, computing device and storage medium for processing batch tasks", filed with the Chinese Patent Office on August 21, 2019, the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of financial technology (Fintech), and in particular to a method, device, computing device and storage medium for processing batch tasks.

Background

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting to Fintech. Batch processing technology is no exception. However, the security and real-time requirements of the financial industry also place higher demands on batch processing technology. As the number of batch tasks in the financial industry grows, the amount of data processed by batch systems keeps increasing and its impact keeps widening. If the batch system behaves abnormally, data errors occur on a wide scale, and most of these errors are irreversible.

However, there is currently no effective method for monitoring abnormalities in batch tasks, and this problem needs to be solved.

Summary of the invention

The embodiments of the present application provide a method, device, computing device, and storage medium for processing batch tasks, which are used to effectively monitor abnormalities of batch tasks.

In a first aspect, a method for processing batch tasks is provided, and the method includes:

Determine whether the batch task meets the preset batch processing trigger conditions;

When the batch processing trigger condition is met, determine the actual data volume of the batch task;

Determine a reference range of data volume for batch processing of the batch task according to the historical batch processing data corresponding to the batch task;

If the actual data amount is not within the data amount reference range, block this batch processing of the batch task;

If the actual data volume is within the data volume reference range, batch processing is performed on the target batch task according to the actual data volume.
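The steps above can be sketched as a small gate function. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Decision`, `ReferenceRange`, and `decide_batch` are hypothetical, the reference range is assumed to be a closed interval, and it is assumed to have been derived from historical data beforehand.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"  # actual volume is inside the reference range
    BLOCK = "block"      # actual volume is outside the reference range
    WAIT = "wait"        # trigger condition not yet met


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical closed interval derived from historical batch data."""
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper


def decide_batch(triggered: bool, actual_volume: float,
                 reference: ReferenceRange) -> Decision:
    """Return the decision for one run of a batch task."""
    if not triggered:
        return Decision.WAIT
    if reference.contains(actual_volume):
        return Decision.EXECUTE
    return Decision.BLOCK
```

For example, with a reference range of 900 to 1100, an actual volume of 1050 would be executed and an actual volume of 5000 would be blocked.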

In a possible design, according to the historical batch data corresponding to the batch task, determining the data volume reference range for batch processing of the batch task includes:

Parse out the type of data to be processed in the batch task;

In the historical batch data, determine the reference feature objects corresponding to the data type to be processed, and retrieve the target data value corresponding to each reference feature object, where the target data value represents the data value of the reference feature object within a preset duration;

Determine the reference data amount corresponding to each reference feature object according to the pre-trained batch task prediction model corresponding to the batch task and the target data value corresponding to each reference feature object, and determine the data amount reference threshold of the batch task according to the reference data amount corresponding to each reference feature object; wherein the batch task prediction model is obtained by training on the data values corresponding to the reference feature objects in the historical batch data;

Determine the data amount reference range according to the data amount reference threshold of the batch task.

In a possible design, the batch task prediction model is trained in the following manner:

From all the characteristic objects included in the historical batch data, determine the reference characteristic object according to a preset selection strategy; wherein each characteristic object has an association relationship with the data volume corresponding to the batch processing of the data type to be processed;

Determine a plurality of historical time periods from the historical batch data, and respectively extract data values corresponding to each of the reference feature objects in each historical time period;

According to the data value corresponding to each of the reference feature objects in each historical time period, the initial batch task prediction model is trained to obtain the trained batch task prediction model.
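The source does not name a model family for the batch task prediction model. Purely for illustration, the sketch below assumes a one-variable linear regression fitted by ordinary least squares, with one data point per historical time period. The function name `fit_linear` and the sample figures are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares.

    xs: value of one reference feature object in each historical period
        (for example, the number of loans).
    ys: batch data volume observed in the same periods
        (for example, the total deduction amount).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With three hypothetical periods of 100, 200, and 300 loans and deduction totals of 50,000, 100,000, and 150,000, the fitted slope is 500 per loan. A production model would likely use several reference features at once.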

In a possible design, from all the characteristic objects included in the historical batch data, determining the reference characteristic object according to a preset selection strategy includes:

Determine the correlation between each characteristic object and the data volume corresponding to the batch processing of the data type to be processed;

The feature object whose correlation degree meets the preset screening condition is determined as the reference feature object.

In a possible design, determining a feature object whose correlation degree meets a preset screening condition as the reference feature object includes:

All feature objects whose correlation degree is greater than a predetermined correlation degree threshold are determined as the reference feature objects; or,

In descending order of correlation degree, a predetermined number of top-ranked feature objects are determined as the reference feature objects; or,

From all the feature objects, predetermined feature objects are selected as the reference feature objects.

In a possible design, determining the reference data amount corresponding to each of the reference feature objects includes:

Determine the object increment of each of the reference feature objects within the first predetermined time period;

Determine the data increment corresponding to each reference feature object according to the batch task prediction model and the object increment of each reference feature object within the first predetermined time period;

Determine the reference data amount corresponding to each reference feature object according to the data amount of each reference feature object in the second predetermined time period and the data increment within the first predetermined time period.
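The three steps above can be illustrated as follows. The source does not state how the model maps an object increment to a data increment, or how the base amount and the increment are combined. This sketch assumes a linear coefficient per reference feature object and a simple sum. All names and figures are hypothetical.

```python
def data_increment(object_increment: float, coefficient: float) -> float:
    """Assumed linear model: data increment = coefficient * object increment."""
    return coefficient * object_increment


def reference_data_amount(base_amount: float, object_increment: float,
                          coefficient: float) -> float:
    """Assumed combination: the data amount observed in the second
    predetermined period plus the predicted increment for the first period."""
    return base_amount + data_increment(object_increment, coefficient)
```

For example, with a base amount of 1,000,000 in the second period, 200 new loans in the first period, and an assumed coefficient of 500 per loan, the reference data amount for the "number of loans" feature would be 1,100,000.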

In a possible design, determining the data amount reference threshold of the batch task according to the reference data amount corresponding to each of the reference feature objects includes:

Determine the average growth rate of the data volume of the batch task within the third predetermined time period;

Determine the data amount reference threshold of the batch task according to the average growth rate and the reference data amount corresponding to each of the reference feature objects.
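One way to read these two steps is sketched below. The source gives no formula, so everything here is an assumption: the per-feature reference amounts are averaged into a single estimate, the estimate is scaled by the average growth rate to obtain the threshold, and the reference range is a symmetric band around the threshold with a hypothetical `tolerance` parameter.

```python
from statistics import mean
from typing import Sequence, Tuple


def reference_threshold(reference_amounts: Sequence[float],
                        avg_growth_rate: float) -> float:
    """Assumed rule: average the per-feature reference amounts,
    then scale by (1 + average growth rate)."""
    return mean(reference_amounts) * (1.0 + avg_growth_rate)


def reference_range(threshold: float, tolerance: float) -> Tuple[float, float]:
    """Assumed rule: symmetric band of +/- tolerance around the threshold."""
    return threshold * (1.0 - tolerance), threshold * (1.0 + tolerance)
```

For example, reference amounts of 1000 and 1200 with a growth rate of 0.5 give a threshold of 1650, and a tolerance of 0.1 gives a range of roughly 1485 to 1815.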

In a possible design, the actual data amount and the data amount reference range each include the number of transactions in the batch task and the total amount corresponding to all of those transactions.

In a second aspect, an apparatus for processing batch tasks is provided, and the apparatus includes:

The first determining module is used to determine whether the batch task meets the preset batch processing trigger condition;

The second determining module is configured to determine the actual data volume of the batch task when the batch processing trigger condition is satisfied;

The third determining module is configured to determine the reference range of the data volume for batch processing of the batch task according to the historical batch processing data of the batch task;

The batch blocking module is configured to block this batch processing of the batch task if the actual data amount is not within the reference range of the data amount;

The batch execution module is configured to perform batch processing on the target batch task according to the actual data volume if the actual data volume is within the data volume reference range.

In a possible design, the third determining module is used to:

Parse out the type of data to be processed in the batch task;

In a possible design, the device further includes a model training module for:

In a possible design, the model training module is used to:

Determine the data volume reference threshold of the batch task according to the average growth rate and the reference data volume corresponding to each of the reference feature objects.

In a third aspect, an apparatus for processing batch tasks is provided, including at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the apparatus executes the steps of the method for processing batch tasks described in any one of the designs of the first aspect.

In a fourth aspect, a storage medium is provided, the storage medium stores computer instructions, and when the computer instructions are executed on a computer, the computer executes the steps of the method for processing batch tasks described in any one of the designs of the first aspect.

In the embodiments of this application, when the batch processing trigger condition of a batch task is met, the actual data volume of the batch task is determined, and the data volume reference range for this batch processing is determined from the historical batch processing data of the batch task. The batch decision is then made by comparing the actual data volume with the data volume reference range. Specifically, when the actual data volume is within the data volume reference range, this batch run is considered similar to previous runs of the task and no abnormality is assumed to have occurred, so batch processing can proceed directly on the basis of the actual data volume, which ensures the timeliness of batch task processing. When the actual data volume is not within the data volume reference range, this batch run does not match the historical processing pattern and the batch processing may be abnormal, so processing of the batch task can be blocked. In this way, using the historical batch processing data of batch tasks improves the accuracy of batch decisions and allows batch operation to be monitored in real time. If an abnormality is found, the batch is blocked promptly to ensure the correctness of the data, thereby avoiding losses caused by abnormal batches and improving the user experience.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and cannot limit the present invention.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative work.

FIG. 1 is a schematic diagram of a process of a method for processing batch tasks in an embodiment of the application;

FIG. 2 is a flowchart of a method for processing batch tasks in an embodiment of the application;

FIG. 3 is a flowchart of using a batch task prediction model to determine a reference range of data amount in an embodiment of the application;

FIG. 4 is a structural block diagram of an apparatus for processing batch tasks in an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a computing device in an embodiment of the application;

FIG. 6 is a schematic diagram of another structure of a computing device in an embodiment of the application.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention. Where no conflict arises, the embodiments of the present invention and the features in the embodiments can be combined with each other arbitrarily. Also, although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

The terms "first" and "second" in the specification and claims of the present invention and the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence. In addition, the term "including" and any variations of it are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product, or device.

In the embodiment of the present invention, “a plurality of” may mean at least two, for example, it may be two, three or more, which is not limited in the embodiment of the present invention.

In addition, the term "and/or" in this document only describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean three cases: A alone, both A and B, or B alone. In addition, unless otherwise specified, the character "/" in this document generally indicates an "or" relationship between the objects before and after it.

The following describes the design ideas of this application.

As mentioned above, effective monitoring of abnormalities in the various batch tasks of the financial industry is a technical problem that urgently needs to be solved. To catch anomalies in batch processing, a batch monitoring system is essential. Taking banking institutions as an example, most abnormality monitoring in traditional bank batch systems lags behind: a batch abnormality is generally discovered only after it has already caused bad results. To address this lag, many bank batch systems have adopted a simple batch blocking mechanism, for example setting a threshold. Before the data is processed, a program obtains the amount of data to be processed that day, and if that amount exceeds the preset threshold, the batch is blocked. Although this threshold approach can help ensure correct batch operation by checking in advance, the threshold is difficult to set accurately. At present, staff generally set a rough threshold based on experience, but the experience of individual staff members is limited and different staff members may have subjective biases. If the threshold is set too high, abnormalities may go undetected and cause data errors. If it is set too low, false alarms are raised: the batch is blocked and only afterwards found to be normal, which reduces the efficiency of batch operation.

In view of this, the embodiments of this application provide a method for processing batch tasks with which the batch system can effectively monitor abnormal processing of batch tasks. As shown in FIG. 1, the historical data of batch tasks can first be processed on BDP (Beagledata Platform, an enterprise-level big data middleware platform based on the Hadoop ecosystem) to produce forecast data for the day's batch processing. The business system then generates a batch deduction transaction result, that is, the actual data volume for this batch processing, and compares the actual data volume with the forecast data to make the final batch processing decision. In other words, when performing this batch processing, the embodiments of the application take historical batch processing fully into account: historical data serves as the basis, data features are mined from the massive historical data, and those features are analyzed to output a batch decision. Using historical batch processing data as the reference for the current batch prediction improves the accuracy of batch prediction as far as possible and allows batch operation to be monitored in real time, so that the batch is blocked promptly if an abnormality is found. This ensures the correctness and timeliness of batch data processing and avoids losses caused by batch exceptions.

To further illustrate the technical solutions provided by the embodiments of the present application, they are described in detail below with reference to the drawings and specific implementations. Although the embodiments of the present application provide the method operation steps shown in the following embodiments or drawings, the method may include more or fewer operation steps based on routine work or without creative effort. For steps that have no logically necessary causal relationship, the execution order is not limited to the order provided in the embodiments of the present application. In actual processing, or when executed by a device, the method can be executed sequentially or in parallel as shown in the embodiments or the drawings.

Based on the foregoing content, embodiments of the present application provide a method for processing batch tasks, which can be deployed in systems that require batch task processing, such as banks and credit platforms. Referring to FIG. 2, the flow of the method for processing batch tasks in the embodiment of the present application is described as follows.

Step 201: Determine whether the batch processing trigger condition of the batch task is satisfied.

As mentioned above, the financial industry includes multiple types of batch tasks, such as batch salary transfers and batch loan deductions. The batch task in the embodiments of the present application may be any possible type of batch task. For different types of batch tasks, banks can perform batch processing at different time nodes. For example, online transaction processing usually requires the banking system to respond quickly and return results in real time. Therefore, to avoid affecting online business, batch tasks can be executed in periods when online business is unlikely to occur, such as at night or in the early morning. Accordingly, in one possible implementation, the batch processing trigger condition may be that the preset processing time node of the batch task has been reached. In other implementations, bank staff can also trigger batch processing manually, so the batch processing trigger condition can also be that the batch processing system receives a processing request for the batch task, and so on.

When it is determined that the batch task meets the batch processing trigger condition, the batch task needs to be batch processed, and step 202 can be performed. If the batch processing trigger condition is not met, the batch task does not yet need to be batch processed, and the system can continue to check whether the trigger condition is satisfied.
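The two trigger conditions mentioned above (a preset time node, or a manual processing request) can be sketched as a single predicate. The function name `is_triggered` and its parameters are hypothetical, and the comparison assumes the time node is a time of day checked against the current clock.

```python
from datetime import datetime, time


def is_triggered(now: datetime, scheduled: time,
                 manual_request: bool = False) -> bool:
    """Return True when a manual request has been received or when
    the preset time node for today has been reached."""
    return manual_request or now.time() >= scheduled
```

For example, with a 02:00 time node, a check at 02:30 returns True and a check at 01:00 returns False unless a manual request is pending. A real scheduler would also need to record that the day's run has already happened so that the batch is not triggered twice.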

Step 202: Determine the actual data volume of the batch task.

Taking the batch deduction business in the credit business as an example, when batch deductions are needed, the batch processing system of the financial institution can calculate the current repayment amount of each repayment transaction from each user's borrowing amount, number of repayment periods, and interest calculation rules. Then, over all users included in this batch deduction, it calculates the total number of deductions and the total deduction amount. The calculated total number of deductions and total deduction amount can be understood as the actual data volume of a batch task such as the batch deduction business in this embodiment. That is, when it is determined that the batch task needs to be processed, the actual amount of data to be processed in this batch run can be calculated first. The actual data volume in the embodiments of the present application is the actual batch processing basis calculated by the batch processing system according to the existing system rules.
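As a rough illustration of this calculation, the sketch below computes the number of deductions and the total deduction amount for a list of loans. The interest rule is an assumption made only for the example (equal principal per period plus flat interest on the full principal), as are the names `Loan`, `current_repayment`, and `actual_data_volume`. The real rule is whatever the existing system defines.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Loan:
    principal: Decimal    # amount borrowed
    periods: int          # number of repayment periods
    period_rate: Decimal  # interest rate per period (assumed rule)


def current_repayment(loan: Loan) -> Decimal:
    """Assumed rule: equal principal share plus flat per-period interest."""
    amount = loan.principal / loan.periods + loan.principal * loan.period_rate
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def actual_data_volume(loans: Iterable[Loan]) -> Tuple[int, Decimal]:
    """Return (number of deductions, total deduction amount) for this run."""
    amounts = [current_repayment(loan) for loan in loans]
    return len(amounts), sum(amounts, Decimal("0"))
```

For two hypothetical loans, 12,000 over 12 periods at 1% per period and 6,000 over 6 periods at 0.5% per period, the repayments are 1,120.00 and 1,030.00, so the actual data volume is 2 deductions totalling 2,150.00.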

Step 203: According to the historical batch processing data of the batch task, determine the data volume reference range for batch processing of the batch task.

Batch processing involves a large amount of calculation and generally takes a long time. If an abnormality occurs during this period, the calculated data for the batch deduction may be wrong. For example, the calculated deduction amount may be more than the user should repay this time, which may lead to user complaints, or it may be less than the user should repay, which may cause losses to the bank. Moreover, because processing is done in batch, if the calculation is wrong for one user, the same problem generally occurs for every other user in the batch. The larger the number of deductions in the batch, the greater the resulting error.

In view of this, to monitor possible abnormalities in the batch processing system and ensure as far as possible that batch tasks are processed correctly and effectively, the embodiments of this application use historical batch processing data as the reference against which the current processing of the batch task is compared. Historical big data indicates, to a certain extent, the overall processing situation and trend of the batch task over a recent period, so the processing of the batch task can be predicted more accurately. The batch processing system can then make an accurate batch processing decision, that is, whether to block or execute the batch, which improves the effectiveness of batch processing.

For this reason, in the embodiments of the present application, for batch tasks of the same type, the historical batch processing data of the batch task can be used to determine the data volume reference range for this batch processing, and the data volume reference range is then used as the basis of comparison for determining whether this batch processing is abnormal. Taking the batch deduction business as an example, the historical batch processing data of all loan users in the last month can be obtained, or the historical batch processing data of all loan users covered by the last 100 deduction time points can be obtained, and the data volume reference range for this batch processing is then predicted from these data. Batch tasks that were previously executed successfully and correctly were generally executed while the batch processing system was in a normal state, so predicting the next batch run from the processing data of a large number of such tasks provides meaningful guidance. In other words, the data volume reference range in the embodiments of the present application can be regarded as the approximate data volume range when the batch processing system performs batch processing normally.

In a specific implementation, step 202 and step 203 can be executed in any order. For example, step 202 can be executed before step 203, step 203 can be executed before step 202, or both steps can be executed simultaneously. The embodiments of this application do not limit this.

Step 204: The actual data volume is compared with the data volume reference range to determine whether the actual data volume is within the data volume reference range.

After the data volume reference range is obtained, the data volume reference range can be used as a comparison basis to determine whether the actual data volume is within the normal range, so as to predict and guide the later batch processing through batch processing of historical data.

In the embodiments of the present application, the actual data volume and the data volume reference range can each include the number of transactions in the batch task and the total amount corresponding to all of those transactions. Continuing with the batch deduction business as an example, the actual data volume and the data volume reference range can include the number of deductions to be performed and the total deduction amount corresponding to all deductions. In this way, the total amount can be used to reflect whether individual deductions have gone wrong. In general, the batch processing system uses the same calculation and processing method for every deduction, so if one deduction is calculated incorrectly, the other deductions in the same batch will show similar errors. The error in a single user's deduction may not be obvious. For example, if a user's interest is miscalculated by 5 yuan and this batch contains 1,000 deductions, the total error across the 1,000 deductions may reach thousands or even tens of thousands of yuan. The total amount therefore makes calculation errors more visible and easier to monitor, so that a system abnormality can be identified and the effectiveness of abnormality monitoring is improved.
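The effect described here can be shown with a small check that requires both the transaction count and the total amount to fall within their ranges. The function name `volume_in_range`, the ranges, and the figures are hypothetical and chosen only to mirror the 5 yuan example.

```python
from typing import Tuple

Range = Tuple[float, float]


def volume_in_range(count: int, total: float,
                    count_range: Range, total_range: Range) -> bool:
    """Both the transaction count and the total amount must be in range."""
    count_ok = count_range[0] <= count <= count_range[1]
    total_ok = total_range[0] <= total <= total_range[1]
    return count_ok and total_ok


# Hypothetical figures: 1,000 deductions of 1,000 yuan each are expected.
expected_total = 1_000 * 1_000.0
# Every deduction is overcharged by 5 yuan, so the total is off by 5,000 yuan.
faulty_total = 1_000 * (1_000.0 + 5.0)
```

With an assumed count range of 990 to 1,010 and a total range of 997,000 to 1,003,000, the faulty run passes the count check but fails the total check, so it would be blocked even though each individual 5 yuan error is small.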

Step 205: When the actual data amount is within the reference range of the data amount, batch processing is performed on the batch task according to the actual data amount.

If the actual data volume of the batch task is within the data volume reference range, the batch processing data calculated by the existing batch processing system is within the normal range, which indicates that the batch processing system is operating normally, that is, no exception has occurred. Batch processing can therefore be performed on the batch task according to the actual data volume, for example by deducting the current repayment amounts of 300 loan users at the same time.

Step 206: When the actual data amount is not within the reference range of the data amount, block this batch processing of the batch task.

If the actual data volume is not within the data volume reference range, this batch processing differs substantially from previous processing. A difference of this kind would require a large, abrupt change in the deduction business itself, which is generally inconsistent with the normally smooth characteristics of the business, so it most likely indicates that the batch processing system has become abnormal. In this case, to ensure that the batch task is executed accurately, this batch execution can be blocked promptly. After the batch task is blocked, blocking alarm information can be output to warn staff of the business interruption, so that they can confirm promptly whether the batch system is indeed abnormal and, if so, carry out system maintenance and repair to eliminate the abnormality and restore the normal business capacity of the batch system as soon as possible. At the same time, the calculation can be performed again, that is, a second batch processing calculation is carried out, so that the batch task is processed in time.

In the embodiments of this application, historical batch processing data is used as a reference that reflects, to a certain extent, the likelihood of an abnormality in the current batch task. Abnormalities in the batch processing system can thus be predicted effectively, ensuring that batch tasks are carried out accurately and effectively.
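The block, alarm, and recalculation behavior can be sketched as follows. This is an assumed arrangement rather than the described system: the alarm is modeled as a log warning, the recalculation as a bounded retry, and the names `run_with_blocking`, `compute_volume`, and `max_attempts` are hypothetical.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("batch_monitor")


def run_with_blocking(compute_volume: Callable[[], float],
                      lower: float, upper: float,
                      max_attempts: int = 2) -> Tuple[bool, float]:
    """Compute the actual volume; block and warn when it is out of range,
    then recalculate, up to max_attempts times in total.

    Returns (allowed, last_volume). allowed is False when every attempt
    was blocked.
    """
    volume = float("nan")
    for attempt in range(1, max_attempts + 1):
        volume = compute_volume()
        if lower <= volume <= upper:
            return True, volume
        # Stand-in for the blocking alarm sent to staff.
        logger.warning("Batch blocked on attempt %d: volume %s outside [%s, %s]",
                       attempt, volume, lower, upper)
    return False, volume
```

If the first calculation yields 5000 and the second yields 1000 against a range of 900 to 1100, the first attempt is blocked with a warning and the second is allowed to proceed.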

In the specific implementation process, for the manner in which the data volume reference range is determined according to the historical batch processing data of the batch task in the above step 203, the embodiment of the present application provides the following two implementation manners.

First determination method

In the first determination method, the historical batch processing data corresponding to the batch task is analyzed with machine learning techniques. A batch task prediction model is trained through quantitative analysis of the massive data, and the trained model is then used to predict the data volume reference range for each batch processing run of the task. The following describes the prediction process based on the batch task prediction model.

The following first introduces the training process of training a batch task prediction model based on historical batch processing data.

The model training process may include feature selection, model selection, and model training based on the selected features, which are described in detail below.

1) Feature selection. Feature selection is crucial to the construction of machine learning models. Good features can improve the performance of the model and help understand the characteristics and underlying structure of the data. This plays an important role in further improving the model and algorithm. However, using too many variables as model training features may cause the model to become inaccurate, especially when there are model training features that have no effect on the output result or have a greater impact on other variables. Take the bulk deduction business as an example. The independent variable characteristics that have an associated influence on the self-deduction data include the number of accounts (the number of all loan accounts), the number of loan data (the number of all loans), and the number of overdue loan data (there are overdue repayments). Loan number), loan type, installment method, user credit rating, installment period, loan balance, etc. Among so many independent variable features, how to select some specific features as model training features for convenience It is a problem that needs to be considered. In order to avoid the influence of too many variables on the accuracy of model training, in the embodiments of this application, the correlation degree is used as a screening basis to select features for model training. For example, a preset selection strategy can be used to The feature of the reference object is selected, and the preset selection strategy is based on the degree of relevance.

In a possible implementation, the type of data to be processed in the batch task can be parsed first. Taking the batch deduction business as an example, the type of data to be processed can be understood as self-deduction data, and the data volume corresponding to batch processing of this data type can be understood as the total amount of automatic batch deductions. Then, all feature objects that have an association relationship (such as a positive or negative influence) with the type of data to be processed are determined. The feature objects are the independent variable features listed above, such as the number of accounts and the number of loan records. Next, for each feature object (which can be regarded as an independent variable), the correlation between that feature object and the data volume corresponding to batch processing of the data type (which can be regarded as the dependent variable) is calculated. For example, the Pearson correlation coefficient can be used to calculate the correlation between each independent variable and the dependent variable, or other methods of calculating correlation can be used. After the correlation degree of each feature object is obtained, the feature objects whose correlation meets a preset screening condition can be selected from all feature objects as the input features for the final model training. A feature object that is finally used for model training is called a reference feature object.

For example, a feature object whose correlation is greater than or equal to a predetermined correlation threshold can be used as a reference feature object. To ensure that the selected reference feature objects are strongly correlated with the dependent variable, the predetermined correlation threshold can be set relatively high, for example, to 80%. As another example, a predetermined number of top-ranked feature objects can be determined as reference feature objects according to the order of correlation. That is, the number of reference feature objects can be set first, such as 4, and the 4 feature objects with the highest correlation are then selected as the final reference feature objects.
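The two screening conditions described above (a correlation threshold, or a fixed number of top-ranked features) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `pearson` and `select_reference_features`, the feature names, and all numeric values are hypothetical and were made up for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)

def select_reference_features(features, target, threshold=None, top_k=None):
    """Rank feature objects by correlation with the target and filter them.

    Pass `threshold` to keep features whose correlation is >= threshold,
    or `top_k` to keep the k most correlated features.
    """
    scores = {name: pearson(values, target) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    if threshold is not None:
        return [name for name in ranked if scores[name] >= threshold]
    return ranked[:top_k]

# Hypothetical per-period history; every number here is illustrative only.
history = {
    "account_count":      [100, 110, 120, 135, 150, 160],
    "loan_count":         [200, 215, 240, 262, 290, 310],
    "overdue_loan_count": [20, 22, 23, 27, 30, 31],
    "credit_rating":      [3, 1, 4, 1, 5, 2],
}
deduction_total = [1000, 1080, 1190, 1320, 1460, 1550]

by_threshold = select_reference_features(history, deduction_total, threshold=0.8)
by_rank = select_reference_features(history, deduction_total, top_k=2)
```

With this made-up data, the weakly correlated `credit_rating` feature is dropped by the 0.8 threshold, while the three count features are kept.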

In another possible implementation manner, the user can preset predetermined feature objects as the reference feature objects based on experience. For example, experience may show that three predetermined feature objects, namely the number of accounts, the number of loan records, and the number of overdue loan records, are strongly correlated with the self-deduction data, so these three predetermined feature objects can be selected directly from all feature objects as the final reference feature objects.

The higher the correlation degree, the stronger the positive correlation between the corresponding independent variable feature and the dependent variable, which means that the independent variable feature has a greater influence on the dependent variable. Therefore, using only reference feature objects with high correlation for model training keeps the number of training features as small as possible while preserving the accuracy of the model, which further improves the accuracy of model training.

2) Model selection. For model selection, the embodiments of this application adopt the idea of regression fitting and select an appropriate regression function by observing the data distribution of the dependent variable and the independent variables. Analysis shows an obvious linear relationship between the self-deduction data and the selected, strongly correlated reference feature objects: the volume of self-deduction data grows as the reference feature objects grow, and the growth is linear. In the batch deduction business, for example, the data is expected to grow stably and linearly. In other words, the self-deduction data has an obvious explicit relationship with the growth of each reference feature object. Therefore, a multiple linear regression (MLR) model can be used for data prediction, that is, a multiple linear regression model can be selected as the initial model for model training. Multiple linear regression uses known data to find a linear equation that describes the relationship between two or more features (independent variables) and the output (dependent variable), and uses this linear equation to predict results.

The mathematical form of multiple linear regression is as follows:

y=b0+b1x1+b2x2+b3x3+...+bnxn.

In the above formula, y represents the dependent variable; x1, x2, x3, ..., xn represent the independent variables; and b1, b2, b3, ..., bn are the coefficients of the corresponding independent variables. These coefficients can also be understood as the weights of the independent variables, and a weight reflects the influence of its independent variable on the dependent variable. For example, the higher the correlation degree, the greater the weight of the independent variable, indicating a greater influence on the dependent variable. b0 can be understood as a user-defined constant that can be set to different values depending on the type of dependent variable. In a specific implementation, b0 can also be set to 0.

3) Training the model. After the initial model is selected, it can be trained according to the selected reference feature objects to obtain a trained batch task prediction model.

First, multiple historical time periods can be determined from the historical batch data. For example, if the historical batch data covers one month, the month can be divided at a fixed interval (for example, 5 days) into six historical time periods of equal length: Day 1 to Day 5 is the first historical time period, Day 6 to Day 10 is the second, Day 11 to Day 15 is the third, Day 16 to Day 20 is the fourth, Day 21 to Day 25 is the fifth, and Day 26 to Day 30 is the sixth.

Then, the data values corresponding to each reference feature object in each historical time period are extracted. For example, the number of accounts, the number of loan records, and the number of overdue loan records can be extracted for each historical time period, which yields 6 sets of data, each consisting of the number of accounts, the number of loan records, and the number of overdue loan records.
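The period splitting and per-period extraction described above can be sketched as follows. The text does not say how a feature's value for a period is derived from daily data, so this sketch assumes the value on the last day of the period; the function names, the record layout, and all numbers are hypothetical.

```python
def split_into_periods(daily_records, period_length=5):
    """Split chronologically ordered daily records into consecutive,
    equal-length historical time periods."""
    return [
        daily_records[start:start + period_length]
        for start in range(0, len(daily_records), period_length)
    ]

def extract_period_values(periods, feature_names):
    """For each period, take each feature's value on the period's last day
    (an assumption made for this sketch)."""
    return [
        {name: period[-1][name] for name in feature_names}
        for period in periods
    ]

# Hypothetical 30 days of history; the growth pattern is made up.
daily = [
    {
        "day": d,
        "account_count": 1000 + 10 * d,
        "loan_count": 2000 + 25 * d,
        "overdue_loan_count": 100 + d,
    }
    for d in range(1, 31)
]

periods = split_into_periods(daily, period_length=5)
samples = extract_period_values(
    periods, ["account_count", "loan_count", "overdue_loan_count"]
)
```

Here `periods` holds six 5-day windows and `samples` holds the six corresponding sets of feature values that would feed model training.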

Further, the initial batch task prediction model (that is, the aforementioned multiple linear regression model) can be trained according to the data values corresponding to each reference feature object in each historical time period to obtain the trained batch task prediction model. Specifically, the above 6 sets of data can be substituted for the independent variables x1, x2, and x3 in the multiple linear regression equation, so that b1, b2, and b3, that is, the weights of the independent variables x1, x2, and x3, can be obtained. For example, if the calculated b1, b2, and b3 are 0.4, 0.6, and 0.5 respectively, the batch task prediction model obtained by training is: y=b0+0.4x1+0.6x2+0.5x3.

It should be noted that the above is only a simplified description of the model training process. A specific model training process generally includes multiple rounds of iterative training. For example, b1, b2, and b3 can be calculated by the least squares method, which is not elaborated here.
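As one way to picture the least squares step, the sketch below fits b1, b2, and b3 with b0 held fixed by solving the normal equations in plain Python. It is an illustration under stated assumptions, not the training procedure of the embodiment: the helper names and sample values are hypothetical, and the targets are generated synthetically from the example weights 0.4, 0.6, and 0.5 so that the fit can be checked. In practice a library routine such as `numpy.linalg.lstsq` would normally be used instead of a hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_weights(samples, targets, b0=0.0):
    """Least squares fit of b1..bn in y = b0 + b1*x1 + ... + bn*xn,
    with the constant b0 fixed by the caller."""
    n = len(samples[0])
    shifted = [y - b0 for y in targets]
    # Normal equations: (X^T X) b = X^T (y - b0)
    xtx = [[sum(s[i] * s[j] for s in samples) for j in range(n)] for i in range(n)]
    xty = [sum(s[i] * t for s, t in zip(samples, shifted)) for i in range(n)]
    return solve_linear_system(xtx, xty)

# Six hypothetical sets of (accounts, loan records, overdue loan records).
samples = [
    [100, 200, 20],
    [120, 210, 35],
    [90, 260, 25],
    [150, 230, 10],
    [130, 300, 40],
    [110, 280, 15],
]
true_weights = [0.4, 0.6, 0.5]
# Synthetic targets built from the example weights, with b0 = 0.
targets = [sum(w * x for w, x in zip(true_weights, row)) for row in samples]

weights = fit_weights(samples, targets)
```

Because the synthetic targets are exactly linear in the features, the fitted `weights` match the example weights up to floating-point error; real historical data would leave a residual.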

In addition, in the batch task prediction model obtained by training (that is, y=b0+0.4x1+0.6x2+0.5x3), x1, x2, and x3 can represent the actual quantities of the reference feature objects, or they can represent the object increments of the reference feature objects. In a specific implementation, the meaning of x1, x2, and x3 can be defined according to actual business needs.

After the batch task prediction model for the batch task is obtained as described above, the model can be used to predict the data volume reference range for this batch processing of the batch task. The prediction process is explained below in conjunction with the flow shown in FIG. 3.

Step 301: parse out the data type to be processed in the batch task.

As mentioned above, depending on the type of batch task, the corresponding data type to be processed may also be different. Taking the automatic batch deduction service as an example, the type of data to be processed is, for example, self-deducted data.

Step 302: Determine the reference feature object corresponding to the type of data to be processed in the historical batch data.

Continuing the above batch deduction business as an example, the reference feature objects corresponding to the self-deduction data are, for example, the number of accounts, the number of debit data, the number of overdue debit data, and the loan balance mentioned in the foregoing embodiment.

Step 303: Retrieve the target data value corresponding to each reference feature object.

The target data value corresponding to a reference feature object represents the data value of that reference feature object within a preset period of time. For example, if historical batch processing data within 1 month is used, the target data value corresponding to each reference feature object is the data value of that reference feature object within that month.

Step 304: Determine the amount of reference data corresponding to each reference feature object according to the pre-trained batch task prediction model and the target data value corresponding to each reference feature object.

Based on the batch task prediction model obtained by the above training, that is, y=b0+0.4x1+0.6x2+0.5x3, the target data value corresponding to each reference feature object can be substituted for x1, x2, and x3 in the formula, which yields the values of 0.4x1, 0.6x2, and 0.5x3. These values are the reference data amounts corresponding to the respective reference feature objects.

Step 305: Determine the data volume reference threshold value of the batch task according to the reference data volume corresponding to each reference feature object, so as to obtain the data volume reference range for processing the batch task.

Because b0 is a user-defined constant, the value of y can be calculated from the reference data amounts corresponding to the reference feature objects obtained above, which gives the data volume reference threshold of the batch task. The data volume reference range corresponding to the batch task can then be determined according to preset threshold range conditions.
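Steps 304 and 305 can be sketched as follows. The text does not specify the threshold range conditions, so this sketch assumes a symmetric tolerance of plus or minus 10% around the threshold; the function name, the `tolerance` parameter, and the target data values are hypothetical, while the weights are the example values 0.4, 0.6, and 0.5.

```python
def predict_reference_range(weights, target_values, b0=0.0, tolerance=0.10):
    """Return the per-feature reference data amounts, the data volume
    reference threshold y, and an assumed (lower, upper) reference range."""
    # Step 304: reference data amount of each reference feature object.
    reference_amounts = [w * x for w, x in zip(weights, target_values)]
    # Step 305: y = b0 + sum of the reference data amounts.
    threshold = b0 + sum(reference_amounts)
    # Assumed range condition: threshold +/- tolerance (not from the source).
    lower = threshold * (1 - tolerance)
    upper = threshold * (1 + tolerance)
    return reference_amounts, threshold, (lower, upper)

# Hypothetical target data values for accounts, loan records, overdue records.
amounts, threshold, (low, high) = predict_reference_range(
    weights=[0.4, 0.6, 0.5],
    target_values=[1000, 2000, 300],
)
```

A different business might prefer an asymmetric range or an absolute margin; only the threshold computation follows the formula in the text.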

As mentioned above, the independent variables x1, x2, and x3 in the trained batch task prediction model can represent the object increments of the corresponding reference feature objects. In that case, the object increment of each reference feature object within a first predetermined time period (for example, the increment within 1 month) can be determined first. Then, according to the batch task prediction model and these object increments, the data increment corresponding to each reference feature object is determined. Finally, the reference data amount corresponding to each reference feature object is determined according to the data amount of that reference feature object within a second predetermined time period (for example, the data amount of the last run before this batch task processing, or the average data amount of the last several runs) and the data increment within the first predetermined time period. Further, the average growth rate of the data volume of the batch task within a third predetermined time period (for example, 6 months) is determined, and the final data volume reference threshold of the batch task is determined according to the average growth rate and the reference data amount corresponding to each reference feature object.

Following the idea of the above incremental calculation, the resulting prediction formula is, for example: deduction data increment of the day = average growth rate over 6 months + number of new accounts last month * account weight + number of new loan records last month * loan record weight + new loan balance last month * loan balance weight. In terms of the above batch task prediction model, y=b0+0.4x1+0.6x2+0.5x3, b0 is the average growth rate over 6 months, which is a known constant; x1, x2, and x3 represent the number of new accounts last month, the number of new loan records last month, and the new loan balance last month, respectively; and the account weight, loan record weight, and loan balance weight are 0.4, 0.6, and 0.5. This formula predicts the increment of batch deduction data on the day (that is, the data increment of this batch task). Adding this increment to the actual deduction data amount of the last batch task (or to the average actual deduction data amount of the last several batch tasks) yields the data volume reference threshold of this batch task, thereby enabling accurate prediction of the total deduction amount of the batch automatic deduction business.
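The incremental formula can be illustrated with a short sketch. The weights 0.4, 0.6, and 0.5 are the example values from the text; the function names, the dictionary keys, the growth term, and every input number are hypothetical.

```python
# Example weights from the text; the key names are made up for this sketch.
WEIGHTS = {"new_accounts": 0.4, "new_loan_records": 0.6, "new_loan_balance": 0.5}

def predict_increment(b0, last_month, weights=WEIGHTS):
    """Predicted deduction data increment: b0 + sum(weight * new quantity)."""
    return b0 + sum(weights[name] * last_month[name] for name in weights)

def reference_threshold(previous_actuals, increment):
    """Add the increment to the last actual amount, or to the average of
    the last several actual amounts when more than one is supplied."""
    baseline = sum(previous_actuals) / len(previous_actuals)
    return baseline + increment

# Hypothetical inputs: growth term b0 and last month's new quantities.
last_month = {"new_accounts": 100, "new_loan_records": 200, "new_loan_balance": 30}
increment = predict_increment(b0=50.0, last_month=last_month)

# Baseline taken from the last two batch runs (illustrative values).
threshold = reference_threshold([10000.0, 10200.0], increment)
```

Passing a single-element list to `reference_threshold` corresponds to using only the last batch task as the baseline.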

In the first determination method, machine learning is used to analyze, generalize, and apply historical data, and the model trained by machine learning can then be used to make effective predictions, which improves the intelligence of the entire batch processing system. Prediction with the trained model is also efficient, which improves the processing efficiency and timeliness of batch tasks.

The second way to determine

Historical batch processing statistics for multiple batch tasks that have been processed within a predetermined period of time (for example, 1 month, 10 days, or 15 days) can be obtained, and the statistics can then be processed using a predetermined processing method, for example, using the aforementioned machine learning calculation idea, to dynamically calculate the data volume reference range for this batch task.

That is to say, in the second determination method, the data volume reference range can be predicted dynamically and in real time by the algorithm. When making the prediction, the corresponding predetermined time period can be set flexibly. For example, counting back from the batch deduction time, the historical business processing data of the most recent 500 batch deductions is used as the calculation basis for this prediction. Each item of historical business processing data is then the latest historical data, so the batch task processing closest in time is fully taken into consideration. Following the principle that data closer in time is more strongly correlated, this method can ensure the accuracy of the prediction to a certain extent.
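A sliding-window version of this idea can be sketched as follows. The text only says that the most recent batches (for example, the latest 500) form the calculation basis; the statistic used here, the mean plus or minus k standard deviations, is a simple stand-in chosen for this sketch and is not stated in the source. The function name, the parameter `k`, and the data are hypothetical.

```python
from statistics import mean, stdev

def dynamic_reference_range(history, window=500, k=3.0):
    """Compute a reference range from the most recent `window` data volumes.

    `history` is ordered oldest first. The mean +/- k * stdev rule is an
    assumption of this sketch, not the method described in the text.
    """
    recent = history[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two historical batches")
    center = mean(recent)
    spread = stdev(recent)
    return center - k * spread, center + k * spread

# Hypothetical history: 1000 stale batches followed by 500 recent ones.
stale = [0.0] * 1000
recent = [100.0, 102.0, 98.0, 101.0, 99.0] * 100
low, high = dynamic_reference_range(stale + recent, window=500)
```

Only the last 500 entries influence the result, so the stale values at the start of the history are ignored.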

In the embodiments of the present application, historical batch processing can be fully taken into consideration, that is, based on historical data, data features are mined from historical massive data, and batch decisions are output by analyzing these data features. In this way, the accuracy of batch prediction can be improved, the batch operation can be monitored in real time, and the batch can be blocked in time if an abnormality is found to ensure the correctness of the data batch processing, thereby avoiding the loss caused by the batch exception.

Based on the same inventive concept, an embodiment of the present application provides an apparatus for processing batch tasks. The apparatus for processing batch tasks can implement the method for processing batch tasks in the foregoing embodiments. As shown in FIG. 4, the apparatus for processing batch tasks in the embodiment of the present application includes a first determination module 401, a second determination module 402, a third determination module 403, a batch blocking module 404, and a batch execution module 405, where:

The first determining module 401 is configured to determine whether the batch task meets preset batch processing trigger conditions;

The second determining module 402 is configured to determine the actual data volume of the batch task when the batch processing trigger condition is met;

The third determining module 403 is configured to determine the reference range of the data volume for batch processing of the batch task according to the historical batch processing data of the batch task;

The batch blocking module 404 is configured to block the current batch processing of the batch task if the actual data amount is not within the data amount reference range;

The batch execution module 405 is configured to perform batch processing on the batch tasks according to the actual data volume if the actual data volume is within the data volume reference range.
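The cooperation of modules 401 to 405 can be pictured as a single gate function. This is only a sketch of the control flow: the callables stand in for the modules, the names `BatchDecision` and `process_batch` are hypothetical, and treating the range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class BatchDecision:
    executed: bool
    reason: str

def process_batch(
    trigger_met: Callable[[], bool],
    actual_volume: Callable[[], float],
    reference_range: Callable[[], Tuple[float, float]],
    execute: Callable[[float], None],
) -> BatchDecision:
    """Run one batch only if its actual data volume is inside the range."""
    if not trigger_met():                      # first determining module 401
        return BatchDecision(False, "trigger condition not met")
    volume = actual_volume()                   # second determining module 402
    low, high = reference_range()              # third determining module 403
    if not (low <= volume <= high):            # batch blocking module 404
        return BatchDecision(False, f"blocked: {volume} outside [{low}, {high}]")
    execute(volume)                            # batch execution module 405
    return BatchDecision(True, "executed")

# Illustrative run with made-up volumes and range.
executed_volumes = []
ok = process_batch(
    lambda: True, lambda: 1800.0, lambda: (1575.0, 1925.0), executed_volumes.append
)
blocked = process_batch(
    lambda: True, lambda: 5000.0, lambda: (1575.0, 1925.0), executed_volumes.append
)
```

In this run the first batch is executed and the second is blocked, so only the first volume reaches the execution callback.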

In a possible implementation manner, the third determining module 403 is configured to:

Analyze the type of data to be processed in the batch task;

In the historical batch data, determine the reference feature object corresponding to the data type to be processed, and retrieve the target data value corresponding to each reference feature object. The target data value is used to represent the data value corresponding to the reference feature object within a preset time period;

According to the pre-trained batch task prediction model corresponding to the batch task and the target data value corresponding to each reference feature object, determine the reference data amount corresponding to each reference feature object, and determine the data volume reference threshold of the batch task according to the reference data amount corresponding to each reference feature object; wherein the batch task prediction model is trained based on the data values corresponding to the reference feature objects in the historical batch data;

Determine the data volume reference range according to the data volume reference threshold of the batch task.

In a possible design, the device for processing batch tasks in the embodiment of the present application further includes a model training module 406, which is used to:

From all the feature objects included in the historical batch data, determine the reference feature object according to a preset selection strategy; wherein, each feature object has an association relationship with the data volume corresponding to the batch processing of the data type to be processed;

Determine multiple historical time periods from historical batch data, and extract the data values corresponding to each reference feature object in each historical time period;

According to the data value corresponding to each reference feature object in each historical time period, the initial batch task prediction model is trained to obtain the trained batch task prediction model.

In one possible design, the model training module 406 is used to:

Determine the correlation between each feature object and the data volume corresponding to the batch processing of the data type to be processed;

The feature object whose correlation degree meets the preset filtering condition is determined as the reference feature object.

In one possible design, the model training module 406 is used to:

All feature objects whose correlation degree is greater than a predetermined correlation degree threshold are determined as reference feature objects; or,

According to the descending order of the correlation degree, a predetermined number of feature objects located in the front are determined as reference feature objects.

In one possible design, the model training module 406 is used to:

Determine the object increment of each reference feature object within the first predetermined time period;

According to the data amount of each reference feature object within the second predetermined time period and the data increment within the first predetermined time period, the reference data amount corresponding to each reference feature object is determined.

In one possible design, the model training module 406 is used to:

Determine the average growth rate of the data volume of the batch task within the third predetermined time period;

According to the average growth rate and the reference data volume corresponding to each reference feature object, the data volume reference threshold of the batch task is determined.

In a possible design, the actual data volume and the data volume reference range both include the number of tasks in the batch and the total amount corresponding to all of those tasks.
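When the data volume has two components, the range check can be sketched as below. The text does not say how the two components combine, so this sketch assumes that both the task count and the total amount must fall within their respective bounds; the type names, field names, and values are hypothetical.

```python
from typing import NamedTuple

class Volume(NamedTuple):
    task_count: int        # number of tasks in the batch
    total_amount: float    # total amount corresponding to those tasks

class VolumeRange(NamedTuple):
    low: Volume
    high: Volume

def within_range(actual: Volume, ref: VolumeRange) -> bool:
    """True only if both components are inside their bounds (assumed rule)."""
    count_ok = ref.low.task_count <= actual.task_count <= ref.high.task_count
    amount_ok = ref.low.total_amount <= actual.total_amount <= ref.high.total_amount
    return count_ok and amount_ok

# Illustrative reference range and two candidate batches.
ref = VolumeRange(low=Volume(900, 90_000.0), high=Volume(1100, 110_000.0))
normal = within_range(Volume(1000, 100_000.0), ref)
abnormal = within_range(Volume(1000, 500_000.0), ref)
```

The second batch has a plausible task count but an implausible total amount, which is the kind of anomaly a count-only check would miss.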

All relevant content of the steps involved in the foregoing embodiment of the method for processing batch tasks can be cited in the functional description of the functional module corresponding to the device for processing batch tasks in the embodiment of the present application, and will not be repeated here.

The division of modules in the embodiments of the present application is illustrative and is only a division by logical function. In actual implementation, there may be other division methods. In addition, the functional modules in the various embodiments of the present application may be integrated into one processor, may each exist alone physically, or two or more modules may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software functional modules.

Based on the same inventive concept, an embodiment of the present application also provides a computing device. As shown in FIG. 5, the computing device in the embodiment of the present application includes at least one processor 501, a memory 502 connected to the at least one processor 501, and a communication interface 503. The specific connection medium between the processor 501 and the memory 502 is not limited in the embodiment of the present application. In FIG. 5, the processor 501 and the memory 502 are connected through a bus 500 as an example. The bus 500 is represented by a thick line in FIG. 5, and the connection modes between other components are only schematically illustrated and are not intended to be limiting. The bus 500 can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus.

In the embodiment of the present application, the memory 502 stores instructions that can be executed by the at least one processor 501, and the at least one processor 501 can execute the steps included in the aforementioned method for processing batch tasks by executing the instructions stored in the memory 502.

The processor 501 is the control center of the computing device. It can use various interfaces and lines to connect the various parts of the entire computing device. By running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502, it performs the various functions of the computing device and processes data, thereby monitoring the computing device as a whole. Optionally, the processor 501 may include one or more processing modules, and the processor 501 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may also not be integrated into the processor 501. In some embodiments, the processor 501 and the memory 502 may be implemented on the same chip, and in some embodiments, they may be implemented on separate chips.

The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), or may be a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.

As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example, flash memory, hard disk, multimedia card, card-type memory, Random Access Memory (RAM), Static Random Access Memory (SRAM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic memory, magnetic disk, optical disc, and so on. The memory 502 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 in the embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, used for storing program instructions and/or data.

The communication interface 503 is a transmission interface that can be used for communication, and can receive data or send data through the communication interface 503, and then communicate with other devices.

Referring to the further structural diagram of the computing device shown in FIG. 6, the computing device also includes a basic input/output system (I/O system) 601 that helps to transfer information between the devices in the computing device, and a mass storage device 605 for storing an operating system 602, application programs 603, and other program modules 604.

The basic input/output system 601 includes a display 606 for displaying information and an input device 607 such as a mouse and a keyboard for the user to input information. Both the display 606 and the input device 607 are connected to the processor 501 through a basic input/output system 601 connected to the system bus 500. The basic input/output system 601 may also include an input and output controller for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the I/O controller also provides output to a display screen, printer or other type of output device.

The mass storage device 605 is connected to the processor 501 through a mass storage controller (not shown) connected to the system bus 500. The mass storage device 605 and its associated computer-readable medium provide non-volatile storage for the computing device. That is, the mass storage device 605 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.

According to various embodiments of the present invention, the computing device can also be operated through a remote computer connected to a network such as the Internet. That is, the computing device can be connected to the network 606 through the communication interface 503 connected to the system bus 500; in other words, the communication interface 503 can also be used to connect to other types of networks or remote computer systems (not shown).

Based on the same inventive concept, the embodiments of the present application also provide a storage medium. The storage medium is, for example, a computer-readable storage medium that stores computer instructions. When the computer instructions run on a computer, the computer is caused to perform the steps of the method for processing batch tasks as described above.

In some possible implementation manners, the various aspects of the method for processing batch tasks provided in the embodiments of the present application can also be implemented in the form of a program product that includes program code. When the program product runs on a computer, the program code is used to make the computer execute the steps in the method for processing batch tasks according to the various exemplary embodiments of the present invention described above.

Those skilled in the art should understand that the embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may be in the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program codes.

The present invention is described with reference to flowcharts and/or block diagrams of methods, equipment (systems), and computer program products according to embodiments of the present application. It should be understood that each process and/or block in the flowchart and/or block diagram, and any combination of processes and/or blocks in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. In this way, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims

A method for processing batch tasks, characterized in that the method includes:

Determine whether the batch task meets the preset batch processing trigger conditions;

When the batch processing trigger condition is met, determine the actual data volume of the batch task;

Determine a reference range of data volume for batch processing of the batch task according to the historical batch processing data corresponding to the batch task;

If the actual data amount is not within the data amount reference range, block this batch processing of the batch task;

If the actual data volume is within the data volume reference range, batch processing is performed on the target batch task according to the actual data volume.
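As a non-limiting illustration, the decision flow recited above might be sketched as follows. The names `BatchDecision` and `decide_batch`, and the choice of inclusive range bounds, are assumptions made for this sketch only and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BatchDecision:
    run: bool      # True if this batch run may proceed
    reason: str    # human-readable explanation of the decision


def decide_batch(trigger_met: bool, actual_volume: int,
                 reference_range: Tuple[int, int]) -> BatchDecision:
    """Gate one batch run on the data volume reference range.

    Assumption: the reference range is a closed interval (low, high).
    """
    if not trigger_met:
        return BatchDecision(False, "trigger condition not met")
    low, high = reference_range
    if low <= actual_volume <= high:
        # Within range: process the batch according to the actual volume.
        return BatchDecision(True, "actual volume within reference range")
    # Outside range: block this run of the batch task.
    return BatchDecision(False, "actual volume outside reference range")
```

In this sketch a blocked run is only reported to the caller; how a blocked run is subsequently handled is left open here.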
The method according to claim 1, wherein determining a reference range of a data amount for batch processing of the batch task according to the historical batch data corresponding to the batch task comprises:

Parse out the type of data to be processed in the batch task;

In the historical batch data, determine the reference feature objects corresponding to the data type to be processed, and retrieve the target data value corresponding to each reference feature object, wherein the target data value represents the data value corresponding to the reference feature object within a preset duration;

Determine the reference data amount corresponding to each reference feature object according to the pre-trained batch task prediction model corresponding to the batch task and the target data value corresponding to each reference feature object, and determine the data amount reference threshold of the batch task according to the reference data amount corresponding to each reference feature object; wherein the batch task prediction model is obtained by training according to the data values corresponding to the reference feature objects in the historical batch data;

Determine the data amount reference range according to the data amount reference threshold of the batch task.
The method of claim 2, wherein the batch task prediction model is trained in the following manner:

From all the characteristic objects included in the historical batch data, determine the reference characteristic object according to a preset selection strategy; wherein each characteristic object has an association relationship with the data volume corresponding to the batch processing of the data type to be processed;

Determine a plurality of historical time periods from the historical batch data, and respectively extract data values corresponding to each of the reference feature objects in each historical time period;

According to the data value corresponding to each of the reference feature objects in each historical time period, the initial batch task prediction model is trained to obtain the trained batch task prediction model.
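As a non-limiting illustration, the training step above might be sketched with a single-feature linear model fitted by ordinary least squares. The application does not state the model family; the linear form, the function name `fit_linear`, and the sample values in the usage note are assumptions made for this sketch.

```python
from typing import List, Tuple


def fit_linear(feature_values: List[float],
               batch_volumes: List[float]) -> Tuple[float, float]:
    """Fit batch_volume ~ slope * feature_value + intercept.

    Each index corresponds to one historical time period.
    Assumption: a one-feature linear model stands in for the
    batch task prediction model.
    """
    n = len(feature_values)
    if n < 2 or n != len(batch_volumes):
        raise ValueError("need at least two matched historical periods")
    mean_x = sum(feature_values) / n
    mean_y = sum(batch_volumes) / n
    sxx = sum((x - mean_x) ** 2 for x in feature_values)
    if sxx == 0:
        raise ValueError("feature values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(feature_values, batch_volumes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, with hypothetical feature values `[1, 2, 3, 4]` and batch volumes `[12, 22, 32, 42]` over four historical periods, the fitted slope is 10 and the intercept is 2.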
The method of claim 3, wherein, from all the feature objects included in the historical batch data, determining the reference feature object according to a preset selection strategy comprises:

Determine the correlation between each characteristic object and the data volume corresponding to the batch processing of the data type to be processed;

The feature object whose correlation degree meets the preset screening condition is determined as the reference feature object.
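As a non-limiting illustration, the correlation-based screening above might be sketched using the Pearson correlation coefficient with an absolute-value cutoff. The choice of Pearson correlation, the 0.8 default cutoff, and the names `pearson` and `select_reference_features` are assumptions made for this sketch; the application does not specify the correlation measure or the screening condition.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_reference_features(features: Dict[str, List[float]],
                              batch_volumes: List[float],
                              min_abs_corr: float = 0.8) -> List[str]:
    """Keep feature objects whose |correlation| with the batch volume meets the cutoff."""
    return [name for name, values in features.items()
            if abs(pearson(values, batch_volumes)) >= min_abs_corr]
```

A feature object that moves in step with the batch volume across the historical periods is retained, while one with only a weak relationship is discarded.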
The method of claim 3, wherein, from all the feature objects included in the historical batch data, determining the reference feature object according to a preset selection strategy comprises:

From all the feature objects, a predetermined feature object is selected as the reference feature object.
6. The method according to claim 2, wherein determining the reference data amount corresponding to each of the reference feature objects comprises:

Determining the object increment of each of the reference feature objects within the first predetermined time period;

Determine the data increment corresponding to each reference feature object according to the batch task prediction model and the object increment of each reference feature object within the first predetermined time period;

Determine the reference data amount corresponding to each reference feature object according to the data amount of each reference feature object in the second predetermined time period and the data increment within the first predetermined time period.
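As a non-limiting illustration, the three steps above might be sketched as follows. The application does not state how the data amount of the second predetermined time period and the data increment are combined; this sketch assumes simple addition. The name `reference_data_amount` and the callable `predict_increment`, which stands in for the batch task prediction model, are likewise hypothetical.

```python
from typing import Callable


def reference_data_amount(baseline_amount: float,
                          object_increment: float,
                          predict_increment: Callable[[float], float]) -> float:
    """Estimate the reference data amount for one reference feature object.

    baseline_amount:   data amount observed in the second predetermined period
    object_increment:  change in the feature object over the first predetermined period
    predict_increment: stand-in for the prediction model, mapping an object
                       increment to a data increment (assumption)
    """
    data_increment = predict_increment(object_increment)
    # Assumption: baseline and predicted increment are combined by addition.
    return baseline_amount + data_increment
```

For example, with a baseline of 1000 records, an object increment of 50, and a hypothetical model that predicts two records per unit of object increment, the reference data amount is 1100.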
7. The method according to claim 6, wherein determining the data amount reference threshold of the batch task according to the reference data amount corresponding to each of the reference feature objects comprises:

Determine the average growth rate of the data volume of the batch task within the third predetermined time period;

Determine the data amount reference threshold of the batch task according to the average growth rate and the reference data amount corresponding to each of the reference feature objects.
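As a non-limiting illustration, the threshold determination above might be sketched as follows. The application does not give the formula; this sketch assumes that the average growth rate is the mean of period-over-period relative changes, and that the threshold is the sum of the reference data amounts scaled by one plus that rate. The function names are hypothetical.

```python
from typing import List


def average_growth_rate(volumes: List[float]) -> float:
    """Mean relative change between consecutive periods (assumed definition)."""
    if len(volumes) < 2:
        raise ValueError("need at least two periods")
    rates = [(cur - prev) / prev
             for prev, cur in zip(volumes, volumes[1:])]
    return sum(rates) / len(rates)


def reference_threshold(reference_amounts: List[float],
                        growth_rate: float) -> float:
    """Scale the summed reference data amounts by the average growth rate.

    Assumption: per-object reference amounts are combined by summation.
    """
    return sum(reference_amounts) * (1.0 + growth_rate)
```

For example, batch volumes of 100, 110, and 121 over the third predetermined time period give an average growth rate of about 0.1, so reference data amounts totalling 1500 yield a threshold of about 1650.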
A device for processing batch tasks, characterized in that the device comprises:

The first determining module is used to determine whether the batch task meets the preset batch processing trigger condition;

The second determining module is configured to determine the actual data volume of the batch task when the batch processing trigger condition is satisfied;

The third determining module is configured to determine the reference range of the data volume for batch processing of the batch task according to the historical batch processing data of the batch task;

The batch blocking module is configured to block this batch processing of the batch task if the actual data amount is not within the reference range of the data amount;

The batch execution module is configured to perform batch processing on the target batch task according to the actual data volume if the actual data volume is within the data volume reference range.
A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the processor executes the steps of the method according to any one of claims 1-7.
A storage medium, characterized in that the storage medium stores computer instructions, and when the computer instructions run on a computer, the computer executes the steps of the method according to any one of claims 1-7.