CN113986487A - Breakpoint redraw method and device of batch processing system - Google Patents

Breakpoint redraw method and device of batch processing system

Info

Publication number
CN113986487A
Authority
CN
China
Prior art keywords
sub
server
abnormal
batch processing
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111217552.4A
Other languages
Chinese (zh)
Inventor
周青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202111217552.4A priority Critical patent/CN113986487A/en
Publication of CN113986487A publication Critical patent/CN113986487A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting


Abstract

The embodiment of the invention provides a breakpoint redraw method and device for a batch processing system. The method comprises the following steps: after detecting that a slave server has an execution exception in a sub-process of executing a task, acquiring the exception name of the exception; for any exception type, determining the exception type corresponding to the exception name by judging whether the similarity between the exception name and the exception keyword of that exception type meets a preset threshold; and if the exception type corresponding to the exception name is an optimistic lock exception, performing breakpoint redraw on the sub-process. This avoids the batch processing system being paralyzed by repeated redraw failures caused by blindly redrawing at the breakpoint. Further, the exception type corresponding to the exception name is determined by judging whether the similarity between the exception name and the exception keyword of the exception type meets a preset threshold, which reduces manual effort and improves the speed and accuracy of determining the exception type.

Description

Breakpoint redraw method and device of batch processing system
Technical Field
The embodiment of the invention relates to the field of financial technology (Fintech), and in particular to a breakpoint redraw method and device of a batch processing system, a computing device, and a computer-readable storage medium.
Background
With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech); however, due to the financial industry's requirements on security and real-time performance, higher requirements are also placed on these technologies.
For application scenarios in which data is first stored and then computed, the real-time requirement is not high, while the data scale is large and the computation model is complex, so such scenarios are suited to batch processing. The financial industry, and banks in particular, generally run batch processing jobs. Because of the nature of the financial industry, the processed data must be accurate, secure and never lost, which requires that batch processing systems in the financial industry support breakpoint redraw when an exception occurs, i.e., that batch processing can be resumed at the tasks that have not been completed.
In the prior art, if an execution exception occurs in the batch processing system during batch processing, breakpoint redraw of the batch processing system may fail repeatedly, and the whole batch processing system falls into paralysis.
In view of this, the embodiment of the present invention provides a breakpoint redraw method for a batch processing system, so as to solve the problem of repeated breakpoint redraw failures and enable the batch processing system to operate normally.
Disclosure of Invention
The embodiment of the invention provides a breakpoint redraw method for a batch processing system, which is used for solving the problem of repeated breakpoint redraw failures, so that the batch processing system can run normally.
In a first aspect, an embodiment of the present invention provides a breakpoint redraw method for a batch processing system, including:
after detecting that a slave server has an execution exception in the sub-process of executing a task, acquiring the exception name of the exception; each slave server is used for performing distributed batch processing under the control of a master server; the number of subtasks processed by each slave server under the sub-process is obtained by the master server dividing the task according to the number of slave servers; the slave server is any one of the slave servers;
for any abnormal type, determining the abnormal type corresponding to the abnormal name by judging whether the similarity between the abnormal name and the abnormal keyword of the abnormal type meets a preset threshold value; the preset threshold is obtained through the similarity between the suffix of the historical abnormal name of the abnormal type and the abnormal keyword of the abnormal type;
and if the exception type corresponding to the exception name is an optimistic lock exception, carrying out breakpoint redraw on the sub-process.
After it is detected that the slave server has an execution exception in the sub-process of executing the task, the exception name is acquired, the exception type is determined by judging the exception name, and whether to perform breakpoint redraw is decided according to the exception type; if the exception type is an optimistic lock exception, breakpoint redraw is performed. This avoids the batch processing system being paralyzed by repeated redraw failures caused by blindly redrawing at the breakpoint. Further, since exception names are numerous, the exception types corresponding to exception names cannot be enumerated one by one; instead, the exception type corresponding to an exception name can be determined by judging whether the similarity between the exception name and the exception keyword of an exception type meets a preset threshold. In this way, manual effort is reduced, and the speed and accuracy of determining the exception type are improved.
Optionally, performing breakpoint redraw on the sub-flow, including:
acquiring the execution state of the slave server on the sub-process, and acquiring uncompleted sub-tasks in the slave server if the execution state is a termination state or a failure state; the termination state is that the execution of the sub-process by the slave server is terminated manually; the failure state is failure occurring in the running process of the slave server;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
Optionally, the method further comprises:
if the execution state is a starting state or an unknown state, manually modifying the starting state or the unknown state into a failure state;
acquiring the uncompleted subtasks in the slave server;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
Optionally, the execution state of the sub-process is determined by sub-process execution record, and the incomplete sub-task is determined by sub-process execution context record; the sub-process execution context record is used for recording a next executed sub-task of the sub-process.
Optionally, after detecting that an execution exception occurs in the sub-process execution process of the slave server, the method further includes: pulling up tasks corresponding to the sub-processes;
and carrying out breakpoint redraw on the sub-flow, comprising the following steps:
determining that a task name of the pulled task exists in a task execution record;
searching a sub-process execution record which is associated with the task execution record and has the latest starting time;
determining that the execution state of the sub-process recorded by the sub-process execution meets a breakpoint redraw condition;
performing breakpoint redrawing according to the sub-process execution context record associated with the sub-process execution record; the sub-process execution context record is used for recording a next executed sub-task of the sub-process.
Optionally, the slave servers are obtained by:
determining the slave server from the batch processing system; or,
determining the slave server from an online transaction system according to the performance parameters of each online server in the process of processing online transactions.
Optionally, determining the slave server from the online transaction system according to the performance parameters of each online server in the process of processing online transactions includes:
for any online server, acquiring the performance parameters of the online server, and determining the probability that the online server participates in batch processing and the probability that it does not participate, according to the probability of participating in batch processing and the probability of not participating associated with the interval in which each performance parameter lies; the probability of participating in batch processing and the probability of not participating associated with the interval in which any performance parameter lies are obtained through a statistical model;
and if the probability of participating in batch processing is greater than the probability of not participating in batch processing, determining that the online server is a second slave server.
Optionally, determining the probability that the interval in which the performance parameter is located participates in the batch processing and the probability that the interval does not participate in the batch processing by the following method includes:
dividing any performance parameter of each sample server in sample data into different intervals; the sample data comprises various performance parameters of each sample server and the condition of whether to participate in batch processing;
counting, in the sample data, the proportion of sample servers falling in each interval of each performance parameter among the sample servers that participate in batch processing, and the proportion falling in that interval among the sample servers that do not participate, and using these proportions respectively as the probability of participating in batch processing and the probability of not participating for the interval in which the performance parameter lies.
In a second aspect, an embodiment of the present invention further provides a breakpoint redraw device for a batch processing system, including:
an acquiring unit, configured to acquire the exception name of an exception after detecting that a slave server has an execution exception in the sub-process of executing a task; each slave server is used for performing distributed batch processing under the control of a master server; the number of subtasks processed by each slave server under the sub-process is obtained by the master server dividing the task according to the number of slave servers; the slave server is any one of the slave servers;
a processing unit, configured to determine, for any exception type, the exception type corresponding to the exception name by judging whether the similarity between the exception name and the exception keyword of that exception type meets a preset threshold, the preset threshold being obtained through the similarity between the suffixes of historical exception names of that exception type and the exception keyword of that exception type; and, if the exception type corresponding to the exception name is an optimistic lock exception, to perform breakpoint redraw on the sub-process.
Optionally, the processing unit is specifically configured to:
acquiring the execution state of the slave server on the sub-process, and acquiring uncompleted sub-tasks in the slave server if the execution state is a termination state or a failure state; the termination state is that the execution of the sub-process by the slave server is terminated manually; the failure state is failure occurring in the running process of the slave server;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
Optionally, the processing unit is further configured to:
if the execution state is a starting state or an unknown state, manually modifying the starting state or the unknown state into a failure state;
acquiring the uncompleted subtasks in the slave server;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
Optionally, the execution state of the sub-process is determined by sub-process execution record, and the incomplete sub-task is determined by sub-process execution context record; the sub-process execution context record is used for recording a next executed sub-task of the sub-process.
Optionally, the processing unit is further configured to: pulling up tasks corresponding to the sub-processes;
optionally, the processing unit is specifically configured to: determining that a task name of the pulled task exists in a task execution record;
searching a sub-process execution record which is associated with the task execution record and has the latest starting time;
determining that the execution state of the sub-process recorded by the sub-process execution meets a breakpoint redraw condition;
performing breakpoint redrawing according to the sub-process execution context record associated with the sub-process execution record; the sub-process execution context record is used for recording a next executed sub-task of the sub-process.
Optionally, the processing unit is specifically configured to:
determining the slave server from the batch processing system; or,
determining the slave server from an online transaction system according to the performance parameters of each online server in the process of processing online transactions.
Optionally, the processing unit is specifically configured to:
for any online server, acquiring the performance parameters of the online server, and determining the probability that the online server participates in batch processing and the probability that it does not participate, according to the probability of participating in batch processing and the probability of not participating associated with the interval in which each performance parameter lies; the probability of participating in batch processing and the probability of not participating associated with the interval in which any performance parameter lies are obtained through a statistical model;
and if the probability of participating in batch processing is greater than the probability of not participating in batch processing, determining that the online server is a second slave server.
Optionally, the processing unit is specifically configured to:
dividing any performance parameter of each sample server in sample data into different intervals; the sample data comprises various performance parameters of each sample server and the condition of whether to participate in batch processing;
counting, in the sample data, the proportion of sample servers falling in each interval of each performance parameter among the sample servers that participate in batch processing, and the proportion falling in that interval among the sample servers that do not participate, and using these proportions respectively as the probability of participating in batch processing and the probability of not participating for the interval in which the performance parameter lies.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing a computer program;
and the processor is used for calling the computer program stored in the memory and executing the breakpoint redraw method of the batch processing system listed in any mode according to the obtained program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer-executable program is stored in the computer-readable storage medium, and the computer-executable program is configured to enable a computer to execute the breakpoint redraw method of the batch processing system in any one of the above manners.
After it is detected that the slave server has an execution exception in the sub-process of executing the task, the exception name is acquired, the exception type is determined by judging the exception name, and whether to perform breakpoint redraw is decided according to the exception type; if the exception type is an optimistic lock exception, breakpoint redraw is performed. This avoids the batch processing system being paralyzed by repeated redraw failures caused by blindly redrawing at the breakpoint. Further, since exception names are numerous, the exception types corresponding to exception names cannot be enumerated one by one; instead, the exception type corresponding to an exception name can be determined by judging whether the similarity between the exception name and the exception keyword of an exception type meets a preset threshold. In this way, manual effort is reduced, and the speed and accuracy of determining the exception type are improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 2 is a block diagram of a possible distributed batch processing system according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a possible task execution process according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a database partitioning method according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a file partitioning method according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a breakpoint redraw method according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a method for performing breakpoint redraw on a sub-flow according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a possible task execution record according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a possible method for determining a slave server according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another possible method for determining a slave server according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a breakpoint redraw device of a batch processing system according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
To make the objects, embodiments and advantages of the present application clearer, exemplary embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is to be understood that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of them.
All other embodiments obtained by a person skilled in the art from the exemplary embodiments described herein without inventive effort fall within the scope of the appended claims. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that individual aspects of the disclosure may also constitute a complete embodiment on their own.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and are not necessarily intended to limit the order or sequence of any particular one, Unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
Fig. 1 illustrates an exemplary system architecture to which embodiments of the present invention are applicable. The architecture may be a server 100 that includes a processor 110, a communication interface 120 and a memory 130.
The communication interface 120 is used for communicating with a terminal device, and transceiving information transmitted by the terminal device to implement communication.
The processor 110 is a control center of the server 100, connects various parts of the entire server 100 using various interfaces and routes, performs various functions of the server 100 and processes data by operating or executing software programs and/or modules stored in the memory 130 and calling data stored in the memory 130. Alternatively, processor 110 may include one or more processing units.
The memory 130 may be used to store software programs and modules, and the processor 110 executes various functional applications and data processing by operating the software programs and modules stored in the memory 130. The memory 130 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to a business process, and the like. Further, the memory 130 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
It should be noted that the structure shown in fig. 1 is only an example, and the embodiment of the present invention is not limited thereto.
The server 100 in fig. 1 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
Batch processing refers to a processing mode in which a computer program executes a series of tasks on the basis of a batch of inputs without human intervention. In batch processing, a single execution can process a large amount of data. In general, batch processing can be scheduled in a time period when computing resources are less in demand, so as to make better use of system resources. For example, when a cash withdrawal is made from a payment institution to a bank, the withdrawal is not performed in real time, and in that process the data reconciliation between the payment institution and the bank is performed by batch processing.
First, a possible framework of a distributed batch processing system provided by an embodiment of the present invention is described, as shown in fig. 2.
The distributed batch processing system includes a scheduling system 201, a routing apparatus 202, and a plurality of data centers (such as the first data center 2031 and the second data center 2032 illustrated in fig. 2). Each data center comprises a plurality of servers for processing tasks according to instructions. For example, the first data center 2031 includes a first server 20311, a second server 20312, and a third server 20313; the second data center 2032 includes a fourth server 20314, a fifth server 20315, and a sixth server 20316. The above are merely examples, and embodiments of the present invention are not limited thereto.
The scheduling system 201 primarily schedules and manages the execution of tasks. The scheduling system issues instructions of batch operation, defines rules and attributes of tasks, arranges execution sequence and logic of the tasks and ensures efficient execution of the tasks.
The routing apparatus 202 stores a message queue used for communication between the scheduling system 201 and the data centers. Commonly used message queues include RabbitMQ, RocketMQ, ActiveMQ, Kafka, ZeroMQ and MetaQ; the embodiment of the present invention does not limit the kind of message queue used.
The data center is used for transmitting, accelerating, displaying, calculating and storing data information, and in the embodiment of the invention, the data center is used for executing tasks according to instructions of a scheduling system and correspondingly processing data.
When a batch processing job is required, the scheduling system 201 issues a batch operation instruction that includes the task to be batch processed. The instruction may also specify a preset data center to process the task, in which case the instruction is transmitted to that data center via the routing apparatus 202. If the instruction does not specify a data center, the data center that obtains the instruction through the routing apparatus 202 becomes the data center that processes the task and executes the batch processing job according to the instruction.
The transmission of the command to the first data center 2031 via the routing apparatus 202 will be described as an example.
When an instruction is transmitted to the first data center 2031, the server that first acquires the instruction becomes the master server, and the remaining servers become slave servers. The master server reads the tasks in the database or the file according to the instruction and then divides the tasks into a plurality of subtasks according to the number of slave servers. For example, in fig. 2, when the first server 20311 first acquires the instruction and becomes the master server, the second server 20312 and the third server 20313 become slave servers. The first server 20311 divides the task into two subtasks and distributes them to the two slave servers 20312 and 20313 for processing. If the amount of data to be processed grows day by day, the subtask volume of each slave server can be reduced by increasing the number of slave servers, thereby reducing the time consumed by task processing and improving the processing speed.
The execution process of a task is divided into a plurality of sub-processes. For example, the executed task may be: deducting loan interest from 100 accounts. The task can be divided into two sub-processes. Sub-process 1: calculating the loan interest of each of the 100 accounts. Sub-process 2: deducting the corresponding loan interest from the 100 accounts.
For any sub-process, if the sub-processes are all completed by one server, the workload of the server is undoubtedly increased, the processing time is very long, and the processing accuracy is reduced due to the performance limitation of the server, so that the distributed batch processing system framework shown in fig. 2 can be adopted, and a plurality of servers are adopted to perform tasks.
Fig. 3 illustrates a possible task execution process provided by an embodiment of the present invention.
For any sub-process, the master server divides the task according to the number of slave servers, obtains a plurality of subtasks and distributes them to the slave servers; each divided subtask is approximately S/N in size, where S is the amount of work in the task and N is the number of slave servers. Each slave server acquires the data corresponding to its subtask for processing and, after processing is finished, reports information such as the processing result and the execution state, for example to a remote aggregator; the remote aggregator aggregates the final processing results as the processing result of the sub-process. The processing results of the plurality of sub-processes are then summarized to obtain the processing result of the task.
Taking the execution of sub-process 1 as an example, the master server divides the task into 2 subtasks according to the number of slave servers. Subtask 1: calculating the loan interest of each of the 1st to 50th accounts. Subtask 2: calculating the loan interest of each of the 51st to 100th accounts. The master server distributes the two subtasks to the two slave servers, for example sending subtask 1 to the second server 20312 and subtask 2 to the third server 20313, and notifies the two slave servers to process the data. After the two slave servers finish processing, they report their respective processing results to the remote aggregator, and the remote aggregator aggregates the processing results of the two slave servers as the processing result of sub-process 1.
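To make the division concrete, the following Java sketch splits S work items into N roughly equal subtasks, one per slave server. It is only an illustration of the scheme described above; the class and method names (TaskPartitioner, partition, SubTask) are assumptions introduced for the example and do not come from the patent.

    import java.util.ArrayList;
    import java.util.List;

    public class TaskPartitioner {

        /** A contiguous range of work items, e.g. account indexes [start, end]. */
        public record SubTask(int start, int end) { }

        /**
         * Splits S work items (numbered 1..S) into N subtasks of roughly S/N items each.
         * The last subtask absorbs any remainder.
         */
        public static List<SubTask> partition(int s, int n) {
            List<SubTask> subTasks = new ArrayList<>();
            int chunk = s / n;                 // approximately S/N items per slave server
            for (int i = 0; i < n; i++) {
                int start = i * chunk + 1;
                int end = (i == n - 1) ? s : (i + 1) * chunk;
                subTasks.add(new SubTask(start, end));
            }
            return subTasks;
        }

        public static void main(String[] args) {
            // Sub-process 1 of the example: interest for 100 accounts, 2 slave servers.
            System.out.println(partition(100, 2));   // [SubTask[start=1, end=50], SubTask[start=51, end=100]]
        }
    }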
When the main server divides tasks according to the number of the slave servers, the dividing modes are various, and when the processed data come from the database, the database can be divided; when the processed data is from a file, the file may be divided. The above are merely examples, and embodiments of the present invention are not limited thereto.
The following describes the database partitioning.
Fig. 4 is a flowchart illustrating a database partitioning method according to an embodiment of the present invention. The method comprises the following steps:
step 401, the main server loads data.
And the main server acquires the task, and loads the data related to any sub-process into the memory from the database when the sub-process is executed.
Step 402, the main server performs the division of the subtasks.
And the main server divides the tasks according to the number of the slave servers to obtain a plurality of subtasks. And then respectively sending the subtasks to the N slave servers.
In step 403, the slave server divides the subtask into a plurality of threads and processes the subtask.
The division of the tasks by the main server is process level division, and the division of the subtasks by the auxiliary server is thread level division.
At step 404, the thread loads data for processing.
And the thread loads corresponding data from the database to the memory according to the divided subtasks, so that subsequent data processing is facilitated.
During data loading, if the data volume is too large to be loaded into memory at one time, it can be loaded cyclically in pages. For example, 10 million records cannot be loaded into memory directly, so the data can be loaded in multiple passes, e.g., 10 passes of 1 million records each. On each pass, only the number and position of the records already loaded need to be saved, which ensures that no data is loaded or processed twice. In this way, by increasing the number of passes, the data can be loaded into memory efficiently and accurately.
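The paged loading described above can be sketched as follows. This is a minimal illustration: the page-oriented data-access call (loadPage) and the checkpoint store are both assumptions introduced for the example. The key point from the paragraph is that the already-loaded count/position is persisted after each pass so that no data is loaded or processed twice.

    import java.util.List;

    public class PagedLoader {

        private static final int PAGE_SIZE = 1_000_000;   // e.g. 1 million records per pass

        /** Assumed DAO call: returns up to pageSize records starting at the given offset. */
        interface RecordDao {
            List<String> loadPage(long offset, int pageSize);
        }

        /** Assumed checkpoint store: remembers how many records have already been loaded. */
        interface Checkpoint {
            long loadedCount();
            void save(long loadedCount);
        }

        public static void loadAll(RecordDao dao, Checkpoint checkpoint,
                                   java.util.function.Consumer<List<String>> process) {
            long offset = checkpoint.loadedCount();        // resume from the saved position
            while (true) {
                List<String> page = dao.loadPage(offset, PAGE_SIZE);
                if (page.isEmpty()) {
                    break;                                 // no more data
                }
                process.accept(page);                      // hand the page to thread-level processing
                offset += page.size();
                checkpoint.save(offset);                   // persist count/position so data is not loaded twice
            }
        }
    }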
The following describes the way in which files are divided. Fig. 5 is a flowchart illustrating a file division method according to an embodiment of the present invention.
For any sub-process, it is confirmed from the task that the data is stored in the form of a file. The file is then divided into a plurality of subfiles according to the number of slave servers, and the subfiles are distributed to the slave servers for processing. After the slave servers finish processing, their respective processing results are merged.
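A minimal sketch of the file-division branch, assuming line-oriented data files; the class and file names are illustrative assumptions, and the round-robin distribution of lines is only one possible even split, not a scheme fixed by the patent.

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class FileSplitter {

        /** Splits the input file into n subfiles (one per slave server), distributing lines round-robin. */
        public static void split(Path input, Path outputDir, int n) throws IOException {
            BufferedWriter[] writers = new BufferedWriter[n];
            try (BufferedReader reader = Files.newBufferedReader(input)) {
                for (int i = 0; i < n; i++) {
                    writers[i] = Files.newBufferedWriter(outputDir.resolve("subtask-" + i + ".txt"));
                }
                String line;
                long count = 0;
                while ((line = reader.readLine()) != null) {
                    BufferedWriter w = writers[(int) (count++ % n)];   // even split across slave servers
                    w.write(line);
                    w.newLine();
                }
            } finally {
                for (BufferedWriter w : writers) {
                    if (w != null) {
                        w.close();
                    }
                }
            }
        }
    }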
During the execution of a task, the execution of the task and of each sub-process needs to be monitored. If an execution exception occurs during the execution of the task or a sub-process, breakpoint redraw needs to be performed on the task or the sub-process, i.e., execution is resumed at the position of the subtask that has not been completed. In many cases, when the system performs breakpoint redraw it keeps redrawing but always fails, so the system falls into a prolonged paralyzed state, which greatly affects data processing.
An embodiment of the present invention provides a method for redrawing a breakpoint, as shown in fig. 6, including:
Step 601, after detecting that a slave server has an execution exception in the sub-process of executing a task, acquiring the exception name of the exception; each slave server is used for performing distributed batch processing under the control of a master server; the number of subtasks processed by each slave server under the sub-process is obtained by the master server dividing the task according to the number of slave servers; the slave server is any one of the slave servers;
step 602, for any abnormal type, determining the abnormal type corresponding to the abnormal name by judging whether the similarity between the abnormal name and the abnormal keyword of the abnormal type meets a preset threshold value; the preset threshold is obtained through the similarity between the suffix of the historical abnormal name of the abnormal type and the abnormal keyword of the abnormal type;
step 603, if the exception type corresponding to the exception name is an optimistic lock exception, performing breakpoint redraw on the sub-process.
After it is detected that the slave server has an execution exception in the sub-process of executing the task, the exception name is acquired, the exception type is determined by judging the exception name, and whether to perform breakpoint redraw is decided according to the exception type; if the exception type is an optimistic lock exception, breakpoint redraw is performed. This avoids the batch processing system being paralyzed by repeated redraw failures caused by blindly redrawing at the breakpoint. Further, since exception names are numerous, the exception types corresponding to exception names cannot be enumerated one by one; instead, the exception type corresponding to an exception name can be determined by judging whether the similarity between the exception name and the exception keyword of an exception type meets a preset threshold. In this way, manual effort is reduced, and the speed and accuracy of determining the exception type are improved.
In step 602, the exception type corresponding to the exception name is determined. The exception types include optimistic lock exceptions, program exceptions and various other types. For example, the exception keyword of an optimistic lock exception is OptimisticLock. Exception names are numerous, and although the names differ, the corresponding exception type may be the same, e.g. all optimistic lock exceptions. How to establish a one-to-one correspondence between exception names and exception types is a problem that needs to be solved.
In the embodiment of the present invention, a preset threshold may be obtained by analyzing the similarity between the suffix of the historical exception name of any exception type and the exception keyword of the exception type, and if the similarity between the suffix of the exception name and the exception keyword of the exception type satisfies the preset threshold, the exception type is determined to be the exception type corresponding to the exception name.
For example, suppose that 3 historical exception names are known to be optimistic lock exceptions: javax.persistence.OptimisticLockException, org.hibernate.OptimisticLockException and org.springframework.orm.hibernate5.HibernateOptimisticLockingFailureException. The suffixes of the 3 historical exception names are OptimisticLockException, OptimisticLockException and HibernateOptimisticLockingFailureException, where the first two suffixes are the same. The similarity between each distinct suffix and the exception keyword OptimisticLock of the optimistic lock exception is then calculated, obtaining:
Similarity(OptimisticLockException,OptimisticLock)=0.95;
Similarity(HibernateOptimisticLockingFailureException,OptimisticLock)=0.88;
The two similarity values are combined to obtain the preset threshold for the optimistic lock exception type, for example by taking the average value or the median value; taking the average, the preset threshold for the optimistic lock exception type is (0.95 + 0.88) / 2 = 0.915.
Similarly, the preset threshold corresponding to other exception types may be obtained, for example, the preset threshold corresponding to the program exception is obtained to be 0.9. The above is merely an example and is not intended as a limitation on the aspects of the embodiments of the present invention.
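The threshold derivation described above can be sketched as follows. The patent does not fix a particular similarity metric, so the metric is left abstract here and stubbed with the example values 0.95 and 0.88; the class and method names are assumptions introduced for illustration.

    import java.util.List;
    import java.util.function.ToDoubleBiFunction;

    public class ThresholdCalibrator {

        /**
         * Derives the preset threshold for one exception type by averaging the similarity
         * between each historical exception-name suffix and the type's keyword.
         * The similarity metric itself is an assumption (the patent does not specify one).
         */
        public static double presetThreshold(List<String> historicalSuffixes,
                                             String keyword,
                                             ToDoubleBiFunction<String, String> similarity) {
            return historicalSuffixes.stream()
                    .mapToDouble(suffix -> similarity.applyAsDouble(suffix, keyword))
                    .average()
                    .orElse(0.0);
        }

        public static void main(String[] args) {
            // With the example similarities 0.95 and 0.88 the threshold is (0.95 + 0.88) / 2 = 0.915.
            double threshold = presetThreshold(
                    List.of("OptimisticLockException", "HibernateOptimisticLockingFailureException"),
                    "OptimisticLock",
                    (a, b) -> a.equals("OptimisticLockException") ? 0.95 : 0.88);   // stubbed metric for the example
            System.out.println(threshold);   // approximately 0.915
        }
    }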
After the exception name of the execution exception is acquired in step 601, the similarity between the exception name and the exception keyword of each of the multiple exception types is calculated. For example, the similarity between the exception name and the exception keyword of the optimistic lock exception is calculated, and the similarity between the exception name and the exception keyword of the program exception is calculated. When only one similarity meets the preset threshold of its type, the exception type corresponding to that similarity is determined as the exception type of the exception name; when both similarities meet the preset thresholds of their respective exception types, the exception type corresponding to the larger similarity is taken as the final exception type. The embodiments of the present invention are not limited in this regard.
If the exception type corresponding to the exception name is determined to be optimistic lock exception, directly carrying out breakpoint redraw on the sub-flow; and if the exception type corresponding to the exception name is determined to be program exception, waiting for resending the program package, and then carrying out breakpoint redraw. Therefore, the problems of redrawing failure and paralysis of the whole system caused by continuous meaningless redrawing are avoided.
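A sketch tying the classification of step 602 and the decision of step 603 together is given below. The exception keywords, the thresholds and the handling actions are placeholders consistent with the examples above (the keyword shown for the program exception type is purely an assumption), and the similarity metric again remains abstract.

    import java.util.Map;
    import java.util.function.ToDoubleBiFunction;

    public class ExceptionClassifier {

        public enum ExceptionType { OPTIMISTIC_LOCK, PROGRAM, UNKNOWN }

        /** Per-type keyword and preset threshold; the values are illustrative, following the example. */
        private static final Map<ExceptionType, String> KEYWORDS = Map.of(
                ExceptionType.OPTIMISTIC_LOCK, "OptimisticLock",
                ExceptionType.PROGRAM, "Exception");          // assumed keyword, not from the patent
        private static final Map<ExceptionType, Double> THRESHOLDS = Map.of(
                ExceptionType.OPTIMISTIC_LOCK, 0.915,
                ExceptionType.PROGRAM, 0.9);

        /** Picks the type whose similarity both meets its threshold and is largest. */
        public static ExceptionType classify(String exceptionName, ToDoubleBiFunction<String, String> similarity) {
            ExceptionType best = ExceptionType.UNKNOWN;
            double bestScore = -1.0;
            for (var entry : KEYWORDS.entrySet()) {
                double score = similarity.applyAsDouble(exceptionName, entry.getValue());
                if (score >= THRESHOLDS.get(entry.getKey()) && score > bestScore) {
                    best = entry.getKey();
                    bestScore = score;
                }
            }
            return best;
        }

        public static void handle(String exceptionName, ToDoubleBiFunction<String, String> similarity) {
            switch (classify(exceptionName, similarity)) {
                case OPTIMISTIC_LOCK -> redrawSubProcess();          // resume the sub-process at its breakpoint
                case PROGRAM -> waitForNewPackageThenRedraw();       // wait for the fixed package before resuming
                default -> { /* leave for manual analysis */ }
            }
        }

        private static void redrawSubProcess() { /* breakpoint redraw, see fig. 7 */ }
        private static void waitForNewPackageThenRedraw() { /* resend package, then redraw */ }
    }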
The method of breakpoint redraw for sub-flows is described next with reference to fig. 7 and 8. As shown in fig. 7. The method comprises the following steps:
step 701, determining that the task name of the pulled task exists in a task execution record;
in step 601, after detecting that an execution exception occurs in the process of executing the sub-process by the slave server, pulling up a task corresponding to the sub-process. Therefore, in step 701, if the task name of the pulled task exists in the task execution record, it indicates that the pulled task is an old task, and the breakpoint redraw can be executed; if the task name of the pulled task does not exist in the task execution record, the pulled task is a new task, and breakpoint redraw is not needed. The task execution record is a record generated in real time in the task execution process and is used for recording information such as the name, the start time, the end time, the execution state and the like of the task.
Step 702, searching a sub-process execution record which is associated with the task execution record and has the latest starting time.
Since one task corresponds to a plurality of sub-processes, a plurality of sub-process execution records are associated with the task execution record. The sub-process execution record is a record generated in real time in the sub-process execution process and is used for recording information such as the name, the starting time, the ending time, the execution state and the like of the sub-process. Since the plurality of sub-flows are executed sequentially, after an execution exception occurs in a sub-flow, it is necessary to determine which sub-flow has an execution exception. And determining the sub-process which is started recently, namely the sub-process with abnormal execution, by searching the sub-process execution record which is associated with the task execution record and has the latest starting time.
Step 703, determining that the execution state of the sub-process recorded by the sub-process execution meets the breakpoint redraw condition.
The sub-process execution record records the execution state of the sub-process, and the execution state is divided into a starting state, an unknown state, a terminating state, a failure state and a completion state.
And if the execution state is a termination state or a failure state, acquiring the uncompleted subtasks in the slave server, restarting the subtasks based on the slave server, and continuously executing the uncompleted subtasks. The termination state is that the execution of the sub-process by the slave server is terminated manually; the failure state is failure occurring in the running process of the slave server. When the execution state is a termination state or a failure state, breakpoint redraw can be directly carried out on the sub-flow. The specific way to obtain the unfinished subtasks from the server is introduced in step 704.
And if the execution state is the starting state or the unknown state, manually modifying the starting state or the unknown state into a failure state, acquiring the uncompleted subtasks in the slave server, and continuously executing the uncompleted subtasks based on the restart of the subprocess by the slave server.
And 704, executing context record according to the sub-process associated with the sub-process execution record, and performing breakpoint redraw.
The execution record of each sub-process is associated with a sub-process execution context record, and a next executed sub-task of the sub-process is recorded in the sub-process execution context record. Therefore, the next executed subtask of the sub-process can be obtained through the sub-process execution context record. Then a breakpoint redraw is performed at the next executed subtask.
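Steps 701 to 704 can be summarized in the following sketch. The record classes and the repository interface are assumptions introduced for illustration; the flow mirrors fig. 7: find the task execution record by task name, take the most recently started sub-process execution record, check (or manually correct) its execution state, and then resume from the next subtask stored in the sub-process execution context record.

    import java.util.Optional;

    public class BreakpointRedraw {

        public enum ExecutionState { STARTED, UNKNOWN, TERMINATED, FAILED, COMPLETED }

        /** Assumed repository over the task / sub-process / context records. */
        interface ExecutionRecords {
            Optional<TaskExecutionRecord> findTaskByName(String taskName);
            SubProcessExecutionRecord latestSubProcess(TaskExecutionRecord task);   // latest start time
            SubProcessContext contextOf(SubProcessExecutionRecord subProcess);      // next subtask to execute
        }

        record TaskExecutionRecord(String taskName) { }
        record SubProcessExecutionRecord(String name, ExecutionState state) { }
        record SubProcessContext(String nextSubTask) { }

        public static void redraw(String pulledTaskName, ExecutionRecords records) {
            // Step 701: an existing task execution record means this is an old task, so redraw applies.
            Optional<TaskExecutionRecord> task = records.findTaskByName(pulledTaskName);
            if (task.isEmpty()) {
                startAsNewTask(pulledTaskName);
                return;
            }
            // Step 702: the most recently started sub-process is the one with the execution exception.
            SubProcessExecutionRecord sub = records.latestSubProcess(task.get());

            // Step 703: TERMINATED/FAILED may be redrawn directly; STARTED/UNKNOWN must first be set to FAILED.
            ExecutionState state = sub.state();
            if (state == ExecutionState.STARTED || state == ExecutionState.UNKNOWN) {
                state = ExecutionState.FAILED;   // in the patent this change is made manually
            }
            if (state != ExecutionState.TERMINATED && state != ExecutionState.FAILED) {
                return;                          // COMPLETED sub-processes need no redraw
            }
            // Step 704: resume at the next subtask recorded in the execution context.
            String nextSubTask = records.contextOf(sub).nextSubTask();
            resumeSubProcessAt(nextSubTask);
        }

        private static void startAsNewTask(String taskName) { /* normal (non-redraw) execution */ }
        private static void resumeSubProcessAt(String nextSubTask) { /* restart the sub-process from the breakpoint */ }
    }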
For convenience of understanding, fig. 8 shows a schematic diagram of a relationship between a possible task execution record, a sub-flow execution record and a sub-flow execution context record, and optionally, a task execution context record may be further included.
In fig. 8, task instance 801 associates task execution record 802, task execution record 802 associates a plurality of sub-process execution records (e.g., sub-process execution records 8031 and 8032 in fig. 8), task execution record 802 further associates task execution context record 804, and sub-process execution record further associates sub-process execution context records (e.g., sub-process execution context records 8051 and 8052 in fig. 8). Task execution record 802 may also be associated with task execution context record 804.
Taking the task of deducting loan interest from 100 accounts as an example, the task can be divided into two sub-processes. Sub-process 1: calculating the loan interest of each of the 100 accounts. Sub-process 2: deducting the corresponding loan interest from the 100 accounts. Sub-process 1 corresponds to the sub-process execution record 8031, and sub-process 2 corresponds to the sub-process execution record 8032.
Sub-process 1 is taken as an example. Sub-process 1 is divided into two subtasks. Subtask 1: calculating the loan interest of each of the 1st to 50th accounts. Subtask 2: calculating the loan interest of each of the 51st to 100th accounts. If the execution state recorded in the sub-process execution record 8031 is failure, breakpoint redraw can be performed directly without changing the state, and the incomplete subtasks can be determined through the sub-process execution context record 8051: the second server 20312 still has to calculate the loan interest of the 23rd to 50th accounts, and the third server 20313 still has to calculate the loan interest of the 58th to 100th accounts.
It can be seen that the second server 20312 and the third server 20313 do not complete their respective subtasks, and if the second server 20312 and the third server 20313 both operate normally, the execution continues based on the incomplete subtasks, that is, the second server 20312 continues to execute from the 23 rd account, and the third server 20313 continues to execute from the 58 th account.
If the second server 20312 or the third server 20313 goes down, a new slave server may be replaced for performing the incomplete subtasks.
The slave servers may be determined in various ways: for example, from the batch processing system, or from the servers of an online transaction system. Online transactions differ from batch processing in that they are real-time; user operations such as query requests and transfer requests are processed in the online transaction system. Generally, there are more online transactions during the day and fewer at night, so this business characteristic can be exploited by using part of the online transaction system's servers for batch processing at night.
The following describes in detail how to select a server from the servers of the online transaction system as a slave server for batch processing.
In a first mode
Fig. 9 shows one possible method of determining a slave server. The method comprises the following steps:
step 901, obtaining various performance parameters of each online server for online transaction, where the performance parameters include any one or more of the following: CPU utilization, memory utilization, disk utilization, network card utilization, and garbage collection time.
Specifically, each performance parameter of each online server may be reported, for example, to an index collection system.
Step 902, obtain various performance requirements of the slave server executing the incomplete sub-process.
For example, the performance requirements for the slave server executing the incomplete sub-flow are: the CPU utilization rate is less than or equal to 0.6, the memory utilization rate is less than or equal to 0.6, and the disk utilization rate is less than or equal to 0.6.
Step 903, in each online server for online transaction, determining the online server with each performance parameter meeting each performance requirement and using the online server as a slave server.
For example, in each online server, each performance parameter is selected to meet the performance requirement: the utilization rate of the CPU is less than or equal to 0.6, the utilization rate of the memory is less than or equal to 0.6, and the online server with the utilization rate of the disk less than or equal to 0.6 is used as the slave server.
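Mode one can be sketched as follows. The server/metrics class and the 0.6 requirements follow the example above, while the class and field names themselves are assumptions introduced for illustration.

    import java.util.List;
    import java.util.stream.Collectors;

    public class SlaveServerSelector {

        /** Snapshot of an online server's performance parameters (utilization as fractions of 1). */
        record OnlineServer(String host, double cpu, double memory, double disk) { }

        /** Mode one: keep servers whose parameters all satisfy the manually set requirements. */
        public static List<OnlineServer> selectSlaves(List<OnlineServer> onlineServers) {
            return onlineServers.stream()
                    .filter(s -> s.cpu() <= 0.6 && s.memory() <= 0.6 && s.disk() <= 0.6)
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<OnlineServer> candidates = List.of(
                    new OnlineServer("online-1", 0.5, 0.4, 0.3),
                    new OnlineServer("online-2", 0.8, 0.4, 0.3));
            System.out.println(selectSlaves(candidates));   // only online-1 qualifies
        }
    }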
Mode two
In addition to filtering the slave servers by manually setting performance requirements through experience as shown in the first embodiment, a statistical model may be determined through a machine learning algorithm, and whether any of the online servers meets the performance requirements or can participate in batch processing may be determined through the statistical model.
Fig. 10 shows another possible method of determining a slave server. The method comprises the following steps:
Step 1001, for any online server, acquiring the performance parameters of the online server, and determining the probability that the online server participates in batch processing and the probability that it does not participate, according to the probability of participating in batch processing and the probability of not participating associated with the interval in which each performance parameter lies; the probability of participating in batch processing and the probability of not participating associated with the interval in which any performance parameter lies are obtained through a statistical model.
The above method is described below by taking 3 individual performance parameters as an example. The embodiment of the invention does not limit the number of the selected performance parameters.
For example, the 3 selected performance parameters are the CPU utilization a1, the memory utilization a2 and the disk utilization a3, and each parameter is divided into intervals. For the CPU utilization a1, 3 intervals are used: a1 < 0.1, 0.1 ≤ a1 ≤ 0.7, and a1 > 0.7. For the memory utilization a2, 3 intervals are used: a2 < 0.3, 0.3 ≤ a2 ≤ 0.8, and a2 > 0.8. For the disk utilization a3, 3 intervals are used: a3 < 0.4, 0.4 ≤ a3 ≤ 0.9, and a3 > 0.9. A large number of sample servers' performance parameters, together with whether each sample server participated in batch processing, are collected and statistically analyzed. Table 1 schematically shows one of the collected records; for example, 10000 records of the above performance parameters of sample servers and whether they participated in batch processing may be collected.
TABLE 1
(Table image not reproduced: each row records a sample server's performance parameters a1, a2, a3 and whether it participated in batch processing.)
By analyzing the above performance parameters of the 10000 sample servers and whether each participated in batch processing, the probability that a sample server participates in batch processing (C = 1) and the probability that it does not participate (C = 0) can be obtained:
P(C=0)=2000/10000=0.2
P(C=1)=8000/10000=0.8
For the 3 performance parameters, the proportion of sample servers falling in each interval of each performance parameter among the sample servers that participate in batch processing, and among the sample servers that do not participate, is shown in Table 2:
(Table 2 image not reproduced: per-interval conditional probabilities. The values used in the example below include P(0.1≤a1≤0.7|C=0)=0.2, P(0.3≤a2≤0.8|C=0)=0.1, P(0.4≤a3≤0.9|C=0)=0.15, P(0.1≤a1≤0.7|C=1)=0.6, P(0.3≤a2≤0.8|C=1)=0.5 and P(0.4≤a3≤0.9|C=1)=0.7.)
then, after analyzing each performance of the sample server to obtain table 2, for any online server, according to table 2, the probability of participating in batch processing and the probability of not participating in batch processing of the online server can be obtained.
For example, consider an online server with CPU utilization a1 = 50%, memory utilization a2 = 50% and disk utilization a3 = 50%. Its a1 lies in the interval 0.1 ≤ a1 ≤ 0.7, its a2 lies in the interval 0.3 ≤ a2 ≤ 0.8, and its a3 lies in the interval 0.4 ≤ a3 ≤ 0.9. The probability of participating in batch processing, P(C=1 | 0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9), and the probability of not participating, P(C=0 | 0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9), are then calculated for this online server.
Probability of not participating in batch processing:
P(C=0 | 0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9)
= P(0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9 | C=0) * P(C=0) / P(0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9)
= P(0.1≤a1≤0.7|C=0) * P(0.3≤a2≤0.8|C=0) * P(0.4≤a3≤0.9|C=0) * P(C=0) / (P(0.1≤a1≤0.7) * P(0.3≤a2≤0.8) * P(0.4≤a3≤0.9))
Probability of participating in batch processing:
P(C=1 | 0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9)
= P(0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9 | C=1) * P(C=1) / P(0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9)
= P(0.1≤a1≤0.7|C=1) * P(0.3≤a2≤0.8|C=1) * P(0.4≤a3≤0.9|C=1) * P(C=1) / (P(0.1≤a1≤0.7) * P(0.3≤a2≤0.8) * P(0.4≤a3≤0.9))
Step 1002: if the probability of participating in batch processing is greater than the probability of not participating in batch processing, determine that the online server is a second slave server.
In the above example, the denominators of the two posterior probabilities are identical, so whether the probability of participating in batch processing is greater than the probability of not participating can be decided by comparing the two numerators, P(0.1≤a1≤0.7|C=0)*P(0.3≤a2≤0.8|C=0)*P(0.4≤a3≤0.9|C=0)*P(C=0) and P(0.1≤a1≤0.7|C=1)*P(0.3≤a2≤0.8|C=1)*P(0.4≤a3≤0.9|C=1)*P(C=1):
P(0.1≤a1≤0.7|C=0)*P(0.3≤a2≤0.8|C=0)*P(0.4≤a3≤0.9|C=0)*P(C=0)=0.2*0.1*0.15*0.2=0.0006
P(0.1≤a1≤0.7|C=1)*P(0.3≤a2≤0.8|C=1)*P(0.4≤a3≤0.9|C=1)*P(C=1)=0.6*0.5*0.7*0.8=0.168
Since 0.0006<0.168, it can be determined that the online server is eligible to participate in batch processing.
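The decision in steps 1001–1002 can be sketched as follows, reusing the helpers above. Only the two numerators are compared, since the denominator P(0.1≤a1≤0.7, 0.3≤a2≤0.8, 0.4≤a3≤0.9) is the same for both classes; classify_online_server is an illustrative name, not the embodiment's own API.

```python
def classify_online_server(server, priors, conditionals) -> bool:
    """Return True if the online server should be selected as a second slave server.

    server: dict like {"a1": 0.5, "a2": 0.5, "a3": 0.5}.
    Compares the naive-Bayes numerators for C=1 and C=0; the common
    denominator cancels, exactly as in the worked example above.
    """
    scores = {}
    for c in (0, 1):
        score = priors[c]
        for param in ("a1", "a2", "a3"):
            bin_index = interval_of(param, server[param])
            # Intervals never observed for a class get probability 0,
            # matching the plain relative-frequency estimates of Table 2.
            score *= conditionals.get((param, bin_index, c), 0.0)
        scores[c] = score
    return scores[1] > scores[0]

# With the Table 2 values of the worked example, a server at 50% CPU, memory
# and disk utilization scores 0.6*0.5*0.7*0.8 = 0.168 for C=1 versus
# 0.2*0.1*0.15*0.2 = 0.0006 for C=0, so it is selected.
```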
By performing the above operations on each online server, it can be determined whether that server can participate in batch processing; the evaluation continues until the number of online servers selected for batch processing meets the number of slave servers required by the batch processing system.
Determining the slave servers for batch processing from the online servers in this machine-learning manner improves both the efficiency and the accuracy of the selection.
Based on the same technical concept, fig. 11 exemplarily shows the structure of a breakpoint redraw device of a batch processing system according to an embodiment of the present invention, and the device can perform the breakpoint redraw flow of the batch processing system.
As shown in fig. 11, the apparatus specifically includes:
an obtaining unit 1101, configured to obtain the exception name of an exception after detecting that a slave server has an execution exception in a sub-process of executing a task; each slave server is used for performing distributed batch processing under the control of the master server; the number of subtasks processed by each slave server under the sub-process is obtained by the master server dividing the task according to the number of slave servers; the slave server is any one of the slave servers;
the processing unit 1102 is configured to determine, for any exception type, an exception type corresponding to the exception name by determining whether a similarity between the exception name and an exception keyword of the exception type satisfies a preset threshold; the preset threshold is obtained through the similarity between the suffix of the historical abnormal name of the abnormal type and the abnormal keyword of the abnormal type; and if the exception type corresponding to the exception name is an optimistic lock exception, carrying out breakpoint redraw on the sub-process.
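A minimal sketch of the exception-type matching performed by the processing unit 1102 is given below. The embodiment does not prescribe a particular similarity measure, keyword list or threshold values, so the use of difflib.SequenceMatcher, the example keywords and the 0.8 thresholds are assumptions made for illustration only.

```python
import difflib

# Illustrative exception keywords and per-type thresholds. In the embodiment the
# threshold of each type is derived from the similarity between the suffixes of
# that type's historical exception names and the type's exception keyword.
EXCEPTION_TYPES = {
    "optimistic_lock": {"keyword": "OptimisticLockingFailureException", "threshold": 0.8},
    "timeout": {"keyword": "TimeoutException", "threshold": 0.8},
}

def classify_exception(exception_name: str):
    """Return the exception type whose keyword is sufficiently similar to the name, or None."""
    suffix = exception_name.rsplit(".", 1)[-1]  # compare against the class-name suffix
    for exc_type, cfg in EXCEPTION_TYPES.items():
        similarity = difflib.SequenceMatcher(None, suffix, cfg["keyword"]).ratio()
        if similarity >= cfg["threshold"]:
            return exc_type
    return None

def should_redraw(exception_name: str) -> bool:
    """Breakpoint redraw is triggered only when the exception is an optimistic lock exception."""
    return classify_exception(exception_name) == "optimistic_lock"
```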
Optionally, the processing unit 1102 is specifically configured to:
acquiring the execution state of the slave server for the sub-process, and acquiring the uncompleted subtasks in the slave server if the execution state is a termination state or a failure state; the termination state indicates that execution of the sub-process by the slave server was terminated manually; the failure state indicates that a failure occurred while the slave server was running;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
Optionally, the processing unit 1102 is further configured to:
if the execution state is a starting state or an unknown state, manually modifying the starting state or the unknown state into a failure state;
acquiring the uncompleted subtasks in the slave server;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
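Both branches described above (termination/failure states, and starting/unknown states that are first changed to a failure state) can be sketched as follows. The SubFlowExecution record and the function name redraw_from_breakpoint are illustrative assumptions, not the embodiment's own data structures.

```python
from dataclasses import dataclass, field

@dataclass
class SubFlowExecution:
    """Simplified sub-process execution record together with its execution context."""
    status: str                                   # "STARTED", "STOPPED", "FAILED", "UNKNOWN", "COMPLETED"
    subtasks: list = field(default_factory=list)  # all subtasks assigned to this sub-process
    completed: set = field(default_factory=set)   # execution context: subtasks already finished

def redraw_from_breakpoint(execution: SubFlowExecution) -> list:
    """Restart the sub-process on the slave server and return the subtasks still to run."""
    if execution.status in ("STARTED", "UNKNOWN"):
        # Per the embodiment, the starting/unknown state is first changed to a failure state.
        execution.status = "FAILED"
    if execution.status in ("STOPPED", "FAILED"):
        pending = [t for t in execution.subtasks if t not in execution.completed]
        execution.status = "STARTED"   # sub-process restarted based on the slave server
        return pending                 # only the uncompleted subtasks are executed again
    return []                          # a completed sub-process needs no redraw
```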
Optionally, the execution state of the sub-process is determined from a sub-process execution record, and the uncompleted subtasks are determined from a sub-process execution context record; the sub-process execution context record is used for recording the next subtask of the sub-process to be executed.
Optionally, the processing unit 1102 is further configured to: pulling up tasks corresponding to the sub-processes;
optionally, the processing unit 1102 is specifically configured to: determining that a task name of the pulled task exists in a task execution record;
searching a sub-process execution record which is associated with the task execution record and has the latest starting time;
determining that the execution state of the sub-process recorded in the sub-process execution record meets a breakpoint redraw condition;
performing breakpoint redraw according to the sub-process execution context record associated with the sub-process execution record; the sub-process execution context record is used for recording the next subtask of the sub-process to be executed.
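The record lookup described above (task execution record → most recently started sub-process execution record → execution context) can be sketched as below. The in-memory store and record shapes are illustrative stand-ins for whatever persistence the batch processing system actually uses.

```python
# Illustrative in-memory stand-in: task name -> list of sub-process execution records,
# each record being {"start_time": <comparable timestamp>, "status": str, "context": dict}.
task_execution_records = {}

def redraw_pulled_task(task_name: str):
    """After a task is pulled up again, redraw its latest sub-process from the recorded breakpoint."""
    executions = task_execution_records.get(task_name)
    if not executions:
        return None                                           # task name not found in the task execution record
    latest = max(executions, key=lambda e: e["start_time"])   # sub-process execution with the latest start time
    if latest["status"] not in ("STOPPED", "FAILED"):         # breakpoint redraw condition
        return None
    # The execution context record holds the next subtask of the sub-process to execute.
    return latest["context"].get("next_subtask")
```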
Optionally, the processing unit 1102 is specifically configured to:
determining the slave server from the batch processing system; or
determining the slave server from the online trading system according to performance parameters of each online server when processing online transactions.
Optionally, the processing unit 1102 is specifically configured to:
for any online server, acquiring performance parameters of the online server, and determining the probability that the online server participates in batch processing and the probability that it does not according to the probability of participating in batch processing and the probability of not participating in batch processing of the interval in which each performance parameter falls; the probability of participating in batch processing and the probability of not participating in batch processing of the interval in which any performance parameter falls are obtained through a statistical model;
and if the probability of participating in batch processing is greater than the probability of not participating in batch processing, determining that the online server is a second slave server.
Optionally, the processing unit 1102 is specifically configured to:
dividing any performance parameter of each sample server in sample data into different intervals; the sample data comprises various performance parameters of each sample server and the condition of whether to participate in batch processing;
counting, in the sample data, the proportion of sample servers falling into each interval of each performance parameter among the sample servers that participate in batch processing and among the sample servers that do not, and using these proportions respectively as the probability of participating in batch processing and the probability of not participating in batch processing of the interval in which the performance parameter falls.
Based on the same technical concept, the embodiment of the present application provides a computer device, as shown in fig. 12, including at least one processor 1201 and a memory 1202 connected to the at least one processor, where a specific connection medium between the processor 1201 and the memory 1202 is not limited in the embodiment of the present application, and the processor 1201 and the memory 1202 in fig. 12 are connected through a bus as an example. The bus may be divided into an address bus, a data bus, a control bus, etc.
In the embodiment of the present application, the memory 1202 stores instructions executable by the at least one processor 1201, and the at least one processor 1201 can execute the steps of the breakpoint redraw method of the batch processing system by executing the instructions stored in the memory 1202.
The processor 1201 is the control center of the computer device, and can connect various parts of the computer device by using various interfaces and lines, and perform breakpoint redraw of the batch processing system by running or executing the instructions stored in the memory 1202 and calling the data stored in the memory 1202. Optionally, the processor 1201 may include one or more processing units, and the processor 1201 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 1201. In some embodiments, the processor 1201 and the memory 1202 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 1201 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
Memory 1202, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1202 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 1202 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1202 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
Based on the same technical concept, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer-executable program, and the computer-executable program is configured to enable a computer to execute the breakpoint redraw method of the batch processing system in any of the above manners.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. A breakpoint redraw method for a batch processing system, comprising:
after detecting that a slave server has an execution exception in a sub-process of executing a task, acquiring the exception name of the exception; each slave server is used for performing distributed batch processing under the control of a master server; the number of subtasks processed by each slave server under the sub-process is obtained by the master server dividing the task according to the number of slave servers; the slave server is any one of the slave servers;
for any exception type, determining the exception type corresponding to the exception name by judging whether the similarity between the exception name and the exception keyword of that exception type meets a preset threshold; the preset threshold is obtained from the similarity between the suffixes of historical exception names of that exception type and the exception keyword of that exception type;
and if the exception type corresponding to the exception name is an optimistic lock exception, carrying out breakpoint redraw on the sub-process.
2. The method of claim 1, wherein performing breakpoint redraw on the sub-process comprises:
acquiring the execution state of the slave server for the sub-process, and acquiring the uncompleted subtasks in the slave server if the execution state is a termination state or a failure state; wherein the termination state indicates that execution of the sub-process by the slave server was terminated manually, and the failure state indicates that a failure occurred while the slave server was running;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
3. The method of claim 2, further comprising:
if the execution state is a starting state or an unknown state, manually modifying the starting state or the unknown state into a failure state;
acquiring the uncompleted subtasks in the slave server;
and restarting the sub-process based on the slave server, and continuing to execute the unfinished subtasks.
4. The method of claim 2, wherein the execution state of the sub-process is determined from a sub-process execution record, and the uncompleted subtasks are determined from a sub-process execution context record; the sub-process execution context record is used for recording the next subtask of the sub-process to be executed.
5. The method of claim 1, wherein after detecting that the slave server has the execution exception in the sub-process of executing the task, the method further comprises: pulling up the task corresponding to the sub-process;
and performing breakpoint redraw on the sub-process comprises the following steps:
determining that a task name of the pulled task exists in a task execution record;
searching a sub-process execution record which is associated with the task execution record and has the latest starting time;
determining that the execution state of the sub-process recorded in the sub-process execution record meets a breakpoint redraw condition;
and performing breakpoint redraw according to the sub-process execution context record associated with the sub-process execution record.
6. The method of any one of claims 1 to 5,
wherein the slave server is obtained in one of the following manners:
determining the slave server from the batch processing system; or
determining the slave server from the online trading system according to performance parameters of each online server when processing online transactions.
7. The method of claim 6, wherein determining the slave server from the online trading system according to performance parameters of each online server when processing online transactions comprises:
for any online server, acquiring performance parameters of the online server, and determining the probability that the online server participates in batch processing and the probability that it does not according to the probability of participating in batch processing and the probability of not participating in batch processing of the interval in which each performance parameter falls; the probability of participating in batch processing and the probability of not participating in batch processing of the interval in which any performance parameter falls are obtained through a statistical model;
and if the probability of participating in batch processing is greater than the probability of not participating in batch processing, determining that the online server is a second slave server.
8. The method of claim 7, wherein determining the probability of participation and the probability of non-participation in the batch process for the interval in which the performance parameter is located is performed by:
dividing any performance parameter of each sample server in sample data into different intervals; the sample data comprises various performance parameters of each sample server and the condition of whether to participate in batch processing;
counting, in the sample data, the proportion of sample servers falling into each interval of each performance parameter among the sample servers that participate in batch processing and among the sample servers that do not, and using these proportions respectively as the probability of participating in batch processing and the probability of not participating in batch processing of the interval in which the performance parameter falls.
9. A breakpoint redraw device for a batch processing system, comprising:
an acquiring unit, used for acquiring the exception name of an exception after detecting that a slave server has an execution exception in a sub-process of executing a task; each slave server is used for performing distributed batch processing under the control of the master server; the number of subtasks processed by each slave server under the sub-process is obtained by the master server dividing the task according to the number of slave servers; the slave server is any one of the slave servers;
a processing unit, used for determining, for any exception type, the exception type corresponding to the exception name by judging whether the similarity between the exception name and the exception keyword of that exception type meets a preset threshold; the preset threshold is obtained from the similarity between the suffixes of historical exception names of that exception type and the exception keyword of that exception type; and if the exception type corresponding to the exception name is an optimistic lock exception, carrying out breakpoint redraw on the sub-process.
10. A computing device, comprising:
a memory for storing a computer program;
a processor for calling a computer program stored in said memory and executing the method of any one of claims 1 to 8 in accordance with the obtained program.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer-executable program for causing a computer to execute the method of any one of claims 1 to 8.
CN202111217552.4A 2021-10-19 2021-10-19 Breakpoint redraw method and device of batch processing system Pending CN113986487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111217552.4A CN113986487A (en) 2021-10-19 2021-10-19 Breakpoint redraw method and device of batch processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111217552.4A CN113986487A (en) 2021-10-19 2021-10-19 Breakpoint redraw method and device of batch processing system

Publications (1)

Publication Number Publication Date
CN113986487A true CN113986487A (en) 2022-01-28

Family

ID=79739434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111217552.4A Pending CN113986487A (en) 2021-10-19 2021-10-19 Breakpoint redraw method and device of batch processing system

Country Status (1)

Country Link
CN (1) CN113986487A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination