CN114168329A - Distributed batch optimization method, electronic device and computer-readable storage medium - Google Patents

Distributed batch optimization method, electronic device and computer-readable storage medium

Info

Publication number
CN114168329A
Authority
CN
China
Prior art keywords
subtask
batch
distributed
pulling
thread
Prior art date
Legal status
Pending
Application number
CN202111479939.7A
Other languages
Chinese (zh)
Inventor
陈杨
李建峰
李毅
万磊
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202111479939.7A
Publication of CN114168329A

Classifications

    • G06F 9/505 - Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine (CPUs, servers, terminals), considering the load (Physics; Computing; Electric digital data processing; Multiprogramming arrangements)
    • G06F 9/5083 - Techniques for rebalancing the load in a distributed system
    • G06Q 40/03 - Credit; Loans; Processing thereof (ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a distributed batch optimization method, an electronic device and a computer-readable storage medium. The distributed batch optimization method comprises the following steps: determining the first-layer batch subtasks on the first layer of business logic in a preset batch task, and determining a first pull thread corresponding to each distributed machine; polling and pulling, through each first pull thread, the first subtask messages for the corresponding distributed machine to consume, so as to obtain an execution result for each first subtask message; judging whether the batch subtasks on the first layer of business logic have all been executed; if so, determining the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks; and determining a second pull thread corresponding to each distributed machine, and polling and pulling, through each second pull thread, the execution results for the corresponding distributed machine to consume, so as to perform distributed batch running. The application solves the technical problem of low distributed batch running efficiency in the prior art.

Description

Distributed batch optimization method, electronic device and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies in financial technology (Fintech), and in particular, to a distributed batch optimization method, an electronic device, and a computer-readable storage medium.
Background
With the continuous development of financial technology, and of internet finance in particular, more and more technologies (such as distributed computing and blockchain) are being applied in the financial field. At the same time, the financial industry places ever higher requirements on these technologies, for example higher requirements on distributed batch running.
With the continuous development of computer software, computer technology is applied more and more widely. In the scenario of bank lending, in order to perform post-loan risk control, a post-loan risk early-warning batch task generally needs to be executed periodically. At present, when the volume of user data is large, the user data is usually divided evenly among different distributed machines for batch running, so as to realize distributed batch running.
Disclosure of Invention
The present application mainly aims to provide a distributed batch optimization method, an electronic device, and a computer-readable storage medium, so as to solve the technical problem of low distributed batch running efficiency in the prior art.
To achieve the above object, the present application provides a distributed batch optimization method, including:
determining the first-layer batch subtasks on the first layer of business logic in a preset batch task, and determining a first pull thread corresponding to each distributed machine;
polling and pulling, through each first pull thread, the first subtask messages corresponding to the first-layer batch subtasks for the corresponding distributed machine to consume, so as to obtain an execution result for each first subtask message;
judging whether the batch subtasks on the first layer of business logic have all been executed;
if they have all been executed, determining the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks;
and determining a second pull thread corresponding to each distributed machine, and polling and pulling, through each second pull thread, the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machine to consume, so as to perform distributed batch running.
The present application further provides a distributed batch optimization device, including:
a first determining module, configured to determine the first-layer batch subtasks on the first layer of business logic in a preset batch task, and determine a first pull thread corresponding to each distributed machine;
a first polling consumption module, configured to poll and pull, through each first pull thread, the first subtask messages corresponding to the first-layer batch subtasks for the corresponding distributed machine to consume, so as to obtain an execution result for each first subtask message;
a judging module, configured to judge whether the batch subtasks on the first layer of business logic have all been executed;
a second determining module, configured to determine, if they have all been executed, the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks;
and a second polling consumption module, configured to determine a second pull thread corresponding to each distributed machine, and poll and pull, through each second pull thread, the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machine to consume, so as to perform distributed batch running.
The present application further provides an electronic device, including: a memory, a processor, and a program of the distributed batch optimization method stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the distributed batch optimization method described above.
The present application further provides a computer readable storage medium having stored thereon a program for implementing a distributed batch optimization method, the program when executed by a processor implementing the steps of the distributed batch optimization method as described above.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the distributed batch optimization method as described above.
Compared with the prior-art technique of dividing user data evenly among different distributed machines for batch running, the distributed batch optimization method of the present application first determines the first-layer batch subtasks on the first layer of business logic in a preset batch task and determines a first pull thread corresponding to each distributed machine; polls and pulls, through each first pull thread, the first subtask messages corresponding to the first-layer batch subtasks for the corresponding distributed machine to consume, obtaining an execution result for each first subtask message; judges whether the batch subtasks on the first layer of business logic have all been executed; if so, determines the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks; and determines a second pull thread corresponding to each distributed machine, and polls and pulls, through each second pull thread, the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machine to consume, so as to perform distributed batch running. In this way, the granularity of the batch is subdivided to the subtask level according to the dependencies between the batch subtasks in the business logic. Because the data volume of a single subtask is far smaller than the data volume corresponding to a single user, the computation time required by a subtask is short, and the difference in consumption time between different distributed machines consuming different subtask messages is small. The subtask messages are then pulled to each distributed machine for consumption by a polling pull thread, where new subtask messages are pulled only after the previously pulled ones have been consumed, so that a large batch is split into small batches that are processed continuously. This reduces the influence of differences in user data volume on batch efficiency, and differences in the machine resources of different distributed machines affect only individual small batches, so that the difference in batch elapsed time between distributed machines is essentially negligible for the batch task as a whole. The application thus overcomes the technical defect in the prior art that differing machine resources and differing user data volumes unbalance the load of the distributed machines and harm the efficiency of distributed batch running, and improves the efficiency of distributed batch running.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram of a distributed batch optimization method according to a first embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a distributed batch optimization method according to a second embodiment of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a distributed batch optimization method according to a third embodiment of the present application;
FIG. 4 is a system architecture diagram of distributed batch running in the distributed batch optimization method of the present application;
FIG. 5 is a diagram of dependency relationships between batch subtasks in the distributed batch optimization method of the present application;
FIG. 6 is a schematic diagram of an apparatus structure of a hardware operating environment related to a distributed batch optimization method in an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
In the scenario of bank lending, in order to perform post-loan risk control, a post-loan risk early-warning batch task generally needs to be executed periodically. When the batch is large, it is usually run in a distributed manner, for example by dividing the user data evenly among different distributed machines. However, the more features the original user data has, the more batch subtasks there are, and the differences in computation time required by user data with different features are amplified by the number of batch subtasks. Therefore, because the data volumes of different users differ, the computation time and computing resources required by each distributed machine can differ greatly even when the user data is divided evenly, so the load across the distributed machines becomes unbalanced. In addition, the machine resources of different distributed machines differ, and in some extreme cases (the distributed machine with the fewest machine resources is assigned the most computation tasks) the load imbalance is further aggravated, which further harms the efficiency of distributed batch running.
In a first embodiment of the distributed batch optimization method of the present application, referring to fig. 1, the distributed batch optimization method includes:
step S10, determining a first-layer batch subtask on a first-layer service logic in a preset batch task, and determining a first pull thread corresponding to each distributed machine;
step S20, polling and pulling a first subtask message corresponding to the first-layer batch subtask for the corresponding distributed machine through each first pulling thread to consume, and obtaining an execution result corresponding to the first subtask message;
step S30, judging whether the batch subtasks on the first level business logic are all executed;
step S40, if all the execution is finished, determining a second-layer batch subtask of which the second-layer business logic depends on the first-layer batch subtask;
and step S50, determining second pulling threads corresponding to the distributed machines, and polling and pulling the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machines through the second pulling threads to consume so as to perform distributed batch running.
The embodiment of the application provides a distributed batch optimization method, namely: determining the first-layer batch subtasks on the first layer of business logic in a preset batch task, and determining a first pull thread corresponding to each distributed machine; polling and pulling, through each first pull thread, the first subtask messages corresponding to the first-layer batch subtasks for the corresponding distributed machine to consume, so as to obtain an execution result for each first subtask message; judging whether the batch subtasks on the first layer of business logic have all been executed; if so, determining the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks; and determining a second pull thread corresponding to each distributed machine, and polling and pulling, through each second pull thread, the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machine to consume, so as to perform distributed batch running. Over the whole distributed batch run, the batch subtasks are divided by business logic layer, and the pull thread of each distributed machine then pulls subtask messages to its machine for consumption, by polling, in the order of the business logic layers. Each distributed machine can therefore operate as a pipeline, the influence of differing machine resources on the distributed batch is avoided, and the efficiency of distributed batch running is improved. On top of this pipelined operation, only the data volume of the most recently pulled subtask messages affects each machine's running time. Since this embodiment subdivides the batch granularity to the subtask level, and the data volume of a single subtask message is much smaller than that of a single user's data, the computation time per subtask message is small, the difference in consumption time between distributed machines consuming different subtask messages is extremely small, and the difference in elapsed time between the distributed machines is essentially negligible for the whole batch task. As a result, all distributed machines finish the distributed batch at essentially the same time, the load is distributed reasonably across the machines, and the efficiency of distributed batch running is improved.
In this embodiment, it should be noted that the preset batch task may be a risk control task of a bank. The preset batch task includes at least one batch subtask, and a second-layer batch subtask on the second layer of business logic depends on a first-layer batch subtask; specifically, this may be because the input parameters of the second-layer batch subtask are the execution result of the first-layer batch subtask. The present application determines the dependencies between the batch subtasks of the preset batch task according to the layers of business logic, and divides the batch subtasks among the different layers of business logic. Each distributed machine is a task execution node in the batch running process.
Illustratively, the steps S10 to S50 include:
After the first-layer batch subtasks on the first layer of business logic are determined in the preset batch task, a first pull thread corresponding to each distributed machine is determined, and the subtask message tags corresponding to the first-layer batch subtasks are added to a preset message queue. Each first pull thread polls the preset message queue to obtain subtask message tags, pulls the first subtask messages corresponding to those tags for its distributed machine to consume, obtains the execution result of each first subtask message, and stores the task execution status of each first subtask message in a preset task execution condition table. Whether the batch subtasks on the first layer of business logic have all been executed is then judged; if not, execution of the first-layer batch subtasks continues until they have all been executed. Once they have, the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks are determined in the preset batch task. A second pull thread corresponding to each distributed machine is determined, and the serial numbers corresponding to the second-layer batch subtasks are added to the preset message queue; a serial number identifies one pass of a user's data through the batch pipeline, and the execution result of the corresponding first subtask message can be looked up in the preset task execution condition table by the serial number. Each second pull thread polls the preset message queue to obtain serial numbers and pulls the execution results corresponding to those serial numbers for its distributed machine to consume, obtaining the execution result of each second subtask message. Finally, whether the second layer of business logic is the last layer is judged; if it is, the preset batch task is finished; otherwise, the next batch subtasks on the next layer of business logic, which depend on the second-layer batch subtasks, are determined in the preset batch task and executed, until the batch subtasks on the last layer of business logic have been executed. For the specific way of executing the next batch subtask, reference may be made to the execution process of the second-layer batch subtasks, which is not repeated here.
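To make this layer-by-layer flow concrete, the following minimal Java sketch models the orchestration of steps S10 to S50. It is illustrative only: the preset message queue and the task execution condition table are simplified to in-memory structures, and all names (BatchOrchestrator, allDone, and so on) are assumptions, not the patent's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: subtask tags are published one business-logic layer at
// a time; the next layer is released only after every tag of the current
// layer has been marked done in the (simplified) execution condition table.
public class BatchOrchestrator {
    final BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>();
    final Map<String, Boolean> executionTable = new ConcurrentHashMap<>();

    public void run(List<List<String>> layers) throws InterruptedException {
        for (List<String> layer : layers) {
            for (String tag : layer) {            // steps S10/S50: enqueue tags
                executionTable.put(tag, Boolean.FALSE);
                messageQueue.put(tag);
            }
            while (!allDone(layer)) {             // step S30: wait for the layer
                Thread.sleep(500);
            }
        }
    }

    boolean allDone(List<String> layer) {
        return layer.stream()
                .allMatch(tag -> executionTable.getOrDefault(tag, false));
    }
}
```

A consumer, such as the polling pull thread sketched in the second embodiment below, would take tags from messageQueue, execute the subtask, and set the corresponding entry of executionTable to true.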
As an example, the step of determining a first-level batch subtask on a first-level business logic in the preset batch task includes:
and when the execution node of the preset batch task is detected to be reached, determining a first-layer batch subtask on the first-layer service logic by inquiring the preset batch task configuration table. Wherein the subtask message tag may be encoded for a user unique identity, such as a CCIF number (for a public customer number), etc.
As an example, the preset batch task configuration table includes fields such as the specific subtasks in the preset batch task and the dependencies between those subtasks. A specific subtask may be a first-layer batch subtask, a second-layer batch subtask, and so on.
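As a rough illustration, this configuration table can be pictured as a mapping from each subtask identifier to the identifiers it depends on. The Java sketch below is an assumption about its in-memory shape, not the patent's schema:

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory shape of the batch task configuration table: each
// subtask id maps to the ids it depends on; the first-layer subtasks are
// simply the entries with no prerequisites.
public class BatchTaskConfig {
    private final Map<String, List<String>> dependencies;

    public BatchTaskConfig(Map<String, List<String>> dependencies) {
        this.dependencies = dependencies;
    }

    public List<String> firstLayerSubtasks() {
        return dependencies.entrySet().stream()
                .filter(e -> e.getValue().isEmpty())
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

For example, new BatchTaskConfig(Map.of("scoring-model", List.of("personal-blacklist"), "personal-blacklist", List.of())).firstLayerSubtasks() would return only "personal-blacklist".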
As an example, the preset task execution condition table is a table used to record the task execution status of the preset batch task, and may specifically include fields such as the serial number, the input parameters and execution results of specific subtasks, and the CCIF number. For example, if the specific subtask is model scoring, the table records the input parameters of the scoring model and the output result of the scoring model; if the specific subtask is personal blacklist detection, the table records the input and output of the personal blacklist detection.
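Continuing the illustration, one row of the execution condition table might be modelled as below; every field name here is hypothetical, inferred from the description above rather than taken from the patent:

```java
// Hypothetical shape of one row of the preset task execution condition table.
public record TaskExecutionRecord(
        String serialNumber,   // identifies one customer's pass through the pipeline
        String ccifNumber,     // corporate customer number
        String subtaskId,      // e.g. "model-scoring" or "personal-blacklist"
        String inputParams,    // the subtask's input parameters
        String outputResult,   // the subtask's execution result
        boolean completed) {
}
```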
In addition, it should be noted that because dependencies exist between the batch subtasks, some batch subtasks can generally only be executed once the execution results of the preceding batch subtasks are available. In this embodiment of the application, the batch subtasks are divided by business logic layer, so they can be executed in order of business logic layer, and no batch subtask is interrupted to wait for a missing execution result of a preceding subtask. The batch subtasks can therefore be executed in a more orderly manner, forming a pipeline, which improves the efficiency of distributed batch running.
In step S10, the step of determining the first pull thread corresponding to each distributed machine includes:
step S11, acquiring address information corresponding to the distributed machine and batch information corresponding to the first-layer batch subtask;
step S12, according to the address information and the batch information, inquiring a corresponding pull thread in a preset pull thread creation table;
step S13, if the query fails, the first pull thread is created, and the first pull thread is stored in the preset pull thread creation table;
step S14, if the query is successful, the queried pull thread is used as the first pull thread.
In this embodiment, it should be noted that the preset pull thread creation table is a table used to create or query pull threads. It may specifically include fields such as the batch name, the execution batch, the subtask identifier, the address information of the distributed machine, and the creation time. The batch name is the name of the batch corresponding to the preset batch task; the execution batch is the batch determined according to the business logic layer (for example, the first layer of business logic corresponds to the first batch and the second layer to the second batch); the subtask identifier is the identity of the subtask; and the address information of a distributed machine may be its IP address.
Illustratively, the steps S11 to S14 include:
The IP address corresponding to the distributed machine and the batch information corresponding to the first-layer batch subtask are acquired, where the batch information may be the execution batch. Using the address information and the batch information as an index, the preset pull thread creation table is queried for a corresponding pull thread. If a pull thread corresponding to both the IP address and the execution batch exists in the table, the queried pull thread is used as the first pull thread; if not, a new pull thread is created as the first pull thread, and the first pull thread and its pull thread information are stored in the preset pull thread creation table. The pull thread information includes the batch name, the execution batch, the subtask identifier, and the IP address of the corresponding distributed machine.
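A minimal sketch of this query-or-create logic, assuming the pull thread creation table is held in memory as a concurrent map keyed by IP address and execution batch (the class and method names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of steps S11-S14: query the table by (address, batch);
// on a miss, create and store the thread (S13); on a hit, reuse it (S14).
public class PullThreadRegistry {
    private final Map<String, Thread> table = new ConcurrentHashMap<>();

    public Thread getOrCreate(String machineIp, String executionBatch,
                              Runnable pullLoop) {
        String key = machineIp + "|" + executionBatch;   // query index
        return table.computeIfAbsent(key, k -> {
            Thread t = new Thread(pullLoop, "pull-" + k);
            t.start();
            return t;
        });
    }
}
```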
In step S30, the step of judging whether the batch subtasks on the first layer of business logic have all been executed includes:
Step S31, judging whether the first-layer batch subtask has parallel subtasks that have not finished executing;
Step S32, if yes, judging that the batch subtasks on the first layer of business logic have not all been executed;
Step S33, if not, judging that the batch subtasks on the first layer of business logic have all been executed.
In this embodiment, it should be noted that a parallel subtask is on the same layer of business logic as the first-layer batch subtask.
Illustratively, the steps S31 to S33 include:
The preset task execution condition table is queried for parallel subtasks corresponding to the first-layer batch subtask. If the first-layer batch subtask has no parallel subtasks, it is judged that the batch subtasks on the first layer of business logic have all been executed. If it does have parallel subtasks, the task execution status of those parallel subtasks recorded in the preset task execution condition table is acquired, and whether they have finished executing is determined from it: if they have all finished, it is judged that the batch subtasks on the first layer of business logic have all been executed; otherwise, it is judged that they have not all been executed.
In step S40, the step of determining the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks includes:
Step S41, querying a preset batch task configuration table for the dependency relationship field corresponding to the first-layer batch subtask;
Step S42, querying the preset batch task configuration table for the second-layer batch subtasks on the second layer of business logic according to the dependency relationship field.
In this embodiment, it should be noted that the dependency relationship field stores the dependency information of the first-layer batch subtask, and the dependency information may be a dependency tag. For example, in a dependency tag (a, b), a is the subtask identifier of the first-layer batch subtask and b is the subtask identifier of the second-layer batch subtask, so the tag (a, b) indicates that the second-layer batch subtask depends on the first-layer batch subtask.
Exemplarily, the steps S41 to S42 include:
querying the preset batch task configuration table for the dependency relationship field corresponding to the first-layer batch subtask; determining, according to the dependency tags in that field, the identifiers of the subtasks that depend on the first-layer batch subtask; and querying the preset batch task configuration table for the batch subtasks corresponding to those identifiers, taking the queried batch subtasks as the second-layer batch subtasks on the second layer of business logic.
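Under the same assumed configuration-table shape as in the sketch above, steps S41 to S42 reduce to an inverse lookup over the dependency tags; the sketch below is illustrative only:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps S41-S42: a tag (a, b) means "b depends on a",
// so given a finished subtask a we collect every subtask that lists a as a
// prerequisite; those form the next business-logic layer.
public class DependencyResolver {
    // Maps a subtask id to the ids it depends on.
    private final Map<String, List<String>> configTable;

    public DependencyResolver(Map<String, List<String>> configTable) {
        this.configTable = configTable;
    }

    public List<String> nextLayer(String finishedSubtask) {
        return configTable.entrySet().stream()
                .filter(e -> e.getValue().contains(finishedSubtask))
                .map(Map.Entry::getKey)
                .toList();
    }
}
```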
Compared with the prior-art technique of dividing user data evenly among different distributed machines for batch running, this embodiment first determines the first-layer batch subtasks on the first layer of business logic in a preset batch task and determines a first pull thread corresponding to each distributed machine; polls and pulls, through each first pull thread, the first subtask messages corresponding to the first-layer batch subtasks for the corresponding distributed machine to consume, obtaining an execution result for each first subtask message; judges whether the batch subtasks on the first layer of business logic have all been executed; if so, determines the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks; and determines a second pull thread corresponding to each distributed machine, and polls and pulls, through each second pull thread, the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machine to consume, so as to perform distributed batch running. This embodiment of the application subdivides the granularity of the batch to the subtask level according to the dependencies between the batch subtasks in the business logic. Because the data volume of a single subtask is far smaller than the data volume corresponding to a single user, the computation time required by a subtask is short, and the difference in consumption time between different distributed machines consuming different subtask messages is small. The subtask messages are then pulled to each distributed machine for consumption by a polling pull thread, where new subtask messages are pulled only after the previously pulled ones have been consumed, so that a large batch is split into small batches that are processed continuously. This reduces the influence of differences in user data volume on batch efficiency, and differences in the machine resources of different distributed machines affect only individual small batches, so that the difference in batch elapsed time between distributed machines is essentially negligible for the batch task as a whole. This overcomes the technical defect in the prior art that differing machine resources and differing user data volumes unbalance the load of the distributed machines and harm the efficiency of distributed batch running, and improves the efficiency of distributed batch running.
Example two
Further, referring to fig. 2, based on the first embodiment of the present application, in another embodiment of the present application, content that is the same as or similar to the first embodiment may be understood with reference to the description above and is not repeated below. On this basis, in step S20, the step of polling and pulling, through each first pull thread, the first subtask messages corresponding to the first-layer batch subtasks for the corresponding distributed machine to consume, so as to obtain the execution results, includes:
step S21, pulling a preset number of first subtask messages for the corresponding distributed machine through the first pulling thread;
step S22, submitting each of the first subtask messages to a task execution thread pool of the distributed machine to consume the first subtask messages, so as to obtain an execution result corresponding to each of the first subtask messages;
step S23, determining whether the first subtask message pulled in the current round is consumed, and if the first subtask message pulled in the current round is consumed, returning to the execution step: and pulling a preset number of first subtask messages for the corresponding distributed machines through the first pulling thread.
Illustratively, the steps S21 to S23 include:
A preset number of subtask message tags are acquired from the preset message queue through the first pull thread, and the first subtask message corresponding to each tag is pulled to the corresponding distributed machine. Each first subtask message is submitted to the task execution thread pool of the distributed machine, so that an available thread in the pool can consume it, yielding an execution result for each first subtask message. After the polling interval has elapsed, whether the first subtask messages pulled in the current round have all been consumed is judged; if not, the thread waits for the current round to finish consuming and checks again. Once the round has been consumed, the flow returns to pulling a preset number of first subtask messages for the corresponding distributed machine through the first pull thread, so as to pull the next round; in this way the subtask messages are pulled by polling through the pull thread. The polling interval is the time between two successive pulls of subtask messages during polling.
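The round-based pulling of steps S21 to S23 can be sketched in Java as follows. The queue, pool, batch size and polling interval are constructor parameters, consume() is a hypothetical stand-in for executing one subtask, and waiting on the Futures of a round before draining the next round implements the rule that new messages are pulled only after the pulled ones are consumed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of steps S21-S23: each round pulls at most batchSize
// subtask messages, hands them to the machine's task execution thread pool,
// and pulls the next round only once the current round is fully consumed.
public class PollingPuller implements Runnable {
    private final BlockingQueue<String> messageQueue; // shared preset queue
    private final ExecutorService taskPool;           // machine's worker pool
    private final int batchSize;                      // preset number per round
    private final long pollIntervalMillis;            // polling interval

    public PollingPuller(BlockingQueue<String> queue, ExecutorService pool,
                         int batchSize, long pollIntervalMillis) {
        this.messageQueue = queue;
        this.taskPool = pool;
        this.batchSize = batchSize;
        this.pollIntervalMillis = pollIntervalMillis;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Step S21: pull up to batchSize message tags for this machine.
                List<String> round = new ArrayList<>(batchSize);
                messageQueue.drainTo(round, batchSize);
                if (round.isEmpty()) {
                    TimeUnit.MILLISECONDS.sleep(pollIntervalMillis);
                    continue;
                }
                // Step S22: submit every message of the round to the pool.
                List<Future<?>> inFlight = new ArrayList<>();
                for (String tag : round) {
                    inFlight.add(taskPool.submit(() -> consume(tag)));
                }
                // Step S23: wait until the whole round is consumed before
                // pulling again, so the machine is never over-committed.
                for (Future<?> f : inFlight) {
                    f.get();
                }
                TimeUnit.MILLISECONDS.sleep(pollIntervalMillis);
            }
        } catch (Exception e) {
            Thread.currentThread().interrupt();
        }
    }

    private void consume(String tag) {
        // Execute the subtask for this tag and write its result back to the
        // task execution condition table (omitted in this sketch).
    }
}
```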
In step S22, the step of submitting each of the first subtask messages to a task execution thread pool of the distributed machine to consume the first subtask messages and obtain an execution result corresponding to each of the first subtask messages includes:
step S221, judging whether an available thread exists in the task execution thread pool;
step S222, if the available thread exists, submitting the first subtask message to the available thread for consumption to obtain the execution result;
in step S223, if there is no available thread, the method returns to the step of: and judging whether available threads exist in the task execution thread pool or not.
In this embodiment, it should be noted that the distributed machine maintains a task execution thread pool and monitors the running state of each task execution thread in the pool in real time. As soon as an available thread exists, a pulled subtask message is submitted to it for consumption; when no thread is available, the pull thread pulls the next round of subtask messages only after the current round has been consumed. This keeps the distributed machine running a saturated pipeline without driving its load too high, thereby distributing load to the machine reasonably.
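Steps S221 to S223 amount to gating submission on pool availability. One hedged way to approximate "an available thread exists" with the standard ThreadPoolExecutor API is to compare the active count against the maximum pool size, as below; note that getActiveCount() is documented as an approximation, so a production system would need a stricter mechanism:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of steps S221-S223: busy-wait until the task execution
// pool has a free worker, then submit the subtask message for consumption.
public class PoolGate {
    public static void submitWhenAvailable(ThreadPoolExecutor pool,
                                           Runnable subtask)
            throws InterruptedException {
        while (pool.getActiveCount() >= pool.getMaximumPoolSize()) {
            TimeUnit.MILLISECONDS.sleep(50);  // no available thread: keep checking
        }
        pool.submit(subtask);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        submitWhenAvailable(pool, () -> System.out.println("consumed"));
        pool.shutdown();
    }
}
```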
With respect to step S21, before the step of pulling a preset number of first subtask messages for a corresponding distributed machine through the first pull thread, the distributed batch optimization method further includes:
adding a subtask message tag to a preset message queue;
in step S21, the step of pulling, by the first pull thread, a preset number of first subtask messages for the corresponding distributed machine includes:
Step S211, acquiring a preset number of subtask message tags from the preset message queue through the first pull thread;
Step S212, pulling the first subtask message corresponding to each subtask message tag to the distributed machine.
Exemplarily, the steps S211 to S212 include: acquiring a preset number of subtask message tags from the preset message queue through the first pull thread; querying the preset task execution condition table for the first subtask message corresponding to each subtask message tag; and pulling each first subtask message to the distributed machine.
The embodiment of the application provides a method for pulling subtask messages for consumption by polling during batch running: a preset number of first subtask messages are pulled for the corresponding distributed machine through the first pull thread, and each first subtask message is submitted to the task execution thread pool of the distributed machine for consumption, obtaining an execution result for each first subtask message; whether the first subtask messages pulled in the current round have all been consumed is judged, and if they have, the flow returns to pulling a preset number of first subtask messages for the corresponding distributed machine through the first pull thread. The pull thread thus does more than pull subtask messages at a fixed interval: when the pull time point arrives, it checks whether the current round has been fully consumed, and pulls the next round only if it has. This prevents unreasonable load distribution caused by insufficient machine resources on a distributed machine. Because the subtask messages are pulled to the distributed machines by polling, every distributed machine can run a saturated pipeline under a reasonably distributed machine load, the distributed machines finish their batch tasks at essentially the same time points, and the situation where some distributed machines sit idle waiting for others to finish their tasks is avoided, which improves the efficiency of distributed batch running.
Example three
Further, referring to fig. 3, based on the first embodiment of the present application, in another embodiment of the present application, content that is the same as or similar to the first embodiment may be understood with reference to the description above and is not repeated below. On this basis, in step S50, the preset batch task includes a risk control batch task, the first-layer subtasks include a customer batch record subtask, the second-layer subtasks include model input-parameter subtasks, and the execution result includes a customer batch record. The step of determining a second pull thread corresponding to each distributed machine, and polling and pulling, through each second pull thread, the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machine to consume, so as to perform distributed batch running, includes:
Step S51, determining a second pull thread corresponding to each distributed machine, and polling and pulling the customer batch records for the corresponding distributed machine through each second pull thread for consumption, so as to construct the model input parameters;
Step S52, after the model input-parameter subtasks on the second layer of business logic have all been executed, determining the model scoring subtask, on the third layer of business logic, that depends on the model input-parameter subtasks, and a third pull thread corresponding to each distributed machine;
Step S53, polling and pulling, through each third pull thread, the model input parameters corresponding to the model scoring subtask for the corresponding distributed machine to consume, and inputting the model input parameters into a preset scoring model for model scoring, so as to obtain a model scoring result;
Step S54, after the model scoring subtask on the third layer of business logic has been executed, determining the risk disposal subtask, on the fourth layer of business logic, that depends on the model scoring subtask, and a fourth pull thread corresponding to each distributed machine;
Step S55, polling and pulling, through each fourth pull thread, the model scoring result corresponding to the risk disposal subtask for the corresponding distributed machine to consume, and inputting the model scoring result into a preset risk disposal model, so as to obtain a disposal output result;
Step S56, performing risk disposal on the user corresponding to the risk control batch task according to the disposal output result.
In this embodiment, it should be noted that the preset batch task includes a risk control batch task used for post-loan risk early warning. The preset batch task comprises a customer batch record subtask, model input-parameter subtasks, a model scoring subtask, and a model disposal subtask. The customer batch record subtask generates a customer batch record; the model input-parameter subtasks generate the input parameters of the scoring model; the model scoring subtask performs post-loan risk scoring on a user according to the model input parameters; and the model disposal subtask performs risk disposal on the user according to the model scoring result, where the disposal may be freezing the account, recalling the loan in advance, and the like. For the specific process of polling and pulling the subtask messages to the corresponding distributed machines for consumption in steps S51 to S56, reference may be made to steps S10 to S50 and their detailed sub-steps, which are not repeated here.
In an implementation manner, referring to fig. 4, a system architecture diagram of the distributed batch is shown: Queue1, Queue X and Queue Y are the preset message queues corresponding to the batch subtasks on different business logic layers; server1, server2 and server3 are the distributed machines; 1, X and Y are the subtask messages corresponding to the batch subtasks on the different business logic layers; the distributed machines perform pipelined operation in the order of the business logic layers (1-X-Y); and the dashed arrows indicate pull threads pulling subtask messages. Referring to FIG. 5, a diagram of the dependencies between the batch subtasks is shown, where "generate customer batch record" corresponds to the customer batch record subtask; "personal blacklist", "enterprise blacklist", "overdue status", "personal aggregation", "enterprise aggregation", "business registration elements", "enterprise credit investigation", "personal credit investigation", "tax-bank" and "Tongdun" correspond respectively to the personal blacklist subtask, enterprise blacklist subtask, overdue status subtask, personal aggregation subtask, enterprise aggregation subtask, business registration elements subtask, enterprise credit investigation subtask, personal credit investigation subtask, tax-bank subtask and Tongdun subtask, all of which are model input-parameter subtasks; the "scoring model" corresponds to the model scoring subtask, and the "disposal model" corresponds to the model disposal subtask. The personal blacklist subtask queries, according to the ECIF number (individual customer number) of the enterprise's legal representative, whether the legal representative hits the personal blacklist; if so, the hit date is updated to the preset task execution condition table and used as an input parameter of the scoring model. The enterprise blacklist subtask queries the blacklist system, according to the enterprise's CCIF number, whether the enterprise hits the enterprise blacklist; if so, the hit date is updated to the preset task execution condition table and used as an input parameter of the scoring model. The overdue status subtask records, according to whether any current borrowings under the ECIF number of the enterprise's legal representative are overdue, the first overdue date, the most recent overdue date and the maximum overdue amount to the preset task execution condition table as input parameters of the scoring model. The enterprise aggregation subtask queries the company's aggregation system for the related enterprise aggregation records according to the enterprise's unified social credit code, business registration number and organization code, and updates the aggregation records and aggregation early-warning results to the preset task execution condition table as input parameters of the scoring model. The personal aggregation subtask queries the aggregation system for the personal aggregation records according to the identity card number of the enterprise's legal representative, and updates the personal aggregation records and aggregation early-warning results to the preset task execution condition table as input parameters of the scoring model. The enterprise credit investigation subtask queries the enterprise credit record according to any one of the enterprise name, unified social credit code, business registration number and organization code, and records the fields required by the business in the model input-parameter fields. The personal credit investigation subtask queries the personal credit status according to the legal representative's name and identity card number, and records the fields required by the business in the model input-parameter fields. The business registration elements subtask queries the enterprise's business registration information according to any one of the enterprise name, unified social credit code and business registration number, and records the fields required by the business in the model input-parameter fields. The tax-bank subtask queries the enterprise's tax payment information according to the enterprise's taxpayer identification number, and records the indicators required by the business in the model input-parameter fields. The Tongdun subtask queries the legal representative's Tongdun risk information according to the legal representative's name and identity card number, and records the result in the model input-parameter fields. After each subtask has been executed, its subtask status is updated to completed.
The embodiment of the application provides a distributed batch running method for post-loan risk early warning. Over the whole distributed batch run, the batch subtasks are divided by business logic layer into the customer batch record subtask, the model input-parameter subtasks, the model scoring subtask, and the model disposal subtask; the pull thread of each distributed machine then pulls the subtask messages to its machine for consumption, by polling, in the order of the business logic layers, so that every distributed machine performs pipelined post-loan risk early-warning work, improving the efficiency of post-loan risk early-warning batch running.
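For illustration, the four business-logic layers of this risk control batch task (following FIG. 5) could be encoded as the layer list consumed by the orchestration sketch in the first embodiment; the identifiers below are invented for readability and are not the patent's names:

```java
import java.util.List;

// Hypothetical encoding of the four business-logic layers of the
// risk control batch task described in this embodiment.
public class RiskBatchLayers {
    static final List<List<String>> LAYERS = List.of(
            List.of("generate-customer-batch-record"),
            List.of("personal-blacklist", "enterprise-blacklist",
                    "overdue-status", "personal-aggregation",
                    "enterprise-aggregation", "business-registration",
                    "enterprise-credit", "personal-credit",
                    "tax-bank", "tongdun"),
            List.of("scoring-model"),
            List.of("disposal-model"));
}
```

Feeding RiskBatchLayers.LAYERS to the earlier BatchOrchestrator.run(...) sketch would publish the customer batch record subtask first, then the ten model input-parameter subtasks in parallel, then scoring, then disposal.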
Example four
The embodiment of the present application further provides a distributed batch optimization device, including:
a first determining module, configured to determine the first-layer batch subtasks on the first layer of business logic in a preset batch task, and determine a first pull thread corresponding to each distributed machine;
a first polling consumption module, configured to poll and pull, through each first pull thread, the first subtask messages corresponding to the first-layer batch subtasks for the corresponding distributed machine to consume, so as to obtain an execution result for each first subtask message;
a judging module, configured to judge whether the batch subtasks on the first layer of business logic have all been executed;
a second determining module, configured to determine, if they have all been executed, the second-layer batch subtasks, on the second layer of business logic, that depend on the first-layer batch subtasks;
and a second polling consumption module, configured to determine a second pull thread corresponding to each distributed machine, and poll and pull, through each second pull thread, the execution results corresponding to the second-layer batch subtasks for the corresponding distributed machine to consume, so as to perform distributed batch running.
Optionally, the judging module is further configured to:
judge whether the first-layer batch subtask has parallel subtasks that have not finished executing;
if yes, judge that the batch subtasks on the first layer of business logic have not all been executed;
and if not, judge that the batch subtasks on the first layer of business logic have all been executed.
Optionally, the first poll consumption module is further configured to:
pulling a preset number of first subtask messages for the corresponding distributed machines through the first pulling thread;
submitting each first subtask message to a task execution thread pool of the distributed machine so as to consume the first subtask message and obtain an execution result corresponding to each first subtask message;
judging whether the first subtask messages pulled in the current round have all been consumed; if they have, returning to the step of pulling a preset number of first subtask messages for the corresponding distributed machine through the first pull thread.
Optionally, the first poll consumption module is further configured to:
judging whether an available thread exists in the task execution thread pool or not;
if the available thread exists, submitting the first subtask message to the available thread for consumption to obtain the execution result;
if no available thread exists, returning to the step of judging whether an available thread exists in the task execution thread pool.
Optionally, the distributed batch optimization apparatus is further configured to:
adding a subtask message tag to a preset message queue;
the step of pulling a preset number of first subtask messages for the corresponding distributed machine through the first pulling thread includes:
acquiring a preset number of subtask message labels from the preset message queue through the first pull thread;
and pulling the first subtask message corresponding to each subtask message label to the distributed machine.
Optionally, the first determining module is further configured to:
acquiring address information corresponding to the distributed machine and batch information corresponding to the first-layer batch running subtask;
inquiring a corresponding pulling thread in a preset pulling thread creation table according to the address information and the batch information;
if the query fails, creating the first pulling thread, and storing the first pulling thread to the preset pulling thread creation table;
and if the query is successful, taking the queried pull thread as the first pull thread.
Optionally, the second determining module is further configured to:
inquiring a dependency relationship field corresponding to the first-layer batch subtask in a preset batch task configuration table;
and inquiring a second-layer batch subtask on the second-layer service logic in the preset batch task configuration table according to the dependency relationship field.
Optionally, the preset batch task includes a wind control batch task, the first-layer subtask includes a client batch record subtask, the second-layer subtask includes a model input subtask, the execution result includes a client batch record, and the second polling consumption module is further configured to:
determine the second pull thread corresponding to each distributed machine, and poll and pull the client batch records for the corresponding distributed machines through the second pull threads for consumption, so as to construct model input parameters;
after all the model input subtasks on the second-layer service logic have been executed, determine a model scoring subtask on the third-layer service logic that depends on the model input subtask, and a third pull thread corresponding to each distributed machine;
poll and pull, through each third pull thread, the model input parameters corresponding to the model scoring subtask for the corresponding distributed machine for consumption, and input the model input parameters into a preset scoring model for model scoring to obtain a model scoring result;
after the model scoring subtask on the third-layer service logic has been executed, determine a wind control handling subtask on the fourth-layer service logic that depends on the model scoring subtask, and a fourth pull thread corresponding to each distributed machine;
poll and pull, through each fourth pull thread, the model scoring result corresponding to the wind control handling subtask for the corresponding distributed machine for consumption, and input the model scoring result into a preset wind control handling model for wind control handling to obtain a handling output result;
and perform wind control handling on the user corresponding to the wind control batch task according to the handling output result.
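Putting the four layers together, a highly simplified sketch of this wind control batch pipeline might read as follows; every method body is a placeholder for the corresponding pull-and-consume round described above, and the layer names are assumptions:

    // End-to-end sketch of the four-layer risk control batch; illustrative only.
    public class RiskControlBatch {

        public void run(String batchId) {
            pullAndConsume(batchId, "customerBatchRecord"); // layer 1: client batch records
            pullAndConsume(batchId, "modelInput");          // layer 2: build model inputs
            pullAndConsume(batchId, "modelScore");          // layer 3: score with the model
            pullAndConsume(batchId, "riskHandle");          // layer 4: apply handling
        }

        private void pullAndConsume(String batchId, String layer) {
            // A real implementation would poll each machine's pull thread here and
            // block until every parallel subtask of the layer reports completion.
            System.out.printf("batch %s: layer %s consumed%n", batchId, layer);
        }
    }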
The distributed batch optimization apparatus provided by the invention adopts the distributed batch optimization method of the above embodiment, and solves the technical problem of low distributed batch running efficiency. Compared with the prior art, the beneficial effects of the distributed batch optimization apparatus provided by this embodiment of the invention are the same as those of the distributed batch optimization method provided by the above embodiment, and the other technical features of the apparatus are the same as those disclosed in the method embodiment, which are not repeated here.
EXAMPLE five
An embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the distributed batch optimization method of the first embodiment.
Referring now to FIG. 6, shown is a schematic diagram of an electronic device suitable for implementing embodiments of the present disclosure. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage means into a random access memory (RAM). The RAM also stores various programs and data necessary for the operation of the electronic device. The processing means, the ROM, and the RAM are connected to one another via a bus. An input/output (I/O) interface is also connected to the bus.
Generally, the following devices may be connected to the I/O interface: input devices including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, and the like; output devices including, for example, a liquid crystal display (LCD), speaker, vibrator, and the like; storage devices including, for example, magnetic tape, hard disk, and the like; and a communication device. The communication device may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While the figure illustrates an electronic device with various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means, or installed from a storage means, or installed from a ROM. The computer program, when executed by a processing device, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
The electronic device provided by the invention adopts the distributed batch optimization method of the above embodiment, and solves the technical problem of low distributed batch running efficiency. Compared with the prior art, the beneficial effects of the electronic device provided by this embodiment of the invention are the same as those of the distributed batch optimization method provided by the above embodiment, and the other technical features of the electronic device are the same as those disclosed in the method embodiment, which are not repeated here.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the foregoing description of embodiments, the particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
EXAMPLE six
The present embodiment provides a computer readable storage medium having computer readable program instructions stored thereon for performing the method of distributed batch optimization in the first embodiment.
The computer-readable storage medium provided by the embodiments of the present invention may be, for example, a USB flash disk, and more generally may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, the computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wire, optical cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer-readable storage medium may be embodied in an electronic device; or may be present alone without being incorporated into the electronic device.
The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine a first-layer batch subtask on a first-layer service logic in a preset batch task, and determine a first pull thread corresponding to each distributed machine; poll and pull, through each first pull thread, a first subtask message corresponding to the first-layer batch subtask for the corresponding distributed machine for consumption, so as to obtain an execution result corresponding to the first subtask message; determine whether the batch subtasks on the first-layer service logic have all been executed; if so, determine a second-layer batch subtask on a second-layer service logic that depends on the first-layer batch subtask; and determine a second pull thread corresponding to each distributed machine, and poll and pull, through each second pull thread, the execution result corresponding to the second-layer batch subtask for the corresponding distributed machine for consumption, so as to perform distributed batch running.
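For orientation, the stored-program flow above can be condensed into the skeleton below; all collaborator methods are placeholders and the subtask names are assumptions, so this is a sketch of the overall shape rather than a definitive implementation:

    import java.util.List;

    // Minimal end-to-end skeleton of the two-layer flow; illustrative only.
    public class DistributedBatchRunner {

        public void execute(String batchId, List<String> machines) {
            String layer1 = "firstLayerSubtask";     // step 1: resolve the first-layer
            startPullThreads(machines, layer1);      // subtask and one pull thread per machine
            awaitLayerComplete(batchId, layer1);     // steps 2-3: consume, then check completion

            String layer2 = "secondLayerSubtask";    // step 4: resolve the dependent layer
            startPullThreads(machines, layer2);      // and its pull threads
            awaitLayerComplete(batchId, layer2);     // step 5: consume the layer-1 results
        }

        private void startPullThreads(List<String> machines, String subtask) { /* ... */ }
        private void awaitLayerComplete(String batchId, String subtask) { /* ... */ }
    }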
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a module does not, in some cases, limit the module itself.
The computer-readable storage medium provided by the invention stores computer-readable program instructions for executing the distributed batch optimization method described above, and solves the technical problem of low distributed batch running efficiency. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment of the invention are the same as those of the distributed batch optimization method provided by the above embodiment, and are not described here again.
EXAMPLE seven
The present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the distributed batch optimization method as described above.
The computer program product provided by the present application solves the technical problem of low distributed batch running efficiency. Compared with the prior art, the beneficial effects of the computer program product provided by this embodiment of the invention are the same as those of the distributed batch optimization method provided by the above embodiment, and are not repeated here.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A distributed batch optimization method, comprising:
determining a first-layer batch subtask on a first-layer service logic in a preset batch task, and determining a first pull thread corresponding to each distributed machine;
polling and pulling, through each first pull thread, a first subtask message corresponding to the first-layer batch subtask for the corresponding distributed machine for consumption, so as to obtain an execution result corresponding to the first subtask message;
determining whether the batch subtasks on the first-layer service logic have all been executed;
if so, determining a second-layer batch subtask on a second-layer service logic that depends on the first-layer batch subtask;
and determining a second pull thread corresponding to each distributed machine, and polling and pulling, through each second pull thread, the execution result corresponding to the second-layer batch subtask for the corresponding distributed machine for consumption, so as to perform distributed batch running.
2. The distributed batch optimization method of claim 1, wherein the determining whether the batch subtasks on the first-layer service logic have all been executed comprises:
determining whether the first-layer batch subtask has any parallel subtasks that have not finished executing;
if so, determining that the batch subtasks on the first-layer service logic have not all been executed;
and if not, determining that all batch subtasks on the first-layer service logic have been executed.
3. The distributed batch optimization method of claim 1, wherein the step of polling and pulling, through each first pull thread, the first subtask message corresponding to the first-layer batch subtask for the corresponding distributed machine for consumption to obtain the execution result corresponding to the first subtask message comprises:
pulling a preset number of first subtask messages for the corresponding distributed machine through the first pull thread;
submitting each first subtask message to a task execution thread pool of the distributed machine, so that the first subtask messages are consumed and an execution result corresponding to each first subtask message is obtained;
and determining whether the first subtask messages pulled in the current round have all been consumed; if so, returning to the step of pulling a preset number of first subtask messages for the corresponding distributed machine through the first pull thread.
4. The distributed batch optimization method of claim 3, wherein the step of submitting each first subtask message to the task execution thread pool of the distributed machine so that the first subtask messages are consumed and the execution result corresponding to each first subtask message is obtained comprises:
determining whether an available thread exists in the task execution thread pool;
if an available thread exists, submitting the first subtask message to the available thread for consumption to obtain the execution result;
and if no available thread exists, returning to the step of determining whether an available thread exists in the task execution thread pool.
5. The distributed batch optimization method of claim 3, wherein before the step of pulling a preset number of first subtask messages for the corresponding distributed machine through the first pull thread, the method further comprises:
adding a subtask message tag to a preset message queue;
and the step of pulling a preset number of first subtask messages for the corresponding distributed machine through the first pull thread comprises:
acquiring a preset number of subtask message tags from the preset message queue through the first pull thread;
and pulling the first subtask message corresponding to each subtask message tag to the distributed machine.
6. The distributed batch optimization method of claim 1, wherein the step of determining a first pull thread corresponding to each distributed machine comprises:
acquiring address information corresponding to the distributed machine and batch information corresponding to the first-layer batch subtask;
querying a preset pull thread creation table for a corresponding pull thread according to the address information and the batch information;
if the query fails, creating the first pull thread and storing it in the preset pull thread creation table;
and if the query succeeds, using the queried pull thread as the first pull thread.
7. The distributed batch optimization method of claim 1, wherein the step of determining a second-layer batch subtask on a second-layer service logic that depends on the first-layer batch subtask comprises:
querying a preset batch task configuration table for a dependency relationship field corresponding to the first-layer batch subtask;
and querying the preset batch task configuration table, according to the dependency relationship field, for the second-layer batch subtask on the second-layer service logic.
8. The distributed batch optimization method of claim 1, wherein the preset batch task includes a wind control batch task, the first-layer subtask includes a client batch record subtask, the second-layer subtask includes a model input subtask, and the execution result includes a client batch record,
and the step of determining a second pull thread corresponding to each distributed machine, and polling and pulling, through each second pull thread, the execution result corresponding to the second-layer batch subtask for the corresponding distributed machine for consumption to perform distributed batch running comprises:
determining the second pull thread corresponding to each distributed machine, and polling and pulling the client batch records for the corresponding distributed machines through the second pull threads for consumption, so as to construct model input parameters;
after all the model input subtasks on the second-layer service logic have been executed, determining a model scoring subtask on the third-layer service logic that depends on the model input subtask, and a third pull thread corresponding to each distributed machine;
polling and pulling, through each third pull thread, the model input parameters corresponding to the model scoring subtask for the corresponding distributed machine for consumption, and inputting the model input parameters into a preset scoring model for model scoring to obtain a model scoring result;
after the model scoring subtask on the third-layer service logic has been executed, determining a wind control handling subtask on the fourth-layer service logic that depends on the model scoring subtask, and a fourth pull thread corresponding to each distributed machine;
polling and pulling, through each fourth pull thread, the model scoring result corresponding to the wind control handling subtask for the corresponding distributed machine for consumption, and inputting the model scoring result into a preset wind control handling model for wind control handling to obtain a handling output result;
and performing wind control handling on the user corresponding to the wind control batch task according to the handling output result.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the distributed batch optimization method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a program for implementing a distributed batch optimization method, the program being executable by a processor to perform the steps of the distributed batch optimization method according to any one of claims 1 to 8.
CN202111479939.7A 2021-12-06 2021-12-06 Distributed batch optimization method, electronic device and computer-readable storage medium Pending CN114168329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111479939.7A CN114168329A (en) 2021-12-06 2021-12-06 Distributed batch optimization method, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111479939.7A CN114168329A (en) 2021-12-06 2021-12-06 Distributed batch optimization method, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114168329A true CN114168329A (en) 2022-03-11

Family

ID=80483606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111479939.7A Pending CN114168329A (en) 2021-12-06 2021-12-06 Distributed batch optimization method, electronic device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114168329A (en)

Similar Documents

Publication Publication Date Title
US9740522B2 (en) Controlled interruption and resumption of batch job processing
US10684897B2 (en) Event notification
US20180157543A1 (en) System and method for a generic actor system container application
US10853154B2 (en) Orchestration of a sequence of computations by external systems
US9098578B2 (en) Interactive search monitoring in a virtual machine environment
US20120246122A1 (en) Integrating data-handling policies into a workflow model
US10764122B2 (en) Managing computing infrastructure events having different event notification formats
US20150019284A1 (en) Dynamically modifying business processes based on real-time events
US10936192B2 (en) System and method for event driven storage management
CN113051279A (en) Data message storage method, storage device, electronic equipment and storage medium
CN112258244A (en) Method, device, equipment and storage medium for determining task of target object
CN116541069A (en) Key function evaluation method, device, electronic equipment, medium and program product
CN114168329A (en) Distributed batch optimization method, electronic device and computer-readable storage medium
CN115760013A (en) Operation and maintenance model construction method and device, electronic equipment and storage medium
US9059992B2 (en) Distributed mobile enterprise application platform
CN114928603A (en) Client software upgrading method and device, electronic equipment and medium
US8868485B2 (en) Data flow cost modeling
US11810022B2 (en) Contact center call volume prediction
CN113762702A (en) Workflow deployment method, device, computer system and readable storage medium
CA3119490A1 (en) Contact center call volume prediction
US10102036B2 (en) Providing additional thread context to assist memory locality
US11513862B2 (en) System and method for state management of devices
US20230385113A1 (en) Progress Monitoring Service
CN115484149B (en) Network switching method, network switching device, electronic equipment and storage medium
US20230133422A1 (en) Systems and methods for executing and hashing modeling flows

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination