CN114581221B

CN114581221B - Distributed computing system and computer device

Info

Publication number: CN114581221B
Application number: CN202210481598.5A
Authority: CN
Inventors: 简道红; 顾科才; 吴华
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-05-05
Filing date: 2022-05-05
Publication date: 2022-07-29
Anticipated expiration: 2042-05-05
Also published as: CN114581221A

Abstract

Embodiments of the present specification provide a distributed computing system and a computer device. The distributed computing system adds an operator node with high computing capability to the original master-slave distributed computing framework. While the target optimization model constructed based on the original optimization problem is being solved, the operator node handles the tasks that are computation-heavy and time-consuming, which improves processing efficiency.

Description

Distributed computing system and computer device

Technical Field

The present description relates to the field of data processing technology, and in particular, to a distributed computing system and a computer device.

Background

For optimization problems in many business scenarios, an optimization model can be constructed and solved to obtain the optimal result of each decision variable in the optimization problem. Some business scenarios involve a large amount of business data, and the constructed optimization model includes many decision variables (hundreds of millions in some scenarios), so the amount of computation involved in solving for the decision variables is large and processing is slow. Therefore, for such optimization problems, it is necessary to provide a solution that can improve processing efficiency.

Disclosure of Invention

In this regard, the present specification provides a distributed computing system and computer device.

According to a first aspect of embodiments of the present specification, a distributed computing system is provided, where the distributed computing system includes a master node, operator nodes, and a plurality of working nodes, and the distributed computing system is configured to determine an optimization result of decision variables in a plurality of target optimization models constructed based on an original optimization problem, where each target optimization model includes part of the decision variables of the original optimization problem and target variables introduced when constraints of the original optimization problem are incorporated in the target optimization model, and each working node corresponds to one target optimization model;

wherein the master node, the operator nodes and the plurality of working nodes are configured to iteratively perform the following steps:

the operator nodes are used for determining constraint errors corresponding to the constraint conditions based on the optimization results of the decision variables determined by each working node in the previous iteration after receiving indication information which is sent by the main node and indicates that the iteration task is not terminated, sending the constraint errors to the main node, determining the optimization results of the target variables based on the constraint errors, and sending the optimization results to the working nodes;

the working node is used for updating a target optimization model corresponding to the working node by using the optimization result of the target variable sent by the operator node and determining the optimization result of each decision variable in the updated target optimization model;

and the main node is used for determining whether to terminate the iterative task based on the constraint error and informing the operator node.

According to a second aspect of embodiments herein, there is provided a computer device, the computer device being a master node, an operator node and/or a worker node in the distributed computing system of the first aspect.

By applying the scheme of the embodiments of this specification, a distributed computing system is provided for solving a target optimization model constructed based on an original optimization problem. The distributed computing system comprises a main node, an operator node and a plurality of working nodes. After receiving indication information, sent by the main node, indicating that the iteration task is not terminated, the operator node determines a constraint error corresponding to a constraint condition of the original optimization problem based on the optimization result of each decision variable determined by each working node in the previous iteration, sends the constraint error to the main node, determines, based on the constraint error, an optimization result of the target variables (the variables in the target optimization model other than the decision variables), and sends that optimization result to the working nodes. Each working node receives the optimization result of the target variables sent by the operator node, uses it to update the target optimization model corresponding to that working node, and determines the optimization result of each decision variable in the updated target optimization model. The main node determines whether to terminate the iteration task based on the constraint error and informs the operator node. The three kinds of nodes repeat these steps to obtain the optimization result of each decision variable of the target optimization model.

In the distributed computing system provided by the embodiments of this specification, an operator node with high computing capability is added to the original master-slave distributed computing framework, and during the iterative solution of the target optimization model the operator node handles the tasks that are computation-heavy and time-consuming, which improves processing efficiency. In addition, the decision variables of the original optimization problem are distributed to different working nodes for solving, which can further improve processing efficiency.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.

FIG. 1 is a schematic diagram of a distributed computing system according to one embodiment of the present specification.

FIG. 2 is a schematic diagram of a method for constructing an objective optimization model, according to one embodiment of the present disclosure.

FIG. 3 is a schematic diagram of a distributed computing system according to one embodiment of the present specification.

FIG. 4 is a schematic diagram of a computer device of one embodiment of the present description.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the claims that follow.

The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "at the time of …", "when …", or "in response to a determination", depending on the context.

For optimization problems in many business scenarios, an optimization model can be constructed and solved to obtain the optimal result of each decision variable in the optimization problem. For example, in a resource allocation scenario, limited resources generally need to be allocated to multiple resource receivers, and each resource receiver may generate a profit by using the allocated resources. The goal is to determine a resource allocation manner that maximizes the sum of the profits generated by the resource receivers while the corresponding limiting conditions are met. For this resource allocation problem, the resources allocated to each resource receiver can be used as decision variables, maximizing the total profit can be used as the optimization target, and the limiting conditions to be followed in the allocation process can be used as the constraint conditions of the optimization model. The optimization model can then be solved to determine the optimization results of the decision variables.

In some scenarios, the amount of data involved in an original optimization model constructed by a user based on a business problem is large, and the original optimization model may contain hundreds of millions of decision variables, so a conventional solver either cannot solve it or solves it slowly. In such cases, the optimization target of the original optimization model can be split into multiple sub-targets by certain specific algorithms, and the sub-targets are then solved in parallel. This accelerates the solving of optimization models that are relatively complex and have many decision variables, and improves processing efficiency. When the original optimization model is solved by using such algorithms, it is generally first converted into a target optimization model in a specified form, and the target optimization model is then solved by using the algorithms.

Taking the ADMM algorithm (Alternating Direction Method of Multipliers) as an example, the ADMM algorithm can be used to solve decomposable convex optimization problems and is suitable for large-scale optimization problems. It equivalently decomposes the original optimization problem into a plurality of sub-problems, solves the sub-problems in parallel, and finally coordinates the solutions of the sub-problems to obtain a global solution of the original optimization problem. However, a model that can be solved by the ADMM algorithm usually has to be in a specified form. For example, an original optimization model constructed based on a business problem generally has constraints, so the original optimization model can first be converted into an equivalent target optimization model without constraints; for example, the target optimization model can be represented by an augmented Lagrangian function. The target optimization model is then solved by using the ADMM algorithm to obtain an optimization result of each decision variable in the original optimization model.

In general, when a target optimization model is constructed based on an original optimization model with constraints, the constraints can be coupled into the original optimization model through additional variables to construct a target optimization model without constraints. That is, in the process of incorporating the constraints into the target optimization model, variables other than the decision variables of the original optimization problem are introduced; these are hereinafter referred to as target variables. Therefore, the variables to be optimized in the target optimization model comprise the decision variables of the original optimization problem and the newly added target variables, and the target variables and the decision variables can then be solved through multiple rounds of iteration.
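As an illustration only, one common textbook way to couple an equality constraint into an objective is the augmented Lagrangian. The form below assumes an objective f(x), a linear equality constraint Ax = b, a dual variable \lambda and a penalty parameter \rho. These symbols are assumptions for illustration and are not taken from this specification, whose own formulas are not reproduced in the text.

```latex
L_{\rho}(x, \lambda) = f(x) + \lambda^{\top}(Ax - b) + \frac{\rho}{2}\,\lVert Ax - b \rVert_2^{2}
```

In this form the decision variables are x, and \lambda plays the role of a target variable that is introduced only because the constraint was moved into the objective.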

In some service scenarios, the amount of related service data is large, so the finally constructed target optimization model contains many decision variables and iteratively solving it involves a large amount of computation. Some existing distributed computing frameworks are not well suited to solving such optimization problems, so processing is slow.

Based on this, embodiments of the present specification provide a distributed computing system in which an operator node with high computing capability is added to the original master-slave distributed computing framework. During the iterative solution of the target optimization model, the operator node handles the tasks that are computation-heavy and time-consuming (e.g., calculating the constraint error and solving for the target variables), which improves processing efficiency.

As shown in fig. 1, the distributed computing system includes a main node, an operator node, and several work nodes. The main node, the operator node and the working node can run on a physical machine or a virtual machine. The nodes may run on different physical machines or on the same physical machine. The operator nodes can adopt nodes with high computing power to specially process some computing tasks which are large in computing amount and time-consuming in the process of solving the target optimization model.

The distributed computing system may be used to determine an optimization result for each decision variable in the original optimization problem. The original optimization problem may be an optimization problem in various service scenarios, for example, in an investment and financial management scenario, how to allocate a limited investment amount to a plurality of financial management products so as to maximize financial profit, or how to determine the loan amount given by a bank to each user so as to maximize the bank's total approved amount across all users. The problem may be set based on the actual service scenario and service requirements, which is not limited in the embodiments of this specification.

To improve processing efficiency, when an optimization model is constructed for the original optimization problem, a plurality of target optimization models that can be solved in parallel can be constructed, each containing a part of the decision variables of the original optimization problem. For example, the service data related to the original optimization problem may be divided into a plurality of data fragments, each data fragment being data related to a part of the decision variables of the original optimization problem, and a target optimization model is then constructed based on each data fragment, the optimization target of the original optimization problem, and the constraint conditions. The target optimization model can be constructed manually by a user or automatically by a device, which is not limited in the embodiments of this specification. Because each target optimization model includes only a part of the decision variables of the original optimization problem (as shown in fig. 1, assuming that the original optimization problem has 4n decision variables, each working node may hold n of them, for example x_1 to x_n, x_{n+1} to x_{2n}, x_{2n+1} to x_{3n}, and x_{3n+1} to x_{4n}), solving the target optimization models on a plurality of working nodes is equivalent to allocating the decision variables of the original optimization problem to different working nodes for solving, which can also greatly improve processing efficiency.
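The partitioning of decision variables across working nodes can be sketched as follows. This is a minimal illustration, assuming contiguous index blocks of near-equal size; the function name `shard_indices` and the 0-based indexing are hypothetical and not part of the specification.

```python
def shard_indices(num_variables: int, num_workers: int) -> list[range]:
    """Split indices 0..num_variables-1 into contiguous blocks, one per worker."""
    base, extra = divmod(num_variables, num_workers)
    shards = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional variable each.
        size = base + (1 if worker < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


# 4n variables with n = 3, spread over 4 working nodes as in the fig. 1 example.
shards = shard_indices(num_variables=12, num_workers=4)
```

With 12 variables and 4 working nodes, each shard holds 3 consecutive indices, mirroring the n-per-node split described above.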

In the process of solving the target optimization model, each working node in the distributed computing system may correspond to one target optimization model and solve for the decision variables in that model. In addition, because the target optimization model is coupled with the constraints of the original optimization problem, the variables in the target optimization model include, besides the decision variables of the original optimization problem, some target variables (e.g., dual variables) newly added in the process of coupling the constraints. When the target optimization model is solved, two steps can be executed iteratively: fixing the target variables and solving for the decision variables, then fixing the decision variables and solving for the target variables.

In the process of solving the objective optimization model, the main node, the operator node and each working node may iteratively perform the following steps to determine an optimization result of each decision variable in the objective optimization model.

For example, after the K-th iteration is completed, the master node may determine whether the iterative task may be terminated, and if not, the master node may send indication information indicating that the iterative task is not terminated to the operator node, so as to start a new round of iterative computation.

After receiving the indication information sent by the master node, the operator nodes can determine constraint errors according to the optimization results of the decision variables determined by each working node in the previous iteration (the K-th iteration) and the constraint conditions corresponding to the original optimization problem. The constraint error is then sent to the master node. Meanwhile, the operator nodes can determine the optimization result of the target variable in the target model based on the constraint error and send the optimization result to each working node.

After each working node receives the optimization result of the target variables sent by the operator node, it uses the received optimization result to update the target optimization model corresponding to that working node and determines the optimization result of each decision variable in the updated target optimization model, so that the operator node can use these results in the next iteration.

After receiving the constraint error sent by the operator node, the main node determines whether to terminate the iterative task based on the constraint error and informs the operator node. For example, in some embodiments, the master node may compare the constraint error determined in the current iteration with the constraint error determined in the previous iteration, and if the constraint error is smaller than a preset threshold, the iteration process may be terminated. In some embodiments, the master node may also determine whether to terminate the iteration process according to whether the constraint error changes over a plurality of consecutive iterations; for example, if the constraint error does not change over three consecutive iterations, the iteration process may be terminated. In some embodiments, the master node may determine to terminate the iteration process after the number of iterations reaches a preset number. The criterion can be set according to actual requirements, which is not limited in the embodiments of this specification.
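The three termination criteria described above can be combined in a single check. The following is a sketch under stated assumptions: the function name, parameter names and default values are hypothetical, and "does not change" is interpreted here as the spread of the recent errors not exceeding a tolerance.

```python
def should_terminate(errors, threshold=1e-6, patience=3,
                     change_tolerance=0.0, max_iterations=1000):
    """Decide termination from the constraint errors of completed iterations.

    errors: list of constraint errors, oldest first, latest last.
    """
    if not errors:
        return False
    # Criterion 1: the latest constraint error is below the preset threshold.
    if abs(errors[-1]) < threshold:
        return True
    # Criterion 2: the error did not change over `patience` consecutive iterations.
    recent = errors[-patience:]
    if len(recent) == patience and max(recent) - min(recent) <= change_tolerance:
        return True
    # Criterion 3: the preset number of iterations has been reached.
    return len(errors) >= max_iterations
```

A deployment would typically pick only the criteria it needs; combining them with a logical OR is one possible design choice, not something the specification mandates.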

The main node, the operator node and the working nodes repeat the above steps until the iteration task is terminated. The finally obtained optimization result of each decision variable is then taken as the optimal solution of the original optimization problem.
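The division of labor among the three kinds of nodes can be illustrated with a toy single-process simulation. Everything concrete here is an assumption for illustration: the toy problem (minimize the sum of 0.5*(x_i - a_i)^2 subject to sum(x_i) = budget), the class and function names, and the use of plain dual ascent instead of full ADMM to keep the sketch short. Real nodes would run on separate machines and exchange messages.

```python
class Worker:
    """Holds one data shard; solves min 0.5*(x - a)^2 + dual*x per variable."""

    def __init__(self, targets):
        self.targets = list(targets)

    def solve(self, dual):
        # Closed-form minimizer of the local model for a fixed dual variable.
        return [a - dual for a in self.targets]


class Operator:
    """Computes the constraint error and updates the target (dual) variable."""

    def __init__(self, budget, step):
        self.budget = budget
        self.step = step
        self.dual = 0.0

    def constraint_error(self, results):
        # Error of the assumed constraint sum(x) == budget.
        return sum(sum(xs) for xs in results) - self.budget

    def update_dual(self, error):
        self.dual += self.step * error
        return self.dual


class Master:
    """Decides whether the iteration task terminates."""

    def __init__(self, threshold, max_iterations):
        self.threshold = threshold
        self.max_iterations = max_iterations

    def should_terminate(self, error, iteration):
        return abs(error) < self.threshold or iteration >= self.max_iterations


def run(shards, budget, threshold=1e-8, max_iterations=200):
    workers = [Worker(shard) for shard in shards]
    num_variables = sum(len(shard) for shard in shards)
    operator = Operator(budget, step=0.5 / num_variables)
    master = Master(threshold, max_iterations)
    results = [w.solve(operator.dual) for w in workers]  # initial round
    iteration = 0
    while True:
        iteration += 1
        # Operator: constraint error from the previous round's results.
        error = operator.constraint_error(results)
        # Master: decide whether to terminate based on that error.
        if master.should_terminate(error, iteration):
            return [x for xs in results for x in xs], error
        # Operator: update the target variable and send it to the workers.
        dual = operator.update_dual(error)
        # Workers: re-solve their local models with the new target variable.
        results = [w.solve(dual) for w in workers]
```

For example, `run([[1.0, 2.0], [3.0], [4.0, 5.0]], budget=10.0)` drives the sum of the five variables toward 10 by shifting each target value down by the same amount.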

For ease of understanding, the above iterative solution process is explained below with reference to an example. Assume that the decision variables included in the original optimization problem are [symbols not reproduced in the source text], with a constraint of [formula not reproduced in the source text]. The distributed computing system comprises 5 working nodes in total, and the functions corresponding to the target optimization model on each working node are consistent, but the decision variables differ. For example, the target optimization model on each working node can be uniformly expressed as follows:

[formula not reproduced in the source text]

wherein [symbol not reproduced in the source text] is the target variable. The target optimization models in working nodes w1, w2, w3, w4 and w5 each contain a different range of the decision variables [ranges not reproduced in the source text].

After determining that the iteration process needs to continue, the main node can notify the operator node, and the operator node can determine the constraint error based on the value of each decision variable determined by each working node in the previous iteration. For example, working node 1 may determine the values of its decision variables [range not reproduced in the source text] in the previous iteration, determine the corresponding constraint value [formula not reproduced in the source text] according to the constraint condition [formula not reproduced in the source text], and send the calculated constraint value to the operator node; the other working nodes proceed similarly.

After the operator node receives the constraint values sent by each working node, it can accumulate them to obtain [formula not reproduced in the source text] and then calculate [formula not reproduced in the source text] as the constraint error. The operator node can then determine the optimization result of the target variable in the target optimization model based on the constraint error and send it to the working nodes. The operator node can also send the constraint error to the main node.

After the working node receives the optimization result of the target variable sent by the operator node, the received optimization result of the target variable can be used for replacing the original value of the target variable in the target optimization model, the target model is updated, and then the updated target optimization model is solved, so that the value of each decision variable in the target optimization model in the current iteration is obtained. In the process of solving the target optimization model, an ADMM algorithm or an algorithm with a similar function may be adopted, which is not limited in the embodiments of the present specification.

After receiving the constraint error sent by the operator node, the master node may determine whether to end the iteration based on the constraint error. If it determines that the iteration should not end, it notifies the operator node so that the operator node continues with the next iteration.

In some embodiments, the operator node can determine the constraint error, based on the optimization result of each decision variable determined by each working node in the previous iteration and the constraint condition of the original optimization problem, in either of two ways. In the first way, each working node solves the target optimization model updated in the previous iteration to obtain the optimization result of each decision variable in that model, and then sends the optimization result directly to the operator node. After receiving the optimization results of the decision variables sent by each working node, the operator node can substitute them into the constraint condition to determine the constraint error. For example, in the above example, working node 1 may send the values of its decision variables [range not reproduced in the source text] determined in the previous iteration to the operator node, and the other working nodes proceed similarly. After receiving the decision variables [range not reproduced in the source text] sent by the 5 working nodes, the operator node may substitute them into the above constraint condition [formula not reproduced in the source text] and then calculate [formula not reproduced in the source text] as the constraint error.

In the second way, each working node solves the target optimization model updated in the previous iteration to obtain the optimization result of each decision variable in that model, determines a constraint value based on the optimization result of each decision variable and the constraint condition, and sends the constraint value to the operator node. The operator node accumulates the constraint values sent by the working nodes to obtain an accumulated result and determines the constraint error based on the accumulated result and the constraint condition. For example, working node 1 may determine the values of its decision variables [range not reproduced in the source text] in the previous iteration, determine the corresponding constraint value [formula not reproduced in the source text] according to the constraint condition [formula not reproduced in the source text], and send the calculated constraint value to the operator node; the other working nodes proceed similarly. After the operator node receives the constraint values sent by each working node, it can accumulate them to obtain [formula not reproduced in the source text] and then calculate [formula not reproduced in the source text] as the constraint error.

In the second mode, the working node determines the constraint value based on the optimization result and the constraint condition of each decision variable and then sends the constraint value to the operator node, so that the data transmission quantity between the operator node and the working node can be reduced.
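The difference in transmitted data between the two ways can be sketched as follows. Because the constraint formula is not reproduced in the source text, the sketch assumes a hypothetical constraint sum(x) = budget; the function names and the "values transmitted" counter are also illustrative only.

```python
def error_mode_one(worker_values, budget):
    """Way 1: every worker sends all of its decision-variable values."""
    all_values = [v for values in worker_values for v in values]
    transmitted = len(all_values)          # one number per decision variable
    return sum(all_values) - budget, transmitted


def error_mode_two(worker_values, budget):
    """Way 2: every worker sends only its locally computed constraint value."""
    partial_sums = [sum(values) for values in worker_values]  # done on workers
    transmitted = len(partial_sums)        # one number per worker
    return sum(partial_sums) - budget, transmitted


worker_values = [[1.0, 2.0], [3.0, 4.0], [5.0]]
error_one, sent_one = error_mode_one(worker_values, budget=10.0)
error_two, sent_two = error_mode_two(worker_values, budget=10.0)
```

Both ways yield the same constraint error, but the second transmits one value per working node rather than one per decision variable, which matters when there are hundreds of millions of decision variables.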

In some embodiments, in addition to calculating the constraint error and the optimization result of the target variables, the operator node records the completion status of the main node and the working nodes in the current round of the iteration task. For example, after receiving the optimization results of the decision variables of the current round, or the constraint values corresponding to the decision variables, from all the working nodes, the operator node can mark the state of the working nodes as completed. Likewise, after the operator node receives the indication information from the main node indicating whether to terminate the iteration process, it can mark the working state of the main node as completed. Therefore, a working node can determine, from the state information recorded in the operator node, whether the main node has completed its task for the current iteration, and the main node can likewise determine whether the working nodes have completed theirs.
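Such a completion record could be kept in a small in-memory structure on the operator node. The sketch below is one possible shape, with hypothetical class and method names; how the main node and working nodes actually query the record is not specified in the text.

```python
class IterationStatus:
    """Completion record kept by the operator node for one iteration round."""

    def __init__(self, worker_ids):
        self.pending_workers = set(worker_ids)
        self.workers_done = False
        self.master_done = False

    def record_worker_result(self, worker_id):
        # Called when a worker's results (or its constraint value) arrive.
        self.pending_workers.discard(worker_id)
        if not self.pending_workers:
            # Results from all workers received: mark workers as completed.
            self.workers_done = True

    def record_master_indication(self):
        # Called when the master's terminate / continue indication arrives.
        self.master_done = True


status = IterationStatus(["w1", "w2"])
status.record_worker_result("w1")
```

A fresh record would be created at the start of each round so that states from the previous round do not leak into the next.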

In some embodiments, after the master node notifies the operator node that the iterative task is not terminated, the operator node and the working nodes continue with the next iteration. While waiting for the constraint error of the next iteration, the master node may execute tasks unrelated to the iterative task. For example, the master node may record the constraint error of each iteration, obtained from the operator node, in a report and display the report to a user, or the master node may execute other scheduling tasks.

In some embodiments, when the target optimization model is constructed, the constraint condition of the original optimization problem may be coupled to the original optimization model corresponding to the original optimization problem by using a dual variable to obtain the target optimization model; thus, the target variable may be the dual variable. If the constraints in the original optimization problem include both equality constraints and inequality constraints, the equality constraints and the inequality constraints can each be coupled to the original optimization model by using a separate dual variable, i.e. the target variables can include two or more dual variables.
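For illustration, a standard method-of-multipliers style update for one equality dual variable and one inequality dual variable is sketched below. The update rule, the function name and the penalty parameter `rho` are assumptions drawn from textbook practice; the specification does not state which update it uses.

```python
def update_duals(lam, mu, eq_residual, ineq_residual, rho):
    """Update the dual variables from the current constraint residuals.

    eq_residual:   value of h(x) for an equality constraint h(x) = 0
    ineq_residual: value of g(x) for an inequality constraint g(x) <= 0
    """
    new_lam = lam + rho * eq_residual
    # The inequality dual variable is projected so it stays non-negative.
    new_mu = max(0.0, mu + rho * ineq_residual)
    return new_lam, new_mu
```

When the inequality constraint is satisfied with slack (negative residual), the projection drives its dual variable toward zero, so it stops influencing the working nodes' models.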

In some embodiments, in addition to coupling the constraints to the original optimization model, a quadratic penalty term comprising a specified variable may be added to the original optimization model when the target optimization model is constructed. Thus, the target variable may also be the specified variable in the quadratic penalty term.

In some embodiments, the constraint condition of the original optimization problem may include an equality constraint condition and an inequality constraint condition. When determining the constraint error based on the optimization result of each decision variable determined by each working node in the previous iteration and the constraint condition of the original optimization problem, the operator node may determine an equality constraint error based on those optimization results and the equality constraint condition. For example, assuming that the equality constraint condition is [formula not reproduced in the source text], after the values of the decision variables are determined, they can be substituted into the constraint condition to determine [formula not reproduced in the source text] as the constraint error.

In addition, an inequality constraint error can also be determined based on the optimization result of each decision variable determined by each working node in the previous iteration and the inequality constraint condition. For example, assuming that the inequality constraint condition is [formula not reproduced in the source text], [formula not reproduced in the source text] can be determined as the constraint error.
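Since the formulas are not reproduced in the source text, the following sketch only shows one common way to measure the two kinds of error. It assumes linear constraints of the hypothetical forms sum(c_i * x_i) = rhs and sum(c_i * x_i) <= upper; the function names and constraint forms are illustrative, not the specification's.

```python
def equality_error(coeffs, values, rhs):
    """Signed residual of the assumed equality sum(c_i * x_i) == rhs."""
    return sum(c * v for c, v in zip(coeffs, values)) - rhs


def inequality_error(coeffs, values, upper):
    """Violation of the assumed inequality sum(c_i * x_i) <= upper.

    Zero when the constraint is satisfied, positive by the excess otherwise.
    """
    return max(0.0, sum(c * v for c, v in zip(coeffs, values)) - upper)
```

The equality error can be positive or negative, whereas the inequality error in this sketch is clipped at zero because a satisfied inequality contributes no violation.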

When a business optimization problem is solved by using an algorithm such as the ADMM algorithm, one readily conceivable approach is for the user to manually convert the original optimization model into the target optimization model. For example, when constructing an optimization model based on the original optimization problem in a business scenario, the user has to construct it directly as a target optimization model that the algorithm can solve. This approach is cumbersome and makes processing of the business problem inefficient; it also requires the user to understand both the original business optimization problem and the implementation principle of the algorithm, and is error-prone.

In order to solve the above problem, in some embodiments, as shown in fig. 2, the work node is further configured to obtain a processing request submitted by a user, where the processing request includes an original optimization model constructed based on an original optimization problem, a constraint condition corresponding to the original optimization model, and a data fragment, where the data fragment is data related to a part of decision variables of the original optimization problem; that is, a user may divide the service data related to the original optimization problem into a plurality of data fragments, where each data fragment includes a part of the decision variables.

After each working node acquires the original optimization model, the data fragments and the constraint conditions, a target optimization model can be constructed based on the original optimization model, the data fragments and the constraint conditions, the optimization target of each target optimization model is equivalent to the optimization target of the original optimization model, and the target optimization model can be decomposed into a plurality of submodels capable of being solved in parallel.

The original optimization problem may be an optimization problem related to each service field, and the original optimization problem may be a linear programming problem or a non-linear programming problem.

In some scenarios, an interactive interface may be provided for a user, and the user may define, through the interactive interface, each decision variable, optimization target, constraint condition corresponding to the original optimization model, and the like in the original optimization model. The original optimization model and the constraint condition may be represented in various forms, for example, conditions, formulas, and the like, and the present application is not limited thereto. The original optimization model constructed by the user may be only an expression of the model, and the constraint condition may also be an expression of the constraint condition, that is, the model and the constraint condition only contain the type of the parameter, and the specific numerical value of the parameter needs to be extracted from the business data.

Meanwhile, a user can import original service data related to the original optimization problem through an interactive interface, and then divide the original service data into a plurality of data fragments, wherein the number of the data fragments can be consistent with the number of the working nodes. Each data slice contains part of the decision variables in the original optimization problem.
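A minimal sketch of the fragmenting step follows, assuming that the business data is split by user into contiguous, near-equal fragments, one per working node. The source does not prescribe a splitting rule, so the function `split_into_fragments` and its strategy are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_fragments(records: Sequence[T], num_workers: int) -> List[List[T]]:
    """Split records into num_workers contiguous fragments of near-equal size."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(len(records), num_workers)
    fragments: List[List[T]] = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` fragments take one additional record each.
        size = base + (1 if worker < extra else 0)
        fragments.append(list(records[start:start + size]))
        start += size
    return fragments
```

For instance, splitting 12 user records across 5 working nodes gives fragments of 3, 3, 2, 2 and 2 records, and concatenating the fragments restores the original order.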

After the original optimization model, the data fragments and the constraint conditions input by the user are obtained, each working node can construct a target optimization model according to the original optimization model, the data fragments and the constraint conditions input by the user. For example, the decision variables and the optimization target of the target optimization model can be determined based on the original optimization model, the data fragment and the constraint condition, and the constraint condition is combined into the original optimization model to construct a target optimization model without constraint.

The optimization target of the target optimization model is equivalent to that of the original optimization model; that is, the optimization result of each decision variable in the original optimization model can be obtained by determining the optimization result of each decision variable in the target optimization model. In addition, the target optimization model can be decomposed into a plurality of sub-models that can be solved in parallel. Converting the original optimization model into the target optimization model, splitting it into sub-models, and solving these in parallel can therefore greatly improve processing efficiency compared with solving the original optimization model directly.

For example, users often apply to banks for loans, and a bank determines the loan amount of each user based on the user's risk level. Suppose 200 million users, indexed by i, apply for loans from 10 banks, indexed by j. Each bank performs a loan assessment for each user and determines a passing rate, denoted $p_{ij}$; each user has a credit limit, denoted $a_i$, and a risk level, denoted $r_i$; and each bank has an upper limit on its risk amount, denoted $R_j$. It is now necessary to decide how much credit each bank approves for each user, denoted $x_{ij}$, with the goal of maximizing the passing rate (that is, so that users are approved for as large an amount as possible).

The user can then construct an original optimization model. Its decision variables are the loan amounts that each bank approves for each user, its optimization target is to maximize the approved amount, and its constraint conditions are of two types: (1) the loan amount of each user cannot exceed that user's upper limit, and (2) the risk amount of each bank cannot exceed that bank's upper limit. The original optimization model can be expressed by the following formula (1):

$$\max_{x} \sum_{i} \sum_{j} p_{ij} x_{ij}$$ formula (1)

The constraint conditions corresponding to the original optimization model can be expressed by formula (2):

$$\sum_{j} x_{ij} \le a_i \ \text{for each user } i, \qquad \sum_{i} r_i x_{ij} \le R_j \ \text{for each bank } j$$ formula (2)
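To make the loan model concrete, the following sketch evaluates the objective and checks the two types of constraints on a toy instance with 2 users and 2 banks. It assumes the objective is the sum of passing rate times approved amount, as the surrounding description suggests. The names `objective` and `is_feasible` and all numbers are illustrative assumptions.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def objective(p: Matrix, x: Matrix) -> float:
    """Sum over users i and banks j of passing rate p[i][j] times amount x[i][j]."""
    return sum(p[i][j] * x[i][j] for i in range(len(x)) for j in range(len(x[i])))


def is_feasible(x: Matrix, a: Sequence[float], r: Sequence[float], R: Sequence[float]) -> bool:
    """Check the per-user limit and the per-bank risk limit."""
    users_ok = all(sum(x[i]) <= a[i] for i in range(len(x)))
    banks_ok = all(
        sum(r[i] * x[i][j] for i in range(len(x))) <= R[j] for j in range(len(R))
    )
    return users_ok and banks_ok


# Toy data (hypothetical): 2 users, 2 banks.
p = [[0.5, 0.25], [0.75, 0.5]]   # passing rates p[i][j]
a = [2.0, 4.0]                   # per-user limits
r = [1.0, 2.0]                   # per-user risk levels
R = [2.0, 8.0]                   # per-bank risk upper limits
x = [[2.0, 0.0], [0.0, 4.0]]     # one candidate allocation
```

Here the candidate allocation satisfies both constraint types and has objective value 3.0; raising the first user's amount at the first bank to 3.0 would violate that user's limit.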

The user can define the original optimization model and the constraint conditions through an interactive interface, and can also import the original business data related to the original optimization problem, such as user information (the risk level of each user, each bank's loan assessment passing rate for each user, and so on) and bank information (such as the risk amount upper limit of each bank). Before importing the original business data, the user may first divide it into a plurality of data fragments. For example, assuming that the distributed computing system has 5 working nodes, the original business data may be divided into 5 data fragments, each corresponding to the user information of 40 million users and the bank information of 10 banks. A target optimization model may then be constructed based on each data fragment, where each target optimization model includes a part of the decision variables $x_{ij}$.

With this method, when a business problem is to be optimized, the user only needs to construct an original optimization model based on the business problem to be solved. The working nodes of the distributed computing system then automatically construct the target optimization model based on the original optimization model constructed by the user, the constraint conditions corresponding to the original optimization model, and the business data related to the business problem. The user does not need to convert the model manually, which improves the accuracy of the final optimization result. In addition, because the original optimization model is converted into a target optimization model that can be split, the target optimization model can be split into a plurality of sub-models that are solved in parallel, which can greatly improve processing efficiency.

In some embodiments, solving the target optimization model must yield the optimization result of each decision variable in the original optimization model, so the decision variables of the target optimization model need to include the decision variables of the original optimization model. When the target optimization model is constructed, terms related to the decision variables in the original optimization model, hereinafter referred to as decision variable terms, may therefore be determined based on the data fragments and used as a part of the target optimization model. Meanwhile, because the target optimization model is a model without constraint conditions, terms related to the constraint conditions, hereinafter referred to as constraint terms, can also be determined based on the constraint conditions of the original optimization model and the data fragments. The target optimization model is then constructed from the decision variable terms and the constraint terms.

In some embodiments, when determining the decision variable terms related to the decision variables in the original optimization model based on the data fragments, the coefficients corresponding to the decision variables in the original optimization model may be extracted from the data fragment and used to construct a coefficient matrix. The coefficient matrix has dimension N × 1, where N is the number of decision variables included in the data fragment, and its elements are the coefficients corresponding to the decision variables. The decision variable term can then be obtained from the coefficient matrix and the decision variables included in the data fragment.

For example, taking the above scenario in which banks approve loans for users, the original optimization model may be represented by formula (1), $\max_{x} \sum_{i} \sum_{j} p_{ij} x_{ij}$. That is, the decision variables in the original optimization model are the loan amounts $x_{ij}$ that each bank approves for each user, and the coefficient of each decision variable is the loan assessment passing rate $p_{ij}$ of each bank for each user. Assuming that the data fragment relates to 100 users and 10 banks, so that it contains 1000 decision variables, the coefficients $p_{ij}$ of these 1000 decision variables can be extracted from the data fragment to obtain the coefficient matrix $Q = [p_{11}, p_{12}, \ldots]^{\top}$. The decision variable term $Q^{\top}x$ can then be determined, where $x$ is the vector of the decision variables $x_{ij}$ contained in the data fragment.
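The extraction of the coefficient vector and the decision variable term can be sketched as below. The sketch assumes that the passing rates of a data fragment are held as a user-by-bank table and flattened row by row, and that the decision variable term is the inner product of the coefficients and the decision variables. Both helper names are hypothetical.

```python
from typing import List, Sequence


def build_q(pass_rates: Sequence[Sequence[float]]) -> List[float]:
    """Flatten pass_rates[i][j] row by row into the N-element coefficient vector Q."""
    return [rate for row in pass_rates for rate in row]


def decision_term(q: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of the coefficients Q and the flattened decision variables x."""
    if len(q) != len(x):
        raise ValueError("Q and x must have the same length")
    return sum(qk * xk for qk, xk in zip(q, x))
```

With the passing rates `[[0.5, 0.25], [0.75, 0.5]]` and the flattened amounts `[2.0, 0.0, 0.0, 4.0]`, the decision variable term evaluates to 3.0.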

In some embodiments, when determining the constraint terms of the original optimization model based on the data fragments and the constraint conditions, the coefficients corresponding to the decision variables in the constraint conditions may be extracted from the data fragment to construct a first constraint matrix. The first constraint matrix is an M × N matrix, where M is the number of constraint conditions and N is the number of decision variables contained in the data fragment, and the element in the i-th row and j-th column is the coefficient of the j-th decision variable in the i-th constraint condition.

In addition, the corresponding limit value in each constraint condition, i.e., the value on the right side of the constraint equation or inequality, may be extracted, and then a second constraint matrix may be constructed based on the limit value, where the second constraint matrix is an M × 1-dimensional matrix, M represents the number of constraint conditions, and the elements in the matrix are the limit values corresponding to the respective constraint conditions.

Constraint terms may then be constructed based on the first constraint matrix, the second constraint matrix, and the decision variables contained in the business data.

In some embodiments, if the constraints include equality constraints and inequality constraints, corresponding constraint terms may be constructed separately for each kind. For example, an equality constraint term may be constructed from all the equality constraints, and an inequality constraint term from all the inequality constraints.

For example, taking the above scenario in which banks approve loans for users, the original optimization model includes two types of inequality constraints, expressed in formula (2):

$$\sum_{j} x_{ij} \le a_i, \qquad \sum_{i} r_i x_{ij} \le R_j$$ formula (2)

In the first constraint, the coefficient of each decision variable is 1; in the second constraint, the coefficient of each decision variable is the risk level $r_i$ of the corresponding user. Assuming that the data fragment includes data for 10 users and 10 banks, that is, 100 decision variables, the coefficient of each of the 100 decision variables in each constraint can be determined from the data fragment. A first constraint matrix can then be constructed from the extracted coefficients; it is a 2 × 100 matrix, denoted A. The limit values of the constraint conditions can then be extracted from the business data, for example $a_i$ from the constraint $\sum_{j} x_{ij} \le a_i$ and $R_j$ from the constraint $\sum_{i} r_i x_{ij} \le R_j$, to build the second constraint matrix $A' = (a_i, R_j)^{\top}$. The constraint term can then be derived from the first constraint matrix, the second constraint matrix, and the decision variables in the business data: $Ax - A'$.
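The construction of the two constraint matrices and of the constraint term can be sketched as follows. As an assumption made for illustration, the sketch expands the two constraint types into one row per user and one row per bank, so M equals the number of users plus the number of banks; the text above instead speaks of one row per constraint type. Decision variables are flattened row by row, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_constraints(
    a: Sequence[float], r: Sequence[float], R: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Return (A, limits): the M x N coefficient matrix and the M limit values."""
    n_users, n_banks = len(a), len(R)
    n_vars = n_users * n_banks
    A: List[List[float]] = []
    limits: List[float] = []
    for i in range(n_users):          # user rows: coefficient 1 on that user's x_ij
        row = [0.0] * n_vars
        for j in range(n_banks):
            row[i * n_banks + j] = 1.0
        A.append(row)
        limits.append(a[i])
    for j in range(n_banks):          # bank rows: coefficient r_i on that bank's x_ij
        row = [0.0] * n_vars
        for i in range(n_users):
            row[i * n_banks + j] = r[i]
        A.append(row)
        limits.append(R[j])
    return A, limits


def constraint_term(
    A: Sequence[Sequence[float]], limits: Sequence[float], x: Sequence[float]
) -> List[float]:
    """Compute A x - A' row by row; a positive entry means the limit is exceeded."""
    return [sum(c * v for c, v in zip(row, x)) - b for row, b in zip(A, limits)]
```

For 2 users and 2 banks this produces a 4 × 4 matrix, and an allocation that stays below every limit gives a constraint term whose entries are all negative.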

Similarly, if the constraint condition includes an equality constraint condition, the equality constraint item may also be constructed based on the data fragmentation and the equality constraint condition, and the specific construction method may refer to the above steps, which are not described herein again.

In some embodiments, after the decision variable terms and the constraint terms are determined, the target optimization model may be constructed based on them. For example, a dual variable may be added to each constraint term, and the result summed with the decision variable term to obtain the target optimization model. The dual variable is also a variable of the target optimization model during the solution process, namely the target variable.

In some embodiments, a penalty term may further be added when constructing the target optimization model. For example, a quadratic term containing a diagonal matrix whose elements are specified variables may be constructed, and the target optimization model may then be constructed from the constraint term with the dual variable, the decision variable term, and the quadratic term, for example by summing these terms. In addition to the decision variables of the original optimization model, the variables of the target optimization model constructed in this way include a new kind of variable, the target variable, which comprises the dual variable and the specified variable.
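The summation of the three terms can be sketched as below. The exact form of the quadratic penalty is not given in the text, so the sketch assumes it is the constraint term weighted by a diagonal matrix with entries t, that is, the sum of t_k times the squared k-th constraint residual. The function name and argument layout are hypothetical.

```python
from typing import Sequence


def target_value(
    q: Sequence[float],
    x: Sequence[float],
    residual: Sequence[float],
    lam: Sequence[float],
    t: Sequence[float],
) -> float:
    """Decision variable term + dual-weighted constraint term + assumed quadratic penalty.

    residual is the constraint term (A x - A'), one entry per constraint.
    """
    decision_term = sum(qk * xk for qk, xk in zip(q, x))
    dual_term = sum(l * g for l, g in zip(lam, residual))
    # Assumed penalty: residual^T diag(t) residual.
    penalty = sum(tk * g * g for tk, g in zip(t, residual))
    return decision_term + dual_term + penalty
```

Because the constraints enter only through the dual term and the penalty, the resulting model has no explicit constraint conditions, which is what allows each working node to solve its own part independently once λ and t are fixed.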

In some embodiments, the original optimization problem may be a resource allocation problem for allocating a target number of resources to be allocated to the plurality of resource recipients if a constraint is satisfied, so that a benefit obtained by the plurality of resource recipients using the allocated resources is maximized.

For example, the resource to be allocated may be an amount to be invested by the user, the resource receiver may be each financial product (e.g., fund, stock, etc.), the decision variable may be an amount to be allocated to each financial product, the optimization goal is that the accumulated profit of each financial product is the highest, the constraint condition may be that the sum of the amounts to be allocated to each financial product is equal to the total amount to be invested, and the risk caused by the user to invest each financial product does not exceed the risk level that the user can bear, etc.

For another example, the resource to be allocated may be the total amount of coupons for a marketing activity, the resource recipients may be user accounts, and the decision variable may be the amount of coupons allocated to each user account. The optimization goal is to maximize the users' conversion rate for the allocated coupons (that is, the proportion of the total coupon amount that users actually use), and the constraint conditions may include that the coupon amounts allocated to the user accounts sum to the total amount, along with other constraints specific to the marketing activity.

The distributed computing system of the embodiments of the present specification is explained below in conjunction with a specific embodiment.

Users often apply to banks for loans, and a bank determines the loan amount of each user based on the user's risk level. Suppose 200 million users, indexed by i, apply for loans from 10 banks, indexed by j. Each bank performs a loan assessment for each user and determines a passing rate, denoted $p_{ij}$; each user has a credit limit, denoted $a_i$, and a risk level, denoted $r_i$; and each bank has an upper limit on its risk amount, denoted $R_j$. It is now necessary to decide how much credit each bank approves for each user, denoted $x_{ij}$, with the goal of maximizing the passing rate (that is, so that users are approved for as large an amount as possible).

The user may build an original optimization model for the above optimization problem. For example, the original optimization model may be represented by formula (1), and the constraint conditions by formula (2):

$$\max_{x} \sum_{i} \sum_{j} p_{ij} x_{ij}$$ formula (1)

$$\sum_{j} x_{ij} \le a_i, \qquad \sum_{i} r_i x_{ij} \le R_j$$ formula (2)

That is, the optimization goal of the original optimization model is to maximize the users' passing rate, and the constraint conditions are of two types: (1) the loan amount of each user cannot exceed that user's upper limit, and (2) the risk amount of each bank cannot exceed that bank's upper limit.

The optimization results of the original optimization model can then be determined by the methods provided by the embodiments of the present specification.

For example, the user can define the decision variables, optimization target, and constraint conditions of the original optimization model through an interactive interface, and import the related business data, such as the user data of 200 million users (user risk level, limit, loan passing rate at each bank, and so on) and the data of 10 banks (the risk amount upper limit of each bank, and so on).

Because the amount of data is large and there are 2 billion decision variables, the business problem can be solved by means of a distributed computing system to improve processing efficiency. As shown in fig. 3, the distributed computing system includes a master node, an operator node, and several working nodes (5 working nodes are assumed here). To improve processing efficiency, the user may divide the business data into a plurality of data fragments, for example 5 data fragments, each containing the user data of 40 million users and the bank data, and then input the data fragments, the original optimization model, and the constraint conditions. Each working node obtains one data fragment together with the original optimization model and the constraint conditions, and all working nodes then execute the same process to obtain the target optimization model corresponding to their data fragment. Each target optimization model is equivalent to a sub-model of the original optimization problem and contains a part of the original decision variables; for example, with 2 billion decision variables in total, the target optimization model constructed by each working node contains 400 million decision variables. Specifically, the processing flow of each working node is as follows:

1. Extract the coefficients of the relevant decision variables in the original optimization model from the data fragment to obtain an N-dimensional vector Q, where N is the number of decision variables (400 million). For example, the coefficient $p_{ij}$ of each decision variable $x_{ij}$ may be extracted to obtain the vector Q, and from it the decision variable term $Q^{\top}x$.

2. Extract the coefficients of the decision variables in each constraint condition from the data fragment to construct a first constraint matrix A of dimension M × N, where N is the number of decision variables (400 million), M is the number of constraint conditions, and the element in the i-th row and j-th column is the coefficient of the j-th decision variable in the i-th constraint condition. For example, for the inequality constraint $\sum_{j} x_{ij} \le a_i$, the coefficient 1 of each decision variable can be extracted; for the inequality constraint $\sum_{i} r_i x_{ij} \le R_j$, the coefficient $r_i$ of each decision variable can be extracted. A 2 × 400 million first constraint matrix can then be constructed. The limit values (the values on the right side of the inequalities) in each constraint can then be extracted to construct a second constraint matrix A' of dimension M × 1, for example $a_i$ from the constraint $\sum_{j} x_{ij} \le a_i$ and $R_j$ from the constraint $\sum_{i} r_i x_{ij} \le R_j$, giving the second constraint matrix $A' = (a_i, R_j)^{\top}$.

Further, the constraint term $Ax - A'$ can be derived.

3. And constructing a quadratic term B, wherein the quadratic term comprises a diagonal matrix, and elements in the diagonal matrix are variables t.

4. Construct the target optimization model corresponding to each data fragment based on the decision variable term, the constraint term, and the quadratic term containing the diagonal matrix. A dual variable λ can be added to the constraint term, and the decision variable term, the constraint term with the dual variable, and the quadratic term are then summed to obtain the target optimization model. For example, the target optimization model can be expressed as the following formula (3) (which may, of course, be more complex):

$$F(x) = Q^{\top}x + \lambda^{\top}(Ax - A') + B(t)$$ formula (3)

The variables to be optimized in the target optimization model include t and λ in addition to the original decision variables $x_{ij}$.

After the target optimization model corresponding to each data fragment is constructed by each working node, each decision variable of the target optimization model can be solved through the distributed computing system, and the specific solving process is as follows:

The following steps are executed iteratively by the working nodes, the master node, and the operator node to obtain the optimal solution of the decision variables in each target optimization model:

After the (K-1)-th iteration is completed, the master node can decide, based on the constraint error, whether to continue with the K-th iteration, and if it determines that the iteration needs to continue, it notifies the operator node.

After receiving the notification from the master node, the operator node can update the state of the master node's (K-1)-th round iteration task to finished. The operator node can then start the K-th round iteration task. Specifically, it can obtain from each working node the constraint values of that working node after the (K-1)-th iteration, $\sum_{j} x_{ij}$ and $\sum_{i} r_i x_{ij}$ (where $x_{ij}$ is the optimization result of the decision variables in each working node in the (K-1)-th round), accumulate the constraint values of the working nodes, and calculate the constraint error corresponding to each constraint condition, taking the absolute value of $\sum_{j} x_{ij} - a_i$ and the absolute value of $\sum_{i} r_i x_{ij} - R_j$ as the constraint errors. The constraint errors can then be sent to the master node. At the same time, the solution result of the target variables λ and t of the target optimization model in the K-th round can be determined according to the constraint errors and sent to each working node.
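The accumulation of constraint values at the operator node can be sketched for the per-bank risk constraint, which couples users held on different working nodes. Each working node reports, per bank, the partial sum of risk level times approved amount over its own users; the operator node adds these partial sums and compares the totals with the per-bank limits. The function names and toy numbers are assumptions for illustration.

```python
from typing import List, Sequence


def worker_constraint_values(
    r: Sequence[float], x: Sequence[Sequence[float]]
) -> List[float]:
    """Per-bank partial sums of r[i] * x[i][j] over the users of one data fragment."""
    n_banks = len(x[0])
    return [sum(r[i] * x[i][j] for i in range(len(x))) for j in range(n_banks)]


def operator_constraint_errors(
    values_from_workers: Sequence[Sequence[float]], R: Sequence[float]
) -> List[float]:
    """Accumulate the workers' constraint values and take |total - R[j]| per bank."""
    totals = [sum(values[j] for values in values_from_workers) for j in range(len(R))]
    return [abs(total - limit) for total, limit in zip(totals, R)]


# Two working nodes (toy data): the first holds two users, the second holds one.
v1 = worker_constraint_values([1.0, 2.0], [[1.0, 0.0], [0.0, 3.0]])
v2 = worker_constraint_values([1.0], [[2.0, 1.0]])
errors = operator_constraint_errors([v1, v2], [2.0, 8.0])
```

Sending only these per-bank partial sums, rather than every decision variable, keeps the message from each working node at the size of the number of banks.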

After receiving the solution result of the target variables λ and t in the K-th round, each working node can update λ and t in its target optimization model with the solution result to obtain an updated model, solve the updated model to obtain the optimization result of each decision variable in the K-th round, and calculate the constraint values according to the constraint conditions. For example, it can calculate $\sum_{j} x_{ij}$ and $\sum_{i} r_i x_{ij}$ from the optimization result of each decision variable in the K-th round to obtain the constraint values of the K-th round, and then send the constraint values to the operator node. After the operator node has received the constraint values sent by all the working nodes, it can update the state of the working nodes' K-th round iteration task to finished.
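The bookkeeping by which the operator node marks a round as finished only after every working node has reported can be sketched as follows. The class `RoundTracker` and its methods are hypothetical, and message transport is left out.

```python
from typing import Dict, Iterable, List


class RoundTracker:
    """Tracks which working nodes have reported constraint values in the current round."""

    def __init__(self, worker_ids: Iterable[str]) -> None:
        self.expected = set(worker_ids)
        self.received: Dict[str, List[float]] = {}

    def report(self, worker_id: str, constraint_values: List[float]) -> bool:
        """Record one worker's values; return True once all workers have reported."""
        if worker_id not in self.expected:
            raise KeyError(f"unknown working node: {worker_id}")
        self.received[worker_id] = constraint_values
        return self.is_finished()

    def is_finished(self) -> bool:
        return set(self.received) == self.expected


tracker = RoundTracker(["w1", "w2"])
first = tracker.report("w1", [1.0, 6.0])    # one working node still missing
second = tracker.report("w2", [2.0, 1.0])   # all working nodes have reported
```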

After receiving the constraint error sent by the operator node, the master node can determine whether to continue iterating according to the constraint error. For example, if the constraint error determined by the K-th iteration and the constraint error determined by the (K-1)-th iteration are smaller than a constraint threshold, the iteration can be stopped; otherwise, the operator node is notified to continue with the next iteration. After notifying the operator node to continue, the master node can process work unrelated to the iteration task, so as to make full use of its computing resources.

By repeating the iteration process until the iteration is finished, the final optimization result of each decision variable can be obtained and used as the final solution of the business problem.
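The division of roles in the iteration can be illustrated with a single-process simulation. This is a sketch under strong assumptions and not the loan model itself: each working node holds one variable with a toy concave objective c·x − x²/2, the working nodes are coupled by one equality constraint sum(x) = b, the operator node uses a plain dual-ascent update for λ (no penalty variable t), and the master node stops when the absolute constraint error falls below a threshold. None of these choices is stated in the source.

```python
from typing import List, Tuple


def worker_solve(c: float, lam: float) -> float:
    """Working node: maximise c*x - 0.5*x**2 - lam*x, whose maximiser is x = c - lam."""
    return c - lam


def run_iterations(
    cs: List[float], b: float, step: float, threshold: float, max_rounds: int = 100
) -> Tuple[List[float], float, float, int]:
    lam = 0.0
    xs = [worker_solve(c, lam) for c in cs]      # initial round on the working nodes
    error = sum(xs) - b                          # operator node: accumulated error
    rounds = 0
    # Master node: continue while the constraint error is not below the threshold.
    while abs(error) >= threshold and rounds < max_rounds:
        lam += step * error                      # operator node: update target variable
        xs = [worker_solve(c, lam) for c in cs]  # working nodes: re-solve updated models
        error = sum(xs) - b                      # operator node: new constraint error
        rounds += 1
    return xs, lam, error, rounds


xs, lam, error, rounds = run_iterations([4.0, 6.0], b=6.0, step=0.25, threshold=1e-6)
```

In this toy setting the constraint error halves in every round, so the loop stops well before `max_rounds`, with λ close to 2 and the two variables close to 2 and 4.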

The technical features in the above embodiments can be combined arbitrarily, provided the combination involves no conflict or contradiction. For reasons of space the combinations are not described one by one; any combination of the technical features in the above embodiments therefore also falls within the scope disclosed in the present specification.

Correspondingly, an embodiment of the present specification further provides a computer device, as shown in fig. 4, where the computer device includes at least one of the main node, the work node, and the operator node in the above embodiments. The specific processing flow of the master node, the working node, and the operator node in constructing the target optimization model and solving the target optimization model may refer to the description in the above embodiments, and will not be described herein again.

Accordingly, an embodiment of the present specification further provides a computer storage medium, where a program is stored in the storage medium, and when the program is executed by a processor, the program implements the step flow executed by the main node, the operator node, or the work node in any of the above embodiments.

Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. The embodiments of the present specification are intended to cover any variations, uses, or adaptations of the embodiments of the specification following, in general, the principles of the embodiments of the specification and including such departures from the present disclosure as come within known or customary practice in the art to which the embodiments of the specification pertain. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the embodiments being indicated by the following claims.

It is to be understood that the embodiments of the present specification are not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the embodiments of the present specification is limited only by the appended claims.

The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims

1. A distributed computing system comprises a main node, operator nodes and a plurality of working nodes, wherein the distributed computing system is used for determining optimization results of decision variables in a plurality of target optimization models constructed based on an original optimization problem, each target optimization model comprises part of the decision variables of the original optimization problem and target variables introduced when constraint conditions of the original optimization problem are incorporated into the target optimization models, and each working node corresponds to one target optimization model;

the main node is used for receiving a constraint error corresponding to the constraint condition determined by the previous iteration sent by the operator node, determining whether to terminate the current iteration task or not based on the constraint error corresponding to the constraint condition determined by the previous iteration, and informing the operator node;

the operator node is used for determining an optimization result of the target variable in the current iteration based on a constraint error corresponding to the previous iteration after receiving indication information which is sent by the main node and indicates that the current iteration task is not terminated, and sending the optimization result to the working node;

the working node is used for updating a target optimization model corresponding to the working node by using the optimization result of the target variable sent by the operator node in the current iteration, and determining the optimization result of each decision variable in the updated target optimization model in the current iteration;

and the operator node is used for determining a constraint error corresponding to the constraint condition in the current iteration based on the optimization result of each decision variable determined by each working node in the current iteration, and sending the constraint error to the main node.

2. The distributed computing system of claim 1, wherein the operator nodes are configured to determine a constraint error corresponding to the constraint condition in a current iteration based on an optimization result of each decision variable determined by each working node in the current iteration, and the method includes:

obtaining an optimization result of each decision variable determined by the current iteration from each working node, and determining a constraint error corresponding to the constraint condition in the current iteration based on the optimization result of the decision variable obtained from each working node and the constraint condition; or

And acquiring a constraint value from each working node, accumulating the constraint values acquired from the working nodes, and determining a constraint error corresponding to the constraint condition in the current iteration based on an accumulated result and the constraint condition, wherein the constraint value is determined based on an optimization result of each decision variable determined in the current iteration and the constraint condition.

3. The distributed computing system of claim 1, wherein the operator nodes are further configured to update the state of the last iteration task of the master node to a completion state after receiving the indication information sent by the master node;

and is further configured to update the state of the current round of iteration task of the working node to a completion state after receiving the optimization results or constraint values of the decision variables in the current round of iteration sent by all the working nodes, wherein the constraint values are determined based on the optimization results of the decision variables in the current round of iteration and the constraint conditions.

4. The distributed computing system of claim 1, the master node further configured to execute other tasks unrelated to the iterative task after notifying the operator nodes that the current round of iterative task has not terminated.

5. The distributed computing system of claim 1, wherein the target optimization model is derived by:

coupling the constraint conditions of the original optimization problem to an original optimization model corresponding to the original optimization problem by using a dual variable to construct the target optimization model;

wherein the target variable comprises the dual variable.

6. The distributed computing system of claim 5, wherein the target optimization model further comprises a quadratic penalty term, the quadratic penalty term includes a specified variable, and the target variable further comprises the specified variable.
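
For orientation, one common form that a model of the kind recited in claims 5 and 6 can take is an augmented Lagrangian. The formula below is a sketch under assumptions not stated in the claims: a separable objective over K working nodes, linear equality constraints, a dual variable \lambda, and a penalty coefficient \rho standing in for the specified variable.

```latex
% Assumed original problem: minimize \sum_k f_k(x_k) subject to \sum_k A_k x_k = b
L_\rho(x, \lambda)
  = \sum_{k=1}^{K} f_k(x_k)
  + \lambda^{\top}\Bigl(\sum_{k=1}^{K} A_k x_k - b\Bigr)
  + \frac{\rho}{2}\,\Bigl\|\sum_{k=1}^{K} A_k x_k - b\Bigr\|_2^2

% A typical dual update driven by the constraint error of round t (also an assumption):
\lambda^{(t+1)} = \lambda^{(t)} + \rho\,\Bigl(\sum_{k=1}^{K} A_k x_k^{(t+1)} - b\Bigr)
```

In this form the second term couples the constraints to the original model through the dual variable, and the third term is the quadratic penalty term.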

7. The distributed computing system of claim 1, wherein the constraint conditions comprise equality constraints and inequality constraints, and the operator node is configured to determine an equality constraint error for the current round of iteration based on the optimization results of the decision variables determined by each of the working nodes in the current round of iteration and the equality constraints;

and to determine an inequality constraint error for the current round of iteration based on the optimization results of the decision variables determined by each working node in the current round of iteration and the inequality constraints.
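
The claims do not state how the two kinds of error are measured. One plausible reading, given here purely as an assumption, uses a signed residual for equality constraints and a clipped violation for inequality constraints of the form lhs <= upper; the function names are hypothetical.

```python
from typing import List

def equality_error(lhs: List[float], rhs: List[float]) -> List[float]:
    """Signed residual of equality constraints lhs == rhs."""
    return [l - r for l, r in zip(lhs, rhs)]

def inequality_error(lhs: List[float], upper: List[float]) -> List[float]:
    """Violation of inequality constraints lhs <= upper; zero when satisfied."""
    return [max(0.0, l - u) for l, u in zip(lhs, upper)]

# Accumulated left-hand sides for the current round (made-up numbers).
eq_err = equality_error([9.0, 3.0], [8.0, 3.0])
ineq_err = inequality_error([4.0, 1.0], [5.0, 0.5])
```

Here a satisfied inequality contributes no error, while an equality constraint contributes error in either direction.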

8. The distributed computing system of claim 1, wherein the working node is further configured to perform the steps of:

acquiring a processing request submitted by a user, wherein the processing request comprises an original optimization model constructed based on the original optimization problem, constraint conditions corresponding to the original optimization model, and data shards, the data shards being data related to a subset of the decision variables of the original optimization problem;

and constructing a target optimization model based on the original optimization model, the data shards and the constraint conditions, wherein the optimization objective of the target optimization model is equivalent to the optimization objective of the original optimization model, and the target optimization model can be decomposed into a plurality of submodels that can be solved in parallel.

9. The distributed computing system of claim 8, wherein the working node, when configured to construct a target optimization model based on the original optimization model, the data shards, and the constraints, is specifically configured to:

determining decision variable terms related to decision variables in the original optimization model based on the data shards;

determining a constraint term corresponding to the original optimization model based on the constraint conditions and the data shards;

and constructing the target optimization model based on the decision variable term and the constraint term.

10. The distributed computing system according to claim 9, wherein the working node, when determining, based on the data shards, a decision variable term related to a decision variable in the original optimization model, is specifically configured to:

extracting coefficients corresponding to decision variables in the original optimization model from the data shards to construct a coefficient matrix, wherein the coefficient matrix is an N × 1 matrix, and N is the number of decision variables included in the data shards;

constructing the decision variable term based on the coefficient matrix and the decision variables included in the data shards.

11. The distributed computing system of claim 9, wherein the working node, when configured to determine constraint terms of the original optimization model based on the data shards and the constraint conditions, is specifically configured to:

extracting, from the data shards, the coefficients corresponding to the decision variables in each constraint condition to construct a first constraint matrix, wherein the first constraint matrix is an M × N matrix, M represents the number of constraint conditions, and N represents the number of decision variables included in the data shards;

extracting, from the data shards, the limiting value in each constraint condition to construct a second constraint matrix, wherein the second constraint matrix is an M × 1 matrix, and M represents the number of constraint conditions;

constructing the constraint term based on the first constraint matrix, the second constraint matrix, and the decision variables included in the data shards.
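
The matrix construction of claims 10 and 11 can be sketched as follows. The shard layout (a dictionary with `variables` and `limits` keys, and per-variable `objective_coeff` and `constraint_coeffs` fields) and the function `build_matrices` are hypothetical; the claims fix only the shapes N × 1, M × N, and M × 1.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def build_matrices(shard: Dict) -> Tuple[Matrix, Matrix, Matrix]:
    """Build the coefficient matrix and both constraint matrices from one data shard."""
    variables = shard["variables"]
    limits = shard["limits"]
    # Coefficient matrix (N x 1): one objective coefficient per decision variable.
    c = [[v["objective_coeff"]] for v in variables]
    # First constraint matrix (M x N): row i holds the coefficients of constraint i.
    a = [[v["constraint_coeffs"][i] for v in variables] for i in range(len(limits))]
    # Second constraint matrix (M x 1): the limiting value of each constraint.
    b = [[limit] for limit in limits]
    return c, a, b

# A shard with N = 3 decision variables and M = 2 constraints (made-up numbers).
shard = {
    "variables": [
        {"objective_coeff": 2.0, "constraint_coeffs": [1.0, 0.0]},
        {"objective_coeff": 5.0, "constraint_coeffs": [2.0, 1.0]},
        {"objective_coeff": 1.0, "constraint_coeffs": [3.0, 1.0]},
    ],
    "limits": [8.0, 3.0],
}
c, a, b = build_matrices(shard)
```

With these matrices, a decision variable term could take the form of the transpose of c multiplied by x, and a constraint term could be built from a multiplied by x minus b, where x is the N × 1 vector of the shard's decision variables; both forms are assumptions for illustration.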

12. The distributed computing system of claim 9, wherein the constraint terms comprise equality constraint terms constructed based on equality constraints among the constraint conditions and inequality constraint terms constructed based on inequality constraints among the constraint conditions.

13. A computer device comprising the master node, the working node, and/or the operator node of the distributed computing system as claimed in any one of claims 1 to 12.