WO2024001610A1

WO2024001610A1 - Method for solving goal programming problem, node selection method, and apparatus

Info

Publication number: WO2024001610A1
Application number: PCT/CN2023/095590
Authority: WO
Inventors: 李希君; 杨沐明; 匡宇飞; 曾嘉
Original assignee: 华为云计算技术有限公司
Priority date: 2022-07-01
Filing date: 2023-05-22
Publication date: 2024-01-04

Abstract

Embodiments of the present application provide a method for solving a goal programming problem, a node selection method, a node evaluation model training method, and an apparatus. The method for solving a goal programming problem comprises: acquiring a goal programming problem uploaded by a user; adjusting a candidate node set of the goal programming problem according to a node evaluation model, wherein nodes in the candidate node set correspond to sub-problems to be solved of the goal programming problem, and the node evaluation model is used for predicting correlation quantities of bound values of each node after multi-step unfolding; and solving the goal programming problem on the basis of the adjusted candidate node set to obtain a solution of the goal programming problem. According to the scheme in the embodiments of the present application, a candidate node set is adjusted on the basis of prediction results of correlation quantities of bound values of each node after multi-step unfolding, and thus, the efficiency of solving programming problems is improved.

Description

Methods for solving goal programming problems, methods and devices for selecting nodes

This application claims priority to the Chinese patent application filed with the China Patent Office on July 1, 2022, with application number 202210773925.4 and the application title "Model Training Method and Device", the entire content of which is incorporated into this application by reference.

This application claims priority to the Chinese patent application filed with the China Patent Office on November 15, 2022, with application number 202211430767.9 and the application title "Method for solving the target planning problem, method and device for selecting nodes", the entire content of which is incorporated into this application by reference.

This application claims priority to the Chinese patent application filed with the China Patent Office on November 25, 2022, with application number 202211490522.5 and the application title "Method for solving the target planning problem, method and device for selecting nodes", the entire content of which is incorporated into this application by reference.

Technical field

The embodiments of the present application relate to the field of data processing technology, and more specifically, to a method for solving a goal planning problem, a method for selecting nodes, a method for training a node evaluation model, and a device.

Background

Operations research mainly uses mathematical methods to study optimization approaches and plans for various systems, providing decision-makers with a basis for scientific decision-making. Mathematical programming is an important branch of operations research. Its main goal is to find, within a given region, the optimal solution that maximizes or minimizes an objective function. In real scenarios, many problems have integer constraints, such as production scheduling, supply chain, and factory site selection. Such problems can be modeled as mixed integer programming problems or integer programming problems and solved with tools such as mathematical programming solvers.

Mathematical programming solvers are mainly implemented based on the branch-and-bound algorithm. The branch-and-bound algorithm is an iterative search method that repeatedly divides the solution space of the original problem into smaller and smaller subsets, that is, it repeatedly generates sub-problems (also called nodes) of the original problem and obtains the optimal solution to the original problem by successively solving these sub-problems. For complex problems, for example, problems with a large number of decision variables, a large number of nodes are generated during the solution process and the solution takes a long time, making it difficult to meet the user's needs.
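The branch-and-bound idea described above can be illustrated with a minimal sketch. It is not the solver of this application: it assumes a small 0/1 knapsack problem (a maximization problem, so the relaxation gives upper bounds), and the function names `relaxed_bound` and `branch_and_bound` are hypothetical. Each node fixes the decisions for a prefix of the items, and the candidate node set is a heap ordered by the bound of each node.

```python
import heapq


def relaxed_bound(values, weights, capacity, fixed):
    """Upper bound of a node from the fractional relaxation.

    `fixed` holds the 0/1 decisions already made for the first len(fixed) items.
    Returns None when the fixed decisions already exceed the capacity.
    Items are assumed to be sorted by value/weight ratio in descending order.
    """
    weight = sum(w for w, x in zip(weights, fixed) if x)
    if weight > capacity:
        return None
    bound = float(sum(v for v, x in zip(values, fixed) if x))
    remaining = capacity - weight
    for v, w in list(zip(values, weights))[len(fixed):]:
        if w <= remaining:
            bound += v
            remaining -= w
        else:
            bound += v * remaining / w  # take a fraction of the first item that does not fit
            break
    return bound


def branch_and_bound(values, weights, capacity):
    """Best-first branch and bound; weights are assumed to be positive."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    values = [values[i] for i in order]
    weights = [weights[i] for i in order]
    best_value = 0.0
    expanded = 0
    # Candidate node set: (negated bound, fixed decisions); heapq is a min-heap.
    candidates = [(-relaxed_bound(values, weights, capacity, ()), ())]
    while candidates:
        neg_bound, fixed = heapq.heappop(candidates)
        if -neg_bound <= best_value:
            continue  # this node cannot beat the incumbent, so it is pruned
        expanded += 1
        if len(fixed) == len(values):
            best_value = -neg_bound  # leaf node: the bound is the exact objective value
            continue
        for decision in (1, 0):  # branch on the next undecided item
            child = fixed + (decision,)
            bound = relaxed_bound(values, weights, capacity, child)
            if bound is not None and bound > best_value:
                heapq.heappush(candidates, (-bound, child))
    return best_value, expanded


best, expanded_nodes = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(best, expanded_nodes)
```

The number of expanded nodes returned by the sketch is the quantity that grows quickly with the number of decision variables, which is the cost that better node selection and pruning aim to reduce.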

Therefore, how to improve the performance of solving planning problems has become an urgent problem to be solved.

Summary of the invention

Embodiments of the present application provide a method for solving a goal programming problem, a method for selecting nodes, a method for training a node evaluation model, and a device, which help improve the efficiency of solving programming problems.

A first aspect provides a method for solving a goal programming problem, including: obtaining a goal programming problem uploaded by a user; adjusting a candidate node set of the goal programming problem according to a node evaluation model, where the candidate node set includes multiple nodes, each of the multiple nodes corresponds to a sub-problem to be solved of the goal programming problem, and the node evaluation model is used to predict a correlation quantity of the limit value of each node after multi-step expansion; and solving the goal programming problem based on the adjusted candidate node set to obtain a solution result of the goal programming problem.

At least some of the decision variables of the goal programming problem are integer variables, that is, at least some of the decision variables take integer values. In other words, the goal programming problem is a pure integer programming model or a mixed integer programming model.
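For orientation, a problem of this kind can be written in the following generic form. This is a sketch under assumptions: the source does not state that the objective and constraints are linear, and the symbols c, A, b, n and the index set I are introduced here only for illustration.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{n}} \quad & c^{\top} x \\
\text{s.t.} \quad & A x \le b, \\
& x_j \in \mathbb{Z}, \quad j \in I \subseteq \{1, \dots, n\}
\end{aligned}
```

When I contains every index, the model is a pure integer program; when I is a proper non-empty subset, it is a mixed integer program.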

In the embodiments of the present application, the node evaluation model can predict a correlation quantity of a node's limit value before and after multi-step expansion, which helps predict the optimal solution that can be found by searching from the multiple nodes. The correlation quantity can be used to measure the long-term value of expanding a node, which makes the selection of target nodes more accurate and helps select appropriate nodes for the corresponding processing. The nodes in the adjusted candidate node set are therefore more likely to yield the optimal solution, which helps improve the solution efficiency.

Combined with the first aspect, in some implementations of the first aspect, the method further includes: determining a node evaluation model according to user instructions, and the node evaluation model is deployed on the cloud management platform.

For example, the user selects the node evaluation model from a plurality of selectable candidate node evaluation models.

As another example, the user may input the node evaluation model.

Combined with the first aspect, in some implementations of the first aspect, adjusting the candidate node set of the goal programming problem according to the node evaluation model includes: determining a first target node according to the node evaluation model; generating child nodes of the first target node; and adding the child nodes of the first target node to the candidate node set.

In the case where the target node includes the first target node, the first target node may be determined according to the output result of the node evaluation model, and iterative calculation may be performed based on the first target node during the solution process. Specifically, the first target node is expanded to obtain the child nodes of the first target node, and then iterative calculation is performed. The output result of the node evaluation model can be used to measure the long-term value of node expansion, which helps judge the possibility that a node yields the optimal solution after multi-step expansion. A first target node determined on this basis is more likely to yield the optimal solution, which helps improve the convergence speed and the solution efficiency. For example, for a minimization problem, the smaller the limit value of a node after multi-step expansion, the higher the possibility of obtaining the global optimal solution starting from this node. The node evaluation model can be used to predict the limit value of the node after multi-step expansion, which helps judge the possibility that the node yields the optimal solution after multi-step expansion.
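The selection and expansion step just described might look like the following minimal sketch for a minimization problem. The names `expand_first_target`, `predict` and `branch` are hypothetical, the scores are made-up stand-ins for the output of a trained node evaluation model, and removing the expanded node from the candidate set is an assumption based on common branch-and-bound practice rather than something the text states.

```python
from typing import Callable, List, Sequence, Tuple


def expand_first_target(
    candidates: Sequence[str],
    predict: Callable[[str], float],
    branch: Callable[[str], List[str]],
) -> Tuple[str, List[str]]:
    """Select the node with the smallest predicted value and expand it."""
    # For a minimization problem, a smaller predicted multi-step value is better.
    index = min(range(len(candidates)), key=lambda i: predict(candidates[i]))
    target = candidates[index]
    # Assumption: the expanded node leaves the candidate set and its children join it.
    adjusted = list(candidates[:index]) + list(candidates[index + 1:])
    adjusted.extend(branch(target))
    return target, adjusted


# Toy usage: the scores stand in for predictions of a node evaluation model.
scores = {"A": 0.7, "B": 0.2, "C": 0.5}
target, adjusted = expand_first_target(
    ["A", "B", "C"], scores.__getitem__, lambda node: [node + "0", node + "1"]
)
print(target, adjusted)
```

In a real solver the `branch` callable would create the two sub-problems obtained by branching on a fractional variable; here it only appends a suffix to the node name.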

Combined with the first aspect, in some implementations of the first aspect, adjusting the candidate node set of the goal programming problem according to the node evaluation model also includes: determining a second target node according to the node evaluation model; and removing the second target node from the candidate node set.

When the target node includes a second target node, the second target node can be determined according to the output result of the node evaluation model, and the second target node can be pruned during the solution process. The output result of the node evaluation model can measure the long-term value of node expansion, which helps judge the possibility of obtaining the optimal solution after node expansion, and the second target node is determined on this basis. Pruning nodes that are less likely to yield the optimal solution reduces the solution space and avoids the time delay caused by expanding and solving useless nodes, thus improving the solution efficiency. For example, for a minimization problem, the greater the limit value of a node after multi-step expansion, the smaller the possibility of obtaining the global optimal solution starting from this node. The node evaluation model can be used to predict the limit value of the node after multi-step expansion, which helps judge the possibility that the node yields the optimal solution after multi-step expansion. Determining the second target node on this basis avoids the time delay caused by expanding and solving useless nodes.

Combined with the first aspect, in some implementations of the first aspect, the correlation quantity of the limit value of each node after multi-step expansion includes the limit value of each node after multi-step expansion.

Combined with the first aspect, in some implementations of the first aspect, the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of each node's parent node.

For example, the difference between the limit value of a node after multi-step expansion and the limit value of the node's parent node can be the numerical difference between the limit value of the node after multi-step expansion and the limit value of the node's parent node.

Take a goal programming problem that is a minimization problem as an example. If the limit value is a lower bound value, then as the number of iterations increases during the solution process, the lower bound values of the multiple nodes after multi-step expansion may be very small, and, limited by factors such as computer calculation precision, it is difficult to compare the lower bound values of the multiple nodes after multi-step expansion. The differences of these nodes before and after multi-step expansion are more pronounced. In the embodiments of the present application, the target node can be determined by predicting the difference of each of the multiple nodes before and after multi-step expansion and then comparing these differences, which helps improve the accuracy of target node selection.

Combined with the first aspect, in some implementations of the first aspect, the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of each node's parent node.

Combined with the first aspect, in some implementations of the first aspect, the difference between the limit value of each node after being completely solved and the limit value of each node's parent node is indicated by the function value of the multi-step pseudo-cost function of each node, which satisfies the following formula:

Where C(·) represents the multi-step pseudo-cost function of a node, c(·) represents the change in the limit value of a node before and after single-step expansion, and node N_i is a child node of node P.

Combined with the first aspect, in some implementations of the first aspect, the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value of any other node among the multiple nodes after multi-step expansion and the limit value of the parent node of that other node.

Combined with the first aspect, in some implementations of the first aspect, the difference between the limit value of the second target node after multi-step expansion and the limit value of the parent node of the second target node is greater than or equal to the difference between the limit value of any remaining node among the multiple nodes after multi-step expansion and the limit value of the parent node of that remaining node.

Combined with the first aspect, in some implementations of the first aspect, the second target node belongs to k nodes among the multiple nodes, and the difference between the limit value of each of the k nodes after multi-step expansion and the limit value of its parent node is greater than or equal to the difference between the limit value of any node other than the k nodes among the multiple nodes after multi-step expansion and the limit value of the parent node of that node, where k is an integer greater than 1 and less than the number of the multiple nodes.

After a node is pruned, it is no longer solved during the solution process of the goal programming problem, so the pruning operation may cause a node containing the optimal solution to be pruned. In the solutions of the embodiments of the present application, the second target node can be determined probabilistically through the above greedy method, which helps reduce the risk of the pruning operation.

Combined with the first aspect, in some implementations of the first aspect, the second target node is determined based on the probabilities corresponding to the k nodes, and the probabilities corresponding to the k nodes are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes.
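One way to read the probabilistic pruning rule above is sketched below. The text only requires that the probability be positively correlated with the predicted difference; the softmax weighting used here is an assumption, as are the names `prune_second_target` and `predicted_gap` and the toy gap values.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def prune_second_target(
    candidates: Sequence[str],
    predicted_gap: Callable[[str], float],
    k: int,
    rng: random.Random,
) -> Tuple[str, List[str]]:
    """Sample one node to prune from the k nodes with the largest predicted gaps."""
    if not 1 < k < len(candidates):
        raise ValueError("k must be greater than 1 and less than the number of nodes")
    # The k nodes whose predicted difference from the parent limit value is largest.
    top_k = sorted(candidates, key=predicted_gap, reverse=True)[:k]
    gaps = [predicted_gap(node) for node in top_k]
    # Assumed weighting: softmax, so a larger gap gives a larger pruning probability.
    largest = max(gaps)
    weights = [math.exp(gap - largest) for gap in gaps]
    pruned = rng.choices(top_k, weights=weights, k=1)[0]
    adjusted = [node for node in candidates if node != pruned]
    return pruned, adjusted


# Toy usage: the gaps stand in for predictions of a node evaluation model.
gaps = {"A": 0.1, "B": 0.9, "C": 0.4, "D": 0.8}
pruned, adjusted = prune_second_target(list(gaps), gaps.__getitem__, 2, random.Random(0))
print(pruned, adjusted)
```

Sampling among several high-gap nodes, instead of always removing the single worst one, keeps a nonzero chance that any particular node survives, which matches the stated aim of reducing the risk of the pruning operation.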

Combined with the first aspect, in some implementations of the first aspect, the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.

Combined with the first aspect, in some implementations of the first aspect, the node evaluation model is trained based on sample nodes and the labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion.

In the embodiments of the present application, the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion. Such a label is easier to determine, which makes collecting training data more convenient and improves the efficiency of generating training data. This helps obtain a large amount of training data and improves the utilization efficiency of samples, thereby improving the training effect of the node evaluation model, that is, improving the prediction accuracy of the node evaluation model.

In connection with the first aspect, in some implementations of the first aspect, the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.

Combined with the first aspect, in some implementations of the first aspect, the label corresponding to the sample node is determined based on a first difference and a second difference. The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing. The target evaluation model has the same structure as the node evaluation model.

Illustratively, the first difference may be determined by a solver. For example, the solver is called to obtain the limit value of the parent node of the sample node and the limit value of the sample node, so that the first difference can be determined.
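The text says only that the label is determined "based on" the first difference and the second difference; it does not give the combination rule here. The sketch below therefore makes three labeled assumptions: the two differences are added, the second difference is the smallest prediction of the target evaluation model over the child nodes, and the first difference is taken as the node limit value minus the parent limit value. The name `bootstrap_label` and the toy target model are hypothetical.

```python
from typing import Callable, Sequence


def bootstrap_label(
    parent_limit: float,
    node_limit: float,
    child_features: Sequence[Sequence[float]],
    target_model: Callable[[Sequence[float]], float],
) -> float:
    """Build a training label from a solver-derived and a model-derived term."""
    # First difference: obtained from the solver (sign convention is an assumption).
    first_difference = node_limit - parent_limit
    # Second difference: target evaluation model applied to the child nodes.
    # Assumption: aggregate with min, and use 0.0 for a node without children.
    predictions = [target_model(features) for features in child_features]
    second_difference = min(predictions) if predictions else 0.0
    # Assumption: the label is the sum of the two differences.
    return first_difference + second_difference


# Toy usage: the target model is a placeholder that sums its input features.
label = bootstrap_label(10.0, 12.0, [(1.0, 2.0), (0.5, 1.0)], sum)
print(label)
```

Because the second term comes from a model rather than from fully solving the subtree, a label of this kind can be computed as soon as a node has been expanded once, which is consistent with the stated goal of making training data easier to collect.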

Combined with the first aspect, in some implementations of the first aspect, the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.

It is more convenient to obtain the function value of the objective function corresponding to the relaxed solution of the node. In the embodiment of the present application, when the limit value of the sample node is determined based on the relaxed solution, the label corresponding to the sample node is easier to determine. For example, in the process based on reinforcement learning, the labels corresponding to sample nodes can be determined in real time, and the calculation of the relaxed solution is more convenient. It is more efficient to determine the labels corresponding to the sample nodes based on the relaxed solution, which is beneficial to improving training efficiency.

Combined with the first aspect, in some implementations of the first aspect, the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node. The relevant information of each node includes at least one of the following: the objective function of each node, the constraints of each node, or the decision variables of each node. The low-dimensional representation of the relevant information of each node is obtained by reducing the dimensionality of the relevant information of each node through a feature extraction model.

In the solutions of the embodiments of this application, performing dimensionality reduction on the relevant information of a node facilitates inference by the downstream module, that is, by the node evaluation model.

Combined with the first aspect, in some implementations of the first aspect, the method further includes: returning a solution result of the goal planning problem to the user.

A second aspect provides a method for selecting nodes, including: obtaining a candidate node set of a goal programming problem, where the candidate node set includes multiple nodes and each of the multiple nodes corresponds to a sub-problem to be solved of the goal programming problem; and using a node evaluation model to predict a correlation quantity of the limit value of each node after multi-step expansion, where the output result of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem.

Combined with the second aspect, in some implementations of the second aspect, the method further includes: determining a node evaluation model according to user instructions, and the node evaluation model is deployed on the cloud management platform.

Combined with the second aspect, in some implementations of the second aspect, the output result of the node evaluation model is used to determine the first target node, and the adjusted candidate node set includes child nodes of the first target node.

Combined with the second aspect, in some implementations of the second aspect, the output result of the node evaluation model is used to determine the second target node, and the second target node is not included in the adjusted candidate node set.

Combined with the second aspect, in some implementations of the second aspect, the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of each node's parent node.

Combined with the second aspect, in some implementations of the second aspect, the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of each node's parent node.

Combined with the second aspect, in some implementations of the second aspect, the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value of any other node among the multiple nodes after multi-step expansion and the limit value of the parent node of that other node.

Combined with the second aspect, in some implementations of the second aspect, the second target node belongs to k nodes among the multiple nodes, and the difference between the limit value of each of the k nodes after multi-step expansion and the limit value of its parent node is greater than or equal to the difference between the limit value of any node other than the k nodes among the multiple nodes after multi-step expansion and the limit value of the parent node of that node, where k is an integer greater than 1 and less than the number of the multiple nodes.

Combined with the second aspect, in some implementations of the second aspect, the second target node is determined based on the probabilities corresponding to the k nodes, and the probabilities corresponding to the k nodes are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes.

Combined with the second aspect, in some implementations of the second aspect, the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.

Combined with the second aspect, in some implementations of the second aspect, the node evaluation model is trained based on sample nodes and the labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion.

Combined with the second aspect, in some implementations of the second aspect, the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.

Combined with the second aspect, in some implementations of the second aspect, the label corresponding to the sample node is determined based on a first difference and a second difference. The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing. The target evaluation model has the same structure as the node evaluation model.

Combined with the second aspect, in some implementations of the second aspect, the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node. The relevant information of each node includes at least one of the following: the objective function of each node, the constraints of each node, or the decision variables of each node. The low-dimensional representation of the relevant information of each node is obtained by reducing the dimensionality of the relevant information of each node through a feature extraction model.

A third aspect provides a training method for a node evaluation model. The node evaluation model is used to predict a correlation quantity of the limit value, after multi-step expansion, of each node in a candidate node set of a goal programming problem, where each node corresponds to a sub-problem to be solved of the goal programming problem. The output result of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem. The training method includes: obtaining a sample node and a label corresponding to the sample node, where the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion; and training an initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model.

In the embodiments of the present application, the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion. Such a label is easier to determine, which makes collecting training data more convenient and improves the efficiency of generating training data. This helps obtain a large amount of training data and improves the utilization efficiency of samples, thereby improving the training effect of the node evaluation model, that is, improving the prediction accuracy of the node evaluation model.
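As a rough illustration of training an initial model on sample nodes and their labels, the sketch below fits a linear model with stochastic gradient descent on a squared error. This is a stand-in under assumptions: the linear model, the loss, the hyperparameters and the name `train_linear_evaluator` are chosen here for brevity, and the feature vectors and labels are synthetic rather than taken from a solver.

```python
from typing import Callable, List, Sequence


def train_linear_evaluator(
    features: Sequence[Sequence[float]],
    labels: Sequence[float],
    lr: float = 0.05,
    epochs: int = 500,
) -> Callable[[Sequence[float]], float]:
    """Fit weights and a bias so that the model output approximates the labels."""
    weights: List[float] = [0.0] * len(features[0])  # the "initial model"
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = prediction - y
            # Gradient step on the squared error of this single sample.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error

    def model(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    return model


# Synthetic sample nodes: labels follow 2 * x1 - x2 + 0.5 (made-up data).
sample_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
sample_labels = [0.5, 2.5, -0.5, 1.5]
evaluator = train_linear_evaluator(sample_features, sample_labels)
print(round(evaluator((2.0, 3.0)), 3))
```

The returned callable plays the role of the trained node evaluation model: it maps the features of a node to a predicted value that node selection or pruning can then compare across the candidate node set.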

Combined with the third aspect, in some implementations of the third aspect, the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.

Combined with the third aspect, in some implementations of the third aspect, the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.

Combined with the third aspect, in some implementations of the third aspect, training the initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model includes: training the initial model through reinforcement learning to obtain the node evaluation model, where the label corresponding to the sample node is determined based on a first difference and a second difference, the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node, and the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model, which has the same structure as the node evaluation model.

Combined with the third aspect, in some implementations of the third aspect, the input of the initial model includes relevant information of the sample node or a low-dimensional representation of the relevant information of the sample node, and the relevant information of the sample node includes at least one of the following: sample The objective function of the node, the constraint condition of the sample node or the decision variable of the sample node, and the low-dimensional representation of the relevant information of the sample node are obtained by reducing the dimensionality of the relevant information of the sample node through the feature extraction model.

It should be understood that the expansion, limitation, explanation and description of relevant content in the above-mentioned first aspect also apply to the same content in the second and third aspects.

A fourth aspect provides a device for solving a goal planning problem, which device includes a unit for executing the method of the above-mentioned first aspect and any implementation of the first aspect.

A fifth aspect provides a device for selecting a node, which device includes a unit for executing the method of the above second aspect and any implementation of the second aspect.

A sixth aspect provides a training device for a node evaluation model, which device includes a unit for executing the method of the above third aspect and any implementation of the third aspect.

A seventh aspect provides a chip that obtains instructions and executes the instructions to implement the method in any one of the above-mentioned implementations of the first to third aspects.

Optionally, as an implementation manner, the chip includes a processor and a data interface. The processor reads instructions stored in a memory through the data interface and executes the method in any one of the implementations of the first to third aspects.

Optionally, as an implementation manner, the chip may also include a memory that stores instructions. The processor is used to execute the instructions stored in the memory, and when the instructions are executed, the processor is used to execute the method in any one of the implementations of the first to third aspects.

In an eighth aspect, a computing device cluster is provided, including at least one computing device, each computing device including a processor and a memory. The processor of at least one computing device is configured to execute instructions stored in the memory of at least one computing device, so that the computing device cluster executes the method in any one implementation of the first to third aspects.

In a ninth aspect, a computer-readable medium is provided, including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method in any implementation of the first to third aspects.

In a tenth aspect, a computer program product containing instructions is provided. When the instructions are run by a computing device cluster, the computing device cluster executes the method in any one of the above implementations of the first to third aspects.

Description of drawings

Figure 1 is a schematic block diagram of a device for solving planning problems based on the branch and bound method;

Figure 2 is a schematic flow chart of a node selection method according to an embodiment of the present application;

Figure 3 is a schematic flow chart of a method for training a node evaluation model according to an embodiment of the present application;

Figure 4 is a schematic flow chart of a method for solving a planning problem according to an embodiment of the present application;

Figure 5 is a schematic flowchart of a dimensionality reduction process according to an embodiment of the present application;

Figure 6 is a schematic diagram of a fully connected neural network model according to an embodiment of the present application;

Figure 7 is a schematic diagram of a pruning node selection process according to an embodiment of the present application;

Figure 8 is a schematic flow chart of another method for training a node evaluation model according to an embodiment of the present application;

Figure 9 is a schematic flow chart of yet another method for training a node evaluation model according to an embodiment of the present application;

Figure 10 is a schematic diagram of a node expansion process according to an embodiment of the present application;

Figure 11 is a schematic diagram of an interaction form between a user and an AI basic development platform according to an embodiment of the present application;

Figure 12 is a schematic diagram of an AI model deployment according to an embodiment of the present application;

Figure 13 is a schematic diagram of an AI model providing online services according to an embodiment of the present application;

Figure 14 is a schematic block diagram of a device for selecting nodes according to an embodiment of the present application;

Figure 15 is a schematic block diagram of a training device for a node evaluation model according to an embodiment of the present application;

Figure 16 is a schematic block diagram of a device for solving a goal planning problem according to an embodiment of the present application;

Figure 17 is an architectural schematic diagram of a computing device provided by an embodiment of the present application;

Figure 18 is a schematic architectural diagram of a computing device cluster provided by an embodiment of the present application;

Figure 19 is a schematic diagram of the connection between computing devices through a network provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.

The methods in the embodiments of this application can be applied to various fields such as supply chain, finance, energy, transportation, communications, and power systems. Specifically, the solutions of the embodiments of the present application can be applied to scenarios that involve solving combinatorial optimization problems with integer variables. Illustratively, they can be applied to scenarios such as production scheduling, factory location selection, risk control, asset allocation, oil pipeline laying, logistics transportation, route optimization, and power grid layout and distribution.

In order to better explain the solutions of the embodiments of this application, the terms that may be involved in this application are first described below.

(1) Operations research and optimization

Operations research and optimization mainly studies the use and planning of various resources under certain constraints, in order to maximize the benefit of limited resources, achieve an overall optimal goal, and provide decision-makers with a basis for scientific decision-making.

(2) Mathematical programming

Mathematical programming is a branch of operations research. Its main goal is to find, within a given region, the optimal solution that maximizes or minimizes the value of a certain function. According to the nature of the problem and the processing method, mathematical programming can be divided into many branches, such as linear programming, integer programming, nonlinear programming, combinatorial optimization, multi-objective programming, stochastic programming, dynamic programming, and parametric programming.

(3) Linear programming (LP)

A linear programming model consists of two parts: an objective function and constraints. When both parts are linear, the model can be called a linear programming model. In other words, linear programming studies the extreme value problem of a linear objective function under linear constraints.

(4) Integer linear programming

Integer programming refers to a linear programming problem where integer variables exist among the decision variables. If all decision variables in an integer programming model are integer variables, the model can also be called a pure integer programming model.

In an integer programming problem, the corresponding programming problem when the constraint that the decision variable is an integer variable is not considered can be called the relaxation problem corresponding to the integer programming problem. In other words, by performing linear relaxation on integer variables, the integer programming problem can be converted into a relaxed linear programming problem, that is, the relaxation problem corresponding to the integer programming problem. The solution obtained by solving this linear programming problem is the relaxed solution of the integer programming problem.

(5) Mixed integer linear programming (MILP)

Mixed integer programming refers to a linear programming problem in which some of the decision variables are restricted to integers.

For example, a mixed integer programming model can be expressed as follows:
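$$
\begin{aligned}
\min\quad & f(x) = d_1 x_1 + d_2 x_2 + d_3 x_3 \\
\text{s.t.}\quad & A_{11} x_1 + A_{13} x_3 \le b_1, \\
& A_{21} x_1 + A_{22} x_2 \le b_2, \\
& x_2 \in \mathbb{Z}.
\end{aligned}
$$

Here, $f(x)$ is the objective function; $A_{11} x_1 + A_{13} x_3 \le b_1$, $A_{21} x_1 + A_{22} x_2 \le b_2$, and $x_2 \in \mathbb{Z}$ are the constraints; $x_1$, $x_2$, and $x_3$ are the decision variables; $d_1$, $d_2$, $d_3$, $A_{11}$, $A_{13}$, $A_{21}$, $A_{22}$, $b_1$, and $b_2$ are parameters; and $\mathbb{Z}$ denotes the set of integers.

In the above model, only some of the decision variables are integer variables. If $x_2 \in \mathbb{Z}$ in the above constraints is replaced by $x \in \mathbb{Z}$, that is, every decision variable is required to be an integer, the model is a pure integer programming model.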
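As a minimal sketch of the model above, the following Python snippet plugs in hypothetical coefficient values (the numbers chosen for d, A, and b are assumptions for illustration only and do not come from this application) and checks whether a point is feasible for the mixed integer model or only for its relaxation, in which the integrality requirement on the second variable is dropped.

```python
from math import isclose

# Hypothetical coefficients, chosen only to make the example concrete.
d = (1.0, 2.0, 3.0)
A11, A13, b1 = 1.0, 1.0, 4.0
A21, A22, b2 = 1.0, 2.0, 6.0


def objective(x):
    """f(x) = d1*x1 + d2*x2 + d3*x3."""
    return sum(di * xi for di, xi in zip(d, x))


def satisfies_linear_constraints(x):
    """Check the two linear inequality constraints of the example model."""
    x1, x2, x3 = x
    return A11 * x1 + A13 * x3 <= b1 and A21 * x1 + A22 * x2 <= b2


def is_feasible(x, relaxed=False):
    """Feasibility for the mixed integer model, or for its relaxation
    when relaxed=True (the integrality of x2 is then ignored)."""
    x2_is_integer = isclose(x[1], round(x[1]), abs_tol=1e-9)
    return satisfies_linear_constraints(x) and (relaxed or x2_is_integer)
```

With these assumed values, the point (1.0, 2.5, 1.0) is feasible for the relaxation but not for the mixed integer model, because its second component is fractional.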

(6) Mathematical programming solver

A mathematical programming solver is a software system that solves linear programming, integer programming, mixed integer programming, and various nonlinear programming models once they have been formulated.

(7) Branch and bound algorithm

The branch-and-bound algorithm is a commonly used algorithm for solving programming problems, and most mathematical programming solvers are built on this framework. It is an iterative search method that selects different branching variables and sub-problems to branch on. The entire feasible solution space is repeatedly divided into smaller and smaller subsets, which is called branching; a bound on the objective value is computed for the solutions in each subset, which is called bounding. After each branching step, subsets whose bound exceeds the objective value of the best known feasible solution are not branched further, which is called pruning.

A problem can also be called a node. The original problem to be solved can be regarded as the root node. Branching is the process of continuously generating sub-problems of the original problem, that is, of continuously adding nodes. Bounding refers to checking the upper and lower bounds of the sub-problems during branching. If a sub-problem cannot produce a better solution than the current best solution, the sub-problem can be pruned. Such a sub-problem can be called a pruning node. The algorithm ends when none of the sub-problems can produce a better solution.

In each iteration, an appropriate node needs to be selected for the next iterative calculation. This node may be called an expansion node or a search node. For example, for a minimization problem, a pseudo-cost function can be used to predict the lower bound value of each node after single-step expansion, and the node with the smallest lower bound value is selected as the search node.

The following takes integer programming as an example to illustrate the processing of the branch-and-bound algorithm. That is, the original problem to be solved is an integer programming problem.

First, a relaxed solution to the original problem is found. If the relaxed solution is an integer solution, it is the optimal solution to the original problem. If the relaxed solution is not an integer solution, the objective function value of the optimal solution of the original problem cannot be better than the objective function value of the relaxed solution. The objective function value of the relaxed solution can therefore be used as a bound of the original problem. For a minimization problem, that is, when the goal is to minimize the value of an objective function, the objective function value of the relaxed solution is a lower bound of the original problem. For a maximization problem, that is, when the goal is to maximize the value of an objective function, the objective function value of the relaxed solution is an upper bound of the original problem.

By branching on a decision variable, the original problem is divided into two sub-problems. Branching on a decision variable can also be understood as constructing two constraints; adding each of the two constraints to the original problem yields the two sub-problems. If the relaxed solution of a sub-problem is an integer solution, that relaxed solution is a feasible solution of the original problem, and the optimal solution of the original problem cannot be worse than this feasible solution. Therefore, the objective function value of the feasible solution can be used as a bound of the original problem, namely the current best solution. For a minimization problem, the objective function value of the feasible solution is an upper bound of the original problem. For a maximization problem, it is a lower bound of the original problem.

If the objective function value corresponding to the relaxed solution of a sub-problem exceeds the bound of the original problem, the sub-problem (i.e., a pruning node) can be pruned, that is, it is no longer branched. A sub-problem (i.e., a search node or expansion node) is selected to continue branching, and the bounds of the original problem are adjusted based on the relaxed solution of that sub-problem. The above process is repeated; when the upper bound equals the lower bound, the optimal solution to the original problem is obtained, and when the gap between the upper and lower bounds is small, an approximately optimal solution is obtained. Taking the minimization problem as an example, the smallest of the objective function values corresponding to the current relaxed solutions of the sub-problems is used as the current lower bound, and the smallest of the objective function values corresponding to the feasible solutions found so far for the original problem is used as the current upper bound. If the objective function value corresponding to the relaxed solution of a sub-problem is greater than the current upper bound, the sub-problem is no longer branched. Although a feasible solution of this sub-problem may not have been found yet, continuing to branch on this node only adds more constraints, so any solution found will not be better than the relaxed solution of this node, and there is no need to keep branching.
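The branching, bounding, and pruning steps described above can be sketched in Python on a deliberately small minimization problem. The problem class (choose 0/1 items at minimum total cost so that their total weight covers a demand), the function names, and the best-first node ordering are assumptions made for illustration; the greedy relaxation below is exact only for this single-constraint problem and merely stands in for the linear programming relaxation a real solver would compute.

```python
import heapq
from itertools import count


def relax(costs, weights, demand, fixed):
    """Relaxation with 0 <= x_i <= 1 and some variables fixed by branching.

    Returns (lower_bound, x), or None if the sub-problem is infeasible.
    Weights are assumed to be positive.
    """
    x = [0.0] * len(costs)
    need, value = demand, 0.0
    for i, v in fixed.items():
        x[i] = float(v)
        value += costs[i] * v
        need -= weights[i] * v
    # Greedy fill: cheapest cost per unit of weight first.
    free = sorted((i for i in range(len(costs)) if i not in fixed),
                  key=lambda i: costs[i] / weights[i])
    for i in free:
        if need <= 0:
            break
        take = min(1.0, need / weights[i])
        x[i] = take
        value += costs[i] * take
        need -= weights[i] * take
    return None if need > 1e-9 else (value, x)


def branch_and_bound(costs, weights, demand):
    best_value, best_x = float("inf"), None   # upper bound (best feasible)
    tie = count()                             # tie-breaker for the heap
    root = relax(costs, weights, demand, {})
    if root is None:
        return None
    live = [(root[0], next(tie), {}, root[1])]  # live nodes by lower bound
    while live:
        bound, _, fixed, x = heapq.heappop(live)
        if bound >= best_value:               # pruning
            continue
        fractional = [i for i, v in enumerate(x) if 1e-9 < v < 1 - 1e-9]
        if not fractional:                    # integer relaxed solution
            best_value, best_x = bound, [int(round(v)) for v in x]
            continue
        i = fractional[0]                     # branching variable
        for v in (0, 1):                      # two new constraints
            child = dict(fixed)
            child[i] = v
            result = relax(costs, weights, demand, child)
            if result is not None and result[0] < best_value:
                heapq.heappush(live, (result[0], next(tie), child, result[1]))
    return best_value, best_x
```

For example, `branch_and_bound([4, 3, 5], [3, 2, 4], 5)` explores a handful of nodes and returns the selection of the first two items with total cost 7. Always expanding the live node with the smallest lower bound is only one possible node selection rule.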

(8) Neural network

A neural network can be composed of neural units. A neural unit can refer to an arithmetic unit that takes $x_s$ and an intercept of 1 as input. The output of the arithmetic unit can be:

$$f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

Here, $s = 1, 2, \ldots, n$, where $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which applies a nonlinear transformation to the features obtained in the neural network and converts the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer. The activation function can be a sigmoid function. A neural network is a network formed by connecting many such neural units, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field, which can be an area composed of several neural units.
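The neural unit described above (a weighted sum of the inputs plus a bias, passed through an activation function) can be sketched as follows. The function names are illustrative, and the sigmoid is used because the text names it as one possible activation function.

```python
import math


def sigmoid(z):
    """Sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x, w, b, activation=sigmoid):
    """Output of one neural unit: activation(sum_s W_s * x_s + b)."""
    weighted_sum = sum(ws * xs for ws, xs in zip(w, x))
    return activation(weighted_sum + b)
```

For instance, with inputs (1.0, 2.0), weights (0.5, -0.25), and zero bias, the weighted sum is 0 and the unit outputs 0.5.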

(9) Loss function

In the process of training a neural network, the output of the network should be as close as possible to the value that is actually desired. Therefore, the predicted value of the current network can be compared with the desired target value, and the weight vector of each layer of the neural network can be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted so that the network predicts a lower value, and the adjustment continues until the neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how to measure the difference between the predicted value and the target value. This is the role of the loss function or the objective function, which are important equations used to measure the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (loss), the greater the difference, so training the neural network becomes a process of reducing this loss as much as possible.
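The idea of repeatedly adjusting weights to reduce a loss can be sketched with a one-weight model. The mean squared error loss, the plain gradient descent update, and the learning rate below are assumptions chosen for illustration; the text above does not prescribe a particular loss function or update rule.

```python
def mse(predictions, targets):
    """Mean squared error between predicted values and target values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def train(xs, ys, lr=0.05, steps=200):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # initialization before the first update
    for _ in range(steps):
        # Gradient of the loss with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # move against the gradient to lower the loss
    return w
```

On data generated by y = 2x, the learned weight approaches 2 and the loss approaches 0.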

(10) Graph neural network (GNN)

GNN is a neural network structure that takes graph structure data as input, and is usually used for deep learning tasks where the input features are graph structures.

(11) Reinforcement learning (RL)

Reinforcement learning is mainly used to solve sequential decision-making problems. It is a process in which an agent interacts with an environment to continuously learn an optimal policy, make sequential decisions, and obtain the maximum return.

Agent: Used to learn the next appropriate action (action) based on the state and reward of environmental feedback to maximize long-term total revenue.

Environment: used to receive the actions performed by the agent, evaluate the actions and convert them into rewards to feed back to the agent. The rewards include positive rewards and negative rewards.

In addition to the agent and environment, the reinforcement learning system also has several core elements: policy, reward function, and value function.

Policy: A mapping from states to actions. The policy defines how the agent chooses the action to be performed in the next step.

Reward function: A function used to evaluate the actions performed by the agent and calculate the reward value of the actions performed by the agent.

Value function: A function used to predict the long-term reward value of a state or action. In some cases, the value of the value function can be expressed as the weighted accumulation of the reward values of multiple reward functions in multiple future states starting from one state.

Action space: is the set of all possible actions.

State space: is the set of all possible states.

Given a state of the environment, the agent chooses an action to perform according to a certain policy. After the action is executed, the environment changes and transitions to a new state, and the environment can evaluate the action and feed the corresponding reward value back to the agent. The agent can adjust its policy based on the reward value and repeat the above process so that the sum of the reward values obtained over all executed actions is maximized.
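The interaction between agent and environment described above can be sketched as a simple loop. The toy environment `LineWorld`, its reward values, and the function `run_episode` are hypothetical and exist only to make the loop concrete.

```python
class LineWorld:
    """Hypothetical environment: states 0..4 on a line, goal at state 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        """Apply an action (-1 or +1); return (new_state, reward, done)."""
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else -0.1  # positive or negative reward
        return self.state, reward, done


def run_episode(policy, max_steps=50):
    """Let the agent act with `policy` (state -> action); sum the rewards."""
    env = LineWorld()
    state, total_reward = env.state, 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward
```

A policy that always moves right reaches the goal in four steps and collects a total reward of about 0.7, whereas a policy that always moves left never reaches the goal and accumulates only negative rewards.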

(12) Deep Q learning (DQL)

DQL is a typical reinforcement learning algorithm suitable for discrete action sequence decision-making problems. DQL can help select optimal actions by estimating the long-term cumulative return (Q function) of each action. The Q function, Q(S,A), refers to the sum of reward values that will be obtained in the future after taking action A in state S, that is, the long-term cumulative return of action A. The Q value corresponding to the action can provide a reference for the strategy.

In DQL, the Q value can be calculated by a deep neural network (DNN). In the agent, the current state of the environment is input into the DNN, and the DNN predicts the Q value obtained by executing each action in this state.
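To illustrate how a Q value estimate is refined from observed rewards, the sketch below uses a lookup table in place of the DNN. This tabular stand-in, the standard one-step Q-learning target, and the parameter names `alpha` (learning rate) and `gamma` (discount factor) are assumptions for illustration and are not taken from this application.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One Q-learning update on a table q[(state, action)] -> value.

    The target is the immediate reward plus the discounted best Q value
    available in the next state.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


def greedy_action(q, state, actions):
    """Use the Q values as a reference for the policy: pick the best action."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

In DQL proper, the table lookup `q.get(...)` would be replaced by a forward pass of the DNN, and the update would become a gradient step on a loss between the prediction and the target.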

Figure 1 shows a schematic diagram of a device for solving planning problems based on the branch and bound method.

As shown in Figure 1, the solving device can include a preprocessing (presolving) module, a node selection module, a node preprocessing (node presolving) module, a linear programming relaxation (LP relaxation) module, a heuristic (heuristics) module, a branch (branching) module, and a cutting plane module.

The preprocessing module is used to preprocess the original problem to simplify the original problem and reduce the scale of the original problem. Illustratively, preprocessing may include removing redundant constraints and decision variables.

The node selection module is used to select search nodes. The node selection module can determine the search node from among the current nodes to be solved, so that the branch module can subsequently branch based on the search node.

Alternatively, the node selection module can determine the pruned node from the current nodes and no longer consider the node in subsequent solution processes.

The node preprocessing module is used to simplify the constraints on the variables in the search nodes determined by the node selection module.

The linear programming relaxation module is used to construct the relaxation model and solve the relaxation solution.

The heuristic module is used to search for higher quality solutions to the search node using a heuristic algorithm starting from the relaxed solution.

The branch module is used to branch the search node, that is, add constraints, obtain the child nodes of the search node, and return them to the node selection module for the node selection module to perform the next round of node selection.

The Cutting Plane module is used to add multivariable constraints based on the cutting plane method to remove relaxed solutions that do not satisfy the multivariable constraints.

Specifically, the cutting plane module can generate a series of linear constraints based on the relaxed solution, and select a part of the linear constraints to add to the original problem to reduce the feasible solution domain.

It should be understood that the solving device shown in Figure 1 is only an example, and in actual applications, the solving device may include more or fewer modules. For example, the cutting plane module may not be included in the solving device. For another example, the solving device may not include a heuristic module.

In many application scenarios, planning problems have integer constraints, such as factory location or production scheduling. Such problems can be modeled as mixed integer programming problems and solved by integer programming solvers. Integer programming solvers are usually implemented based on the branch-and-bound framework. Specifically, during the iterative calculation process, the solution space of the original problem is repeatedly divided into smaller and smaller subsets, that is, sub-problems (also called nodes) of the original problem are repeatedly generated, and the optimal solution of the original problem is obtained by continuously solving the sub-problems. For complex problems, for example, problems with a large number of decision variables, a large number of nodes are generated during the solution process and the solution takes a long time, making it difficult to meet the user's needs. Selecting appropriate nodes for the corresponding processing is the key to improving the solving speed. For example, selecting appropriate nodes for pruning reduces the number of nodes to be solved, which helps improve the solving speed. For another example, selecting appropriate nodes for branching helps find the optimal solution as early as possible, which also helps improve the solving speed.

The embodiments of this application provide a method for selecting nodes, which can be used in the solution scenario of planning problems and is beneficial to improving the solution efficiency.

Specifically, the node selection method in the embodiment of the present application can be applied to scenarios where the branch and bound method is used to solve planning problems.

The method of selecting nodes in the embodiment of the present application will be described below with reference to Figure 2.

Figure 2 shows a schematic flowchart of a node selection method provided by an embodiment of the present application. The method 200 shown in Figure 2 may be performed by a device for selecting nodes. For example, the device for selecting nodes and the solver may be two separately deployed devices, or the device for selecting nodes and the solver may be integrated in the same device (for example, a solving device). The embodiments of the present application do not limit this. The solver is implemented based on the branch and bound algorithm framework.

Illustratively, the node selection method in the embodiment of the present application can be applied to the node selection module shown in Figure 1. In other words, the device for selecting nodes in this embodiment of the present application may be the node selection module shown in Figure 1. The node selection module shown in Figure 1 can use the node selection method in the embodiment of the present application to determine appropriate nodes.

As shown in Figure 2, the method 200 includes steps 210 to 220. Steps 210 to 220 are described below. Solving a planning problem based on the branch-and-bound method is an iterative process, and steps 210 to 220 can be performed as steps within one iteration of that process.

210. Obtain the candidate node set of the goal programming problem. The candidate node set includes multiple nodes.

Each node in the plurality of nodes respectively corresponds to a sub-problem to be solved of the goal programming problem.

In the embodiment of this application, sub-problems correspond to nodes, and nodes can be understood as sub-problems, or nodes can also be called branches or branch nodes, which will not be distinguished later.

The goal programming problem is the mathematical programming problem to be solved.

The goal programming problem can be represented by its objective function, constraints, and decision variables. The constraints are used to constrain the decision variables. At least some of the decision variables of the goal programming problem are integer variables, that is, at least some of the decision variables take integer values. In other words, the goal programming problem is a pure integer programming model or a mixed integer programming model.

Illustratively, the goal programming problem may be a maximum optimization problem. In other words, the optimal solution to a goal programming problem is the solution that maximizes the function value of the objective function of the goal programming problem.

Alternatively, the goal programming problem can be a minimum optimization problem. In other words, the optimal solution to a goal programming problem is the solution that minimizes the function value of the objective function of the goal programming problem.

Maximum optimization problems and minimum optimization problems can be converted into each other. In order to facilitate understanding and description, the embodiments of the present application only take the minimum value optimization problem as an example for explanation, which does not limit the scope of the embodiments of the present application.

Taking a goal programming problem used to solve a logistics scheduling problem as an example, the objective function can be to minimize logistics scheduling costs, the constraints can be that each distribution point must complete delivery within a specified period of time, and the decision variables can include couriers, time, and location. The sub-problems of the goal programming problem are generated based on the branch and bound method.

The multiple sub-problems to be solved may be generated in one iteration process during the process of solving the goal programming problem, or may be generated in multiple iteration processes.

The multiple sub-problems to be solved can also be called multiple live nodes. That is, the nodes in the candidate node set are all live nodes. Live nodes refer to nodes that have not yet been pruned.

In a possible implementation, the device for selecting nodes and the solver may be deployed separately. In this case, the solver can obtain the goal programming problem, generate the sub-problems of the goal programming problem based on the branch and bound method, and send them to the device for selecting nodes.

In another possible implementation, the device for selecting nodes and the solver may be integrated in the solving device. In this case, the solving device can obtain the goal programming problem and generate the sub-problems of the goal programming problem based on the branch and bound method.

Illustratively, the goal programming problem may be data provided by the user.

For example, the candidate node set of the goal programming problem may be data provided by the user. In this case, the device for selecting nodes may receive the candidate node set of the goal programming problem provided by the user.

It should be understood that the above are only examples, and the candidate node set of the goal programming problem can also be obtained in other ways, which is not limited in the embodiments of the present application.
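One possible in-memory representation of the candidate node set obtained in step 210 is sketched below. The class `Node`, its fields, and the function `candidate_set` are hypothetical names introduced for illustration; the application does not specify a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A sub-problem: the original problem plus branching constraints."""
    node_id: int
    parent_id: Optional[int]
    # Branching constraints as variable index -> (lower, upper) bounds.
    var_bounds: Dict[int, Tuple[float, float]] = field(default_factory=dict)
    # Objective value of the node's relaxed solution (minimization).
    lower_bound: float = float("-inf")
    pruned: bool = False


def candidate_set(nodes: List[Node]) -> List[Node]:
    """Live nodes, that is, nodes that have not yet been pruned."""
    return [n for n in nodes if not n.pruned]


# Three sub-problems to be solved, one of which has been pruned.
open_nodes = [
    Node(1, 0, {0: (0.0, 0.0)}, lower_bound=6.5),
    Node(2, 0, {0: (1.0, 1.0)}, lower_bound=6.5),
    Node(3, 1, {0: (0.0, 0.0), 1: (1.0, 1.0)}, lower_bound=9.0, pruned=True),
]
live = candidate_set(open_nodes)
```

Here `live` contains nodes 1 and 2, which together form the candidate node set passed on to step 220.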

220. Use the node evaluation model to predict the correlation quantity of the limit value of some or all nodes in the candidate node set after multi-step expansion.

For example, the node evaluation model may be determined according to user instructions.

For example, the node evaluation model can be deployed on a cloud management platform.

The output result of the node evaluation model is used to determine the target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the target planning problem.

Optionally, step 220 may include: predicting, through the node evaluation model, the correlation quantity of the limit value of each node in the candidate node set after multi-step expansion.

In other words, the node evaluation model is used to predict the correlation quantity of the limit value of all nodes in the candidate node set after multi-step expansion.

For the convenience of description, the embodiments of this application mainly take all nodes as an example, that is, processing each node through the node evaluation model as an example, which does not limit the solutions of the embodiments of this application.

The input to the node evaluation model may include node-related information.

For example, the input of the node evaluation model may include relevant information of each node.

The relevant information of the node includes at least one of the following: the objective function of the node, the constraint condition of the node or the decision variable of the node.

For example, the relevant information of the node may include the objective function of the node, the constraint conditions of the node and the decision variables of the node. Input the node's objective function, node's constraints and node's decision variables into the node evaluation model, and the relevant quantities of the node's limit value after multi-step expansion can be output.

In the embodiment of the present application, the output result of the node evaluation model can be used as the evaluation information of the node. The evaluation information of a node is related to the node's limit value after multi-step expansion. For example, the evaluation information of the node may be used to indicate the node evaluation model's prediction of the correlation quantity of the node's limit value after multi-step expansion.

For example, step 220 may include: determining evaluation information of the plurality of nodes through a node evaluation model. The evaluation information of the multiple nodes is used to determine the target node from the candidate node set, and the target node is used to adjust the candidate node set.

Expand a node to get its child nodes. Expanding a node is to branch a sub-problem to obtain a new sub-problem. The limit value of a node after multi-step expansion is the limit value of the sub-problem after multi-step branching.

The limit value of a node after multi-step expansion can be understood as the limit value of the child node obtained after multi-step expansion of the node.

It should be understood that the node evaluation model only predicts the correlation quantity of the limit value of a node after multi-step expansion. While the node evaluation model processes a node, no multi-step expansion operation is actually performed on that node.

In the embodiment of the present application, the evaluation information of the node can be used to indicate the prediction of the relevant quantity of the node's limit value after multi-step expansion, and can be used to measure the long-term value of the node expansion.

It should be noted that the evaluation information of the multiple nodes is related to the limit values of the multiple nodes after multi-step expansion. The number of expansion steps of different nodes may be the same or different.

For example, the evaluation information of the multiple nodes is related to the limit values of the multiple nodes after being completely solved.

The number of steps required for different nodes to be expanded to complete solution may be the same or different.

Among them, if a node is completely solved, it means that all the descendant nodes of the node are solved.

Optionally, the limit values of the multiple nodes after the multi-step expansion include the function values of the objective functions corresponding to the relaxed solutions of the multiple nodes after the multi-step expansion.

Taking the goal programming problem as a minimum optimization problem as an example, the function value of the objective function corresponding to the relaxation solution of a node can be the lower bound of the node. In other words, the limit values of the multiple nodes after multi-step expansion may be the lower bound values of the multiple nodes after multi-step expansion. That is, the node evaluation model can be used to predict quantities related to the lower bound of a node after multi-step expansion.

In a possible implementation manner, the correlation quantity of the limit value of the node after multi-step expansion includes the limit value of the node after multi-step expansion.

In this case, the evaluation information of the multiple nodes may be used to indicate the prediction of the limit values of the multiple nodes after multi-step expansion.

In other words, the node evaluation model can output the limit values of the multiple nodes after multi-step expansion, that is, the node evaluation model predicts the limit values of the multiple nodes after multi-step expansion. In the embodiment of the present application, the outputs of the node evaluation model are all predicted values. For the convenience of description, unless otherwise emphasized, no distinction will be made in the following text.

Taking the goal programming model as a minimum value optimization problem as an example, the limit value can be a lower bound value. In this case, the smaller the lower bound value of a node after multi-step expansion, the higher the possibility of searching for the global optimal solution starting from this node, and the higher the long-term value of this node.

In another possible implementation, the correlation quantity of the limit value of the node after multi-step expansion includes the difference between the limit value of the node after multi-step expansion and the limit value of the node's parent node.

In this case, the evaluation information of the multiple nodes may be used to indicate the prediction of the difference between the limit values of the multiple nodes after multi-step expansion and the limit values of the parent nodes of the multiple nodes.

In other words, the node evaluation model can output the differences between the limit values of the multiple nodes after multi-step expansion and the limit values of the parent nodes of the multiple nodes, that is, the node evaluation model predicts these differences.

The difference between the limit value of a node after multi-step expansion and the limit value of the node's parent node is the change in the limit value of the node before and after multi-step expansion.

For example, the difference between the limit value of a node after multi-step expansion and the limit value of the node's parent node may be the arithmetic difference between the two values.

Alternatively, the difference between the limit value of a node after multi-step expansion and the limit value of the node's parent node may be the ratio obtained by dividing the limit value of the node after multi-step expansion by the limit value of the node's parent node.

It should be understood that the above are only examples, and the changes in the limit value of a node before and after multi-step expansion can also be determined in other ways, which is not limited in the embodiments of the present application.
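As a minimal sketch of the two examples above, the change in a node's limit value can be computed either as an arithmetic difference or as a ratio. The function name `bound_change` and its `mode` parameter are hypothetical and are introduced only for illustration; the sign convention (value after expansion minus the parent's value) is likewise an assumption and is not specified by the embodiments.

```python
def bound_change(bound_after_expansion: float, parent_bound: float,
                 mode: str = "difference") -> float:
    """Change in a node's limit value relative to its parent (illustrative only)."""
    if mode == "difference":
        # Assumed sign convention: value after expansion minus the parent's value.
        return bound_after_expansion - parent_bound
    if mode == "ratio":
        if parent_bound == 0:
            raise ZeroDivisionError("ratio is undefined when the parent's limit value is 0")
        return bound_after_expansion / parent_bound
    raise ValueError(f"unknown mode: {mode}")


print(bound_change(15.0, 10.0))           # 5.0
print(bound_change(15.0, 10.0, "ratio"))  # 1.5
```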

Further, the correlation quantity of the node's limit value after multi-step expansion includes the difference between the node's limit value after being completely solved and the limit value of the node's parent node.

In this case, the evaluation information of the multiple nodes may be used to indicate the difference between the limit values of the multiple nodes after they are completely solved and the limit values of the parent nodes of the multiple nodes.

In other words, the evaluation information of a node can be used to indicate the change of the limit value in the process of the node being expanded to being completely solved. The change of the limit value in the process from the node being expanded to being completely solved can be represented by the function value of the multi-step pseudo-cost function of the node.

For example, the difference between the limit value of a node after it is completely solved and the limit value of the node's parent node can be determined based on the change of the limit value before and after single-step expansion of the node and the change of the limit value of the node's child nodes from expansion to complete solution. In other words, the function value of the node's multi-step pseudo-cost function can be determined based on the change in the limit value before and after the node's single-step expansion and the function values of the multi-step pseudo-cost functions of the node's child nodes.

For example, the function value of a node's multi-step pseudo-cost function can be the difference between the limit value of the node after it is completely solved and the limit value of the node's parent node. The function value of the node's multi-step pseudo-cost function can be the sum of the difference between the limit values before and after the node's single-step expansion and the function value of the multi-step pseudo-cost function of the node's child node. The node evaluation model is used to predict the function value of the node's multi-step pseudo-cost function.

The change of a node before and after single-step expansion is the difference between the limit value of the node's parent node and the limit value of the node. The change of the limit value of the node's child node from expansion to complete solution, that is, the function value of the multi-step pseudo-cost function of the child node, is the difference between the limit value of the child node after being completely solved and the limit value of the node.

If the node includes multiple child nodes, the function value of the node's multi-step pseudo-cost function may be determined based on the change of the node before and after single-step expansion and the minimum value of the function values of the multi-step pseudo-cost functions of the multiple child nodes. Alternatively, it may be determined based on the change of the node before and after single-step expansion and the maximum value of the function values of the multi-step pseudo-cost functions of the multiple child nodes. Alternatively, it may be determined based on the change of the node before and after single-step expansion and the average value of the function values of the multi-step pseudo-cost functions of the multiple child nodes. It should be understood that the above are only examples. When the node includes multiple child nodes, the function value of the multi-step pseudo-cost function of the node can also be determined in other ways, which is not limited in the embodiment of the present application.

The change in the limit value of a node from expansion to complete solution can be represented by the function value of the node's multi-step pseudo-cost function. A multi-step pseudo-cost function can be used to measure the long-term value of node expansion. The node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function.

For example, the multi-step pseudo-cost function of a node can satisfy the following formula:

$$C(P) = c(P) + \min_{i} C(N_i)$$

Here, C(·) represents the multi-step pseudo-cost function of a node, and c(·) represents the change in the limit value of the node before and after single-step expansion, that is, the first difference. Node N_i is a child node of node P. The second term in the above formula is the minimum value of the function values of the multi-step pseudo-cost function among the child nodes of node P, that is, the second difference. The multi-step pseudo-cost function of node P can be understood as the change in the limit value from the expansion of node P to the complete solution of node P.

It should be understood that the above formula is only an example and does not limit the multi-step pseudo-cost function of the embodiment of the present application. In the above formula, the second term (that is, the second difference) is the minimum difference between the limit value of a child node of node P after being completely solved and the limit value of node P. In other expressions, the second term can also be the maximum difference between the limit value of a child node of node P after being completely solved and the limit value of node P, that is:

$$C(P) = c(P) + \max_{i} C(N_i)$$
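The recursion described above, in which a node's multi-step pseudo-cost is its single-step change plus an aggregate (minimum, maximum or average) of the multi-step pseudo-costs of its child nodes, can be sketched as follows. This is only an illustration under stated assumptions: the `Node` structure and the function name `multi_step_pseudo_cost` are hypothetical, and a node without child nodes is assumed to be already completely solved, so that its multi-step pseudo-cost equals its single-step change.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class Node:
    """A node of the search tree (hypothetical structure for illustration)."""
    single_step_change: float                      # c(P) for this node
    children: List["Node"] = field(default_factory=list)


def multi_step_pseudo_cost(
    node: Node,
    aggregate: Callable[[Sequence[float]], float] = min,
) -> float:
    """C(P) = c(P) + aggregate of C(N_i) over the child nodes N_i of P."""
    if not node.children:
        # Assumption: a node without children is already completely solved.
        return node.single_step_change
    child_costs = [multi_step_pseudo_cost(child, aggregate) for child in node.children]
    return node.single_step_change + aggregate(child_costs)


# Small tree: the root has two children, one of which has two leaf children.
tree = Node(1.0, [Node(2.0, [Node(0.5), Node(3.0)]), Node(4.0)])

print(multi_step_pseudo_cost(tree, min))   # 1.0 + min(2.0 + 0.5, 4.0) = 3.5
print(multi_step_pseudo_cost(tree, max))   # 1.0 + max(2.0 + 3.0, 4.0) = 6.0
print(multi_step_pseudo_cost(tree, mean))  # 1.0 + mean(3.75, 4.0) = 4.875
```

Passing the aggregation as a parameter keeps the minimum, maximum and average variants in one function; other ways of combining the child nodes could be supplied in the same manner.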

The smaller the change in the limit value of a node before and after multi-step expansion, the higher the possibility of searching for the optimal solution starting from the node, and the higher the long-term value of the node.

Taking the goal programming model as a minimum value optimization problem as an example, the limit value can be a lower bound value. In this case, the smaller the change in the lower bound value of a node before and after multi-step expansion, the higher the possibility of searching for the global optimal solution starting from this node, and the higher the long-term value of this node.

Optionally, the node evaluation model can be a neural network model, a random forest model, a support vector machine model or a linear regression model, etc.

For example, the node evaluation model may be a fully connected neural network model.

It should be understood that the above are only examples, and the node evaluation model can also adopt models with other structures, which are not limited in the embodiments of the present application.

Optionally, the node evaluation model may be trained based on training data. The training data includes relevant information of the sample node and the label corresponding to the sample node. The label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion. The relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node.

Optionally, the node evaluation model can be trained through reinforcement learning.

For example, the node evaluation model may be trained through deep Q learning.

Optionally, the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.

Further, the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after being completely solved and the limit value of the parent node of the sample node.

Optionally, the label corresponding to the sample node is determined based on the first difference and the second difference. The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing. The target evaluation model has the same structure as the node evaluation model. The target evaluation model is used to predict the difference between the bounding value of the child node of the sample node after being completely solved and the bounding value of the sample node.

The first difference can be determined by the solver. For example, the solver is called to obtain the limit value of the parent node of the sample node and the limit value of the sample node, so that the first difference can be determined.
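As a minimal sketch of how such a label could be assembled, the first difference can be taken from the limit values returned by the solver and the second difference from the predictions of the target evaluation model on the child nodes. The names `sample_label` and `toy_target_model` are hypothetical. Adding the two differences, taking the minimum over the child nodes, the sign convention of the first difference, and using only the first difference when there are no child nodes are all assumptions made for illustration.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def sample_label(
    parent_bound: float,
    node_bound: float,
    child_features: Sequence[Features],
    target_model: Callable[[Features], float],
) -> float:
    """Label = first difference + second difference (illustrative only)."""
    # First difference: from the limit values of the parent node and the sample
    # node, as obtained from the solver (assumed sign: node minus parent).
    first_difference = node_bound - parent_bound
    if not child_features:
        # Assumption: without child nodes there is no second difference.
        return first_difference
    # Second difference: target-model predictions for the child nodes,
    # combined here with the minimum (one of several possible choices).
    second_difference = min(target_model(f) for f in child_features)
    return first_difference + second_difference


def toy_target_model(features: Features) -> float:
    # Stand-in for the target evaluation model; any callable would do here.
    return sum(features)


label = sample_label(10.0, 12.5, [[1.0, 0.5], [0.25, 0.25]], toy_target_model)
print(label)  # 2.5 + min(1.5, 0.5) = 3.0
```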

The node evaluation model can be trained by a device that selects nodes, or it can also be trained by other devices. The embodiments of the present application do not limit this.

The specific training process can be referred to the description later, and will not be described here.

Optionally, the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.

Taking the goal programming problem as a minimum optimization problem as an example, the function value of the objective function corresponding to the relaxation solution of a node can be the lower bound of the node. In other words, the limit value of the sample node after multi-step expansion may be the lower bound value of the sample node after multi-step expansion.

It is more convenient to obtain the function value of the objective function corresponding to the relaxed solution of the node. In the embodiment of this application, the node evaluation model can be trained based on the training data. When the limit value of the node is determined based on the relaxed solution, the label corresponding to the sample node is easier to determine, making the training data easier to collect. It is beneficial to generate a large amount of training data, thereby improving the training effect of the node evaluation model, that is, improving the prediction accuracy of the node evaluation model.

It should be understood that, for the minimum value optimization problem, the embodiment of the present application mainly takes the limit value being the lower bound value as an example for explanation. In practical applications, the limit value can also be the upper bound value, which is not limited in the embodiment of the present application.

The adjusted candidate node set in step 220 can be used as the candidate node set in the next round of iteration process.

For example, in the next iteration process, the candidate node set in step 210 can be replaced with the adjusted candidate node set, and the method 200 is repeatedly executed. Method 200 may be executed repeatedly until the solution is completed.

The target node may include at least one of a first target node and a second target node.

Optionally, the output result of the node evaluation model is used to determine the first target node. The adjusted candidate node set includes child nodes of the first target node.

As mentioned above, the output result of the node evaluation model can be used to determine the target node, and the target node can include the first target node. The child nodes of the first target node are used to adjust the candidate node set, and the adjusted candidate node set includes the child nodes of the first target node.

The first target node may also be called a search node or an expansion node.

In a possible implementation, the device for selecting nodes and the solver may be deployed separately. In this case, the solver can expand the first target node to obtain the child nodes of the first target node, and add the child nodes of the first target node to the candidate node set for the next round of iterative calculation.

In another possible implementation, the device for selecting nodes and the solver may be integrated in the solving device. In this case, the first target node can be expanded by the solving device to obtain the child nodes of the first target node, and the child nodes of the first target node can be added to the candidate node set for the next round of iterative calculation.

Optionally, the output result of the node evaluation model is used to determine the second target node. The second target node is not included in the adjusted candidate node set.

As mentioned above, the output result of the node evaluation model can be used to determine the target node, and the target node can include a second target node. The second target node is used to adjust the candidate node set, and the adjusted candidate node set does not include the second target node.

The second target node may also be called a pruning node. Pruned nodes will not be solved during the subsequent solution of the goal programming problem.

In a possible implementation, the device for selecting nodes and the solver may be deployed separately. In this case, the solver can prune the second target node. For example, the second target node is deleted from the set of candidate nodes. The adjusted set of candidate nodes is used for the next round of iterative calculations.

In another possible implementation, the device for selecting nodes and the solver may be integrated in the solving device. In this case, the solving device may perform pruning processing on the second target node. For example, the second target node is deleted from the set of candidate nodes. The adjusted set of candidate nodes is used for the next round of iterative calculations.
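The adjustment of the candidate node set by expansion and pruning can be sketched as follows. The function `adjust_candidate_set` and the helper `toy_expand` are hypothetical names used only for illustration, and nodes are represented by plain strings. Removing the first target node from the set once it has been expanded is an assumption; the embodiments above only state that its child nodes are added.

```python
from typing import Callable, List, Optional, Sequence


def adjust_candidate_set(
    candidates: Sequence[str],
    expand: Callable[[str], List[str]],
    first_target: Optional[str] = None,
    second_target: Optional[str] = None,
) -> List[str]:
    """Return the candidate node set for the next iteration (illustrative only)."""
    adjusted = list(candidates)
    if second_target is not None:
        # Pruning: the second target node is deleted and is not solved later.
        adjusted.remove(second_target)
    if first_target is not None:
        # Assumption: the expanded node is replaced by its child nodes.
        adjusted.remove(first_target)
        adjusted.extend(expand(first_target))
    return adjusted


def toy_expand(node: str) -> List[str]:
    # Hypothetical stand-in for the branching step performed by the solver.
    return [node + ".0", node + ".1"]


print(adjust_candidate_set(["A", "B", "C"], toy_expand,
                           first_target="A", second_target="C"))
# ['B', 'A.0', 'A.1']
```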

The method for determining the first target node and the second target node is illustratively described below by taking the node evaluation model used to predict changes in the limit values of multiple nodes before and after multi-step expansion as an example. In the process of determining the target node, the difference between the limit value of the node involved after multi-step expansion and the limit value of the parent node is predicted by the node evaluation model.

Optionally, the first target node is the node with the smallest difference between the limit value after multi-step expansion and the limit value of the parent node among the multiple nodes.

In the embodiment of this application, the smallest difference can be understood as the smallest change before and after node expansion.

In other words, the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value of each remaining node among the multiple nodes after multi-step expansion and the limit value of the parent node of that remaining node.

For example, among the multiple nodes, the difference between the limit value of node #1 after multi-step expansion and the limit value of node #1's parent node is the smallest, then node #1 can be used as the first target node.

Alternatively, the first target node belongs to the j nodes, among the multiple nodes, with the smallest differences between the limit value after multi-step expansion and the limit value of the parent node, where j is an integer greater than 1 and less than the number of the multiple nodes.

In other words, the difference between the limit value of each of the j nodes after multi-step expansion and the limit value of its parent node is less than or equal to the difference between the limit value of each remaining node among the multiple nodes after multi-step expansion and the limit value of the parent node of that remaining node.

The first target node can be determined from the j nodes.

For example, the first target node may be randomly determined from the j nodes.

For another example, the first target node may be determined based on the probabilities corresponding to the j nodes. The probability corresponding to each of the j nodes is the probability of that node being determined as the first target node. This probability is negatively correlated with the difference between the limit value of the node after multi-step expansion and the limit value of the node's parent node. That is, among the j nodes, the more obvious the change of a node before and after multi-step expansion, the smaller the probability that the node is determined to be the first target node.
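A minimal sketch of this probabilistic choice is given below. The embodiments only require that the probability be negatively correlated with the predicted difference; the use of a softmax over the negated differences is an assumption, as are the names `expansion_probabilities` and `pick_first_target`.

```python
import math
import random
from typing import Dict, Optional


def expansion_probabilities(predicted_change: Dict[str, float], j: int) -> Dict[str, float]:
    """Probabilities over the j nodes with the smallest predicted change."""
    smallest = sorted(predicted_change, key=lambda n: predicted_change[n])[:j]
    base = predicted_change[smallest[0]]
    # Softmax of the negated change (assumption): a smaller change gives a larger
    # probability. Subtracting the smallest value keeps exp() from overflowing.
    weights = [math.exp(-(predicted_change[n] - base)) for n in smallest]
    total = sum(weights)
    return {n: w / total for n, w in zip(smallest, weights)}


def pick_first_target(predicted_change: Dict[str, float], j: int,
                      rng: Optional[random.Random] = None) -> str:
    """Sample the first target node from the j best candidates."""
    rng = rng or random.Random()
    probs = expansion_probabilities(predicted_change, j)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


changes = {"n1": 0.2, "n2": 1.5, "n3": 0.9, "n4": 3.0}
probs = expansion_probabilities(changes, j=2)
print(sorted(probs))               # ['n1', 'n3']
print(probs["n1"] > probs["n3"])   # True
```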

For example, the second target node is the node with the largest difference between the limit value after multi-step expansion and the limit value of the parent node among the multiple nodes.

In other words, the difference between the limit value of the second target node after multi-step expansion and the limit value of the parent node of the second target node is greater than or equal to the difference between the limit value of each remaining node among the multiple nodes after multi-step expansion and the limit value of the parent node of that remaining node.

Optionally, the second target node belongs to the k nodes, among the multiple nodes, with the largest differences between the limit value after multi-step expansion and the limit value of the parent node, where k is an integer greater than 1 and less than the number of the multiple nodes.

In other words, the difference between the limit value of each of the k nodes after multi-step expansion and the limit value of its parent node is greater than or equal to the difference between the limit value of each remaining node among the multiple nodes after multi-step expansion and the limit value of the parent node of that remaining node.

The second target node can be determined from the k nodes.

For example, the second target node may be randomly determined from the k nodes.

Optionally, the second target node is determined based on the probabilities corresponding to the k nodes. The probability corresponding to each of the k nodes is positively correlated with the difference between the limit value of the node after multi-step expansion and the limit value of the node's parent node.

In other words, among the k nodes, the less obvious the change of the node before and after multi-step expansion, the smaller the probability that the node is determined to be the second target node.

After a node is pruned, the node is no longer solved during the solution of the goal programming problem, so the pruning operation may remove the node that contains the optimal solution. Determining the second target node probabilistically, in the greedy manner described above, is beneficial to reducing the risk of the pruning operation.
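The probabilistic choice of the pruning node can be sketched in the same spirit. The embodiments only require that the probability be positively correlated with the predicted difference; the softmax weighting and the names `pruning_probabilities` and `pick_second_target` are assumptions made for illustration.

```python
import math
import random
from typing import Dict, Optional


def pruning_probabilities(predicted_change: Dict[str, float], k: int) -> Dict[str, float]:
    """Probabilities over the k nodes with the largest predicted change."""
    largest = sorted(predicted_change, key=lambda n: predicted_change[n], reverse=True)[:k]
    top = predicted_change[largest[0]]
    # Softmax of the change (assumption): a larger change gives a larger
    # probability. Subtracting the largest value keeps exp() from overflowing.
    weights = [math.exp(predicted_change[n] - top) for n in largest]
    total = sum(weights)
    return {n: w / total for n, w in zip(largest, weights)}


def pick_second_target(predicted_change: Dict[str, float], k: int,
                       rng: Optional[random.Random] = None) -> str:
    """Sample the node to prune from the k worst candidates."""
    rng = rng or random.Random()
    probs = pruning_probabilities(predicted_change, k)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


prune_changes = {"n1": 0.2, "n2": 1.5, "n3": 0.9, "n4": 3.0}
prune_probs = pruning_probabilities(prune_changes, k=2)
print(sorted(prune_probs))                     # ['n2', 'n4']
print(prune_probs["n4"] > prune_probs["n2"])   # True
```

Because the node to prune is sampled rather than fixed, a node with a large predicted change is likely, but not certain, to be removed in a given iteration.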

The method for determining the first target node and the second target node is illustrated below by taking the node evaluation model to predict the limit values of multiple nodes after multi-step expansion as an example. The limit values of the nodes involved in the process of determining the target node after multi-step expansion are all predicted by the node evaluation model. For the convenience of description, the following takes the minimum value optimization problem as an example for illustrative explanation.

Optionally, the first target node is the node with the smallest limit value after multi-step expansion among the multiple nodes.

In other words, the limit value of the first target node after multi-step expansion is less than or equal to the limit values of the remaining nodes among the multiple nodes after multi-step expansion.

For example, among the multiple nodes, node #1 has the smallest limit value after multi-step expansion, then node #1 can be used as the first target node.

Alternatively, the first target node belongs to the j nodes, among the multiple nodes, with the smallest limit values after multi-step expansion, where j is an integer greater than 1 and less than the number of the multiple nodes.

In other words, the limit values of the j nodes after multi-step expansion are less than or equal to the limit values of the remaining nodes among the plurality of nodes after multi-step expansion.

The first target node can be determined from the j nodes.

For example, the first target node may be randomly determined from the j nodes.

For another example, the first target node may be determined based on the probabilities corresponding to the j nodes. The probability corresponding to the j nodes is the probability of being determined as the first target node. The probabilities corresponding to the j nodes are negatively correlated with the limit values of the j nodes after multi-step expansion. That is, among the j nodes, the smaller the limit value of the node after multi-step expansion, the greater the probability that the node is determined to be the first target node.

For example, the second target node is the node with the largest limit value after multi-step expansion among the multiple nodes.

In other words, the limit value of the second target node after multi-step expansion is greater than or equal to the limit value of the remaining nodes among the plurality of nodes after multi-step expansion.

Optionally, the second target node belongs to the k nodes, among the multiple nodes, with the largest limit values after multi-step expansion, where k is an integer greater than 1 and less than the number of the multiple nodes.

In other words, the limit values of the k nodes after multi-step expansion are greater than or equal to the limit values of the remaining nodes among the plurality of nodes after multi-step expansion.

The second target node can be determined from the k nodes.

Optionally, the second target node is determined based on the probabilities corresponding to the k nodes. The probabilities corresponding to the k nodes are positively correlated with the limit values of the k nodes after multi-step expansion.

In other words, among the k nodes, the greater the limit value of the node after multi-step expansion, the greater the probability that the node is determined to be the second target node.

It should be understood that the above are only examples, and the first target node and the second target node can also be determined in other ways, which are not limited in this embodiment of the present application.
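For the minimum value optimization problem, the deterministic variant described above reduces to taking the node with the smallest predicted lower bound as the first target node and the node with the largest predicted lower bound as the second target node. The following sketch assumes that the predictions are available as a dictionary; the function name `select_targets` and the node names are hypothetical.

```python
from typing import Dict, Tuple


def select_targets(predicted_lower_bound: Dict[str, float]) -> Tuple[str, str]:
    """Return (first target node, second target node) for a minimization problem."""
    # Smallest predicted lower bound after multi-step expansion: node to expand.
    first_target = min(predicted_lower_bound, key=lambda n: predicted_lower_bound[n])
    # Largest predicted lower bound after multi-step expansion: node to prune.
    second_target = max(predicted_lower_bound, key=lambda n: predicted_lower_bound[n])
    return first_target, second_target


print(select_targets({"node#1": 3.2, "node#2": 7.9, "node#3": 5.0}))
# ('node#1', 'node#2')
```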

Optionally, the method 200 may also include: sending indication information of the target node to the solver.

In the case where the device for selecting nodes and the solver are deployed separately, the device for selecting nodes may send the indication information of the target node to the solver. The solver can solve the goal programming problem based on the target node.

For example, the indication information of the target node may include the target node itself.

For example, the device for selecting nodes may determine the target node based on the output result of the node evaluation model and send the target node to the solver.

For example, the indication information of the target node may include evaluation information of some or all nodes in the plurality of nodes.

For example, the device for selecting nodes may send evaluation information of some or all nodes to the solver. The solver can determine the target node based on the evaluation information of some or all nodes.

For example, the indication information of the target node may include the search order of the multiple nodes.

For example, the node ranked first can be the search node in the next iteration.

Alternatively, the indication information of the target node may also include other information related to the output results of the node evaluation model, as long as the solver can determine the evaluation information of the node based on this information, and then determine the target node.
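When the indication information takes the form of a search order, it could be derived from the evaluation information by sorting, as in the sketch below. The function name `search_order` is hypothetical, and ranking nodes with a smaller evaluation value earlier is an assumption that matches the minimum value optimization example used above.

```python
from typing import Dict, List


def search_order(evaluation: Dict[str, float]) -> List[str]:
    """Order nodes for searching (illustrative only)."""
    # Assumption: a smaller evaluation value means the node is searched earlier.
    return sorted(evaluation, key=lambda n: evaluation[n])


order = search_order({"n1": 2.0, "n2": 0.5, "n3": 1.0})
print(order)     # ['n2', 'n3', 'n1']
print(order[0])  # n2, the search node in the next iteration
```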

In the embodiment of the present application, the node evaluation model can predict the correlation quantity of the node's limit value before and after multi-step expansion, which is beneficial to predicting the optimal solution that can be searched from the multiple nodes. The correlation quantity can be used to measure the long-term value of node expansion, which makes the selection of target nodes more accurate. This is conducive to selecting appropriate nodes for corresponding processing, so that the nodes in the adjusted candidate node set are more likely to yield the optimal solution, thereby improving the solution efficiency.

In addition, taking the goal programming problem as a minimum value optimization problem as an example, if the limit value is a lower bound value, during the solution process, as the number of iterations increases, the lower bound values of the multiple nodes after multi-step expansion may be very small. Limited by computer calculation accuracy and other factors, it is difficult to compare the lower bound values of multiple nodes after multi-step expansion. The differences between these multiple nodes before and after multi-step expansion are more obvious. In the embodiment of the present application, the target node can be determined by predicting the differences between the multiple nodes before and after the multi-step expansion, and then comparing the differences between the multiple nodes before and after the multi-step expansion, which is beneficial to improving the accuracy of target node selection.

Optionally, the input of the node evaluation model may include a low-dimensional representation of the relevant information of the multiple nodes. The low-dimensional representation of the relevant information of the multiple nodes is obtained by performing dimensionality reduction processing on the relevant information of the multiple nodes through the feature extraction model.

In the embodiment of the present application, the relevant information of the node can be input into the feature extraction model for dimensionality reduction processing, that is, feature extraction, and the processing results can be input into the node evaluation model. The low-dimensional representation of the relevant information of a node is the result of dimensionality reduction processing of the relevant information of the node.

The low-dimensional representation of the relevant information of the node includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraint conditions, or a low-dimensional representation of the node's decision variable.

For example, the objective function of a node is dimensionally reduced according to the feature extraction model to obtain a low-dimensional representation of the objective function. For another example, the constraint conditions corresponding to the nodes are dimensionally reduced according to the feature extraction model to obtain a low-dimensional representation of the constraint conditions. For another example, the decision variables of nodes are dimensionally reduced according to the feature extraction model to obtain a low-dimensional representation of the decision variables.

For example, by inputting the low-dimensional representation of the node's objective function, the low-dimensional representation of the node's constraints, and the low-dimensional representation of the node's decision variables into the node evaluation model, the correlation quantity of the limit value of the node after multi-step expansion can be obtained.

Reducing the dimensionality of the relevant information of the node is beneficial to the inference of the downstream module, that is, to the inference of the node evaluation model.

In a possible implementation, the device for selecting nodes and the solver may be deployed separately. In this case, the feature extraction model can be deployed in the solver, and the solver determines a low-dimensional representation of the relevant information of the node according to the feature extraction model, and sends it to the device for selecting the node. Alternatively, the feature extraction model may be deployed in a device for selecting nodes, and the device for selecting nodes determines a low-dimensional representation of the relevant information of the node according to the feature extraction model. Alternatively, the feature extraction model can also be deployed in other devices, which is not limited in the embodiments of the present application.

In another possible implementation, the device for selecting nodes and the solver may be integrated in the solving device. In this case, the low-dimensional representation of the relevant information of the node may be determined by the solving device according to the feature extraction model. Alternatively, other devices may determine a low-dimensional representation of the relevant information of the node based on the feature extraction model and send it to the solving device.

Optionally, the feature extraction model can be a graph convolutional neural network model.

It should be understood that the above are only examples, and the feature extraction model can also use models with other structures, as long as the dimensionality reduction process can be achieved, and the embodiments of the present application do not limit this.

In the embodiments of the present application, a low-dimensional representation of the relevant information of nodes can be obtained through the graph convolutional neural network, which can process the relevant information of nodes of different sizes, or in other words, can process the mathematical programming models at nodes of different sizes. Moreover, graph convolutional neural networks are insensitive to the order of inputs.
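The two properties mentioned above, handling inputs of different sizes and insensitivity to input order, can be illustrated with a deliberately simplified sketch. It performs a single neighbour-averaging step followed by mean pooling and omits the learned weight matrices of a real graph convolutional neural network. The function name `graph_conv_embedding`, the toy graph, and its feature values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def graph_conv_embedding(features: Dict[str, Vector],
                         edges: Sequence[Tuple[str, str]]) -> Vector:
    """One mean-aggregation step followed by mean pooling (illustrative only)."""
    neighbours: Dict[str, List[str]] = {name: [] for name in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    dim = len(next(iter(features.values())))
    hidden: List[Vector] = []
    for name, own in features.items():
        group = [own] + [features[n] for n in neighbours[name]]
        # Average the node's own features with those of its neighbours, then ReLU.
        averaged = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        hidden.append([max(0.0, x) for x in averaged])

    # Mean pooling yields a vector of fixed length regardless of graph size.
    return [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]


features = {"x1": [1.0, 0.0], "x2": [0.0, 2.0], "c1": [3.0, 1.0]}
a = graph_conv_embedding(features, [("x1", "c1"), ("x2", "c1")])

# The same graph with nodes and edges listed in a different order.
shuffled = {"c1": [3.0, 1.0], "x2": [0.0, 2.0], "x1": [1.0, 0.0]}
b = graph_conv_embedding(shuffled, [("c1", "x2"), ("c1", "x1")])

print(len(a))                                         # 2
print(all(abs(p - q) < 1e-9 for p, q in zip(a, b)))   # True
```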

Optionally, the feature extraction model can be trained. For the specific training process, please refer to the description below.

The feature extraction model can be trained by the device where the feature extraction model is located. Alternatively, it can also be trained by other devices. The embodiments of the present application do not limit this.

For example, the feature extraction model is deployed in a device that selects nodes. The feature extraction model can be trained by the device that selects nodes, or can be trained by other devices.

Optionally, the method 200 may also include: returning at least one of the following to the user: the solution result of the target planning model or the indication information of the target node.

The embodiment of the present application provides a training method for a node evaluation model, which can be used to train a node evaluation model. The trained node evaluation model can be applied in the method 200 shown in Figure 2.

The training method of the node evaluation model in the embodiment of the present application will be described below with reference to Figure 3.

Figure 3 shows a schematic flow chart of a node evaluation model training method provided by an embodiment of the present application. The method 300 shown in FIG. 3 may be executed by a training device of a node evaluation model. After the training device completes training, the obtained node evaluation model can be deployed in a device that selects nodes. The training device for the node evaluation model and the device for selecting nodes may be the same device, or they may be different devices.

The node evaluation model is used to predict the correlation quantity of the bound value of each node in the candidate node set of the goal planning problem after multi-step expansion. Each node corresponds to a sub-problem to be solved in the goal programming problem. The output of the node evaluation model is used to adjust the set of candidate nodes. The adjusted candidate node set is used to solve the goal planning problem.

As shown in FIG. 3 , the method 300 includes steps 310 to 330 . Steps 310 to 330 are described below.

310. Obtain sample nodes.

320. Obtain the label corresponding to the sample node. The label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.

330. Perform training based on the sample node and the label corresponding to the sample node to obtain a node evaluation model.

In other words, the sample nodes and the labels corresponding to the sample nodes are used as training data for the node evaluation model. The initial model of the node evaluation model is trained based on the training data, and the trained node evaluation model can be used as the node evaluation model used in method 200. The initial model of the node evaluation model may also be called the initial node evaluation model.

Specifically, the parameters of the initial node evaluation model are adjusted with the goal of reducing the difference between the output of the initial node evaluation model and the labels corresponding to the sample nodes to obtain a trained node evaluation model.
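The parameter adjustment described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the initial node evaluation model is a linear model, that the difference is measured by squared error, and that the parameters are adjusted by stochastic gradient descent; the names `predict`, `train_evaluator` and `mean_squared_error` are hypothetical and do not come from the embodiments.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Output of the (assumed linear) node evaluation model for one sample node."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_evaluator(
    samples: List[List[float]],
    labels: List[float],
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Adjust parameters to reduce the gap between model output and label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # output minus label
            # Gradient step on 0.5 * err**2 (assumed loss).
            for i in range(len(weights)):
                weights[i] -= lr * err * x[i]
            bias -= lr * err
    return weights, bias


def mean_squared_error(
    weights: Sequence[float],
    bias: float,
    samples: List[List[float]],
    labels: List[float],
) -> float:
    """Average squared difference between model outputs and labels."""
    errors = [(predict(weights, bias, x) - y) ** 2 for x, y in zip(samples, labels)]
    return sum(errors) / len(errors)
```

In practice the model may be a neural network and the optimizer may differ; the sketch only shows the direction of the update, namely toward a smaller difference between output and label.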

For example, the limit value of the sample node after multi-step expansion may be the limit value of the sample node after it is completely solved.

The number of steps required for different sample nodes to be expanded to complete solution may be the same or different.

For example, the label corresponding to the sample node is used to indicate the limit value of the sample node after multi-step expansion.

For example, the label corresponding to the sample node can be used to indicate the limit value of the sample node after it is completely solved.

For example, the label corresponding to the sample node may be used to indicate the difference between the limit value of the sample node after it is completely solved and the limit value of the parent node of the sample node.

For example, the sample node and the label corresponding to the sample node may be data in the training database. The sample nodes and the labels corresponding to the sample nodes can be pre-generated according to the solver and stored in the training database.

In the process of solving the planning problem, the solver can generate multiple nodes and solve the limit values of the multiple nodes. These multiple nodes can be used as sample nodes. Based on the limit value of each node during the solution process, the quantity related to the limit value of the node after multi-step expansion can be determined, that is, the label corresponding to the sample node can be obtained. Training data can be determined based on the solution of one or more planning problems. The one or more planning problems may be provided by the user or may be pre-stored.

As an example, the solver can receive batch data provided by the user (e.g., multiple planning problems) and perform solving based on that batch data. Sample nodes are sampled from the multiple nodes generated during the solution process, the labels corresponding to the sample nodes are determined based on the limit values of the nodes solved during the solution process, and the relevant information of the sample nodes and the labels corresponding to the sample nodes are stored in the training database. Alternatively, the solver can receive batch data provided by the user (e.g., multiple planning problems) and perform solving based on both the provided batch data and historical data (e.g., multiple pre-stored planning problems). Sample nodes are sampled from the multiple nodes generated during the solution process, the labels corresponding to the sample nodes are determined based on the limit values of the nodes solved during the solution process, and the relevant information of the sample nodes and the labels corresponding to the sample nodes are stored in the training database.
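The label generation described above can be sketched as follows. The sketch assumes a minimization problem, assumes that the solver exposes its search tree as a simple trace, and assumes that the limit value of a node after it is completely solved equals the smallest limit value among the leaves of its subtree. The label is taken as the difference between that value and the limit value of the parent node. `TraceNode`, `solved_bound` and `build_labels` are hypothetical names, not a solver interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TraceNode:
    node_id: int
    parent_id: Optional[int]
    bound: float  # lower bound computed by the solver at this node
    children: List[int] = field(default_factory=list)


def solved_bound(trace: Dict[int, TraceNode], node_id: int) -> float:
    """Limit value of a node once its whole subtree has been solved (assumption:
    the minimum bound over the leaves of the subtree)."""
    node = trace[node_id]
    if not node.children:
        return node.bound
    return min(solved_bound(trace, child) for child in node.children)


def build_labels(trace: Dict[int, TraceNode]) -> Dict[int, float]:
    """Label = fully solved limit value of the node minus the parent's limit value."""
    labels: Dict[int, float] = {}
    for node_id, node in trace.items():
        if node.parent_id is None:
            continue  # the root has no parent, so no label is produced for it
        labels[node_id] = solved_bound(trace, node_id) - trace[node.parent_id].bound
    return labels
```

The resulting pairs of node information and label are what would be written to the training database.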

The training device can obtain training data from the training database, use the relevant information of the sample nodes as the input of the initial model corresponding to the node evaluation model, and train the initial model with the goal of reducing the gap between the output of the model and the labels corresponding to the sample nodes, thereby obtaining the node evaluation model.

For example, the sample node and the label corresponding to the sample node may be provided by the user.

Alternatively, the sample node and the label corresponding to the sample node can also be obtained through other methods.

Optionally, step 330 may include: training through reinforcement learning to obtain a trained node evaluation model.

In this case, the label corresponding to the sample node can be obtained through interaction with the environment during the reinforcement learning process. Sample nodes can come from the training database, or sample nodes can be provided by the user.

In a possible implementation, the training device and the solver of the node evaluation model may be deployed separately, in which case the environment may be the solver.

In other words, the solver can be encapsulated into an environment in reinforcement learning, and the training device continuously interacts with the solver to collect data to obtain the labels of sample nodes.

For example, training is performed through deep Q learning to obtain a node evaluation model.

Optionally, the label corresponding to the sample node may be determined based on the first difference and the second difference. The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing. The target evaluation model and the node evaluation model have the same structure. The target evaluation model is used to predict the difference between the limit value of the child node of the sample node after it is completely solved and the limit value of the sample node.

The target evaluation model is the target network in deep Q learning.

The first difference can be determined by the solver.

For example, the solver may send the limit value of the parent node of the sample node and the limit value of the sample node to the training device. The training device can determine the first difference based on this.

Alternatively, the solver may determine the first difference according to the limit value of the parent node of the sample node and the limit value of the sample node, and send the first difference to the training device.

The node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function.

The training goal of the model is to enable the node evaluation model to learn the function value of an accurate multi-step pseudo-cost function. In the process of deep Q learning, the label corresponding to the sample node can also be called the prediction label corresponding to the sample node.

For example, the prediction label corresponding to the node can satisfy the following formula:

The formula defines the prediction label corresponding to the sample node P. c(P) is the first difference, that is, the difference between the limit value of the parent node of node P and the limit value of node P, which can be obtained from the solver. For example, the solver can calculate the limit value of node P and the limit value of the parent node of node P, so that the training device can obtain the first difference. The other term is the second difference, which is computed by the target evaluation model. The target evaluation model is used to stabilize training and prevent overfitting.

During the training process, the model parameters are adjusted with the goal of reducing the difference between the prediction label corresponding to the sample node and the model's output C_θ(P). C_θ represents the model during training.

For example, the training goal can be expressed as:

Here, E denotes the expectation (average value), θ denotes the parameters of the model, and π denotes the learned policy, that is, the multi-step pseudo-cost function.

It should be understood that the above are only examples, and other labels can also be used for training with deep Q learning, which is not limited in the embodiments of the present application.
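One possible way to combine the first difference and the second difference into a prediction label is sketched below. The combination rule is an assumption made for illustration: the label is taken as the first difference plus the smallest target-model prediction over the child nodes, a node without child nodes contributes no second difference, and the training error is the squared gap to the model output. The names `td_label` and `squared_td_error` are hypothetical.

```python
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def td_label(
    first_difference: float,
    child_features: Sequence[Sequence[float]],
    target_model: Model,
) -> float:
    """Prediction label built from c(P) and the target evaluation model."""
    if not child_features:
        # Assumption: a node with no child nodes has no remaining change.
        return first_difference
    # Assumption: aggregate the child predictions with min (minimization problem).
    second_difference = min(target_model(f) for f in child_features)
    return first_difference + second_difference


def squared_td_error(model: Model, node_features: Sequence[float], label: float) -> float:
    """Squared gap between the model output C_theta(P) and the prediction label."""
    return (model(node_features) - label) ** 2
```

As in standard deep Q learning, the target model would be a periodically synchronized copy of the model being trained, which keeps the labels from shifting at every update.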

Optionally, the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxation solution of the sample node after multi-step expansion.

Taking the goal programming problem as a minimum optimization problem as an example, the function value of the objective function corresponding to the relaxation solution of a node can be the lower bound of the node. In other words, the limit value of the sample node after multi-step expansion can be the lower bound value of the sample node after multi-step expansion. That is, the node evaluation model can be used to predict quantities related to the lower bound of a node after multi-step expansion.

The input type and output type of the model during the training process are consistent with the input type and output type of the trained node evaluation model.

Optionally, the input of the initial node evaluation model includes relevant information of the sample nodes or a low-dimensional representation of the relevant information of the sample nodes.

The relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraint condition of the sample node or the decision variable of the sample node.

For example, the input to the initial node evaluation model includes information about sample nodes. In this case, the input of the node evaluation model may include relevant information of the node. The relevant information of the node includes at least one of the following: the objective function of the node, the constraint condition of the node or the decision variable of the node.

For another example, the input of the initial node evaluation model may include a low-dimensional representation of the relevant information of the sample node. In this case, the input of the node evaluation model may include a low-dimensional representation of the relevant information of the node, and the output of the node evaluation model may include the relevant quantity of the limit value of the node after multi-step expansion. The low-dimensional representation of the relevant information of the node includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraint conditions, or a low-dimensional representation of the node's decision variable.

Among them, the low-dimensional representation of the relevant information of the sample node can be obtained by performing dimensionality reduction processing on the relevant information of the sample node through a feature extraction model.

The relevant information of the sample nodes is input into the feature extraction model for dimensionality reduction processing, and the results of the dimensionality reduction processing are input into the initial model corresponding to the node evaluation model.

The feature extraction model can be a trained model or a model in the training process.

For example, the relevant information of the sample node is input into the initial feature extraction model for dimensionality reduction processing, and the results of the dimensionality reduction processing are input into the initial node evaluation model for processing to predict the limit value of the sample node after multi-step expansion. The two models are trained with the goal of reducing the gap between the output results of the initial node evaluation model and the labels corresponding to the sample nodes. After the training is completed, the trained node evaluation model and the trained feature extraction model are obtained. The initial feature extraction model is the initial model corresponding to the feature extraction model.

It should be understood that the above are only examples, and the feature extraction model can also be trained in other ways, which is not limited in the embodiments of the present application.

Any AI model needs to be trained before it can be used to solve specific technical problems. AI model training refers to using a specified initial model to perform calculations on the training data, and using a certain method to adjust the parameters in the initial model based on the calculation results, so that the model gradually learns certain rules and acquires specific functions. The AI model with stable functions after training can be used for inference. The inference of the AI model is the process of using the trained AI model to perform calculations on the input data and obtain the predicted inference results.

The solution of the embodiment of this application can be divided into two stages: the training stage and the inference stage.

In the training phase, the initial node evaluation model can be trained to obtain the node evaluation model. In the inference phase, illustratively, the node evaluation model can be mounted to a solver so that search nodes and pruning nodes are determined during the solution process.

The following is an exemplary description of a method for solving a goal programming model provided by the embodiment of the present application with reference to Figure 4 .

It should be understood that the example in Figure 4 is only to help those skilled in the art understand the embodiments of the present application, but is not intended to limit the embodiments of the present application to the specific numerical values or specific scenarios illustrated. Figure 4 shows the solution method of the goal programming model provided by the embodiment of the present application. The method 400 shown in Figure 4 can use the method 200 shown in Figure 2 to implement node selection. For related descriptions, please refer to the method 200. In order to avoid repetition, part of the description is appropriately omitted when describing the method 400.

In order to facilitate understanding and description, when describing the method 400, the solver and the device for selecting nodes are deployed separately as an example, which does not limit the embodiments of the present application. In other implementations, the solver and the device for selecting nodes may be integrated in the same device.

As shown in Figure 4, the method 400 includes steps 410 to 430, which are described below.

Step 410: Obtain the goal programming problem.

Step 420: Adjust the candidate node set of the goal programming problem according to the node evaluation model. The candidate node set includes multiple nodes. Each of the multiple nodes corresponds to a sub-problem to be solved of the goal programming problem. The node evaluation model is used to predict the correlation quantities of the bound values of nodes after multi-step expansion.

Step 430: Solve the goal programming problem based on the adjusted candidate node set to obtain the solution result of the goal programming problem.

For example, the goal programming problem can be uploaded by a user. The user can input the goal programming problem into the solver, and step 410 then consists of obtaining the goal programming problem uploaded by the user. The goal programming problem is the mathematical programming problem that the user needs to solve.

The goal programming problem can be represented by the objective function, constraints and decision variables of the goal programming problem. Constraints are used to constrain decision variables. At least some of the decision variables of the goal programming model are integer variables, that is, at least some of the values of the decision variables are integers. In other words, the goal programming model is a pure integer programming model or a mixed integer programming model.

The solver can generate multiple sub-problems to be solved for the goal programming problem, and the multiple sub-problems to be solved can be used as multiple nodes in the candidate node set. The solver can be implemented based on the branch-and-bound algorithm framework.

For example, the solver can generate multiple constraints on the decision variables based on the goal programming problem, and add the multiple constraints to the constraints corresponding to the goal programming problem, thereby forming multiple sub-problems of the goal programming problem, that is, obtaining multiple branches of the goal programming problem.

It should be understood that the constraints corresponding to the goal programming problem are constraints on the solution space of the goal programming problem. The additional constraints generated in the process of generating sub-problems of the goal programming problem are used to constrain the decision variables in the branch, thereby narrowing the scope of the solution space on the branch.
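The generation of sub-problems by adding constraints can be illustrated with a minimal sketch of variable branching, which is one common way of doing this in a branch-and-bound framework and is assumed here for illustration. A sub-problem is represented only by the bounds of its decision variables; the function name `branch` and this representation are hypothetical.

```python
import math
from typing import Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]  # variable name -> (lower, upper)


def branch(bounds: Bounds, var: str, relaxed_value: float) -> Tuple[Bounds, Bounds]:
    """Split a sub-problem on a variable whose relaxed value is fractional.

    The left branch adds the constraint var <= floor(value);
    the right branch adds the constraint var >= ceil(value).
    """
    lower, upper = bounds[var]
    left = dict(bounds)
    left[var] = (lower, min(upper, math.floor(relaxed_value)))
    right = dict(bounds)
    right[var] = (max(lower, math.ceil(relaxed_value)), upper)
    return left, right
```

Each returned sub-problem covers a smaller part of the solution space than its parent, and together the two branches still contain every integer-feasible value of the branched variable.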

In step 420, a node evaluation model may be used to predict the correlation quantity of the limit value, after multi-step expansion, of some or all nodes in the candidate node set.

For example, step 420 may include: predicting the correlation amount of the limit value of each node in the candidate node set after multi-step expansion through a node evaluation model.

The output results of the node evaluation model can be used as node evaluation information. In other words, the node evaluation model can be used to output node evaluation information. The evaluation information of a node is related to the node's limit value after multi-step expansion. For example, the evaluation information of the node is used to indicate the node evaluation model's prediction of the correlation quantity of the node's limit value after multi-step expansion.

The node evaluation model may be included in the device for selecting nodes. The device for selecting nodes may generate evaluation information of the plurality of nodes through a node evaluation model.

Optionally, the method 400 further includes: determining the node evaluation model according to user instructions.

For example, the user can select a node evaluation model from multiple node evaluation models. The node evaluation model selected by the user may be used as the node evaluation model in method 400 .

For another example, the user can select one device for selecting nodes from multiple devices for selecting nodes. Each device for selecting nodes may correspond to a node evaluation model. The node evaluation model deployed in the device for selecting nodes indicated by the user is the node evaluation model in method 400.

Alternatively, the node evaluation model may also be determined by the device for selecting nodes.

Alternatively, the node evaluation model may also be determined by the solver.

Alternatively, the node evaluation model can also be determined in other ways. For example, the node evaluation model can also be a default model.

For example, the evaluation information of a node is used to indicate changes in the node's limit value before and after multi-step expansion. In other words, the evaluation information of the node is used to indicate the prediction of the change of the node's limit value before and after multi-step expansion.

For example, the change in the limit value of a node before and after multi-step expansion can be represented by the function value of the node's multi-step pseudo cost function. In other words, the function value of the node's multi-step pseudo-cost function can be used to evaluate the node. The node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function.

Taking the goal programming problem as a minimization problem as an example, the limit value can be a lower bound value. The function value of the multi-step pseudo-cost function is the change in the lower bound value from the time the node is expanded until the node is completely solved.

The purpose of defining a multi-step pseudo-cost function is that if the multi-step pseudo-cost function can be accurately calculated or learned, the optimal solution that can ultimately be found from a node can be accurately predicted. In that case, the node containing the global optimal solution can be selected based on the function value of the multi-step pseudo-cost function.

The node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function. In other words, the node evaluation model is used to fit the multi-step pseudo-cost function of the node. In other words, the training goal of the node evaluation model can be to learn a multi-step pseudo-cost function to accurately predict the function value of the multi-step pseudo-cost function of the node. For the training process, please refer to method 300 or method 800.

It should be understood that for the convenience of description, only the multi-step pseudo cost function in the above form is used as an example in method 400. This does not limit the solutions of the embodiments of the present application. For related description, please refer to method 200 and will not be described again here.

In a possible implementation, the input of the node evaluation model includes node-related information. The relevant information of the node includes at least one of the following: the objective function of the node, the constraint condition of the node or the decision variable of the node.

In another possible implementation, the input of the node evaluation model includes a low-dimensional representation of the relevant information of the node. The low-dimensional representation of the relevant information of the node includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraint conditions, or a low-dimensional representation of the node's decision variable.

The low-dimensional representation of the relevant information of the node is obtained by reducing the dimensionality of the relevant information of the node through the feature extraction model.

Exemplarily, the objective function of the node, the constraint conditions of the node, and the decision variables of the node are input to the feature extraction model to obtain a low-dimensional representation of the relevant information of the node. The low-dimensional representation of the relevant information of the node is then input to the node evaluation model, so that the correlation quantity of the node's limit value after multi-step expansion can be predicted.

The following is an illustrative explanation of the data processing in the feature extraction model and the node evaluation model. The feature extraction model and the node evaluation model can be trained. For the training process, refer to method 300 above or method 800 below.

1. Feature extraction model

The feature extraction model can be deployed in a device for selecting nodes, or in a solver, or in other devices. The embodiments of the present application do not limit this.

The feature extraction model is used to output a low-dimensional representation of the relevant information of the multiple nodes. By reducing the dimensionality of the relevant information of the nodes, it is beneficial to the reasoning of the downstream modules, that is, it is beneficial to the reasoning of the node evaluation model.

For example, the relevant information of the node may be the high-dimensional mathematical programming model information of the node. The node's high-dimensional mathematical programming model information is embedded and represented (embedding), that is, the node's objective function, the node's constraint conditions and the node's decision variables are dimensionally reduced.

In other words, the feature extraction model is used to output the features of nodes. For example, the feature can be represented as a set of vectors.

The relevant information of the node after dimensionality reduction can also be called the low-dimensional embedding representation of the node.

As an example, the input of the feature extraction model is the mathematical programming model information including the node, for example, the objective function of the node, the constraint condition of the node and the decision variable of the node, and the output is the feature of the node. For example, the feature extraction model may be implemented through a graph convolutional neural network. The graph convolutional neural network can be used to embed and represent high-dimensional mathematical programming model information.

The specific implementation process of dimensionality reduction processing will be exemplified below with reference to Figure 5.

Figure 5 shows an exemplary flow chart of a dimensionality reduction process.

Step 1: Convert the mathematical programming model information (A, b, C) of the node into a bipartite graph representation, that is, fill (A, b, C) according to the connection relationship.

Among them, A represents the coefficient matrix of the constraint condition, b represents the coefficient vector of the right-hand term of the constraint condition, and C represents the coefficient vector of the objective function.

For example, the mathematical programming model of the node in Figure 5 can satisfy the following formula:

s.t. A₁₁x₁ + A₁₃x₃ ≤ b₁ ;
A₁₂x₁ + A₂₂x₂ ≤ b₂ ;
x ∈ Z

Here, the objective function (objective) is a linear function of the decision variables x₁, x₂ and x₃ with coefficients d₁, d₂ and d₃. The constraints include A₁₁x₁ + A₁₃x₃ ≤ b₁ and A₁₂x₁ + A₂₂x₂ ≤ b₂, and the decision variables include x₁, x₂ and x₃. d₁, d₂, d₃, A₁₁, A₁₃, b₁, A₁₂, A₂₂ and b₂ are all parameters.

Step 2: Input the above-mentioned bipartite graph connection relationship into the graph convolutional neural network, and embed the objective function of the node, the constraints of the node and the decision variable of the node.

As shown in Figure 5, V represents the decision variables, C represents the constraint conditions, and E represents the connections between V and C, that is, the coefficient matrix A of the constraint conditions. V¹ represents the decision variables after one pass of graph convolutional neural network processing, and V² represents the decision variables after two passes. C¹ represents the constraints after one pass of graph convolutional neural network processing, and C² represents the constraints after two passes. π(x) represents the output result for this node, which is the low-dimensional embedding representation in step 3.

It should be understood that the processing process shown in FIG. 5 is only an example and does not limit the solutions of the embodiments of the present application.

Step 3: Output the low-dimensional embedding representation of the node.

The low-dimensional embedding representation of a node (eg, the node embedding in Figure 5) includes a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraints, and a low-dimensional representation of the node's decision variables.

Taking the mathematical programming model in step 1 above as an example, the low-dimensional embedding representation of the node can include a low-dimensional representation of the objective function, a low-dimensional representation of the two constraints and a low-dimensional representation of the three decision variables.

In the embodiments of the present application, through the above-mentioned use of graph convolutional neural network for embedded representation of information, mathematical programming models of different sizes can be processed, while being insensitive to the arrangement of inputs.
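Steps 1 to 3 above can be sketched in a heavily simplified form. The sketch builds the bipartite graph from (A, b, C), performs one message-passing round from variables to constraints and back, and pools the result. It is not a trained graph convolutional neural network: the aggregation uses no learned weights, each vertex carries a single scalar feature, and the names `build_bipartite`, `convolve` and `pool` are hypothetical. It only illustrates why such a representation can handle models of different sizes and does not depend on the order of the inputs.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (constraint index, variable index, coefficient)


def build_bipartite(A: List[List[float]], b: List[float], c: List[float]):
    """Step 1: variable vertices carry c_j, constraint vertices carry b_i,
    and each nonzero A[i][j] becomes an edge."""
    variable_features = [[cj] for cj in c]
    constraint_features = [[bi] for bi in b]
    edges: List[Edge] = [
        (i, j, A[i][j])
        for i in range(len(A))
        for j in range(len(c))
        if A[i][j] != 0
    ]
    return variable_features, constraint_features, edges


def convolve(variable_features, constraint_features, edges):
    """Step 2 (simplified): one round V -> C, then C -> V, without learned weights."""
    new_c = [list(f) for f in constraint_features]
    for i, j, a in edges:
        new_c[i][0] += a * variable_features[j][0]
    new_v = [list(f) for f in variable_features]
    for i, j, a in edges:
        new_v[j][0] += a * new_c[i][0]
    return new_v, new_c


def pool(new_v, new_c):
    """Step 3 (simplified): sum pooling, which ignores vertex order and count."""
    return (sum(f[0] for f in new_v), sum(f[0] for f in new_c))
```

Because edges are enumerated from the nonzero entries of A and the pooling is a sum, permuting the columns of A together with the entries of C leaves the pooled output unchanged.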

2. Node evaluation model

The node evaluation model can be used to predict the relevant quantities of the bounding values of nodes after multi-step expansion.

Exemplarily, the low-dimensional representation of the correlation information of the multiple nodes output by the feature extraction model is input into the node evaluation model to predict the correlation amount of the limit values of the multiple nodes after multi-step expansion.

For example, the node evaluation model may be a neural network model.

For example, the node evaluation model is implemented through a fully connected neural network, as shown in Figure 6. The low-dimensional representation of the node-related information output by the feature extraction model is used as the input of the node evaluation model, and the function value of the multi-step pseudo-cost function of the node is predicted through the fully connected neural network. In other words, the input of the fully connected neural network can be a low-dimensional representation of the relevant information of the node, and the output of the fully connected neural network can be the function value of the multi-step pseudo-cost function of the node.
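The forward pass of such a fully connected network can be sketched as follows. The layer sizes, the ReLU activation and the parameter layout are assumptions made for illustration, and the names `dense`, `relu` and `evaluate_node` are hypothetical.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def evaluate_node(embedding: Sequence[float], params: List[Layer]) -> float:
    """Map a node's low-dimensional representation to a single predicted value,
    interpreted here as the function value of the multi-step pseudo-cost function."""
    hidden = list(embedding)
    for weights, bias in params[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = params[-1]  # final layer has a single output and no activation
    return dense(hidden, weights, bias)[0]
```

For instance, with a hidden layer of two units and the parameters used in the check below, the embedding (1.0, 3.0) is mapped to the scalar 2.5.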

Optionally, step 420 may include: determining the first target node according to the node evaluation model; generating child nodes of the first target node; and adding the child nodes of the first target node to the candidate node set.

Optionally, step 420 may include: determining the second target node according to the node evaluation model; and deleting the second target node from the candidate node set.

Determining the first target node according to the node evaluation model can be understood as determining the first target node, that is, the search node, according to the evaluation information of the multiple nodes.

Determining the second target node according to the node evaluation model can be understood as determining the second target node, that is, the pruning node, according to the evaluation information of the multiple nodes.
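The two optional adjustments of the candidate node set can be sketched together. The sketch assumes that nodes are hashable identifiers and that child nodes are produced by a caller-supplied `expand` function standing in for the solver. It also assumes that an expanded node no longer needs to remain in the candidate node set, which the text above does not state. The name `adjust_candidate_set` is hypothetical.

```python
from typing import Callable, Hashable, Iterable, Optional, Set


def adjust_candidate_set(
    candidates: Set[Hashable],
    search_node: Optional[Hashable],
    prune_node: Optional[Hashable],
    expand: Callable[[Hashable], Iterable[Hashable]],
) -> Set[Hashable]:
    """Return an adjusted copy of the candidate node set."""
    updated = set(candidates)
    if search_node is not None and search_node in updated:
        # Assumption: the expanded node is replaced by its child nodes.
        updated.remove(search_node)
        updated.update(expand(search_node))
    if prune_node is not None:
        # The second target node (pruning node) is deleted from the set.
        updated.discard(prune_node)
    return updated
```

Either adjustment can be skipped by passing `None`, which mirrors the fact that the two operations are described as independent options.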

The following is an example in which the evaluation information of a node is used to indicate the prediction of the function value of a multi-step pseudo-cost function of the node. It should be understood that the function values of the multi-step pseudo-cost function in the method 400 are all predicted values output by the node evaluation model.

The function value of a node's multi-step pseudo-cost function can be used to measure the long-term value of the node. Alternatively, the function value of a node's multi-step pseudo-cost function can be used to measure the long-term cost of a node.

Illustratively, the node may be scored based on the function value of the node's multi-step pseudo-cost function. For example, the larger the function value of a node's multi-step pseudo-cost function, the higher the score of the node and the lower its long-term value, or in other words, the higher its long-term cost. The smaller the function value, the lower the score of the node and the higher its long-term value, or in other words, the lower its long-term cost. It should be understood that this is only an example, and the relationship between the node's score and the node's long-term value or long-term cost can also be expressed in other forms, which is not limited in the embodiments of the present application.

Based on the function values of the multi-step pseudo-cost function of multiple nodes, the node with the lowest score is used as the search node.
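As a minimal sketch of this selection rule, the search node can be taken as the entry with the smallest score. The function name `select_search_node` and the dictionary layout (node id mapped to score) are assumptions made for illustration.

```python
def select_search_node(scores):
    """Return the id of the node with the lowest score.

    `scores` maps a node id to the score derived from its predicted
    multi-step pseudo-cost (hypothetical data layout).
    """
    if not scores:
        raise ValueError("candidate node set is empty")
    return min(scores, key=scores.get)

# Three candidate nodes with illustrative scores.
search_node = select_search_node({1: 5.0, 3: 2.0, 4: 3.0})
```

With the illustrative scores above, node 3 has the lowest score and is chosen as the search node.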

Based on the function values of the multi-step pseudo-cost function of multiple nodes, the k nodes with the highest scores are used as candidate pruning nodes. A probability vector is constructed based on the scores of the k nodes, and one of the nodes is probabilistically selected as a pruning node.

The score of a node is positively related to the probability of the node. The higher the score of a node, the greater the probability of the node being pruned. The lower the score of a node, the lower the probability of the node being pruned.

Figure 7 shows a way of determining pruning nodes.

As shown in Figure 7, a probability vector is constructed based on the scores of the candidate pruning nodes. The scores of node 1, node 3 and node 4 are 5, 2 and 3, respectively. The probability of each node is determined based on its score: the higher the score of a node, the higher its probability. As shown in Figure 7, the probabilities of node 1, node 3 and node 4 are 0.3, 0.1 and 0.25, respectively. In this example, node 4 is sampled as the pruning node.
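The probabilistic choice of a pruning node can be sketched as below. The text does not state how scores are converted into probabilities, so this sketch assumes simple proportional normalization over the k highest-scoring nodes and requires positive scores. It therefore does not reproduce the probability values shown in Figure 7. The name `select_pruning_node` is hypothetical.

```python
import random

def select_pruning_node(scores, k, rng=None):
    """Sample one pruning node from the k highest-scoring candidates.

    Assumption: probability is proportional to score (scores must be positive).
    Returns the sampled node id and the probability assigned to each candidate.
    """
    if rng is None:
        rng = random.Random()
    if not scores:
        raise ValueError("candidate node set is empty")
    # Candidate pruning nodes: the k nodes with the highest scores.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    if any(scores[n] <= 0 for n in top_k):
        raise ValueError("proportional normalization needs positive scores")
    total = sum(scores[n] for n in top_k)
    probs = [scores[n] / total for n in top_k]
    chosen = rng.choices(top_k, weights=probs, k=1)[0]
    return chosen, dict(zip(top_k, probs))

pruned, probabilities = select_pruning_node(
    {1: 5.0, 3: 2.0, 4: 3.0}, k=3, rng=random.Random(0)
)
```

Other monotone mappings from score to probability, such as a softmax, would satisfy the same "higher score, higher pruning probability" property.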

It should be understood that the above methods of determining the first target node and the second target node are only examples, and for descriptions of other methods, reference may be made to method 200.

For example, the device for selecting a node may send indication information of the target node (the first target node and/or the second target node) to the solver.

For example, the indication information of the target node may include evaluation information of the multiple nodes.

For example, the indication information of the target node may include scores of the multiple nodes.

It should be understood that the above are only examples, and the indication information of the target node may also include other forms of information, as long as the target node can be determined based on the indication information.

The solver can adjust the candidate node set according to the target node, and solve the target planning problem according to the adjusted candidate node set to obtain the solution result.

In the case where the target node includes a search node, the solver can expand the search node to obtain the child nodes of the search node, that is, to obtain a new sub-problem of the target planning problem. Child nodes of the search node can be added to the set of candidate nodes. For example, the adjusted candidate node set can be used as the candidate node set used in the next round of iteration process. During the iterative process, the above step 420 can be repeated until the solution is completed to obtain the solution result.

When the target node includes a pruned node, the solver can prune the pruned node and adjust the set of candidate nodes. For example, the adjusted candidate node set can be used as the candidate node set used in the next round of iteration process. During the iterative process, the above step 420 can be repeated until the solution is completed to obtain the solution result.
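One round of adjusting the candidate node set can be sketched as follows. The sketch assumes that an expanded search node is removed from the set and replaced by its child nodes, which the text does not state explicitly. `adjust_candidates` and the `expand` callback are hypothetical names standing in for the solver's own operations.

```python
def adjust_candidates(candidates, search_node=None, prune_node=None, expand=None):
    """Return a new candidate list after one adjustment round.

    `expand` is a hypothetical callback that returns the child nodes
    (new sub-problems) of a node; a real solver would perform branching here.
    """
    updated = list(candidates)
    if prune_node is not None and prune_node in updated:
        updated.remove(prune_node)          # pruning: drop the node
    if search_node is not None and search_node in updated:
        updated.remove(search_node)         # assumption: expanded node leaves the set
        updated.extend(expand(search_node)) # its child nodes join the set
    return updated

# Toy example: node ids are strings, and branching appends "0" or "1".
after_search = adjust_candidates(
    ["P", "Q"], search_node="P", expand=lambda n: [n + "0", n + "1"]
)
after_prune = adjust_candidates(["P", "Q"], prune_node="Q")
```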

For example, the end of the solution can be that all sub-problems have been solved, that is, the candidate node set does not include nodes. Alternatively, the end of the solution can be when the solution time exceeds a preset time. Alternatively, the solution can end when the difference between the global upper bound value and the global lower bound value is less than a set threshold. The conditions for ending the solution can be set as needed, and the embodiments of this application do not limit this.
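The three example termination conditions can be combined into a single check, sketched below. The function name `should_stop` and its parameters are illustrative assumptions, and a real solver exposes its own limits and gap definitions.

```python
import time

def should_stop(candidates, start_time, time_limit, upper_bound, lower_bound, gap_tol):
    """Return True if any of the example termination conditions holds."""
    if not candidates:                              # all sub-problems solved
        return True
    if time.monotonic() - start_time > time_limit:  # preset time exceeded
        return True
    # Global upper and lower bounds are close enough.
    return upper_bound - lower_bound < gap_tol

start = time.monotonic()
done = should_stop([], start, 60.0, 10.0, 5.0, 1.0)
```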

Optionally, method 400 may also include: returning the solution result to the user.

Further, the method 400 may also include: returning the indication information of the target node to the user.

It should be understood that the solution process in method 400 is only an example. In method 400, a user may provide a goal programming problem to a solver and receive a solution result of the goal programming problem returned by the solver.

For example, in other possible implementations, the user may provide a set of candidate nodes to the device for selecting nodes, and receive indication information of the target node provided by the device for selecting nodes.

The following is an exemplary description of a model training method provided by the embodiment of the present application with reference to Figure 8.

It should be understood that the example in FIG. 8 is only to help those skilled in the art understand the embodiments of the present application, but is not intended to limit the embodiments of the present application to the specific numerical values or specific scenarios illustrated. Figure 8 shows a model training method provided by the embodiment of the present application. The training method shown in Figure 8 can be regarded as a specific implementation of the method 300 shown in Figure 3.

In order to facilitate understanding and description, the training process is illustrated in Figure 8 by taking the example of separately deploying the solver and the training device of the model. In other implementations, the solver and the model training device may be integrated in the same device.

As shown in Figure 8, the method 800 includes steps 810 to 830, which are described below.

Step 810: Obtain sample nodes.

Illustratively, the sample nodes may be from a training database.

The solver can generate multiple nodes of the planning problem, which can be used as sample nodes. Sample nodes may be determined based on the solution process of one or more planning problems. The one or more planning questions may be provided by the user or may be pre-stored.

As an example, the solver can receive batch data provided by the user (for example, multiple planning problems), solve based on that batch data, sample the sample nodes from the multiple nodes generated during the solving process, and store the relevant information of the sample nodes in the training database. Alternatively, the solver can receive batch data provided by the user (for example, multiple planning problems), solve based on both that batch data and historical data (for example, multiple pre-stored planning problems), sample the sample nodes from the multiple nodes generated during the solving process, and store the relevant information of the sample nodes in the training database.

Illustratively, sample nodes may be provided by users.

Step 820: Perform dimensionality reduction processing on the relevant information of the sample node based on the feature extraction model to obtain a low-dimensional representation of the relevant information of the sample node.

Illustratively, the feature extraction model is a graph convolutional neural network. The input of the graph convolutional neural network can include relevant information of the sample nodes. The graph convolutional neural network is used to reduce the dimensionality of the relevant information of the sample nodes and to output a low-dimensional embedding representation of each sample node, that is, the low-dimensional representation of the relevant information of the sample node.
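As a rough illustration of dimensionality reduction with a graph convolution, the sketch below applies one generic graph convolutional layer and then averages over vertices to obtain a fixed-size embedding. It assumes the relevant information of a sample node has already been encoded as a graph with an adjacency matrix and per-vertex features. That encoding, the single layer, the mean pooling, and the names `gcn_layer` and `embed_node` are assumptions that are not specified in this passage.

```python
import numpy as np

def gcn_layer(adjacency, features, weight):
    """One generic graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees are >= 1
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

def embed_node(adjacency, features, weight):
    """Mean-pool vertex embeddings into one low-dimensional vector."""
    return gcn_layer(adjacency, features, weight).mean(axis=0)

# Toy graph with 4 vertices, 3 input features per vertex, 2 output dimensions.
rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 1],
                      [0, 1, 0, 0],
                      [0, 1, 0, 0]], dtype=float)
features = rng.normal(size=(4, 3))
weight = rng.normal(size=(3, 2))
embedding = embed_node(adjacency, features, weight)
```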

Step 830: Use the low-dimensional representation of the relevant information of the sample node as the input of the node evaluation model, and adjust the parameters of the node evaluation model with the goal of reducing the gap between the output result of the node evaluation model and the label corresponding to the sample node.

For example, the node evaluation model may be a fully connected neural network.

Specifically, the node evaluation model can be trained through deep Q learning. In other words, the node evaluation model can be a deep Q network.

The training process is explained below by taking the label corresponding to the sample node as the function value of the multi-step pseudo-cost function as an example.

The definition of the multi-step pseudo-cost function satisfies the Bellman equation of dynamic programming. The expression of the state transition function of this equation is unknown, so the equation can be solved by deep Q learning. In other words, the node evaluation model can be trained by deep Q learning so that the trained node evaluation model can be used to predict the function value of the multi-step pseudo-cost function.

Deep Q learning (DQL) can help select optimal actions by estimating the long-term cumulative return (Q function) of each action. In the embodiment of this application, the node evaluation model helps select nodes by predicting the multi-step pseudo-cost of each node. The multi-step pseudo-cost function can be used as a Q function.

During the training process, the prediction label corresponding to the sample node can be used as the label corresponding to the sample node.

The predicted label satisfies the following formula:

In the formula, the left-hand side is the predicted label of the sample node; c(P) is the difference between the limit value of the parent node of node P and the limit value of node P, which can be obtained from the solver; and the model that appears in the second term is the target evaluation model.

During the training process, the model parameters are adjusted with the goal of reducing the gap between the label of the sample node and the model output C_θ(P), where C_θ represents the model to be trained.

For example, the training goal can be expressed as:

During the training process, the solver can be encapsulated as an environment, and the training device collects data through continuous interaction with the solver. The training device can obtain the limit value of the node by calling the solver, and the limit value can be used as supervision information for fitting the multi-step pseudo cost function. For example, the training device can obtain the limit value of node P and the limit value of the parent node of node P from the solver, so that the training device can obtain c(P). The training device can determine the limit value of the child node of node P according to the target evaluation model, and then determine the second item in the above prediction label.
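The way the two parts of the predicted label fit together can be sketched as below. Several choices here are assumptions, because the formula itself is not reproduced in this text: the sign convention of c(P) (parent limit value minus node limit value, following the word order above), the use of `min` to combine the target evaluation model's outputs over the child nodes, the treatment of a node without child nodes, and the squared-error loss. All function names are hypothetical.

```python
def predicted_label(parent_bound, node_bound, child_features, target_model, aggregate=min):
    """Combine c(P) with a second term computed by the target evaluation model.

    c(P) uses limit values obtained from the solver. The second term applies
    `target_model` to each child node's features and combines the results with
    `aggregate` (min is an assumption, not taken from the text).
    """
    c_p = parent_bound - node_bound
    if not child_features:   # assumption: no second term for a node without children
        return c_p
    return c_p + aggregate(target_model(f) for f in child_features)

def squared_error(model_output, label):
    """One possible measure of the gap between C_theta(P) and the label."""
    return (model_output - label) ** 2

# Toy target model that doubles a scalar feature.
label = predicted_label(12.0, 10.0, [1.0, 3.0], lambda f: 2.0 * f)
loss = squared_error(3.0, label)
```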

The target evaluation model has the same structure as the node evaluation model. The parameters of both may be the same or different. The target evaluation model is the target network in the deep Q learning process. This target evaluation model is used to stabilize training and prevent overfitting.

For example, during the training process of the node evaluation model, at regular intervals, the target evaluation model can be updated based on the parameters of the current node evaluation model, that is, the target evaluation model is replaced with the current node evaluation model.

For example, during the training process of the node evaluation model, after every N iterations, the target evaluation model can be updated based on the parameters of the current node evaluation model.
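The periodic update of the target evaluation model can be sketched as a counter that copies the current parameters every N iterations. The class name `TargetSync` and the dictionary-based parameter format are assumptions made for illustration.

```python
import copy

class TargetSync:
    """Keep a copy of the model parameters that is refreshed every N steps."""

    def __init__(self, online_params, sync_every):
        self.sync_every = sync_every
        self.target_params = copy.deepcopy(online_params)
        self.steps = 0

    def step(self, online_params):
        """Call once per training iteration; returns True when a sync happened."""
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_params = copy.deepcopy(online_params)
            return True
        return False

# Hypothetical parameters: the node evaluation model changes, the target lags behind.
online = {"w": [0.0]}
sync = TargetSync(online, sync_every=3)
online["w"][0] = 1.0
history = [sync.step(online) for _ in range(3)]
```

Between updates the target parameters stay fixed, so the labels computed from the target evaluation model do not shift at every iteration.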

Figure 9 shows a schematic diagram of a training process based on reinforcement learning. The training process shown in Figure 9 may include the following steps:

1) The solver obtains the data set.

The data set contains one or more planning problems. At least some of the decision variables of the one or more planning problems are integer variables.

2) The solver generates a set of candidate sample nodes based on the data set. The set of candidate sample nodes includes multiple sample nodes.

In other words, the solver may generate a branch-and-bound search tree based on the data set, the branch-and-bound search tree including the plurality of sample nodes. In other words, the set of candidate sample nodes can be represented in the form of a branch-and-bound search tree.

The set of candidate sample nodes can be stored in the training database.

3) Perform dimensionality reduction processing on the relevant information of multiple sample nodes to obtain the characteristics of multiple sample nodes, that is, a low-dimensional representation of the relevant information of the multiple sample nodes.

4) Input the characteristics of the multiple sample nodes into the node evaluation model for processing to predict the function values of the multi-step pseudo-costs of the multiple sample nodes.

The features of the multiple sample nodes can be used as states in deep Q learning.

5) Determine the target sample node based on the output of the node evaluation model.

6) Feed back the target sample nodes to the solver.

In other words, target sample nodes are fed back to the solver as actions in deep Q-learning.

7) The solver can provide the limit value of the target sample node and the limit value of the target sample node's parent node to the node evaluation model.

The limit value of the target sample node and the limit value of the target sample node's parent node can be used as rewards in deep Q learning to adjust the parameters of the node evaluation model.

8) The solver can adjust the set of candidate sample nodes based on the target sample node.

Repeat the above steps 3) to 8) until the training is completed.
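The interaction loop in steps 3) to 8) can be sketched with a toy stand-in for the solver. `ToySolverEnv` is entirely hypothetical: it invents limit values at random and uses a two-number feature vector in place of a learned representation. Choosing the lowest-scoring node in step 5) is also an assumption. The sketch only collects (state, action, reward) records and leaves the parameter update out.

```python
import random

class ToySolverEnv:
    """Hypothetical stand-in for a solver wrapped as an environment."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.bounds = {"root": 0.0}   # node id -> limit value
        self.parent = {}              # node id -> parent id
        self.candidates = ["root"]

    def features(self, node):
        # Stand-in for the low-dimensional representation of step 3).
        return [self.bounds[node], float(len(node))]

    def step(self, node):
        """Expand `node` and return c(P) as the reward (steps 6 to 8)."""
        parent_bound = self.bounds[self.parent.get(node, node)]
        reward = parent_bound - self.bounds[node]
        self.candidates.remove(node)
        for suffix in ("L", "R"):     # invented children with random limit values
            child = node + suffix
            self.parent[child] = node
            self.bounds[child] = self.bounds[node] - self.rng.random()
            self.candidates.append(child)
        return reward

def collect_transitions(env, score_fn, num_steps):
    """Run steps 3) to 8) repeatedly and record (state, action, reward)."""
    transitions = []
    for _ in range(num_steps):
        if not env.candidates:
            break
        feats = {n: env.features(n) for n in env.candidates}   # step 3
        scores = {n: score_fn(f) for n, f in feats.items()}    # step 4
        target = min(scores, key=scores.get)                   # step 5 (assumed rule)
        reward = env.step(target)                              # steps 6 to 8
        transitions.append((feats[target], target, reward))
    return transitions

env = ToySolverEnv()
transitions = collect_transitions(env, lambda f: f[0], num_steps=5)
```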

Figure 10 shows a schematic diagram of recursive expansion of node P in a branch-and-bound search tree. As shown in Figure 10, node P can be expanded into two child nodes N₁ and N₂. In Figure 10, child nodes (for example, N₄) connected by dotted lines can be understood as nodes that have been pruned or nodes that have not yet been expanded. The child nodes (e.g., N₁, N₂, N₄ and N₅) connected by solid lines can be understood as actually expanded nodes.

For example, in the process of adjusting the parameters of the node evaluation model, the parameters of the feature extraction model can be adjusted simultaneously. That is, the parameters of the feature extraction model and the parameters of the node evaluation model can both be adjusted with the goal of reducing the gap between the output results of the node evaluation model and the labels corresponding to the sample nodes.

Alternatively, the feature extraction model can also be trained in other ways, which is not limited in the embodiments of the present application.

Optionally, the method 800 may also include: returning the node evaluation model to the user.

Further, method 800 may also include: returning the feature extraction model to the user.

In the method 800, a GCN-based feature extraction model is used to perform feature extraction on the nodes in the branch-and-bound search tree, and a node evaluation model based on a fully connected neural network is used to predict the multi-step pseudo-cost of the node. The feature extraction model and fully connected neural network are trained using reinforcement learning. The model trained using method 800 can be mounted to the solver and used to select search nodes and pruning nodes during the solution process, which is beneficial to improving solution efficiency.

It should be understood that the training method in method 800 is only an example. For other training methods, please refer to the description in method 300 and will not be described again here.

Table 1 shows the comparison results of test indicators under different solution methods. Specifically, Table 1 shows the test indicators of the rule-based best estimate search method and of the solution method using the solution of the embodiment of the present application.

The solver is Solving Constraint Integer Programs (SCIP), a mixed integer programming solver, and experiments are conducted based on multiple open source data sets.

Table 1 shows four groups of experiments, which are introduced as follows:

(1) The data set is the open source knapsack problem data set (MIK), a medium-sized data set, and the solution difficulty is medium. The planning problem is a minimum value optimization problem.

(2) The data set is a combinatorial auction problem data set (cauctions), a medium-sized data set, and the solution difficulty is medium. The planning problem is a maximum value optimization problem.

(3) The data set is an artificially generated set cover, a small-scale data set, and the solution difficulty is easy. The planning problem is a minimum value optimization problem.

(4) The data set is a facility location data set (facilities), a small-scale data set, and the solution difficulty is easy. The planning problem is a maximum value optimization problem.

Test indicators include: the solving time of the problem, the number of nodes of the search tree generated during the solving process, and the primal-dual integral, that is, the time integral of the change curves of the primal bound and the dual bound during the solving process.
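A simplified way to compute a primal-dual integral from a solver log is sketched below. It treats both bounds as step functions of time and integrates the absolute gap between them. Common definitions normalize the gap before integrating, so the absolute gap used here, the log format, and the name `primal_dual_integral` are assumptions for illustration.

```python
def primal_dual_integral(times, primal, dual, end_time):
    """Integrate |primal - dual| over time, treating bounds as step functions.

    times[i] is when primal[i] and dual[i] became the current bounds;
    the last pair holds until end_time.
    """
    if not (len(times) == len(primal) == len(dual)):
        raise ValueError("times, primal and dual must have equal length")
    total = 0.0
    for i, t in enumerate(times):
        t_next = times[i + 1] if i + 1 < len(times) else end_time
        total += abs(primal[i] - dual[i]) * (t_next - t)
    return total

# Gap of 8 for 1 s, then 4 for 2 s, then 0 for 2 s.
pdi = primal_dual_integral([0.0, 1.0, 3.0], [10.0, 8.0, 6.0], [2.0, 4.0, 6.0], end_time=5.0)
```

A smaller value indicates that the two bounds converged earlier in the solving process.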

As shown in Table 1, the performance of the solution in the embodiment of the present application has been improved on the above test indicators. Among them, the solution time improved by 27.8% on average.

Table 1

In the embodiment of this application, in a possible implementation manner, the user can train each of the above models on a local device.

In the embodiment of this application, in another possible implementation manner, users can train each of the above models on the AI basic development platform.

It should be understood that the AI basic development platform is a platform-as-a-service (PaaS) cloud service in the cloud platform. It is a software platform that a public cloud service provider offers to users (also called tenants, AI developers, etc.), based on the large amount of basic resources and software capabilities it owns, to assist in the construction, training, and deployment of AI models and in the development and deployment of AI applications. As shown in Figure 11, the interaction between users and the AI basic development platform mainly includes the following: a user logs in to the cloud platform through a client web page, selects and purchases the cloud service of the AI basic development platform in the cloud platform, and after the purchase can carry out full-process AI development using the functions provided by the AI basic development platform. When users develop and train their own AI models on the AI basic development platform, they do so based on the basic resources in the cloud service provider's data center, mainly computing resources such as central processing units (CPUs), graphics processing units (GPUs), and embedded neural-network processing units (NPUs).

The AI basic development platform can be independently deployed on a server or virtual machine in a data center in a cloud environment. The AI basic development platform can also be deployed in a distributed manner on multiple servers in a data center, or on multiple virtual machines in a data center.

In another embodiment, the AI basic development platform provided by this application can also be deployed in a distributed manner in different environments. The AI basic development platform provided by this application can be logically divided into multiple parts, each part having different functions. For example, part of the AI basic development platform can be deployed in computing devices in the edge environment (also called edge computing devices), and the other part can be deployed in devices in the cloud environment. The edge environment is an environment that is geographically close to the user's terminal computing device. The edge environment includes edge computing devices, such as edge servers, edge stations with computing capabilities, etc. Various parts of the AI basic development platform deployed in different environments or devices work together to provide users with functions such as training AI models.

The following takes the training of the node evaluation model as an example to explain the AI model training service provided by the AI basic development platform.

The AI basic development platform can train the initial model and obtain a node evaluation model that meets the user's goals.

The initial model may be a built-in initial model in the AI basic development platform. Alternatively, the initial model may be an initial model provided by the user or selected by the user on the AI basic development platform. Alternatively, the initial model can also be a suitable model searched by the AI basic development platform using the background neural network architecture search algorithm.

Training data can include data built into the AI basic development platform. Alternatively, the training data may include user-supplied data or data processed based on user-supplied data. For example, the user-supplied data may be a data set that includes one or more mixed integer programming problems. The AI basic development platform can process the data set based on the built-in solver to obtain a set of candidate sample nodes, and can save multiple sample nodes in the candidate sample node set to the training database. For another example, the data provided by the user can be a set of candidate sample nodes. The AI basic development platform can save multiple sample nodes in the candidate sample node set to the training database. For another example, the data provided by the user may include a set of candidate sample nodes and labels corresponding to the sample nodes. The AI basic development platform can save multiple sample nodes in the candidate sample node set, together with the labels corresponding to those sample nodes, to the training database. The data provided by the user can also be other types of data. For a specific description, please refer to the method 300 or the method 800 mentioned above, which will not be described again here.

The AI basic development platform can also deploy the aforementioned trained AI model (for example, feature extraction model or node evaluation model) on nodes in the cloud environment or nodes in the edge environment. Among them, nodes in the cloud environment can be virtual machine instances, container instances, physical servers, etc., and nodes in the edge environment can be various edge devices. As shown in Figure 12, an example is shown. When the scale of the model is large, the model can be distributed and deployed on multiple nodes based on the idea of model parallelism. As another example, the model can also be deployed independently on multiple nodes to support a larger number of visits to online services. As another example, the AI basic development platform can also deploy AI applications to edge devices registered to the cloud platform based on the application requirements of the AI model.

The above-deployed AI model can become an AI application or become a part of an AI application. As shown in Figure 13, users can access AI applications online through Web pages or through client apps. When an AI application is used, the AI model deployed in the edge environment or cloud environment can be called online to provide a response. As a result, the AI model developed and trained through the AI basic development platform can implement inference on online request data and return inference results.

It should be noted that the nodes in Figures 12 and 13 may include nodes in the cloud environment or nodes in the edge environment, and the nodes in the method of the embodiment of the present application are sub-problems of the target planning problem.

For example, the feature extraction model and the node evaluation model can together serve as an AI application, for example, a node selection application. Users can access the node selection application online through web pages or client apps. When the node selection application is used, the feature extraction model and the node evaluation model deployed in the edge environment or cloud environment can be called online to provide a response, so that the inference result, for example, the indication information of the target node, is returned.

Alternatively, the feature extraction model and the node evaluation model may be part of an AI application, for example, the AI application may be a planning problem solving application. Users can access the planning problem solving application online through web pages or client apps. In this case, users can upload the planning problem to be solved, that is, the target planning problem. When the planning problem solving application is used, the solver can call the node selection service to determine the target node. The node selection service performs inference on the data requested by the solver through the feature extraction model and the node evaluation model, and returns the inference results, for example, the indication information of the target node, to the solver. The solver solves the planning problem based on the target node and returns the solution result to the user.

While the AI model is providing online inference services, the AI basic development platform can continuously collect the input and output data of the inference process, use the input and output data of the inference phase to keep enriching the training data set, and continue to optimize and train the AI model based on the data of the inference phase and the corresponding manually confirmed results.

It should be understood that in other cases, the AI model developed and trained by the aforementioned AI basic development platform may not be deployed online. Instead, users can download the trained AI model and deploy it locally as they wish. For example, users can choose to save the trained AI model (for example, the feature extraction model and the node evaluation model) to OBS and then download the AI model from OBS to a local device.

The device according to the embodiment of the present application will be described below with reference to FIGS. 14 to 19. It should be understood that the devices described below can perform the foregoing methods of the embodiments of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the devices of the embodiments of the present application.

Figure 14 is a schematic block diagram of a node selection device 1400 provided by an embodiment of the present application. The device 1400 can be applied to a cloud management platform and can be implemented through software, hardware, or a combination of both. The device 1400 provided by the embodiment of this application can implement the method flow shown in Figure 2 of the embodiment of this application.

The device 1400 includes: an acquisition module 1410 and a prediction module 1420. The acquisition module 1410 is used to obtain a candidate node set of the target planning problem. The candidate node set includes multiple nodes, and each node in the multiple nodes corresponds to a sub-problem to be solved of the target planning problem. The prediction module 1420 is used to predict the relevant quantity of the limit value of each node after multi-step expansion through the node evaluation model. The output result of the node evaluation model is used to determine the target node. The target node is used to adjust the candidate node set. The adjusted candidate node set is used to solve the target planning problem.

Optionally, the device 1400 further includes: a determination module 1430 (not shown in the figure), which can be used to determine a node evaluation model according to user instructions, and the node evaluation model can be deployed on a cloud management platform.

Optionally, the output result of the node evaluation model is used to determine the first target node, and the adjusted candidate node set includes child nodes of the first target node.

Optionally, the output result of the node evaluation model is used to determine the second target node, and the second target node is not included in the adjusted candidate node set.

Optionally, the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of each node's parent node.

Optionally, the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of each node's parent node.

Optionally, the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value of any other node among the multiple nodes after multi-step expansion and the limit value of the parent node of that other node.

Optionally, the second target node belongs to k nodes among the multiple nodes, where, for each of the k nodes, the difference between its limit value after multi-step expansion and the limit value of its parent node is greater than or equal to the difference between the limit value of any node other than the k nodes after multi-step expansion and the limit value of the parent node of that node; k is an integer greater than 1, and k is less than the number of the multiple nodes.

Optionally, the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.

Optionally, the node evaluation model is trained based on the sample node and the label corresponding to the sample node. The label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.

Optionally, the label corresponding to the sample node is determined based on the first difference and the second difference. The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing. The target evaluation model has the same structure as the node evaluation model.

Optionally, the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node. The relevant information of each node includes at least one of the following: the objective function of each node, the constraints of each node, or the decision variables of each node. The low-dimensional representation of the relevant information of each node is obtained by reducing the dimensionality of the relevant information of each node through the feature extraction model.

Figure 15 is a schematic block diagram of a node evaluation model training device 1500 provided by an embodiment of the present application. The device 1500 can be applied to a cloud management platform and can be implemented through software, hardware, or a combination of both. The device 1500 provided by the embodiment of the present application can implement the method flow shown in Figure 3 or Figure 8 of the embodiment of the present application.

The device 1500 includes: a first acquisition module 1510, a second acquisition module 1520 and a training module 1530. The first acquisition module 1510 is used to acquire sample nodes. The second acquisition module 1520 is used to acquire the label corresponding to each sample node, and the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion. The training module 1530 is used to train an initial model based on the sample nodes and the labels corresponding to the sample nodes to obtain a node evaluation model.
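The role of the training module can be illustrated with a minimal supervised sketch. It assumes that each sample node has already been converted into a numeric feature vector and that the initial model is a linear regressor trained by stochastic gradient descent. The linear model, the learning rate, the number of epochs and the toy data are assumptions of this sketch; the application does not prescribe a model structure or an optimizer.

```python
def train_node_evaluation_model(samples, labels, lr=0.05, epochs=500):
    """Fit a linear model to (sample features, label) pairs with plain SGD.

    Returns a callable that maps a feature vector to a predicted label.
    """
    dim = len(samples[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = bias + sum(w * xi for w, xi in zip(weights, x))
            err = pred - y  # gradient of 0.5 * (pred - y) ** 2 with respect to pred
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err

    def model(x):
        return bias + sum(w * xi for w, xi in zip(weights, x))

    return model


# Toy data: one feature per sample node, labels generated by y = 2 * x + 1.
sample_features = [[0.0], [1.0], [2.0], [3.0]]
sample_labels = [1.0, 3.0, 5.0, 7.0]
node_evaluation_model = train_node_evaluation_model(sample_features, sample_labels)
prediction = node_evaluation_model([4.0])
```

On this toy data the trained model predicts a value close to 9 for the unseen feature value 4.0.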

The first acquisition module 1510 and the second acquisition module 1520 may be the same acquisition module, or they may be different acquisition modules.

Optionally, the training module 1530 is specifically configured to train the initial model through reinforcement learning to obtain the node evaluation model, wherein the label corresponding to the sample node is determined based on a first difference and a second difference. The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing. The target evaluation model has the same structure as the node evaluation model.

Optionally, the input of the initial model includes relevant information of the sample node or a low-dimensional representation of the relevant information of the sample node. The relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node. The low-dimensional representation of the relevant information of the sample node is obtained by reducing the dimensionality of the relevant information of the sample node through a feature extraction model.

Figure 16 is a schematic block diagram of a device 1600 for solving a goal planning problem provided by an embodiment of the present application. The device 1600 can be applied to a cloud management platform and can be implemented through software, hardware, or a combination of both. The device 1600 provided by the embodiment of the present application can implement the method flow shown in Figure 4 of the embodiment of the present application.

The device 1600 includes: an acquisition module 1610, an adjustment module 1620 and a solution module 1630. The acquisition module 1610 is used to acquire the goal planning problem uploaded by the user. The adjustment module 1620 is used to adjust the candidate node set of the goal planning problem according to the node evaluation model, where the candidate node set includes multiple nodes, each node in the multiple nodes corresponds to a sub-problem to be solved of the goal planning problem, and the node evaluation model is used to predict the correlation quantity of the limit value of each node after multi-step expansion. The solution module 1630 is used to solve the goal planning problem based on the adjusted candidate node set to obtain the solution result of the goal planning problem.

Optionally, the device 1600 further includes a determining module 1640 (not shown in the figure), which is used to determine the node evaluation model according to user instructions. This node evaluation model can be deployed on the cloud management platform.

Optionally, the adjustment module 1620 is specifically configured to determine the first target node according to the node evaluation model; generate child nodes of the first target node; and add the child nodes of the first target node to the candidate node set.

Optionally, the adjustment module 1620 is specifically configured to determine the second target node according to the node evaluation model, and delete the second target node from the candidate node set.
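The two adjustment operations above (expanding a first target node and deleting a second target node) can be sketched together as follows. The sketch assumes that the node evaluation model is available as a function `predict_diff`, that a function `branch` generates the child nodes of a node, and that the expanded node is removed from the set once its child nodes are added. These names, the removal of the expanded node, the choice of the single largest-score node as the second target node, and the toy interval nodes are assumptions for illustration only.

```python
def adjust_candidate_set(candidates, predict_diff, branch, prune=False):
    """Return an adjusted copy of the candidate node set.

    predict_diff(node): stand-in for the node evaluation model's output.
    branch(node): returns the child nodes of the node.
    If prune is True, the node with the largest predicted difference is
    deleted first (second target node). The node with the smallest predicted
    difference (first target node) is then replaced by its child nodes.
    """
    if not candidates:
        return []
    remaining = list(candidates)
    scores = [predict_diff(node) for node in remaining]
    if prune and len(remaining) > 1:
        worst = max(range(len(remaining)), key=scores.__getitem__)
        del remaining[worst]
        del scores[worst]
    best = min(range(len(remaining)), key=scores.__getitem__)
    first_target = remaining.pop(best)
    remaining.extend(branch(first_target))
    return remaining


def toy_predict_diff(node):
    """Toy score: the width of an integer interval (hypothetical)."""
    return node[1] - node[0]


def toy_branch(node):
    """Toy branching rule: split the interval at its midpoint (hypothetical)."""
    mid = (node[0] + node[1]) // 2
    return [(node[0], mid), (mid, node[1])]


adjusted = adjust_candidate_set(
    [(0, 8), (0, 2), (2, 5)], toy_predict_diff, toy_branch, prune=True
)
```

In this toy run the node (0, 8) has the largest score and is deleted, the node (0, 2) has the smallest score and is replaced by its child nodes (0, 1) and (1, 2), and the node (2, 5) is left unchanged.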

Optionally, the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of each node's parent node.

Optionally, the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node. The relevant information of each node includes at least one of the following: an objective function of each node, constraints of each node, or decision variables of each node. The low-dimensional representation of the relevant information of each node is obtained by reducing the dimensionality of the relevant information of each node through a feature extraction model.

Optionally, the device 1600 also includes a return module (not shown in the figure), which is used to return the solution result of the goal planning problem to the user.

The devices shown in Figures 14, 15 and 16 can be embodied in the form of functional modules. The term "module" here can be implemented in the form of software and/or hardware, and is not specifically limited.

For example, a "module" may be a software program, a hardware circuit, or a combination of both that implements the above functions. Illustratively, the following takes the adjustment module in Figure 16 as an example to introduce the implementation of a module. The implementation of the other modules can refer to the implementation of the adjustment module.

As an example of a software functional unit, the adjustment module may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, there may be one or more such computing instances. For example, the adjustment module can include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers. Usually, a region can include multiple AZs.

Likewise, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs. Usually, a VPC is set up within one region. Communication between two VPCs in the same region, and cross-region communication between VPCs in different regions, requires a communication gateway to be set up in each VPC, and the VPCs are interconnected through the communication gateways.

As an example of a hardware functional unit, the adjustment module may include at least one computing device, such as a server. Alternatively, the adjustment module can also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

Multiple computing devices included in the adjustment module can be distributed in the same region or in different regions. Multiple computing devices included in the adjustment module can be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the adjustment module can be distributed in the same VPC or in multiple VPCs. The plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

Therefore, the modules of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

It should be noted that when the device provided in the above embodiment performs the above method, the division into the above functional modules is used only as an example. In actual application, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. For example, in the device 1600, the acquisition module can be used to perform any step in the above method, the adjustment module can be used to perform any step in the above method, and the solution module can be used to perform any step in the above method. The steps that the acquisition module, the adjustment module and the solution module are responsible for implementing can be specified as needed. The acquisition module, the adjustment module and the solution module respectively implement different steps in the above method to realize all the functions of the above device. The division of functional modules of the device 1400 and the device 1500 is likewise only an example, and will not be described again here to avoid duplication.

In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the above method embodiments, which will not be described again here.

A computing device provided by an embodiment of the present application will be described in detail below with reference to Figure 17.

Figure 17 is a schematic architectural diagram of a computing device 1000 provided by an embodiment of the present application.

As shown in Figure 17, computing device 1000 includes: bus 1002, processor 1004, memory 1006, and communication interface 1008. The processor 1004, the memory 1006 and the communication interface 1008 communicate through the bus 1002. Computing device 1000 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1000.

The bus 1002 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one line is used in Figure 17, but this does not mean that there is only one bus or one type of bus. The bus 1002 may include a path that carries information between the components of the computing device 1000 (e.g., the memory 1006, the processor 1004, and the communication interface 1008).

The processor 1004 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).

The memory 1006 may include volatile memory, such as random access memory (RAM). The memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).

The memory 1006 stores executable program codes, and the processor 1004 executes the executable program codes to respectively implement the functions of the modules in Figure 14, Figure 15 or Figure 16, thereby implementing the methods of the embodiments of the present application. That is, the memory 1006 stores instructions for executing the method of the embodiment of the present application.

The communication interface 1008 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 1000 and other devices or communication networks.

An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

As shown in Figure 18, the computing device cluster includes at least one computing device 1000. The computing device cluster can be used to execute the method of the embodiment of the present application, for example, the method shown in Figure 2, Figure 3, Figure 4 or Figure 8.

The following mainly explains the method used by the computing device cluster to solve the goal planning problem as an example.

The memory 1006 in one or more computing devices 1000 in the computing device cluster may store the same instructions for performing the methods of the embodiments of the present application.

For example, the memory 1006 of one or more computing devices 1000 in a cluster of computing devices may store the same instructions for performing a method of solving a goal planning problem.

In some possible implementations, the memory 1006 of one or more computing devices 1000 in the computing device cluster may also store part of the instructions for executing the method of the embodiment of the present application. In other words, a combination of one or more computing devices 1000 may jointly execute instructions for performing the methods of embodiments of the present application.

For example, the memory 1006 of one or more computing devices 1000 in the computing device cluster may also store part of the instructions for executing the solution method of the goal planning problem. In other words, a combination of one or more computing devices 1000 may collectively execute instructions for performing a method of solving a goal planning problem.

It should be noted that the memories 1006 in different computing devices 1000 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the device 1600 for solving a goal planning problem. That is, instructions stored in the memories 1006 in different computing devices 1000 may implement the functions of one or more of the acquisition module, the adjustment module, and the solution module.

In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. Wherein, the network may be a wide area network or a local area network, etc. Figure 19 shows a possible implementation. As shown in Figure 19, two computing devices 1000A and 1000B are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device. In this possible implementation, the memory 1006 in the computing device 1000A stores instructions for executing the functions of the acquisition module and the solution module. At the same time, instructions for performing the functions of the adjustment module are stored in the memory 1006 in the computing device 1000B.

It should be understood that the functions of the computing device 1000A shown in FIG. 19 may also be performed by multiple computing devices 1000. Likewise, the functions of computing device 1000B may also be performed by multiple computing devices 1000.

An embodiment of the present application also provides a computer program product containing instructions. The computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium. When the computer program product is run on at least one computing device, at least one computing device is caused to execute the method in the embodiment of the present application, for example, a method for solving a goal planning problem, a method for selecting nodes, or a method for training a node evaluation model.

An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center that contains one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state drive), etc. The computer-readable storage medium includes instructions that instruct a computing device to execute the methods in the embodiments of the present application, for example, a method for solving a goal planning problem, a method for selecting nodes, or a method for training a node evaluation model.

It should be understood that in the various embodiments of the present application, the size of the sequence numbers of the above-mentioned processes does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that for the convenience and simplicity of description, the specific working processes of the systems, devices and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be described again here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and in actual implementation there may be other division methods: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection between devices or units through some interfaces, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include: a USB flash drive, a mobile hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk and other media that can store program code.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person familiar with the technical field can easily think of within the technical scope disclosed in the present application should be covered by the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

A method for solving a goal planning problem, characterized in that the method includes:

Obtain the goal planning problems uploaded by users;

According to the node evaluation model, the candidate node set of the target planning problem is adjusted, wherein the candidate node set includes a plurality of nodes, each node in the plurality of nodes corresponds to a sub-problem to be solved of the target planning problem, and the node evaluation model is used to predict the correlation quantity of the limit value of each node after multi-step expansion;

Based on the adjusted candidate node set, the target planning problem is solved to obtain a solution result of the target planning problem.
The method of claim 1, further comprising:

The node evaluation model is determined according to the user instruction, and the node evaluation model is deployed on the cloud management platform.
The method according to claim 1 or 2, characterized in that adjusting the candidate node set of the target planning problem according to the node evaluation model includes:

Determine the first target node according to the node evaluation model;

Generate child nodes of the first target node;

Add child nodes of the first target node to the candidate node set.
The method according to any one of claims 1 to 3, characterized in that adjusting the candidate node set of the target planning problem according to the node evaluation model further includes:

Determine a second target node according to the node evaluation model;

The second target node is deleted from the candidate node set.
The method according to any one of claims 1 to 4, characterized in that the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of the parent node of each node.
The method according to claim 5, characterized in that the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of the parent node of each node.
The method according to claim 5 or 6, characterized in that the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value of any other node among the plurality of nodes, other than the first target node, after multi-step expansion and the limit value of the parent node of that other node.
The method according to any one of claims 5 to 7, characterized in that the second target node belongs to k nodes among the plurality of nodes, and the difference between the limit value of each of the k nodes after multi-step expansion and the limit value of its parent node is greater than or equal to the difference between the limit value of any other node among the plurality of nodes, other than the k nodes, after multi-step expansion and the limit value of the parent node of that other node, where k is an integer greater than 1 and k is less than the number of the plurality of nodes.
The method according to claim 8, characterized in that the second target node is determined based on probabilities corresponding to the k nodes, and the probabilities corresponding to the k nodes are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes.
The method according to any one of claims 1 to 9, characterized in that the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.
The method according to claim 10, characterized in that the node evaluation model is trained based on sample nodes and labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion.
The method according to claim 11, characterized in that the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
The method according to claim 11, characterized in that the label corresponding to the sample node is determined based on a first difference and a second difference, the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node, the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
The method according to any one of claims 1 to 13, characterized in that the input of the node evaluation model includes the relevant information of each node or a low-dimensional representation of the relevant information of each node, the relevant information of each node includes at least one of the following: an objective function of each node, a constraint condition of each node or a decision variable of each node, and the low-dimensional representation of the relevant information of each node is obtained by performing dimensionality reduction processing on the relevant information of each node through a feature extraction model.
The method according to any one of claims 1 to 14, characterized in that the method further includes:

Return the solution result of the goal planning problem to the user.
A method for selecting nodes, characterized by including:

Obtain a candidate node set of the target planning problem, the candidate node set includes a plurality of nodes, each node in the plurality of nodes corresponds to a sub-problem to be solved of the target planning problem;

The correlation quantity of the limit value of each node after multi-step expansion is predicted through the node evaluation model, the output result of the node evaluation model is used to determine the target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the target planning problem.
The method of claim 16, further comprising:

The node evaluation model is determined according to user instructions, and the node evaluation model is deployed on the cloud management platform.
The method according to claim 16 or 17, characterized in that the output result of the node evaluation model is used to determine the first target node, and the adjusted candidate node set includes child nodes of the first target node.
The method according to any one of claims 16 to 18, characterized in that the output result of the node evaluation model is used to determine the second target node, and the adjusted candidate node set does not include the second target node.
The method according to any one of claims 16 to 19, characterized in that the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of the parent node of each node.
The method according to claim 20, characterized in that the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of the parent node of each node.
The method according to claim 20 or 21, characterized in that the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value of any other node among the plurality of nodes, other than the first target node, after multi-step expansion and the limit value of the parent node of that other node.
The method according to any one of claims 20 to 22, characterized in that the second target node belongs to k nodes among the plurality of nodes, and the difference between the limit value of each of the k nodes after multi-step expansion and the limit value of its parent node is greater than or equal to the difference between the limit value of any other node among the plurality of nodes, other than the k nodes, after multi-step expansion and the limit value of the parent node of that other node, where k is an integer greater than 1 and k is less than the number of the plurality of nodes.
The method according to claim 23, characterized in that the second target node is determined based on probabilities corresponding to the k nodes, and the probabilities corresponding to the k nodes are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes.
The method according to any one of claims 16 to 24, wherein the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.
The method according to claim 25, characterized in that the node evaluation model is trained based on sample nodes and labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion.
The method according to claim 26, characterized in that the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
The method according to claim 26, characterized in that the label corresponding to the sample node is determined based on a first difference and a second difference, the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node, the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
The method according to any one of claims 16 to 28, characterized in that the input of the node evaluation model includes the relevant information of each node or a low-dimensional representation of the relevant information of each node, the relevant information of each node includes at least one of the following: an objective function of each node, a constraint condition of each node or a decision variable of each node, and the low-dimensional representation of the relevant information of each node is obtained by performing dimensionality reduction processing on the relevant information of each node through a feature extraction model.
A training method for a node evaluation model, characterized in that the node evaluation model is used to predict the correlation quantity of the limit value of each node in a candidate node set of a goal programming problem after multi-step expansion, each node corresponds to a sub-problem to be solved of the goal programming problem, the output result of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, the adjusted candidate node set is used to solve the goal programming problem, and the training method includes:

obtaining a sample node;

obtaining a label corresponding to the sample node, wherein the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion;

training an initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model.
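The three training steps can be illustrated with a minimal sketch. It is not the applicant's implementation: the `SampleNode` fields, the linear model, the squared-error gradient update, and the toy numbers are all assumptions chosen for illustration, and the label is assumed to take the form "limit value after multi-step expansion minus the parent node's limit value".

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SampleNode:
    """Hypothetical container for one sample node (field names are assumptions)."""
    features: Sequence[float]   # assumed numeric encoding of the sub-problem
    parent_bound: float         # limit value of the parent node
    multi_step_bound: float     # limit value after multi-step expansion


def make_label(node: SampleNode) -> float:
    # Assumed label form: bound after multi-step expansion minus parent bound.
    return node.multi_step_bound - node.parent_bound


def predict(weights: List[float], features: Sequence[float]) -> float:
    # Stand-in "node evaluation model": a plain linear scorer.
    return sum(w * x for w, x in zip(weights, features))


def train(nodes: List[SampleNode], epochs: int = 2000, lr: float = 0.05) -> List[float]:
    # Start from an initial model (all-zero weights) and fit it to the labels
    # with stochastic gradient descent on the squared error.
    weights = [0.0] * len(nodes[0].features)
    for _ in range(epochs):
        for node in nodes:
            error = predict(weights, node.features) - make_label(node)
            for i, x in enumerate(node.features):
                weights[i] -= lr * error * x
    return weights


# Toy data, invented for illustration only.
samples = [
    SampleNode(features=(1.0, 0.0), parent_bound=10.0, multi_step_bound=12.0),
    SampleNode(features=(0.0, 1.0), parent_bound=10.0, multi_step_bound=10.5),
    SampleNode(features=(1.0, 1.0), parent_bound=8.0, multi_step_bound=10.5),
]
model = train(samples)
```

In practice the model would more plausibly be a neural network over richer node information; the linear scorer only keeps the sketch self-contained.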
The method of claim 30, wherein the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
The method according to claim 30 or 31, characterized in that the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
The method according to claim 30 or 31, characterized in that said training an initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model includes:

training the initial model through reinforcement learning to obtain the node evaluation model, wherein the label corresponding to the sample node is determined based on a first difference and a second difference, the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node, the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
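One way to read this label construction is as a bootstrapped target in the style of temporal-difference learning. The sketch below is only an illustration under stated assumptions: the claim does not say how the two differences are combined or how multiple child nodes are aggregated, so the sum, the `max` over children, the `discount` factor, the sign convention of the first difference, and all names (`LinearModel`, `bootstrapped_label`) are hypothetical.

```python
import copy
from typing import List, Sequence


class LinearModel:
    """Hypothetical stand-in for the node evaluation model."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = weights

    def __call__(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features))


def bootstrapped_label(
    parent_bound: float,
    node_bound: float,
    child_features: Sequence[Sequence[float]],
    target_model: LinearModel,
    discount: float = 1.0,
) -> float:
    # First difference: one-step change in limit value from parent to node
    # (sign convention assumed: node bound minus parent bound).
    first_difference = node_bound - parent_bound
    if not child_features:
        # No children to evaluate, so only the one-step difference remains.
        return first_difference
    # Second difference: target model's prediction for the child nodes.
    # Taking the max over children is an assumption made for this sketch.
    second_difference = max(target_model(f) for f in child_features)
    return first_difference + discount * second_difference


# The target evaluation model shares the node evaluation model's structure;
# here it is simply a frozen copy of it.
node_model = LinearModel([0.5])
target_model = copy.deepcopy(node_model)

label = bootstrapped_label(
    parent_bound=10.0,
    node_bound=12.0,
    child_features=[(1.0,), (3.0,)],
    target_model=target_model,
)
```

Keeping the target model as a separate copy means the labels stay fixed while the node evaluation model is updated, which is a common stabilizing choice in this family of methods.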
The method according to any one of claims 30 to 33, characterized in that the input of the initial model includes relevant information of the sample node or a low-dimensional representation of the relevant information of the sample node, the relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraint condition of the sample node, or the decision variable of the sample node, and the low-dimensional representation of the relevant information of the sample node is obtained by performing dimensionality reduction processing on the relevant information of the sample node through a feature extraction model.
A device for solving a goal programming problem, characterized by comprising a unit or module for executing the method according to any one of claims 1 to 15.
A node selection device, characterized by comprising a unit or module for executing the method according to any one of claims 16 to 29.
A training device for a node evaluation model, characterized by comprising a unit or module for executing the method according to any one of claims 30 to 34.
A computing device cluster, characterized by including at least one computing device, each computing device including a processor and a memory;

the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 29, or the method according to any one of claims 30 to 34.
A computer-readable storage medium, characterized in that it includes computer program instructions, and when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 29, or the method according to any one of claims 30 to 34.
A computer program product containing instructions, characterized in that, when the instructions are executed by a computing device cluster, the instructions cause the computing device cluster to execute the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 29, or the method according to any one of claims 30 to 34.