WO2024001610A1 - Method for solving goal programming problem, node selection method, and apparatus - Google Patents
- Publication number
- WO2024001610A1 (application PCT/CN2023/095590)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- nodes
- sample
- limit value
- target
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
Definitions
- the embodiments of the present application relate to the field of data processing technology, and more specifically, to a method for solving a goal planning problem, a method for selecting nodes, a method for training a node evaluation model, and a device.
- Operations research mainly uses mathematical methods to study optimization approaches and plans for various systems, providing decision-makers with a basis for scientific decision-making.
- Mathematical programming is an important branch of operations research. The main research goal is to find the optimal solution that maximizes or minimizes the objective function in a given area.
- Many practical problems have integer constraints, such as production scheduling, supply chain planning, and factory site selection. Such problems can be modeled as mixed integer programming problems or integer programming problems and solved with tools such as mathematical programming solvers.
- Mathematical programming solvers are mainly implemented based on the branch and bound algorithm.
- The branch-and-bound algorithm is a search-based iterative method that repeatedly divides the solution space of the original problem into smaller and smaller subsets during the iterative calculation. That is, it repeatedly generates sub-problems (also called nodes) of the original problem and obtains the optimal solution to the original problem by continuously solving these sub-problems.
- Embodiments of the present application provide a method for solving a target planning problem, a method for selecting nodes, a method for training a node evaluation model, and a device. This method is conducive to improving the efficiency of solving planning problems.
- The first aspect provides a method for solving the goal planning problem, including: obtaining the goal planning problem uploaded by the user; adjusting the candidate node set of the goal planning problem according to the node evaluation model, where the candidate node set includes multiple nodes, each of the multiple nodes corresponds to a sub-problem to be solved in the goal planning problem, and the node evaluation model is used to predict the correlation quantity of the limit value of each node after multi-step expansion; and solving the goal planning problem based on the adjusted candidate node set to obtain the solution result of the goal planning problem.
- At least part of the decision variables of the goal planning problem are integer variables, that is, at least part of the decision variables have integer values.
- the goal programming problem is a pure integer programming model or a mixed integer programming model.
- The node evaluation model can predict the correlation quantity of a node's limit value after multi-step expansion, which helps predict the optimal solution that can be found from the multiple nodes. In other words, the correlation quantity can be used to measure the long-term value of expanding a node. This makes the selection of target nodes more accurate and helps select appropriate nodes for the corresponding processing, so that the nodes in the adjusted candidate node set are more likely to lead to the optimal solution, which improves solution efficiency.
- the user selects the node evaluation model from a plurality of selectable candidate node evaluation models.
- Adjusting the candidate node set of the target planning problem according to the node evaluation model includes: determining the first target node according to the node evaluation model; generating child nodes of the first target node; and adding the child nodes of the first target node to the candidate node set.
- the first target node may be determined according to the output result of the node evaluation model, and iterative calculation may be performed based on the first target node during the solution process. Specifically, the first target node is expanded to obtain the child nodes of the first target node, and then iterative calculation is performed.
- The output results of the node evaluation model can be used to measure the long-term value of node expansion, which helps judge how likely a node is to reach the optimal solution after multi-step expansion. A first target node determined on this basis is more likely to lead to the optimal solution, which improves convergence speed and solving efficiency.
- the node evaluation model can be used to predict the boundary value of the node after multi-step expansion, which is helpful to judge the possibility of the node getting the optimal solution after multi-step expansion.
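The expansion step described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: `Node`, `expand_best`, `toy_branch`, and the dictionary `predicted_gap` (a stand-in for the output of the node evaluation model) are hypothetical names, a minimization problem is assumed, and the expanded node is removed from the candidate set as in conventional branch and bound.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical sub-problem record (not taken from the application)."""
    name: str
    bound: float  # limit value already known for this node


def expand_best(
    candidates: List[Node],
    evaluate: Callable[[Node], float],
    branch: Callable[[Node], List[Node]],
) -> Tuple[Node, List[Node]]:
    """Pick the node with the smallest predicted value and expand it."""
    # Assumption: for minimization, a smaller predicted multi-step
    # change in the limit value marks a more promising node.
    target = min(candidates, key=evaluate)
    # Assumption: the expanded node leaves the candidate set.
    adjusted = [n for n in candidates if n is not target]
    # The children of the first target node join the candidate set.
    adjusted.extend(branch(target))
    return target, adjusted


# Stand-in for the node evaluation model: fixed predictions per node.
predicted_gap = {"A": 2.5, "B": 0.4, "C": 1.1}
candidates = [Node(name, 10.0) for name in predicted_gap]


def toy_branch(node: Node) -> List[Node]:
    """Hypothetical branching rule that creates two child sub-problems."""
    return [
        Node(node.name + "0", node.bound + 0.4),
        Node(node.name + "1", node.bound + 0.9),
    ]


target, adjusted = expand_best(
    candidates, lambda n: predicted_gap[n.name], toy_branch
)
print(target.name, [n.name for n in adjusted])
```

In a real solver, `evaluate` would call the trained model and `branch` would split the node on a fractional variable; both are replaced by toy functions here.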
- Adjusting the candidate node set of the target planning problem according to the node evaluation model also includes: determining the second target node according to the node evaluation model; and removing the second target node from the candidate node set.
- the second target node can be determined according to the output result of the node evaluation model, and the second target node can be pruned during the solution process.
- the output results of the node evaluation model can measure the long-term value of node expansion, which is helpful to judge the possibility of obtaining the optimal solution after node expansion, and determine the second target node based on this. Pruning nodes that are less likely to obtain the optimal solution can reduce the solution space and avoid the time delay caused by expanding and solving on useless nodes, thus improving the solution efficiency. For example, for the minimum optimization problem, the greater the limit value of the node in multi-step expansion, the smaller the possibility of obtaining the global optimal solution starting from this node.
- The node evaluation model can be used to predict the limit value of a node after multi-step expansion, which helps judge how likely the node is to reach the optimal solution after multi-step expansion. Determining the second target node on this basis avoids the time delay caused by expanding and solving useless nodes.
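The pruning step can be illustrated in the same spirit. The sketch below assumes a minimization problem and assumes that the model output is the predicted change in the limit value after multi-step expansion, so the node with the largest prediction is removed. The function name `prune_worst` and the values in `predicted_gap` are hypothetical.

```python
from typing import Dict, List, Tuple


def prune_worst(
    candidates: List[str], predicted_gap: Dict[str, float]
) -> Tuple[str, List[str]]:
    """Remove the candidate whose predicted limit value grows the most."""
    # For minimization, a larger predicted increase of the limit value
    # means the node is less likely to lead to the global optimum.
    pruned = max(candidates, key=lambda name: predicted_gap[name])
    kept = [name for name in candidates if name != pruned]
    return pruned, kept


# Hypothetical model outputs for three candidate nodes.
predicted_gap = {"A": 2.5, "B": 0.4, "C": 1.1}
pruned, kept = prune_worst(list(predicted_gap), predicted_gap)
print(pruned, kept)
```

This deterministic rule always removes the single worst node, which is the simplest reading of the text; it carries the risk of discarding a node that the model has misjudged.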
- the correlation quantity of the limit value of each node after multi-step expansion includes the limit value of each node after multi-step expansion.
- The correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of that node's parent node.
- the target node can be determined by predicting the differences between the multiple nodes before and after the multi-step expansion, and then comparing the differences between the multiple nodes before and after the multi-step expansion, which is beneficial to improving the accuracy of target node selection.
- The correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of that node's parent node.
- The correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of that node's parent node. This difference is indicated by the function value of the multi-step pseudo-cost function of each node, which satisfies the following formula:
- C(·) represents the multi-step pseudo-cost function of a node.
- c(·) represents the change in the limit value of a node before and after single-step expansion.
- Node N_i is a child node of node P.
- The difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the corresponding difference for each of the other nodes among the plurality of nodes.
- The difference between the limit value of the second target node after multi-step expansion and the limit value of the parent node of the second target node is greater than or equal to the corresponding difference for each of the remaining nodes among the multiple nodes.
- The second target node belongs to k nodes among the plurality of nodes. For each of the k nodes, the difference between its limit value after multi-step expansion and the limit value of its parent node is greater than or equal to the corresponding difference for every node outside the k nodes, where k is an integer greater than 1 and less than the number of the multiple nodes.
- the second target node can be determined probabilistically through the above greedy method, which is beneficial to reducing the risk of the pruning operation.
- The second target node is determined based on the probabilities corresponding to the k nodes, and these probabilities are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of their parent nodes.
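One way to realize this probabilistic selection is sketched below. The text only requires that the probabilities be positively correlated with the differences; the softmax weighting used here is an assumption chosen for illustration, and `pruning_probabilities`, `sample_prune_node`, and the values in `gaps` are hypothetical.

```python
import math
import random
from typing import Dict


def pruning_probabilities(gaps: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k nodes with the largest gaps and weight them by softmax."""
    top = sorted(gaps, key=gaps.get, reverse=True)[:k]
    largest = max(gaps[name] for name in top)
    # Subtracting the largest gap keeps exp() numerically stable.
    weights = {name: math.exp(gaps[name] - largest) for name in top}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_prune_node(gaps: Dict[str, float], k: int, rng: random.Random) -> str:
    """Draw the node to prune from the k worst candidates."""
    probs = pruning_probabilities(gaps, k)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical predicted differences (expanded limit value minus parent's).
gaps = {"A": 0.2, "B": 3.0, "C": 1.0, "D": 2.0}
probs = pruning_probabilities(gaps, k=2)
choice = sample_prune_node(gaps, k=2, rng=random.Random(0))
print(probs, choice)
```

Sampling among the k worst nodes instead of always removing the single worst one spreads the pruning decision, which matches the stated aim of reducing the risk of the pruning operation.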
- the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.
- the node evaluation model is trained based on the sample node and the label corresponding to the sample node, and the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- Because the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion, the label is easier to determine. This makes collecting training data more convenient and improves the efficiency of generating training data, so that a large amount of training data can be obtained and samples are used more efficiently. As a result, the training effect of the node evaluation model improves, that is, the prediction accuracy of the node evaluation model improves.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the label corresponding to the sample node is determined based on the first difference and the second difference.
- The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node.
- the second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing.
- the target evaluation model has the same structure as the node evaluation model.
- the first difference may be determined by a solver.
- the solver is called to obtain the limit value of the parent node of the sample node and the limit value of the sample node, so that the first difference can be determined.
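The text states that the label is determined from the first and second differences but does not say how they are combined. The sketch below assumes a bootstrapped target in which the label is the first difference plus the smallest prediction of the target evaluation model over the child nodes; the sum, the use of `min`, the sign convention, and the names `multi_step_label` and `target_predictions` are all assumptions.

```python
from typing import Callable, Iterable


def multi_step_label(
    parent_bound: float,
    node_bound: float,
    children: Iterable[str],
    target_model: Callable[[str], float],
) -> float:
    """Combine the solver-based and model-based differences into a label."""
    # First difference: obtained from limit values returned by a solver.
    # Assumed sign convention: node limit value minus parent limit value.
    first_difference = node_bound - parent_bound
    # Second difference: target evaluation model applied to the children.
    # Assumption: take the smallest prediction (minimization problem),
    # and use 0.0 when the node has no children.
    second_difference = min((target_model(c) for c in children), default=0.0)
    return first_difference + second_difference


# Hypothetical outputs of the target evaluation model for two children.
target_predictions = {"L": 1.25, "R": 0.75}
label = multi_step_label(10.0, 10.5, ["L", "R"], target_predictions.__getitem__)
leaf_label = multi_step_label(10.0, 10.5, [], target_predictions.__getitem__)
print(label, leaf_label)
```

Under these assumptions only one solver call per sample node is needed, since the multi-step part of the label comes from the target evaluation model.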
- the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
- the label corresponding to the sample node is easier to determine.
- the labels corresponding to sample nodes can be determined in real time, and the calculation of the relaxed solution is more convenient. It is more efficient to determine the labels corresponding to the sample nodes based on the relaxed solution, which is beneficial to improving training efficiency.
- The input of the node evaluation model includes relevant information of each node or a low-dimensional representation of that information. The relevant information of each node includes at least one of the following items: the objective function of each node, the constraints of each node, or the decision variables of each node. The low-dimensional representation is obtained by reducing the dimensionality of the relevant information of each node through the feature extraction model.
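A rough picture of how node information could be turned into a model input is given below. The feature extraction model is not specified at this point in the text, so a fixed random linear projection is used purely as a stand-in; `flatten_node`, `make_projection`, `reduce_dim`, and the example coefficients are hypothetical.

```python
import random
from typing import List, Sequence


def flatten_node(
    objective: Sequence[float],
    constraints: Sequence[Sequence[float]],
    rhs: Sequence[float],
) -> List[float]:
    """Concatenate objective, constraint rows, and right-hand sides."""
    features = list(objective)
    for row in constraints:
        features.extend(row)
    features.extend(rhs)
    return features


def make_projection(in_dim: int, out_dim: int, seed: int = 0) -> List[List[float]]:
    """Fixed random matrix standing in for a learned feature extractor."""
    rng = random.Random(seed)
    scale = out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) / scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def reduce_dim(features: Sequence[float], projection: List[List[float]]) -> List[float]:
    """Map the raw feature vector to its low-dimensional representation."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]


# Hypothetical sub-problem with 3 variables and 2 constraints.
raw = flatten_node([1.0, 2.0, 3.0], [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]], [4.0, 5.0])
projection = make_projection(len(raw), 4)
low = reduce_dim(raw, projection)
print(len(raw), len(low))
```

A learned extractor would replace `make_projection`; the point of the sketch is only that sub-problems of different sizes need a fixed-length representation before they reach the node evaluation model.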
- the method further includes: returning a solution result of the goal planning problem to the user.
- A second aspect provides a method for selecting nodes, including: obtaining a candidate node set of a target planning problem, where the candidate node set includes multiple nodes and each of the multiple nodes corresponds to a sub-problem to be solved in the target planning problem. The output result of the node evaluation model is used to determine the target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the target planning problem.
- the method further includes: determining a node evaluation model according to user instructions, and the node evaluation model is deployed on the cloud management platform.
- the output result of the node evaluation model is used to determine the first target node, and the adjusted candidate node set includes child nodes of the first target node.
- the output result of the node evaluation model is used to determine the second target node, and the second target node is not included in the adjusted candidate node set.
- The correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of that node's parent node.
- The correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of that node's parent node.
- The difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the corresponding difference for each of the other nodes among the multiple nodes.
- The second target node belongs to k nodes among the plurality of nodes. For each of the k nodes, the difference between its limit value after multi-step expansion and the limit value of its parent node is greater than or equal to the corresponding difference for every node outside the k nodes, where k is an integer greater than 1 and less than the number of the multiple nodes.
- The second target node is determined based on the probabilities corresponding to the k nodes, and these probabilities are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of their parent nodes.
- The node evaluation model is trained based on the sample node and the label corresponding to the sample node, and the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the label corresponding to the sample node is determined based on the first difference and the second difference.
- The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node.
- the second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing.
- the target evaluation model has the same structure as the node evaluation model.
- The input of the node evaluation model includes relevant information of each node or a low-dimensional representation of that information. The relevant information of each node includes at least one of the following items: the objective function of each node, the constraints of each node, or the decision variables of each node. The low-dimensional representation is obtained by reducing the dimensionality of the relevant information of each node through the feature extraction model.
- A third aspect provides a training method for a node evaluation model. The node evaluation model is used to predict the correlation quantity of the limit value of each node in the candidate node set of the target planning problem after multi-step expansion, and each node corresponds to a sub-problem to be solved in the target planning problem. The output result of the node evaluation model is used to determine the target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the target planning problem.
- The training method includes: obtaining the sample node and the label corresponding to the sample node, where the label is related to the limit value of the sample node after multi-step expansion; and training the initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model.
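The training step can be sketched as supervised regression on pairs of sample-node features and labels. The figure list mentions a fully connected neural network; the linear model below is a deliberately simple stand-in, and the function names, feature vectors, and labels are hypothetical (the labels are generated from a known linear rule so that the result can be checked).

```python
from typing import List, Sequence, Tuple


def train_linear_model(
    samples: Sequence[Sequence[float]],
    labels: Sequence[float],
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on squared error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Evaluate the trained stand-in model on one feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical sample-node features; labels follow 2*x0 - x1 + 0.5.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
labels = [2.0 * a - b + 0.5 for a, b in samples]
weights, bias = train_linear_model(samples, labels)
prediction = predict(weights, bias, [3.0, 2.0])
print(weights, bias, prediction)
```

With real data, the feature vectors would come from the node information or its low-dimensional representation, and the labels from the multi-step limit values of the sample nodes.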
- Because the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion, the label is easier to determine. This makes collecting training data more convenient and improves the efficiency of generating training data, so that a large amount of training data can be obtained and samples are used more efficiently. As a result, the training effect of the node evaluation model improves, that is, the prediction accuracy of the node evaluation model improves.
- the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the label corresponding to the sample node is easier to determine.
- the labels corresponding to sample nodes can be determined in real time, and the calculation of the relaxed solution is more convenient. It is more efficient to determine the labels corresponding to the sample nodes based on the relaxed solution, which is beneficial to improving training efficiency.
- The input of the initial model includes relevant information of the sample node or a low-dimensional representation of that information. The relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node. The low-dimensional representation is obtained by reducing the dimensionality of the relevant information of the sample node through the feature extraction model.
- a fourth aspect provides a device for solving a goal planning problem, which device includes a unit for executing the method of the above-mentioned first aspect and any implementation of the first aspect.
- A fifth aspect provides a device for selecting a node, which includes a unit for executing the method of the above second aspect and any implementation of the second aspect.
- A sixth aspect provides a training device for a node evaluation model, which includes a unit for executing the method of the above third aspect and any implementation of the third aspect.
- a seventh aspect provides a chip that obtains instructions and executes the instructions to implement the method in any one of the above-mentioned implementations of the first to third aspects.
- the chip includes a processor and a data interface.
- The processor reads instructions stored in the memory through the data interface and executes the method in any one of the implementations of the first to third aspects.
- The chip may also include a memory that stores instructions. The processor is used to execute the instructions stored in the memory, and when the instructions are executed, the processor executes the method in any one of the implementations of the first to third aspects.
- a computing device cluster including at least one computing device, each computing device including a processor and a memory.
- the processor of at least one computing device is configured to execute instructions stored in the memory of at least one computing device, so that the computing device cluster executes the method in any one implementation of the first to third aspects.
- A computer-readable medium is provided, including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method in any implementation of the first to third aspects.
- A computer program product containing instructions is provided. When the instructions are run by a computing device cluster, the computing device cluster executes the method in any one of the above implementations of the first to third aspects.
- Figure 1 is a schematic block diagram of a device for solving planning problems based on the branch and bound method
- Figure 2 is a schematic flow chart of a node selection method according to an embodiment of the present application.
- Figure 3 is a schematic flow chart of a method for training a node evaluation model according to an embodiment of the present application
- Figure 4 is a schematic flow chart of a method for solving a planning problem according to an embodiment of the present application
- Figure 5 is a schematic flowchart of a dimensionality reduction process according to an embodiment of the present application.
- Figure 6 is a schematic diagram of a fully connected neural network model according to an embodiment of the present application.
- Figure 7 is a schematic diagram of a pruning node selection process according to an embodiment of the present application.
- Figure 8 is a schematic flow chart of another method for training a node evaluation model according to an embodiment of the present application.
- Figure 9 is a schematic flow chart of yet another method for training a node evaluation model according to an embodiment of the present application.
- Figure 10 is a schematic diagram of a node expansion process according to an embodiment of the present application.
- Figure 11 is a schematic diagram of an interaction form between a user and an AI basic development platform according to an embodiment of the present application
- Figure 12 is a schematic diagram of an AI model deployment according to an embodiment of the present application.
- Figure 13 is a schematic diagram of an AI model providing online services according to an embodiment of the present application.
- Figure 14 is a schematic block diagram of a device for selecting nodes according to an embodiment of the present application.
- Figure 15 is a schematic block diagram of a training device for a node evaluation model according to an embodiment of the present application.
- Figure 16 is a schematic block diagram of a device for solving a goal planning problem according to an embodiment of the present application.
- Figure 17 is an architectural schematic diagram of a computing device provided by an embodiment of the present application.
- Figure 18 is a schematic architectural diagram of a computing device cluster provided by an embodiment of the present application.
- Figure 19 is a schematic diagram of the connection between computing devices through a network provided by an embodiment of the present application.
- the methods in the embodiments of this application can be applied to various fields such as supply chain, finance, energy, transportation, communications, and power systems.
- The solutions of the embodiments of the present application can be applied to scenarios that involve solving combinatorial optimization problems with integer variables.
- For example, the solutions can be applied to scenarios such as production scheduling, factory location selection, risk control, asset allocation, oil pipeline laying, logistics transportation, route optimization, and power grid layout and distribution.
- Operations optimization mainly studies the use and planning of various resources, under certain constraints, in order to maximize the benefits of limited resources, achieve the overall optimal goal, and provide decision-makers with the basis for scientific decision-making.
- Mathematical programming is a branch of operations research. Its main research goal is to find the optimal solution that maximizes or minimizes the value of a given function in a given region. Depending on the nature of the problem and the processing method, mathematical programming can be divided into many branches, such as linear programming, integer programming, nonlinear programming, combinatorial optimization, multi-objective programming, stochastic programming, dynamic programming, and parametric programming.
- Linear programming can be divided into two parts: objective function and constraints. When these two parts of a linear scale model When both are linear, the model can be called a linear programming model. In other words, linear programming studies the extreme value problem of a linear objective function under linear constraints.
- Integer programming refers to a linear programming problem where integer variables exist among the decision variables. If all decision variables in an integer programming model are integer variables, the model can also be called a pure integer programming model.
- the corresponding programming problem when the constraint that the decision variable is an integer variable is not considered can be called the relaxation problem corresponding to the integer programming problem.
- the integer programming problem can be converted into a relaxed linear programming problem, that is, the relaxation problem corresponding to the integer programming problem.
- the solution obtained by solving this linear programming problem is the relaxed solution of the integer programming problem.
- Mixed integer programming refers to a linear programming problem in which some of the decision variables are restricted to integers.
- $f(x)$ is the objective function.
- $A_{11}x_1 + A_{13}x_3 \le b_1$, $A_{21}x_1 + A_{22}x_2 \le b_2$, and $x_2 \in \mathbb{Z}$ are all constraints.
- $x_1$, $x_2$, and $x_3$ are the decision variables.
- $d_1$, $d_2$, $d_3$, $A_{11}$, $A_{13}$, $A_{21}$, $A_{22}$, $b_1$, and $b_2$ are parameters.
- $\mathbb{Z}$ represents the set of integers.
- some decision variables are integer variables. If $x_2 \in \mathbb{Z}$ in the above constraints is replaced by $x \in \mathbb{Z}$, the model is a pure integer programming model.
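- As a minimal sketch of how a model of this form could be represented, the following Python snippet stores the parameters and checks a candidate solution against the constraints. The class name, the numeric parameter values, the linear form of the objective, and the `<=` direction of the linear constraints are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MixedIntegerModel:
    """Hypothetical container for the parameters d, A and b described above."""
    d: tuple          # objective coefficients (d1, d2, d3)
    A11: float
    A13: float
    A21: float
    A22: float
    b1: float
    b2: float

    def objective(self, x):
        # Assumed linear objective: f(x) = d1*x1 + d2*x2 + d3*x3.
        return sum(di * xi for di, xi in zip(self.d, x))

    def is_feasible(self, x, relaxed=False):
        x1, x2, x3 = x
        # Linear constraints (the <= direction is an assumption).
        linear_ok = (self.A11 * x1 + self.A13 * x3 <= self.b1
                     and self.A21 * x1 + self.A22 * x2 <= self.b2)
        # Integrality of x2 is ignored in the relaxation problem.
        integer_ok = relaxed or float(x2).is_integer()
        return linear_ok and integer_ok


model = MixedIntegerModel(d=(1.0, 2.0, 3.0), A11=1.0, A13=1.0,
                          A21=1.0, A22=2.0, b1=4.0, b2=5.0)
```

Calling `model.is_feasible(x, relaxed=True)` drops the integrality requirement on the second variable, which corresponds to checking the point against the relaxation problem rather than the original problem.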
- a question can also be called a node.
- the original problem to be solved can be regarded as the root node.
- the process of branching is the process of continuously generating sub-problems of the original problem, that is, the process of continuously adding nodes. Bounding refers to checking the upper and lower bounds of the sub-problems during the branching process. If a sub-problem cannot produce a better solution than the current optimal solution, the sub-problem can be pruned. Such a sub-problem can be called a pruned node. The algorithm ends when no sub-problem can produce a better solution.
- This node may be called an expansion node or a search node.
- the pseudo cost function can be used to predict the lower bound value of each node after single-step expansion, and select the node with the smallest lower bound value as the search node.
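- The single-step selection rule above can be sketched as follows. The predicted lower bound values are hypothetical inputs; how the pseudo cost function produces them is not shown here.

```python
def select_search_node(predicted_lower_bounds):
    """Return the id of the node whose predicted lower bound is smallest.

    predicted_lower_bounds maps a node id to the lower bound predicted
    for that node after single-step expansion (hypothetical values).
    """
    return min(predicted_lower_bounds, key=predicted_lower_bounds.get)


bounds = {"N1": 12.5, "N2": 10.0, "N3": 11.2}
chosen = select_search_node(bounds)
```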
- the following takes integer programming as an example to illustrate the specific processing process of the branch and bound algorithm. That is, the original problem to be solved is an integer programming problem.
- if the relaxed solution is an integer solution, the relaxed solution is the optimal solution to the original problem. If the relaxed solution is not an integer solution, the function value of the objective function corresponding to the optimal solution of the original problem will not be better than the function value of the objective function corresponding to the relaxed solution.
- the function value of the objective function corresponding to the relaxed solution can be used as a bound of the original problem. For a minimization problem, that is, when the goal of the original problem is to minimize the function value of an objective function, the function value of the objective function corresponding to the relaxed solution is a lower bound of the original problem. For a maximization problem, that is, when the goal of the original problem is to maximize the function value of an objective function, the function value of the objective function corresponding to the relaxed solution is an upper bound of the original problem.
- the minimum function value is used as the current lower bound, and among the function values of the objective function corresponding to the feasible solutions of the original problem found so far, the smallest function value is taken as the current upper bound. If the function value of the objective function corresponding to the relaxed solution of a sub-problem is greater than the current upper bound, the sub-problem will no longer be branched. Although a feasible solution to this sub-problem may not have been found yet, continuing to branch on this node, that is, adding more constraints, cannot yield a solution better than the relaxed solution of this node, so there is no need to keep branching.
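- The branching, bounding and pruning steps described above can be illustrated with a small self-contained sketch. To avoid depending on a linear programming solver, the example uses a hypothetical 0/1 covering problem (minimize the total cost of selected items so that their total weight reaches a demand), whose relaxation can be solved greedily. The problem, the function names and the data are assumptions for illustration and are not taken from the embodiments.

```python
def relaxed_bound(costs, weights, demand, fixed):
    """Solve the relaxation of a node: unfixed items may be taken fractionally.

    Returns (lower_bound, fractional_item); fractional_item is None when the
    relaxed solution is already integral. Returns (None, None) if infeasible.
    """
    cost = sum(costs[i] for i, v in fixed.items() if v == 1)
    remaining = demand - sum(weights[i] for i, v in fixed.items() if v == 1)
    free = sorted((i for i in range(len(costs)) if i not in fixed),
                  key=lambda i: costs[i] / weights[i])
    fractional = None
    for i in free:
        if remaining <= 0:
            break
        if weights[i] <= remaining:
            cost += costs[i]
            remaining -= weights[i]
        else:
            cost += costs[i] * remaining / weights[i]
            remaining = 0
            fractional = i
    if remaining > 0:
        return None, None
    return cost, fractional


def branch_and_bound(costs, weights, demand):
    upper_bound = float("inf")   # value of the best feasible solution so far
    live_nodes = [{}]            # root node: no variable fixed yet
    while live_nodes:
        fixed = live_nodes.pop()
        bound, fractional = relaxed_bound(costs, weights, demand, fixed)
        # Prune: infeasible, or the relaxed value cannot beat the upper bound.
        if bound is None or bound >= upper_bound:
            continue
        if fractional is None:
            upper_bound = bound  # integral relaxed solution: new incumbent
            continue
        # Branch: add the constraint x_i = 0 or x_i = 1 to create child nodes.
        live_nodes.append({**fixed, fractional: 0})
        live_nodes.append({**fixed, fractional: 1})
    return upper_bound


best = branch_and_bound(costs=[4, 3, 5], weights=[3, 2, 4], demand=5)
```

In this sketch a node is also pruned when its relaxed value equals the current upper bound, since such a node cannot produce a strictly better solution.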
- $s = 1, 2, \ldots, n$, where $n$ is a natural number greater than 1
- $W_s$ is the weight of $x_s$
- $b$ is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to perform nonlinear transformation on the features obtained in the neural network and convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
- the activation function can be a sigmoid function.
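- A minimal sketch of a single neural unit with the symbols defined above, computing $f(\sum_s W_s x_s + b)$ with a sigmoid activation. The input values and weights are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x, w, b, f=sigmoid):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    return f(sum(ws * xs for ws, xs in zip(w, x)) + b)


out = neural_unit(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
```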
- a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
- the local receptive field can be an area composed of several neural units.
- loss function
- objective function
- GNN is a neural network structure that takes graph structure data as input, and is usually used for deep learning tasks where the input features are graph structures.
- Reinforcement learning is mainly used to solve sequential decision-making problems. Reinforcement learning is a process that continuously learns optimal strategies, makes sequence decisions, and obtains maximum returns through the interaction between an agent and the environment.
- Agent: used to learn the next appropriate action based on the state and reward fed back by the environment, so as to maximize the long-term total return.
- Environment: used to receive the actions performed by the agent, evaluate the actions, and convert them into rewards that are fed back to the agent.
- the rewards include positive rewards and negative rewards.
- the reinforcement learning system also has several core elements: policy, reward function, and value function.
- Strategy (policy): a mapping from state to action. The strategy defines how the agent chooses the action to be performed in the next step.
- Reward function: a function used to evaluate the actions performed by the agent and calculate the reward value of those actions.
- Value function: a function used to predict the long-term reward value of a state or action.
- the value of the value function can be expressed as the weighted accumulation of the reward values of multiple reward functions in multiple future states starting from one state.
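- As an illustration of such a weighted accumulation, the sketch below computes a discounted sum of future reward values. The exponential weighting with a discount factor `gamma` is a common choice in reinforcement learning and is an assumption here; the text above only states that the accumulation is weighted.

```python
def discounted_return(rewards, gamma=0.9):
    """Weighted sum of future rewards; the reward at step k has weight gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


value = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
```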
- Action space is the set of all possible actions.
- State space is the set of all possible states.
- the agent chooses an action to perform based on a certain strategy. After executing this action, the environment will change, the state of the environment will be converted to a new state, and the environment can evaluate the action and feedback the reward value corresponding to the action to the agent.
- the agent can adjust the strategy based on the reward value and repeatedly execute the above process so that the sum of reward values after all actions are executed is maximized.
- DQL is a typical reinforcement learning algorithm suitable for discrete action sequence decision-making problems. DQL can help select optimal actions by estimating the long-term cumulative return (Q function) of each action.
- the Q function, Q(S,A) refers to the sum of reward values that will be obtained in the future after taking action A in state S, that is, the long-term cumulative return of action A.
- the Q value corresponding to the action can provide a reference for the strategy.
- the Q value can be calculated by a deep neural network (DNN).
- the current state of the environment is input into the DNN, and the DNN predicts the Q value obtained by executing each action in this state.
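- The use of Q values can be sketched as follows. A plain dictionary stands in for the output of the DNN, and the one-step target shown is the standard Q-learning target; the function names and the discount factor are assumptions for illustration.

```python
def greedy_action(q_values):
    """Pick the action with the largest predicted Q value."""
    return max(q_values, key=q_values.get)


def q_learning_target(reward, next_q_values, gamma=0.99):
    """One-step target: immediate reward plus discounted best future Q value."""
    best_next = max(next_q_values.values()) if next_q_values else 0.0
    return reward + gamma * best_next


action = greedy_action({"left": 0.2, "right": 0.7})
target = q_learning_target(1.0, {"a": 2.0, "b": 3.0}, gamma=0.5)
```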
- Figure 1 shows a schematic diagram of a device for solving planning problems based on the branch and bound method.
- the solving device can include a preprocessing (presolving) module, a node selection module, a node preprocessing (node presolving) module, a linear programming relaxation (LP relaxation) module, a heuristic module, a branch module, and a cutting plane module.
- the preprocessing module is used to preprocess the original problem to simplify the original problem and reduce the scale of the original problem.
- preprocessing may include removing redundant constraints and decision variables.
- the node selection module is used to select search nodes.
- the node selection module can determine the search node from the current node to be solved, so that the branch module can subsequently branch based on the search node.
- the node selection module can determine the pruned node from the current nodes and no longer consider the node in subsequent solution processes.
- the node preprocessing module is used to simplify the constraints on the variables in the search nodes determined by the node selection module.
- the linear programming relaxation module is used to construct the relaxation model and solve the relaxation solution.
- the heuristic module is used to search for higher quality solutions to the search node using a heuristic algorithm starting from the relaxed solution.
- the branch module is used to branch the search node, that is, add constraints, obtain the child nodes of the search node, and return them to the node selection module for the node selection module to perform the next round of node selection.
- the cutting plane module is used to add multivariable constraints based on the cutting plane method to remove relaxed solutions that do not satisfy the multivariable constraints.
- the cutting plane module can generate a series of linear constraints based on the relaxed solution, and select a part of the linear constraints to add to the original problem to reduce the feasible solution domain.
- the solving device shown in Figure 1 is only an example, and in actual applications, the solving device may include more or fewer modules.
- the cutting plane module may not be included in the solving device.
- the solving device may not include a heuristic module.
- planning problems have integer constraints, such as factory location or production scheduling.
- Such problems can be modeled as mixed integer programming problems and solved by integer programming solvers.
- Integer programming solvers are usually implemented based on the branch-and-bound framework. Specifically, during the iterative calculation process, the solution space of the original problem is repeatedly divided into smaller and smaller subsets, that is, sub-problems (also called nodes) of the original problem are repeatedly generated, and the optimal solution of the original problem is obtained by continuously solving the sub-problems.
- For complex problems, for example, problems with a large number of decision variables, a large number of nodes will be generated during the solution process, and the solution will take a long time, making it difficult to meet the user's needs.
- Selecting appropriate nodes for corresponding processing is the key to improving the speed of solving. For example, by selecting appropriate nodes for pruning, the number of nodes to be solved can be reduced, which is beneficial to improving the solving speed. For another example, by selecting appropriate nodes for branch processing, it is helpful to find the optimal solution as soon as possible, which is beneficial to improving the solution speed.
- the embodiments of this application provide a method for selecting nodes, which can be used in the solution scenario of planning problems and is beneficial to improving the solution efficiency.
- the node selection method in the embodiment of the present application can be applied to scenarios where the branch and bound method is used to solve planning problems.
- Figure 2 shows a schematic flowchart of a node selection method provided by an embodiment of the present application.
- the method 200 shown in FIG. 2 may be performed by a device that selects a node.
- the device for selecting nodes and the solver may be two separately deployed devices, or they may be integrated in the same device (for example, a solving device).
- the embodiments of the present application do not limit this.
- the solver is implemented based on the branch and bound algorithm framework.
- the node selection method in the embodiment of the present application can be applied to the node selection module shown in Figure 1.
- the device for selecting nodes in this embodiment of the present application may be the node selection module shown in Figure 1.
- the node selection module shown in Figure 1 can use the node selection method in the embodiment of the present application to determine appropriate nodes.
- the method 200 includes steps 210 to 220. Steps 210 to 220 are described below. Solving the planning problem based on the branch-and-bound method is an iterative solution process, and steps 210 to 220 can be performed as the steps of one iteration.
- the candidate node set includes multiple nodes.
- Each node in the plurality of nodes respectively corresponds to a sub-problem to be solved of the goal programming problem.
- sub-problems correspond to nodes, and nodes can be understood as sub-problems, or nodes can also be called branches or branch nodes, which will not be distinguished later.
- the goal programming problem is the mathematical programming problem to be solved.
- the goal programming problem can be represented by the objective function, constraints and decision variables of the goal programming problem. Constraints are used to constrain decision variables. At least part of the decision variables of the goal programming problem are integer variables, that is, at least part of the decision variables take integer values. In other words, the goal programming problem is a pure integer programming model or a mixed integer programming model.
- the goal programming problem may be a maximum optimization problem.
- the optimal solution to a goal programming problem is the solution that maximizes the function value of the objective function of the goal programming problem.
- the goal programming problem can be a minimum optimization problem.
- the optimal solution to a goal programming problem is the solution that minimizes the function value of the objective function of the goal programming problem.
- the objective function can be to minimize logistics scheduling costs, and the constraints can be that the distribution point needs to complete delivery within a specified period of time.
- the decision variables can include couriers, time and location, etc.
- the subproblems of the goal programming problem are generated based on the branch and bound method.
- the multiple sub-problems to be solved may be generated in one iteration process during the process of solving the goal programming problem, or may be generated in multiple iteration processes.
- the multiple sub-problems to be solved can also be called multiple live nodes. That is, the nodes in the candidate node set are all live nodes. Live nodes refer to nodes that have not yet been pruned.
- the device for selecting nodes and the solver may be deployed separately.
- the goal programming problem can be obtained by the solver, which generates the sub-problems of the goal programming problem based on the branch and bound method and sends them to the device for selecting nodes.
- the device for selecting nodes and the solver may be integrated in the solving device.
- the goal programming problem can be obtained by the solving device, and the sub-problems of the goal programming problem can be generated based on the branch and bound method.
- the goal programming problem may be data provided by the user.
- the set of candidate nodes for the goal programming problem may be data provided by the user.
- the device for selecting nodes may receive a set of candidate nodes for the goal programming problem provided by the user.
- the node evaluation model may be determined according to user instructions.
- the node evaluation model can be deployed on a cloud management platform.
- the output result of the node evaluation model is used to determine the target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem.
- step 220 may include: predicting, through a node evaluation model, the related quantity of the limit value of each node in the candidate node set after multi-step expansion.
- the node evaluation model is used to predict the related quantity of the limit value of all nodes in the candidate node set after multi-step expansion.
- the embodiments of this application mainly take all nodes as an example, that is, processing each node through the node evaluation model as an example, which does not limit the solutions of the embodiments of this application.
- the input to the node evaluation model may include node-related information.
- the input of the node evaluation model may include relevant information of each node.
- the relevant information of the node includes at least one of the following: the objective function of the node, the constraint condition of the node or the decision variable of the node.
- the relevant information of the node may include the objective function of the node, the constraint conditions of the node and the decision variables of the node. Input the node's objective function, node's constraints and node's decision variables into the node evaluation model, and the relevant quantities of the node's limit value after multi-step expansion can be output.
- the output result of the node evaluation model can be used as the evaluation information of the node.
- the evaluation information of a node is related to the node's limit value after multi-step expansion.
- the evaluation information of the node may be used to indicate the node evaluation model's prediction of the related quantity of the node's limit value after multi-step expansion.
- step 220 may include: determining evaluation information of the plurality of nodes through a node evaluation model.
- the evaluation information of the multiple nodes is used to determine the target node from the candidate node set, and the target node is used to adjust the candidate node set.
- Expanding a node is to branch a sub-problem to obtain a new sub-problem.
- the limit value of a node after multi-step expansion is the limit value of the sub-problem after multi-step branching.
- the limit value of a node after multi-step expansion can be understood as the limit value of the child node obtained after multi-step expansion of the node.
- the node evaluation model is used to predict the relevant quantity of the boundary value of the node after multi-step expansion. During the processing of the node evaluation model, a multi-step expansion operation is not performed on the nodes.
- the evaluation information of the node can be used to indicate the prediction of the relevant quantity of the node's limit value after multi-step expansion, and can be used to measure the long-term value of the node expansion.
- evaluation information of the multiple nodes is related to the limit values of the multiple nodes after multi-step expansion.
- the number of expansion steps of different nodes may be the same or different.
- the evaluation information of the multiple nodes is related to the limit values of the multiple nodes after being completely solved.
- the number of steps required for different nodes to be expanded to complete solution may be the same or different.
- the limit values of the multiple nodes after the multi-step expansion include the function values of the objective functions corresponding to the relaxed solutions of the multiple nodes after the multi-step expansion.
- the function value of the objective function corresponding to the relaxation solution of a node can be the lower bound of the node.
- the limit values of the multiple nodes after multi-step expansion may be the lower bound values of the multiple nodes after multi-step expansion. That is, the node evaluation model can be used to predict quantities related to the lower bound of a node after multi-step expansion.
- the related quantity of the limit value of the node after multi-step expansion includes the limit value of the node after multi-step expansion.
- the evaluation information of the multiple nodes may be used to indicate the prediction of the limit values of the multiple nodes after multi-step expansion.
- the node evaluation model can output the limit values of the multiple nodes after multi-step expansion, that is, the node evaluation model predicts the limit values of the multiple nodes after multi-step expansion.
- the outputs of the node evaluation model are all predicted values.
- the limit value can be a lower bound value.
- the related quantity of the limit value of the node after multi-step expansion includes the difference between the limit value of the node after multi-step expansion and the limit value of the node's parent node.
- the evaluation information of the multiple nodes may be used to indicate the prediction of the difference between the limit values of the multiple nodes after multi-step expansion and the limit values of the parent nodes of the multiple nodes.
- the node evaluation model can output the difference between the limit value of each of the multiple nodes after multi-step expansion and the limit value of that node's parent node, that is, the node evaluation model predicts this difference for the multiple nodes.
- the difference between the limit value of a node after multi-step expansion and the limit value of the node's parent node is the change in the limit value of the node before and after multi-step expansion.
- the difference between the limit value of a node after multi-step expansion and the limit value of the node's parent node may be an arithmetic difference, that is, the result of subtracting one of the two limit values from the other.
- alternatively, the difference between the limit value of a node after multi-step expansion and the limit value of the node's parent node may be a ratio, that is, the result obtained by dividing the limit value of the node after multi-step expansion by the limit value of the node's parent node.
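- The two ways of expressing the change in the limit value can be sketched as follows. The function name, the `mode` parameter and the order of subtraction (expanded value minus parent value) are assumptions for illustration.

```python
def bound_change(expanded_bound, parent_bound, mode="difference"):
    """Change of the limit value relative to the parent node's limit value."""
    if mode == "difference":
        # Subtraction form (order of operands is an assumption).
        return expanded_bound - parent_bound
    if mode == "ratio":
        # Division form: expanded limit value divided by parent limit value.
        return expanded_bound / parent_bound
    raise ValueError(f"unknown mode: {mode}")


delta = bound_change(12.0, 10.0)
ratio = bound_change(12.0, 10.0, mode="ratio")
```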
- the related quantity of the node's limit value after multi-step expansion includes the difference between the node's limit value after being completely solved and the limit value of the node's parent node.
- the evaluation information of the multiple nodes may be used to indicate the difference between the limit values of the multiple nodes after they are completely solved and the limit values of the parent nodes of the multiple nodes.
- the evaluation information of a node can be used to indicate the change of the limit value in the process of the node being expanded to being completely solved.
- the change of the limit value in the process from the node being expanded to being completely solved can be represented by the function value of the multi-step pseudo-cost function of the node.
- the difference between the limit value of a node after it is completely solved and the limit value of the node's parent node can be determined based on the change of the limit value before and after single-step expansion of the node and the change of the limit value of the node's child nodes from expansion to complete solution.
- the function value of the node's multi-step pseudo-cost function can be determined based on the changes in the limit value before and after the node's single-step expansion and the function values of the multi-step pseudo-cost function of the node's child nodes.
- the function value of a node's multi-step pseudo-cost function can be the difference between the bound value of the node after it is completely solved and the bound value of the node's parent node.
- the function value of the node's multi-step pseudo-cost function can be the sum of the difference between the limit values before and after the node's single-step expansion and the function value of the multi-step pseudo-cost function of the node's child nodes.
- the node evaluation model is used to predict the function value of the multi-step pseudo-cost function of the node.
- the change of the limit value of a node before and after single-step expansion is the difference between the limit value of the node's parent node and the limit value of the node.
- the change of the limit value of the node's child node from expansion to complete solution, that is, the function value of the multi-step pseudo-cost function of the node's child node, is the difference between the limit value of the node's child node after being completely solved and the limit value of the node.
- when the node has multiple child nodes, the function value of the node's multi-step pseudo-cost function may be determined based on the change of the limit value before and after single-step expansion of the node and the minimum value of the function values of the multi-step pseudo-cost functions of the multiple child nodes.
- the function value of the multi-step pseudo-cost function of the node may be determined based on the change of the node before and after single-step expansion and the maximum value of the function values of the multi-step pseudo-cost function of the multiple child nodes.
- the function value of the multi-step pseudo-cost function of the node may be determined based on the changes before and after the single-step expansion of the node and the average value of the function values of the multi-step pseudo-cost function of the multiple child nodes. It should be understood that the above is only an example. When the node includes multiple child nodes, the function value of the multi-step pseudo-cost function of the node can also be determined in other ways, which is not limited in the embodiment of the present application.
- the change in the limit value of a node from expansion to complete solution can be represented by the function value of the node's multi-step pseudo-cost function.
- a multi-step pseudo-cost function can be used to measure the long-term value of node expansion.
- the node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function.
- the multi-step pseudo-cost function of a node can satisfy the following formula:
- $C(P) = c(P) + \min_{N_i} C(N_i)$
- $C(\cdot)$ represents the multi-step pseudo-cost function of the node.
- $c(\cdot)$ represents the change in the limit value of the node before and after single-step expansion, that is, the first difference.
- Node Ni is a child node of node P.
- the second term in the above formula is the minimum value of the function value of the multi-step pseudo-cost function in the child nodes of node P, which is the second difference.
- the multi-step pseudo-cost function of node P can be understood as the change in the limit value from the expansion of node P to the complete solution of node P. It should be understood that the above formula is only an example and does not limit the multi-step pseudo cost function of the embodiment of the present application.
- the second term (ie, the second difference) in the above formula is the minimum difference between the limit value of the child node of node P after being completely solved and the limit value of node P.
- the second term in the above formula can also be the maximum difference between the limit value of the child node of node P after being completely solved and the limit value of node P, that is, $\max_{N_i} C(N_i)$.
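- The recursive structure of the multi-step pseudo-cost function can be sketched on a small search tree whose limit values are already known. The `TreeNode` class, the treatment of leaf nodes as completely solved, and the sign convention (node value minus parent value) are assumptions for illustration; the `aggregate` argument selects the minimum, maximum or average variant described above.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TreeNode:
    bound: float                                  # limit value (lower bound)
    children: list = field(default_factory=list)  # child nodes after branching


def multi_step_pseudo_cost(node, parent_bound, aggregate=min):
    """C(P) = c(P) + aggregate of C(N_i) over the child nodes N_i of P."""
    single_step = node.bound - parent_bound       # c(P), the first difference
    if not node.children:                         # leaf: treated as fully solved
        return single_step
    child_costs = [multi_step_pseudo_cost(child, node.bound, aggregate)
                   for child in node.children]
    return single_step + aggregate(child_costs)   # second term of the formula


# Node P (bound 12) below a parent with bound 10, with two child nodes.
n1 = TreeNode(bound=13.0)
n2 = TreeNode(bound=15.0, children=[TreeNode(bound=16.0)])
p = TreeNode(bound=12.0, children=[n1, n2])

cost_min = multi_step_pseudo_cost(p, parent_bound=10.0)
cost_max = multi_step_pseudo_cost(p, parent_bound=10.0, aggregate=max)
cost_avg = multi_step_pseudo_cost(p, parent_bound=10.0, aggregate=mean)
```

With the minimum variant, the result for node P equals the smallest leaf limit value below P minus the limit value of P's parent, which matches the interpretation of the function as the change in the limit value from expansion to complete solution.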
- the limit value can be a lower bound value.
- the node evaluation model can be a neural network model, a random forest model, a support vector machine model or a linear regression model, etc.
- the node evaluation model may be a fully connected neural network model.
- the node evaluation model can also adopt models with other structures, which are not limited in the embodiments of the present application.
- the node evaluation model may be trained based on training data.
- the training data includes relevant information of the sample node and the label corresponding to the sample node.
- the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- the relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node.
- the node evaluation model can be trained through reinforcement learning.
- the node evaluation model may be trained through deep Q learning.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after being completely solved and the limit value of the parent node of the sample node.
- the label corresponding to the sample node is determined based on the first difference and the second difference.
- the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node.
- the second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing.
- the target evaluation model has the same structure as the node evaluation model. The target evaluation model is used to predict the difference between the bounding value of the child node of the sample node after being completely solved and the bounding value of the sample node.
- the first difference can be determined by the solver.
- the solver is called to obtain the limit value of the parent node of the sample node and the limit value of the sample node, so that the first difference can be determined.
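- A minimal sketch of how a training label could be assembled from the first difference and the second difference. The function name, the stand-in `target_model` callable, the summation of the two differences, the use of the minimum over child nodes, and the sign convention are assumptions for illustration, not a definitive description of the training procedure.

```python
def build_label(parent_bound, sample_bound, child_nodes, target_model,
                aggregate=min):
    """Label = first difference + aggregated second difference."""
    # First difference: obtained from limit values returned by the solver.
    first_difference = sample_bound - parent_bound
    if not child_nodes:
        return first_difference
    # Second difference: predicted by the target evaluation model per child.
    second_difference = aggregate(target_model(child) for child in child_nodes)
    return first_difference + second_difference


# Hypothetical child nodes carrying a precomputed prediction.
children = [{"predicted_change": 1.0}, {"predicted_change": 4.0}]
label = build_label(10.0, 12.0, children,
                    target_model=lambda child: child["predicted_change"])
```

Using a separate target evaluation model with the same structure as the node evaluation model mirrors the target-network idea used in deep Q learning.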
- the node evaluation model can be trained by a device that selects nodes, or it can also be trained by other devices.
- the embodiments of the present application do not limit this.
- the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
- the function value of the objective function corresponding to the relaxation solution of a node can be the lower bound of the node.
- the limit value of the sample node after multi-step expansion may be the lower bound value of the sample node after multi-step expansion.
- the node evaluation model can be trained based on the training data.
- because the limit value of the node is determined based on the relaxed solution, the label corresponding to the sample node is easier to determine, which makes the training data easier to collect. This helps generate a large amount of training data, thereby improving the training effect of the node evaluation model, that is, improving the prediction accuracy of the node evaluation model.
- the embodiment of the present application mainly takes the limit value as the lower bound value as an example for explanation.
- the limit value can also be the upper bound value, which is not limited in the embodiment of the present application.
- the adjusted candidate node set in step 220 can be used as the candidate node set in the next round of iteration process.
- the candidate node set in step 210 can be replaced with the adjusted candidate node set, and the method 200 is repeatedly executed.
- Method 200 may be executed repeatedly until the solution is completed.
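- as a rough illustration of this round-by-round iteration, the loop can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application: `evaluate`, `expand` and `leaf_value` are hypothetical callables standing in for the node evaluation model and the solver, a minimization problem is assumed, and pruning is omitted for brevity.

```python
from typing import Callable, List, Optional, Tuple

Node = Tuple[int, ...]  # hypothetical node id: the branching path from the root


def solve(
    root: Node,
    evaluate: Callable[[Node], float],              # stand-in for the node evaluation model
    expand: Callable[[Node], List[Node]],           # stand-in for the solver's expansion step
    leaf_value: Callable[[Node], Optional[float]],  # objective value if the node is fully solved
    max_rounds: int = 1000,
) -> Optional[float]:
    """Repeat select/expand rounds until the candidate node set is empty."""
    candidates: List[Node] = [root]
    best: Optional[float] = None
    for _ in range(max_rounds):
        if not candidates:
            break  # nothing left to search: the solution is complete
        # Select the node with the smallest predicted score as the search node.
        target = min(candidates, key=evaluate)
        candidates.remove(target)
        value = leaf_value(target)
        if value is not None:
            # Fully solved node: update the best objective value found so far.
            best = value if best is None else min(best, value)
            continue
        # Adjusted candidate node set: the children join the next round.
        candidates.extend(expand(target))
    return best
```

- in this sketch the adjusted candidate list produced in one round is simply the input of the next round, mirroring the replacement of the candidate node set described above.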
- the target node may include at least one of a first target node and a second target node.
- the output result of the node evaluation model is used to determine the first target node.
- the adjusted candidate node set includes child nodes of the first target node.
- the output result of the node evaluation model can be used to determine the target node, and the target node can include the first target node.
- the child nodes of the first target node are used to adjust the candidate node set, and the adjusted candidate node set includes the child nodes of the first target node.
- the first target node may also be called a search node or an expansion node.
- the device for selecting nodes and the solver may be deployed separately.
- the solver can expand the first target node to obtain the child nodes of the first target node, and add the child nodes of the first target node to the candidate node set for the next round of iterative calculations.
- the device for selecting nodes and the solver may be integrated in the solving device.
- the first target node can be expanded by the solving device to obtain the child nodes of the first target node, and the child nodes of the first target node can be added to the candidate node set for the next round of iterative calculations.
- the output result of the node evaluation model is used to determine the second target node.
- the second target node is not included in the adjusted candidate node set.
- the output result of the node evaluation model can be used to determine the target node, and the target node can include a second target node.
- the second target node is used to adjust the candidate node set, and the adjusted candidate node set does not include the second target node.
- the second target node may also be called a pruning node. Pruned nodes will not be solved during the subsequent solution of the goal programming problem.
- the device for selecting nodes and the solver may be deployed separately.
- the solver can prune the second target node. For example, the second target node is deleted from the set of candidate nodes.
- the adjusted set of candidate nodes is used for the next round of iterative calculations.
- the device for selecting nodes and the solver may be integrated in the solving device.
- the solving device may perform pruning processing on the second target node. For example, the second target node is deleted from the set of candidate nodes. The adjusted set of candidate nodes is used for the next round of iterative calculations.
- the method for determining the first target node and the second target node is described below, taking as an example the case where the node evaluation model is used to predict the changes in the limit values of multiple nodes before and after multi-step expansion.
- the difference between the limit value of the node involved after multi-step expansion and the limit value of the parent node is predicted by the node evaluation model.
- the first target node is the node with the smallest difference between the limit value after multi-step expansion and the limit value of the parent node among the multiple nodes.
- the smallest difference can be understood as the smallest change before and after node expansion.
- the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value of each remaining node among the multiple nodes after multi-step expansion and the limit value of that remaining node's parent node.
- for example, if node #1 has the smallest difference between its limit value after multi-step expansion and the limit value of its parent node, node #1 can be used as the first target node.
- the first target node belongs to the j nodes with the smallest difference between the limit values of the multiple nodes after multi-step expansion and the limit values of the parent nodes of the multiple nodes, where j is an integer greater than 1. j is less than the number of nodes.
- the difference between the limit values of the j nodes after multi-step expansion and the limit values of the parent nodes of the j nodes is less than or equal to the difference between the limit values of the remaining nodes among the multiple nodes after multi-step expansion and the limit values of the parent nodes of those remaining nodes.
- the first target node can be determined from the j nodes.
- the first target node may be randomly determined from the j nodes.
- the first target node may be determined based on the probabilities corresponding to the j nodes.
- the probability corresponding to the j nodes is the probability of being determined as the first target node.
- the probability corresponding to each of the j nodes is negatively correlated with the difference between the limit value of that node after multi-step expansion and the limit value of its parent node. That is, among the j nodes, the more obvious the change of a node before and after multi-step expansion, the smaller the probability that the node is determined to be the first target node.
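- the shortlist-then-sample rule for the first target node can be sketched as follows. This is only one possible realization: the function name `pick_first_target` and the softmax-style weighting over negated differences are assumptions chosen for illustration; the text only requires that the probability decrease as the predicted difference grows.

```python
import math
import random
from typing import Dict, Hashable, Optional


def pick_first_target(
    predicted_diff: Dict[Hashable, float],
    j: int,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Sample a search node from the j nodes with the smallest predicted difference."""
    rng = rng or random.Random()
    # Shortlist: the j nodes whose predicted change after multi-step expansion is smallest.
    shortlist = sorted(predicted_diff, key=predicted_diff.get)[:j]
    # Assumed weighting: exp(-difference), shifted by the smallest difference for
    # numerical stability, so a smaller difference gives a larger probability.
    lowest = min(predicted_diff[n] for n in shortlist)
    weights = [math.exp(-(predicted_diff[n] - lowest)) for n in shortlist]
    return rng.choices(shortlist, weights=weights, k=1)[0]
```

- choosing uniform weights instead would reproduce the purely random choice among the j nodes that is also mentioned above.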
- the second target node is the node with the largest difference between the limit value after multi-step expansion and the limit value of the parent node among the multiple nodes.
- the difference between the limit value of the second target node after multi-step expansion and the limit value of the parent node of the second target node is greater than or equal to the difference between the limit value of each remaining node among the multiple nodes after multi-step expansion and the limit value of that remaining node's parent node.
- the second target node belongs to the k nodes with the largest difference between the limit values of multiple nodes after multi-step expansion and the limit values of parent nodes of the multiple nodes, where k is an integer greater than 1. k is less than the number of nodes.
- the difference between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes is greater than or equal to the difference between the limit values of the remaining nodes among the multiple nodes after multi-step expansion and the limit values of the parent nodes of those remaining nodes.
- the second target node can be determined from the k nodes.
- the second target node may be randomly determined from the k nodes.
- the second target node is determined based on the probability corresponding to the k nodes.
- the probability corresponding to each of the k nodes is positively correlated with the difference between the limit value of that node after multi-step expansion and the limit value of its parent node.
- the method for determining the first target node and the second target node is described below, taking as an example the case where the node evaluation model is used to predict the limit values of multiple nodes after multi-step expansion.
- the limit values of the nodes involved in the process of determining the target node after multi-step expansion are all predicted by the node evaluation model. For the convenience of description, the following takes the minimum value optimization problem as an example for illustrative explanation.
- the first target node is the node with the smallest limit value after multi-step expansion among the multiple nodes.
- the limit value of the first target node after multiple expansions is less than or equal to the limit value of the remaining nodes among the multiple nodes after multi-step expansion.
- node #1 has the smallest limit value after multi-step expansion, then node #1 can be used as the first target node.
- the first target node belongs to j nodes with the smallest limit value after multi-step expansion of multiple nodes, where j is an integer greater than 1. j is less than the number of nodes.
- the limit values of the j nodes after multi-step expansion are less than or equal to the limit values of the remaining nodes among the plurality of nodes after multi-step expansion.
- the first target node can be determined from the j nodes.
- the first target node may be randomly determined from the j nodes.
- the first target node may be determined based on the probabilities corresponding to the j nodes.
- the probability corresponding to the j nodes is the probability of being determined as the first target node.
- the probabilities corresponding to the j nodes are negatively correlated with the limit values of the j nodes after multi-step expansion. That is, among the j nodes, the smaller the limit value of the node after multi-step expansion, the greater the probability that the node is determined to be the first target node.
- the second target node is the node with the largest limit value after multi-step expansion among the multiple nodes.
- the limit value of the second target node after multi-step expansion is greater than or equal to the limit value of the remaining nodes among the plurality of nodes after multi-step expansion.
- the second target node belongs to the k nodes with the largest limit values of multiple nodes after multi-step expansion, and k is an integer greater than 1. k is less than the number of nodes.
- the limit values of the k nodes after multi-step expansion are greater than or equal to the limit values of the remaining nodes among the plurality of nodes after multi-step expansion.
- the second target node can be determined from the k nodes.
- the second target node may be randomly determined from the k nodes.
- the second target node is determined based on the probabilities corresponding to the k nodes.
- the probabilities corresponding to the k nodes are positively correlated with the limit values of the k nodes after multi-step expansion.
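- for the deterministic variant of this rule in a minimization problem, a minimal sketch is given below. The function names are hypothetical, and removing the expanded node from the candidate set once its children are added is an assumption borrowed from ordinary branch-and-bound practice rather than something stated in the text.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def choose_targets(predicted_limit: Dict[Hashable, float]) -> Tuple[Hashable, Hashable]:
    """Return (first_target, second_target) for a minimization problem."""
    # Smallest predicted limit value after multi-step expansion: node to expand.
    first = min(predicted_limit, key=predicted_limit.get)
    # Largest predicted limit value after multi-step expansion: node to prune.
    second = max(predicted_limit, key=predicted_limit.get)
    return first, second


def adjust_candidates(
    candidates: List[Hashable],
    first: Hashable,
    children_of_first: Iterable[Hashable],
    second: Hashable,
) -> List[Hashable]:
    """Drop the pruned node, replace the expanded node by its child nodes."""
    kept = [n for n in candidates if n not in (first, second)]
    return kept + list(children_of_first)
```

- for example, with predicted limit values `{"n1": 4.0, "n2": 7.5, "n3": 5.2}`, `choose_targets` selects `n1` for expansion and `n2` for pruning.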
- the first target node and the second target node can also be determined in other ways, which are not limited in this embodiment of the present application.
- the method 200 may also include: sending indication information of the target node to the solver.
- the device for selecting a node may send the indication information of the target node to the solver.
- the solver can solve the goal programming problem based on the target node.
- the indication information of the target node may include the target node itself.
- the device for selecting nodes may determine the target node based on the output result of the node evaluation model and send the target node to the solver.
- the indication information of the target node may include evaluation information of some or all nodes in the plurality of nodes.
- the device for selecting nodes may send evaluation information of some or all nodes to the solver.
- the solver can determine the target node based on the evaluation information of some or all nodes.
- the indication information of the target node may include the search order of the multiple nodes.
- the node ranked first can be the search node in the next iteration.
- the indication information of the target node may also include other information related to the output results of the node evaluation model, as long as the solver can determine the evaluation information of the node based on this information, and then determine the target node.
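- the alternative forms of indication information listed above can be pictured as one message with optional fields. The sketch below is purely illustrative: the class `TargetIndication`, its field names, and the solver-side helper `resolve_search_node` are hypothetical, and a minimization problem is assumed when the solver falls back to raw evaluation information.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional


@dataclass
class TargetIndication:
    """Hypothetical message sent from the node-selecting device to the solver."""
    target_node: Optional[Hashable] = None              # the target node itself
    evaluation: Optional[Dict[Hashable, float]] = None  # evaluation info of some or all nodes
    search_order: Optional[List[Hashable]] = None       # nodes ranked for searching


def resolve_search_node(msg: TargetIndication) -> Hashable:
    """Solver side: recover the next search node from whichever field is present."""
    if msg.target_node is not None:
        return msg.target_node
    if msg.search_order:
        return msg.search_order[0]  # the node ranked first is searched next
    if msg.evaluation:
        # Assumed convention: a smaller evaluation value is better (minimization).
        return min(msg.evaluation, key=msg.evaluation.get)
    raise ValueError("indication carries no usable information")
```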
- the node evaluation model can predict the correlation quantity of the node's limit value before and after multi-step expansion, which is beneficial to predicting the optimal solution that can be searched from the multiple nodes.
- the correlation quantity can be used to measure the long-term value of node expansion, which makes the selection of target nodes more accurate. This is conducive to selecting appropriate nodes for corresponding processing, so that the nodes in the adjusted candidate node set are more likely to yield the optimal solution, which in turn improves the solution efficiency.
- the first target node may be determined according to the output result of the node evaluation model, and iterative calculation may be performed based on the first target node during the solution process. Specifically, the first target node is expanded to obtain the child nodes of the first target node, and then iterative calculation is performed.
- the output results of the node evaluation model can be used to measure the long-term value of node expansion, which is helpful to judge the possibility of the node obtaining the optimal solution after multi-step expansion. Based on this, the first target node determined is more likely to obtain the optimal solution, which is beneficial to improving the convergence speed and solving efficiency.
- the node evaluation model can be used to predict the boundary value of the node after multi-step expansion, which is helpful to judge the possibility of the node getting the optimal solution after multi-step expansion.
- the second target node can be determined according to the output result of the node evaluation model, and the second target node can be pruned during the solution process.
- the output results of the node evaluation model can measure the long-term value of node expansion, which is helpful to judge the possibility of obtaining the optimal solution after node expansion, and determine the second target node based on this. Pruning nodes that are less likely to obtain the optimal solution can reduce the solution space and avoid the time delay caused by expanding and solving on useless nodes, thus improving the solution efficiency. For example, for the minimum optimization problem, the greater the limit value of the node in multi-step expansion, the smaller the possibility of obtaining the global optimal solution starting from this node.
- the node evaluation model can be used to predict the limit value of the node after multi-step expansion, which is helpful to judge the possibility of the node getting the optimal solution after multi-step expansion. Determining the second target node on this basis avoids the time delay caused by expanding and solving on useless nodes.
- the target node can be determined by predicting the differences between the multiple nodes before and after the multi-step expansion, and then comparing the differences between the multiple nodes before and after the multi-step expansion, which is beneficial to improving the accuracy of target node selection.
- the input of the node evaluation model may include a low-dimensional representation of the relevant information of multiple nodes.
- the low-dimensional representation of the relevant information of multiple nodes is obtained by performing dimensionality reduction processing on the relevant information of multiple nodes through the feature extraction model.
- the relevant information of the node can be input into the feature extraction model for dimensionality reduction processing, that is, feature extraction, and the processing results can be input into the node evaluation model.
- the low-dimensional representation of the relevant information of a node is the result of dimensionality reduction processing of the relevant information of the node.
- the low-dimensional representation of the relevant information of the node includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraint conditions, or a low-dimensional representation of the node's decision variable.
- the objective function of a node is dimensionally reduced according to the feature extraction model to obtain a low-dimensional representation of the objective function.
- the constraint conditions corresponding to the nodes are dimensionally reduced according to the feature extraction model to obtain a low-dimensional representation of the constraint conditions.
- the decision variables of nodes are dimensionally reduced according to the feature extraction model to obtain a low-dimensional representation of the decision variables.
- the related quantities of the limit value of the node after multi-step expansion can be obtained.
- the device for selecting nodes and the solver may be deployed separately.
- the feature extraction model can be deployed in the solver, and the solver determines a low-dimensional representation of the relevant information of the node according to the feature extraction model, and sends it to the device for selecting the node.
- the feature extraction model may be deployed in a device for selecting nodes, and the device for selecting nodes determines a low-dimensional representation of the relevant information of the node according to the feature extraction model.
- the feature extraction model can also be deployed in other devices, which is not limited in the embodiments of the present application.
- the device for selecting nodes and the solver may be integrated in the solving device.
- the low-dimensional representation of the relevant information of the node may be determined by the solving device according to the feature extraction model.
- other devices may determine a low-dimensional representation of the relevant information of the node based on the feature extraction model and send it to the solving device.
- the feature extraction model can be a graph convolutional neural network model.
- a low-dimensional representation of the relevant information of nodes can be obtained through the graph convolutional neural network, which can process the relevant information of nodes of different sizes, or in other words, can process the mathematical programming models at nodes of different sizes. Furthermore, graph convolutional neural networks are insensitive to the order of inputs.
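- the two properties named here, handling problems of different sizes and insensitivity to input order, can be illustrated with a toy example. The sketch below is not a trained graph convolutional network: it is a weight-free, single-round message-passing pass over an assumed variable/constraint graph of a linear program (objective coefficients `c`, constraint matrix `A`, right-hand sides `b`), followed by mean pooling. The function name `embed` and this graph encoding are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def embed(c: List[float], A: List[List[float]], b: List[float]) -> Tuple[float, float]:
    """Map a node's sub-problem of any size to a fixed-length (2-value) summary."""
    m, n = len(A), len(c)
    # Constraint -> variable messages, weighted by the constraint coefficients.
    var_h = [c[j] + math.fsum(A[i][j] * b[i] for i in range(m)) / m for j in range(n)]
    # Variable -> constraint messages, using the updated variable features.
    con_h = [b[i] + math.fsum(A[i][j] * var_h[j] for j in range(n)) / n for i in range(m)]
    # Mean pooling: the result does not depend on the order of rows or columns.
    return math.fsum(var_h) / n, math.fsum(con_h) / m
```

- because every aggregation is a sum or mean over neighbours, reordering the constraints (rows of `A` together with `b`) or the variables (columns of `A` together with `c`) leaves the output unchanged, and the output length stays fixed regardless of how many variables or constraints the sub-problem has.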
- the feature extraction model can be trained.
- for the specific training process, please refer to the description below.
- the feature extraction model can be trained by the device where the feature extraction model is located. Alternatively, it can also be trained by other devices. The embodiments of the present application do not limit this.
- the feature extraction model is deployed in a device that selects nodes.
- the feature extraction model can be trained by the device that selects nodes, or can be trained by other devices.
- the method 200 may also include: returning at least one of the following to the user: the solution result of the target planning model or the indication information of the target node.
- the embodiment of the present application provides a training method for a node evaluation model, which can be used to train a node evaluation model.
- the trained node evaluation model can be applied in the method 200 shown in Figure 2.
- Figure 3 shows a schematic flow chart of a node evaluation model training method provided by an embodiment of the present application.
- the method 300 shown in FIG. 3 may be executed by a training device of a node evaluation model. After the training device completes training, the obtained node evaluation model can be deployed in a device that selects nodes.
- the training device for the node evaluation model and the device for selecting nodes may be the same device, or they may be different devices.
- the node evaluation model is used to predict the correlation quantity of the bound value of each node in the candidate node set of the goal planning problem after multi-step expansion. Each node corresponds to a sub-problem to be solved in the goal programming problem.
- the output of the node evaluation model is used to adjust the set of candidate nodes.
- the adjusted candidate node set is used to solve the goal planning problem.
- the method 300 includes steps 310 to 330 . Steps 310 to 330 are described below.
- the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- the sample nodes and the labels corresponding to the sample nodes are used as training data for the node evaluation model.
- the initial model of the node evaluation model is trained based on the training data, and the trained node evaluation model can be used as the node evaluation model used in method 200.
- the initial model of the node evaluation model may also be called the initial node evaluation model.
- the parameters of the initial node evaluation model are adjusted with the goal of reducing the difference between the output of the initial node evaluation model and the labels corresponding to the sample nodes to obtain a trained node evaluation model.
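- the idea of adjusting parameters to reduce the gap between model output and labels can be shown with a deliberately small stand-in. The sketch below assumes a linear model trained by full-batch gradient descent on a mean squared error; the actual node evaluation model, loss, and optimizer are not specified here, and the function name `train_linear_evaluator` is hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_evaluator(
    features: Sequence[Sequence[float]],  # one feature vector per sample node
    labels: Sequence[float],              # label corresponding to each sample node
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and bias by gradient descent on the mean squared error."""
    dim, n = len(features[0]), len(features)
    w, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Gap between the model output and the label of this sample node.
            err = sum(wi * xi for wi, xi in zip(w, x)) + bias - y
            for d in range(dim):
                grad_w[d] += 2.0 * err * x[d] / n
            grad_b += 2.0 * err / n
        # Move the parameters in the direction that shrinks the gap.
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        bias -= lr * grad_b
    return w, bias
```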
- the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- the label corresponding to the sample node is easier to determine, which makes the collection of training data more convenient and improves the efficiency of generating training data. This helps to obtain a large amount of training data and improves the utilization efficiency of samples, thereby improving the training effect of the node evaluation model, that is, improving the prediction accuracy of the node evaluation model.
- the limit value of the sample node after multi-step expansion may be the limit value of the sample node after it is completely solved.
- the number of steps required for different sample nodes to be expanded to complete solution may be the same or different.
- the label corresponding to the sample node is used to indicate the limit value of the sample node after multi-step expansion.
- the label corresponding to the sample node can be used to indicate the limit value of the sample node after it is completely solved.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the label corresponding to the sample node may be used to indicate the difference between the limit value of the sample node after it is completely solved and the limit value of the parent node of the sample node.
- the sample node and the label corresponding to the sample node may be data in the training database.
- the sample nodes and the labels corresponding to the sample nodes can be pre-generated according to the solver and stored in the training database.
- the solver can generate multiple nodes and simultaneously solve the boundary values of the multiple nodes. These multiple nodes can be used as sample nodes. Based on the limit value of each node during the solution process, the quantity related to the limit value of the node after multiple expansions can be determined, that is, the label corresponding to the sample node can be obtained. Training data can be determined based on the solution of one or more planning problems. The one or more planning questions may be provided by the user or may be pre-stored.
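- one way to turn a solved search tree into labels of the difference type is sketched below. It rests on assumptions that the text does not fix: a minimization problem, a tree given as hypothetical `children` and `bound` dictionaries, and the convention that the limit value of a node after it is completely solved is the smallest limit value among the leaves of its subtree.

```python
from typing import Dict, Hashable, List


def solved_bound(
    node: Hashable,
    children: Dict[Hashable, List[Hashable]],
    bound: Dict[Hashable, float],
) -> float:
    """Limit value of a node once its whole subtree has been solved (assumed: min over leaves)."""
    kids = children.get(node, [])
    if not kids:
        return bound[node]
    return min(solved_bound(k, children, bound) for k in kids)


def make_labels(
    root: Hashable,
    children: Dict[Hashable, List[Hashable]],
    bound: Dict[Hashable, float],
) -> Dict[Hashable, float]:
    """Label = fully solved limit value of the node minus the limit value of its parent."""
    labels: Dict[Hashable, float] = {}
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        if parent is not None:  # the root has no parent, so it gets no label
            labels[node] = solved_bound(node, children, bound) - bound[parent]
        stack.extend((k, node) for k in children.get(node, []))
    return labels
```

- each `(node, label)` pair produced this way, together with the relevant information of the node, would then be a candidate record for the training database.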
- the solver can receive batch data provided by the user (e.g., multiple planning problems) and solve based on the batch data provided by the user. Sample nodes are sampled from the multiple nodes generated during the solving process, the labels corresponding to the sample nodes are determined based on the limit value of each node solved during the solving process, and the relevant information of the sample nodes and the labels corresponding to the sample nodes are stored in the training database.
- the solver can receive batch data provided by the user (e.g., multiple planning problems) and solve based on the provided batch data and historical data (e.g., multiple pre-stored planning problems). Sample nodes are sampled from the multiple nodes generated during the solving process, the labels corresponding to the sample nodes are determined based on the limit value of each node solved during the solving process, and the relevant information of the sample nodes and the labels corresponding to the sample nodes are stored in the training database.
- the training device can obtain training data from the training database, use the relevant information of the sample nodes as the input of the initial model corresponding to the node evaluation model, and train the initial model with the goal of reducing the gap between the output of the model and the labels corresponding to the sample nodes, so as to obtain the node evaluation model.
- the sample node and the label corresponding to the sample node may be provided by the user.
- the sample node and the label corresponding to the sample node can also be obtained through other methods.
- step 330 may include: training through reinforcement learning to obtain a trained node evaluation model.
- sample nodes can come from the training database, or sample nodes can be provided by the user.
- the training device and the solver of the node evaluation model may be deployed separately, in which case the environment may be the solver.
- the solver can be encapsulated into an environment in reinforcement learning, and the training device continuously interacts with the solver to collect data to obtain the labels of sample nodes.
- training is performed through deep Q learning to obtain a node evaluation model.
- the label corresponding to the sample node may be determined based on the first difference and the second difference.
- the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node.
- the second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing.
- the target evaluation model and the node evaluation model have the same structure.
- the target evaluation model is used to predict the difference between the bounding value of the child node of the sample node after being fully solved and the bounding value of the sample node.
- the target evaluation model is the target network in deep Q learning.
- the first difference can be determined by the solver.
- the solver may send the limit value of the parent node of the sample node and the limit value of the sample node to the training device.
- the training device can determine the first difference based on this.
- the solver may determine the first difference according to the limit value of the parent node of the sample node and the limit value of the sample node, and send the first difference to the training device.
- the node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function.
- the training goal of the model is to enable the node evaluation model to learn the function value of an accurate multi-step pseudo-cost function.
- the label corresponding to the sample node can also be called the prediction label corresponding to the sample node.
- the prediction label corresponding to the node can satisfy the following formula:
- c(P) is the first difference, that is, the difference between the limit value of the parent node of node P and the limit value of node P, which can be obtained by the solver.
- the solver can calculate the limit value of node P and the limit value of the parent node of node P, so that the training device can obtain the first difference.
- the second difference is the output of the target evaluation model. This target evaluation model is used to stabilize training and prevent overfitting.
- C_θ represents the model during training.
- the training goal can be expressed as:
- E represents the average value
- θ represents the parameters of the model
- π represents the learned policy, that is, the multi-step pseudo-cost function.
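- as a hedged illustration only, one form that is consistent with the definitions above and with standard deep Q learning is sketched below. The additive combination of the first and second differences, the symbol $P'$ for a child node of $P$, the symbol $\hat{C}$ for the target evaluation model, and the squared-error objective are all assumptions; they are not formulas taken from this application.

```latex
% Assumed label: first difference plus the target model's prediction for a child node P'
y(P) = c(P) + \hat{C}(P')

% Assumed training goal: make the model being trained match the label on average
\min_{\theta}\; \mathbb{E}_{\pi}\!\left[\bigl(C_{\theta}(P) - y(P)\bigr)^{2}\right]
```

- under these assumptions, $\hat{C}$ has the same structure as $C_{\theta}$ but is held fixed between periodic updates, which is the usual role of a target network in deep Q learning.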
- the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
- the function value of the objective function corresponding to the relaxation solution of a node can be the lower bound of the node.
- the limit value of the sample node after multi-step expansion can be the lower bound value of the sample node after multi-step expansion. That is, the node evaluation model can be used to predict quantities related to the lower bound of a node after multi-step expansion.
- the label corresponding to the sample node is easier to determine.
- the labels corresponding to sample nodes can be determined in real time, and the calculation of the relaxed solution is more convenient. It is more efficient to determine the labels corresponding to the sample nodes based on the relaxed solution, which is beneficial to improving training efficiency.
- the embodiment of the present application mainly takes the limit value as the lower bound value as an example for explanation.
- the limit value can also be the upper limit value, which is not limited in the embodiment of the present application.
- the input type and output type of the model during the training process are consistent with the input type and output type of the trained node evaluation model.
- the input of the initial node evaluation model includes relevant information of the sample nodes or a low-dimensional representation of the relevant information of the sample nodes.
- the relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraint condition of the sample node or the decision variable of the sample node.
- the input to the initial node evaluation model includes information about sample nodes.
- the input of the node evaluation model may include relevant information of the node.
- the relevant information of the node includes at least one of the following: the objective function of the node, the constraint condition of the node or the decision variable of the node.
- the input of the initial node evaluation model may include a low-dimensional representation of the relevant information of the sample node.
- the input of the node evaluation model may include a low-dimensional representation of the relevant information of the node.
- the output of the node evaluation model may include the relevant quantity of the limit value of the node after multi-step expansion.
- the low-dimensional representation of the relevant information of the node includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraint conditions, or a low-dimensional representation of the node's decision variable.
- the low-dimensional representation of the relevant information of the sample node can be obtained by performing dimensionality reduction processing on the relevant information of the sample node through a feature extraction model.
- the relevant information of the sample nodes is input into the feature extraction model for dimensionality reduction processing, and the results of the dimensionality reduction processing are input into the initial model corresponding to the node evaluation model.
- the feature extraction model can be a trained model or a model in the training process.
- the relevant information of the sample node is input into the initial feature extraction model for dimensionality reduction processing, and the results of the dimensionality reduction processing are input into the initial node evaluation model for processing to predict the limit value of the sample node after multi-step expansion.
- the two models are trained with the goal of reducing the gap between the output results of the initial node evaluation model and the labels corresponding to the sample nodes.
- the trained node evaluation model and the trained feature extraction model are obtained.
- the initial feature extraction model is the initial model corresponding to the feature extraction model.
- AI model training refers to using a specified initial model to perform calculations on the training data and, based on the calculation results, using a certain method to adjust the parameters in the initial model so that the model gradually learns certain rules and has specific functions.
- the AI model with stable functions after training can be used for inference.
- the inference of the AI model is the process of using the trained AI model to calculate the input data and obtain the predicted inference results.
- the solution of the embodiment of this application can be divided into two stages: the training stage and the inference stage.
- the initial node evaluation model can be trained to obtain the node evaluation model.
- the node evaluation model can be mounted to a solver so that search nodes and pruning nodes are determined during the solution process.
- Figure 4 shows the solution method of the goal programming model provided by the embodiment of the present application.
- the method 400 shown in Figure 4 can use the method 200 shown in Figure 2 to implement node selection.
- for details of the method 200, refer to the description of the method 200.
- to avoid repetition, part of the description is appropriately omitted when describing the method 400.
- the method 400 is described using an example in which the solver and the device for selecting nodes are deployed separately, which does not limit the embodiments of the present application. In other implementations, the solver and the device for selecting nodes may be integrated in the same device.
- the method 400 includes steps 410 to 430, which are described below.
- Step 410 Obtain the goal programming problem.
- Step 420 Adjust the candidate node set of the goal programming problem according to the node evaluation model.
- the candidate node set includes multiple nodes. Each node in the plurality of nodes corresponds to a sub-problem to be solved of the goal programming problem.
- the node evaluation model is used to predict the relevant quantities of the bounding values of nodes after multi-step expansion.
- Step 430 Solve the goal programming problem based on the adjusted candidate node set to obtain the solution result of the goal programming problem.
- the goal programming problem can be uploaded by a user: the user inputs the goal programming problem into the solver, and the solver obtains the goal programming problem uploaded by the user.
- the goal programming problem is the mathematical programming problem that the user needs to solve.
- the goal programming problem can be represented by the objective function, constraints and decision variables of the goal programming problem. Constraints are used to constrain decision variables. At least some of the decision variables of the goal programming model are integer variables, that is, at least some of the values of the decision variables are integers. In other words, the goal programming model is a pure integer programming model or a mixed integer programming model.
- the solver can generate multiple sub-problems to be solved for the goal planning problem, and the multiple sub-problems to be solved can be used as multiple nodes in the candidate node set.
- the solver can be implemented based on the branch-and-bound algorithm framework.
- the solver can generate multiple constraints on the decision variables based on the goal programming problem and add them to the constraints corresponding to the goal programming problem, thereby forming multiple sub-problems of the goal programming problem, that is, obtaining multiple branches of the goal programming problem.
- constraints corresponding to the goal programming problem are constraints on the solution space of the goal planning problem.
- the additional constraints generated in the process of generating sub-problems of the goal programming problem are used to constrain the decision variables in the branch, thereby narrowing the scope of the solution space on the branch.
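- as an illustration of how additional constraints on a decision variable produce sub-problems, the following is a minimal sketch of one common branching rule. The names `SubProblem` and `branch`, and the choice of splitting on a fractional relaxed value with floor and ceiling bounds, are assumptions for illustration and are not taken from the application.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SubProblem:
    """A candidate node: the original problem plus extra variable bounds."""
    # Extra bounds added by branching, keyed by variable index: (lower, upper).
    bounds: dict = field(default_factory=dict)


def branch(node, var_index, fractional_value):
    """Split `node` into two children by bounding one integer variable."""
    lo, hi = node.bounds.get(var_index, (-math.inf, math.inf))
    # Left child adds x <= floor(v); right child adds x >= ceil(v).
    left_bounds = {**node.bounds,
                   var_index: (lo, min(hi, math.floor(fractional_value)))}
    right_bounds = {**node.bounds,
                    var_index: (max(lo, math.ceil(fractional_value)), hi)}
    return SubProblem(left_bounds), SubProblem(right_bounds)


root = SubProblem()
left, right = branch(root, var_index=0, fractional_value=2.5)
```

- each child keeps the bounds of its parent and narrows exactly one variable, so the solution space of every branch is a subset of the parent's solution space.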
- a node evaluation model may be used to predict the relevant quantity of the limit value, after multi-step expansion, of some or all nodes in the candidate node set.
- step 420 may include: predicting the correlation amount of the limit value of each node in the candidate node set after multi-step expansion through a node evaluation model.
- the node evaluation model is used to predict the correlation amount of the limit value of all nodes in the candidate node set after multi-step expansion.
- the embodiments of this application mainly take all nodes as an example, that is, processing each node through the node evaluation model as an example, which does not limit the solutions of the embodiments of this application.
- the output results of the node evaluation model can be used as node evaluation information.
- the node evaluation model can be used to output node evaluation information.
- the evaluation information of a node is related to the node's limit value after multi-step expansion.
- the evaluation information of the node is used to indicate the node evaluation model's prediction of the correlation quantity of the node's limit value after multi-step expansion.
- the node evaluation model may be included in the means for selecting nodes.
- the device for selecting nodes may generate evaluation information of the plurality of nodes through a node evaluation model.
- the method 400 further includes: determining the node evaluation model according to user instructions.
- the user can select a node evaluation model from multiple node evaluation models.
- the node evaluation model selected by the user may be used as the node evaluation model in method 400 .
- the user can select one device for selecting a node from multiple devices for selecting a node.
- the means for selecting nodes may correspond to the node evaluation model.
- the node evaluation model deployed in the device for selecting nodes indicated by the user is the node evaluation model in method 400.
- the node evaluation model may also be determined by means of selecting nodes.
- the node evaluation model may also be determined by the solver.
- the node evaluation model can also be determined in other ways.
- the node evaluation model can also be a default model.
- the node evaluation model can be deployed on a cloud management platform.
- the evaluation information of a node is used to indicate changes in the node's limit value before and after multi-step expansion.
- the evaluation information of the node is used to indicate the prediction of the change of the node's limit value before and after multi-step expansion.
- the change in the limit value of a node before and after multi-step expansion can be represented by the function value of the node's multi-step pseudo cost function.
- the function value of the node's multi-step pseudo-cost function can be used to evaluate the node.
- the node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function.
- the limit value can be a lower limit value.
- the function value of the multi-step pseudo-cost is the change in the lower bound value from the node expansion to the node being completely solved.
- the purpose of defining a multi-step pseudo-cost function is that, if the multi-step pseudo-cost function can be accurately calculated or learned, the optimal solution that can ultimately be found from a node can be accurately predicted. In that case, the node containing the global optimal solution can be selected based on the function value of the multi-step pseudo-cost function.
- the node evaluation model is used to fit the multi-step pseudo-cost function of the node.
- the training goal of the node evaluation model can be to learn a multi-step pseudo-cost function to accurately predict the function value of the multi-step pseudo-cost function of the node.
- for the training process, refer to method 300 or method 800.
- the input of the node evaluation model includes node-related information.
- the relevant information of the node includes at least one of the following: the objective function of the node, the constraint condition of the node or the decision variable of the node.
- the relevant information of the node may include the objective function of the node, the constraint conditions of the node and the decision variables of the node. Input the node's objective function, node's constraints and node's decision variables into the node evaluation model, and the relevant quantities of the node's limit value after multi-step expansion can be output.
- the input of the node evaluation model includes a low-dimensional representation of the relevant information of the node.
- the low-dimensional representation of the relevant information of the node includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraint conditions, or a low-dimensional representation of the node's decision variable.
- the low-dimensional representation of the relevant information of the node is obtained by reducing the dimensionality of the relevant information of the node through the feature extraction model.
- the objective function of the node, the constraint conditions of the node, and the decision variable of the node are input to the feature extraction model to obtain a low-dimensional representation of the relevant information of the node, and the low-dimensional representation of the relevant information of the node is input to the node evaluation model.
- the correlation amount of the node's limit value after multi-step expansion can be predicted.
- the feature extraction model can be deployed in a device for selecting nodes, or in a solver, or in other devices.
- the embodiments of the present application do not limit this.
- the feature extraction model is used to output a low-dimensional representation of the relevant information of the multiple nodes.
- the relevant information of the node may be the high-dimensional mathematical programming model information of the node.
- the node's high-dimensional mathematical programming model information is embedded and represented (embedding), that is, the node's objective function, the node's constraint conditions and the node's decision variables are dimensionally reduced.
- the feature extraction model is used to output the features of nodes.
- the feature can be represented as a set of vectors.
- the relevant information of the node after dimensionality reduction can also be called the low-dimensional embedding representation of the node.
- the input of the feature extraction model is the mathematical programming model information including the node, for example, the objective function of the node, the constraint condition of the node and the decision variable of the node, and the output is the feature of the node.
- the feature extraction model may be implemented through a graph convolutional neural network.
- the graph convolutional neural network can be used to embed and represent high-dimensional mathematical programming model information.
- Figure 5 shows an exemplary flow chart of a dimensionality reduction process.
- Step 1 Convert the mathematical programming model information (A, b, C) of the node into a bipartite graph representation, that is, fill (A, b, C) according to the connection relationship.
- A represents the coefficient matrix of the constraint condition
- b represents the coefficient vector of the right-hand term of the constraint condition
- C represents the coefficient vector of the objective function.
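- the mathematical programming model of the node in Figure 5 consists of an objective function over the decision variables $x_1$, $x_2$ and $x_3$, subject to (s.t.) one linear constraint relating $A_{11}x_1 + A_{13}x_3$ to $b_1$ and one linear constraint relating $A_{12}x_1 + A_{22}x_2$ to $b_2$, with $x \in \mathbb{Z}$.
- the objective function is parameterized by $d_1$, $d_2$ and $d_3$; the constraints are the two linear constraints on $A_{11}x_1 + A_{13}x_3$ and on $A_{12}x_1 + A_{22}x_2$; and the decision variables are $x_1$, $x_2$ and $x_3$. $d_1$, $d_2$, $d_3$, $A_{11}$, $A_{13}$, $b_1$, $A_{12}$, $A_{22}$ and $b_2$ are all parameters.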
- Step 2 Input the above-mentioned bipartite graph connection relationship into the graph convolutional neural network, and embed the objective function of the node, the constraints of the node and the decision variable of the node.
- V represents the decision variable
- C represents the constraint condition
- E represents the connection between V and C, that is, the coefficient matrix A of the constraint condition.
- V 1 represents the decision variable after one graph convolutional neural network processing
- V 2 represents the decision variable after two graph convolutional neural network processing.
- C 1 represents the constraints after one graph convolutional neural network processing
- C 2 represents the constraints after two graph convolutional neural network processings.
- ⁇ (x) represents the output result of this node, which is the low-dimensional embedding representation in step 3.
- Step 3 Output the low-dimensional embedding representation of the node.
- the low-dimensional embedding representation of a node includes a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraints, and a low-dimensional representation of the node's decision variables.
- the low-dimensional embedding representation of the node can include a low-dimensional representation of the objective function, a low-dimensional representation of the two constraints and a low-dimensional representation of the three decision variables.
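- the bipartite-graph conversion and one round of propagation between decision-variable vertices and constraint vertices can be sketched as follows. This is a simplified illustration only: it uses scalar features, plain weighted sums and no learned weights or activation, so it is not a trained graph convolutional neural network. The function names `to_bipartite` and `message_pass` and the numeric values are hypothetical; the sparsity pattern mirrors the example in which the first constraint involves x 1 and x 3 and the second involves x 1 and x 2.

```python
def to_bipartite(A, b, c):
    """Return variable features, constraint features and weighted edges.

    Each edge is (constraint index i, variable index j, coefficient A[i][j])
    and exists only where the coefficient is non-zero.
    """
    edges = [(i, j, a)
             for i, row in enumerate(A)
             for j, a in enumerate(row) if a != 0]
    return list(c), list(b), edges


def message_pass(var_feats, con_feats, edges):
    """One untrained round: variables -> constraints, then constraints -> variables."""
    new_con = list(con_feats)
    for i, j, a in edges:
        new_con[i] += a * var_feats[j]      # aggregate variable info per constraint
    new_var = list(var_feats)
    for i, j, a in edges:
        new_var[j] += a * new_con[i]        # send updated constraint info back
    return new_var, new_con


# Hypothetical coefficients: 2 constraints, 3 decision variables.
A = [[1, 0, 2],
     [3, 4, 0]]
b = [5, 6]
c = [1, 1, 1]

var_feats, con_feats, edges = to_bipartite(A, b, c)
new_var, new_con = message_pass(var_feats, con_feats, edges)
```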
- the node evaluation model can be used to predict the relevant quantities of the bounding values of nodes after multi-step expansion.
- the low-dimensional representation of the correlation information of the multiple nodes output by the feature extraction model is input into the node evaluation model to predict the correlation amount of the limit values of the multiple nodes after multi-step expansion.
- the node evaluation model may be a neural network model.
- the node evaluation model is implemented through a fully connected neural network, as shown in Figure 6.
- the low-dimensional representation of the node-related information output by the feature extraction model is used as the input of the node evaluation model, and the function value of the multi-step pseudo-cost function of the node is predicted through the fully connected neural network.
- the input of the fully connected neural network can be a low-dimensional representation of the relevant information of the node
- the output of the fully connected neural network can be the function value of the multi-step pseudo-cost function of the node.
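- a forward pass of a small fully connected network that maps a node embedding to a single predicted value can be sketched as follows. The two-layer shape, the ReLU activation, the parameter names and all numeric values are assumptions for illustration; the application does not specify the architecture beyond a fully connected neural network.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b0
            for row, b0 in zip(weights, bias)]


def evaluate_node(embedding, params):
    """Predict one scalar (here: a multi-step pseudo-cost estimate) from an embedding."""
    hidden = relu(dense(embedding, params["W1"], params["b1"]))
    return dense(hidden, params["W2"], params["b2"])[0]


# Hypothetical, hand-picked parameters; in practice these would be learned.
params = {
    "W1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, 2.0]],
    "b2": [0.5],
}
prediction = evaluate_node([2.0, 1.0], params)
```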
- step 420 may include: determining the first target node according to the node evaluation model; generating child nodes of the first target node; and adding the child nodes of the first target node to the candidate node set.
- step 420 may include: determining the second target node according to the node evaluation model; and deleting the second target node from the candidate node set.
- Determining the first target node according to the node evaluation model can be understood as determining the first target node, that is, the search node, according to the evaluation information of the multiple nodes.
- Determining the second target node according to the node evaluation model can be understood as determining the second target node, that is, the pruning node, according to the evaluation information of the multiple nodes.
- the evaluation information of a node is used to indicate the prediction of the function value of a multi-step pseudo-cost function of the node. It should be understood that the function values of the multi-step pseudo-cost function in the method 400 are all predicted values output by the node evaluation model.
- the function value of a node's multi-step pseudo-cost function can be used to measure the long-term value of the node.
- the function value of a node's multi-step pseudo-cost function can be used to measure the long-term cost of a node.
- the node may be scored based on the function value of the node's multi-step pseudo-cost function. For example, the larger the function value of a node's multi-step pseudo-cost function, the higher the node's score and the lower the long-term value of the node, or in other words, the higher the long-term cost of the node. The smaller the function value, the lower the node's score and the higher the long-term value of the node, or in other words, the lower the long-term cost of the node. It should be understood that this is only an example, and the relationship between the node's score and the node's long-term value or the node's long-term cost can also be expressed in other forms, which is not limited in the embodiments of the present application.
- the node with the lowest score is used as the search node.
- the k nodes with the highest scores are used as candidate pruning nodes.
- a probability vector is constructed based on the scores of the k nodes, and one of the nodes is probabilistically selected as a pruning node.
- the score of a node is positively related to the probability of the node. The higher the score of a node, the greater the probability of the node being pruned. The lower the score of a node, the lower the probability of the node being pruned.
- Figure 7 shows a way of determining pruning nodes.
- a probability vector is constructed based on the scores of the multiple nodes with the highest scores.
- the scores of node 1, node 3 and node 4 are 5, 2, 3 respectively. Determine the probability of a node based on its score. The higher the score of a node, the higher the probability of the node.
- the probabilities of node 1, node 3 and node 4 are 0.3, 0.1 and 0.25 respectively.
- the node with index 4 is sampled as a pruned node.
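- the selection of a search node and the probabilistic selection of a pruning node can be sketched as follows. The function names are hypothetical, and making the sampling probability directly proportional to the score is an assumption: the text only states that a higher score means a higher pruning probability. Scores are assumed to be positive.

```python
import random


def select_search_node(scores):
    """Return the node id with the lowest score (lowest predicted long-term cost)."""
    return min(scores, key=scores.get)


def sample_pruning_node(scores, k, rng=random):
    """Sample one of the k highest-scoring nodes.

    Assumption: probability is proportional to the score, so a higher
    score gives a higher chance of being pruned.
    """
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    weights = [scores[node] for node in top_k]
    return rng.choices(top_k, weights=weights, k=1)[0]


# Hypothetical scores keyed by node index.
scores = {1: 5.0, 2: 1.0, 3: 2.0, 4: 3.0}
search_node = select_search_node(scores)
pruning_node = sample_pruning_node(scores, k=3, rng=random.Random(0))
```

- passing a seeded `random.Random` instance keeps the sampling reproducible, which can help when comparing solver runs.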
- the device for selecting a node may send indication information of the target node (the first target node and/or the second target node) to the solver.
- the indication information of the target node may include the search order of the multiple nodes.
- the node ranked first can be the search node in the next iteration.
- the indication information of the target node may include evaluation information of the multiple nodes.
- the indication information of the target node may include scores of the multiple nodes.
- the indication information of the target node may include the target node itself.
- the indication information of the target node may also include other forms of information, as long as the target node can be determined based on the indication information.
- the solver can adjust the candidate node set according to the target node, and solve the target planning problem according to the adjusted candidate node set to obtain the solution result.
- the solver can expand the search node to obtain the child nodes of the search node, that is, to obtain a new sub-problem of the target planning problem.
- Child nodes of the search node can be added to the set of candidate nodes.
- the adjusted candidate node set can be used as the candidate node set used in the next round of iteration process.
- the above step 420 can be repeated until the solution is completed to obtain the solution result.
- the solver can prune the pruned node and adjust the set of candidate nodes.
- the adjusted candidate node set can be used as the candidate node set used in the next round of iteration process.
- the above step 420 can be repeated until the solution is completed to obtain the solution result.
- the end of the solution can be that all sub-problems have been solved, that is, the candidate node set does not include nodes.
- the end of the solution can be when the solution time exceeds a preset time.
- the solution can end when the difference between the global upper bound value and the global lower bound value is less than a set threshold.
- the conditions for ending the solution can be set as needed, and the embodiments of this application do not limit this.
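- the three example end conditions can be combined in a single check, sketched below. The function name `should_stop` and its parameter names are hypothetical, and the threshold values are left to the caller.

```python
def should_stop(candidates, elapsed_s, upper_bound, lower_bound,
                time_limit_s, gap_threshold):
    """Return True when any of the example end conditions holds."""
    if not candidates:                      # every sub-problem has been solved
        return True
    if elapsed_s > time_limit_s:            # preset solution time exceeded
        return True
    # Global upper and lower bounds are close enough.
    return (upper_bound - lower_bound) < gap_threshold
```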
- method 400 may also include: returning the solution result to the user.
- the method 400 may also include: returning the indication information of the target node to the user.
- a user may provide a goal programming problem to a solver and receive a solution result of the goal planning problem returned by the solver.
- the user may provide a set of candidate nodes to the device for selecting nodes, and receive indication information of the target node provided by the device for selecting nodes.
- FIG. 8 shows a model training method provided by the embodiment of the present application.
- the training method shown in Figure 8 can be regarded as a specific implementation of the method 300 shown in Figure 3.
- the training process is illustrated in Figure 8 by taking the example of separately deploying the solver and the training device of the model.
- the solver and the model training device may be integrated in the same device.
- the method 800 includes steps 810 to 830, which are described below.
- Step 810 Obtain sample nodes.
- the sample nodes may be from a training database.
- the solver can generate multiple nodes of the planning problem, which can be used as sample nodes. Sample nodes may be determined based on the solution process of one or more planning problems. The one or more planning questions may be provided by the user or may be pre-stored.
- the solver can receive batch data provided by the user (for example, multiple planning problems), solve based on that batch data, sample the sample nodes from the multiple nodes generated during the solving process, and store the relevant information of the sample nodes in the training database.
- the solver can receive user-supplied batch data (e.g., multiple planning problems), perform solving based on the supplied batch data and historical data (e.g., multiple pre-stored planning problems), sample the sample nodes from the multiple nodes generated during the solving process, and store the relevant information of the sample nodes in the training database.
- sample nodes may be provided by users.
- Step 820 Perform dimensionality reduction processing on the relevant information of the sample node based on the feature extraction model to obtain a low-dimensional representation of the relevant information of the sample node.
- the feature extraction model is a graph convolutional neural network.
- the input of the graph convolutional neural network can include relevant information of sample nodes.
- this graph convolutional neural network is used to reduce the dimensionality of the relevant information of the sample nodes and output the low-dimensional embedding representation of the sample node, that is, the low-dimensional representation of the relevant information of the sample node.
- Step 830 Use the low-dimensional representation of the relevant information of the sample node as the input of the node evaluation model, and adjust the parameters of the node evaluation model with the goal of reducing the gap between the output result of the node evaluation model and the label corresponding to the sample node.
- the node evaluation model may be a fully connected neural network.
- the node evaluation model can be trained through deep Q learning.
- the node evaluation model can be a deep Q network.
- the training process is explained below by taking the label corresponding to the sample node as the function value of the multi-step pseudo-cost function as an example.
- the definition of the multi-step pseudo-cost function satisfies the Bellman equation of dynamic programming.
- the expression of the state transition function of this equation is unknown and can be solved by deep Q learning.
- the node evaluation model can be trained by deep Q learning, so that the trained node evaluation model can be used to predict the function value of the multi-step pseudo-cost function.
- deep Q learning (DQL) can help select optimal actions by estimating the long-term cumulative return (Q function) of each action.
- the node evaluation network helps select nodes by predicting the multi-step pseudo-cost of each node.
- the multi-step pseudo-cost function can be used as a Q function.
- the prediction label corresponding to the sample node can be used as the label corresponding to the sample node.
- c(P) is the difference between the limit value of the parent node of node P and the limit value of node P, which can be obtained by the solver; the other model that appears in the prediction label is the target evaluation model.
- $C_\theta$ represents the model to be trained.
- the training goal can be expressed as:
- the solver can be encapsulated as an environment, and the training device collects data through continuous interaction with the solver.
- the training device can obtain the limit value of the node by calling the solver, and the limit value can be used as supervision information for fitting the multi-step pseudo cost function.
- the training device can obtain the limit value of node P and the limit value of the parent node of node P from the solver, so that the training device can obtain c(P).
- the training device can determine the limit value of the child node of node P according to the target evaluation model, and then determine the second item in the above prediction label.
- the target evaluation model has the same structure as the node evaluation model. The parameters of both may be the same or different.
- the target evaluation model is the target network in the deep Q learning process. This target evaluation model is used to stabilize training and prevent overfitting.
- the target evaluation model can be updated based on the parameters of the current node evaluation model, that is, the target evaluation model is replaced with the current node evaluation model.
- the target evaluation model can be updated based on the parameters of the current node evaluation model.
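- the exact expressions of the prediction label and of the training goal are not reproduced in this text, so the following is only one plausible sketch of a deep-Q-style label with a target network. It assumes that the label is c(P) plus an aggregate of the target evaluation model's outputs for the child nodes of P, that the aggregate is a minimum, that a node without children contributes no second term, and that the loss is a squared error. All names (`td_label`, `LinearModel`, `sync_target`) are hypothetical.

```python
def td_label(c_p, child_embeddings, target_model, aggregate=min):
    """Label = immediate bound change c(P) + target model estimate for children.

    Assumptions: `aggregate` is min, and a node with no children
    contributes no second term.
    """
    if not child_embeddings:
        return c_p
    return c_p + aggregate(target_model(e) for e in child_embeddings)


def squared_error(prediction, label):
    """Gap between the model output and the label (assumed loss)."""
    return (prediction - label) ** 2


class LinearModel:
    """Toy stand-in for an evaluation model: a dot product with weights w."""

    def __init__(self, w):
        self.w = list(w)

    def __call__(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))


def sync_target(target, online):
    """Replace the target model's parameters with the current model's."""
    target.w = list(online.w)


online = LinearModel([1.0, 0.0])    # model being trained
target = LinearModel([0.5, 0.5])    # same structure, separate parameters

label = td_label(2.0, [[2.0, 2.0], [4.0, 2.0]], target)
loss = squared_error(online([1.0, 0.0]), label)
sync_target(target, online)
```

- keeping the target model's parameters fixed between synchronizations means the label does not shift with every update of the model being trained.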
- Figure 9 shows a schematic diagram of a training process based on reinforcement learning.
- the training process shown in Figure 9 may include the following steps:
- the data set contains one or more planning problems. At least some of the decision variables of the one or more planning problems are integer variables.
- the solver generates a set of candidate sample nodes based on the data set.
- the candidate sample set includes multiple sample nodes.
- the solver may generate a branch-and-bound search tree based on the data set, the branch-and-bound search tree including the plurality of sample nodes.
- the set of candidate sample nodes can be represented in the form of a branch-and-bound search tree.
- the set of candidate sample nodes can be stored in the training database.
- the features of the multiple sample nodes can be used as states in deep Q learning.
- target sample nodes are fed back to the solver as actions in deep Q-learning.
- the solver can provide the limit value of the target sample node and the limit value of the target sample node's parent node to the node evaluation model.
- the limit value of the target sample node and the limit value of the target sample node's parent node can be used as rewards in deep Q learning to adjust the parameters of the node evaluation model.
- the solver can adjust the set of candidate sample nodes based on the target sample node.
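- the state, action and reward exchanged in this interaction can be recorded as transitions, as sketched below. The `Transition` record, the `make_transition` helper and the replay buffer are hypothetical; the reward is computed as the parent bound minus the node bound, following the wording of c(P), and that sign convention is an assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    state: list      # features of the candidate sample nodes
    action: int      # index of the chosen target sample node
    reward: float    # derived from the bounds reported by the solver


def make_transition(node_features, action, node_bound, parent_bound):
    """Build one training record from a single solver interaction."""
    # Assumed sign convention: parent bound minus node bound.
    reward = parent_bound - node_bound
    return Transition(list(node_features), action, reward)


# Bounded buffer of past interactions used to fit the node evaluation model.
replay_buffer = deque(maxlen=10_000)
replay_buffer.append(
    make_transition([[0.1, 0.2], [0.3, 0.4]], action=1,
                    node_bound=12.0, parent_bound=10.0)
)
```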
- Figure 10 shows a schematic diagram of recursive expansion of node P in a branch-and-bound search tree.
- node P can be expanded into two sub-nodes N 1 and N 2 .
- child nodes shown with dotted lines (for example, N 4 ) can be understood as nodes that have been pruned or nodes that have not yet been expanded.
- child nodes shown with solid lines (e.g., N 1 , N 2 , N 4 and N 5 ) can be understood as actually expanded nodes.
- the parameters of the feature extraction model can be adjusted simultaneously, that is, the parameters of the feature extraction model and the parameters of the node evaluation model can both be adjusted with the goal of reducing the gap between the output results of the node evaluation model and the labels corresponding to the sample nodes.
- the feature extraction model can also be trained in other ways, which is not limited in the embodiments of the present application.
- the method 800 may also include: returning the node evaluation model to the user.
- method 800 may also include: returning the feature extraction model to the user.
- a GCN-based feature extraction model is used to perform feature extraction on the nodes in the branch-and-bound search tree, and a node evaluation model based on a fully connected neural network is used to predict the multi-step pseudo-cost of the node.
- the feature extraction model and fully connected neural network are trained using reinforcement learning.
- the model trained using method 800 can be mounted to the solver and used to select search nodes and pruning nodes during the solution process, which is beneficial to improving solution efficiency.
- the training method in method 800 is only an example.
- for other training methods, refer to the description of method 300; details are not repeated here.
- Table 1 shows the comparison results of test indicators under different solution methods. Specifically, Table 1 shows the test indicators of the rule-based best estimate search method and of the solution method using the solution of the embodiment of the present application.
- the solver is the mixed integer programming solver SCIP (Solving Constraint Integer Programs), and experiments are conducted based on multiple open source data sets.
- Table 1 shows four groups of experiments, which are introduced as follows:
- the data set is the open source knapsack problem data set (MIK), a medium-sized data set, and the solution difficulty is medium.
- the planning problem is a minimum value optimization problem.
- the data set is a combinatorial auction (cauctions) problem data set, a medium-sized data set, and the solution difficulty is medium difficulty.
- the planning problem is a maximum optimization problem.
- the data set is an artificially generated set cover, a small-scale data set, and the solution difficulty is easy.
- the planning problem is a minimum value optimization problem.
- the data set is a facility location data set (facilities), a small-scale data set, and the difficulty of solving it is easy.
- the planning problem is a maximum optimization problem.
- test indicators include: the solving time of the problem, the number of nodes of the search tree generated during the solving process, and the integral over time of the change curves of the primal bound and dual bound during the solving process, that is, the primal-dual integral.
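- an integral over time of the gap between the primal bound and the dual bound can be sketched as follows. This is a simplified, unnormalized variant that integrates the absolute gap as a step function; solvers may instead integrate a relative or percentage gap. The function name and the event format are hypothetical.

```python
def primal_dual_integral(events, t_end):
    """Integrate |primal - dual| over time as a step function.

    `events` is a time-sorted list of (time, primal_bound, dual_bound);
    each pair of bounds is held until the next event or until `t_end`.
    """
    total = 0.0
    next_times = [t for t, _, _ in events[1:]] + [t_end]
    for (t0, primal, dual), t1 in zip(events, next_times):
        total += abs(primal - dual) * (t1 - t0)
    return total


# Hypothetical bound history: the gap closes at t = 5 s and solving ends at t = 6 s.
history = [(0.0, 10.0, 4.0), (2.0, 8.0, 6.0), (5.0, 7.0, 7.0)]
integral = primal_dual_integral(history, t_end=6.0)
```

- a smaller value indicates that the two bounds converged earlier in the run.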
- the user can train each of the above models on a local device.
- users can train each of the above models on the AI basic development platform.
- the AI basic development platform is a platform-as-a-service (PaaS) cloud service in the cloud platform. Based on the large number of basic resources and software capabilities owned by the public cloud service provider, it provides users (also called tenants, AI developers, etc.) with a software platform that assists in the construction, training and deployment of AI models and in the development and deployment of AI applications.
- the interaction between users and the AI basic development platform mainly includes: the user logs in to the cloud platform through the client web page and selects and purchases the cloud service of the AI basic development platform in the cloud platform; after purchase, the user can carry out full-process AI development based on the functions provided by the AI basic development platform.
- the basic resources are mainly computing resources, such as central processing units (CPU), graphics processing units (GPU) and embedded neural-network processing units (NPU).
- the AI basic development platform can be independently deployed on a server or virtual machine in a data center in a cloud environment.
- the AI basic development platform can also be deployed in a distributed manner on multiple servers in a data center, or on multiple virtual machines in a data center.
- the AI basic development platform provided by this application can also be deployed in a distributed manner in different environments.
- the AI basic development platform provided by this application can be logically divided into multiple parts, each part having different functions.
- part of the AI basic development platform can be deployed in computing devices in the edge environment (also called edge computing devices), and the other part can be deployed in devices in the cloud environment.
- the edge environment is an environment that is geographically close to the user's terminal computing device.
- the edge environment includes edge computing devices, such as edge servers, edge stations with computing capabilities, etc.
- Various parts of the AI basic development platform deployed in different environments or devices work together to provide users with functions such as training AI models.
- the following takes the training of the node evaluation model as an example to explain the AI model training service provided by the AI basic development platform.
- the AI basic development platform can train the initial model and obtain a node evaluation model that meets the user's goals.
- the initial model may be a built-in initial model in the AI basic development platform.
- the initial model may be an initial model provided by the user or selected by the user on the AI basic development platform.
- the initial model can also be a suitable model searched by the AI basic development platform using the background neural network architecture search algorithm.
- Training data can include data built into the AI basic development platform.
- the training data may include user-supplied data or data processed based on user-supplied data.
- the user-supplied data may be a data set that includes one or more mixed integer programming problems.
- the AI basic development platform can process the data set based on the built-in solver to obtain a set of candidate sample nodes.
- the AI basic development platform can save multiple sample nodes in the candidate sample node set to the training database.
- the data provided by the user can be a collection of candidate sample nodes.
- the AI basic development platform can save multiple sample nodes in the candidate sample node set to the training database.
- the data provided by the user may include a set of candidate sample nodes and labels corresponding to the sample nodes.
- the AI basic development platform can save the multiple sample nodes in the candidate sample node set, together with the labels corresponding to those sample nodes, to the training database.
- the data provided by the user can also be other types of data. For specific description, please refer to the method 300 or the method 800 mentioned above, which will not be described again here.
- the AI basic development platform can also deploy the aforementioned trained AI model (for example, feature extraction model or node evaluation model) on nodes in the cloud environment or nodes in the edge environment.
- nodes in the cloud environment can be virtual machine instances, container instances, physical servers, etc.
- nodes in the edge environment can be various edge devices.
- an example is shown.
- the model can be distributed and deployed on multiple nodes based on the idea of model parallelism.
- the model can also be deployed independently on multiple nodes to support a larger number of visits to online services.
- the AI basic development platform can also deploy AI applications to edge devices registered to the cloud platform based on the application requirements of the AI model.
- the above-deployed AI model can become an AI application or become a part of an AI application. As shown in Figure 13, users can access AI applications online through Web pages or through client apps. When an AI application is used, the AI model deployed in the edge environment or cloud environment can be called online to provide a response. As a result, the AI model developed and trained through the AI basic development platform can implement inference on online request data and return inference results.
- nodes in Figures 12 and 13 may include nodes in the cloud environment or nodes in the edge environment, and the nodes in the method of the embodiment of the present application are sub-problems of the target planning problem.
- the feature extraction model and the node evaluation model can be used as an AI application, for example, a node selection application. Users can access the node selection application online through web pages or client apps.
- when the node selection application is used, the feature extraction model and node evaluation model deployed in the edge environment or cloud environment can be called online to provide a response.
- the inference result is returned, for example, the indication information of the target node.
- the feature extraction model and the node evaluation model may be part of an AI application, for example, the AI application may be a planning problem solving application.
- Users can access planning problem solving applications online through web pages or client apps. In this case, users can upload the planning problem to be solved, that is, the goal planning problem.
- the solver can call the node selection service to determine the target node.
- the node selection service implements inference on the data requested by the solver through the feature extraction model and the node evaluation model, and returns the inference results to the solver, for example, the indication information of the target node.
- the solver solves the planning problem based on the target nodes and returns the solution results to the user.
- the AI basic development platform can continuously collect the input and output data of the inference process, use the input and output data of the inference phase to continue enriching the training data set, and continue to optimize and train the AI model based on the inference-phase data and the corresponding manually confirmed final results.
- the AI model developed and trained by the aforementioned AI basic development platform may not be deployed online. Instead, users can download the trained AI model to a local device and deploy it freely there. For example, users can choose to save the trained AI model (for example, the feature extraction model and the node evaluation model) to OBS, and then download the AI model from OBS to the local device.
- the device according to the embodiment of the present application will be described below with reference to FIGS. 14 to 19 . It should be understood that the devices described below can perform the foregoing methods of the embodiments of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the devices of the embodiments of the present application.
- Figure 14 is a schematic block diagram of a node selection device 1400 provided by an embodiment of the present application.
- the device 1400 can be applied to a cloud management platform and can be implemented through software, hardware, or a combination of both. The device 1400 provided by the embodiments of this application can implement the method flow shown in Figure 2 of the embodiment of this application.
- the device 1400 includes: an acquisition module 1410 and a prediction module 1420.
- the acquisition module 1410 is used to obtain a candidate node set of the target planning problem.
- the candidate node set includes multiple nodes, and each node in the multiple nodes corresponds to a sub-problem to be solved of the target planning problem.
- the prediction module 1420 is used to predict the relevant quantity of the limit value of each node after multi-step expansion through the node evaluation model.
- the output result of the node evaluation model is used to determine the target node.
- the target node is used to adjust the candidate node set.
- the adjusted candidate node set is used to solve the target planning problem.
- the device 1400 further includes: a determination module 1430 (not shown in the figure), which can be used to determine a node evaluation model according to user instructions, and the node evaluation model can be deployed on a cloud management platform.
- the output result of the node evaluation model is used to determine the first target node, and the adjusted candidate node set includes child nodes of the first target node.
- the output result of the node evaluation model is used to determine the second target node, and the second target node is not included in the adjusted candidate node set.
- the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of each node's parent node.
- the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after being completely solved and the limit value of each node's parent node.
- the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value after multi-step expansion of any other node among the multiple nodes and the limit value of that node's parent node.
- the second target node belongs to k nodes among the multiple nodes, where the difference between the limit value of each of the k nodes after multi-step expansion and the limit value of its parent node is greater than or equal to the corresponding difference for every node among the multiple nodes other than the k nodes; k is an integer greater than 1, and k is less than the number of the multiple nodes.
- the second target node is determined based on the probability corresponding to the k nodes.
- the probability corresponding to each of the k nodes is positively correlated with the difference between the limit value of that node after multi-step expansion and the limit value of its parent node.
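The selection rules above can be sketched as follows, assuming the node evaluation model has already produced one predicted difference per candidate node. The function names, the dictionary layout, and the softmax weighting are illustrative assumptions; the source only requires that the sampling probability be positively correlated with the difference.

```python
import math
import random
from typing import Dict, List


def select_first_target(pred_diff: Dict[str, float]) -> str:
    """Return the node whose predicted difference is the smallest."""
    return min(pred_diff, key=lambda name: pred_diff[name])


def sample_second_target(
    pred_diff: Dict[str, float], k: int, rng: random.Random
) -> str:
    """Sample one node from the k nodes with the largest predicted differences."""
    if not 1 < k < len(pred_diff):
        raise ValueError("k must be greater than 1 and less than the number of nodes")
    top_k: List[str] = sorted(
        pred_diff, key=lambda name: pred_diff[name], reverse=True
    )[:k]
    # Softmax weights (an assumption): a larger difference gives a larger probability.
    peak = max(pred_diff[name] for name in top_k)
    weights = [math.exp(pred_diff[name] - peak) for name in top_k]
    return rng.choices(top_k, weights=weights, k=1)[0]


# Hypothetical predicted differences for four candidate nodes.
pred = {"n1": 0.2, "n2": 1.5, "n3": 0.9, "n4": 3.0}
first = select_first_target(pred)
second = sample_second_target(pred, k=2, rng=random.Random(0))
```

Here `first` is the node to expand next, while `second` is drawn from the two least promising nodes and is a candidate for removal from the set.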
- the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.
- the node evaluation model is trained based on the sample node and the label corresponding to the sample node.
- the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the label corresponding to the sample node is determined based on the first difference and the second difference.
- the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node.
- the second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing.
- the target evaluation model has the same structure as the node evaluation model.
- the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node.
- the relevant information of each node includes at least one of the following: the objective function of each node, the constraints of each node, or the decision variables of each node. The low-dimensional representation of the relevant information of each node is obtained by reducing the dimensionality of the relevant information of each node through the feature extraction model.
- Figure 15 is a schematic block diagram of a node evaluation model training device 1500 provided by an embodiment of the present application. The device 1500 can be applied to a cloud management platform and can be implemented through software, hardware, or a combination of both.
- the device 1500 provided by the embodiment of the present application can implement the method flow shown in Figure 3 or Figure 8 of the embodiment of the present application.
- the device 1500 includes: a first acquisition module 1510, a second acquisition module 1520 and a training module 1530.
- the first acquisition module 1510 is used to acquire sample nodes.
- the second acquisition module 1520 is used to obtain the label corresponding to the sample node, and the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- the training module 1530 is used to train the initial model based on the sample nodes and labels corresponding to the sample nodes to obtain a node evaluation model.
- the first acquisition module 1510 and the second acquisition module 1520 may be the same acquisition module, or they may be different acquisition modules.
- the limit value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the training module 1530 is specifically configured to: train the initial model through reinforcement learning to obtain a node evaluation model, wherein the label corresponding to the sample node is determined based on the first difference and the second difference.
- the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node.
- the second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing.
- the target evaluation model has the same structure as the node evaluation model.
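One way to read the label construction above is as a bootstrapped target, similar in spirit to the target networks used in temporal-difference learning. The sketch below is written under several assumptions that the source does not state: the two differences are added, the child-node outputs are aggregated with `min`, and an optional `discount` scales the second difference. All names are hypothetical.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def bootstrapped_label(
    parent_bound: float,
    node_bound: float,
    child_features: Sequence[Features],
    target_model: Callable[[Features], float],
    discount: float = 1.0,
) -> float:
    """Combine a one-step bound difference with a bootstrapped estimate.

    Assumptions (not stated in the source): the two differences are added,
    children are aggregated with min, and `discount` scales the estimate.
    """
    # First difference: parent limit value minus sample-node limit value.
    first_difference = parent_bound - node_bound
    if not child_features:
        # No child nodes: nothing to bootstrap from.
        return first_difference
    # Second difference: output of the target evaluation model on the children.
    second_difference = min(target_model(f) for f in child_features)
    return first_difference + discount * second_difference


def toy_target(features: Features) -> float:
    """Stand-in for a target evaluation model with the same structure."""
    return 0.5 * sum(features)


label = bootstrapped_label(
    parent_bound=10.0,
    node_bound=12.0,
    child_features=[[1.0, 1.0], [2.0, 4.0]],
    target_model=toy_target,
)
```

With these toy inputs the first difference is -2.0, the child outputs are 1.0 and 3.0, and the resulting label is -1.0.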
- the input of the initial model includes relevant information of the sample node or a low-dimensional representation of the relevant information of the sample node.
- the relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node. The low-dimensional representation of the relevant information of the sample node is obtained by reducing the dimensionality of the relevant information of the sample node through the feature extraction model.
- Figure 16 is a schematic block diagram of a device 1600 for solving a goal planning problem provided by an embodiment of the present application.
- the device 1600 can be applied to a cloud management platform and can be implemented through software, hardware, or a combination of both.
- the device 1600 provided by the embodiment of the present application can implement the method flow shown in Figure 4 of the embodiment of the present application.
- the device 1600 includes: an acquisition module 1610, an adjustment module 1620 and a solution module 1630.
- the acquisition module 1610 is used to acquire the goal planning problem uploaded by the user.
- the adjustment module 1620 is used to adjust the candidate node set of the target planning problem according to the node evaluation model, where the candidate node set includes multiple nodes, and each node in the multiple nodes corresponds to a sub-problem to be solved of the target planning problem.
- the node evaluation model is used to predict the correlation quantity of the limit value of each node after multi-step expansion.
- the solving module 1630 is used to solve the target planning problem based on the adjusted candidate node set to obtain the solution result of the target planning problem.
- the device 1600 further includes a determining module 1640 (not shown in the figure), which is used to determine the node evaluation model according to user instructions.
- This node evaluation model can be deployed on the cloud management platform.
- the adjustment module 1620 is specifically configured to determine the first target node according to the node evaluation model; generate child nodes of the first target node; and add the child nodes of the first target node to the candidate node set.
- the adjustment module 1620 is specifically configured to determine the second target node according to the node evaluation model; delete the second target node from the candidate node set.
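The two adjustment operations can be sketched as below. The `Node` class, the `toy_branch` function, and the dictionary used as the candidate set are hypothetical; removing the first target node once its child nodes are added follows common branch-and-bound practice and is an assumption rather than something the source states.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    """A sub-problem of the target planning problem (illustrative only)."""
    name: str


def expand_first_target(
    candidates: Dict[str, Node],
    first_target: str,
    branch: Callable[[Node], List[Node]],
) -> None:
    """Generate the child nodes of the first target node and add them."""
    # Assumption: the expanded node leaves the candidate set.
    node = candidates.pop(first_target)
    for child in branch(node):
        candidates[child.name] = child


def delete_second_target(candidates: Dict[str, Node], second_target: str) -> None:
    """Delete the second target node from the candidate node set."""
    candidates.pop(second_target)


def toy_branch(node: Node) -> List[Node]:
    """Hypothetical branching rule that yields two child sub-problems."""
    return [Node(node.name + "-L"), Node(node.name + "-R")]


candidates = {name: Node(name) for name in ("a", "b", "c")}
expand_first_target(candidates, "a", toy_branch)
delete_second_target(candidates, "c")
remaining = sorted(candidates)
```

After both operations the candidate set holds the two child nodes of `a` plus the untouched node `b`.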
- the correlation quantity of the limit value of each node after multi-step expansion includes the difference between the limit value of each node after multi-step expansion and the limit value of each node's parent node.
- the correlation quantity of the bound value of each node after multi-step expansion includes the bound value of each node after being completely solved.
- the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the difference between the limit value after multi-step expansion of any other node among the multiple nodes and the limit value of that node's parent node.
- the second target node belongs to k nodes among the multiple nodes, where the difference between the limit value of each of the k nodes after multi-step expansion and the limit value of its parent node is greater than or equal to the corresponding difference for every node among the multiple nodes other than the k nodes; k is an integer greater than 1, and k is less than the number of the multiple nodes.
- the second target node is determined based on the probability corresponding to the k nodes.
- the probability corresponding to each of the k nodes is positively correlated with the difference between the limit value of that node after multi-step expansion and the limit value of its parent node.
- the limit value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.
- the node evaluation model is trained based on the sample node and the label corresponding to the sample node.
- the label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion.
- the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- the label corresponding to the sample node is determined based on the first difference and the second difference.
- the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node.
- the second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing.
- the target evaluation model has the same structure as the node evaluation model.
- the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node.
- the relevant information of each node includes at least one of the following: the objective function of each node, the constraints of each node, or the decision variables of each node. The low-dimensional representation of the relevant information of each node is obtained by reducing the dimensionality of the relevant information of each node through the feature extraction model.
- the device 1600 also includes a return module (not shown in the figure), which is used to return the solution result of the goal planning problem to the user.
- module can be implemented in the form of software and/or hardware, and is not specifically limited.
- a “module” may be a software program, a hardware circuit, or a combination of both that implements the above functions.
- the following takes the adjustment module in Figure 16 as an example to introduce the implementation of the adjustment module.
- the implementation of other modules can refer to the implementation of the adjustment module.
- an adjustment module may include code running on a computing instance.
- the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container.
- the above computing instance may be one or more.
- the adjustment module can include code running on multiple hosts/VMs/containers.
- multiple hosts/virtual machines/containers used to run the code can be distributed in the same region (region) or in different regions.
- multiple hosts/virtual machines/containers used to run the code can be distributed in the same availability zone (AZ) or in different AZs.
- Each AZ includes one data center or multiple geographically close data centers. A region can usually include multiple AZs.
- multiple hosts/VMs/containers used to run the code can be distributed in the same virtual private cloud (VPC), or can be distributed in multiple VPCs.
- Communication between two VPCs in the same region, as well as cross-region communication between VPCs in different regions, requires a communication gateway in each VPC; the VPCs are interconnected through these communication gateways.
- the adjustment module is an example of a hardware functional unit.
- the adjustment module may include at least one computing device, such as a server.
- the adjustment module can also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
- the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
- Multiple computing devices included in the adjustment module can be distributed in the same region or in different regions. Multiple computing devices included in the adjustment module can be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the adjustment module can be distributed in the same VPC or in multiple VPCs.
- the plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
- modules of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
- when the device provided in the above embodiment performs the above method, the division into the above functional modules is only an example. In practice, the functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
- the acquisition module can be used to perform any step in the above method
- the adjustment module can be used to perform any step in the above method
- the solving module can be used to perform any step in the above method.
- the steps that the acquisition module, the adjustment module and the solution module are responsible for implementing can be specified as needed.
- the acquisition module, the adjustment module and the solution module respectively implement different steps in the above method to realize all the functions of the above device.
- the division of functional modules of the device 1400 and the device 1500 is only an example, and will not be described again here to avoid duplication.
- a computing device provided by an embodiment of the present application will be described in detail below with reference to FIG. 17 .
- Figure 17 is a schematic architectural diagram of a computing device 1000 provided by an embodiment of the present application.
- computing device 1000 includes: bus 1002, processor 1004, memory 1006, and communication interface 1008.
- the processor 1004, the memory 1006 and the communication interface 1008 communicate through the bus 1002.
- Computing device 1000 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1000.
- the bus 1002 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, or the like.
- the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 17, but it does not mean that there is only one bus or one type of bus.
- Bus 1002 may include a path that carries information between various components of computing device 1000 (e.g., memory 1006, processor 1004, communication interface 1008).
- the processor 1004 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
- Memory 1006 may include volatile memory, such as random access memory (RAM).
- the memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a mechanical hard disk drive (HDD), or a solid state drive (SSD).
- the memory 1006 stores executable program codes, and the processor 1004 executes the executable program codes to respectively implement the functions of the modules in Figure 14, Figure 15 or Figure 16, thereby implementing the methods of the embodiments of the present application. That is, the memory 1006 stores instructions for executing the method of the embodiment of the present application.
- the communication interface 1008 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 1000 and other devices or communication networks.
- An embodiment of the present application also provides a computing device cluster.
- the computing device cluster includes at least one computing device.
- the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
- the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
- the computing device cluster includes at least one computing device 1000.
- the computing device cluster can be used to execute the method of the embodiment of the present application, for example, the method shown in Figure 2, Figure 3, Figure 4 or Figure 8.
- the memory 1006 in one or more computing devices 1000 in the computing device cluster may store the same instructions for performing the methods of the embodiments of the present application.
- the memory 1006 of one or more computing devices 1000 in a cluster of computing devices may store the same instructions for performing a method of solving a goal planning problem.
- the memory 1006 of one or more computing devices 1000 in the computing device cluster may also store part of the instructions for executing the method of the embodiment of the present application.
- a combination of one or more computing devices 1000 may jointly execute instructions for performing the methods of embodiments of the present application.
- the memory 1006 of one or more computing devices 1000 in the computing device cluster may also store part of the instructions for executing the solution method of the goal planning problem.
- a combination of one or more computing devices 1000 may collectively execute instructions for performing a method of solving a goal planning problem.
- the memories 1006 in different computing devices 1000 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the device 1600 for solving a goal planning problem. That is, instructions stored in the memory 1006 in different computing devices 1000 may implement the functions of one or more of the acquisition module, the adjustment module, and the solution module.
- one or more computing devices in a cluster of computing devices may be connected through a network.
- the network may be a wide area network or a local area network, etc.
- Figure 19 shows a possible implementation. As shown in Figure 19, two computing devices 1000A and 1000B are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
- the memory 1006 in the computing device 1000A stores instructions for executing the functions of the acquisition module and the solution module. At the same time, instructions for performing the functions of the adjustment module are stored in the memory 1006 in the computing device 1000B.
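The split described above can be written down as a small placement table. This is only an illustrative sketch: the `PLACEMENT` mapping mirrors the Figure 19 example, while the `device_for` helper and the module name strings are hypothetical.

```python
from typing import Dict, List

# Which computing device stores the instructions of which module
# (mirrors the Figure 19 example; the string names are illustrative).
PLACEMENT: Dict[str, List[str]] = {
    "1000A": ["acquisition", "solution"],
    "1000B": ["adjustment"],
}


def device_for(module: str, placement: Dict[str, List[str]]) -> str:
    """Return the computing device that hosts the given module."""
    for device, modules in placement.items():
        if module in modules:
            return device
    raise KeyError(f"no computing device hosts module {module!r}")


# Order in which the devices take part when one problem is solved:
# acquire the problem, adjust the candidate set, then solve.
route = [
    device_for(m, PLACEMENT) for m in ("acquisition", "adjustment", "solution")
]
```

In this layout a single solve request crosses the network twice, from 1000A to 1000B for the adjustment step and back to 1000A for the solution step.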
- the functions of computing device 1000A shown in Figure 19 may also be performed by multiple computing devices 1000.
- likewise, the functions of computing device 1000B may also be performed by multiple computing devices 1000.
- An embodiment of the present application also provides a computer program product containing instructions.
- the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
- when the computer program product is run on at least one computing device, the at least one computing device is caused to execute the method in the embodiment of the present application, for example, a method for solving a goal planning problem, a method for selecting nodes, or a method for training a node evaluation model.
- An embodiment of the present application also provides a computer-readable storage medium.
- the computer-readable storage medium may be any available medium that a computing device can store data on, or a data storage device, such as a data center, that contains one or more available media.
- the available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid-state drive), etc.
- the computer-readable storage medium includes instructions that instruct a computing device to execute the methods in the embodiments of the present application, for example, the method for solving a goal programming problem, the node selection method, or the method for training a node evaluation model.
- the magnitude of the sequence numbers of the above processes does not imply an order of execution.
- the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
- the disclosed systems, devices and methods can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
- the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
- If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application.
- the aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Educational Administration (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Stored Programmes (AREA)
- Programmable Controllers (AREA)
Abstract
Embodiments of the present application provide a method for solving a goal programming problem, a node selection method, a node evaluation model training method, and an apparatus. The method for solving a goal programming problem comprises: acquiring a goal programming problem uploaded by a user; adjusting a candidate node set of the goal programming problem according to a node evaluation model, wherein nodes in the candidate node set correspond to sub-problems to be solved of the goal programming problem, and the node evaluation model is used for predicting correlation quantities of bound values of each node after multi-step unfolding; and solving the goal programming problem on the basis of the adjusted candidate node set to obtain a solution of the goal programming problem. According to the scheme in the embodiments of the present application, a candidate node set is adjusted on the basis of prediction results of correlation quantities of bound values of each node after multi-step unfolding, and thus, the efficiency of solving programming problems is improved.
Description
This application claims priority to Chinese Patent Application No. 202210773925.4, filed with the China Patent Office on July 1, 2022 and entitled "Model Training Method and Apparatus", which is incorporated herein by reference in its entirety.
This application claims priority to Chinese Patent Application No. 202211430767.9, filed with the China Patent Office on November 15, 2022 and entitled "Method for Solving Goal Programming Problem, Node Selection Method, and Apparatus", which is incorporated herein by reference in its entirety.
This application claims priority to Chinese Patent Application No. 202211490522.5, filed with the China Patent Office on November 25, 2022 and entitled "Method for Solving Goal Programming Problem, Node Selection Method, and Apparatus", which is incorporated herein by reference in its entirety.
The embodiments of the present application relate to the field of data processing technologies, and more specifically, to a method for solving a goal programming problem, a node selection method, a method for training a node evaluation model, and an apparatus.
Operations research mainly applies mathematical methods to study optimization approaches and schemes for various systems, providing decision-makers with a basis for scientific decision-making. Mathematical programming is an important branch of operations research; its main goal is to find, within a given region, the optimal solution that maximizes or minimizes an objective function. In real-world scenarios, many problems carry integer constraints, for example, production planning, supply chain, production scheduling, and factory selection. Such problems can be modeled as mixed integer programming problems or integer programming problems and solved with tools such as mathematical programming solvers.
Mathematical programming solvers mainly rely on the branch and bound algorithm. Branch and bound is a search-and-iterate method: during iterative computation, it repeatedly partitions the solution space of the original problem into smaller and smaller subsets, that is, it repeatedly generates sub-problems (also called nodes) of the original problem, and obtains the optimal solution of the original problem by successively solving the sub-problems. For complex problems, for example, problems with a large number of decision variables, a large number of nodes are generated during solving and the solving time is long, which makes it difficult to meet users' needs.
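The branch and bound loop described above can be illustrated with a minimal sketch. It is not the solver of the present application: it assumes a toy 0/1 knapsack problem (a maximization problem, so the relaxation yields an upper bound rather than a lower bound), and the function name `knapsack_branch_and_bound` and the best-first ordering are choices made only for this illustration.

```python
import heapq


def knapsack_branch_and_bound(values, weights, capacity):
    """Best-first branch and bound for a toy 0/1 knapsack (maximization)."""
    # Sort items by value density so the fractional relaxation is a greedy fill.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    n = len(v)

    def relaxed_bound(depth, value, room):
        # Upper bound of a node: items before `depth` are decided, the rest
        # are filled greedily and the last one may be taken fractionally.
        bound = value
        for i in range(depth, n):
            if w[i] <= room:
                bound += v[i]
                room -= w[i]
            else:
                bound += v[i] * room / w[i]
                break
        return bound

    best = 0       # incumbent: best feasible objective value found so far
    explored = 0   # number of nodes actually expanded
    # Candidate node set, kept as a max-heap on the bound (negated for heapq).
    candidates = [(-relaxed_bound(0, 0, capacity), 0, 0, capacity)]
    while candidates:
        neg_bound, depth, value, room = heapq.heappop(candidates)
        if -neg_bound <= best:
            continue  # prune: this sub-problem cannot beat the incumbent
        explored += 1
        if depth == n:
            continue  # all items decided; its value was recorded when created
        # Branch 1: take item `depth` if it fits.
        if w[depth] <= room:
            nv, nr = value + v[depth], room - w[depth]
            best = max(best, nv)  # any partial selection is itself feasible
            heapq.heappush(
                candidates, (-relaxed_bound(depth + 1, nv, nr), depth + 1, nv, nr))
        # Branch 2: skip item `depth`.
        heapq.heappush(
            candidates,
            (-relaxed_bound(depth + 1, value, room), depth + 1, value, room))
    return best, explored


best, explored = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(best, explored)
```

Even on this small instance the order in which candidate nodes are popped decides how many nodes are expanded before the incumbent closes the search, which is the cost that grows quickly with the number of decision variables.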
Therefore, how to improve the performance of solving programming problems has become an urgent problem to be solved.
Summary
Embodiments of the present application provide a method for solving a goal programming problem, a node selection method, a method for training a node evaluation model, and an apparatus, which help improve the efficiency of solving programming problems.
According to a first aspect, a method for solving a goal programming problem is provided, including: acquiring a goal programming problem uploaded by a user; adjusting a candidate node set of the goal programming problem according to a node evaluation model, where the candidate node set includes a plurality of nodes, each of the plurality of nodes corresponds to one to-be-solved sub-problem of the goal programming problem, and the node evaluation model is used to predict a quantity related to the bound value of each node after multi-step expansion; and solving the goal programming problem based on the adjusted candidate node set to obtain a solution result of the goal programming problem.
At least some of the decision variables of the goal programming problem are integer variables, that is, at least some of the decision variables take integer values. In other words, the goal programming problem is a pure integer programming model or a mixed integer programming model.
In the embodiments of the present application, the node evaluation model can predict a quantity related to the bound values of a node before and after multi-step expansion, which helps predict the best solution that can be found by searching from each of the plurality of nodes. This related quantity can be used to measure the long-term value of expanding a node, so that target nodes are selected more accurately and suitable nodes are chosen for the corresponding processing. As a result, the nodes in the adjusted candidate node set are those more likely to lead to the optimal solution, which helps improve solving efficiency.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: determining the node evaluation model according to a user instruction, where the node evaluation model is deployed on a cloud management platform.
For example, the user selects the node evaluation model from a plurality of selectable candidate node evaluation models.
As another example, the user may input the node evaluation model.
With reference to the first aspect, in some implementations of the first aspect, adjusting the candidate node set of the goal programming problem according to the node evaluation model includes: determining a first target node according to the node evaluation model; generating child nodes of the first target node; and adding the child nodes of the first target node to the candidate node set.
When the target node includes the first target node, the first target node may be determined according to the output of the node evaluation model, and iterative computation may be performed based on the first target node during solving. Specifically, the first target node is expanded to obtain its child nodes, and iterative computation then continues. The output of the node evaluation model can be used to measure the long-term value of expanding a node, which helps judge how likely the node is to yield the optimal solution after multi-step expansion. A first target node determined on this basis is more likely to yield the optimal solution, which helps speed up convergence and improve solving efficiency. For example, for a minimization problem, the smaller the bound value of a node after multi-step expansion, the more likely it is that the global optimal solution can be reached from that node. The node evaluation model can be used to predict the bound value of a node after multi-step expansion, which helps judge that likelihood.
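The selection and expansion of a first target node can be sketched as follows for a minimization problem. This is an illustration under stated assumptions, not the implementation of the present application: `Node`, `expand_first_target`, `predict`, and `branch` are hypothetical names, `predict` stands in for the node evaluation model, and removing the expanded node from the candidate set is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Node:
    """A sub-problem in the search tree (hypothetical minimal representation)."""
    name: str


def expand_first_target(
    candidates: List[Node],
    predict: Callable[[Node], float],
    branch: Callable[[Node], List[Node]],
) -> Node:
    """Select the candidate with the smallest predicted quantity, expand it,
    and add its child nodes to the candidate set (modified in place)."""
    target = min(candidates, key=predict)  # first target node
    candidates.remove(target)              # assumption: an expanded node leaves the set
    candidates.extend(branch(target))      # add the child nodes
    return target


# Usage with stand-in callables instead of a trained model and a real solver.
predicted = {"A": 3.0, "B": 1.0, "C": 2.0}
candidates = [Node("A"), Node("B"), Node("C")]
target = expand_first_target(
    candidates,
    predict=lambda n: predicted[n.name],
    branch=lambda n: [Node(n.name + "0"), Node(n.name + "1")],
)
print(target.name, [n.name for n in candidates])
```

In this sketch node B has the smallest predicted value, so it is expanded and replaced in the candidate set by its two children.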
With reference to the first aspect, in some implementations of the first aspect, adjusting the candidate node set of the goal programming problem according to the node evaluation model further includes: determining a second target node according to the node evaluation model; and deleting the second target node from the candidate node set.
When the target node includes the second target node, the second target node may be determined according to the output of the node evaluation model, and the second target node may be pruned during solving. The output of the node evaluation model can measure the long-term value of expanding a node, which helps judge how likely the node is to yield the optimal solution after expansion, and the second target node is determined on this basis. Pruning nodes that are unlikely to yield the optimal solution shrinks the solution space and avoids the delay caused by expanding and solving useless nodes, thereby improving solving efficiency. For example, for a minimization problem, the larger the bound value of a node after multi-step expansion, the less likely it is that the global optimal solution can be reached from that node. The node evaluation model can be used to predict the bound value of a node after multi-step expansion, which helps judge that likelihood.
With reference to the first aspect, in some implementations of the first aspect, the quantity related to the bound value of each node after multi-step expansion includes the bound value of each node after multi-step expansion.
With reference to the first aspect, in some implementations of the first aspect, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of each node after multi-step expansion and the bound value of that node's parent node.
For example, the difference between the bound value of a node after multi-step expansion and the bound value of the node's parent node may be the result of subtracting one of the two values from the other.
Take the case where the goal programming problem is a minimization problem and the bound value is a lower bound. During solving, as the number of iterations increases, the lower bounds of the plurality of nodes after multi-step expansion may all be very small, and, limited by factors such as the computational precision of the computer, it is difficult to compare them. By contrast, the differences of the plurality of nodes before and after multi-step expansion are more pronounced. In the embodiments of the present application, the differences of the plurality of nodes before and after multi-step expansion can be predicted and then compared to determine the target node, which helps improve the accuracy of target node selection.
With reference to the first aspect, in some implementations of the first aspect, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of each node after being completely solved and the bound value of that node's parent node.
With reference to the first aspect, in some implementations of the first aspect, the difference between the bound value of each node after being completely solved and the bound value of that node's parent node is indicated by the function value of a multi-step pseudo-cost function of the node, and the function value of the multi-step pseudo-cost function of a node satisfies the following formula:
where C(·) denotes the multi-step pseudo-cost function of a node, c(·) denotes the change in the bound value of a node before and after single-step expansion, and node N_i is a child node of node P.
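As a hedged illustration only, the definitions of C(·) and c(·) are consistent with a recursion of the following shape for a minimization problem. The symbols b(·) (bound value of a node), b*(·) (bound value of a node after being completely solved), and pa(·) (parent node), as well as the use of the minimum over child nodes, are assumptions introduced for this sketch and are not taken from the application.

```latex
% Assumed definitions (minimization problem):
%   c(P)   = b(P)   - b(pa(P))      single-step change in the bound value
%   C(P)   = b^*(P) - b(pa(P))      multi-step pseudo-cost
%   b^*(P) = \min_i b^*(N_i)        a fully solved node is as good as its best child
\begin{aligned}
C(P) &= b^{*}(P) - b(\mathrm{pa}(P)) \\
     &= \bigl[b(P) - b(\mathrm{pa}(P))\bigr] + \bigl[\min_i b^{*}(N_i) - b(P)\bigr] \\
     &= c(P) + \min_i C(N_i)
\end{aligned}
```

Under these assumptions, the multi-step pseudo-cost of a node is its own single-step change plus the smallest multi-step pseudo-cost among its child nodes.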
With reference to the first aspect, in some implementations of the first aspect, the difference between the bound value of the first target node after multi-step expansion and the bound value of the parent node of the first target node is less than or equal to the difference between the bound value, after multi-step expansion, of any other node among the plurality of nodes and the bound value of that other node's parent node.
With reference to the first aspect, in some implementations of the first aspect, the difference between the bound value of the second target node after multi-step expansion and the bound value of the parent node of the second target node is greater than or equal to the difference between the bound value, after expansion, of each remaining node among the plurality of nodes and the bound value of that remaining node's parent node.
With reference to the first aspect, in some implementations of the first aspect, the second target node belongs to k nodes among the plurality of nodes, where the difference between the bound value of each of the k nodes after multi-step expansion and the bound value of its parent node is greater than or equal to the difference between the bound value, after multi-step expansion, of any node other than the k nodes among the plurality of nodes and the bound value of that node's parent node, k is an integer greater than 1, and k is less than the number of the plurality of nodes.
Once a node is pruned, it is no longer solved during the solving of the goal programming problem, so a pruning operation may prune a node that contains the optimal solution. In the solution of the embodiments of the present application, the second target node can be determined probabilistically in the greedy manner described above, which helps reduce the risk of the pruning operation.
With reference to the first aspect, in some implementations of the first aspect, the second target node is determined based on probabilities corresponding to the k nodes, where the probability corresponding to each of the k nodes is positively correlated with the difference between the bound value of that node after multi-step expansion and the bound value of its parent node.
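The probabilistic choice of a second target node among the k nodes with the largest predicted differences can be sketched as follows. The text only requires the probability to be positively correlated with the predicted difference; the softmax weighting used here, and the names `pick_node_to_prune` and `predicted_diff`, are assumptions made for this illustration.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def pick_node_to_prune(
    predicted_diff: Dict[str, float],
    k: int,
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[str]]:
    """Sample one node to prune from the k nodes with the largest predicted
    difference. Returns the chosen node and the top-k shortlist."""
    if not 1 < k < len(predicted_diff):
        raise ValueError("k must satisfy 1 < k < number of nodes")
    rng = rng or random.Random()
    # Shortlist: the k nodes whose predicted difference is largest.
    top_k = sorted(predicted_diff, key=predicted_diff.get, reverse=True)[:k]
    # Assumed weighting: softmax, so a larger difference means a higher probability.
    peak = max(predicted_diff[n] for n in top_k)
    weights = [math.exp(predicted_diff[n] - peak) for n in top_k]
    chosen = rng.choices(top_k, weights=weights, k=1)[0]
    return chosen, top_k


diffs = {"A": 0.2, "B": 5.0, "C": 4.5, "D": 0.1}
chosen, top_k = pick_node_to_prune(diffs, k=2, rng=random.Random(0))
print(top_k, chosen)
```

Sampling within the shortlist, rather than always pruning the single worst-looking node, spreads the risk that a mispredicted node containing the optimal solution is removed.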
With reference to the first aspect, in some implementations of the first aspect, the bound value of each node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.
With reference to the first aspect, in some implementations of the first aspect, the node evaluation model is obtained through training based on sample nodes and labels corresponding to the sample nodes, where the label corresponding to a sample node is related to the bound value of the sample node after multi-step expansion.
In the embodiments of the present application, the label corresponding to a sample node is related to the bound value of the sample node after multi-step expansion, so the label is relatively easy to determine. This makes training data easier to collect and faster to generate, so that a large amount of training data can be obtained and samples are used more efficiently, which improves the training effect of the node evaluation model, that is, its prediction accuracy.
With reference to the first aspect, in some implementations of the first aspect, the label corresponding to a sample node is used to indicate the difference between the bound value of the sample node after multi-step expansion and the bound value of the parent node of the sample node.
With reference to the first aspect, in some implementations of the first aspect, the label corresponding to a sample node is determined according to a first difference and a second difference, where the first difference is the difference between the bound value of the parent node of the sample node and the bound value of the sample node, the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
For example, the first difference may be determined by a solver: the solver is called to obtain the bound value of the parent node of the sample node and the bound value of the sample node, from which the first difference can be determined.
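The construction of a label from the first difference and the second difference can be sketched as follows. The text states only that the label is determined from the two differences; adding them, taking the minimum of the target evaluation model's outputs over the child nodes, and the sign convention (node bound minus parent bound) are assumptions of this sketch, and `make_label`, `child_inputs`, and `target_model` are hypothetical names.

```python
from typing import Callable, Sequence


def make_label(
    parent_bound: float,
    node_bound: float,
    child_inputs: Sequence[float],
    target_model: Callable[[float], float],
) -> float:
    """Build a training label for one sample node (minimization problem)."""
    # First difference: obtained from solver-reported bound values.
    first_diff = node_bound - parent_bound
    # Second difference: predicted by the target evaluation model on the children.
    if child_inputs:
        second_diff = min(target_model(x) for x in child_inputs)
    else:
        second_diff = 0.0  # assumption: a node without children adds nothing further
    return first_diff + second_diff


# Stand-in target model; a real one would share the node evaluation model's structure.
label = make_label(10.0, 12.5, [1.0, 3.0], target_model=lambda x: 2.0 * x)
print(label)
```

Because only the one-step bound values come from the solver and the remainder is predicted, a label of this kind can be produced as soon as a node has been expanded once, without solving the sub-tree below it.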
With reference to the first aspect, in some implementations of the first aspect, the bound value of a sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
The function value of the objective function corresponding to the relaxed solution of a node is relatively easy to obtain. In the embodiments of the present application, when the bound value of a sample node is determined based on the relaxed solution, the label corresponding to the sample node is easier to determine. For example, in a reinforcement-learning-based process, the label corresponding to a sample node may be determined in real time; the relaxed solution is cheaper to compute, so determining the label based on the relaxed solution is more efficient, which helps improve training efficiency.
With reference to the first aspect, in some implementations of the first aspect, the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node, where the relevant information of each node includes at least one of the following: the objective function of the node, the constraints of the node, or the decision variables of the node, and the low-dimensional representation is obtained by performing dimensionality reduction on the relevant information of each node through a feature extraction model.
In the solution of the embodiments of the present application, performing dimensionality reduction on the relevant information of the nodes facilitates inference by the downstream module, that is, inference by the node evaluation model.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: returning the solution result of the goal programming problem to the user.
According to a second aspect, a node selection method is provided, including: acquiring a candidate node set of a goal programming problem, where the candidate node set includes a plurality of nodes, and each of the plurality of nodes corresponds to one to-be-solved sub-problem of the goal programming problem; and predicting, by using a node evaluation model, a quantity related to the bound value of each node after multi-step expansion, where the output of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: determining the node evaluation model according to a user instruction, where the node evaluation model is deployed on a cloud management platform.
With reference to the second aspect, in some implementations of the second aspect, the output of the node evaluation model is used to determine a first target node, and the adjusted candidate node set includes child nodes of the first target node.
With reference to the second aspect, in some implementations of the second aspect, the output of the node evaluation model is used to determine a second target node, and the adjusted candidate node set does not include the second target node.
With reference to the second aspect, in some implementations of the second aspect, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of each node after multi-step expansion and the bound value of that node's parent node.
With reference to the second aspect, in some implementations of the second aspect, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of each node after being completely solved and the bound value of that node's parent node.
With reference to the second aspect, in some implementations of the second aspect, the difference between the bound value of the first target node after multi-step expansion and the bound value of the parent node of the first target node is less than or equal to the difference between the bound value, after multi-step expansion, of any other node among the plurality of nodes and the bound value of that other node's parent node.
With reference to the second aspect, in some implementations of the second aspect, the second target node belongs to k nodes among the plurality of nodes, where the difference between the bound value of each of the k nodes after multi-step expansion and the bound value of its parent node is greater than or equal to the difference between the bound value, after multi-step expansion, of any node other than the k nodes among the plurality of nodes and the bound value of that node's parent node, k is an integer greater than 1, and k is less than the number of the plurality of nodes.
With reference to the second aspect, in some implementations of the second aspect, the second target node is determined based on probabilities corresponding to the k nodes, where the probability corresponding to each of the k nodes is positively correlated with the difference between the bound value of that node after multi-step expansion and the bound value of its parent node.
With reference to the second aspect, in some implementations of the second aspect, the bound value of each node after multi-step expansion includes the value of the objective function at the relaxed solution of the node after multi-step expansion.
With reference to the second aspect, in some implementations of the second aspect, the node evaluation model is obtained through training based on a sample node and a label corresponding to the sample node, and the label corresponding to the sample node is related to the bound value of the sample node after multi-step expansion.
With reference to the second aspect, in some implementations of the second aspect, the label corresponding to the sample node indicates the difference between the bound value of the sample node after multi-step expansion and the bound value of the parent node of the sample node.
With reference to the second aspect, in some implementations of the second aspect, the label corresponding to the sample node is determined based on a first difference and a second difference. The first difference is the difference between the bound value of the parent node of the sample node and the bound value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
With reference to the second aspect, in some implementations of the second aspect, the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node. The relevant information of each node includes at least one of the following: the objective function of the node, the constraints of the node, or the decision variables of the node. The low-dimensional representation of the relevant information of each node is obtained by using a feature extraction model to perform dimensionality reduction on the relevant information of the node.
According to a third aspect, a method for training a node evaluation model is provided. The node evaluation model is used to predict a quantity related to the bound value, after multi-step expansion, of each node in a candidate node set of a goal programming problem. Each node corresponds to one to-be-solved sub-problem of the goal programming problem. The output of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem. The training method includes: obtaining a sample node; obtaining a label corresponding to the sample node, where the label corresponding to the sample node is related to the bound value of the sample node after multi-step expansion; and training an initial model based on the sample node and the label corresponding to the sample node, to obtain the node evaluation model.
In embodiments of this application, the label corresponding to a sample node is related to the bound value of the sample node after multi-step expansion and is therefore relatively easy to determine. This makes training data easier to collect and faster to generate, so that a large amount of training data can be obtained and samples are used more efficiently. As a result, the node evaluation model is trained more effectively, that is, its prediction accuracy improves.
With reference to the third aspect, in some implementations of the third aspect, the bound value of the sample node after multi-step expansion includes the value of the objective function at the relaxed solution of the sample node after multi-step expansion.
With reference to the third aspect, in some implementations of the third aspect, the label corresponding to the sample node indicates the difference between the bound value of the sample node after multi-step expansion and the bound value of the parent node of the sample node.
With reference to the third aspect, in some implementations of the third aspect, training the initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model includes: training the initial model through reinforcement learning to obtain the node evaluation model. The label corresponding to the sample node is determined based on a first difference and a second difference. The first difference is the difference between the bound value of the parent node of the sample node and the bound value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
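The text states only that the label is determined from the first difference and the second difference, without fixing how the two are combined. The following is a minimal sketch under explicit assumptions: the two differences are simply added, the first difference is taken as the node's bound minus the parent's bound, and the target evaluation model's outputs for the child nodes are reduced with a configurable `aggregate` function. The names `bootstrapped_label` and `toy_target_model` are hypothetical.

```python
def bootstrapped_label(parent_bound, node_bound, child_nodes, target_model, aggregate=min):
    """Combine the one-step bound change with a model-based estimate for the children.

    Assumptions (not stated in the text): the sign convention of the first
    difference, the additive combination, and the aggregation over children.
    """
    # First difference: change in bound between the parent and the sample node.
    first_difference = node_bound - parent_bound
    # Second difference: estimate produced by the target evaluation model
    # from the sample node's child nodes.
    if child_nodes:
        second_difference = aggregate(target_model(child) for child in child_nodes)
    else:
        second_difference = 0.0
    return first_difference + second_difference

def toy_target_model(child):
    """Stand-in for the target evaluation model; returns a stored estimate."""
    return child["predicted_gain"]

children = [{"predicted_gain": 0.4}, {"predicted_gain": 0.7}]
label = bootstrapped_label(10.0, 10.5, children, toy_target_model)
```

Because only the parent's and the node's own bounds plus one forward pass over the children are needed, a label of this kind can be computed during the search itself rather than after a full multi-step expansion.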
With reference to the first aspect, in some implementations of the first aspect, the bound value of the sample node after multi-step expansion includes the value of the objective function at the relaxed solution of the sample node after multi-step expansion.
The value of the objective function at the relaxed solution of a node is relatively easy to obtain. In embodiments of this application, when the bound value of the sample node is determined based on the relaxed solution, the label corresponding to the sample node is easier to determine. For example, in a reinforcement-learning-based process, the label corresponding to the sample node may be determined in real time. Because the relaxed solution is cheap to compute, determining the label from it is more efficient, which helps improve training efficiency.
With reference to the third aspect, in some implementations of the third aspect, the input of the initial model includes relevant information of the sample node or a low-dimensional representation of the relevant information of the sample node. The relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node. The low-dimensional representation of the relevant information of the sample node is obtained by using a feature extraction model to perform dimensionality reduction on the relevant information of the sample node.
It should be understood that the extensions, limitations, explanations, and descriptions of related content in the first aspect also apply to the same content in the second aspect and the third aspect.
According to a fourth aspect, an apparatus for solving a goal programming problem is provided. The apparatus includes units configured to perform the method according to the first aspect or any implementation of the first aspect.
According to a fifth aspect, an apparatus for selecting a node is provided. The apparatus includes units configured to perform the method according to the second aspect or any implementation of the second aspect.
According to a sixth aspect, an apparatus for training a node evaluation model is provided. The apparatus includes units configured to perform the method according to the third aspect or any implementation of the third aspect.
According to a seventh aspect, a chip is provided. The chip obtains instructions and executes the instructions to implement the method in any implementation of the first aspect to the third aspect.
Optionally, in an implementation, the chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory, and performs the method in any implementation of the first aspect to the third aspect.
Optionally, in an implementation, the chip may further include a memory that stores instructions. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor performs the method in any implementation of the first aspect to the third aspect.
According to an eighth aspect, a computing device cluster is provided, including at least one computing device, where each computing device includes a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method in any implementation of the first aspect to the third aspect.
According to a ninth aspect, a computer-readable medium is provided, including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method in any implementation of the first aspect to the third aspect.
According to a tenth aspect, a computer program product containing instructions is provided. When the instructions are run by a computing device cluster, the computing device cluster performs the method in any implementation of the first aspect to the third aspect.
FIG. 1 is a schematic block diagram of an apparatus for solving a programming problem based on the branch and bound method;
FIG. 2 is a schematic flowchart of a method for selecting a node according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a method for training a node evaluation model according to an embodiment of this application;
FIG. 4 is a schematic flowchart of a method for solving a programming problem according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a dimensionality reduction process according to an embodiment of this application;
FIG. 6 is a schematic diagram of a fully connected neural network model according to an embodiment of this application;
FIG. 7 is a schematic diagram of a process of selecting a pruning node according to an embodiment of this application;
FIG. 8 is a schematic flowchart of another method for training a node evaluation model according to an embodiment of this application;
FIG. 9 is a schematic flowchart of still another method for training a node evaluation model according to an embodiment of this application;
FIG. 10 is a schematic diagram of a node expansion process according to an embodiment of this application;
FIG. 11 is a schematic diagram of a form of interaction between a user and a basic AI development platform according to an embodiment of this application;
FIG. 12 is a schematic diagram of AI model deployment according to an embodiment of this application;
FIG. 13 is a schematic diagram of an AI model providing an online service according to an embodiment of this application;
FIG. 14 is a schematic block diagram of an apparatus for selecting a node according to an embodiment of this application;
FIG. 15 is a schematic block diagram of an apparatus for training a node evaluation model according to an embodiment of this application;
FIG. 16 is a schematic block diagram of an apparatus for solving a goal programming problem according to an embodiment of this application;
FIG. 17 is a schematic architectural diagram of a computing device according to an embodiment of this application;
FIG. 18 is a schematic architectural diagram of a computing device cluster according to an embodiment of this application;
FIG. 19 is a schematic diagram of computing devices connected through a network according to an embodiment of this application.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.
The methods in the embodiments of this application can be applied in many fields, such as supply chain, finance, energy, transportation, communications, and power systems. Specifically, the solutions in the embodiments of this application can be applied to scenarios that require solving combinatorial optimization problems involving integer variables. For example, the solutions can be applied to solving problems such as production planning, production scheduling, factory site selection, risk control, asset allocation, oil pipeline laying, logistics and transportation, route optimization, and power grid layout and allocation.
To better explain the solutions in the embodiments of this application, terms that may be involved in this application are described first.
(1) Operations research and optimization
Operations research and optimization mainly studies the use and planning of various resources. Its aim is to maximize the benefit of limited resources under given constraints, achieve an overall optimum, and provide decision-makers with a basis for scientific decision-making.
(2) Mathematical programming
Mathematical programming is a branch of operations research and optimization. Its main goal is to find, within a given region, an optimal solution that maximizes or minimizes the value of a function. Depending on the nature of the problem and the solution method, mathematical programming can be divided into many branches, for example, linear programming, integer programming, nonlinear programming, combinatorial optimization, multi-objective programming, stochastic programming, dynamic programming, and parametric programming.
(3) Linear programming (LP)
A programming model consists of two parts: an objective function and constraints. When both parts of a programming model are linear, the model may be called a linear programming model. In other words, linear programming studies the extremum of a linear objective function under linear constraints.
(4) Integer programming (integer linear programming)
Integer programming refers to a linear programming problem in which some of the decision variables are integer variables. If all decision variables in an integer programming model are integer variables, the model may also be called a pure integer programming model.
The programming problem obtained from an integer programming problem by ignoring the requirement that decision variables be integers may be called the relaxation problem corresponding to the integer programming problem. In other words, by applying linear relaxation to the integer variables, the integer programming problem is converted into a relaxed linear programming problem, that is, the relaxation problem corresponding to the integer programming problem. The solution obtained by solving this linear programming problem is the relaxed solution of the integer programming problem.
(5) Mixed integer programming (mixed integer linear programming)
Mixed integer programming refers to a linear programming problem in which some of the decision variables are restricted to integers.
For example, a mixed integer programming model may be expressed in the following form:

$$
\begin{aligned}
\min\ & f(x) = d_1 x_1 + d_2 x_2 + d_3 x_3 \\
\text{s.t.}\ & A_{11} x_1 + A_{13} x_3 \le b_1 \\
& A_{21} x_1 + A_{22} x_2 \le b_2 \\
& x_2 \in Z
\end{aligned}
$$

Here, $f(x)$ is the objective function; $A_{11} x_1 + A_{13} x_3 \le b_1$, $A_{21} x_1 + A_{22} x_2 \le b_2$, and $x_2 \in Z$ are the constraints; and $x_1$, $x_2$, $x_3$ are the decision variables. $d_1$, $d_2$, $d_3$, $A_{11}$, $A_{13}$, $A_{21}$, $A_{22}$, $b_1$, and $b_2$ are parameters, and $Z$ denotes the set of integers.
In the above formula, only some of the decision variables are integer variables. If $x_2 \in Z$ in the above constraints is replaced by $x \in Z$, the model is a pure integer programming model.
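As a small illustration of the roles of the objective function, the constraints, and the integrality condition in the model above, the following sketch evaluates a candidate point. It assumes, purely for illustration, that every parameter is a scalar; the function names and the numeric values are hypothetical.

```python
def objective(d, x):
    """f(x) = d1*x1 + d2*x2 + d3*x3."""
    return sum(di * xi for di, xi in zip(d, x))

def is_feasible(x, A11, A13, A21, A22, b1, b2, require_integer=True, tol=1e-9):
    """Check the two linear constraints and, optionally, x2 in Z.

    With require_integer=False the check corresponds to the relaxation
    problem, in which the integrality of x2 is ignored.
    """
    x1, x2, x3 = x
    linear_ok = (A11 * x1 + A13 * x3 <= b1 + tol) and (A21 * x1 + A22 * x2 <= b2 + tol)
    integer_ok = abs(x2 - round(x2)) <= tol
    return linear_ok and (integer_ok or not require_integer)

# Hypothetical scalar parameters chosen only for this example.
params = dict(A11=1.0, A13=1.0, A21=1.0, A22=2.0, b1=4.0, b2=5.0)
d = (1.0, 2.0, 3.0)
integral_point = (1.0, 2.0, 3.0)
fractional_point = (1.0, 1.5, 3.0)
```

The fractional point satisfies the relaxation but not the original model, which is exactly the situation in which a branch and bound solver would branch on $x_2$.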
(6) Mathematical programming solver
A mathematical programming solver is a software system that solves already formulated linear programming, integer programming, mixed integer programming, and various nonlinear programming models.
(7) Branch and bound algorithm
The branch and bound algorithm is a commonly used algorithm for solving programming problems, and most mathematical programming solvers are built on this framework. It is a search-and-iterate method that selects different branching variables and sub-problems to branch on. The entire feasible solution space is repeatedly split into smaller and smaller subsets, which is branching; an objective bound is computed for the solutions within each subset, which is bounding. After each branching step, any subset whose objective bound exceeds the objective value of the known feasible solutions is not branched further, which is pruning.
A problem may also be called a node, and the original problem to be solved can be regarded as the root node. Branching is the process of continually generating sub-problems of the original problem, that is, of continually adding nodes. Bounding refers to checking the upper and lower bounds of the sub-problems during branching. If a sub-problem cannot produce a solution better than the current best solution, the sub-problem can be pruned, and it may be called a pruning node. The algorithm ends when no sub-problem can produce a better solution.
In each iteration, a suitable node needs to be selected for the next iteration. This node may be called an expansion node or a search node. For example, for a minimization problem, a pseudo cost function can be used to predict the lower bound of each node after a single expansion step, and the node with the smallest lower bound is selected as the search node.
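The selection rule in the preceding paragraph (choose the open node with the smallest predicted lower bound) is commonly implemented with a priority queue. The sketch below assumes the predicted lower bound of each node has already been computed by some estimator such as a pseudo cost function; the class name `OpenNodes` and the example values are hypothetical.

```python
import heapq
import itertools

class OpenNodes:
    """Open nodes of a minimization search, ordered by predicted lower bound."""

    def __init__(self):
        self._heap = []
        self._ticket = itertools.count()  # tie-breaker so nodes are never compared

    def push(self, node, predicted_lower_bound):
        heapq.heappush(self._heap, (predicted_lower_bound, next(self._ticket), node))

    def pop_search_node(self):
        """Remove and return the node with the smallest predicted lower bound."""
        bound, _, node = heapq.heappop(self._heap)
        return node, bound

    def __len__(self):
        return len(self._heap)

open_nodes = OpenNodes()
open_nodes.push("A", 12.5)
open_nodes.push("B", 9.0)
open_nodes.push("C", 10.0)
search_node, bound = open_nodes.pop_search_node()
```

A heap keeps both insertion and selection logarithmic in the number of open nodes, which matters because the open set can grow quickly during branching.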
The following uses integer programming as an example to illustrate the processing in the branch and bound algorithm. That is, the original problem to be solved is an integer programming problem.
First, the relaxed solution of the original problem is computed. If the relaxed solution is an integer solution, it is the optimal solution of the original problem. If the relaxed solution is not an integer solution, the objective function value of the optimal solution of the original problem cannot be better than the objective function value of the relaxed solution, so the objective function value of the relaxed solution can serve as a bound of the original problem. For a minimization problem, in which the goal is to minimize the value of an objective function, the objective function value of the relaxed solution is a lower bound of the original problem. For a maximization problem, in which the goal is to maximize the value of an objective function, the objective function value of the relaxed solution is an upper bound of the original problem.
Branching on one decision variable divides the original problem into two sub-problems. Branching on a decision variable can also be understood as constructing two constraints and adding each of them to the original problem, which yields two sub-problems. If the relaxed solution of a sub-problem is an integer solution, that relaxed solution is a feasible solution of the original problem, and the optimal solution of the original problem cannot be worse than this feasible solution. Therefore, the objective function value of the feasible solution can serve as a bound of the original problem, that is, the current best solution. For a minimization problem, the objective function value of the feasible solution is an upper bound of the original problem. For a maximization problem, the objective function value of the feasible solution is a lower bound of the original problem.
If the objective function value of the relaxed solution of a sub-problem falls outside the bounds of the original problem, that sub-problem (the pruning node) can be pruned, that is, it is not branched further. Another sub-problem (the search node or expansion node) is then selected for further branching, and the bounds of the original problem are adjusted based on the relaxed solutions of the sub-problems. This process is repeated. When the upper bound equals the lower bound, the optimal solution of the original problem is obtained; when the gap between the upper bound and the lower bound is small, an approximately optimal solution is obtained. Take a minimization problem as an example. Among the objective function values of the relaxed solutions of the current sub-problems, the smallest value is used as the current lower bound. Among the objective function values of the feasible solutions of the original problem found so far, the smallest value is used as the current upper bound. If the objective function value of the relaxed solution of a sub-problem is greater than the current upper bound, the sub-problem is not branched further. Although the possible feasible solutions of that sub-problem have not yet been found, branching on the node further only adds more constraints, so any solution found will not be better than the relaxed solution of the node. Therefore, there is no need to branch on the node further.
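The procedure described above can be sketched on a small maximization problem. The example below is not taken from the embodiments: it uses the 0/1 knapsack problem because its linear relaxation can be solved greedily without an LP solver. For this maximization problem the relaxed value of a node is an upper bound and each integral solution gives a lower bound. The function names, the depth-first node order, and the data are assumptions made for illustration.

```python
def relaxed_bound(items, capacity, fixed):
    """Solve the relaxation of a node: honour the fixed 0/1 choices, then fill
    the remaining capacity greedily by value/weight ratio, allowing one
    fractional item. Returns (bound, solution), or None if infeasible."""
    value, room = 0.0, capacity
    solution = dict(fixed)
    for i, take in fixed.items():
        if take:
            value += items[i][0]
            room -= items[i][1]
    if room < 0:
        return None
    free = sorted((i for i in range(len(items)) if i not in fixed),
                  key=lambda i: items[i][0] / items[i][1], reverse=True)
    for i in free:
        v, w = items[i]
        if w <= room:
            solution[i] = 1
            value += v
            room -= w
        else:
            solution[i] = room / w      # fractional (or zero) share of this item
            value += v * room / w
            room = 0
    return value, solution

def branch_and_bound(items, capacity):
    best_value, best_solution = float("-inf"), None  # lower bound (incumbent)
    open_nodes = [{}]                                 # root node: nothing fixed
    while open_nodes:
        fixed = open_nodes.pop()                      # depth-first node selection
        relaxed = relaxed_bound(items, capacity, fixed)
        if relaxed is None:
            continue                                  # infeasible sub-problem
        bound, solution = relaxed
        if bound <= best_value:
            continue                                  # prune: cannot beat incumbent
        fractional = [i for i, x in solution.items() if 0 < x < 1]
        if not fractional:
            best_value, best_solution = bound, solution  # integral: new incumbent
            continue
        i = fractional[0]                             # branch: add x_i = 0 or x_i = 1
        open_nodes.append({**fixed, i: 0})
        open_nodes.append({**fixed, i: 1})
    return best_value, best_solution

# (value, weight) pairs with positive integer weights; hypothetical data.
items = [(60, 10), (100, 20), (120, 30)]
best_value, best_solution = branch_and_bound(items, capacity=50)
chosen = {i for i, x in best_solution.items() if x == 1}
```

In this run the root relaxation is fractional, two branches are explored until an integral solution is found, and the remaining nodes are pruned because their relaxed values do not exceed that of the incumbent.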
(8) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be:

$$
f\left(\sum_{s=1}^{n} W_s x_s + b\right)
$$

Here, $s = 1, 2, \ldots, n$, where $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which applies a nonlinear transformation to the features obtained in the neural network and converts the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field, and a local receptive field may be a region composed of several neural units.
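The computation of a single neural unit described above, a weighted sum of the inputs plus a bias followed by an activation function, can be written as a short sketch. The function names are chosen here for illustration, and the sigmoid is used because the text names it as one possible activation function.

```python
import math

def sigmoid(z):
    """Sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(inputs, weights, bias, activation=sigmoid):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

output = neural_unit([1.0, 2.0], [0.5, 0.25], bias=-1.0)
```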
(9) Loss function
When a neural network is trained, the output of the network should be as close as possible to the value that is actually to be predicted. The predicted value of the current network is therefore compared with the desired target value, and the weight vector of each layer of the network is updated according to the difference between the two. (Before the first update there is usually an initialization step, in which parameters are preconfigured for each layer of the deep neural network.) For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower. The adjustment continues until the network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value. This is the role of the loss function or objective function, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the neural network becomes a process of reducing this loss as much as possible.
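As a minimal sketch of the idea that training reduces the loss, the following example uses a mean squared error loss and one gradient descent step on a one-variable linear model. The choice of loss, model, learning rate, and data is an assumption made for illustration and is not prescribed by the text.

```python
def mse_loss(predictions, targets):
    """Mean squared error: larger values mean predictions are further from targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def predict(w, b, xs):
    """Linear model y = w * x + b applied to every input."""
    return [w * x + b for x in xs]

def gradient_step(w, b, xs, ys, learning_rate):
    """One gradient descent update of w and b with respect to the MSE loss."""
    n = len(xs)
    errors = [p - y for p, y in zip(predict(w, b, xs), ys)]
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(2 * e for e in errors) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w, b = 0.0, 0.0
loss_before = mse_loss(predict(w, b, xs), ys)
w, b = gradient_step(w, b, xs, ys, learning_rate=0.05)
loss_after = mse_loss(predict(w, b, xs), ys)
```

Here the initial predictions are too low, so the update raises the weight and the bias, and the loss falls after the step.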
(10) Graph neural network (GNN)
A GNN is a neural network architecture that takes graph-structured data as input. It is typically used for deep learning tasks in which the input features have a graph structure.
(11)强化学习(reinforcement learning,RL)(11) Reinforcement learning (RL)
强化学习主要用于解决序列决策问题。强化学习是通过智能体(agent)和环境(environment)的相互作用,不断学习最优策略,做出序列决策,并获得最大回报的过程。Reinforcement learning is mainly used to solve sequential decision-making problems. Reinforcement learning is a process that continuously learns optimal strategies, makes sequence decisions, and obtains maximum returns through the interaction between an agent and the environment.
智能体:用于根据环境反馈的状态(state)和奖励(reward)学习下一个合适的动作(action),以获得最大化长期总收益。Agent: Used to learn the next appropriate action (action) based on the state and reward of environmental feedback to maximize long-term total revenue.
环境:用于接收智能体执行的动作,对动作进行评价并转换为奖励反馈给智能体,奖励包括正向奖励和负向奖励。Environment: used to receive the actions performed by the agent, evaluate the actions and convert them into rewards to feed back to the agent. The rewards include positive rewards and negative rewards.
除了智能体和环境之外,强化学习系统还有几个核心要素:策略(policy)、奖励函数(reward function)、价值函数(value function)。In addition to the agent and environment, the reinforcement learning system also has several core elements: policy, reward function, and value function.
策略:是从状态到动作的映射,策略定义了智能体在下一步选择要执行的动作的方式。Strategy: It is a mapping from state to action. The strategy defines how the agent chooses the action to be performed in the next step.
奖励函数:用于对智能体执行的动作进行评价,计算智能体执行的动作的奖励值的函数。Reward function: A function used to evaluate the actions performed by the agent and calculate the reward value of the actions performed by the agent.
价值函数:用于预测状态或者动作的长期回报值的函数。在一些情况下,价值函数的值可以表示为从一个状态开始,在未来多个状态下的多个奖励函数的奖励值的加权累加。Value function: A function used to predict the long-term reward value of a state or action. In some cases, the value of the value function can be expressed as the weighted accumulation of the reward values of multiple reward functions in multiple future states starting from one state.
动作空间:是所有可能的动作的集合。Action space: is the set of all possible actions.
状态空间:是所有可能达到的状态的集合。State space: is the set of all possible states.
在给定一个环境的状态下,智能体根据某种策略选择一个要执行的动作。执行这个动作后,环境会发生改变,环境的状态会转换为新的状态,且环境可以对动作进行评价,向智能体反馈该动作对应的奖励值。智能体可以根据该奖励值调整策略,重复执行上述过程,以使得所有动作执行完之后的奖励值之和最大。Given the state of an environment, the agent chooses an action to perform based on a certain strategy. After executing this action, the environment will change, the state of the environment will be converted to a new state, and the environment can evaluate the action and feedback the reward value corresponding to the action to the agent. The agent can adjust the strategy based on the reward value and repeatedly execute the above process so that the sum of reward values after all actions are executed is maximized.
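The interaction loop and the weighted accumulation of rewards described above can be sketched as follows. This is only an illustration under stated assumptions: `LineEnvironment` is a hypothetical toy environment, the reward values are arbitrary, and the discount factor `gamma` is one common choice of weighting that this application does not prescribe.

```python
class LineEnvironment:
    """Hypothetical toy environment: the agent walks along positions 0..goal."""

    def __init__(self, goal=3):
        self.goal = goal
        self.state = 0

    def step(self, action):
        # action is +1 (move right) or -1 (move left); position stays >= 0.
        self.state = max(0, self.state + action)
        done = self.state == self.goal
        # Positive reward on reaching the goal, negative reward otherwise.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=20):
    """Let the agent act with the given policy and collect the rewards."""
    state, rewards = env.state, []
    for _ in range(max_steps):
        action = policy(state)                  # policy: state -> action
        state, reward, done = env.step(action)  # environment feeds back
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards, gamma=0.9):
    """Weighted sum of future rewards, accumulated from the last step back."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = run_episode(LineEnvironment(), policy=lambda state: 1)
```

A learning agent would additionally adjust `policy` between episodes based on the observed rewards; that update step is omitted here.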
(12) Deep Q-learning (DQL)
DQL is a typical reinforcement learning algorithm suitable for sequential decision-making problems with discrete actions. DQL helps select the optimal action by estimating the long-term cumulative return (the Q function) of each action. The Q function, Q(S, A), refers to the sum of the reward values to be obtained in the future after action A is taken in state S, that is, the long-term cumulative return of action A. The Q value corresponding to an action can provide a reference for the policy.
In DQL, the Q value may be calculated by a deep neural network (DNN). In the agent, the current state of the environment is input into the DNN, and the DNN predicts the Q value obtained by performing each action in that state.
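As a hedged illustration of how Q values can guide the policy, the sketch below picks the action with the largest predicted Q value and computes a one-step training target. The list of Q values stands in for the output of a DNN, and the discount factor `gamma` and the function names are assumptions rather than details given in this application.

```python
def greedy_action(q_values):
    """Return the index of the action with the highest predicted Q value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward, next_q_values, gamma=0.9, done=False):
    """One-step target: immediate reward plus discounted best future Q value."""
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_values)


# q_values would normally come from a DNN evaluated on the current state.
q_values = [0.1, 0.7, 0.3]
action = greedy_action(q_values)
target = q_target(reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5)
```

In a full DQL setup, the DNN would be trained so that its prediction for the chosen action moves toward `target`.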
Figure 1 is a schematic diagram of an apparatus for solving a planning problem based on the branch-and-bound method.
As shown in Figure 1, the solving apparatus may include a presolving module, a node selection module, a node presolving module, a linear programming relaxation (LP relaxation) module, a heuristics module, a branching module, and a cutting plane module.
The presolving module preprocesses the original problem to simplify it and reduce its scale. For example, the preprocessing may include deleting redundant constraints and decision variables.
The node selection module selects a search node. It may determine the search node from the nodes currently to be solved, so that the branching module can subsequently branch on that search node.
Alternatively, the node selection module may determine a pruning node from the current nodes, and that node is no longer considered in the subsequent solving process.
The node presolving module simplifies the constraints on the variables in the search node determined by the node selection module.
The LP relaxation module constructs a relaxation model and solves it to obtain a relaxed solution.
The heuristics module starts from the relaxed solution and uses a heuristic algorithm to search for a higher-quality solution of the search node.
The branching module branches on the search node, that is, adds constraints to obtain child nodes of the search node, and returns the child nodes to the node selection module for the next round of node selection.
The cutting plane module adds multi-variable constraints based on the cutting plane method, to remove relaxed solutions that do not satisfy the multi-variable constraints.
Specifically, the cutting plane module may generate a series of linear constraints based on the relaxed solution, and select some of them to add to the original problem, thereby shrinking the feasible region.
It should be understood that the solving apparatus shown in Figure 1 is merely an example. In actual applications, the solving apparatus may include more or fewer modules. For example, the solving apparatus may not include the cutting plane module. For another example, the solving apparatus may not include the heuristics module.
In many application scenarios, planning problems have integer constraints, for example, factory location or production scheduling problems. Such problems can be modeled as mixed integer programming problems and solved by an integer programming solver. An integer programming solver is usually implemented based on the branch-and-bound framework. Specifically, in the iterative calculation process, the solution space of the original problem is repeatedly divided into smaller and smaller subsets, that is, sub-problems (which may also be referred to as nodes) of the original problem are repeatedly generated, and the optimal solution of the original problem is obtained by continuously solving the sub-problems. For a complex problem, for example, a problem with a large number of decision variables, a large number of nodes are generated during solving and the solving time is long, which makes it difficult to meet user requirements. Selecting appropriate nodes for the corresponding processing is the key to improving the solving speed. For example, selecting appropriate nodes for pruning reduces the number of nodes to be solved, which helps improve the solving speed. For another example, selecting appropriate nodes for branching helps find the optimal solution as soon as possible, which also helps improve the solving speed.
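To make the roles of node selection, branching, and pruning concrete, the sketch below applies a best-first branch-and-bound search to a small 0/1 knapsack problem. It is an illustration only and is not the solver of this application: the knapsack problem is a maximization problem, the bound of a node comes from a greedy fractional relaxation instead of a general LP relaxation, and all names are hypothetical.

```python
import heapq


def knapsack_branch_and_bound(values, weights, capacity):
    """Best-first branch and bound for the 0/1 knapsack problem."""
    # Sort items by value density so the fractional relaxation is greedy.
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)

    def relaxation_bound(level, value, room):
        # Upper bound of a node: fill the remaining room greedily and
        # allow the first item that does not fit to be taken fractionally.
        bound = value
        for v, w in items[level:]:
            if w <= room:
                room -= w
                bound += v
            else:
                bound += v * room / w
                break
        return bound

    best_value = 0
    # Each heap entry is a live node: (-bound, level, value, room).
    # heapq is a min-heap, so the node with the largest bound pops first.
    heap = [(-relaxation_bound(0, 0, capacity), 0, 0, capacity)]
    while heap:
        neg_bound, level, value, room = heapq.heappop(heap)  # node selection
        if -neg_bound <= best_value or level == n:
            continue  # pruning: this node cannot beat the incumbent
        v, w = items[level]
        if w <= room:
            # Branch 1: take the item (add the constraint x_level = 1).
            taken = value + v
            best_value = max(best_value, taken)
            heapq.heappush(heap, (-relaxation_bound(level + 1, taken, room - w),
                                  level + 1, taken, room - w))
        # Branch 2: skip the item (add the constraint x_level = 0).
        heapq.heappush(heap, (-relaxation_bound(level + 1, value, room),
                              level + 1, value, room))
    return best_value


best = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

Here the node selection rule is simply "largest relaxation bound first". The method described in this application instead uses a learned node evaluation model to decide which node to process.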
Embodiments of this application provide a node selection method, which is applied to scenarios in which a planning problem is solved and helps improve solving efficiency.
Specifically, the node selection method in the embodiments of this application may be applied to scenarios in which a planning problem is solved by using the branch-and-bound method.
The node selection method in the embodiments of this application is described below with reference to Figure 2.
Figure 2 is a schematic flowchart of a node selection method according to an embodiment of this application. The method 200 shown in Figure 2 may be performed by a node selection apparatus. For example, the node selection apparatus and the solver may be two separately deployed apparatuses, or the node selection apparatus and the solver may be integrated in the same apparatus (for example, a solving apparatus). This is not limited in the embodiments of this application. The solver is implemented based on the branch-and-bound algorithm framework.
For example, the node selection method in the embodiments of this application may be applied to the node selection module shown in Figure 1. In other words, the node selection apparatus in the embodiments of this application may be the node selection module shown in Figure 1, and that module may use the node selection method in the embodiments of this application to determine an appropriate node.
As shown in Figure 2, the method 200 includes step 210 and step 220, which are described below. Solving a planning problem based on the branch-and-bound method is an iterative process, and step 210 and step 220 may be the steps performed in one iteration of that process.
210. Obtain a candidate node set of a target planning problem. The candidate node set includes multiple nodes.
Each of the multiple nodes corresponds to one to-be-solved sub-problem of the target planning problem.
In the embodiments of this application, sub-problems correspond to nodes. A node may be understood as a sub-problem, and a node may also be referred to as a branch or a branch node. These terms are not distinguished below.
The target planning problem is the mathematical programming problem to be solved.
The target planning problem may be represented by its objective function, constraints, and decision variables. The constraints are used to constrain the decision variables. At least some of the decision variables of the target planning problem are integer variables, that is, the values of at least some of the decision variables are integers. In other words, the target planning problem is a pure integer programming model or a mixed integer programming model.
For example, the target planning problem may be a maximization problem. In other words, the optimal solution of the target planning problem is the solution that maximizes the value of its objective function.
Alternatively, the target planning problem may be a minimization problem. In other words, the optimal solution of the target planning problem is the solution that minimizes the value of its objective function.
A maximization problem and a minimization problem can be converted into each other. For ease of understanding and description, the embodiments of this application use only the minimization problem as an example, which does not limit the scope of the embodiments of this application.
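The conversion mentioned above relies on a standard identity, stated here for a generic objective function f over a feasible set X (both symbols are introduced only for this illustration):

```latex
\max_{x \in X} f(x) \;=\; -\min_{x \in X} \bigl(-f(x)\bigr),
\qquad
\arg\max_{x \in X} f(x) \;=\; \arg\min_{x \in X} \bigl(-f(x)\bigr).
```

Under this conversion, an upper bound obtained from a relaxation of the maximization problem corresponds to a lower bound of the equivalent minimization problem with the sign reversed.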
Taking a target planning problem used for logistics scheduling as an example, the objective function may be to minimize the logistics scheduling cost, the constraints may be that each distribution point needs to complete delivery within a specified time period, and the decision variables may include couriers, times, and locations. The sub-problems of the target planning problem are generated based on the branch-and-bound method.
The multiple to-be-solved sub-problems may be generated in one iteration of the process of solving the target planning problem, or may be generated in multiple iterations.
The multiple to-be-solved sub-problems may also be referred to as multiple live nodes. That is, all nodes in the candidate node set are live nodes. A live node is a node that has not yet been pruned.
In a possible implementation, the node selection apparatus and the solver may be deployed separately. In this case, the solver may obtain the target planning problem, generate sub-problems of the target planning problem based on the branch-and-bound method, and send the sub-problems to the node selection apparatus.
In another possible implementation, the node selection apparatus and the solver may be integrated in a solving apparatus. In this case, the solving apparatus may obtain the target planning problem and generate sub-problems of the target planning problem based on the branch-and-bound method.
For example, the target planning problem may be data provided by a user.
For example, the candidate node set of the target planning problem may be data provided by a user. In this case, the node selection apparatus may receive the candidate node set of the target planning problem provided by the user.
It should be understood that the foregoing are merely examples, and the candidate node set of the target planning problem may also be obtained in other manners. This is not limited in the embodiments of this application.
220. Use a node evaluation model to predict, for some or all nodes in the candidate node set, a quantity related to the limit value of the node after multi-step expansion.
For example, the node evaluation model may be determined according to a user instruction.
For example, the node evaluation model may be deployed on a cloud management platform.
The output of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the target planning problem.
Optionally, step 220 may include: using the node evaluation model to predict, for each node in the candidate node set, a quantity related to the limit value of the node after multi-step expansion.
In other words, the node evaluation model is used to predict the quantity related to the limit value after multi-step expansion for all nodes in the candidate node set.
For ease of description, the embodiments of this application are mainly described by using all nodes as an example, that is, by using an example in which the node evaluation model processes each node. This does not limit the solutions in the embodiments of this application.
The input of the node evaluation model may include related information of a node.
For example, the input of the node evaluation model may include the related information of each node.
The related information of a node includes at least one of the following: the objective function of the node, the constraints of the node, or the decision variables of the node.
For example, the related information of a node may include the objective function, the constraints, and the decision variables of the node. When these are input into the node evaluation model, the model can output the quantity related to the limit value of the node after multi-step expansion.
In the embodiments of this application, the output of the node evaluation model may be used as evaluation information of a node. The evaluation information of a node is related to the limit value of the node after multi-step expansion. For example, the evaluation information of a node may indicate the node evaluation model's prediction of the quantity related to the limit value of the node after multi-step expansion.
For example, step 220 may include: determining evaluation information of the multiple nodes by using the node evaluation model. The evaluation information of the multiple nodes is used to determine the target node from the candidate node set, and the target node is used to adjust the candidate node set.
Expanding a node yields the child nodes of that node. Expanding a node means branching a sub-problem to obtain new sub-problems. The limit value of a node after multi-step expansion is the limit value of the sub-problem after multi-step branching.
The limit value of a node after multi-step expansion may be understood as the limit value of the child nodes obtained after the node is expanded in multiple steps.
It should be understood that the node evaluation model is used to predict the quantity related to the limit value of a node after multi-step expansion. The multi-step expansion is not actually performed on the node while the node evaluation model processes it.
In the embodiments of this application, the evaluation information of a node may indicate a prediction of the quantity related to the limit value of the node after multi-step expansion, and can be used to measure the long-term value of expanding the node.
It should be noted that the evaluation information of the multiple nodes is related to the limit values of the multiple nodes after multi-step expansion, and the number of expansion steps may be the same or different for different nodes.
For example, the evaluation information of the multiple nodes is related to the limit values of the multiple nodes after they are completely solved.
The number of steps required to expand different nodes until they are completely solved may be the same or different.
A node being completely solved means that all descendant nodes of the node have been solved.
Optionally, the limit values of the multiple nodes after multi-step expansion include the values of the objective function corresponding to the relaxed solutions of the multiple nodes after multi-step expansion.
Taking a target planning problem that is a minimization problem as an example, the value of the objective function corresponding to the relaxed solution of a node may be the lower bound of the node. In other words, the limit values of the multiple nodes after multi-step expansion may be the lower bound values of the multiple nodes after multi-step expansion. That is, the node evaluation model may be used to predict a quantity related to the lower bound of a node after multi-step expansion.
In a possible implementation, the quantity related to the limit value of a node after multi-step expansion includes the limit value of the node after multi-step expansion.
In this case, the evaluation information of the multiple nodes may indicate a prediction of the limit values of the multiple nodes after multi-step expansion.
In other words, the node evaluation model may output the limit values of the multiple nodes after multi-step expansion, that is, the node evaluation model's prediction of those limit values. In the embodiments of this application, all outputs of the node evaluation model are predicted values. For ease of description, this is not distinguished below unless specifically emphasized.
Taking a target planning problem that is a minimization problem as an example, the limit value may be a lower bound value. In this case, the smaller the lower bound value of a node after multi-step expansion, the more likely it is that a globally optimal solution can be found starting from that node, and the higher the long-term value of the node.
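One way the predicted lower bounds could be used is sketched below: among the candidate nodes, the node with the smallest predicted lower bound after multi-step expansion is chosen as the search node. This is an assumption-laden illustration. The function `predict_bound` is a stand-in for the node evaluation model, the node names and numbers are invented, and this application may use the model output differently.

```python
def select_search_node(candidates, predict_bound):
    """Pick the candidate whose predicted multi-step lower bound is smallest."""
    return min(candidates, key=predict_bound)


# Hypothetical predictions that a node evaluation model might return.
predicted_lower_bound = {"node_a": 14.2, "node_b": 11.7, "node_c": 12.9}

search_node = select_search_node(
    candidates=list(predicted_lower_bound),
    predict_bound=predicted_lower_bound.get,  # stand-in for the model
)
```

Note that the model is only queried for a prediction here. No node is actually expanded during evaluation, consistent with the description above.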
In another possible implementation, the quantity related to the limit value of a node after multi-step expansion includes the difference between the limit value of the node after multi-step expansion and the limit value of the parent node of the node.
In this case, the evaluation information of the multiple nodes may indicate a prediction of the differences between the limit values of the multiple nodes after multi-step expansion and the limit values of the parent nodes of the multiple nodes.
In other words, the node evaluation model may output the differences between the limit values of the multiple nodes after multi-step expansion and the limit values of the parent nodes of the multiple nodes, that is, the node evaluation model's prediction of those differences.
The difference between the limit value of a node after multi-step expansion and the limit value of the parent node of the node is the change in the limit value of the node before and after multi-step expansion.
For example, the difference may be obtained by subtraction between the limit value of the node after multi-step expansion and the limit value of the parent node of the node.
Alternatively, the difference may be the result of dividing the limit value of the node after multi-step expansion by the limit value of the parent node of the node.
It should be understood that the foregoing are merely examples, and the change in the limit value of a node before and after multi-step expansion may also be determined in other forms. This is not limited in the embodiments of this application.
Further, the quantity related to the limit value of a node after multi-step expansion includes the difference between the limit value of the node after it is completely solved and the limit value of the parent node of the node.
In this case, the evaluation information of the multiple nodes may indicate the differences between the limit values of the multiple nodes after they are completely solved and the limit values of the parent nodes of the multiple nodes.
In other words, the evaluation information of a node may indicate the change in the limit value over the process from expanding the node to completely solving it. This change may be represented by the value of the multi-step pseudo-cost function of the node.
For example, the difference between the limit value of a node after it is completely solved and the limit value of the parent node of the node may be determined based on the change in the limit value before and after single-step expansion of the node and the change in the limit value of the child nodes of the node from expansion to being completely solved. In other words, the value of the multi-step pseudo-cost function of a node may be determined based on the change in the limit value before and after single-step expansion of the node and the values of the multi-step pseudo-cost functions of the child nodes of the node.
For example, the value of the multi-step pseudo-cost function of a node may be the difference between the limit value of the node after it is completely solved and the limit value of the parent node of the node. The value of the multi-step pseudo-cost function of a node may be the sum of the difference between the limit values before and after single-step expansion of the node and the value of the multi-step pseudo-cost function of a child node of the node. The node evaluation model is used to predict the value of the multi-step pseudo-cost function of a node.
The change before and after single-step expansion of a node is the difference between the limit value of the parent node of the node and the limit value of the node. The change in the limit value of a child node of the node from expansion to being completely solved, that is, the value of the multi-step pseudo-cost function of the child node, is the difference between the limit value of the child node after it is completely solved and the limit value of the node. If the node has multiple child nodes, the value of the multi-step pseudo-cost function of the node may be determined based on the change before and after single-step expansion of the node and the minimum of the values of the multi-step pseudo-cost functions of the multiple child nodes. Alternatively, it may be determined based on the change before and after single-step expansion of the node and the maximum of those values. Alternatively, it may be determined based on the change before and after single-step expansion of the node and the average of those values. It should be understood that the foregoing are merely examples. When the node has multiple child nodes, the value of the multi-step pseudo-cost function of the node may also be determined in other manners, which is not limited in the embodiments of this application.
The change in the limit value of a node from expansion to being completely solved may be represented by the value of the multi-step pseudo-cost function of the node. The multi-step pseudo-cost function can be used to measure the long-term value of expanding a node. The node evaluation model may be used to predict the value of the multi-step pseudo-cost function of a node.
For example, the multi-step pseudo-cost function of a node may satisfy the following formula:
$$C(P) = c(P) + \min_{N_i} C(N_i)$$
where C(·) represents the multi-step pseudo-cost function of a node, and c(·) represents the change in the limit value before and after single-step expansion of the node, that is, the first difference. Node N_i is a child node of node P. The second term in the formula is the minimum of the values of the multi-step pseudo-cost functions of the child nodes of node P, that is, the second difference. The multi-step pseudo-cost function of node P may be understood as the change in the limit value over the process from expanding node P to completely solving node P. It should be understood that the foregoing formula is merely an example and does not limit the multi-step pseudo-cost function in the embodiments of this application. For example, the second term (that is, the second difference) in the foregoing formula is the minimum difference between the limit value of a child node of node P after the child node is completely solved and the limit value of node P. In other representations, the second term may alternatively be the maximum difference between the limit value of a child node of node P after the child node is completely solved and the limit value of node P, that is,
$$C(P) = c(P) + \max_{N_i} C(N_i)$$
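The recursion in the formula can be sketched on a small explicit tree as follows. This is only an illustration under stated assumptions: each node stores a known limit value, c(P) is taken as the limit value of P minus the limit value of its parent, and a node without children contributes only its single-step term. The `Node` class and the numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    bound: float                      # limit value (lower bound) of the node
    children: list = field(default_factory=list)


def multi_step_pseudo_cost(node, parent_bound, aggregate=min):
    """C(P) = c(P) + aggregate over children N_i of C(N_i)."""
    single_step = node.bound - parent_bound      # c(P), the first difference
    if not node.children:
        return single_step                       # assumed base case
    second = aggregate(
        multi_step_pseudo_cost(child, node.bound, aggregate)
        for child in node.children
    )                                            # the second difference
    return single_step + second


# P has limit value 12 and its parent has limit value 10.
n2 = Node(15.0, [Node(15.5), Node(18.0)])
p = Node(12.0, [Node(13.0), n2])

cost_min = multi_step_pseudo_cost(p, parent_bound=10.0)
cost_max = multi_step_pseudo_cost(p, parent_bound=10.0, aggregate=max)
```

With the minimum as the aggregate, the terms telescope so that the result equals the smallest leaf limit value under P minus the limit value of the parent of P (here 13 - 10). With the maximum, it equals the largest leaf limit value minus the parent's limit value (here 18 - 10).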
The smaller the change in the limit value of a node before and after multi-step expansion, the more likely it is that an optimal solution can be found starting from that node, and the higher the long-term value of the node.
Taking a target planning problem that is a minimization problem as an example, the limit value may be a lower bound value. In this case, the smaller the change in the lower bound value of a node before and after multi-step expansion, the more likely it is that a globally optimal solution can be found starting from that node, and the higher the long-term value of the node.
Optionally, the node evaluation model may be a neural network model, a random forest model, a support vector machine model, a linear regression model, or the like.
For example, the node evaluation model may be a fully connected neural network model.
It should be understood that the foregoing are merely examples, and the node evaluation model may also use a model of another structure. This is not limited in the embodiments of this application.
可选地,该节点评估模型可以是基于训练数据训练得到的。训练数据包括样本节点的相关信息和样本节点对应的标签,样本节点对应的标签与样本节点在多步展开后的界限值相关,样本节点的相关信息包括以下至少以下一项:样本节点的目标函数,样本节点的约束条件或样本节点的决策变量。Optionally, the node evaluation model may be trained based on training data. The training data includes relevant information of the sample node and the label corresponding to the sample node. The label corresponding to the sample node is related to the limit value of the sample node after multi-step expansion. The relevant information of the sample node includes at least one of the following: the objective function of the sample node. , the constraints of the sample node or the decision variables of the sample node.
可选地,节点评估模型可以是通过强化学习的方式训练得到的。Optionally, the node evaluation model can be trained through reinforcement learning.
示例性地,节点评估模型可以是通过深度Q学习的方式训练得到的。For example, the node evaluation model may be trained through deep Q learning.
可选地,样本节点对应的标签用于指示样本节点在多步展开后的界限值和样本节点的父节点的界限值之间的差异。Optionally, the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
进一步地,样本节点对应的标签用于指示样本节点在被完全求解后的界限值和样本节点的父节点之间的界限值之间的差异。Further, the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after being completely solved and the limit value of the parent node of the sample node.
可选地,样本节点对应的标签是根据第一差异和第二差异确定的。第一差异为样本节点的父节点的界限值与样本节点的界限值之间的差异。第二差异是通过将样本节点的子节点输入至目标评估模型中进行处理后得到的。目标评估模型与节点评估模型的结构相同。目标评估模型用于预测样本节点的子节点在被完全求解后的界限值与该样本节点的界限值之间的差异。Optionally, the label corresponding to the sample node is determined based on the first difference and the second difference. The first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into the target evaluation model for processing. The target evaluation model has the same structure as the node evaluation model. The target evaluation model is used to predict the difference between the bounding value of the child node of the sample node after being completely solved and the bounding value of the sample node.
第一差异可以通过求解器确定。例如,调用求解器获取样本节点的父节点的界限值和样本节点的界限值,从而可以确定第一差异。The first difference can be determined by the solver. For example, the solver is called to obtain the limit value of the parent node of the sample node and the limit value of the sample node, so that the first difference can be determined.
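The label construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of the application: the function name `sample_label`, the sign convention (node limit value minus parent limit value), the use of a sum to combine the two differences, and the use of the minimum over child predictions are all assumptions chosen to match the minimum variant of the multi-step pseudo-cost function.

```python
from typing import Callable, Sequence


def sample_label(parent_bound: float,
                 node_bound: float,
                 children: Sequence[object],
                 target_model: Callable[[object], float]) -> float:
    """Hypothetical label for one sample node: first difference plus second difference."""
    # First difference: change in limit value from the parent to the sample node.
    # Both limit values are assumed to come from the solver.
    first_diff = node_bound - parent_bound
    # Second difference: the target evaluation model's prediction for each child,
    # aggregated with min (assumption). A node without children contributes 0.
    second_diff = min((target_model(child) for child in children), default=0.0)
    return first_diff + second_diff
```

In a deep Q-learning style setup, `target_model` would play the role of the target network: it shares the structure of the node evaluation model but is held fixed while labels are generated.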
The node evaluation model may be trained by the device for selecting nodes, or by another device. This is not limited in the embodiments of the present application.
For the specific training process, refer to the description later in this document; it is not elaborated here.
Optionally, the limit value of the sample node after multi-step expansion includes the value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
Taking the case where the goal programming problem is a minimization problem as an example, the value of the objective function corresponding to the relaxed solution of a node may be the lower bound of the node. In other words, the limit value of the sample node after multi-step expansion may be the lower bound value of the sample node after multi-step expansion.
The value of the objective function corresponding to the relaxed solution of a node is relatively easy to obtain. In the embodiments of the present application, the node evaluation model may be obtained by training on training data. When the limit value of a node is determined from the relaxed solution, the label corresponding to a sample node is easier to determine, which makes training data easier to collect and helps generate a large amount of training data, thereby improving the training effect of the node evaluation model, that is, improving its prediction accuracy.
It should be understood that, for the minimization problem, the embodiments of the present application are mainly described using the lower bound value as the limit value. In practical applications, the limit value may also be an upper bound value, which is not limited in the embodiments of the present application.
The adjusted candidate node set in step 220 can serve as the candidate node set in the next iteration.
For example, in the next iteration, the candidate node set in step 210 can be replaced with the adjusted candidate node set, and method 200 is executed again. Method 200 may be executed repeatedly until solving is complete.
The target node may include at least one of a first target node and a second target node.
Optionally, the output result of the node evaluation model is used to determine the first target node. The adjusted candidate node set includes the child nodes of the first target node.
As mentioned above, the output result of the node evaluation model can be used to determine the target node, and the target node may include the first target node. The child nodes of the first target node are used to adjust the candidate node set, and the adjusted candidate node set includes the child nodes of the first target node.
The first target node may also be called a search node or an expansion node.
In one possible implementation, the device for selecting nodes and the solver may be deployed separately. In this case, the solver can expand the first target node to obtain its child nodes, and add these child nodes to the candidate node set for the next round of iterative calculation.
In another possible implementation, the device for selecting nodes and the solver may be integrated in a solving device. In this case, the solving device can expand the first target node to obtain its child nodes, and add these child nodes to the candidate node set for the next round of iterative calculation.
Optionally, the output result of the node evaluation model is used to determine the second target node. The adjusted candidate node set does not include the second target node.
As mentioned above, the output result of the node evaluation model can be used to determine the target node, and the target node may include the second target node. The second target node is used to adjust the candidate node set, and the adjusted candidate node set does not include the second target node.
The second target node may also be called a pruning node. A pruning node is not solved in the subsequent solving process of the goal programming problem.
In one possible implementation, the device for selecting nodes and the solver may be deployed separately. In this case, the solver can prune the second target node, for example, by deleting the second target node from the candidate node set. The adjusted candidate node set is used for the next round of iterative calculation.
In another possible implementation, the device for selecting nodes and the solver may be integrated in a solving device. In this case, the solving device can prune the second target node, for example, by deleting the second target node from the candidate node set. The adjusted candidate node set is used for the next round of iterative calculation.
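The two adjustments described above (adding the child nodes of the first target node, and deleting the second target node) can be sketched as a single helper. The function `adjust_candidates` and the `expand` callback are hypothetical names; in particular, removing the expanded node from the set is an assumption borrowed from ordinary branch-and-bound practice, since the text only states that the children are added.

```python
from typing import Callable, Hashable, Iterable, List, Optional


def adjust_candidates(candidates: List[Hashable],
                      first_target: Optional[Hashable] = None,
                      second_target: Optional[Hashable] = None,
                      expand: Optional[Callable[[Hashable], Iterable[Hashable]]] = None,
                      ) -> List[Hashable]:
    """Return the adjusted candidate node set for the next iteration."""
    adjusted = list(candidates)  # leave the caller's list untouched
    # Pruning: the second target node is deleted from the candidate set.
    if second_target is not None and second_target in adjusted:
        adjusted.remove(second_target)
    # Expansion: the children of the first target node join the candidate set.
    if first_target is not None and expand is not None:
        if first_target in adjusted:
            adjusted.remove(first_target)  # assumption: an expanded node leaves the set
        adjusted.extend(expand(first_target))
    return adjusted
```

Whether `expand` is backed by a separately deployed solver or by an integrated solving device does not change this bookkeeping; only the caller differs.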
The following describes, by way of example, how the first target node and the second target node are determined when the node evaluation model is used to predict the change in the limit values of multiple nodes before and after multi-step expansion. In the process of determining the target node, the differences between the limit values of the nodes after multi-step expansion and the limit values of their parent nodes are all predicted by the node evaluation model.
Optionally, the first target node is the node, among the multiple nodes, with the smallest difference between its limit value after multi-step expansion and the limit value of its parent node.
In the embodiments of the present application, the smallest difference can be understood as the smallest change before and after node expansion.
The difference between the limit value of the first target node after multi-step expansion and the limit value of its parent node is less than or equal to the corresponding difference for each of the remaining nodes among the multiple nodes.
For example, if, among the multiple nodes, node #1 has the smallest difference between its limit value after multi-step expansion and the limit value of its parent node, node #1 can serve as the first target node.
Alternatively, the first target node belongs to the j nodes, among the multiple nodes, with the smallest differences between their limit values after multi-step expansion and the limit values of their parent nodes, where j is an integer greater than 1 and less than the number of the multiple nodes.
In other words, for each of the j nodes, the difference between its limit value after multi-step expansion and the limit value of its parent node is less than or equal to the corresponding difference for each of the remaining nodes among the multiple nodes.
The first target node can be determined from the j nodes.
For example, the first target node may be randomly determined from the j nodes.
For another example, the first target node may be determined according to the probabilities corresponding to the j nodes. The probability corresponding to each of the j nodes is the probability that the node is determined to be the first target node. This probability is negatively correlated with the difference between the node's limit value after multi-step expansion and the limit value of its parent node. That is, among the j nodes, the more pronounced the change of a node before and after multi-step expansion, the smaller the probability that the node is determined to be the first target node.
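The selection rules for the first target node can be sketched as follows. The text only requires the selection probability to be negatively correlated with the predicted difference; the softmax-style weighting, the `temperature` parameter (assumed positive), and the function name `select_first_target` are illustrative assumptions.

```python
import math
import random
from typing import Dict, Hashable, Optional


def select_first_target(pred_diff: Dict[Hashable, float],
                        j: int = 1,
                        temperature: float = 1.0,
                        rng: Optional[random.Random] = None) -> Hashable:
    """Pick a search node from the j nodes with the smallest predicted difference."""
    rng = rng or random.Random()
    # Shortlist: the j nodes whose predicted multi-step change is smallest.
    shortlist = sorted(pred_diff, key=pred_diff.get)[:j]
    if len(shortlist) == 1:
        return shortlist[0]  # plain arg-min rule
    # Weights shrink as the difference grows (negative correlation).
    smallest = pred_diff[shortlist[0]]
    weights = [math.exp(-(pred_diff[n] - smallest) / temperature) for n in shortlist]
    return rng.choices(shortlist, weights=weights, k=1)[0]
```

With `j=1` this reduces to choosing the node with the smallest predicted difference; passing equal weights instead of the exponential ones would give the purely random choice among the j nodes.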
For example, the second target node is the node, among the multiple nodes, with the largest difference between its limit value after multi-step expansion and the limit value of its parent node.
In other words, the difference between the limit value of the second target node after multi-step expansion and the limit value of its parent node is greater than or equal to the corresponding difference for each of the remaining nodes among the multiple nodes.
Optionally, the second target node belongs to the k nodes, among the multiple nodes, with the largest differences between their limit values after multi-step expansion and the limit values of their parent nodes, where k is an integer greater than 1 and less than the number of the multiple nodes.
In other words, for each of the k nodes, the difference between its limit value after multi-step expansion and the limit value of its parent node is greater than or equal to the corresponding difference for each of the remaining nodes among the multiple nodes.
The second target node can be determined from the k nodes.
For example, the second target node may be randomly determined from the k nodes.
Optionally, the second target node is determined based on the probabilities corresponding to the k nodes. The probability corresponding to each of the k nodes is positively correlated with the difference between the node's limit value after multi-step expansion and the limit value of its parent node.
In other words, among the k nodes, the less pronounced the change of a node before and after multi-step expansion, the smaller the probability that the node is determined to be the second target node.
Once a node is pruned, it is no longer solved in the solving process of the goal programming problem, so a pruning operation may cause a node containing the optimal solution to be pruned. Determining the second target node probabilistically in the greedy manner described above helps reduce the risk of the pruning operation.
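The probabilistic pruning rule can be made concrete by computing the pruning probabilities explicitly. This is a sketch only: the exponential weighting is one assumed way to obtain a positive correlation with the predicted difference, and the names `pruning_probabilities` and `select_second_target` are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, Optional


def pruning_probabilities(pred_diff: Dict[Hashable, float], k: int) -> Dict[Hashable, float]:
    """Probabilities over the k nodes with the largest predicted difference."""
    shortlist = sorted(pred_diff, key=pred_diff.get, reverse=True)[:k]
    largest = pred_diff[shortlist[0]]
    # Larger predicted change -> larger weight (positive correlation).
    weights = {n: math.exp(pred_diff[n] - largest) for n in shortlist}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def select_second_target(pred_diff: Dict[Hashable, float],
                         k: int,
                         rng: Optional[random.Random] = None) -> Hashable:
    """Sample the pruning node according to the probabilities above."""
    rng = rng or random.Random()
    probs = pruning_probabilities(pred_diff, k)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
```

Because every shortlisted node keeps a nonzero probability, no single node is pruned deterministically on the strength of one prediction.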
The following describes, by way of example, how the first target node and the second target node are determined when the node evaluation model is used to predict the limit values of multiple nodes after multi-step expansion. In the process of determining the target node, the limit values of the nodes after multi-step expansion are all predicted by the node evaluation model. For ease of description, a minimization problem is used as the example below.
Optionally, the first target node is the node, among the multiple nodes, with the smallest limit value after multi-step expansion.
The limit value of the first target node after multi-step expansion is less than or equal to the limit value of each of the remaining nodes among the multiple nodes after multi-step expansion.
For example, if, among the multiple nodes, node #1 has the smallest limit value after multi-step expansion, node #1 can serve as the first target node.
Alternatively, the first target node belongs to the j nodes, among the multiple nodes, with the smallest limit values after multi-step expansion, where j is an integer greater than 1 and less than the number of the multiple nodes.
In other words, the limit value of each of the j nodes after multi-step expansion is less than or equal to the limit value of each of the remaining nodes among the multiple nodes after multi-step expansion.
The first target node can be determined from the j nodes.
For example, the first target node may be randomly determined from the j nodes.
For another example, the first target node may be determined according to the probabilities corresponding to the j nodes. The probability corresponding to each of the j nodes is the probability that the node is determined to be the first target node. This probability is negatively correlated with the node's limit value after multi-step expansion. That is, among the j nodes, the smaller the limit value of a node after multi-step expansion, the greater the probability that the node is determined to be the first target node.
For example, the second target node is the node, among the multiple nodes, with the largest limit value after multi-step expansion.
In other words, the limit value of the second target node after multi-step expansion is greater than or equal to the limit value of each of the remaining nodes among the multiple nodes after multi-step expansion.
Optionally, the second target node belongs to the k nodes, among the multiple nodes, with the largest limit values after multi-step expansion, where k is an integer greater than 1 and less than the number of the multiple nodes.
In other words, the limit value of each of the k nodes after multi-step expansion is greater than or equal to the limit value of each of the remaining nodes among the multiple nodes after multi-step expansion.
The second target node can be determined from the k nodes.
For example, the second target node may be randomly determined from the k nodes.
Optionally, the second target node is determined based on the probabilities corresponding to the k nodes. The probability corresponding to each of the k nodes is positively correlated with the node's limit value after multi-step expansion.
In other words, among the k nodes, the larger the limit value of a node after multi-step expansion, the greater the probability that the node is determined to be the second target node.
Once a node is pruned, it is no longer solved in the solving process of the goal programming problem, so a pruning operation may cause a node containing the optimal solution to be pruned. Determining the second target node probabilistically in the greedy manner described above helps reduce the risk of the pruning operation.
It should be understood that the above are only examples; the first target node and the second target node may also be determined in other ways, which is not limited in the embodiments of the present application.
Optionally, method 200 may further include: sending indication information of the target node to the solver.
When the device for selecting nodes and the solver are deployed separately, the device for selecting nodes can send the indication information of the target node to the solver. The solver can solve the goal programming problem according to the target node.
For example, the indication information of the target node may include the target node itself.
For example, the device for selecting nodes can determine the target node based on the output result of the node evaluation model and send the target node to the solver.
For example, the indication information of the target node may include evaluation information of some or all of the multiple nodes.
For example, the device for selecting nodes can send the evaluation information of some or all of the nodes to the solver. The solver can determine the target node based on that evaluation information.
For example, the indication information of the target node may include a search order of the multiple nodes.
For example, the node ranked first can be the search node in the next iteration.
Alternatively, the indication information of the target node may include other information related to the output result of the node evaluation model, as long as the solver can determine the evaluation information of the nodes from that information and then determine the target node.
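The three forms of indication information listed above can be pictured as one message type that the solver resolves into a search node. Everything in this sketch is hypothetical: the `TargetIndication` class, its field names, and the convention that a smaller evaluation value marks a better search node (as in the minimization examples) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TargetIndication:
    """Hypothetical message sent from the node-selecting device to the solver."""
    target_node: Optional[str] = None              # the target node itself
    evaluation: Optional[Dict[str, float]] = None  # evaluation info of some or all nodes
    search_order: Optional[List[str]] = None       # nodes ranked for searching


def resolve_search_node(indication: TargetIndication) -> str:
    """Solver-side helper: derive the next search node from whichever field is set."""
    if indication.target_node is not None:
        return indication.target_node
    if indication.search_order:
        return indication.search_order[0]  # the first-ranked node is searched next
    if indication.evaluation:
        # Assumption: a smaller evaluation value means a more promising node.
        return min(indication.evaluation, key=indication.evaluation.get)
    raise ValueError("indication carries no usable information")
```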
In the embodiments of the present application, the node evaluation model can predict a quantity related to the limit value of a node before and after multi-step expansion, which helps predict the optimal solution that can be found starting from the multiple nodes. This quantity can be used to measure the long-term value of expanding a node, making the selection of the target node more accurate. This helps select appropriate nodes for the corresponding processing, so that the nodes in the adjusted candidate node set are those more likely to yield the optimal solution, which helps improve solving efficiency.
When the target node includes the first target node, the first target node can be determined according to the output result of the node evaluation model, and iterative calculation can be performed based on the first target node during solving. Specifically, the first target node is expanded to obtain its child nodes, and iterative calculation then proceeds. The output result of the node evaluation model can be used to measure the long-term value of expanding a node, which helps judge how likely a node is to yield the optimal solution after multi-step expansion. A first target node determined on this basis is more likely to yield the optimal solution, which helps improve convergence speed and solving efficiency. For example, for a minimization problem, the smaller the limit value of a node after multi-step expansion, the more likely it is that the global optimal solution can be obtained starting from that node. The node evaluation model can be used to predict the limit value of a node after multi-step expansion, which helps judge how likely the node is to yield the optimal solution after multi-step expansion.
When the target node includes the second target node, the second target node can be determined according to the output result of the node evaluation model, and the second target node can be pruned during solving. The output result of the node evaluation model can measure the long-term value of expanding a node, which helps judge how likely a node is to yield the optimal solution after expansion, and the second target node is determined on this basis. Pruning nodes that are unlikely to yield the optimal solution reduces the solution space and avoids the delay caused by expanding and solving useless nodes, thereby improving solving efficiency. For example, for a minimization problem, the larger the limit value of a node after multi-step expansion, the less likely it is that the global optimal solution can be obtained starting from that node. The node evaluation model can be used to predict the limit value of a node after multi-step expansion, which helps judge how likely the node is to yield the optimal solution after multi-step expansion; the second target node is determined on this basis, avoiding the delay caused by expanding and solving useless nodes.
In addition, taking the case where the goal programming problem is a minimization problem as an example, if the limit value is a lower bound value, then as the number of iterations increases during solving, the lower bound values of the multiple nodes after multi-step expansion may all be very small. Limited by factors such as the computational precision of the computer, it is difficult to compare these lower bound values. By contrast, the differences of the multiple nodes before and after multi-step expansion are more distinct. In the embodiments of the present application, the differences of the multiple nodes before and after multi-step expansion can be predicted and then compared to determine the target node, which helps improve the accuracy of target node selection.
Optionally, the input of the node evaluation model may include low-dimensional representations of the relevant information of the multiple nodes. These low-dimensional representations are obtained by performing dimensionality reduction on the relevant information of the multiple nodes through a feature extraction model.
In the embodiments of the present application, the relevant information of a node can be input into the feature extraction model for dimensionality reduction, that is, feature extraction, and the processing result is input into the node evaluation model. The low-dimensional representation of the relevant information of a node is the result of performing dimensionality reduction on that information.
The low-dimensional representation of the relevant information of a node includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraints, or a low-dimensional representation of the node's decision variables.
For example, the objective function of a node is dimensionally reduced by the feature extraction model to obtain a low-dimensional representation of the objective function. As another example, the constraints corresponding to a node are dimensionally reduced by the feature extraction model to obtain a low-dimensional representation of the constraints. As yet another example, the decision variables of a node are dimensionally reduced by the feature extraction model to obtain a low-dimensional representation of the decision variables.
For example, by inputting the low-dimensional representation of the node's objective function, the low-dimensional representation of the node's constraints, and the low-dimensional representation of the node's decision variables into the node evaluation model, a quantity related to the limit value of the node after multi-step expansion can be obtained.
Performing dimensionality reduction on the relevant information of a node facilitates inference by the downstream module, that is, inference by the node evaluation model.
In one possible implementation, the device for selecting nodes and the solver may be deployed separately. In this case, the feature extraction model can be deployed in the solver; the solver determines the low-dimensional representation of the relevant information of a node using the feature extraction model and sends it to the device for selecting nodes. Alternatively, the feature extraction model can be deployed in the device for selecting nodes, which determines the low-dimensional representation of the relevant information of a node using the feature extraction model. Alternatively, the feature extraction model can be deployed in another device, which is not limited in the embodiments of the present application.
In another possible implementation, the device for selecting nodes and the solver may be integrated in a solving device. In this case, the solving device can determine the low-dimensional representation of the relevant information of a node using the feature extraction model. Alternatively, another device can determine the low-dimensional representation of the relevant information of a node using the feature extraction model and send it to the solving device.
Optionally, the feature extraction model may be a graph convolutional neural network model.
It should be understood that the above are only examples; the feature extraction model may also adopt a model with another structure, as long as dimensionality reduction can be achieved, which is not limited in the embodiments of the present application.
In the embodiments of the present application, a low-dimensional representation of the relevant information of a node can be obtained through a graph convolutional neural network, which can process relevant information of nodes of different scales, or in other words, can process the mathematical programming models at nodes of different scales. Moreover, a graph convolutional neural network is insensitive to the ordering of its inputs.
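The two properties claimed above (handling problems of different sizes and insensitivity to input ordering) can be illustrated with a deliberately simplified stand-in. The sketch below is not a trained graph convolutional network: it performs one weight-free round of message passing on the constraint-variable bipartite graph followed by mean pooling, and the function name `embed_node` and the inputs `A` (constraint coefficients), `b` (right-hand sides), and `c` (objective coefficients) are assumptions for illustration.

```python
from typing import List, Sequence


def embed_node(A: Sequence[Sequence[float]],
               b: Sequence[float],
               c: Sequence[float]) -> List[float]:
    """Fixed-length summary of a node's mathematical program (m >= 1 constraints, n >= 1 variables)."""
    m, n = len(A), len(c)
    # Variable -> constraint messages: each constraint averages the objective
    # coefficients of its variables, weighted by the constraint coefficients.
    cons = [(b[i], sum(A[i][j] * c[j] for j in range(n)) / n) for i in range(m)]
    # Constraint -> variable messages: each variable averages the constraint
    # features computed above, weighted by the same coefficients.
    var = [(c[j], sum(A[i][j] * cons[i][1] for i in range(m)) / m) for j in range(n)]

    def mean(values: List[float]) -> float:
        return sum(values) / len(values)

    # Mean pooling removes any dependence on the number or order of rows and columns.
    return [mean([u[0] for u in cons]), mean([u[1] for u in cons]),
            mean([v[0] for v in var]), mean([v[1] for v in var])]
```

Reordering the constraints (rows of `A` together with `b`) or the variables (columns of `A` together with `c`) leaves the result unchanged up to floating-point rounding, and problems with different numbers of constraints and variables all map to a vector of length four. A real graph convolutional model would add learned weight matrices and nonlinearities at each step.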
Optionally, the feature extraction model may be obtained through training. For the specific training process, refer to the description below.
The feature extraction model may be trained by the apparatus in which it is located, or by another apparatus. This is not limited in the embodiments of this application.
For example, if the feature extraction model is deployed in the node selection apparatus, it may be trained by the node selection apparatus or by another apparatus.
Optionally, method 200 may further include returning at least one of the following to the user: the solution result of the goal programming model, or indication information of the target node.
An embodiment of this application provides a training method for a node evaluation model, which can be used to train the node evaluation model. The trained node evaluation model can be applied in method 200 shown in Figure 2.
The training method of the node evaluation model in the embodiments of this application is described below with reference to Figure 3.
Figure 3 is a schematic flowchart of a training method for a node evaluation model provided by an embodiment of this application. Method 300 shown in Figure 3 may be performed by a training apparatus for the node evaluation model. After the training apparatus completes training, the resulting node evaluation model can be deployed in the node selection apparatus. The training apparatus and the node selection apparatus may be the same apparatus or different apparatuses.
The node evaluation model is used to predict, for each node in the candidate node set of the goal programming problem, a quantity related to the node's bound value after multi-step expansion. Each node corresponds to one subproblem, yet to be solved, of the goal programming problem. The output of the node evaluation model is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem.
As shown in Figure 3, method 300 includes steps 310 to 330, which are described below.
310. Obtain a sample node.
320. Obtain the label corresponding to the sample node, where the label is related to the bound value of the sample node after multi-step expansion.
330. Perform training based on the sample node and its corresponding label to obtain the node evaluation model.
In other words, the sample node and its label serve as training data for the node evaluation model. An initial model of the node evaluation model is trained on the training data, and the trained model can serve as the node evaluation model used in method 200. The initial model of the node evaluation model may also be called the initial node evaluation model.
Specifically, the parameters of the initial node evaluation model are adjusted with the goal of reducing the difference between the model's output and the label corresponding to the sample node, yielding the trained node evaluation model.
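Steps 310 to 330 can be sketched as an ordinary supervised loop. The sketch below is an illustration under stated assumptions, not the training procedure of this application: the model is a plain linear function, the loss is squared error, and the feature vectors and labels are made-up numbers standing in for node information and bound-related labels.

```python
def predict(weights, features):
    """Output of a (hypothetical) linear node evaluation model."""
    return sum(w * x for w, x in zip(weights, features))

def train(samples, labels, lr=0.05, epochs=2000):
    """Adjust the parameters to shrink the gap between output and label."""
    weights = [0.0] * len(samples[0])          # initial node evaluation model
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict(weights, features) - label
            # Gradient step on 0.5 * error**2 with respect to each weight.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights

# Step 310: sample nodes (two made-up features plus a constant bias term).
samples = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
# Step 320: labels for those nodes (made-up values).
labels = [3.0, 4.0, 6.0, 8.0]
# Step 330: training.
weights = train(samples, labels)
mse = sum((predict(weights, s) - y) ** 2 for s, y in zip(samples, labels)) / len(labels)
```

After training, `mse` is close to zero on this toy data, which is all the example is meant to show; any model family and optimizer with the same objective would fit the description equally well.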
In the embodiments of this application, the label corresponding to a sample node is related to the bound value of the sample node after multi-step expansion. Such labels are relatively easy to determine, which makes training data easier to collect and faster to generate. A large amount of training data can therefore be obtained and samples are used more efficiently, which improves the training effect of the node evaluation model, that is, its prediction accuracy.
For example, the bound value of a sample node after multi-step expansion may be the bound value of the sample node after it has been completely solved.
The number of steps needed to expand different sample nodes until they are completely solved may be the same or different.
For example, the label corresponding to a sample node indicates the bound value of the sample node after multi-step expansion.
For instance, the label may indicate the bound value of the sample node after it has been completely solved.
Optionally, the label corresponding to a sample node indicates the difference between the bound value of the sample node after multi-step expansion and the bound value of the sample node's parent node.
For instance, the label may indicate the difference between the bound value of the sample node after it has been completely solved and the bound value of its parent node.
For example, the sample node and its label may be data in a training database. They may be generated in advance by means of a solver and stored in the training database.
While solving a programming problem, the solver can generate multiple nodes and also compute the bound values of those nodes. These nodes can serve as sample nodes. From the bound values of the nodes obtained during the solution process, a quantity related to a node's bound value after multi-step expansion can be determined, which gives the label corresponding to the sample node. The training data may be determined from the solution processes of one or more programming problems, which may be provided by the user or stored in advance.
As an example, the solver may receive batch data provided by the user (for example, multiple programming problems), solve them, sample the sample nodes from the nodes generated during the solution process, determine the label corresponding to each sample node from the bound values of the nodes solved during that process, and store the relevant information of the sample nodes together with their labels in the training database. Alternatively, the solver may receive batch data provided by the user and solve based on both that batch data and historical data (for example, multiple programming problems stored in advance), and then sample nodes, determine labels, and store them in the training database in the same way.
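The label generation described above can be sketched on a recorded search tree. The sketch makes several assumptions that the text does not state: the problem is a minimization problem, each recorded node stores the lower bound of its own relaxation, an infeasible leaf would be recorded with an infinite bound, and the bound of a completely solved node is taken as the smallest bound among the leaves below it. The class `TreeNode` and the helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeNode:
    name: str
    bound: float                          # lower bound from this node's relaxation
    parent: Optional["TreeNode"] = None
    children: list = field(default_factory=list)

def add_child(parent, name, bound):
    child = TreeNode(name, bound, parent)
    parent.children.append(child)
    return child

def solved_bound(node):
    """Lower bound of the node once its whole subtree has been explored."""
    if not node.children:
        return node.bound
    return min(solved_bound(child) for child in node.children)

def label(node):
    """Completely solved bound of `node` minus the bound of its parent.

    Only defined for non-root nodes, since the root has no parent.
    """
    return solved_bound(node) - node.parent.bound

# A small recorded tree with made-up bounds.
root = TreeNode("root", 10.0)
a = add_child(root, "A", 11.0)
b = add_child(root, "B", 12.0)
a1 = add_child(a, "A1", 14.0)
a2 = add_child(a, "A2", 12.5)

training_rows = [(n.name, label(n)) for n in (a, b, a1, a2)]
```

Each row pairs a sample node with its label and could be written to a training database; here node A receives 2.5 because its best leaf ends at 12.5 while its parent's bound is 10.0.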
The training apparatus may obtain training data from the training database, use the relevant information of a sample node as the input of the initial model corresponding to the node evaluation model, and train that initial model with the goal of reducing the gap between the model's output and the label corresponding to the sample node, thereby obtaining the node evaluation model.
For example, the sample node and its label may be provided by the user.
Alternatively, the sample node and its label may be obtained in other ways.
Optionally, step 330 may include: performing training by means of reinforcement learning to obtain the trained node evaluation model.
In this case, the label corresponding to a sample node may be obtained through interaction with the environment during reinforcement learning. The sample node may come from the training database or may be provided by the user.
In a possible implementation, the training apparatus for the node evaluation model and the solver may be deployed separately, in which case the environment may be the solver.
In other words, the solver can be wrapped as the environment in reinforcement learning, and the training apparatus collects data by repeatedly interacting with the solver to obtain the labels of the sample nodes.
For example, training is performed by means of deep Q-learning to obtain the node evaluation model.
Optionally, the label corresponding to a sample node may be determined from a first difference and a second difference. The first difference is the difference between the bound value of the sample node's parent node and the bound value of the sample node. The second difference is obtained by inputting a child node of the sample node into a target evaluation model for processing; the target evaluation model has the same structure as the node evaluation model. The target evaluation model is used to predict the difference between the bound value of the child node after it has been completely solved and the bound value of the sample node.
The target evaluation model is the target network in deep Q-learning.
The first difference may be determined by means of the solver.
For example, the solver may send the bound value of the sample node's parent node and the bound value of the sample node to the training apparatus, which determines the first difference from them.
Alternatively, the solver may determine the first difference from the bound value of the sample node's parent node and the bound value of the sample node, and send the first difference to the training apparatus.
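How a label might be assembled from the two differences can be sketched as follows. This is an illustration in the style of standard deep Q-learning, not the computation of this application: adding the two differences, taking the first difference as the node's bound minus its parent's bound, using a single child, and the linear stand-in models are all assumptions, and every name is hypothetical.

```python
def make_linear_model(weights):
    """Stand-in for an evaluation model: a linear function of node features."""
    return lambda features: sum(w * x for w, x in zip(weights, features))

def bootstrap_label(parent_bound, node_bound, child_features, target_model):
    """Label = first difference + second difference (assumed additive)."""
    first_difference = node_bound - parent_bound      # reported by the solver
    second_difference = target_model(child_features)  # predicted by the target network
    return first_difference + second_difference

online_weights = [0.4, 1.5]            # parameters being trained
target_weights = [0.5, 2.0]            # frozen copy used only to build labels
target_model = make_linear_model(target_weights)

y = bootstrap_label(parent_bound=10.0, node_bound=11.5,
                    child_features=[1.0, 0.25], target_model=target_model)

# Periodically the target network is refreshed from the online parameters.
target_weights = list(online_weights)
```

Keeping the target weights fixed between refreshes means the label does not move with every parameter update, which is the usual reason a target network is used.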
The node evaluation model may be used to predict the function value of a node's multi-step pseudo-cost function.
The training goal is for the node evaluation model to learn accurate function values of the multi-step pseudo-cost function. In deep Q-learning, the label corresponding to a sample node may also be called the predicted label corresponding to the sample node.
For example, the predicted label corresponding to a node may satisfy the following formula:
[formula not preserved in the source text]
where the left-hand side is the predicted label corresponding to the sample node. c(P) is the first difference, that is, the difference between the bound value of the parent node of node P and the bound value of node P, which can be obtained through the solver. For example, the solver can compute the bound value of node P and the bound value of P's parent node so that the training apparatus can obtain the first difference. The remaining term is the second difference, which is the output of the target evaluation model. The target evaluation model is used to stabilize training and prevent overfitting.
During training, the model parameters are adjusted with the goal of reducing the gap between the predicted label corresponding to the sample node and the model output C_θ(P), where C_θ denotes the model being trained.
For example, the training objective may be expressed as:
[formula not preserved in the source text]
where E denotes the mean, θ denotes the parameters of the model, and π denotes the learned policy, that is, the multi-step pseudo-cost function.
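The label and the training objective described above can be written out as follows. This is a sketch consistent with the surrounding definitions and with standard deep Q-learning, not a reproduction of the formulas in the original publication: the symbols y(P), θ⁻ (target model parameters) and P′ (a child node of P), the additive combination of the two differences, and the squared error are all assumptions.

```latex
% Assumed form of the predicted label: first difference plus second difference.
y(P) = c(P) + C_{\theta^{-}}(P')

% Assumed training objective: mean squared gap between label and model output.
\min_{\theta} \; \mathbb{E}_{\pi}\!\left[ \bigl( y(P) - C_{\theta}(P) \bigr)^{2} \right]
```

Under these assumptions, c(P) covers the one-step change in bound from the parent of P to P, and the target model supplies the predicted remaining change from P to the completely solved child, so their sum estimates the multi-step change that C_θ(P) is trained to reproduce.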
It should be understood that the above is merely an example. Deep Q-learning may also be used for training with other labels; this is not limited in the embodiments of this application.
Optionally, the bound value of the sample node after multi-step expansion includes the function value of the objective function corresponding to the relaxation solution of the sample node after multi-step expansion.
Taking the case where the goal programming problem is a minimization problem as an example, the function value of the objective function corresponding to a node's relaxation solution may be the node's lower bound. In other words, the bound value of the sample node after multi-step expansion may be its lower-bound value after multi-step expansion. That is, the node evaluation model may be used to predict a quantity related to a node's lower bound after multi-step expansion.
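A small worked example may make the lower-bound property concrete. The problem below is a made-up binary covering problem, min Σ c_i x_i subject to Σ w_i x_i >= demand with x_i in {0, 1}; it is not taken from this application. Its relaxation (0 <= x_i <= 1) can be solved greedily by cost-to-weight ratio, and the integer optimum is found by enumeration for comparison.

```python
from itertools import product

def relaxation_bound(costs, weights, demand):
    """Optimal value of the relaxation 0 <= x_i <= 1, solved greedily."""
    remaining, total = demand, 0.0
    # Cheapest cost per unit of weight first.
    for cost, weight in sorted(zip(costs, weights), key=lambda cw: cw[0] / cw[1]):
        if remaining <= 1e-9:
            break
        fraction = min(1.0, remaining / weight)
        total += fraction * cost
        remaining -= fraction * weight
    return total if remaining <= 1e-9 else float("inf")

def integer_optimum(costs, weights, demand):
    """Best objective value over all 0/1 assignments (brute force)."""
    best = float("inf")
    for x in product((0, 1), repeat=len(costs)):
        if sum(w * xi for w, xi in zip(weights, x)) >= demand:
            best = min(best, sum(c * xi for c, xi in zip(costs, x)))
    return best

costs, weights, demand = [4.0, 3.0, 6.0], [3.0, 2.0, 4.0], 6.0
lower_bound = relaxation_bound(costs, weights, demand)   # 8.5
optimum = integer_optimum(costs, weights, demand)        # 9.0
```

The relaxation takes the first two items fully and a quarter of the third, giving 8.5, while the best integer solution costs 9.0. The relaxed objective value therefore bounds the integer optimum from below, and it is obtained without any enumeration.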
The function value of the objective function corresponding to a node's relaxation solution is relatively easy to obtain. In the embodiments of this application, when the bound value of a sample node is determined from the relaxation solution, the label corresponding to the sample node is easier to determine. For example, in reinforcement-learning-based training, labels may be determined in real time; because the relaxation solution is cheap to compute, determining labels from it is more efficient, which helps improve training efficiency.
It should be understood that, for minimization problems, the embodiments of this application are described mainly with the bound value being a lower-bound value. In practice the bound value may also be an upper-bound value; this is not limited in the embodiments of this application.
The input type and output type of the model during training are the same as those of the trained node evaluation model.
Optionally, the input of the initial node evaluation model includes the relevant information of the sample node or a low-dimensional representation of that information.
The relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node.
For example, the input of the initial node evaluation model includes the relevant information of the sample node. In this case, the input of the node evaluation model may include the relevant information of a node, which includes at least one of the following: the node's objective function, the node's constraints, or the node's decision variables.
As another example, the input of the initial node evaluation model may include a low-dimensional representation of the relevant information of the sample node. In this case, the input of the node evaluation model may include a low-dimensional representation of a node's relevant information, and the output of the node evaluation model includes a quantity related to the node's bound value after multi-step expansion. The low-dimensional representation of a node's relevant information includes at least one of the following: a low-dimensional representation of the node's objective function, of the node's constraints, or of the node's decision variables.
The low-dimensional representation of the sample node's relevant information may be obtained by performing dimensionality reduction on that information with the feature extraction model.
The relevant information of the sample node is input into the feature extraction model for dimensionality reduction, and the result is input into the initial model corresponding to the node evaluation model.
The feature extraction model may be an already trained model or a model that is still being trained.
For example, the relevant information of the sample node is input into an initial feature extraction model for dimensionality reduction, and the result is input into the initial node evaluation model to predict a quantity related to the sample node's bound value after multi-step expansion. Both models are trained with the goal of reducing the gap between the output of the initial node evaluation model and the label corresponding to the sample node. When training is complete, the trained node evaluation model and the trained feature extraction model are obtained. The initial feature extraction model is the initial model corresponding to the feature extraction model.
It should be understood that the above is merely an example. The feature extraction model may also be trained in other ways; this is not limited in the embodiments of this application.
Any AI model must be trained before it can be used to solve a specific technical problem. Training an AI model is the process of running a specified initial model on training data and adjusting the parameters of the initial model, by some method, according to the results, so that the model gradually learns certain regularities and acquires a specific function. Once trained to a stable function, the AI model can be used for inference. Inference is the process of using the trained AI model to compute on input data and obtain a predicted result.
The solution of the embodiments of this application can be divided into two phases: a training phase and an inference phase.
In the training phase, the initial node evaluation model is trained to obtain the node evaluation model. For example, the node evaluation model can be attached to a solver so that search nodes and pruning nodes are determined during the solution process.
A method for solving a goal programming model provided by an embodiment of this application is described below by way of example with reference to Figure 4.
It should be understood that the example in Figure 4 is intended only to help those skilled in the art understand the embodiments of this application, not to limit them to the specific values or scenarios illustrated. Figure 4 shows a method for solving a goal programming model provided by an embodiment of this application. Method 400 shown in Figure 4 may use method 200 shown in Figure 2 to select nodes; for related descriptions, refer to method 200. To avoid repetition, some descriptions are omitted where appropriate when describing method 400.
For ease of understanding and description, method 400 is described mainly with the solver and the node selection apparatus deployed separately, which does not limit the embodiments of this application. In other implementations, the solver and the node selection apparatus may be integrated in the same apparatus.
As shown in Figure 4, method 400 includes steps 410 to 430, which are described below.
Step 410: Obtain the goal programming problem.
Step 420: Adjust the candidate node set of the goal programming problem according to the node evaluation model. The candidate node set includes multiple nodes, each corresponding to one subproblem, yet to be solved, of the goal programming problem. The node evaluation model is used to predict a quantity related to a node's bound value after multi-step expansion.
Step 430: Solve the goal programming problem based on the adjusted candidate node set to obtain the solution result of the goal programming problem.
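Steps 410 to 430 can be sketched as a small branch-and-bound loop in which a pluggable `evaluate` function decides which candidate node is explored next. Everything here is illustrative: the problem is a made-up binary covering problem (min Σ c_i x_i subject to Σ w_i x_i >= demand), the relaxation is solved greedily, and `evaluate` is a hand-written stand-in for a trained node evaluation model, not the model of this application.

```python
import heapq

def relax(costs, weights, demand, fixed):
    """Greedy relaxation with some variables fixed to 0 or 1.

    Returns (lower_bound, index of the fractional variable or None).
    """
    total = sum(costs[i] for i, v in fixed.items() if v == 1)
    remaining = demand - sum(weights[i] for i, v in fixed.items() if v == 1)
    free = sorted((i for i in range(len(costs)) if i not in fixed),
                  key=lambda i: costs[i] / weights[i])
    fractional = None
    for i in free:
        if remaining <= 1e-9:
            break
        take = min(1.0, remaining / weights[i])
        total += take * costs[i]
        remaining -= take * weights[i]
        if take < 1.0:
            fractional = i
    if remaining > 1e-9:
        return float("inf"), None          # infeasible subproblem
    return total, fractional

def solve(costs, weights, demand, evaluate):
    """Step 410: the problem is given by (costs, weights, demand)."""
    best = float("inf")
    bound, frac = relax(costs, weights, demand, {})
    candidates = [(evaluate(bound, {}), 0, bound, frac, {})]
    pushed = 1                             # tie-breaker so heap entries stay comparable
    while candidates:
        # Step 420: the evaluation score decides which candidate is taken next.
        _, _, bound, frac, fixed = heapq.heappop(candidates)
        if bound >= best:
            continue                       # cannot beat the incumbent: prune
        if frac is None:
            best = bound                   # relaxation is integral: new incumbent
            continue
        for value in (0, 1):               # branch on the fractional variable
            child = {**fixed, frac: value}
            child_bound, child_frac = relax(costs, weights, demand, child)
            if child_bound < best:
                entry = (evaluate(child_bound, child), pushed,
                         child_bound, child_frac, child)
                heapq.heappush(candidates, entry)
                pushed += 1
    return best                            # step 430: solution value

best_first = lambda bound, fixed: bound            # score = current lower bound
depth_first = lambda bound, fixed: -len(fixed)     # score favours deeper nodes
optimum = solve([4.0, 3.0, 6.0], [3.0, 2.0, 4.0], 6.0, best_first)
optimum_dfs = solve([4.0, 3.0, 6.0], [3.0, 2.0, 4.0], 6.0, depth_first)
```

Both scoring rules reach the same optimum on this instance; what changes is the order in which candidates are explored, which is the part a learned evaluation model would influence.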
For example, the goal programming problem may be uploaded by the user, who may input it into the solver. The goal programming problem uploaded by the user is obtained; it is the mathematical programming problem that the user wants solved.
The goal programming problem can be expressed by its objective function, constraints, and decision variables. The constraints restrict the decision variables. At least some of the decision variables of the goal programming model are integer variables, that is, at least some decision variables take integer values. In other words, the goal programming model is a pure integer programming model or a mixed integer programming model.
The solver can generate multiple subproblems, yet to be solved, of the goal programming problem; these subproblems serve as the nodes in the candidate node set. The solver may be implemented on the branch-and-bound algorithm framework.
For example, the solver may generate, from the goal programming problem, multiple constraints on the decision variables and add them to the constraints of the goal programming problem, thereby forming multiple subproblems of the goal programming problem, that is, multiple branches.
It should be understood that the constraints of the goal programming problem restrict its solution space. The additional constraints generated while creating subproblems restrict the decision variables within a branch, thereby narrowing the solution space on that branch.
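The way added constraints narrow a branch can be sketched with per-variable bounds. The sketch assumes the common branching rule in branch-and-bound, where a variable with fractional relaxed value v yields one child with x <= floor(v) and one with x >= ceil(v); the text does not specify the branching rule, and the class `Subproblem` and function `branch` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Subproblem:
    lower: tuple    # lower bound of each decision variable
    upper: tuple    # upper bound of each decision variable

def branch(node, index, relaxed_value):
    """Create two children by adding one bound constraint to each."""
    left_upper = list(node.upper)
    left_upper[index] = min(node.upper[index], math.floor(relaxed_value))
    right_lower = list(node.lower)
    right_lower[index] = max(node.lower[index], math.ceil(relaxed_value))
    return (Subproblem(node.lower, tuple(left_upper)),
            Subproblem(tuple(right_lower), node.upper))

root = Subproblem(lower=(0, 0), upper=(10, 10))
# Suppose the relaxation of the root gives x_1 = 2.5 for an integer variable.
left, right = branch(root, index=1, relaxed_value=2.5)
```

The left child keeps only solutions with x_1 <= 2 and the right child only those with x_1 >= 3, so each branch covers a strictly smaller part of the root's solution space and the fractional value 2.5 is excluded from both.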
In step 420, the node evaluation model may be used to predict a quantity related to the bound value, after multi-step expansion, of some or all of the nodes in the candidate node set.
For example, step 420 may include: predicting, through the node evaluation model, a quantity related to the bound value after multi-step expansion of each node in the candidate node set.
In other words, the node evaluation model predicts this quantity for all nodes in the candidate node set.
For ease of description, the embodiments of this application are described mainly with all nodes as an example, that is, with every node being processed by the node evaluation model. This does not limit the solutions of the embodiments of this application.
The output of the node evaluation model can serve as evaluation information of a node. In other words, the node evaluation model can be used to output the evaluation information of a node, which is related to the node's bound value after multi-step expansion. For example, the evaluation information of a node indicates the node evaluation model's prediction of the quantity related to the node's bound value after multi-step expansion.
The node selection apparatus may include the node evaluation model and may generate the evaluation information of the multiple nodes through the node evaluation model.
Optionally, method 400 further includes: determining the node evaluation model according to a user instruction.
For example, the user may select one node evaluation model from multiple node evaluation models, and the selected model serves as the node evaluation model in method 400.
As another example, the user may select one node selection apparatus from multiple node selection apparatuses. Node selection apparatuses may correspond to node evaluation models; the node evaluation model deployed in the apparatus indicated by the user is the node evaluation model in method 400.
Alternatively, the node evaluation model may be determined by the node selection apparatus.
Alternatively, the node evaluation model may be determined by the solver.
Alternatively, the node evaluation model may be determined in other ways; for example, it may be a default model.
For example, the node evaluation model may be deployed on a cloud management platform.
For example, the evaluation information of a node indicates how the node's bound value changes between before and after multi-step expansion. More precisely, it indicates a prediction of that change.
For example, the change in a node's bound value before and after multi-step expansion can be represented by the function value of the node's multi-step pseudo-cost function. In other words, that function value can be used to evaluate the node, and the node evaluation model can be used to predict it.
Taking the case where the goal programming problem is a minimization problem as an example, the bound value may be a lower-bound value. The function value of the multi-step pseudo-cost function is the change in the lower-bound value from the point at which the node is expanded until the node is completely solved.
The purpose of defining the multi-step pseudo-cost function is as follows: if the function can be computed or learned accurately, the optimal solution that can eventually be found by searching from a given node can be predicted accurately, and the node containing the globally optimal solution can then be selected according to the function values.
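One way predicted multi-step pseudo-cost values could drive node selection is sketched below for a minimization problem. The rule used here (current lower bound plus predicted increase, pick the smallest, drop nodes predicted to end at or above the incumbent) is an assumption made for illustration; the text does not fix the selection or pruning rule, and pruning on a prediction is a heuristic rather than an exact test. The field names and numbers are made up.

```python
def select_and_prune(candidates, predict_pseudo_cost, incumbent):
    """Return (target node, kept candidates) for a minimization problem."""
    scored = [(node["bound"] + predict_pseudo_cost(node), node) for node in candidates]
    # Keep only nodes whose predicted final bound can still beat the incumbent.
    kept = [(score, node) for score, node in scored if score < incumbent]
    if not kept:
        return None, []
    target = min(kept, key=lambda item: item[0])[1]
    return target, [node for _, node in kept]

# Stand-in for a trained model: the prediction is stored with each node.
predict = lambda node: node["predicted_increase"]

candidates = [
    {"name": "A", "bound": 8.5, "predicted_increase": 2.0},   # predicted final 10.5
    {"name": "B", "bound": 9.0, "predicted_increase": 0.5},   # predicted final 9.5
    {"name": "C", "bound": 9.5, "predicted_increase": 3.0},   # predicted final 12.5
]
target, kept = select_and_prune(candidates, predict, incumbent=12.0)
```

Node B is chosen even though node A has the smaller current bound, because B's predicted final bound is lower; node C is dropped because its predicted final bound exceeds the incumbent value of 12.0.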
节点评估模型可以用于预测节点的多步伪成本函数的函数值。或者说,节点评估模型用于拟合节点的多步伪成本函数。或者说,节点评估模型的训练目标可以为学习多步伪成本函数,以准确地预测节点的多步伪成本函数的函数值。训练过程可以参考方法300或方法800。The node evaluation model can be used to predict the function value of a node's multi-step pseudo-cost function. In other words, the node evaluation model is used to fit the multi-step pseudo-cost function of the node. In other words, the training goal of the node evaluation model can be to learn a multi-step pseudo-cost function to accurately predict the function value of the multi-step pseudo-cost function of the node. For the training process, please refer to method 300 or method 800.
应理解,为了便于描述,在方法400中仅以上述形式的多步伪成本函数为例进行说明,
不对本申请实施例的方案构成限定。相关描述可以参考方法200,此处不再赘述。It should be understood that for the convenience of description, only the multi-step pseudo cost function in the above form is used as an example in method 400. This does not limit the solutions of the embodiments of the present application. For related description, please refer to method 200 and will not be described again here.
In a possible implementation, the input of the node evaluation model includes information related to the node. The information related to the node includes at least one of the following: the objective function of the node, the constraints of the node, or the decision variables of the node.
For example, the information related to the node may include the objective function, the constraints and the decision variables of the node. When these are input into the node evaluation model, the model can output a quantity related to the limit value of the node after multi-step expansion.
In another possible implementation, the input of the node evaluation model includes a low-dimensional representation of the information related to the node. This low-dimensional representation includes at least one of the following: a low-dimensional representation of the node's objective function, a low-dimensional representation of the node's constraints, or a low-dimensional representation of the node's decision variables.
The low-dimensional representation of the information related to the node is obtained by performing dimensionality reduction on that information with a feature extraction model.
For example, the objective function, the constraints and the decision variables of the node are input into the feature extraction model to obtain the low-dimensional representation of the information related to the node, and this low-dimensional representation is then input into the node evaluation model, which can predict a quantity related to the limit value of the node after multi-step expansion.
The data processing in the feature extraction model and the node evaluation model is described below by way of example. Both models may be obtained through training. For the training process, refer to method 300 described above or method 800 described below.
1. Feature extraction model
The feature extraction model may be deployed in the apparatus for selecting nodes, in the solver, or in another apparatus. This is not limited in the embodiments of this application.
The feature extraction model is used to output low-dimensional representations of the information related to the multiple nodes. Reducing the dimensionality of the node information facilitates inference in the downstream module, that is, inference in the node evaluation model.
For example, the information related to a node may be the node's high-dimensional mathematical programming model information. This information is given an embedding representation; that is, the objective function, the constraints and the decision variables of the node are reduced in dimension.
In other words, the feature extraction model is used to output the features of a node. For example, the features may be represented as a set of vectors.
The node information after dimensionality reduction may also be called the low-dimensional embedding representation of the node.
As an example, the input of the feature extraction model is the mathematical programming model information of the node, such as its objective function, constraints and decision variables, and the output is the features of the node. For example, the feature extraction model may be implemented by a graph convolutional neural network, which can be used to embed the high-dimensional mathematical programming model information.
A specific implementation of the dimensionality reduction is described below by way of example with reference to Figure 5.
Figure 5 shows an exemplary flowchart of a dimensionality reduction process.
Step 1: Convert the mathematical programming model information (A, b, C) of the node into a bipartite graph representation, that is, fill in (A, b, C) according to the connection relationships.
Here, A denotes the coefficient matrix of the constraints, b denotes the vector of right-hand-side coefficients of the constraints, and C denotes the coefficient vector of the objective function.
For example, the mathematical programming model of the node in Figure 5 may satisfy the following formula:
$$
\begin{aligned}
\text{s.t.}\quad & A_{11}x_1 + A_{13}x_3 \le b_1; \\
& A_{12}x_1 + A_{22}x_2 \le b_2; \\
& x \in \mathbb{Z}
\end{aligned}
$$
Here, the objective function (objective) is defined on the decision variables with the parameters $d_1$, $d_2$ and $d_3$; the constraints (constraints) include $A_{11}x_1 + A_{13}x_3 \le b_1$ and $A_{12}x_1 + A_{22}x_2 \le b_2$; and the decision variables (variables) include $x_1$, $x_2$ and $x_3$. $d_1$, $d_2$, $d_3$, $A_{11}$, $A_{13}$, $b_1$, $A_{12}$, $A_{22}$ and $b_2$ are all parameters.
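As a purely illustrative sketch of step 1, the Python fragment below builds a bipartite structure from (A, b, C). The function name `build_bipartite_graph`, the choice of a single number as each vertex feature, and the numeric coefficient values are assumptions made for this example and are not specified in this application; only the sparsity pattern (the first constraint involves x1 and x3, the second involves x1 and x2) follows the example above.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (constraint index, variable index, coefficient)


def build_bipartite_graph(A: List[List[float]], b: List[float], c: List[float]):
    """Turn (A, b, c) into constraint features, variable features and edges."""
    # One vertex per constraint, carrying its right-hand side as a feature.
    constraint_features = [[rhs] for rhs in b]
    # One vertex per decision variable, carrying its objective coefficient.
    variable_features = [[coef] for coef in c]
    # One edge per non-zero entry A[i][j], weighted by that entry.
    edges: List[Edge] = [
        (i, j, A[i][j])
        for i in range(len(A))
        for j in range(len(c))
        if A[i][j] != 0
    ]
    return constraint_features, variable_features, edges


# Hypothetical numeric values; only the zero pattern mirrors the example.
A = [[2.0, 0.0, 1.0],
     [1.0, 3.0, 0.0]]
b = [4.0, 5.0]
c = [1.0, 1.0, 1.0]  # stands in for d1, d2, d3
cons_feats, var_feats, edges = build_bipartite_graph(A, b, c)
```

In this representation the edge list plays the role of the coefficient matrix A, so two constraint vertices and three variable vertices are connected by four edges.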
Step 2: Input the connection relationships of the above bipartite graph into the graph convolutional neural network to obtain embedding representations of the objective function, the constraints and the decision variables of the node.
As shown in Figure 5, V denotes the decision variables, C denotes the constraints, and E denotes the connections between V and C, that is, the coefficient matrix A of the constraints. V1 denotes the decision variables after one pass of graph convolutional neural network processing, and V2 denotes the decision variables after two passes. C1 denotes the constraints after one pass of graph convolutional neural network processing, and C2 denotes the constraints after two passes. π(x) denotes the output for the node, that is, the low-dimensional embedding representation in step 3.
It should be understood that the processing shown in Figure 5 is only an example and does not limit the solutions of the embodiments of this application.
Step 3: Output the low-dimensional embedding representation of the node.
The low-dimensional embedding representation of the node (for example, the node embedding in Figure 5) includes a low-dimensional representation of the node's objective function, low-dimensional representations of the node's constraints and low-dimensional representations of the node's decision variables.
Taking the mathematical programming model in step 1 as an example, the low-dimensional embedding representation of the node may include the low-dimensional representation of one objective function, the low-dimensional representations of two constraints and the low-dimensional representations of three decision variables.
In the embodiments of this application, embedding the information with a graph convolutional neural network as described above makes it possible to process mathematical programming models of different sizes, and the result is insensitive to the order in which the inputs are arranged.
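The two properties mentioned above (handling models of different sizes and insensitivity to input order) can be illustrated with a minimal sketch. The fragment below is not the graph convolutional neural network of this application: it has no trainable weights or non-linearities, and the functions `aggregate`, `one_round` and `node_embedding` as well as all numeric values are hypothetical. It only shows that mean aggregation over edges followed by mean pooling gives the same summary when the variables are listed in a different order.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (constraint index, variable index, coefficient)


def aggregate(num_targets: int, source_values: List[float],
              edges: List[Edge], to_constraints: bool) -> List[float]:
    """Mean of coefficient-weighted messages arriving at each target vertex."""
    sums = [0.0] * num_targets
    counts = [0] * num_targets
    for ci, vj, coef in edges:
        target, source = (ci, vj) if to_constraints else (vj, ci)
        sums[target] += coef * source_values[source]
        counts[target] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


def one_round(cons: List[float], vars_: List[float], edges: List[Edge]):
    """Update constraints from variables, then variables from the new constraints."""
    to_cons = aggregate(len(cons), vars_, edges, to_constraints=True)
    new_cons = [h + m for h, m in zip(cons, to_cons)]
    to_vars = aggregate(len(vars_), new_cons, edges, to_constraints=False)
    new_vars = [h + m for h, m in zip(vars_, to_vars)]
    return new_cons, new_vars


def node_embedding(cons: List[float], vars_: List[float]) -> Tuple[float, float]:
    """Pool vertex states into a fixed-size summary, whatever the problem size."""
    return (sum(cons) / len(cons), sum(vars_) / len(vars_))


edges = [(0, 0, 2.0), (0, 2, 1.0), (1, 0, 1.0), (1, 1, 3.0)]
embedding = node_embedding(*one_round([4.0, 5.0], [1.0, 2.0, 3.0], edges))

# The same problem with the first and third variables listed in swapped order.
edges_perm = [(0, 2, 2.0), (0, 0, 1.0), (1, 2, 1.0), (1, 1, 3.0)]
embedding_perm = node_embedding(*one_round([4.0, 5.0], [3.0, 2.0, 1.0], edges_perm))
```

Because every operation is a sum or mean over vertices or edges, neither the number of vertices nor their ordering affects the shape or value of the pooled result.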
2. Node evaluation model
The node evaluation model can be used to predict a quantity related to the limit value of a node after multi-step expansion.
For example, the low-dimensional representations of the information related to the multiple nodes, as output by the feature extraction model, are input into the node evaluation model to predict the quantities related to the limit values of the multiple nodes after multi-step expansion.
For example, the node evaluation model may be a neural network model.
For example, the node evaluation model is implemented by a fully connected neural network, as shown in Figure 6. The low-dimensional representation of the node information output by the feature extraction model is used as the input of the node evaluation model, and the fully connected neural network predicts the function value of the node's multi-step pseudo-cost function. In other words, the input of the fully connected neural network may be the low-dimensional representation of the information related to the node, and its output may be the function value of the node's multi-step pseudo-cost function.
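A minimal sketch of such a forward pass is given below, under the assumption of one hidden layer with a ReLU activation. The layer sizes, the weights and the function names `dense`, `relu` and `predict_pseudo_cost` are hypothetical; the structure of the network in Figure 6 is not reproduced here.

```python
from typing import List


def relu(values: List[float]) -> List[float]:
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def dense(inputs: List[float], weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + bias
            for row, bias in zip(weights, biases)]


def predict_pseudo_cost(embedding: List[float],
                        w1: List[List[float]], b1: List[float],
                        w2: List[float], b2: float) -> float:
    """Map a node embedding to one scalar, the predicted multi-step pseudo-cost."""
    hidden = relu(dense(embedding, w1, b1))
    return dense(hidden, [w2], [b2])[0]


# Hypothetical weights, chosen only so the arithmetic is easy to follow.
prediction = predict_pseudo_cost(
    [2.0, 1.0],
    w1=[[1.0, -1.0], [0.5, 0.5]], b1=[0.0, 0.0],
    w2=[1.0, 2.0], b2=0.5,
)
```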
Optionally, step 420 may include: determining a first target node according to the node evaluation model; generating child nodes of the first target node; and adding the child nodes of the first target node to the candidate node set.
Optionally, step 420 may include: determining a second target node according to the node evaluation model; and deleting the second target node from the candidate node set.
Determining the first target node according to the node evaluation model can be understood as determining the first target node, that is, the search node, according to the evaluation information of the multiple nodes.
Determining the second target node according to the node evaluation model can be understood as determining the second target node, that is, the pruning node, according to the evaluation information of the multiple nodes.
The following description takes as an example the case where the evaluation information of a node indicates a prediction of the function value of the node's multi-step pseudo-cost function. It should be understood that all function values of the multi-step pseudo-cost function in method 400 are predicted values output by the node evaluation model.
The function value of a node's multi-step pseudo-cost function can be used to measure the long-term value of the node, or equivalently its long-term cost.
For example, nodes may be scored based on the function values of their multi-step pseudo-cost functions. The larger the function value of a node's multi-step pseudo-cost function, the higher its score, and the lower the long-term value (or the higher the long-term cost) of the node. The smaller the function value, the lower the score, and the higher the long-term value (or the lower the long-term cost) of the node. It should be understood that this is only an example; the relationship between a node's score and its long-term value or long-term cost may also take other forms, which is not limited in the embodiments of this application.
Based on the function values of the multi-step pseudo-cost functions of the multiple nodes, the node with the lowest score is used as the search node.
Based on the function values of the multi-step pseudo-cost functions of the multiple nodes, the k nodes with the highest scores are used as candidate pruning nodes. A probability vector is constructed from the scores of these k nodes, and one of them is selected probabilistically as the pruning node.
The score of a node is positively correlated with its probability: the higher the score of a node, the greater the probability that it is pruned; the lower the score, the lower that probability.
Figure 7 shows one way of determining the pruning node.
As shown in Figure 7, a probability vector is constructed based on the scores of the multiple highest-scoring nodes. The scores of node 1, node 3 and node 4 are 5, 2 and 3, respectively. The probability of each node is determined based on its score: the higher the score of a node, the higher its probability. As shown in Figure 7, the probabilities of node 1, node 3 and node 4 are 0.3, 0.1 and 0.25, respectively. The node with index 4 is sampled as the pruning node.
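The selection rules above can be sketched as follows. How scores are mapped to probabilities is not specified in the text; the sketch assumes, purely for illustration, that each probability is the node's score divided by the sum of the k highest scores, which requires positive scores and does not reproduce the probability values shown in Figure 7. The function names are hypothetical.

```python
import random
from typing import Dict


def select_search_node(scores: Dict[int, float]) -> int:
    """The node with the lowest score is used as the search node."""
    return min(scores, key=lambda node: scores[node])


def pruning_distribution(scores: Dict[int, float], k: int) -> Dict[int, float]:
    """Keep the k highest-scoring nodes and normalise their scores (assumed rule)."""
    top_k = sorted(scores, key=lambda node: scores[node], reverse=True)[:k]
    total = sum(scores[node] for node in top_k)
    return {node: scores[node] / total for node in top_k}


def sample_pruning_node(scores: Dict[int, float], k: int,
                        rng: random.Random) -> int:
    """Draw one pruning node; higher scores are drawn more often."""
    probs = pruning_distribution(scores, k)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]


scores = {1: 5.0, 3: 2.0, 4: 3.0}  # node index -> score, as in the example
search_node = select_search_node(scores)
distribution = pruning_distribution(scores, k=3)
pruned = sample_pruning_node(scores, k=3, rng=random.Random(0))
```

Sampling rather than always pruning the highest-scoring node leaves some chance of keeping a node whose predicted cost is overestimated.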
It should be understood that the above ways of determining the first target node and the second target node are only examples; for other ways, refer to the description of method 200.
For example, the apparatus for selecting nodes may send indication information of the target node (the first target node and/or the second target node) to the solver.
For example, the indication information of the target node may include the search order of the multiple nodes.
For example, the node ranked first may be the search node in the next iteration.
For example, the indication information of the target node may include the evaluation information of the multiple nodes.
For example, the indication information of the target node may include the scores of the multiple nodes.
For example, the indication information of the target node may include the target node itself.
It should be understood that the above are only examples; the indication information of the target node may also take other forms, provided that the target node can be determined from it.
The solver can adjust the candidate node set according to the target node and solve the target planning problem according to the adjusted candidate node set to obtain a solution result.
When the target node includes a search node, the solver can expand the search node to obtain its child nodes, that is, new sub-problems of the target planning problem. The child nodes of the search node can be added to the candidate node set. For example, the adjusted candidate node set can be used as the candidate node set in the next iteration. During the iterative process, step 420 can be repeated until solving ends and the solution result is obtained.
When the target node includes a pruning node, the solver can prune that node and adjust the candidate node set. For example, the adjusted candidate node set can be used as the candidate node set in the next iteration. During the iterative process, step 420 can be repeated until solving ends and the solution result is obtained.
For example, solving may end when all sub-problems have been solved, that is, when the candidate node set contains no nodes. Alternatively, solving may end when the solving time exceeds a preset time, or when the difference between the global upper bound value and the global lower bound value is less than a set threshold. The termination condition can be set as needed, which is not limited in the embodiments of this application.
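The three example termination conditions can be combined in a single check, sketched below. The function name `solving_finished` and its parameters are hypothetical; the `now` argument exists only so that the check can be exercised without waiting for real time to pass.

```python
import time
from typing import Optional, Sequence


def solving_finished(candidates: Sequence[object],
                     start_time: float,
                     time_limit: float,
                     global_upper: float,
                     global_lower: float,
                     gap_threshold: float,
                     now: Optional[float] = None) -> bool:
    """Return True as soon as any of the example termination conditions holds."""
    current = time.monotonic() if now is None else now
    if not candidates:                                # all sub-problems solved
        return True
    if current - start_time > time_limit:             # preset time exceeded
        return True
    if global_upper - global_lower < gap_threshold:   # bounds close enough
        return True
    return False
```

A solver loop would call this check once per iteration, before selecting the next search node or pruning node.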
Optionally, method 400 may further include: returning the solution result to the user.
Further, method 400 may further include: returning the indication information of the target node to the user.
It should be understood that the solving process in method 400 is only an example. In method 400, the user may provide the target planning problem to the solver and receive the solution result of the target planning problem returned by the solver.
For example, in other possible implementations, the user may provide a candidate node set to the apparatus for selecting nodes and receive the indication information of the target node provided by that apparatus.
A model training method provided by the embodiments of this application is described below by way of example with reference to Figure 8.
It should be understood that the example in Figure 8 is intended only to help those skilled in the art understand the embodiments of this application, and is not intended to limit the embodiments to the specific values or scenarios illustrated. Figure 8 shows a model training method provided by the embodiments of this application; the training method shown in Figure 8 can be regarded as a specific implementation of method 300 shown in Figure 3.
For ease of understanding and description, Figure 8 illustrates the training process only for the case where the solver and the model training apparatus are deployed separately. In other implementations, the solver and the model training apparatus may be integrated in the same apparatus.
As shown in Figure 8, method 800 includes steps 810 to 830, which are described below.
Step 810: Obtain sample nodes.
For example, the sample nodes may come from a training database.
The solver can generate multiple nodes of a planning problem, and these nodes can be used as sample nodes. The sample nodes may be determined based on the solving process of one or more planning problems, which may be provided by the user or stored in advance.
As an example, the solver may receive batch data provided by the user (for example, multiple planning problems), solve based on that batch data, sample the sample nodes from the multiple nodes generated during solving, and store the information related to the sample nodes in the training database. Alternatively, the solver may receive batch data provided by the user (for example, multiple planning problems), solve based on both the batch data provided by the user and historical data (for example, multiple pre-stored planning problems), sample the sample nodes from the multiple nodes generated during solving, and store the information related to the sample nodes in the training database.
For example, the sample nodes may be provided by the user.
Step 820: Perform dimensionality reduction on the information related to the sample nodes based on the feature extraction model to obtain low-dimensional representations of that information.
For example, the feature extraction model is a graph convolutional neural network. The input of the graph convolutional neural network may include the information related to the sample nodes. The graph convolutional neural network reduces the dimensionality of this information and outputs the low-dimensional embedding representations of the sample nodes, that is, the low-dimensional representations of the information related to the sample nodes.
Step 830: Use the low-dimensional representations of the information related to the sample nodes as the input of the node evaluation model, and adjust the parameters of the node evaluation model with the goal of reducing the gap between the output of the node evaluation model and the labels corresponding to the sample nodes.
For example, the node evaluation model may be a fully connected neural network.
Specifically, the node evaluation model can be trained by deep Q-learning (DQL). In other words, the node evaluation model may be a deep Q-network.
The training process is described below taking as an example the case where the label corresponding to a sample node is the function value of the multi-step pseudo-cost function.
The definition of the multi-step pseudo-cost function satisfies the Bellman equation of dynamic programming. The expression of the state transition function in this equation is unknown, so the equation can be solved by deep Q-learning. In other words, the node evaluation model can be trained by deep Q-learning so that the trained model can be used to predict the function value of the multi-step pseudo-cost function.
DQL helps select the optimal action by estimating the long-term cumulative return (the Q function) of each action. In the embodiments of this application, the node evaluation network helps select nodes by predicting the multi-step pseudo-cost of each node. The multi-step pseudo-cost function can serve as the Q function.
During training, the predicted label corresponding to a sample node can be used as the label corresponding to that sample node.
The predicted label of a sample node satisfies a formula defined in terms of the following quantities. c(P) is the difference between the limit value of the parent node of node P and the limit value of node P, and can be obtained from the solver. The other quantity in the formula is given by the target evaluation model.
During training, the model parameters are adjusted with the aim of reducing the gap between the label of the sample node and the model output Cθ(P), where Cθ denotes the model to be trained.
For example, the training objective can be expressed as minimizing this gap over the sample nodes.
During training, the solver can be encapsulated as an environment, and the training apparatus collects data through continuous interaction with the solver. The training apparatus can obtain the limit values of nodes by calling the solver, and these limit values can be used as supervision information for fitting the multi-step pseudo-cost function. For example, the training apparatus can obtain from the solver the limit value of node P and the limit value of the parent node of node P, and can thereby obtain c(P). The training apparatus can determine the limit values of the child nodes of node P according to the target evaluation model, and thereby determine the second term in the predicted label described above.
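The way the two parts of the predicted label are obtained can be sketched as follows. The label formula itself is not reproduced in this text, so the sketch rests on explicit assumptions: the label is taken to be c(P) plus an aggregate of the target evaluation model's predictions over the child nodes of P, the aggregate is taken to be the minimum, c(P) is taken as the node's limit value minus its parent's, and the gap is measured by a squared error. None of these choices, nor the function names, is stated in this application.

```python
from typing import Callable, List, Sequence


def predicted_label(parent_bound: float,
                    node_bound: float,
                    child_features: Sequence[List[float]],
                    target_model: Callable[[List[float]], float],
                    aggregate: Callable[[List[float]], float] = min) -> float:
    """Bootstrapped label: solver-supplied c(P) plus a target-model term (assumed form)."""
    # First part: obtained from the solver's limit values.
    immediate_change = node_bound - parent_bound
    # Second part: target evaluation model applied to the children of P.
    child_predictions = [target_model(f) for f in child_features]
    future_term = aggregate(child_predictions) if child_predictions else 0.0
    return immediate_change + future_term


def squared_gap(label: float, prediction: float) -> float:
    """One possible measure of the gap between the label and the model output."""
    return (label - prediction) ** 2


def toy_target_model(features: List[float]) -> float:
    """Hypothetical target model used only to exercise the functions above."""
    return sum(features)


label = predicted_label(10.0, 12.0, [[1.0, 2.0], [0.5, 0.5]], toy_target_model)
leaf_label = predicted_label(10.0, 12.0, [], toy_target_model)
```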
The target evaluation model has the same structure as the node evaluation model. Their parameters may be the same or different. The target evaluation model is the target network in the deep Q-learning process. It is used to stabilize training and prevent overfitting.
For example, during the training of the node evaluation model, the target evaluation model can be updated at intervals based on the parameters of the current node evaluation model, that is, the target evaluation model is replaced with the current node evaluation model.
For example, during the training of the node evaluation model, the target evaluation model can be updated based on the parameters of the current node evaluation model after every N iterations.
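The periodic update can be sketched as below. Representing the parameters as a dictionary, the function name `train_with_target_sync` and the toy update step are assumptions made for the example; only the rule "copy the current parameters into the target model after every N iterations" comes from the text.

```python
import copy
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def train_with_target_sync(num_iterations: int,
                           sync_every: int,
                           online_params: Params,
                           update_step: Callable[[Params, Params], None]
                           ) -> Tuple[Params, List[int]]:
    """Run training iterations and refresh the target model every sync_every steps."""
    target_params = copy.deepcopy(online_params)
    sync_points: List[int] = []
    for iteration in range(1, num_iterations + 1):
        # The update reads the frozen target parameters to build labels
        # and modifies the online parameters in place.
        update_step(online_params, target_params)
        if iteration % sync_every == 0:
            target_params = copy.deepcopy(online_params)
            sync_points.append(iteration)
    return target_params, sync_points


def toy_update(online: Params, target: Params) -> None:
    """Hypothetical update: nudges the single online parameter by one."""
    online["w"] += 1.0


online = {"w": 0.0}
target, synced_at = train_with_target_sync(7, 3, online, toy_update)
```

Between two synchronisation points the target parameters stay fixed, so the labels do not shift with every parameter update.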
Figure 9 shows a schematic diagram of a training process based on reinforcement learning. The training process shown in Figure 9 may include the following steps:
1) The solver obtains a data set.
The data set includes one or more planning problems. At least some of the decision variables of the one or more planning problems are integer variables.
2) The solver generates a candidate sample node set based on the data set. The candidate sample node set includes multiple sample nodes.
In other words, the solver can generate a branch-and-bound search tree based on the data set, and the branch-and-bound search tree includes the multiple sample nodes. That is, the candidate sample node set can be represented in the form of a branch-and-bound search tree.
The candidate sample node set can be stored in the training database.
3) Perform dimensionality reduction on the information related to the multiple sample nodes to obtain the features of the multiple sample nodes, that is, the low-dimensional representations of the information related to the multiple sample nodes.
4) Input the features of the multiple sample nodes into the node evaluation model for processing, to predict the function values of the multi-step pseudo-costs of the multiple sample nodes.
The features of the multiple sample nodes can serve as the state in deep Q-learning.
5) Determine the target sample node based on the output of the node evaluation model.
6) Feed the target sample node back to the solver.
In other words, the target sample node is fed back to the solver as the action in deep Q-learning.
7) The solver can provide the limit value of the target sample node and the limit value of the parent node of the target sample node to the node evaluation model.
The limit value of the target sample node and the limit value of its parent node can serve as the reward in deep Q-learning and are used to adjust the parameters of the node evaluation model.
8) The solver can adjust the candidate sample node set based on the target sample node.
Steps 3) to 8) are repeated until training is complete.
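The interaction in steps 3) to 8) can be sketched as a loop between a training routine and a solver wrapped as an environment. `ToySolver` is a hypothetical stand-in with a fixed three-node tree, not a real solver interface, and the evaluation and learning callbacks are placeholders; the sketch assumes that the node with the lowest predicted value is chosen as the target sample node and that it is expanded.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToySolver:
    """Hypothetical environment: a fixed search tree with known limit values."""

    def __init__(self) -> None:
        # node name -> (parent name, limit value, child names)
        self.tree: Dict[str, Tuple[Optional[str], float, List[str]]] = {
            "P": (None, 10.0, ["N1", "N2"]),
            "N1": ("P", 11.0, []),
            "N2": ("P", 13.0, []),
        }
        self.candidates: List[str] = ["P"]

    def features(self, node: str) -> List[float]:
        """Step 3: a stand-in for the low-dimensional node features."""
        return [self.tree[node][1]]

    def expand(self, node: str) -> Tuple[float, float]:
        """Steps 6-8: take the action, report limit values, update the set."""
        parent, bound, children = self.tree[node]
        parent_bound = self.tree[parent][1] if parent is not None else bound
        self.candidates.remove(node)
        self.candidates.extend(children)
        return parent_bound, bound


def run_episode(solver: ToySolver,
                evaluate: Callable[[List[float]], float],
                learn: Callable[[List[float], float], None]) -> int:
    """Repeat steps 3) to 8) until no candidate sample node is left."""
    steps = 0
    while solver.candidates:
        scores = {n: evaluate(solver.features(n)) for n in solver.candidates}  # 3-4
        target = min(scores, key=lambda n: scores[n])                          # 5
        parent_bound, bound = solver.expand(target)                            # 6-8
        learn(solver.features(target), bound - parent_bound)                   # 7
        steps += 1
    return steps


log: List[Tuple[float, float]] = []
num_steps = run_episode(ToySolver(),
                        evaluate=lambda f: f[0],
                        learn=lambda f, signal: log.append((f[0], signal)))
```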
Figure 10 shows a schematic diagram of the recursive expansion of node P in a branch-and-bound search tree. As shown in Figure 10, node P can be expanded into two child nodes N1 and N2. In Figure 10, child nodes connected by dotted lines (for example, N4) can be understood as nodes that have been pruned or nodes that have not yet been expanded. Child nodes connected by solid lines (for example, N1, N2, N4 and N5) can be understood as nodes that are actually expanded.
For example, while the parameters of the node evaluation model are being adjusted, the parameters of the feature extraction model can be adjusted at the same time; that is, the parameters of both the feature extraction model and the node evaluation model are adjusted with the goal of reducing the gap between the output of the node evaluation model and the labels corresponding to the sample nodes.
Alternatively, the feature extraction model may be trained in other ways, which is not limited in the embodiments of this application.
Optionally, method 800 may further include: returning the node evaluation model to the user.
Further, method 800 may further include: returning the feature extraction model to the user.
In method 800, a feature extraction model based on a graph convolutional network (GCN) is used to extract features from the nodes in the branch-and-bound search tree, and a node evaluation model based on a fully connected neural network is used to predict the multi-step pseudo-cost of the nodes. The feature extraction model and the fully connected neural network are trained by reinforcement learning. A model trained with method 800 can be attached to the solver and used to select search nodes and pruning nodes during solving, which helps improve solving efficiency.
It should be understood that the training method in method 800 is only an example. For other training methods, refer to the description of method 300; details are not repeated here.
Table 1 compares the test indicators obtained with different solving methods. Specifically, Table 1 shows the test indicators of the rule-based best estimate search method and of the solving method that uses the solution of the embodiments of this application.
The solver is SCIP (Solving Constraint Integer Programs), a mixed-integer programming solver, and the experiments are conducted on multiple open-source data sets.
Table 1 shows four groups of experiments, described as follows:
(1) The data set is the open-source knapsack problem data set (MIK), a medium-scale data set of medium solving difficulty. The programming problem is a minimization problem.
(2) The data set is the combinatorial auctions problem data set (cauctions), a medium-scale data set of medium solving difficulty. The programming problem is a maximization problem.
(3) The data set is an artificially generated set cover data set, a small-scale data set of easy solving difficulty. The programming problem is a minimization problem.
(4) The data set is the facility location problem data set (facilities), a small-scale data set of easy solving difficulty. The programming problem is a maximization problem.
The test indicators include: the solving time of the problem, the number of nodes in the search tree generated during solving, and the primal-dual integral, that is, the time integral of the curves traced by the primal bound and the dual bound during solving.
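As a hedged illustration of the third indicator, the sketch below integrates the primal-dual gap over time, treating the gap as a step function that changes whenever a bound is updated. The event format, the relative-gap definition (capped at 1), and the function names are assumptions made for illustration; they are not taken from this application or from the SCIP API.

```python
def relative_gap(primal, dual):
    """Relative gap in [0, 1]; 1 when a bound is missing or the signs differ."""
    if primal is None or dual is None:
        return 1.0
    if primal == dual:
        return 0.0
    if primal * dual < 0:
        return 1.0
    return abs(primal - dual) / max(abs(primal), abs(dual))


def primal_dual_integral(events, t_end):
    """Integrate the gap over [0, t_end].

    events: list of (time, primal_bound, dual_bound), sorted by time.
    Each pair of bounds holds until the next event (step function).
    """
    total = 0.0
    prev_t, prev_gap = 0.0, 1.0  # before any bound is known, the gap is 1
    for t, primal, dual in events:
        total += prev_gap * (t - prev_t)
        prev_t, prev_gap = t, relative_gap(primal, dual)
    total += prev_gap * (t_end - prev_t)
    return total
```

A smaller integral means the gap between the two bounds closed earlier, so the indicator rewards finding good solutions and good bounds quickly, not only finishing quickly.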
As shown in Table 1, the solution of the embodiments of this application improves on all of the above test indicators. In particular, the solving time is improved by 27.8% on average.
Table 1
In one possible implementation of the embodiments of this application, the user can train each of the above models on a local device.
In another possible implementation of the embodiments of this application, the user can train each of the above models on an AI basic development platform.
It should be understood that the AI basic development platform is a platform-as-a-service (PaaS) cloud service in a cloud platform. It is a software platform that, based on the large amount of basic resources and software capabilities owned by a public cloud service provider, helps users (also called tenants, AI developers, and so on) build, train, and deploy AI models and develop and deploy AI applications. As shown in Figure 11, the interaction between a user and the AI basic development platform mainly takes the following form: the user logs in to the cloud platform through a client web page, then selects and purchases the cloud service of the AI basic development platform in the cloud platform. After the purchase, the user can carry out full-process AI development based on the functions provided by the AI basic development platform. When the user develops and trains an AI model on the AI basic development platform, the work is performed on the basic resources in the data center of the cloud service provider, mainly computing resources such as central processing units (CPU), graphics processing units (GPU), and embedded neural-network processing units (NPU).
The AI basic development platform can be deployed independently on one server or virtual machine in a data center of the cloud environment. It can also be deployed in a distributed manner on multiple servers, or on multiple virtual machines, in the data center.
In another embodiment, the AI basic development platform provided by this application can also be deployed in a distributed manner across different environments. The platform can be logically divided into multiple parts, each with a different function. For example, one part of the AI basic development platform can be deployed on computing devices in an edge environment (also called edge computing devices), and another part can be deployed on devices in the cloud environment. The edge environment is an environment that is geographically close to the user's terminal computing device. It includes edge computing devices such as edge servers and edge stations with computing capability. The parts of the AI basic development platform deployed in different environments or on different devices cooperate to provide users with functions such as training AI models.
The following takes the training of the node evaluation model as an example to describe the AI model training service provided by the AI basic development platform.
The AI basic development platform can train an initial model to obtain a node evaluation model that meets the user's goal.
The initial model may be an initial model built into the AI basic development platform. Alternatively, it may be an initial model provided by the user or selected by the user on the AI basic development platform. Alternatively, it may be a suitable model found by the AI basic development platform with a neural architecture search algorithm running in the background.
The training data may include data built into the AI basic development platform. Alternatively, the training data may include data provided by the user or data obtained by processing the data provided by the user. For example, the data provided by the user may be a data set that includes one or more mixed-integer programming problems. The AI basic development platform can process the data set with a built-in solver to obtain a candidate sample node set, and can save multiple sample nodes from the candidate sample node set to a training database. As another example, the data provided by the user may be a candidate sample node set. The AI basic development platform can save multiple sample nodes from the candidate sample node set to the training database. As yet another example, the data provided by the user may include a candidate sample node set and the labels corresponding to the sample nodes. The AI basic development platform can save multiple sample nodes from the candidate sample node set, together with their corresponding labels, to the training database. The data provided by the user may also be of other types. For details, refer to method 300 or method 800 described above; they are not repeated here.
The AI basic development platform can also deploy the trained AI model (for example, the feature extraction model or the node evaluation model) on nodes in the cloud environment or nodes in the edge environment. Nodes in the cloud environment can be virtual machine instances, container instances, physical servers, and so on, and nodes in the edge environment can be various edge devices. As shown in Figure 12, in one example, when the model is large, it can be deployed across multiple nodes in a distributed manner based on model parallelism. In another example, the model can be deployed independently on each of multiple nodes to support high access traffic to an online service. In yet another example, the AI basic development platform can deploy an AI application to edge devices registered with the cloud platform according to the application requirements of the AI model.
The deployed AI model can become an AI application or a part of an AI application. As shown in Figure 13, the user can access the AI application online through a web page or through a client app. When the AI application is used, the AI model deployed in the edge environment or the cloud environment is invoked online to provide a response. In this way, an AI model developed and trained on the AI basic development platform can perform inference on online request data and return the inference result.
It should be noted that the nodes in Figure 12 and Figure 13 may include nodes in the cloud environment or nodes in the edge environment, whereas the nodes in the methods of the embodiments of this application are sub-problems of the target programming problem.
For example, the feature extraction model and the node evaluation model can serve as an AI application, such as a node selection application. The user can access the node selection application online through a web page or a client app. When the node selection application is used, the feature extraction model and the node evaluation model deployed in the edge environment or the cloud environment are invoked online to provide a response, and an inference result, for example indication information of the target node, is returned.
Alternatively, the feature extraction model and the node evaluation model can be part of an AI application, for example an application for solving programming problems. The user can access the solving application online through a web page or a client app. In this case, the user can upload the programming problem to be solved, that is, the target programming problem. When the solving application is used, the solver can call the node selection service to determine the target node. The node selection service uses the feature extraction model and the node evaluation model to perform inference on the data submitted by the solver and returns the inference result, for example indication information of the target node, to the solver. The solver solves the programming problem based on the target node and returns the solution to the user.
While the AI model is providing the online inference service, the AI basic development platform can continuously collect the input and output data of the inference process, use that data to keep enriching the training data set, and continue to optimize and train the AI model based on the inference-stage data and the corresponding manually confirmed results.
It should be understood that, in other cases, an AI model developed and trained on the AI basic development platform need not be deployed online. Instead, the user can download the trained AI model and deploy it locally as needed. For example, the user can choose to save the trained AI model (for example, the feature extraction model and the node evaluation model) to OBS and then download the AI model from OBS to a local device.
The apparatuses of the embodiments of this application are described below with reference to Figures 14 to 19. It should be understood that the apparatuses described below can perform the methods of the foregoing embodiments of this application. To avoid unnecessary repetition, repeated descriptions are omitted where appropriate.
Figure 14 is a schematic block diagram of an apparatus 1400 for selecting a node provided by an embodiment of this application. The apparatus 1400 can be applied to a cloud management platform and can be implemented by software, hardware, or a combination of both. The apparatus 1400 provided by the embodiment of this application can implement the method flow shown in Figure 2 of the embodiments of this application.
The apparatus 1400 includes an acquisition module 1410 and a prediction module 1420. The acquisition module 1410 is configured to acquire a candidate node set of a target programming problem. The candidate node set includes multiple nodes, and each of the multiple nodes corresponds to one sub-problem to be solved of the target programming problem. The prediction module 1420 is configured to use a node evaluation model to predict a quantity related to the bound value of each node after multi-step expansion. The output of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the target programming problem.
Optionally, the apparatus 1400 further includes a determining module 1430 (not shown in the figure). The determining module 1430 can be configured to determine the node evaluation model according to a user instruction, and the node evaluation model can be deployed on the cloud management platform.
Optionally, the output of the node evaluation model is used to determine a first target node, and the adjusted candidate node set includes the child nodes of the first target node.
Optionally, the output of the node evaluation model is used to determine a second target node, and the adjusted candidate node set does not include the second target node.
Optionally, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of the node after multi-step expansion and the bound value of the node's parent node.
Optionally, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of the node after it is completely solved and the bound value of the node's parent node.
Optionally, the difference between the bound value of the first target node after multi-step expansion and the bound value of its parent node is less than or equal to the difference, for each other node among the multiple nodes, between that node's bound value after multi-step expansion and the bound value of that node's parent node.
Optionally, the second target node belongs to k nodes among the multiple nodes. For each of the k nodes, the difference between its bound value after multi-step expansion and the bound value of its parent node is greater than or equal to the corresponding difference for each node outside the k nodes, where k is an integer greater than 1 and less than the number of the multiple nodes.
Optionally, the second target node is determined based on probabilities corresponding to the k nodes, and the probability corresponding to each of the k nodes is positively correlated with the difference between that node's bound value after multi-step expansion and the bound value of its parent node.
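The two selection rules above can be sketched as follows. The sketch assumes that `scores[i]` is the node evaluation model's predicted difference between the bound value of node i after multi-step expansion and the bound value of its parent node. The first target node is the node with the smallest score. The second target node is sampled from the k highest-scoring nodes, with a probability that grows with the score. The softmax weighting and all function names are illustrative assumptions; the text only requires a positive correlation.

```python
import math
import random


def select_first_target(scores):
    """Index of the node with the smallest predicted difference (to expand)."""
    return min(range(len(scores)), key=lambda i: scores[i])


def select_second_target(scores, k, rng=random):
    """Sample a node to prune from the k nodes with the largest predicted difference."""
    if not 1 < k < len(scores):
        raise ValueError("k must satisfy 1 < k < number of nodes")
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax weights (assumed): larger difference -> larger pruning probability.
    m = max(scores[i] for i in top_k)
    weights = [math.exp(scores[i] - m) for i in top_k]
    return rng.choices(top_k, weights=weights, k=1)[0]
```

Sampling among the top k, instead of always pruning the single worst node, leaves some room for prediction error in the model.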
Optionally, the bound value of each node after multi-step expansion includes the value of the objective function at the relaxed solution of the node after multi-step expansion.
Optionally, the node evaluation model is trained on sample nodes and the labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the bound value of the sample node after multi-step expansion.
Optionally, the label corresponding to a sample node indicates the difference between the bound value of the sample node after multi-step expansion and the bound value of the sample node's parent node.
Optionally, the label corresponding to a sample node is determined based on a first difference and a second difference. The first difference is the difference between the bound value of the sample node's parent node and the bound value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
Optionally, the input of the node evaluation model includes the relevant information of each node or a low-dimensional representation of that information. The relevant information of each node includes at least one of the following: the objective function of the node, the constraints of the node, or the decision variables of the node. The low-dimensional representation of the relevant information of each node is obtained by using a feature extraction model to reduce the dimensionality of the relevant information of the node.
Figure 15 is a schematic block diagram of an apparatus 1500 for training a node evaluation model provided by an embodiment of this application. The apparatus 1500 can be applied to a cloud management platform and can be implemented by software, hardware, or a combination of both. The apparatus 1500 provided by the embodiment of this application can implement the method flow shown in Figure 3 or Figure 8 of the embodiments of this application.
The apparatus 1500 includes a first acquisition module 1510, a second acquisition module 1520, and a training module 1530. The first acquisition module 1510 is configured to acquire sample nodes. The second acquisition module 1520 is configured to acquire the labels corresponding to the sample nodes, where the label corresponding to a sample node is related to the bound value of the sample node after multi-step expansion. The training module 1530 is configured to train an initial model based on the sample nodes and their corresponding labels to obtain the node evaluation model.
The first acquisition module 1510 and the second acquisition module 1520 may be the same acquisition module or different acquisition modules.
Optionally, the bound value of a sample node after multi-step expansion includes the value of the objective function at the relaxed solution of the sample node after multi-step expansion.
Optionally, the label corresponding to a sample node indicates the difference between the bound value of the sample node after multi-step expansion and the bound value of the sample node's parent node.
Optionally, the training module 1530 is specifically configured to train the initial model by reinforcement learning to obtain the node evaluation model. The label corresponding to a sample node is determined based on a first difference and a second difference. The first difference is the difference between the bound value of the sample node's parent node and the bound value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
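One way to read the label construction above is as a bootstrapped target of the kind used in value-based reinforcement learning: the first difference is the observed one-step change in the bound value, and the second difference is an estimate of the remaining change supplied by a target evaluation model. The sketch below is a minimal illustration under that reading. The text does not say how the two differences are combined or how the predictions for several child nodes are aggregated, so the sum, the `min` aggregation, the `discount` parameter, and every name here are assumptions.

```python
def bootstrap_label(parent_bound, node_bound, child_features, target_model,
                    discount=1.0):
    """Label = one-step bound change + bootstrapped estimate from the children.

    first difference : |node_bound - parent_bound|
    second difference: target_model evaluated on each child; aggregated
                       with min (an assumed choice), 0.0 for a leaf.
    """
    first_diff = abs(node_bound - parent_bound)
    if child_features:
        second_diff = min(target_model(f) for f in child_features)
    else:
        second_diff = 0.0
    return first_diff + discount * second_diff
```

In such a setup the target evaluation model would typically be a periodically refreshed copy of the node evaluation model, which is consistent with the requirement that the two models share the same structure.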
Optionally, the input of the initial model includes the relevant information of the sample node or a low-dimensional representation of that information. The relevant information of the sample node includes at least one of the following: the objective function of the sample node, the constraints of the sample node, or the decision variables of the sample node. The low-dimensional representation of the relevant information of the sample node is obtained by using a feature extraction model to reduce the dimensionality of the relevant information of the sample node.
Figure 16 is a schematic block diagram of an apparatus 1600 for solving a target programming problem provided by an embodiment of this application. The apparatus 1600 can be applied to a cloud management platform and can be implemented by software, hardware, or a combination of both. The apparatus 1600 provided by the embodiment of this application can implement the method flow shown in Figure 4 of the embodiments of this application.
The apparatus 1600 includes an acquisition module 1610, an adjustment module 1620, and a solving module 1630. The acquisition module 1610 is configured to acquire the target programming problem uploaded by the user. The adjustment module 1620 is configured to adjust the candidate node set of the target programming problem according to a node evaluation model. The candidate node set includes multiple nodes, each of the multiple nodes corresponds to one sub-problem to be solved of the target programming problem, and the node evaluation model is used to predict a quantity related to the bound value of each node after multi-step expansion. The solving module 1630 is configured to solve the target programming problem based on the adjusted candidate node set to obtain the solution of the target programming problem.
Optionally, the apparatus 1600 further includes a determining module 1640 (not shown in the figure), which is configured to determine the node evaluation model according to a user instruction. The node evaluation model can be deployed on the cloud management platform.
Optionally, the adjustment module 1620 is specifically configured to: determine a first target node according to the node evaluation model; generate the child nodes of the first target node; and add the child nodes of the first target node to the candidate node set.
Optionally, the adjustment module 1620 is specifically configured to: determine a second target node according to the node evaluation model; and delete the second target node from the candidate node set.
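The two adjustments performed by the adjustment module can be sketched as a single step over the candidate node set. In this sketch, `branch` is a hypothetical callable that generates the child nodes of a node, and `scores` are the node evaluation model's predictions, one per candidate. Removing the expanded node from the set after its children are added is an assumption that follows common branch-and-bound practice; the text only states that the children are added.

```python
def adjust_candidates(candidates, scores, branch, prune=None):
    """Return a new candidate list after one adjustment step.

    candidates: list of open nodes (any objects)
    scores:     predicted values from the node evaluation model, one per node
    branch:     hypothetical callable mapping a node to its child nodes
    prune:      optional index of the second target node to delete
    """
    # First target node: the candidate with the smallest predicted value.
    expand = min(range(len(candidates)), key=lambda i: scores[i])
    children = branch(candidates[expand])
    # Assumption: the expanded node leaves the set once its children are added.
    dropped = {expand} if prune is None else {expand, prune}
    kept = [n for i, n in enumerate(candidates) if i not in dropped]
    return kept + list(children)
```

Repeating this step until the candidate set is empty, or until the bounds meet, gives the outer loop that the solving module would run.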
Optionally, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of the node after multi-step expansion and the bound value of the node's parent node.
Optionally, the quantity related to the bound value of each node after multi-step expansion includes the difference between the bound value of the node after it is completely solved and the bound value of the node's parent node.
Optionally, the difference between the bound value of the first target node after multi-step expansion and the bound value of its parent node is less than or equal to the difference, for each other node among the multiple nodes, between that node's bound value after multi-step expansion and the bound value of that node's parent node.
Optionally, the second target node belongs to k nodes among the multiple nodes. For each of the k nodes, the difference between its bound value after multi-step expansion and the bound value of its parent node is greater than or equal to the corresponding difference for each node outside the k nodes, where k is an integer greater than 1 and less than the number of the multiple nodes.
Optionally, the second target node is determined based on probabilities corresponding to the k nodes, and the probability corresponding to each of the k nodes is positively correlated with the difference between that node's bound value after multi-step expansion and the bound value of its parent node.
Optionally, the bound value of each node after multi-step expansion includes the value of the objective function at the relaxed solution of the node after multi-step expansion.
Optionally, the node evaluation model is trained on sample nodes and the labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the bound value of the sample node after multi-step expansion.
Optionally, the label corresponding to a sample node indicates the difference between the bound value of the sample node after multi-step expansion and the bound value of the sample node's parent node.
Optionally, the label corresponding to a sample node is determined based on a first difference and a second difference. The first difference is the difference between the bound value of the sample node's parent node and the bound value of the sample node. The second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
Optionally, the input of the node evaluation model includes the relevant information of each node or a low-dimensional representation of that information. The relevant information of each node includes at least one of the following: the objective function of the node, the constraints of the node, or the decision variables of the node. The low-dimensional representation of the relevant information of each node is obtained by using a feature extraction model to reduce the dimensionality of the relevant information of the node.
Optionally, the apparatus 1600 further includes a return module (not shown in the figure), which is configured to return the solution of the target programming problem to the user.
The apparatuses shown in Figure 14, Figure 15, and Figure 16 can be embodied in the form of functional modules. The term "module" here can be implemented in software and/or hardware, which is not specifically limited.
For example, a "module" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The following takes the adjustment module in Figure 16 as an example to describe how a module can be implemented. The other modules can be implemented in a similar way, with reference to the implementation of the adjustment module.
As an example of a software functional unit, the adjustment module may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, or a container, and there may be one or more computing instances. For example, the adjustment module may include code running on multiple hosts/virtual machines/containers. The multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Furthermore, they may be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. A region usually includes multiple AZs.
Likewise, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. A VPC is usually set up within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway needs to be set up in each VPC, and the VPCs are interconnected through the communication gateways.
As an example of a hardware functional unit, the adjustment module may include at least one computing device, such as a server. Alternatively, the adjustment module may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be implemented as a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the adjustment module may be distributed in the same region or in different regions, and in the same AZ or in different AZs. Likewise, they may be distributed in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
Therefore, the modules of the examples described in the embodiments of this application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of this application.
It should be noted that, when the apparatus provided in the foregoing embodiments performs the foregoing method, the division into the functional modules described above is merely an example. In practice, the functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. For example, in the apparatus 1600, each of the acquisition module, the adjustment module, and the solving module may be used to perform any step of the foregoing method. The steps that each of these modules is responsible for may be specified as required, and the three modules together implement all functions of the apparatus by each implementing different steps of the method. The division of the functional modules of the apparatus 1400 and the apparatus 1500 is likewise merely an example and, to avoid repetition, is not described again here.
In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept. For the specific implementation process, see the method embodiments above; details are not repeated here.
A computing device provided in an embodiment of this application is described in detail below with reference to Figure 17.
Figure 17 is a schematic architectural diagram of a computing device 1000 provided in an embodiment of this application.
As shown in Figure 17, the computing device 1000 includes a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008. The processor 1004, the memory 1006, and the communication interface 1008 communicate with one another through the bus 1002. The computing device 1000 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 1000.
The bus 1002 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, the bus is represented by only one line in Figure 17, but this does not mean that there is only one bus or only one type of bus. The bus 1002 may include a path for transferring information between the components of the computing device 1000 (for example, the memory 1006, the processor 1004, and the communication interface 1008).
The processor 1004 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), or other processors.
The memory 1006 may include volatile memory, such as random access memory (RAM). The memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
The memory 1006 stores executable program code, and the processor 1004 executes the executable program code to implement the functions of the modules in Figure 14, Figure 15, or Figure 16, thereby implementing the methods of the embodiments of this application. That is, the memory 1006 stores instructions for performing the methods of the embodiments of this application.
The communication interface 1008 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 1000 and other devices or communication networks.
An embodiment of this application further provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example, a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
As shown in Figure 18, the computing device cluster includes at least one computing device 1000. The computing device cluster may be used to perform the methods of the embodiments of this application, for example, the methods shown in Figure 2, Figure 3, Figure 4, or Figure 8.
The following description mainly uses the case in which the computing device cluster performs the method for solving a goal programming problem as an example.
The memories 1006 of one or more computing devices 1000 in the computing device cluster may store the same instructions for performing the methods of the embodiments of this application.
For example, the memories 1006 of one or more computing devices 1000 in the computing device cluster may store the same instructions for performing the method for solving a goal programming problem.
In some possible implementations, the memories 1006 of one or more computing devices 1000 in the computing device cluster may each store some of the instructions for performing the methods of the embodiments of this application. In other words, a combination of one or more computing devices 1000 may jointly execute the instructions for performing the methods of the embodiments of this application.
For example, the memories 1006 of one or more computing devices 1000 in the computing device cluster may each store some of the instructions for performing the method for solving a goal programming problem. In other words, a combination of one or more computing devices 1000 may jointly execute the instructions for performing the method for solving a goal programming problem.
It should be noted that the memories 1006 of different computing devices 1000 in the computing device cluster may store different instructions, each used to perform some of the functions of the apparatus 1600 for solving a goal programming problem. That is, the instructions stored in the memories 1006 of different computing devices 1000 may implement the functions of one or more of the acquisition module, the adjustment module, and the solving module.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. Figure 19 shows one possible implementation. As shown in Figure 19, two computing devices 1000A and 1000B are connected through a network. Specifically, each computing device connects to the network through its communication interface. In this implementation, the memory 1006 of the computing device 1000A stores instructions for performing the functions of the acquisition module and the solving module, and the memory 1006 of the computing device 1000B stores instructions for performing the functions of the adjustment module.
It should be understood that the functions of the computing device 1000A shown in Figure 19 may also be performed by multiple computing devices 1000. Likewise, the functions of the computing device 1000B may also be performed by multiple computing devices 1000.
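The split of modules across computing devices in Figure 19 can be pictured as a simple placement table. The Python sketch below is illustrative only: the dictionary layout, the module names as strings, and the helpers `device_for` and `covers_all` are hypothetical and are not part of the described apparatus.

```python
from typing import Dict, Set

# Placement from the Figure 19 example: device 1000A holds the acquisition and
# solving modules, device 1000B holds the adjustment module.
PLACEMENT: Dict[str, Set[str]] = {
    "1000A": {"acquisition", "solving"},
    "1000B": {"adjustment"},
}


def device_for(module: str, placement: Dict[str, Set[str]]) -> str:
    """Return one device whose memory stores the instructions for `module`."""
    hosts = sorted(dev for dev, modules in placement.items() if module in modules)
    if not hosts:
        raise KeyError(f"no computing device stores instructions for {module!r}")
    return hosts[0]


def covers_all(placement: Dict[str, Set[str]], required: Set[str]) -> bool:
    """Check that the cluster as a whole implements every required module."""
    stored = set().union(*placement.values())
    return required <= stored


required_modules = {"acquisition", "adjustment", "solving"}
adjustment_host = device_for("adjustment", PLACEMENT)
cluster_is_complete = covers_all(PLACEMENT, required_modules)
```

Replacing a single device with several, as the paragraph above allows, would only add entries to the table; the coverage check stays the same.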
An embodiment of this application further provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and that can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to perform the methods in the embodiments of this application, for example, the method for solving a goal programming problem, the node selection method, or the method for training a node evaluation model.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can use for storage, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like. The computer-readable storage medium includes instructions, and the instructions indicate the methods in the embodiments of this application, for example, the method for solving a goal programming problem, the node selection method, or the method for training a node evaluation model.
It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of this application.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (40)
- A method for solving a goal programming problem, characterized in that the method comprises: obtaining a goal programming problem uploaded by a user; adjusting a candidate node set of the goal programming problem according to a node evaluation model, wherein the candidate node set includes a plurality of nodes, each of the plurality of nodes corresponds to one sub-problem to be solved of the goal programming problem, and the node evaluation model is used to predict a quantity related to the limit value of each node after multi-step expansion; and solving the goal programming problem based on the adjusted candidate node set to obtain a solution result of the goal programming problem.
- The method according to claim 1, characterized in that the method further comprises: determining the node evaluation model according to an instruction of the user, wherein the node evaluation model is deployed on a cloud management platform.
- The method according to claim 1 or 2, characterized in that adjusting the candidate node set of the goal programming problem according to the node evaluation model comprises: determining a first target node according to the node evaluation model; generating child nodes of the first target node; and adding the child nodes of the first target node to the candidate node set.
- The method according to any one of claims 1 to 3, characterized in that adjusting the candidate node set of the goal programming problem according to the node evaluation model further comprises: determining a second target node according to the node evaluation model; and deleting the second target node from the candidate node set.
- The method according to any one of claims 1 to 4, characterized in that the quantity related to the limit value of each node after multi-step expansion includes the difference between the limit value of the node after multi-step expansion and the limit value of the parent node of the node.
- The method according to claim 5, characterized in that the quantity related to the limit value of each node after multi-step expansion includes the difference between the limit value of the node after the node is completely solved and the limit value of the parent node of the node.
- The method according to claim 5 or 6, characterized in that the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the differences between the limit values, after multi-step expansion, of the nodes other than the first target node among the plurality of nodes and the limit values of the parent nodes of those other nodes.
- The method according to any one of claims 5 to 7, characterized in that the second target node belongs to k nodes among the plurality of nodes, the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes are greater than or equal to the differences between the limit values, after multi-step expansion, of the nodes other than the k nodes among the plurality of nodes and the limit values of the parent nodes of those other nodes, k is an integer greater than 1, and k is less than the number of the plurality of nodes.
- The method according to claim 8, characterized in that the second target node is determined based on probabilities corresponding to the k nodes, and the probabilities corresponding to the k nodes are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes.
- The method according to any one of claims 1 to 9, characterized in that the limit value of each node after multi-step expansion includes the value of the objective function corresponding to the relaxed solution of the node after multi-step expansion.
- The method according to claim 10, characterized in that the node evaluation model is obtained by training based on sample nodes and labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion.
- The method according to claim 11, characterized in that the label corresponding to the sample node is used to indicate the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- The method according to claim 11, characterized in that the label corresponding to the sample node is determined based on a first difference and a second difference, the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node, the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
- The method according to any one of claims 1 to 13, characterized in that the input of the node evaluation model includes relevant information of each node or a low-dimensional representation of the relevant information of each node, the relevant information of each node includes at least one of the following: the objective function of the node, the constraints of the node, or the decision variables of the node, and the low-dimensional representation of the relevant information of each node is obtained by using a feature extraction model to reduce the dimensionality of the relevant information of the node.
- The method according to any one of claims 1 to 14, characterized in that the method further comprises: returning the solution result of the goal programming problem to the user.
- A node selection method, characterized in that the method comprises: obtaining a candidate node set of a goal programming problem, wherein the candidate node set includes a plurality of nodes, and each of the plurality of nodes corresponds to one sub-problem to be solved of the goal programming problem; and predicting, by using a node evaluation model, a quantity related to the limit value of each node after multi-step expansion, wherein the output of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem.
- The method according to claim 16, characterized in that the method further comprises: determining the node evaluation model according to an instruction of a user, wherein the node evaluation model is deployed on a cloud management platform.
- The method according to claim 16 or 17, characterized in that the output of the node evaluation model is used to determine a first target node, and the adjusted candidate node set includes child nodes of the first target node.
- The method according to any one of claims 16 to 18, characterized in that the output of the node evaluation model is used to determine a second target node, and the adjusted candidate node set does not include the second target node.
- The method according to any one of claims 16 to 19, characterized in that the quantity related to the limit value of each node after multi-step expansion includes the difference between the limit value of the node after multi-step expansion and the limit value of the parent node of the node.
- The method according to claim 20, characterized in that the quantity related to the limit value of each node after multi-step expansion includes the difference between the limit value of the node after the node is completely solved and the limit value of the parent node of the node.
- The method according to claim 20 or 21, characterized in that the difference between the limit value of the first target node after multi-step expansion and the limit value of the parent node of the first target node is less than or equal to the differences between the limit values, after multi-step expansion, of the nodes other than the first target node among the plurality of nodes and the limit values of the parent nodes of those other nodes.
- The method according to any one of claims 20 to 22, characterized in that the second target node belongs to k nodes among the plurality of nodes, the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes are greater than or equal to the differences between the limit values, after multi-step expansion, of the nodes other than the k nodes among the plurality of nodes and the limit values of the parent nodes of those other nodes, k is an integer greater than 1, and k is less than the number of the plurality of nodes.
- The method according to claim 23, characterized in that the second target node is determined based on probabilities corresponding to the k nodes, and the probabilities corresponding to the k nodes are positively correlated with the differences between the limit values of the k nodes after multi-step expansion and the limit values of the parent nodes of the k nodes.
- The method according to any one of claims 16 to 24, characterized in that the limit value of each node after multi-step expansion comprises the value of the objective function corresponding to the relaxed solution of each node after multi-step expansion.
- The method according to claim 25, characterized in that the node evaluation model is obtained by training on sample nodes and labels corresponding to the sample nodes, and the label corresponding to a sample node is related to the limit value of the sample node after multi-step expansion.
- The method according to claim 26, characterized in that the label corresponding to the sample node indicates the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- The method according to claim 26, characterized in that the label corresponding to the sample node is determined according to a first difference and a second difference, the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node, the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
- The method according to any one of claims 16 to 28, characterized in that the input of the node evaluation model comprises relevant information of each node or a low-dimensional representation of the relevant information of each node, the relevant information of the plurality of nodes comprises at least one of the following: the objective function of each node, the constraint conditions of each node, or the decision variables of each node, and the low-dimensional representation of the relevant information of each node is obtained by performing dimensionality reduction on the relevant information of each node through a feature extraction model.
- A training method for a node evaluation model, characterized in that the node evaluation model is used to predict a quantity related to the limit value, after multi-step expansion, of each node in a candidate node set of a goal programming problem, each node corresponds to a sub-problem to be solved of the goal programming problem, the output of the node evaluation model is used to determine a target node, the target node is used to adjust the candidate node set, and the adjusted candidate node set is used to solve the goal programming problem, the training method comprising: obtaining a sample node; obtaining a label corresponding to the sample node, the label being related to the limit value of the sample node after multi-step expansion; and training an initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model.
- The method according to claim 30, characterized in that the limit value of the sample node after multi-step expansion comprises the value of the objective function corresponding to the relaxed solution of the sample node after multi-step expansion.
- The method according to claim 30 or 31, characterized in that the label corresponding to the sample node indicates the difference between the limit value of the sample node after multi-step expansion and the limit value of the parent node of the sample node.
- The method according to claim 30 or 31, characterized in that training the initial model based on the sample node and the label corresponding to the sample node to obtain the node evaluation model comprises: training the initial model by reinforcement learning to obtain the node evaluation model, wherein the label corresponding to the sample node is determined according to a first difference and a second difference, the first difference is the difference between the limit value of the parent node of the sample node and the limit value of the sample node, the second difference is obtained by inputting the child nodes of the sample node into a target evaluation model for processing, and the target evaluation model has the same structure as the node evaluation model.
- The method according to any one of claims 30 to 33, wherein the input of the initial model comprises relevant information of the sample node or a low-dimensional representation of the relevant information of the sample node, the relevant information of the sample node comprises at least one of the following: the objective function of the sample node, the constraint conditions of the sample node, or the decision variables of the sample node, and the low-dimensional representation of the relevant information of the sample node is obtained by performing dimensionality reduction on the relevant information of the sample node through a feature extraction model.
- An apparatus for solving a goal programming problem, characterized by comprising units or modules for performing the method according to any one of claims 1 to 15.
- An apparatus for selecting a node, characterized by comprising units or modules for performing the method according to any one of claims 16 to 29.
- A training apparatus for a node evaluation model, characterized by comprising units or modules for performing the method according to any one of claims 30 to 34.
- A computing device cluster, characterized by comprising at least one computing device, each computing device comprising a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 29, or the method according to any one of claims 30 to 34.
- A computer-readable storage medium, characterized by comprising computer program instructions which, when executed by a computing device cluster, cause the computing device cluster to perform the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 29, or the method according to any one of claims 30 to 34.
- A computer program product containing instructions, characterized in that, when the instructions are run by a computing device cluster, the computing device cluster is caused to perform the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 29, or the method according to any one of claims 30 to 34.
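
For illustration only, the node selection recited in claims 22 to 24 might be sketched as follows. The sketch assumes the node evaluation model has already produced, for every candidate node, a predicted difference between its limit value after multi-step expansion and its parent's limit value. The function names, the dictionary layout, and the softmax weighting are assumptions of this sketch and are not taken from the claims; the claims only require that the probability be positively correlated with the difference.

```python
import math
import random
from typing import Dict, List, Optional


def pick_first_target(pred_diff: Dict[str, float]) -> str:
    """Return the candidate whose predicted difference is smallest (claim 22)."""
    return min(pred_diff, key=pred_diff.get)


def top_k_nodes(pred_diff: Dict[str, float], k: int) -> List[str]:
    """Return the k candidates with the largest predicted differences (claim 23)."""
    if not 1 < k < len(pred_diff):
        raise ValueError("k must be greater than 1 and less than the number of nodes")
    return sorted(pred_diff, key=pred_diff.get, reverse=True)[:k]


def pick_second_target(
    pred_diff: Dict[str, float], k: int, rng: Optional[random.Random] = None
) -> str:
    """Sample one of the top-k candidates (claim 24).

    Assumption: softmax weights are used so that a larger predicted
    difference yields a larger selection probability.
    """
    rng = rng or random.Random()
    candidates = top_k_nodes(pred_diff, k)
    largest = max(pred_diff[name] for name in candidates)
    # Subtract the largest value before exponentiating for numerical stability.
    weights = [math.exp(pred_diff[name] - largest) for name in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Sampling among the top k candidates, rather than always taking the single largest, keeps some randomness in the choice while still favouring nodes with larger predicted differences; any other monotonically increasing weighting would satisfy the same claim language.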
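
The two kinds of label recited in claims 27, 28, 32 and 33 can likewise be sketched under stated assumptions. The claims do not say how the first difference and the second difference are combined, nor which sign convention is used. The sketch below assumes a minimization problem, takes each difference as "node minus parent", and takes the second difference as the minimum of the target evaluation model's outputs over the child nodes, so that the two differences add up to the directly measured label. `Node`, `direct_label`, `bootstrapped_label` and `target_model` are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A sub-problem in the search tree; `bound` is its own limit value."""
    bound: float
    children: List["Node"] = field(default_factory=list)


def direct_label(multi_step_bound: float, parent_bound: float) -> float:
    """Label as in claims 27 and 32: limit value after multi-step
    expansion minus the parent's limit value."""
    return multi_step_bound - parent_bound


def bootstrapped_label(
    node: Node, parent_bound: float, target_model: Callable[[Node], float]
) -> float:
    """Label as in claims 28 and 33, built from two differences.

    Assumptions of this sketch (not stated in the claims):
    - first difference = node.bound - parent_bound;
    - second difference = min over children of target_model(child),
      where target_model(child) estimates the child's multi-step limit
      value minus node.bound;
    - a node without children contributes only the first difference.
    """
    first_diff = node.bound - parent_bound
    if not node.children:
        return first_diff
    second_diff = min(target_model(child) for child in node.children)
    return first_diff + second_diff
```

Under these assumptions the sum telescopes: (node bound minus parent bound) plus (multi-step bound minus node bound) equals the multi-step bound minus the parent bound, which is the quantity the direct label measures. In a reinforcement-learning setup the target evaluation model would be a separate network with the same structure as the node evaluation model; here it is any callable.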
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210773925 | 2022-07-01 | ||
CN202210773925.4 | 2022-07-01 | ||
CN202211430767.9 | 2022-11-15 | ||
CN202211430767 | 2022-11-15 | ||
CN202211490522.5A CN117371674A (en) | 2022-07-01 | 2022-11-25 | Solving method of target planning problem, method and device for selecting nodes |
CN202211490522.5 | 2022-11-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024001610A1 (en) | 2024-01-04 |
Family
ID=89383171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/095590 WO2024001610A1 (en) | 2022-07-01 | 2023-05-22 | Method for solving goal programming problem, node selection method, and apparatus |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024001610A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060112049A1 (en) * | 2004-09-29 | 2006-05-25 | Sanjay Mehrotra | Generalized branching methods for mixed integer programming |
CN111915060A (en) * | 2020-06-30 | 2020-11-10 | 华为技术有限公司 | Processing method and processing device for combined optimization task |
CN114595641A (en) * | 2022-05-09 | 2022-06-07 | 支付宝(杭州)信息技术有限公司 | Method and system for solving combined optimization problem |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23829795; Country of ref document: EP; Kind code of ref document: A1 |