WO2023207630A1 - Task solving method and device therefor - Google Patents

Task solving method and device therefor

Info

Publication number
WO2023207630A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub, node, nodes, solving, child
Prior art date
Application number
PCT/CN2023/088333
Other languages
English (en)
French (fr)
Inventor
陈琼
孙银磊
周杰
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023207630A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q 10/06311 Scheduling, planning or task assignment for a person or group
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations

Definitions

  • the present application relates to the field of scheduling, and in particular to a task solving method and its device.
  • Scheduling is one of the most common problems in large-scale manufacturing, logistics, production, and other links, and it has different meanings in different scenarios. For example, logistics scheduling mainly refers to the reasonable arrangement and dispatch of vehicles and personnel by logistics companies based on the weight, destination, specifications, urgency, etc. of the goods to be shipped during the logistics process.
  • Linear programming is the most widely used modeling method.
  • Mixed integer programming is a class of problems in the solver field and is widely used in cloud computing, finance, manufacturing, and other fields.
  • the LP problem is a problem of finding the minimum value of the objective function given a specific set of linear constraints.
  • the MIP problem adds to the LP the integer constraint that some or all variables must be integers.
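  • As standard background (not quoted from the application), the relationship between the two problem classes can be written compactly:

```latex
% LP: minimize a linear objective over linear constraints
\min_{x} \; c^{\top} x \quad \text{s.t.} \quad A x \le b, \; x \ge 0
% MIP: the same problem with integrality required for some or all variables
\min_{x} \; c^{\top} x \quad \text{s.t.} \quad A x \le b, \; x \ge 0, \; x_i \in \mathbb{Z} \ \text{for } i \in I
```

Dropping the integrality constraints yields the LP relaxation of the MIP, which is what branch and bound solves at each node.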
  • In the initial state, MIP solving has only one root node (corresponding to the initial mixed integer programming task), and new child nodes to be solved are continuously generated through branching during the solving process. It can be seen that different nodes in the MIP tree search can be processed independently and in parallel; the search has natural parallelism. That is, the search of this tree offers parallel opportunities at multiple granularities, such as subtree granularity, node granularity, and intra-node granularity.
  • This application discloses a task solving method.
  • the parallel mode is determined based on the information of the search tree.
  • the parallel mode can be dynamically selected. Compared with a fixed parallel mode, MIP solving efficiency can be improved and calculation time can be shortened.
  • this application provides a method for solving a task.
  • the method includes: obtaining a search tree.
  • the search tree is obtained by solving a mixed integer programming task through branch and bound.
  • the search tree includes a plurality of sub-nodes including a first sub-node and a second sub-node; each sub-node corresponds to a sub-task, and the first sub-node and the second sub-node are the sub-nodes to be solved among the plurality of sub-nodes;
  • according to the information of the search tree, a target parallel mode is determined from multiple parallel modes, and at least one of the first sub-node and the second sub-node is solved in parallel on the first device and the second device according to the target parallel mode;
  • wherein the information of the search tree includes at least one of first information and second information; the first information is related to the number of child nodes on the search tree, and the second information is related to the connection relationship between the child nodes on the search tree.
  • in addition to the first sub-node and the second sub-node, the plurality of sub-nodes may also include other sub-nodes to be solved.
  • in existing implementations, the entire solving process remains in the same parallel mode.
  • the so-called parallelism can be understood as the existence of multiple computing devices (or described as instances), and the multiple computing devices perform the solving process of sub-nodes synchronously.
  • different parallel modes have different solving efficiency and solving accuracy. For example, for search trees whose subtrees are generally shorter, the solving process performed at node or intra-node granularity is more efficient; for search trees whose subtrees are generally longer, the solving process performed at subtree granularity is more efficient. Moreover, as the solving process proceeds, the structure of the search tree keeps changing; for example, the subtrees in one part of the search tree may be generally longer, while the subtrees in another part may be generally shorter.
  • the parallel mode is determined based on the search tree information, and the parallel mode can be dynamically selected. Compared with the fixed parallel mode, the MIP solving efficiency can be improved and the calculation time can be shortened.
  • the multiple parallel modes include at least one of the following modes: solving, on the first device, multiple sub-nodes included in the first subtree where the first sub-node is located, and solving, on the second device, multiple sub-nodes included in the second subtree where the second sub-node is located, where the first subtree is obtained by solving the first sub-node through branch and bound, and the second subtree is obtained by solving the second sub-node through branch and bound; or, solving the first sub-node on the first device without solving sub-nodes other than the first sub-node.
  • the multiple parallel modes may also include other parallel solving modes, which are not limited here.
  • the first planning algorithm and the second planning algorithm are different linear programming algorithms.
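  • As a hedged illustration of this mode, two linear programming algorithms can race on the same node and the first finished result wins; `race`, `fast`, and `slow` below are hypothetical stand-ins, not APIs from the application:

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def race(algorithms, node):
    """Run every algorithm on the same node; return the first finished result.

    Sketch only: a real solver would also cancel or ignore the losers.
    """
    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        futures = [pool.submit(alg, node) for alg in algorithms]
        # Block until at least one algorithm has produced a result.
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

def fast(node):
    return node * 2          # stand-in for, e.g., a simplex solve

def slow(node):
    time.sleep(0.2)          # stand-in for, e.g., an interior-point solve
    return node * 2
```

For example, `race([fast, slow], 3)` returns 6 from whichever algorithm finishes first; here both compute the same value, as two correct LP algorithms would on the same node.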
  • the search tree information includes: the number of child nodes on the search tree, or the connection relationship between the child nodes on the search tree.
  • the first information includes: the number of sub-nodes at each depth in multiple depths on the search tree, or the number of sub-nodes to be solved on the search tree, where:
  • the mixed integer programming task corresponds to the root node on the search tree, and the depth represents the distance between child nodes and the root node.
  • solving the first sub-node or the second sub-node in parallel on the first device and the second device according to the target parallel mode includes: solving the first sub-node or the second sub-node in parallel on the first device and the second device to obtain an updated search tree; the updated search tree includes a third sub-node and a fourth sub-node, and the third sub-node and the fourth sub-node are new sub-nodes to be solved after solving the first sub-node or the second sub-node; the method further includes: according to the information of the updated search tree, determining a target parallel mode from multiple parallel modes, and solving at least one of the third sub-node and the fourth sub-node in parallel on the first device and the second device according to the target parallel mode.
  • the plurality of parallel modes include at least one of the following modes: solving, through branch and bound on the first device, multiple sub-nodes included in the third subtree where the third sub-node is located, and solving, on the second device, multiple sub-nodes included in the fourth subtree where the fourth sub-node is located; the third subtree is obtained by solving the third sub-node through branch and bound, and the fourth subtree is obtained by solving the fourth sub-node through branch and bound.
  • solving multiple sub-nodes included in the first subtree where the first sub-node is located includes: solving a preset number of sub-nodes included in the first subtree where the first sub-node is located; or, within a preset time, solving multiple sub-nodes included in the first subtree where the first sub-node is located.
  • solving multiple sub-nodes included in the second subtree where the second sub-node is located includes: solving a preset number of sub-nodes included in the second subtree where the second sub-node is located; or, within a preset time, solving multiple sub-nodes included in the second subtree where the second sub-node is located.
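  • The two budgets described above (a preset node count or a preset time) can be sketched as a local subtree loop that stops at either limit and returns the unexplored frontier to the global scheduler; `expand` and all other names here are illustrative assumptions:

```python
import time

def solve_subtree(root, expand, max_nodes=None, max_seconds=None):
    """Explore the subtree rooted at `root` until a budget is exhausted.

    `expand(node)` stands in for one branch-and-bound step: it returns the
    child nodes generated by solving `node`. Returns (solved nodes,
    remaining frontier); the frontier goes back to the global queue.
    """
    start = time.monotonic()
    frontier, solved = [root], []
    while frontier:
        if max_nodes is not None and len(solved) >= max_nodes:
            break  # preset number of sub-nodes reached
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            break  # preset time exhausted
        node = frontier.pop()
        solved.append(node)
        frontier.extend(expand(node))
    return solved, frontier
```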
  • determining the target parallel mode from multiple parallel modes based on the information of the search tree includes: based on the information of the search tree, determining the target parallel mode from the multiple parallel modes through a neural network model.
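  • As a purely illustrative stand-in for such a selector (the application contemplates a trained neural network model in this role), a hand-written heuristic over the same search-tree information might look like this; the function name, thresholds, and mode labels are assumptions:

```python
def choose_parallel_mode(nodes_per_depth, pending_nodes, workers):
    """Pick a parallel mode from per-depth node counts and the open-node count.

    nodes_per_depth: number of sub-nodes at each depth of the search tree.
    pending_nodes:   number of sub-nodes still to be solved.
    workers:         number of available devices/instances.
    """
    if pending_nodes < workers:
        # Too few open nodes to keep all devices busy: parallelize inside a node.
        return "intra-node"
    if len(nodes_per_depth) > 2 * max(nodes_per_depth):
        # Deep, narrow tree: long subtrees favor subtree-granularity work.
        return "subtree"
    return "node"
```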
  • this application provides a task solving device, which includes:
  • An acquisition module is used to acquire a search tree.
  • the search tree is obtained by solving a mixed integer programming task through branch and bound.
  • the search tree includes a plurality of sub-nodes including a first sub-node and a second sub-node, Each sub-node corresponds to a sub-task, and the first sub-node and the second sub-node are sub-nodes to be solved among the plurality of sub-nodes;
  • a parallel scheduling module configured to determine a target parallel mode from multiple parallel modes according to the information of the search tree, and to solve at least one of the first sub-node and the second sub-node in parallel on the first device and the second device according to the target parallel mode; wherein the information of the search tree includes at least one of first information and second information; the first information is related to the number of child nodes on the search tree, and the second information is related to the connection relationship between the sub-nodes on the search tree.
  • the multiple parallel modes include at least one of the following modes:
  • the first subtree is obtained by solving the first subnode through branch and bound
  • the second subtree is obtained by solving the second subnode through branch and bound;
  • the first child node is solved on the first device without solving child nodes other than the first child node, and the second child node is solved on the second device without solving child nodes other than the second child node; or,
  • the first child node is solved on the first device through a first planning algorithm, and the first child node is solved in parallel on the second device through a second planning algorithm.
  • the first information includes:
  • the depth represents the distance between child nodes and the root node.
  • the parallel scheduling module is specifically used to:
  • the updated search tree includes a third sub-node and a fourth sub-node; the third sub-node and the fourth sub-node are new sub-nodes to be solved after solving the first sub-node or the second sub-node;
  • the method also includes:
  • a plurality of sub-nodes included in the third sub-tree where the third sub-node is located are solved through branch and bound
  • a plurality of sub-nodes included in the fourth sub-tree where the fourth sub-node is located are solved.
  • the third sub-node is solved on the first device without solving sub-nodes other than the third sub-node, and the fourth sub-node is solved on the second device without solving sub-nodes other than the fourth sub-node; or,
  • the third sub-node is solved on the first device by a first planning algorithm, and the third sub-node is solved in parallel on the second device by a second planning algorithm.
  • the parallel scheduling module is specifically used to:
  • the target parallel mode is determined from multiple parallel modes through the neural network model.
  • the first planning algorithm and the second planning algorithm are different linear programming algorithms.
  • embodiments of the present application provide a device, including a memory, a processor, and a bus system, wherein the memory is used to store programs, and the processor is used to execute the programs in the memory to perform the above first aspect and any optional method thereof.
  • embodiments of the present invention further provide a system, which includes at least one processor, at least one memory, and at least one communication interface; the processor, memory, and communication interface are connected through a communication bus and communicate with each other;
  • the memory is used to store the application code that executes the above scheme, and the processor controls the execution.
  • the processor is configured to execute the application code stored in the memory to obtain task scheduling results; wherein the code stored in the memory can execute one of the task solving methods provided above.
  • a communication interface used to communicate with other devices or communication networks to send the task solution results to the devices or communication networks.
  • embodiments of the present application provide a computer-readable storage medium that stores a computer program which, when run on a computer, causes the computer to execute the above first aspect and any optional method thereof.
  • embodiments of the present application provide a computer-readable storage medium that stores one or more instructions that, when executed by one or more computers, cause the one or more The computer implements the above second aspect and any optional system thereof.
  • embodiments of the present application provide a computer program that, when run on a computer, causes the computer to execute the above-mentioned first aspect and any optional method thereof.
  • the present application provides a chip system that includes a processor to support a terminal device or server in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data for the terminal device or server.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1 is a schematic diagram of an application architecture provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of the architecture of a server provided by an embodiment of the present application.
  • Figure 3a is a schematic diagram of a search tree provided by an embodiment of the present application.
  • Figure 3b is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 4 is a schematic flow chart of a task solving method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of parallel processing provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of parallel processing provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of parallel processing provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of an architecture provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of an architecture provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of an architecture provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a task solving device provided in this embodiment.
  • Figure 12 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Embodiments of the present application can be applied to solving linear programming optimization problems in various scenarios (such as supply chain, cloud computing, scheduling, storage optimization, finance, etc.) to accelerate the efficiency of linear programming solvers in solving these problems.
  • Figure 1 is a schematic diagram of the application architecture provided by an embodiment of the present application.
  • the task solving method provided by the present application can be deployed as a solver on a server on the cloud side.
  • the terminal device can pass the model to be solved (such as the mixed integer programming task in the embodiments of the present application) to the server on the cloud side.
  • the server on the cloud side can solve the model to be solved based on its own deployed solver and pass the solution results to the terminal device.
  • users can build a model to be solved based on their own business scenarios. When solving, they can transfer some historical models of similar problems to the server, and the server can call the solver to quickly output the optimal solution to the model input by the user. Based on this solution, users can use the functions provided by the platform to generate data reports or process it themselves to obtain the desired results.
  • FIG. 2 is a schematic diagram of the architecture of a server provided by an embodiment of the present application.
  • the server 200 is implemented by one or more servers.
  • the server 200 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPUs) 22 (for example, one or more processors), memory 232, and one or more storage media 230 (for example, one or more mass storage devices) storing applications 242 or data 244.
  • the memory 232 and the storage medium 230 may be short-term storage or persistent storage.
  • the program stored in the storage medium 230 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 22 may be configured to communicate with the storage medium 230 and execute a series of instruction operations in the storage medium 230 on the server 200 .
  • the server 200 may also include one or more power supplies 222, one or more wired or wireless network interfaces 250, and one or more input/output interfaces 258; or, one or more operating systems 241, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • the central processor 22 is used to execute the task solving method described in the embodiment of the present application.
  • the task solving method provided by the embodiment of the present application can also be deployed as a solver on the terminal device on the end side, which is not limited here.
  • the model to be solved in the embodiment of this application can be used to solve the scheduling problem.
  • Scheduling problems are one of the most common problems in large-scale manufacturing, logistics, production and other links, and scheduling always has different meanings in different scenarios.
  • Logistics scheduling mainly refers to the reasonable arrangement and dispatch of vehicles and personnel by logistics companies based on the weight, destination, specifications, urgency, etc. of the goods to be shipped during the logistics process.
  • Scheduling in a production environment means, given several tasks (jobs), completing the ordering of tasks and the matching between tasks and production equipment based on the production capacity and production needs of different machines on different production lines; that is, multiple tasks are assigned to the production equipment on each production line.
  • n workpieces are processed on m machines.
  • Each workpiece has a specific processing technology.
  • the order of processing each workpiece and the time spent on each process are given.
  • this type of scheduling problem requires that each task must be executed through each stage in turn. It does not involve the matching of tasks and stages, but mainly determines the execution order of tasks, so as to prevent the overall completion time from being too long due to excessive intermediate waiting time.
  • Resources can refer to virtual computing resources, such as threads, processes, or data streams; they can also refer to hardware resources, such as processors, network connections, or expansion cards.
  • the program that schedules work is called a scheduler. Schedulers are typically implemented so as to keep all computing resources busy (as in load balancing), to allow multiple users to efficiently share system resources simultaneously, or to achieve a specified quality of service.
  • Linear programming is the most widely used modeling method.
  • the simplex method is currently the most widely used algorithm, and it is also a type of algorithm that is optimized more frequently by various linear programming solvers.
  • MIP: mixed integer programming.
  • the LP problem is a problem of finding the minimum value of the objective function given a specific set of linear constraints.
  • the MIP problem adds to the LP the integer constraint that some or all variables must be integers.
  • Branch and bound is the most commonly used algorithm for solving integer programming problems.
  • the branch and bound method is a search and iteration method that selects different branching variables and sub-problems for branching.
  • the entire feasible solution space is repeatedly divided into smaller and smaller subsets (which may also be called sub-nodes in the embodiments of this application); this is called branching. A lower bound on the objective is calculated for the solution set in each subset (for minimization problems); this is called bounding.
  • After each branching step, subsets whose bound exceeds the objective value of a known feasible solution are not branched further. In this way, many subsets can be ignored; this is called pruning.
  • the calculation model of branch and bound is essentially a branch and bound tree search based on a queue.
  • the queue stores the nodes currently to be solved.
  • the execution process is as follows:
  • MIP solution has only one root node in the initial state (corresponding to the initial mixed integer programming task).
  • new nodes to be solved will be continuously generated through branches.
  • the generated dynamic search tree structure can be as shown in Figure 3a (where the black nodes are nodes to be solved, and the white nodes are nodes that have already been solved). It can be seen that different nodes in the MIP tree search can be processed independently and in parallel; the search has natural parallelism. That is, the search of this tree offers parallel opportunities at multiple granularities, such as subtree granularity, node granularity, and intra-node granularity.
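  • The queue-based loop above can be sketched end to end on a toy problem. The 0/1 knapsack instance and the greedy fractional bound below are illustrative assumptions, not part of the application (the items must be pre-sorted by value/weight ratio for the bound to be valid); the branch, bound, and prune steps mirror the description:

```python
import heapq

# Toy instance, pre-sorted by value/weight ratio (required by the bound).
VALUES = [60, 100, 120]
WEIGHTS = [10, 20, 30]
CAPACITY = 50

def fractional_bound(level, value, weight):
    """Upper bound from the LP relaxation: fill the remaining capacity
    greedily, allowing the last item to be taken fractionally."""
    bound, cap = value, CAPACITY - weight
    for i in range(level, len(VALUES)):
        if WEIGHTS[i] <= cap:
            cap -= WEIGHTS[i]
            bound += VALUES[i]
        else:
            bound += VALUES[i] * cap / WEIGHTS[i]
            break
    return bound

def branch_and_bound():
    best = 0
    # Queue of nodes to be solved: (negated bound, level, value, weight).
    queue = [(-fractional_bound(0, 0, 0), 0, 0, 0)]
    while queue:
        neg_bound, level, value, weight = heapq.heappop(queue)
        if -neg_bound <= best:
            continue                  # prune: bound cannot beat the incumbent
        if level == len(VALUES):
            continue                  # leaf: nothing left to branch on
        # Branch 1: take item `level`, if it fits.
        w = weight + WEIGHTS[level]
        if w <= CAPACITY:
            v = value + VALUES[level]
            best = max(best, v)       # every feasible take is a solution
            heapq.heappush(queue, (-fractional_bound(level + 1, v, w),
                                   level + 1, v, w))
        # Branch 2: skip item `level`.
        heapq.heappush(queue, (-fractional_bound(level + 1, value, weight),
                               level + 1, value, weight))
    return best
```

Each popped tuple is one node to be solved; pushing two tuples is branching, `fractional_bound` is bounding, and the first `continue` is pruning.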
  • when the MIP solution process supports only one parallel strategy, the acceleration effect of the MIP solution process is limited.
  • Figure 3b is a schematic diagram of the system architecture provided by an embodiment of the present application.
  • the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550 and a data collection system 560.
  • the execution device 510 includes a computing module 511, an I/O interface 512, a preprocessing module 513 and a preprocessing module 514.
  • the target model/rule 501 may be included in the calculation module 511, and the preprocessing module 513 and the preprocessing module 514 are optional.
  • Data collection device 560 is used to collect training data.
  • the training data can be search tree information, solution results, and the amount of resources consumed in the solution process, etc.
  • After collecting the training data, the data collection device 560 stores the training data in the database 530.
  • the training device 520 trains to obtain the target model/rule 501 (such as the neural network model in the embodiments of the present application) based on the training data maintained in the database 530.
  • the training data maintained in the database 530 may not necessarily be collected by the data collection device 560, but may also be received from other devices.
  • the training device 520 does not necessarily perform training of the target model/rules 501 based entirely on the training data maintained by the database 530. It may also obtain training data from the cloud or other places for model training.
  • the above description should not be regarded as a limitation on the embodiments of this application.
  • the target model/rules 501 trained according to the training device 520 can be applied to different systems or devices, such as to the execution device 510 shown in Figure 3b.
  • the execution device 510 can be a terminal, such as a mobile phone, a tablet computer, a laptop, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, etc., or a server or cloud, etc.
  • the execution device 510 is configured with an input/output (I/O) interface 512, which is used for data interaction with external devices; the user can input data to the I/O interface 512 through the client device 540.
  • the preprocessing module 513 and the preprocessing module 514 are used to perform preprocessing according to the input data received by the I/O interface 512. It should be understood that there may be no preprocessing modules 513 and 514, or only one preprocessing module. When the preprocessing module 513 and the preprocessing module 514 do not exist, the computing module 511 can be used directly to process the input data.
  • When the execution device 510 preprocesses input data, or when the calculation module 511 of the execution device 510 performs calculation and other related processing, the execution device 510 can call data, code, etc. in the data storage system 550 for corresponding processing; the data, instructions, etc. obtained by the corresponding processing can also be stored in the data storage system 550.
  • the I/O interface 512 presents the processing results to the client device 540, thereby providing them to the user.
  • the user can manually set the input data, and the "manually set input data" can be operated through the interface provided by the I/O interface 512 .
  • the client device 540 can automatically send input data to the I/O interface 512. If automatically sending input data requires the user's authorization, the user can set corresponding permissions in the client device 540. The user can view the results output by the execution device 510 on the client device 540, and the specific presentation form may be display, sound, action, etc.
  • the client device 540 can also be used as a data collection terminal to collect the input data of the input I/O interface 512 and the output results of the output I/O interface 512 as new sample data, and store them in the database 530.
  • alternatively, as shown in the figure, the I/O interface 512 can directly store the input data of the I/O interface 512 and the output results of the I/O interface 512 in the database 530 as new sample data.
  • Figure 3b is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 550 is an external memory relative to the execution device 510; in other cases, the data storage system 550 can also be placed in the execution device 510.
  • execution device 510 can also be deployed in the client device 540.
  • Linear programming is an important branch of operations research that was studied early, has developed quickly, is widely applied, and has relatively mature methods. It is a mathematical method that assists scientific management: the mathematical theory and method for studying the extreme value problem of a linear objective function under linear constraints.
  • Constraints are constraints in mathematical programming problems, that is, numerical requirements for decision variables.
  • Function instance The entire isolation environment of a function in function computing. For example, if a container is used as a function isolation method, a function instance is the container isolation environment that contains the complete running environment of the function.
  • Subtree parallelism Multiple function instances/threads calculate multiple subtrees in the same search tree in parallel, and each function instance/thread is independent of each other.
  • Node parallelism Multiple function instances/threads calculate multiple sub-nodes in the same search tree in parallel, and each function instance/thread is independent of each other.
  • Intra-node parallelism: parallelize calculations within a single node, such as solving LP problems in parallel, using different heuristic algorithms in parallel, using different cutting plane algorithms in parallel, etc.
  • Figure 4 is a task solving method provided by an embodiment of the present application. The method includes:
  • the search tree is obtained by solving the mixed integer planning task through branch and bound.
  • the search tree includes a plurality of child nodes including a first child node and a second child node, each child node corresponds to a subtask, and the first child node and the second child node are child nodes to be solved among the plurality of child nodes.
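The correspondence between child nodes and subtasks described above can be sketched as follows (a minimal Python illustration; the class and field names such as `Node` and `extra_bounds` are assumptions for illustration, not the solver's actual data structures):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Each child node corresponds to a subtask: the original MIP plus
    # extra variable bounds introduced by branching.
    extra_bounds: dict          # e.g. {"x1": ("<=", 2)}
    depth: int = 0
    children: list = field(default_factory=list)

    def branch(self, var: str, value: int):
        """Split this node into two subtasks: var <= value and var >= value + 1."""
        left = Node({**self.extra_bounds, var: ("<=", value)}, self.depth + 1)
        right = Node({**self.extra_bounds, var: (">=", value + 1)}, self.depth + 1)
        self.children = [left, right]
        return left, right

root = Node({})                        # the root node corresponds to the full MIP task
l, r = root.branch("x1", 2)            # two child nodes to be solved
print(l.depth, r.extra_bounds["x1"])   # -> 1 ('>=', 3)
```

Each branch step replaces one node to be solved with two tighter subtasks, which is how the search tree in Figure 3a grows during the solving process.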
  • the execution subject of step 401 may be a server or a terminal device.
  • the terminal device can pass the mixed integer planning task to the server as a model to be solved, and then the server can obtain the mixed integer planning task.
  • the mixed integer programming task may include an objective function and constraints, where the objective function refers to a function designed based on the objective to be optimized and variables that affect the objective.
  • the goal of the entire production schedule is usually to find the best processing plan while satisfying all resource constraints, so that the demand satisfaction rate is the highest and the overall cost is the lowest (for example, the cost can include, but is not limited to, processing costs, inventory costs, and transshipment costs); in this case, the objective function may be a function used to maximize the satisfaction rate and minimize the cost.
  • constraints refer to other restrictions that must be met in the process of solving the objective function.
  • at least one variable to be solved is constrained to be an integer.
  • the mixed integer planning task is used to allocate scheduling resources to at least one to-be-scheduled task
  • the first planning constraint is a constraint satisfied by the scheduling resource
  • the scheduling resource is a production line, production equipment or production factory.
  • the tasks to be scheduled may be products to be produced; in a personnel scheduling scenario, the tasks to be scheduled may be personnel to be scheduled, etc., which is not limited by the embodiments of this application.
  • each of the multiple schedulable resource groups can be a production line.
  • for mobile phone components, for example, a production line can be a battery production line, a casing production line, a chip production line, etc.
  • each schedulable resource group can include multiple schedulable resources, and each of the multiple schedulable resources can be production equipment in the production line.
  • the battery production line may include multiple battery production equipment
  • the casing production line may include multiple casing production equipment, which is not limited here.
  • each schedulable resource group among the multiple schedulable resource groups can be a time period.
  • each schedulable resource group among the multiple schedulable resource groups can be a day, for example, Monday, Tuesday, Wednesday, or a certain day in a certain month, etc.
  • each schedulable resource group can include multiple schedulable resources, and each schedulable resource among the multiple schedulable resources can be a sub-time period within the time period; for example, a certain day may include multiple hours, multiple minutes, or other sub-time periods, which is not limited here.
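As a toy illustration of the kind of scheduling MIP described above (all task names, line names, and cost figures below are invented for illustration), a brute-force enumeration over the integer decision variables shows the objective and constraints at work:

```python
from itertools import product

# Toy scheduling MIP: assign each task to exactly one production line
# (binary decision variables), minimizing total processing cost subject
# to line capacity constraints.
cost = {("phone_battery", "line_A"): 3, ("phone_battery", "line_B"): 5,
        ("phone_casing", "line_A"): 4, ("phone_casing", "line_B"): 2}
tasks = ["phone_battery", "phone_casing"]
lines = ["line_A", "line_B"]
capacity = {"line_A": 1, "line_B": 1}   # each line can take at most one task

best = None
for assign in product(lines, repeat=len(tasks)):        # enumerate integer solutions
    load = {ln: assign.count(ln) for ln in lines}
    if any(load[ln] > capacity[ln] for ln in lines):    # capacity constraint violated
        continue
    total = sum(cost[(t, ln)] for t, ln in zip(tasks, assign))
    if best is None or total < best[0]:
        best = (total, assign)

print(best)   # -> (5, ('line_A', 'line_B'))
```

Real instances are far too large for enumeration, which is exactly why the branch-and-bound search tree and its parallelization, described next, are needed.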
  • the solver when the solver performs branch-and-bound processing on a mixed integer programming task, the solver (such as the coordinator node of the solver) can be responsible for the steps in the branch-and-bound process that need to be performed serially, for example,
  • the coordinator can have a built-in queue of nodes to be solved and is responsible for task allocation and control of the entire solution process (for example, it can perform functions such as preprocessing, node selection, and upper and lower bound checks, and maintain the node queue based on local status).
  • the mixed integer programming task can be subjected to branch and bound, that is, solved, split, and pruned, thereby forming a search tree including multiple child nodes, where the multiple child nodes in the search tree can include multiple solved child nodes and multiple child nodes to be solved, and the child nodes are connected to one another based on the split relationship, thereby forming the search tree.
  • the child nodes to be solved may include child node 1 and child node 2
  • the computing device may include computing device 1 and computing device 2.
  • computing device 1 solves child node 1 and obtains the solution result of child node 1 and the update result of the child node (determines whether to split the child node based on the solution result)
  • the solution results and the update results of the sub-nodes can be transferred to the coordinator node of the solver to complete the solution of the current round.
  • the solving process of computing device 1 to solve child node 1 and the solving process of computing device 2 to solve child node 2 are performed in parallel.
  • computing device 1 solves child node 1 and multiple child nodes obtained by splitting child node 1 (or by splitting those child nodes again) until the solving time reaches a preset value or the number of solved child nodes reaches a preset value; it can then obtain the solution results of the multiple child nodes and the update results of the child nodes (whether to split a child node is determined based on its solution result), and pass the solution results and the update results of the child nodes to the coordinator node of the solver to complete the current round of solving.
  • similarly, computing device 2 solves child node 2 and the multiple child nodes obtained by splitting child node 2 (or by splitting those child nodes again) until the solving time reaches the preset value or the number of solved child nodes reaches the preset value; it can then obtain the solution results of the multiple child nodes and the update results of the child nodes (whether to split a child node is determined based on its solution result), and pass the solution results and the update results of the child nodes to the coordinator node of the solver to complete the current round of solving.
  • the solving process in which the computing device 1 solves multiple sub-nodes including sub-node 1 and the solving process in which the computing device 2 solves the multiple sub-nodes including sub-node 2 are performed in parallel.
  • the subtree where the child node is located can be understood as multiple child nodes obtained by splitting the child node (or splitting the child node again).
  • computing device 1 solves child node 1 through linear programming algorithm 1 .
  • computing device 2 solves child node 1 through linear programming algorithm 2 (different from linear programming algorithm 1).
  • whichever of computing device 1 and computing device 2 first obtains the solution result of child node 1 and the update result of the child node (whether to split the child node is determined based on the solution result) can pass the solution result and the update result of the child node to the coordinator node of the solver to complete the current round of solving.
  • the computing device 1 solves the child node 1 through the linear programming algorithm 1 and the computing device 2 solves the child node 1 through the linear programming algorithm 2 in parallel.
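The intra-node racing pattern described above, where two devices run different algorithms on the same child node and the first finisher wins, can be sketched as follows (the algorithm functions are hypothetical stand-ins, not real LP implementations):

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

# Hypothetical stand-ins for two different linear programming algorithms
# (e.g. simplex vs. interior point); here they simply return a labeled result.
def lp_algorithm_1(node):
    return ("algorithm_1", node, 42.0)

def lp_algorithm_2(node):
    return ("algorithm_2", node, 42.0)

def race_solve(node):
    """Run both algorithms on the same child node; keep whichever finishes first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(lp_algorithm_1, node),
                   pool.submit(lp_algorithm_2, node)}
        done, not_done = wait(futures, return_when=FIRST_COMPLETED)
        for f in not_done:          # the slower run is no longer needed
            f.cancel()
        return next(iter(done)).result()

winner, node, objective = race_solve("child_node_1")
print(node, objective)   # -> child_node_1 42.0
```

In the actual system, the two "devices" would be separate function instances, and the winner's solution result and child-node update result are what get passed back to the coordinator.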
  • in order to improve the solving efficiency and accuracy of the mixed integer programming task, the parallel mode can be dynamically updated during the solving process based on the information of the search tree.
  • the solver (for example, the coordinator node of the solver) can obtain information about the search tree.
  • the information about the search tree can describe the structural characteristics of the child nodes on the search tree, where the structural characteristics can include the number of child nodes and the connection relationships between them.
  • the search tree information may include: the number of child nodes on the search tree, or the connection relationship between the child nodes on the search tree.
  • the first information specifically includes: the number of child nodes at each of multiple depths on the search tree, or the number of child nodes to be solved on the search tree, where the mixed integer programming task corresponds to the root node of the search tree, and the depth represents the distance between a child node and the root node.
  • the number of nodes to be solved on the tree to be searched at the current moment can be used as the information of the search tree.
  • ⁇ k can represent the number of nodes to be solved on the dynamic search tree, such as the number of black nodes in Figure 3a.
  • the number of nodes at each depth level on the tree to be searched at the current moment can be used as the information of the search tree.
  • W_d can represent the number of nodes in the d-th layer of the search tree
  • the depth here can be understood as the distance between a child node and the root node; for example, the depth of the two leftmost child nodes to be solved in Figure 3a is 4 (that is, they are separated from the root node by three child nodes, and the default depth of the root node is 0).
  • when the information of the search tree includes the number of nodes at multiple depths, the number of depths can also be kept below a preset value, that is, child nodes whose generation time is closer to the current time are selected.
  • the depth of the dynamic search tree is constantly increasing, that is, the historical information is constantly increasing.
  • the sliding-window optimization idea can be used, that is, only the historical information within the window size h is taken as the search tree information and input into the neural network, which avoids a possible explosion of the input dimension.
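The sliding-window idea above can be sketched as follows (the window size h = 4 and the zero-padding convention are assumptions for illustration):

```python
# Sliding-window extraction of search-tree features: only the node counts of
# the most recent h depth levels are kept as input to the neural network, so
# the input dimension stays fixed as the dynamic search tree deepens.
def windowed_features(nodes_per_depth, h=4):
    """nodes_per_depth[d] = number of child nodes at depth d on the search tree."""
    window = nodes_per_depth[-h:]               # keep the h deepest (newest) levels
    return [0] * (h - len(window)) + window     # left-pad while the tree is shallow

# Depths 0..5 of a growing dynamic search tree:
print(windowed_features([1, 2, 4, 6, 5, 3], h=4))   # -> [4, 6, 5, 3]
print(windowed_features([1, 2], h=4))               # -> [0, 0, 1, 2]
```

However deep the tree grows, the feature vector fed to the model keeps the same length h.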
  • according to the information of the search tree, a target parallel mode is determined from multiple parallel modes, and at least one of the first child node and the second child node is solved in parallel on the first device and the second device according to the target parallel mode; wherein the information of the search tree includes at least one of first information and second information; the first information is related to the number of child nodes on the search tree, and the second information is related to the connection relationships between child nodes on the search tree.
  • the multiple parallel modes include at least one of the following modes:
  • the first child node is solved on the first device without solving child nodes other than the first child node, and the second child node is solved on the second device without solving child nodes other than the second child node; or,
  • the first child node is solved on a first device by a first planning algorithm, and the first child node is solved in parallel on a second device by a second planning algorithm.
  • the parallel mode of the next round of solving process can be determined based on the information of the search tree.
  • the child nodes to be solved may include child node 1 and child node 2
  • the computing device may include computing device 1 and computing device 2.
  • computing device 1 solves child node 1 and obtains the solution result of child node 1 and the update result of the child node (determines whether to split the child node based on the solution result)
  • the solution results and the update results of the sub-nodes can be transferred to the coordinator node of the solver to complete the solution of the current round.
  • the solving process of computing device 1 to solve child node 1 and the solving process of computing device 2 to solve child node 2 are performed in parallel.
  • Figure 6 is a parallel diagram with node granularity, in which the Coordinator selects n nodes from the queue of nodes to be calculated and configures n function instances; the Coordinator sets each function instance to calculate only one node, that is, after calculating the received node, the instance splits it into two child nodes and inserts them into the Coordinator's queue of nodes to be calculated; after all function instances finish calculating, they synchronize their respective status information to the Coordinator.
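One round of the node-granularity flow of Figure 6 can be sketched as follows (the dictionary-based node representation and thread pool are illustrative assumptions; in the actual system each worker is a function instance):

```python
from concurrent.futures import ThreadPoolExecutor

def solve_one_node(node):
    """A function instance computes exactly one node, then splits two children."""
    depth = node["depth"]
    return [{"depth": depth + 1}, {"depth": depth + 1}]   # two new child nodes

def node_parallel_round(queue, n=3):
    """Coordinator: pick n nodes, solve them in parallel, re-insert the children."""
    picked, rest = queue[:n], queue[n:]
    with ThreadPoolExecutor(max_workers=max(1, len(picked))) as pool:
        results = list(pool.map(solve_one_node, picked))
    for children in results:          # synchronize status back to the Coordinator
        rest.extend(children)
    return rest

queue = [{"depth": 0}]
queue = node_parallel_round(queue, n=1)
print(len(queue))   # -> 2
```

Each round therefore removes the picked nodes from the queue and replaces them with their split children, exactly the queue maintenance the Coordinator performs between rounds.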
  • computing device 1 solves child node 1 and multiple child nodes obtained by splitting child node 1 (or by splitting those child nodes again) until the solving time reaches a preset value or the number of solved child nodes reaches a preset value; it can then obtain the solution results of the multiple child nodes and the update results of the child nodes (whether to split a child node is determined based on its solution result), and pass the solution results and the update results of the child nodes to the coordinator node of the solver to complete the current round of solving.
  • similarly, computing device 2 solves child node 2 and the multiple child nodes obtained by splitting child node 2 (or by splitting those child nodes again) until the solving time reaches the preset value or the number of solved child nodes reaches the preset value; it can then obtain the solution results of the multiple child nodes and the update results of the child nodes (whether to split a child node is determined based on its solution result), and pass the solution results and the update results of the child nodes to the coordinator node of the solver to complete the current round of solving.
  • the solving process in which the computing device 1 solves multiple sub-nodes including sub-node 1 and the solving process in which the computing device 2 solves the multiple sub-nodes including sub-node 2 are performed in parallel.
  • the subtree where the child node is located can be understood as multiple child nodes obtained by splitting the child node (or splitting the child node again).
  • Figure 5 is a parallel diagram with a subtree as the granularity.
  • the Coordinator selects n nodes from the queue of nodes to be calculated and configures n function instances; the Coordinator sets each function instance to take the received node as the root node of a subtree and calculate multiple nodes in that subtree within the set calculation time (or node count), and then insert the remaining uncalculated child nodes into the Coordinator's queue of nodes to be calculated; after all function instances finish calculating, they synchronize their respective status information to the Coordinator.
  • computing device 1 solves child node 1 through linear programming algorithm 1 .
  • computing device 2 solves child node 1 through linear programming algorithm 2 (different from linear programming algorithm 1).
  • whichever of computing device 1 and computing device 2 first obtains the solution result of child node 1 and the update result of the child node (whether to split the child node is determined based on the solution result) can pass the solution result and the update result of the child node to the coordinator node of the solver to complete the current round of solving.
  • the computing device 1 solves the child node 1 through the linear programming algorithm 1 and the computing device 2 solves the child node 1 through the linear programming algorithm 2 in parallel.
  • Figure 7 is a parallel diagram with intra-node granularity.
  • the Coordinator selects n nodes from the node queue to be calculated and configures n function instances; the Coordinator sets the function instance to only calculate 1 node.
  • invoke k sub-function instances to accelerate the calculation of modules within the node (such as LP) in parallel, until the node is calculated and split into two child nodes, which are inserted into the Coordinator's queue of nodes to be calculated; after all function instances finish calculating, they synchronize their respective status information to the Coordinator.
  • the first sub-node or the second sub-node can be solved in parallel on the first device and the second device according to the target parallel mode to obtain the solution result and the newly added sub-node.
  • the newly added child nodes are used to update the search tree to obtain an updated search tree;
  • the updated search tree includes a third child node and a fourth child node,
  • the third sub-node and the fourth sub-node are sub-nodes to be solved on the updated search tree;
  • the target parallel mode can be determined from multiple parallel modes according to the information of the updated search tree, and the third child node or the fourth child node can be solved in parallel on the first device and the second device according to the target parallel mode; wherein the plurality of parallel modes include at least two of the following modes:
  • a plurality of child nodes included in the third subtree where the third child node is located are solved through branch and bound, and a plurality of child nodes included in the fourth subtree where the fourth child node is located are solved; or,
  • the third child node is solved on the first device without solving child nodes other than the third child node, and the fourth child node is solved on the second device without solving child nodes other than the fourth child node; or,
  • the third sub-node is solved on the first device by a first planning algorithm, and the third sub-node is solved in parallel on the second device by a second planning algorithm.
  • the above-mentioned action of determining a target parallel mode from multiple parallel modes according to the information of the search tree can be implemented through a pre-trained neural network model, that is, according to the information of the search tree, the target parallel mode is determined from the multiple parallel modes through the neural network model.
  • the input of the neural network model can be the information of the search tree, and the output can be the parallel mode and the number of parallelism.
  • the neural network model in the embodiment of this application will be introduced with a specific example.
  • the definition of indicators and loss functions for evaluating decision-making during neural network model training can include:
  • the first is the change of the primal bound after decision-making, and the second is the change of the dual bound.
  • t represents the elapsed time and N represents the amount of resources consumed; each indicator is the corresponding gap change within unit resource per unit time, and for both indicators, the bigger the better.
  • the objective function is generally minimized.
  • An optional definition is to take the reciprocal of each of the above gap indicators and then sum them as the loss function of the machine learning model.
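The evaluation indicator and loss described above can be sketched as follows (the exact formula shapes and the epsilon guard are assumptions; the point is that larger bound improvements per unit time and resource yield a smaller loss):

```python
# Decision-evaluation sketch: the indicator is the gap change within unit
# resource per unit time; the loss sums the reciprocals of the primal and
# dual indicators so that minimizing the loss favors fast bound improvement.
def bound_improvement(delta_gap, t, N):
    """Gap change per unit time per unit resource (bigger is better)."""
    return delta_gap / (t * N)

def loss(primal_delta, dual_delta, t, N):
    eps = 1e-9                                   # guard against division by zero
    return (1.0 / (bound_improvement(primal_delta, t, N) + eps)
            + 1.0 / (bound_improvement(dual_delta, t, N) + eps))

# A decision that improves both bounds faster yields a smaller loss:
fast = loss(primal_delta=4.0, dual_delta=2.0, t=1.0, N=2)
slow = loss(primal_delta=1.0, dual_delta=0.5, t=1.0, N=2)
print(fast < slow)   # -> True
```

This gives the training signal the right direction for a minimized objective, matching the reciprocal-and-sum construction in the text.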
  • the trained machine learning model can be used to perform real-time inference and decision-making on parallel strategies based on the input features collected by the Coordinator module.
  • An embodiment of the present application provides a method for solving a task.
  • the method includes: obtaining a search tree.
  • the search tree is obtained by solving a mixed integer planning task through branch and bound.
  • the search tree includes a plurality of child nodes including a first child node and a second child node, each child node corresponds to a subtask, and the first child node and the second child node are the child nodes to be solved among the plurality of child nodes; according to the information of the search tree, a target parallel mode is determined from multiple parallel modes, and at least one of the first child node and the second child node is solved in parallel on the first device and the second device according to the target parallel mode.
  • the information of the search tree includes at least one of first information and second information; the first information is related to the number of child nodes on the search tree, and the second information is related to the number of child nodes on the search tree. related to the connection relationship between child nodes.
  • in this way, the parallel mode can be determined based on the information of the search tree; dynamically selecting the parallel mode can improve MIP solving efficiency and shorten calculation time compared with a fixed parallel mode.
  • Figure 8 is a schematic diagram of a software architecture of the task solving method according to the embodiment of the present application. As shown in Figure 8, it may specifically include:
  • the user submits an MIP solving task, which defines information such as the constraints and objective function of the MIP problem.
  • after receiving the task, the Coordinator performs a series of processing such as reading, preprocessing, and root node calculation.
  • the Coordinator collects information such as the dynamic search tree topology structure at the current moment and the number of nodes currently to be solved as input features to the Scheduler module.
  • the Scheduler module provides the above input feature information to the built-in machine learning model, infers the parallel strategy to be adopted at the next moment in real time, including parallel dimensions and parallelism, and sends the results to the Coordinator module.
  • the Coordinator module performs corresponding actions according to the received parallel strategy.
  • the Coordinator module determines whether the optimal solution has currently been found or all nodes on the dynamic search tree have been calculated, that is, whether the queue of nodes to be calculated is empty; if so, the calculation ends and goes to step (7); if not, it goes to step (3) and continues iterating.
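The iterative flow through these steps can be sketched as follows (a skeleton only: the random scheduler is a stand-in for the machine learning model, and the node-splitting rule and queue contents are invented for illustration):

```python
import random

# Skeleton of steps (3)-(6): the Coordinator collects search-tree features,
# the Scheduler infers a parallel strategy (mode and parallelism degree),
# the Coordinator executes a round, and the loop repeats until the queue
# of nodes to be calculated is empty.
PARALLEL_MODES = ["subtree", "node", "intra_node"]

def scheduler(features):
    """Stand-in for the ML model: returns (parallel mode, degree of parallelism)."""
    return random.choice(PARALLEL_MODES), min(4, max(1, features["queue_len"]))

def solve_mip(rounds_limit=10):
    queue = ["root"]
    rounds = 0
    while queue and rounds < rounds_limit:
        features = {"queue_len": len(queue)}            # step (3): collect features
        mode, degree = scheduler(features)              # step (4): infer strategy
        picked, queue = queue[:degree], queue[degree:]  # step (5): execute a round
        if rounds < 2:                                  # pretend early nodes still split
            queue.extend(f"{n}/child" for n in picked for _ in range(2))
        rounds += 1                                     # step (6): check and iterate
    return rounds, queue

rounds, queue = solve_mip()
print(queue == [])   # -> True (the queue of nodes to be calculated empties)
```

The termination check at the top of the loop mirrors step (6): once the queue is empty, the result is returned to the user as in step (7).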
  • the serial module can perform: presolving, node selection, and bound checking
  • the parallel module can execute: node processing (node processing), branching (branching);
  • Coordinator (stateful): has a built-in queue of nodes to be solved, is responsible for task (e.g., MIP problem) allocation and control of the entire solution process, and carries the above serial modules;
  • LP & Primal Heuristic (stateless): requires no built-in state, and respectively carries the LP and heuristic module calculations in node processing;
  • Each function instance creates and stores a local copy, such as LP data, and the calculation process is based on the copy;
  • Global shared data: a data system is used to store shared data, such as the global tree, and concurrent access conflicts are controlled through locking;
  • Figure 11 is a schematic structural diagram of a task solving device provided by an embodiment of the present application.
  • the device 1100 may include:
  • the acquisition module 1101 is used to acquire a search tree.
  • the search tree is obtained by solving a mixed integer planning task through branch and bound.
  • the search tree includes a plurality of child nodes including a first child node and a second child node, each child node corresponds to a subtask, and the first child node and the second child node are the child nodes to be solved among the plurality of child nodes;
  • For a specific description of the acquisition module 1101, reference may be made to the description of step 401 in the above embodiment, which will not be repeated here.
  • Parallel scheduling module 1102 configured to determine a target parallel mode from multiple parallel modes according to the information of the search tree, and solve the first sub-node in parallel on the first device and the second device according to the target parallel mode. and at least one of the second child nodes; wherein the information of the search tree includes at least one of first information and second information; the first information is related to the number of child nodes on the search tree , the second information is related to the connection relationship between the child nodes on the search tree.
  • the multiple parallel modes include at least one of the following modes:
  • the first subtree is obtained by solving the first subnode through branch and bound
  • the second subtree is obtained by solving the second subnode through branch and bound;
  • the first child node is solved on the first device without solving child nodes other than the first child node, and the second child node is solved on the second device without solving child nodes other than the second child node; or,
  • the first child node is solved on a first device by a first planning algorithm, and the first child node is solved in parallel on a second device by a second planning algorithm.
  • For a specific description of the parallel scheduling module 1102, reference may be made to the description of step 402 in the above embodiment, which will not be repeated here.
  • the first information includes:
  • the depth represents the distance between child nodes and the root node.
  • the parallel scheduling module is specifically used to:
  • the updated search tree includes a third child node and a fourth child node, and the third child node and the fourth child node are new child nodes to be solved after solving the first child node or the second child node;
  • the method also includes:
  • a plurality of sub-nodes included in the third sub-tree where the third sub-node is located are solved through branch and bound
  • a plurality of sub-nodes included in the fourth sub-tree where the fourth sub-node is located are solved.
  • the third child node is solved on the first device without solving child nodes other than the third child node, and the fourth child node is solved on the second device without solving child nodes other than the fourth child node; or,
  • the third sub-node is solved on the first device by a first planning algorithm, and the third sub-node is solved in parallel on the second device by a second planning algorithm.
  • the parallel scheduling module is specifically used to:
  • the parallel scheduling module is specifically used to:
  • the parallel scheduling module is specifically used to:
  • the target parallel mode is determined from multiple parallel modes through the neural network model.
  • the first planning algorithm and the second planning algorithm are different linear programming algorithms.
  • An embodiment of the present application also provides a system, which may include a terminal device and a server, wherein the terminal device may perform steps 401 to 402 in the above embodiment according to the model to be solved (mixed integer programming task) to obtain the solution result.
  • the terminal device can send the model to be solved to the server, and the server can perform steps 401 to 402 in the above embodiment to obtain the solution results, and send the solution results to the terminal device.
  • Figure 12 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device 1200 can be embodied as a mobile phone, a tablet, a notebook computer, a smart wearable device, etc., which is not limited here. The terminal device 1200 can be deployed with the task solving device described in the embodiment corresponding to Figure 11, so as to implement the task solving function in the embodiment corresponding to Figure 11.
  • the terminal device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203 and a memory 1204 (the number of processors 1203 in the terminal device 1200 can be one or more, one processor is taken as an example in Figure 12) , wherein the processor 1203 may include an application processor 12031 and a communication processor 12032.
  • the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected through a bus or other means.
  • Memory 1204 may include read-only memory and random access memory and provides instructions and data to processor 1203 .
  • a portion of memory 1204 may also include non-volatile random access memory (NVRAM).
  • the memory 1204 stores operating instructions executable by the processor, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1203 controls the operation of the terminal device.
  • various components of the terminal equipment are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, a status signal bus, etc.
  • various buses are called bus systems in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1203 or implemented by the processor 1203.
  • the processor 1203 may be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the above method can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 1203 .
  • the above-mentioned processor 1203 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • the processor 1203 can implement or execute the various methods, steps and logical block diagrams disclosed in the embodiments of this application.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1204.
  • the processor 1203 reads the information in the memory 1204 and completes the steps related to the terminal device in the above method in combination with its hardware.
  • the receiver 1201 can be used to receive input numeric or character information, and generate signal input related to relevant settings and function control of the terminal device.
  • the transmitter 1202 can be used to output numeric or character information through the first interface; the transmitter 1202 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1202 can also include a display device such as a display screen .
  • the processor 1203 is used to perform the steps performed by the terminal device in the above embodiment.
  • FIG. 13 is a schematic structural diagram of the server provided by the embodiment of the present application.
  • the server 1300 is implemented by one or more servers.
  • the server 1300 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 1313 (for example, one or more processors), memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) storing application programs 1342 or data 1344.
  • the memory 1332 and the storage medium 1330 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 1313 may be configured to communicate with the storage medium 1330 and execute a series of instruction operations in the storage medium 1330 on the server 1300 .
  • the server 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1351, and one or more operating systems 1341, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so on.
  • the central processor 1313 is used to execute steps related to the task solving method in the above embodiment.
  • An embodiment of the present application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the foregoing terminal device, or causes the computer to perform the steps performed by the foregoing server.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a program for performing signal processing.
  • when the program is run on a computer, it causes the computer to perform the steps performed by the aforementioned terminal device, or causes the computer to perform the steps performed by the aforementioned server.
  • the terminal device or server provided by the embodiments of the present application may specifically be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor.
  • the communication unit may be, for example, an input/output interface, pins, or circuits.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip in the terminal device executes the data processing method described in the above embodiment, or so that the chip in the server executes the data processing method described in the above embodiment.
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • Figure 14 is a structural schematic diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 1400.
  • the NPU 1400 serves as a co-processor mounted on the host CPU (Host CPU), and tasks are allocated by the Host CPU.
  • the core part of the NPU is the arithmetic circuit 1403.
  • the arithmetic circuit 1403 is controlled by the controller 1404 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1403 internally includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 1403 is a two-dimensional systolic array.
  • the arithmetic circuit 1403 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1403 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1402 and caches it on each PE in the arithmetic circuit.
  • the operation circuit fetches matrix A data from the input memory 1401 and performs matrix operations with matrix B, and the partial result or final result of the matrix is stored in an accumulator 1408.
  • the unified memory 1406 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1402 through the direct memory access controller (DMAC) 1405.
  • Input data is also transferred to unified memory 1406 via DMAC.
  • the bus interface unit (BIU) 1410 is used for the interaction of the AXI bus with the DMAC and the instruction fetch buffer (IFB) 14014.
  • the bus interface unit 1410 is used by the instruction fetch buffer 14014 to obtain instructions from the external memory, and is also used by the storage unit access controller 1405 to obtain the original data of input matrix A or weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1406 or the weight data to the weight memory 1402 or the input data to the input memory 1401 .
  • the vector calculation unit 1407 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit 1403, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • vector calculation unit 1407 can store the processed output vectors to unified memory 1406 .
  • the vector calculation unit 1407 can apply a linear or nonlinear function to the output of the operation circuit 1403, for example performing linear interpolation on the feature planes extracted by the convolutional layers, or accumulating vectors of values, to generate activation values.
  • vector calculation unit 1407 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1403, such as for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 14014 connected to the controller 1404 is used to store instructions used by the controller 1404;
  • the unified memory 1406, input memory 1401, weight memory 1402 and instruction fetch memory 14014 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • the processor mentioned in any of the above places can be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the above programs.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • a physical unit can be located in one place or distributed across multiple network units; some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware, and of course it can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or special-purpose circuits. For this application, however, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (Solid State Disk, SSD)), etc.

Abstract

This application discloses a task solving method. The method includes: obtaining a search tree, where the search tree is obtained by solving a mixed integer programming task through branch and bound, the search tree includes multiple child nodes including a first child node and a second child node, each child node corresponds to a sub-task, and the first child node and the second child node are child nodes to be solved among the multiple child nodes; and determining a target parallel mode from multiple parallel modes according to information of the search tree, and solving at least one of the first child node and the second child node in parallel on a first device and a second device according to the target parallel mode. In this application, the parallel mode is determined according to the information of the search tree during the solving of the MIP, so the parallel mode can be selected dynamically; compared with a fixed parallel mode, this improves MIP solving efficiency and shortens computation time.

Description

A task solving method and apparatus therefor
This application claims priority to Chinese Patent Application No. 202210454870.0, filed with the China National Intellectual Property Administration on April 24, 2022 and entitled "A task solving method and apparatus therefor", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of scheduling, and in particular to a task solving method and an apparatus therefor.
Background
Scheduling problems are among the most common problems in large-scale manufacturing, logistics, production, and similar processes, and scheduling has a different meaning in different scenarios. For example, logistics scheduling mainly refers to a logistics company reasonably arranging and scheduling its vehicles and personnel according to the weight, destination, specifications, and urgency of the goods to be shipped.
Many scheduling problems (such as production scheduling, line scheduling, and processing network layout) can be modeled as mathematical problems to be solved, and linear programming (LP) is one of the most widely used modeling methods. Mixed integer programming (MIP) belongs to the solver field and is widely used in cloud computing, finance, manufacturing, and other fields. An LP problem is the problem of minimizing an objective function under a given set of linear constraints, and a MIP problem adds to LP the integrality constraint that some or all variables must be integers.
MIP solving starts from a single root node (corresponding to the initial mixed integer programming task), and new child nodes to be solved are continuously generated through branching during the solving process. Different nodes in the MIP tree search can therefore be processed independently and in parallel, i.e. the search over this tree offers parallelization opportunities at multiple granularities, such as subtree granularity, node granularity, and intra-node granularity.
In existing implementations, each MIP solving run supports only one parallel strategy, which limits the achievable acceleration of the MIP solving process.
Summary
This application discloses a task solving method in which the parallel mode is determined according to information of the search tree during MIP solving. The parallel mode can thus be selected dynamically; compared with a fixed parallel mode, this improves MIP solving efficiency and shortens computation time.
第一方面,本申请提供了一种任务求解方法,所述方法包括:获取搜索树,所述搜索树为通过分支定界,对混合整数规划任务进行求解得到的,所述搜索树包括第一子节点和第二子节点在内的多个子节点,每个子节点对应一个子任务,所述第一子节点和所述第二子节点为所述多个子节点中的待求解子节点;根据所述搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点以及所述第二子节点中的至少一个;其中,所述搜索树的信息包括第一信息和第二信息中的至少一种;所述第一信息与所述搜索树上子节点的数量有关,所述第二信息与搜索树上的子节点之间的连接关系有关。
在一种可能的实现中,多个子节点还可以包括除了第一子节点和第二子节点之外的其 他待求解子节点。
在现有的实现中,在对混合整数规划任务进行分支定界的并行处理时,整个过程一直保持采用相同的并行模式。其中,这里所谓的并行,可以理解为存在多个计算设备(或者描述为实例),多个计算设备之间同步进行子节点的求解过程。然而,针对于不同的搜索树结构,不同的并行模式进行求解的效率以及求解精度是不同的。例如针对于包括的子树的长度普遍较短的搜索树而言,以节点或者节点内为粒度进行的求解过程的效率较高,针对于包括的子树的长度普遍较长的搜索树而言,以子树为粒度进行的求解过程的效率较高。且随着求解过程的进行,搜索树的结构也在发生变化,例如一部分搜索树包括的子树的长度普遍较长,另一部分搜索树包括的子树的长度普遍较短。
本申请实施例中,在对MIP进行求解的过程中,根据搜索树的信息来确定并行模式,可以动态选择并行模式,相比固定的并行模式,可以提升MIP求解效率,缩短计算时间。
在一种可能的实现中,所述多个并行模式包括如下模式中的至少一种:在第一设备上求解所述第一子节点所在的第一子树包括的多个子节点、且在第二设备上求解所述第二子节点所在的第二子树包括的多个子节点;所述第一子树为通过分支定界,对所述第一子节点进行求解得到的,所述第二子树为通过分支定界,对所述第二子节点进行求解得到的;或者,在第一设备上求解所述第一子节点且不求解除所述第一子节点之外的其他子节点、以及在第二设备上求解所述第二子节点且不求解除所述第二子节点之外的其他子节点;或者,通过第一规划算法,在第一设备上求解所述第一子节点、以及通过第二规划算法,在第二设备上并行求解所述第一子节点。
应理解,除了上述三种并行求解模式之外,所述多个并行模式还可以包括其他并行求解模式,这里并不限定。
在一种可能的实现中,所述第一规划算法和所述第二规划算法为不同的线性规划算法。
在一种可能的实现中,所述搜索树的信息包括:所述搜索树上的子节点数量、或者所述搜索树上的子节点之间的连接关系。
在一种可能的实现中,所述第一信息,包括:所述搜索树上多个深度中每个深度的子节点数量、或者所述搜索树上待求解的子节点的数量,其中,所述混合整数规划任务对应于所述搜索树上的根节点,所述深度表示子节点到所述根节点之间的距离。
在一种可能的实现中,所述根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点或所述第二子节点,包括:根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点或所述第二子节点,以得到更新后的搜索树;所述更新后的搜索树包括第三子节点和第四子节点,所述第三子节点和所述第四子节点为求解所述第一子节点或所述第二子节点后新增的待求解的子节点;所述方法还包括:根据所述更新后的搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第三子节点和所述第四子节点中的至少一个;其中,所述多个并行模式包括如下模式中的至少一种:在第一设备上通过分支定界,求解所述第三子节点所在的第三子树包括的多个子节点、且在第二设备上求解所述第四子节点所在的第四子树 包括的多个子节点;所述第三子树为通过分支定界,对所述第三子节点进行求解得到的,所述第四子树为通过分支定界,对所述第四子节点进行求解得到的;或者,在第一设备上求解所述第三子节点且不求解除所述第三子节点之外的其他子节点、以及在第二设备上仅求解所述第四子节点且不求解除所述第四子节点之外的其他子节点;或者,通过第一规划算法,在第一设备上求解所述第三子节点、以及通过第二规划算法,在第二设备上并行求解所述第三子节点。
在一种可能的实现中,所述求解所述第一子节点所在的第一子树包括的多个子节点,包括:求解所述第一子节点所在的第一子树包括的预设数量的子节点;或者,在预设时间内,求解所述第一子节点所在的第一子树包括的多个子节点。
在一种可能的实现中,所述求解所述第二子节点所在的第二子树包括的多个子节点,包括:求解所述第二子节点所在的第二子树包括的预设数量的子节点;或者,在预设时间内,求解所述第二子节点所在的第二子树包括的多个子节点。
在一种可能的实现中,所述根据所述搜索树的信息,从多个并行模式中确定目标并行模式,包括:根据所述搜索树的信息,通过神经网络模型,从多个并行模式中确定目标并行模式。
第二方面,本申请提供了一种任务求解装置,所述装置包括:
获取模块,用于获取搜索树,所述搜索树为通过分支定界,对混合整数规划任务进行求解得到的,所述搜索树包括第一子节点和第二子节点在内的多个子节点,每个子节点对应一个子任务,所述第一子节点和所述第二子节点为所述多个子节点中的待求解子节点;
并行调度模块,用于根据所述搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点以及所述第二子节点中的至少一个;其中,所述搜索树的信息包括第一信息和第二信息中的至少一种;所述第一信息与所述搜索树上子节点的数量有关,所述第二信息与搜索树上的子节点之间的连接关系有关。
在一种可能的实现中,所述多个并行模式包括如下模式中的至少一种:
在第一设备上求解所述第一子节点所在的第一子树包括的多个子节点、且在第二设备上求解所述第二子节点所在的第二子树包括的多个子节点;所述第一子树为通过分支定界,对所述第一子节点进行求解得到的,所述第二子树为通过分支定界,对所述第二子节点进行求解得到的;或者,
在第一设备上求解所述第一子节点且不求解除所述第一子节点之外的其他子节点、以及在第二设备上求解所述第二子节点且不求解除所述第二子节点之外的其他子节点;或者,
通过第一规划算法,在第一设备上求解所述第一子节点、以及通过第二规划算法,在 第二设备上并行求解所述第一子节点。
在一种可能的实现中,所述第一信息,包括:
所述搜索树上多个深度中每个深度的子节点数量、或者所述搜索树上待求解的子节点的数量,其中,所述混合整数规划任务对应于所述搜索树上的根节点,所述深度表示子节点到所述根节点之间的距离。
在一种可能的实现中,所述并行调度模块,具体用于:
根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点或所述第二子节点,以得到更新后的搜索树;所述更新后的搜索树包括第三子节点和第四子节点,所述第三子节点和所述第四子节点为求解所述第一子节点或所述第二子节点后新增的待求解的子节点;
所述方法还包括:
根据所述更新后的搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第三子节点和所述第四子节点中的至少一个;其中,所述多个并行模式包括如下模式中的至少一种:
在第一设备上通过分支定界,求解所述第三子节点所在的第三子树包括的多个子节点、且在第二设备上求解所述第四子节点所在的第四子树包括的多个子节点;所述第三子树为通过分支定界,对所述第三子节点进行求解得到的,所述第四子树为通过分支定界,对所述第四子节点进行求解得到的;或者,
在第一设备上求解所述第三子节点且不求解除所述第三子节点之外的其他子节点、以及在第二设备上仅求解所述第四子节点且不求解除所述第四子节点之外的其他子节点;或者,
通过第一规划算法,在第一设备上求解所述第三子节点、以及通过第二规划算法,在第二设备上并行求解所述第三子节点。
在一种可能的实现中,所述并行调度模块,具体用于:
求解所述第一子节点所在的第一子树包括的预设数量的子节点;或者,
在预设时间内,求解所述第一子节点所在的第一子树包括的多个子节点。
在一种可能的实现中,所述并行调度模块,具体用于:
求解所述第二子节点所在的第二子树包括的预设数量的子节点;或者,
在预设时间内,求解所述第二子节点所在的第二子树包括的多个子节点。
在一种可能的实现中,所述并行调度模块,具体用于:
根据所述搜索树的信息,通过神经网络模型,从多个并行模式中确定目标并行模式。
在一种可能的实现中,所述第一规划算法和所述第二规划算法为不同的线性规划算法。
第三方面,本申请实施例提供了一种装置,包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于执行存储器中的程序,以执行如上述第一方面及第一方面任一可选的方法。
第四方面,本发明实施例还提供一种系统,该系统包括至少一个处理器,至少一个存储器以及至少一个通信接口;处理器、存储器和通信接口通过通信总线连接并完成相互间的通信;
存储器用于存储执行以上方案的应用程序代码,并由处理器来控制执行。所述处理器用于执行所述存储器中存储的应用程序代码,以得到任务调度结果;其中存储器存储的代码可执行以上提供的一种任务求解方法。
通信接口,用于与其他设备或通信网络通信,以将所述任务求解结果发送至所述设备或通信网络。
第五方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面及其任一可选的方法。
第六方面,本申请实施例提供了一种计算机可读存储介质,所述计算机存储介质存储有一个或多个指令,所述指令在由一个或多个计算机执行时使得所述一个或多个计算机实施上述第二方面及其任一可选的系统。
第七方面,本申请实施例提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面及其任一可选的方法。
第八方面,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持终端设备或服务器实现上述方面中所涉及的功能,例如,发送或处理上述方法中所涉及的数据;或,信息。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器,用于保存终端设备或服务器必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包括芯片和其他分立器件。
Brief description of the drawings
Figure 1 is a schematic diagram of an application architecture provided by an embodiment of this application;
Figure 2 is a schematic diagram of a server architecture provided by an embodiment of this application;
Figure 3a is a schematic diagram of a search tree provided by an embodiment of this application;
Figure 3b is a schematic diagram of a system architecture provided by an embodiment of this application;
Figure 4 is a schematic flowchart of a task solving method provided by an embodiment of this application;
Figure 5 is a schematic diagram of parallel processing provided by an embodiment of this application;
Figure 6 is a schematic diagram of parallel processing provided by an embodiment of this application;
Figure 7 is a schematic diagram of parallel processing provided by an embodiment of this application;
Figure 8 is a schematic diagram of an architecture provided by an embodiment of this application;
Figure 9 is a schematic diagram of an architecture provided by an embodiment of this application;
Figure 10 is a schematic diagram of an architecture provided by an embodiment of this application;
Figure 11 is a schematic structural diagram of a task solving apparatus provided by this embodiment;
Figure 12 is a schematic structural diagram of a terminal device provided by an embodiment of this application;
Figure 13 is a schematic structural diagram of a server provided by an embodiment of this application;
Figure 14 is a schematic structural diagram of a chip provided by an embodiment of this application.
Detailed description
The embodiments of the present invention are described below with reference to the accompanying drawings of the embodiments of the present invention. The terms used in the implementation section are only for explaining specific embodiments of the present invention and are not intended to limit the present invention.
The embodiments of this application are described below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished in the description of the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units not expressly listed or inherent to the process, method, product, or device.
The embodiments of this application can be applied to solving linear programming optimization problems in a variety of scenarios (such as supply chain, cloud computing, scheduling, storage optimization, and finance), accelerating the efficiency with which linear programming solvers solve these problems.
参照图1,图1为本申请实施例提供的应用结构示意,本申请提供的任务求解方法可以作为求解器部署在云侧的服务器,终端设备可以将待求解模型(例如本申请实施例中的混合整数规划任务)传递至云侧的服务器,云侧的服务器可以基于自身部署的求解器对待求解模型进行求解,并将求解结果传递至终端设备。
比如用户可以根据自己的业务场景构建待求解模型,在求解时候,可以将一部分历史上同类问题的模型传递至服务器,服务器可以调用求解器较快的输出用户输入模型的最优解。用户可以根据这个解去使用平台提供的功能生成数据报表或者自己对它进行处理得到想要的结果。
参照图2,图2为本申请实施例提供的服务器的架构示意。具体的,服务器200由一个或多个服务器实现,服务器200可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)22(例如,一个或一个以上处理器)和存储器232,一个或一个以上存储应用程序242或数据244的存储介质230(例如一个或一个以上海量存储设备)。其中,存储器232和存储介质230可以是短暂存储或持久存储。存储在存储介质230的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器22可以设置为与存储介质230通信,在服务器200上执行存储介质230中的一系列指令操作。
服务器200还可以包括一个或一个以上电源222,一个或一个以上有线或无线网络接 口250,一个或一个以上输入输出接口258;或,一个或一个以上操作系统241,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中,中央处理器22,用于执行本申请实施例中描述的任务求解方法。
应理解,本申请实施例提供的任务求解方法也可以作为求解器部署在端侧的终端设备上,这里并不限定。
示例性的,本申请实施例中待求解模型可以用于解决调度问题。调度问题是大型制造、物流、生产等环节中最常见的问题之一,并且在不同的场景下,调度总是有不同的意义。例如:物流调度主要是指在物流过程中,物流公司根据待发货物的重量、去向、规格、加急程度等对所属的车辆和人员进行合理的安排和调度。
而生产环境中的调度是根据不同产线中不同机器的产能以及生产需求,在若干任务(job)中完成对任务的排序以及任务和生产设备之间的匹配。即将多个任务分配至各条生产线中的生产设备。
例如,在作业车间调度(job-shop scheduling)的场景中,n个工件在m台机器上加工,每个工件有特定的加工工艺,每个工件加工的顺序及每道工序所花时间给定,安排工件在每台机器上工件的加工顺序,使得某种指标最优。这里不要求每个工件都在每个机器上执行。
例如,在流水车间调度(flow-shop scheduling)的场景中,该类调度问题要求每个任务必须依次执行到每个阶段,不涉及任务和阶段的匹配,而主要是决定任务的执行顺序。防止由于中间等待时间过长,而造成整体的完成时间时长。
与一般的货物调度略有不同的是,机场、大型制造工厂的工人、排班(timetabling)也是调度问题的一种,这是由于这类问题的目标也是依照工人的工作特点以及场景需要在不同的时间段内完成最优匹配。因此,核心是排序以及最优分配,而不局限“任务”是人还是货物。一般来讲,调度问题的目标是在给定任务数的前提下得到最小总工时(makespan)所对应的排序。
同时,调度问题在计算机中是分配工作所需资源的方法。资源可以指虚拟的计算资源,如线程、进程或数据流;也可以指硬件资源,如处理器、网络连接或扩展卡。进行调度工作的程序叫做调度器。调度器通常的实现使得所有计算资源都处于忙碌状态(在负载均衡中),允许多位用户有效地同时共享系统资源,或达到指定的服务质量。
Many scheduling problems (such as production scheduling, line scheduling, and processing network layout) can be modeled as mathematical problems to be solved, and linear programming (LP) is one of the most widely used modeling methods. Among the algorithms underlying linear programming solvers, the simplex method is currently the most widely used, and it is also the class of algorithms that the various linear programming solvers optimize most heavily.
Mixed integer programming (MIP) belongs to the solver field and is widely used in cloud computing, finance, manufacturing, and other fields. An LP problem is the problem of minimizing an objective function under a given set of linear constraints; a MIP problem adds to LP the integrality constraint that some or all variables must be integers.
Generally speaking, MIP is a class of NP-hard problems with very high computational complexity, and the industry currently mainly adopts the branch and bound (B&B) computational model as the main solving framework. Branch and bound is the most commonly used algorithm for solving integer programming problems. It is a search-and-iteration method that selects different branching variables and sub-problems for branching. Typically, the entire feasible solution space is repeatedly partitioned into smaller and smaller subsets (also referred to as child nodes in the embodiments of this application), which is called branching; and a lower bound on the objective (for a minimization problem) is computed for the solution set within each subset, which is called bounding. After each branching step, subsets whose bound exceeds the objective value of a known feasible solution set are not branched further, so that many subsets can be disregarded; this is called pruning.
The branch and bound computational model is essentially a queue-based branch and bound tree search, where the queue stores the nodes currently to be solved. Its execution flow is as follows:
1. According to the node selection algorithm, take a node to be computed out of the queue;
2. Solve the node and update the upper and lower bound information; if the computed result does not satisfy the integrality constraints, split the node into two child nodes;
3. Perform a bound check on the upper and lower bound information; if the conditions are met, enqueue the child nodes, otherwise prune;
4. Repeat the above steps until the queue is empty, i.e. until no node remains to be computed.
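The queue-driven loop above can be sketched as a minimal branch and bound on a toy 0/1 knapsack, where the LP relaxation is stood in for by the fractional (greedy) relaxation bound. The problem data and all names (`relaxation_bound`, `branch_and_bound`) are illustrative assumptions, not taken from the application:

```python
# Toy instance: choose items to maximize value within the capacity.
values  = [60, 100, 120]          # item values
weights = [10, 20, 30]            # item weights
cap     = 50                      # knapsack capacity

def relaxation_bound(fixed):
    """Upper bound for a node: fix decided items, fill the rest fractionally.

    Returns None for an infeasible node, otherwise (bound, fractional_item).
    """
    val = sum(values[i] for i, x in fixed.items() if x == 1)
    wt  = sum(weights[i] for i, x in fixed.items() if x == 1)
    if wt > cap:
        return None                # infeasible node -> prune
    free = [i for i in range(len(values)) if i not in fixed]
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)
    frac_item = None
    for i in free:
        if wt + weights[i] <= cap:
            wt += weights[i]; val += values[i]
        else:
            frac_item = i          # this variable is fractional in the relaxation
            val += values[i] * (cap - wt) / weights[i]
            break
    return val, frac_item

def branch_and_bound():
    best = 0                       # incumbent (lower bound)
    queue = [dict()]               # node queue; a node is a partial assignment
    while queue:                   # step 4: loop until no unsolved nodes remain
        fixed = queue.pop()        # step 1: node selection (here: LIFO)
        r = relaxation_bound(fixed)
        if r is None:
            continue               # infeasible: prune
        bound, frac_item = r
        if bound <= best:
            continue               # step 3: bound check fails -> prune
        if frac_item is None:      # integral solution: update incumbent
            best = max(best, int(bound))
            continue
        # step 2: split into two child nodes on the fractional variable
        queue.append({**fixed, frac_item: 0})
        queue.append({**fixed, frac_item: 1})
    return best

print(branch_and_bound())  # 220 for this instance (items 2 and 3)
```

The LIFO node selection and the greedy bound are simplifications; a real MIP solver would solve an LP relaxation at each node, as described later in this application.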
MIP solving starts from a single root node (corresponding to the initial mixed integer programming task), and new nodes to be solved are continuously generated through branching; the resulting dynamic search tree structure can be as shown in Figure 3a (where black nodes are nodes to be solved and white nodes are nodes that have already been solved). Different nodes in the MIP tree search can therefore be processed independently and in parallel, i.e. the search over this tree offers parallelization opportunities at multiple granularities, such as subtree granularity, node granularity, and intra-node granularity.
In existing implementations, the MIP solving process supports only one parallel strategy, which limits the achievable acceleration of the MIP solving process.
下面结合图3b对本申请实施例提供的系统架构进行详细的介绍。图3b为本申请一实施例提供的系统架构示意图。如图3b所示,系统架构500包括执行设备510、训练设备520、数据库530、客户设备540、数据存储系统550以及数据采集系统560。
执行设备510包括计算模块511、I/O接口512、预处理模块513和预处理模块514。计算模块511中可以包括目标模型/规则501,预处理模块513和预处理模块514是可选的。
数据采集设备560用于采集训练数据。
其中,训练数据可以为搜索树的信息、求解结果以及求解过程所消耗的资源量等。
在采集到训练数据之后,数据采集设备560将这些训练数据存入数据库530,训练设备520基于数据库530中维护的训练数据训练得到目标模型/规则501(例如本申请实施例中的神经网络模型)。
需要说明的是,在实际应用中,数据库530中维护的训练数据不一定都来自于数据采集设备560的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备520也不一定完全基于数据库530维护的训练数据进行目标模型/规则501的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备520训练得到的目标模型/规则501可以应用于不同的系统或设备中,如应用于图3b所示的执行设备510,所述执行设备510可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备,车载终端等,还可以是服务器或者云端等。在图3b中,执行设备510配置输入/输出(input/output, I/O)接口512,用于与外部设备进行数据交互,用户可以通过客户设备540向I/O接口512输入数据。
预处理模块513和预处理模块514用于根据I/O接口512接收到的输入数据进行预处理。应理解,可以没有预处理模块513和预处理模块514或者只有的一个预处理模块。当不存在预处理模块513和预处理模块514时,可以直接采用计算模块511对输入数据进行处理。
在执行设备510对输入数据进行预处理,或者在执行设备510的计算模块511执行计算等相关的处理过程中,执行设备510可以调用数据存储系统550中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统550中。
最后,I/O接口512将处理结果呈现给客户设备540,从而提供给用户。
在图3b所示情况下,用户可以手动给定输入数据,该“手动给定输入数据”可以通过I/O接口512提供的界面进行操作。另一种情况下,客户设备540可以自动地向I/O接口512发送输入数据,如果要求客户设备540自动发送输入数据需要获得用户的授权,则用户可以在客户设备540中设置相应权限。用户可以在客户设备540查看执行设备510输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备540也可以作为数据采集端,采集如图所示输入I/O接口512的输入数据及输出I/O接口512的输出结果作为新的样本数据,并存入数据库530。当然,也可以不经过客户设备540进行采集,而是由I/O接口512直接将如图所示输入I/O接口512的输入数据及输出I/O接口512的输出结果,作为新的样本数据存入数据库530。
值得注意的是,图3b仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图3b中,数据存储系统550相对执行设备510是外部存储器,在其它情况下,也可以将数据存储系统550置于执行设备510中。
应理解上述执行设备510也可以部署于客户设备540中。
To make this application clearer, some of the concepts and processing flows mentioned in this application are first briefly introduced.
Linear programming (LP): an important branch of operations research that was studied early, has developed rapidly, is widely applied, and has relatively mature methods; it is a mathematical method that assists scientific management, namely the mathematical theory and methods for studying extremum problems of linear objective functions under linear constraints.
Constraints: the constraints in a mathematical programming problem, i.e. the numerical requirements on the decision variables.
Function instance: the entire isolated environment of a function in function compute; for example, when a container is used as the function's isolation mechanism, the function instance is the container environment containing the function's complete runtime environment.
Subtree parallelism: multiple function instances/threads compute multiple subtrees of the same search tree in parallel, with the function instances/threads independent of each other.
Node parallelism: multiple function instances/threads compute multiple child nodes of the same search tree in parallel, with the function instances/threads independent of each other.
Intra-node parallelism: parallelizing the computation within a single node, for example solving the LP problem in parallel, using different heuristic algorithms in parallel, or using different cutting-plane algorithms in parallel.
The embodiments of this application are described below with reference to the accompanying drawings of the embodiments of this application. The terms used in the implementation section are only for explaining specific embodiments of this application and are not intended to limit this application. Referring to Figure 4, Figure 4 shows a task solving method provided by an embodiment of this application. The method includes:
401. Obtain a search tree, where the search tree is obtained by solving a mixed integer programming task through branch and bound, the search tree includes multiple child nodes including a first child node and a second child node, each child node corresponds to a sub-task, and the first child node and the second child node are child nodes to be solved among the multiple child nodes.
In a possible implementation, step 401 may be performed by a server or a terminal device.
For example, the terminal device may pass the mixed integer programming task to the server as the model to be solved, and the server then obtains the mixed integer programming task.
In a possible implementation, the mixed integer programming task may include an objective function and constraints, where the objective function is a function designed according to the objective to be optimized and the variables that affect that objective. For example, in production scheduling, the overall goal is usually to find the best processing plan that satisfies all resource constraints, maximizing the demand fulfillment rate while minimizing overall cost (where cost may include, but is not limited to, processing cost, inventory cost, and transshipment cost); in this case, the objective function may be a function expressing the maximization of the fulfillment rate and the minimization of the cost. The constraints are the other conditions that must be satisfied while optimizing the objective function. In a mixed integer programming task, at least one variable to be solved is constrained to be an integer.
In a possible implementation, the mixed integer programming task is used to allocate scheduling resources to at least one task to be scheduled, the first programming constraint is a constraint satisfied by the scheduling resources, and a scheduling resource is a production line, production equipment, or a manufacturer.
For example, in a product manufacturing scenario, the task to be scheduled may be a product to be manufactured; in a personnel scheduling scenario, the task to be scheduled may be a person to be scheduled, and so on; the embodiments of this application do not limit this.
In a product manufacturing scenario, each of the multiple schedulable resource groups may be a production line. For example, in a mobile phone manufacturing scenario, each schedulable resource group may be the production line of one phone component, such as a battery production line, a casing production line, or a chip production line. Correspondingly, each schedulable resource group may include multiple schedulable resources, each of which is a piece of production equipment in that production line; for example, a battery production line may include multiple battery production devices and a casing production line may include multiple casing production devices, which is not limited here.
In a personnel scheduling scenario, each of the multiple schedulable resource groups may be a time period, for example a day, such as Monday, Tuesday, Wednesday, or a particular day of some month. Correspondingly, each schedulable resource group may include multiple schedulable resources, each of which is a sub-period of that time period; for example, a day may include multiple hours, multiple minutes, or other sub-periods, which is not limited here.
In a possible implementation, when the solver performs branch and bound on the mixed integer programming task, the solver (for example the solver's coordinator node) may be responsible for the steps of branch and bound that must be performed serially. For example, the coordinator may maintain a built-in queue of nodes to be solved and be responsible for task allocation and control of the entire solving flow (for example, it may perform preprocessing, node selection, and bound checking, and maintain the node queue based on its local state).
In a possible implementation, branch and bound may be performed on the mixed integer programming task, so that the task is solved, split, and pruned, forming a search tree including multiple child nodes, where the multiple child nodes may include multiple solved child nodes and multiple child nodes to be solved. The child nodes are connected based on the splitting relationships between them, thereby forming the search tree.
In existing implementations, when the mixed integer programming task is processed in parallel through branch and bound, the same parallel mode is kept throughout the entire process. Here, parallelism can be understood as the existence of multiple computing devices (also described as instances) that carry out the solving of child nodes concurrently.
For example, in a possible implementation, the child nodes to be solved may include child node 1 and child node 2, and the computing devices may include computing device 1 and computing device 2.
In parallel mode 1 (node-granularity parallelism), computing device 1 solves child node 1, obtains the solving result of child node 1 and the node update result (whether to split the node is determined based on the solving result), and then passes the solving result and the node update result to the solver's coordinator node, completing the current solving round. Similarly, computing device 2 solves child node 2 and passes its solving result and node update result to the coordinator, completing the current round. The solving of child node 1 by computing device 1 and the solving of child node 2 by computing device 2 proceed in parallel.
In parallel mode 2 (subtree-granularity parallelism), computing device 1 solves child node 1 and the multiple child nodes obtained by splitting child node 1 (or by further splitting the resulting child nodes), until the solving time reaches a preset value or the number of solved child nodes reaches a preset value; the solving results and node update results of these child nodes (whether to split a node is determined based on the solving result) are then passed to the solver's coordinator node, completing the current solving round. Similarly, computing device 2 does the same starting from child node 2. The solving of the multiple child nodes including child node 1 by computing device 1 and the solving of the multiple child nodes including child node 2 by computing device 2 proceed in parallel.
Here, the subtree in which a child node is located can be understood as the multiple child nodes obtained by splitting that child node (or by further splitting the resulting child nodes).
In parallel mode 3 (intra-node parallelism), computing device 1 solves child node 1 with linear programming algorithm 1, while computing device 2 solves the same child node 1 with linear programming algorithm 2 (different from linear programming algorithm 1). Whichever of computing device 1 and computing device 2 first obtains the solving result of child node 1 and the node update result (whether to split the node is determined based on the solving result) passes them to the solver's coordinator node, completing the current solving round. The solving of child node 1 with linear programming algorithm 1 on computing device 1 and with linear programming algorithm 2 on computing device 2 proceeds in parallel.
However, for different search tree structures, different parallel modes differ in solving efficiency and solving accuracy. For example, for a search tree whose subtrees are generally short, solving at node or intra-node granularity is more efficient, whereas for a search tree whose subtrees are generally long, solving at subtree granularity is more efficient. Moreover, the structure of the search tree changes as the solving proceeds; for example, the subtrees of one part of the search tree may generally be long while those of another part are generally short.
In the embodiments of this application, to improve the solving efficiency and accuracy of the mixed integer programming task, the parallel mode can be updated dynamically during the solving process based on the information of the search tree.
The information of the search tree in the embodiments of this application is introduced next.
In a possible implementation, the solver (for example the solver's coordinator node) can obtain the information of the search tree, which can describe the structural characteristics of the child nodes on the search tree, where the structural characteristics can include the connection relationships between child nodes (the second information) and/or the numerical characteristics of the child nodes (the first information).
In a possible implementation, the information of the search tree can include: the number of child nodes on the search tree, or the connection relationships between the child nodes on the search tree.
In a possible implementation, the first information specifically includes: the number of child nodes at each of multiple depths on the search tree, or the number of child nodes to be solved on the search tree, where the mixed integer programming task corresponds to the root node of the search tree and the depth represents the distance from a child node to the root node.
For example, when the next solving round is performed, the number of nodes to be solved on the search tree at the current moment can be taken as the information of the search tree. It can be denoted by λk, the number of nodes to be solved on the dynamic search tree, such as the number of black nodes in Figure 3a.
For example, when the next solving round is performed, the number of nodes at each depth level of the search tree at the current moment can be taken as the information of the search tree. For example, Wd can denote the number of nodes at depth d of the search tree, and γk can denote the sequence formed by the per-level node counts of the tree at time k, where the maximum depth of the tree at time k is d, i.e. γk=(W0,…,Wd).
Here, depth can be understood as the distance from a child node to the root node; for example, the two leftmost child nodes to be solved in Figure 3a have a depth of 4 (i.e. three child nodes lie between them and the root node, whose default depth is 0).
In a possible implementation, the information of the search tree includes the node counts of multiple depths, and the number of depths is also smaller than a preset value; that is, child nodes generated closer to the current moment are selected. During solving, the depth of the dynamic search tree keeps increasing, i.e. the historical information keeps growing, but information from long ago generally has only a marginal effect on the decision. A sliding-window optimization can therefore be adopted: only the historical information within a window of size h is taken as the search tree information fed into the neural network, which avoids a potential explosion of the input dimension.
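The sliding-window feature described above can be sketched as follows: from the per-depth node counts (W0, …, Wd), keep only the last h entries, plus the count of unsolved nodes λk, as the scheduler's input vector. The function name and padding convention are illustrative assumptions:

```python
def tree_features(per_depth_counts, unsolved_count, h=4):
    """Build a fixed-size feature vector from dynamic search tree statistics.

    per_depth_counts: [W_0, ..., W_d], nodes at each depth of the tree
    unsolved_count:   number of nodes still to be solved (lambda_k)
    h:                sliding-window size over the most recent depth levels
    """
    window = per_depth_counts[-h:]                 # last h depth levels only
    window = [0] * (h - len(window)) + window      # left-pad to a fixed size
    return window + [unsolved_count]

# e.g. a tree with (1, 2, 4, 6, 3) nodes per depth and 5 open nodes:
print(tree_features([1, 2, 4, 6, 3], 5, h=4))  # [2, 4, 6, 3, 5]
```

Fixing the window size keeps the model's input dimension constant even as the tree deepens, which is the point of the sliding-window optimization.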
402. Determine a target parallel mode from multiple parallel modes according to the information of the search tree, and solve at least one of the first child node and the second child node in parallel on a first device and a second device according to the target parallel mode, where the information of the search tree includes at least one of first information and second information, the first information relating to the number of child nodes on the search tree and the second information relating to the connection relationships between the child nodes on the search tree.
In a possible implementation, the multiple parallel modes include at least one of the following modes:
solving, on the first device, the multiple child nodes included in the first subtree in which the first child node is located, and solving, on the second device, the multiple child nodes included in the second subtree in which the second child node is located, where the first subtree is obtained by solving the first child node through branch and bound, and the second subtree is obtained by solving the second child node through branch and bound; or
solving the first child node on the first device without solving any child node other than the first child node, and solving the second child node on the second device without solving any child node other than the second child node; or
solving the first child node on the first device through a first programming algorithm, and solving the first child node in parallel on the second device through a second programming algorithm.
In a possible implementation, before the next solving round is performed, the parallel mode for the next round can be determined according to the information of the search tree.
For example, in a possible implementation, the child nodes to be solved may include child node 1 and child node 2, and the computing devices may include computing device 1 and computing device 2.
In parallel mode 1 (node-granularity parallelism), computing device 1 solves child node 1, obtains the solving result of child node 1 and the node update result (whether to split the node is determined based on the solving result), and passes them to the solver's coordinator node, completing the current solving round. Similarly, computing device 2 solves child node 2 and passes its solving result and node update result to the coordinator. The solving of child node 1 by computing device 1 and the solving of child node 2 by computing device 2 proceed in parallel.
Referring to Figure 6, Figure 6 illustrates node-granularity parallelism: the Coordinator picks n nodes from the queue of nodes to be computed and configures n function instances; the Coordinator sets each function instance to compute only one node, i.e. after computing the received node it splits out two child nodes, which are inserted into the Coordinator's queue of nodes to be computed; after all function instances finish computing, they synchronize their respective state information to the Coordinator.
In parallel mode 2 (subtree-granularity parallelism), computing device 1 solves child node 1 and the multiple child nodes obtained by splitting child node 1 (or by further splitting the resulting child nodes), until the solving time reaches a preset value or the number of solved child nodes reaches a preset value; the solving results and node update results of these child nodes are then passed to the solver's coordinator node, completing the current solving round. Similarly, computing device 2 does the same starting from child node 2. The solving of the multiple child nodes including child node 1 by computing device 1 and the solving of the multiple child nodes including child node 2 by computing device 2 proceed in parallel.
Here, the subtree in which a child node is located can be understood as the multiple child nodes obtained by splitting that child node (or by further splitting the resulting child nodes).
Referring to Figure 5, Figure 5 illustrates subtree-granularity parallelism: the Coordinator picks n nodes from the queue of nodes to be computed and configures n function instances; the Coordinator sets the computing duration of each function instance, which takes the received node as a root node, computes multiple nodes of that subtree within the set duration (or node budget), and then inserts the remaining uncomputed child nodes into the Coordinator's queue of nodes to be computed; after all function instances finish computing, they synchronize their respective state information to the Coordinator.
In parallel mode 3 (intra-node parallelism), computing device 1 solves child node 1 with linear programming algorithm 1, while computing device 2 solves the same child node 1 with linear programming algorithm 2 (different from linear programming algorithm 1). Whichever of the two devices first obtains the solving result of child node 1 and the node update result (whether to split the node is determined based on the solving result) passes them to the solver's coordinator node, completing the current solving round. The solving of child node 1 with linear programming algorithm 1 on computing device 1 and with linear programming algorithm 2 on computing device 2 proceeds in parallel.
Referring to Figure 7, Figure 7 illustrates intra-node parallelism: the Coordinator picks n nodes from the queue of nodes to be computed and configures n function instances; the Coordinator sets each function instance to compute only one node, and while computing the received node the instance invokes k sub-function instances to accelerate the intra-node modules (such as the LP) in parallel, until the node is computed and splits out two child nodes, which are inserted into the Coordinator's queue of nodes to be computed; after all function instances finish computing, they synchronize their respective state information to the Coordinator.
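The three modes described above can be sketched as a single dispatch on the decision (μ, n): μ=0 subtree granularity, μ=1 node granularity, μ=2 intra-node granularity (n workers race on the same node with different LP algorithms). The workers here are plain functions standing in for devices or function instances; all names are illustrative assumptions:

```python
def dispatch(mu, n, open_nodes, solve_node, solve_subtree, lp_algorithms):
    """Run one solving round under parallel strategy (mu, n).

    open_nodes:    nodes to be solved, in selection order
    solve_node:    callable solving exactly one node
    solve_subtree: callable solving a node and its subtree up to a budget
    lp_algorithms: alternative LP solvers for intra-node parallelism
    """
    if mu == 0:                        # subtree-granularity parallelism
        picked = open_nodes[:n]        # n nodes -> n workers, one subtree each
        return [solve_subtree(node) for node in picked]
    if mu == 1:                        # node-granularity parallelism
        picked = open_nodes[:n]        # n workers, one node each
        return [solve_node(node) for node in picked]
    if mu == 2:                        # intra-node parallelism on one node
        node = open_nodes[0]           # n algorithms race on the same node
        return [alg(node) for alg in lp_algorithms[:n]]
    raise ValueError("unknown parallel mode")
```

In a real deployment the list comprehensions would be concurrent invocations of function instances, and for μ=2 the coordinator would keep only whichever result arrives first.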
In a possible implementation, the first child node or the second child node can be solved in parallel on the first device and the second device according to the target parallel mode, obtaining solving results and newly added child nodes (child nodes generated according to the solving results); the newly added child nodes are used to update the search tree, obtaining an updated search tree; the updated search tree includes a third child node and a fourth child node, which are child nodes to be solved on the updated search tree.
In a possible implementation, in the next solving round, i.e. after the first child node or the second child node is solved in parallel on the first device and the second device according to the target parallel mode, a target parallel mode can be determined from multiple parallel modes according to the information of the updated search tree, and the third child node or the fourth child node can be solved in parallel on the first device and the second device according to the target parallel mode, where the multiple parallel modes include at least two of the following modes:
solving, on the first device through branch and bound, the multiple child nodes included in the third subtree in which the third child node is located, and solving, on the second device, the multiple child nodes included in the fourth subtree in which the fourth child node is located, where the third subtree is obtained by solving the third child node through branch and bound, and the fourth subtree is obtained by solving the fourth child node through branch and bound; or
solving the third child node on the first device without solving any child node other than the third child node, and solving only the fourth child node on the second device without solving any child node other than the fourth child node; or
solving the third child node on the first device through a first programming algorithm, and solving the third child node in parallel on the second device through a second programming algorithm.
In a possible implementation, the above action of determining the target parallel mode from multiple parallel modes according to the information of the search tree can be implemented by a pretrained neural network model; that is, the target parallel mode can be determined from the multiple parallel modes through a neural network model according to the information of the search tree.
The input of the neural network model can be the information of the search tree, and the output can be the parallel mode and the degree of parallelism. The neural network model in the embodiments of this application is introduced next with a specific example.
Referring to Table 1, Table 1 shows one way of defining the prediction labels of the neural network model, where μ can denote the parallel dimension and n can denote the degree of parallelism.
Table 1
Here, for the parallel dimension, μ=0 denotes subtree granularity, μ=1 denotes node granularity, and μ=2 denotes intra-node granularity, with η the degree of parallelism within a single node.
The metrics for evaluating decisions during training of the neural network model and the definition of the loss function can include:
the change in the primal bound after the current decision;
the change in the dual bound after the current decision;
the loss function.
There are two metrics: the first is the change in the primal bound after the decision, and the second is the change in the dual bound, where t denotes the elapsed time and N denotes the amount of resources consumed, i.e. the metrics represent the change of the corresponding gap per unit time and per unit resource, and larger is better for both. Considering that machine learning models are generally trained by minimizing an objective function, one optional definition is to take the reciprocal of each of the above gap changes and sum them as the loss function of the machine learning model.
The machine learning model is trained offline: grid search is used to generate diverse parallel strategies, and the change of the gap value under each strategy is recorded; when a preset empirical threshold is met, the decision is considered good, and the input features at the current moment together with the corresponding parallel strategy are recorded as one group of training data for the machine learning model. The collected training data are then used to train the machine learning model, minimizing the loss value with algorithms such as gradient descent until the model's loss function converges; the trained machine learning model is then used for real-time inference.
During solving, the trained machine learning model can be used to infer the parallel strategy in real time based on the input features collected by the Coordinator module.
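The evaluation metrics and loss can be sketched as follows. The exact formulas are given only in outline in the source, so the normalization by t·N and the reciprocal-sum loss below are assumptions consistent with the description (gap change per unit time and per unit resource; reciprocals summed for minimization), and the epsilon term is an added numerical safeguard:

```python
def gap_rates(d_primal, d_dual, t, n_resources):
    """Bound-improvement rates per unit time t and per unit resource N."""
    r_primal = d_primal / (t * n_resources)   # primal bound change rate
    r_dual   = d_dual   / (t * n_resources)   # dual bound change rate
    return r_primal, r_dual

def loss(d_primal, d_dual, t, n_resources, eps=1e-8):
    """Reciprocal-sum loss: larger bound improvement -> smaller loss."""
    rp, rd = gap_rates(d_primal, d_dual, t, n_resources)
    return 1.0 / (rp + eps) + 1.0 / (rd + eps)

print(loss(2.0, 1.0, 2.0, 1.0))  # approximately 3.0 (rates 1.0 and 0.5)
```

A decision that improves both bounds quickly and cheaply yields high rates and therefore a low loss, which is what gradient descent minimizes during offline training.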
An embodiment of this application provides a task solving method. The method includes: obtaining a search tree, where the search tree is obtained by solving a mixed integer programming task through branch and bound, the search tree includes multiple child nodes including a first child node and a second child node, each child node corresponds to a sub-task, and the first child node and the second child node are child nodes to be solved among the multiple child nodes; and determining a target parallel mode from multiple parallel modes according to information of the search tree, and solving at least one of the first child node and the second child node in parallel on a first device and a second device according to the target parallel mode, where the information of the search tree includes at least one of first information and second information, the first information relating to the number of child nodes on the search tree and the second information relating to the connection relationships between the child nodes on the search tree.
In this way, the parallel mode is determined according to the information of the search tree during MIP solving, so the parallel mode can be selected dynamically; compared with a fixed parallel mode, this improves MIP solving efficiency and shortens computation time.
A software architecture for the task solving method of the embodiments of this application is introduced next:
Referring to Figure 8, Figure 8 is a schematic diagram of a software architecture for the task solving method of an embodiment of this application. As shown in Figure 8, it can specifically include:
(1) The user submits a MIP solving task, which defines information such as the constraints and the objective function of the MIP problem.
(2) After receiving the task, the Coordinator performs a series of processing steps on it, such as reading the problem, preprocessing, and root node computation.
(3) The Coordinator collects information such as the topology of the dynamic search tree at the current moment and the current number of nodes to be solved as input features for the Scheduler module.
(4) The Scheduler module feeds the above input features to its built-in machine learning model, infers in real time the parallel strategy to adopt at the next moment, including the parallel dimension and the degree of parallelism, and sends the result to the Coordinator module.
(5) The Coordinator module performs the corresponding actions according to the received parallel strategy.
(5.1) The parallel strategy received by the Coordinator module is subtree-granularity parallelism, e.g. (μ,n)=(0,4).
(5.2) The parallel strategy received by the Coordinator module is node-granularity parallelism, e.g. (μ,n)=(1,4).
(5.3) The parallel strategy received by the Coordinator is intra-node parallelism, e.g. (μ,n)=(2,4).
(6) The Coordinator module checks whether the optimal solution has been found or all nodes on the dynamic search tree have been computed, i.e. the queue of nodes to be computed is empty; if so, the computation ends and step (7) follows; if not, the process returns to step (3) and continues iterating.
(7) The computation result is returned to the user.
It should be understood that the architecture shown in Figure 8 decouples the serial MIP solving logic and uses the atomic capabilities of a function compute platform, such as stateful and stateless functions, to refactor MIP solving into functions, enabling MIP sub-modules to invoke one another through nested function calls.
A task solving method of an embodiment of this application and the software architecture it resides in are introduced next. Referring to Figures 9 and 10, the method can include:
(1) Partition the parallelizable modules in the serial MIP solving flow, specifically including:
(1.1) The serial modules can perform: preprocessing (presolving), node selection, and bound checking;
(1.2) The parallel modules can perform: node processing and branching;
(2) Functionalize the partitioned modules using stateful and stateless abstractions, specifically including:
(2.1) The Coordinator (stateful) can maintain a built-in queue of nodes to be solved, be responsible for task (e.g., MIP problem) allocation and control of the entire solving flow, and host the serial modules above;
(2.2) A Worker (stateful) can maintain a built-in current search path (tree path), be responsible for the computation and solving of tasks, and host the parallel modules above;
(2.3) LP & Primal Heuristic (stateless) can run without built-in state, respectively hosting the LP and heuristic computations within node processing;
(3) Determine the parallel access patterns of key data, specifically including:
(3.1) Local replica data: each function instance creates and stores a local replica, such as the LP data, and the computation is processed based on the replica;
(3.2) Globally shared data: a data system stores shared data, such as the global tree, and concurrent access conflicts are controlled by locking;
(4) Algorithmic optimization from serial to parallel, specifically including:
(4.1) Parallel tree search: each Worker maintains its own search path based on its local state, performing multi-path parallel search;
(4.2) Main control logic: the iteration termination condition requires both that the queue of nodes to be solved is empty and that all workers are idle, whereas the serial case only requires that the queue of nodes to be solved is empty;
(5) Orchestrate the above functions according to the main control logic to build the MIP computing framework.
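The main control logic in (4.2) can be sketched with a thread pool: with parallel workers, the loop may terminate only when the node queue is empty AND no worker is still busy, since a busy worker may yet enqueue new nodes. The toy node-splitting rule and function names are illustrative assumptions:

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def solve(node):
    # toy node processing: a node n > 0 splits into two children n - 1
    return [node - 1, node - 1] if node > 0 else []

def run(root, n_workers=2):
    queue = deque([root])          # coordinator's queue of nodes to solve
    solved = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = set()            # futures = currently busy workers
        # parallel termination: queue empty AND all workers idle
        while queue or pending:
            while queue and len(pending) < n_workers:
                pending.add(pool.submit(solve, queue.popleft()))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                solved += 1
                queue.extend(fut.result())   # enqueue newly split nodes
    return solved

print(run(2))  # 7 nodes solved: 1 root + 2 children + 4 grandchildren
```

Checking only the queue, as in the serial case, would exit too early here: the queue can be momentarily empty while a busy worker is about to return fresh child nodes.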
参照图11,图11为本申请实施例提供的一种任务求解装置的结构示意,所述装置1100可以包括:
获取模块1101,用于获取搜索树,所述搜索树为通过分支定界,对混合整数规划任务进行求解得到的,所述搜索树包括第一子节点和第二子节点在内的多个子节点,每个子节点对应一个子任务,所述第一子节点和所述第二子节点为所述多个子节点中的待求解子节点;
其中,关于获取模块1101的具体描述可以参照上述实施例中步骤401的描述,这里不再赘述。
并行调度模块1102,用于根据所述搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点以及所述第二子节点中的至少一个;其中,所述搜索树的信息包括第一信息和第二信息中的至少一种;所述第一信息与所述搜索树上子节点的数量有关,所述第二信息与搜索树上的子节点之间的连接关系有关。
在一种可能的实现中,所述多个并行模式包括如下模式中的至少一种:
在第一设备上求解所述第一子节点所在的第一子树包括的多个子节点、且在第二设备上求解所述第二子节点所在的第二子树包括的多个子节点;所述第一子树为通过分支定界,对所述第一子节点进行求解得到的,所述第二子树为通过分支定界,对所述第二子节点进行求解得到的;或者,
在第一设备上求解所述第一子节点且不求解除所述第一子节点之外的其他子节点、以及在第二设备上求解所述第二子节点且不求解除所述第二子节点之外的其他子节点;或者,
通过第一规划算法,在第一设备上求解所述第一子节点、以及通过第二规划算法,在第二设备上并行求解所述第一子节点。
其中,关于并行调度模块1102的具体描述可以参照上述实施例中步骤402的描述,这里不再赘述。
在一种可能的实现中,所述第一信息,包括:
所述搜索树上多个深度中每个深度的子节点数量、或者所述搜索树上待求解的子节点的数量,其中,所述混合整数规划任务对应于所述搜索树上的根节点,所述深度表示子节点到所述根节点之间的距离。
在一种可能的实现中,所述并行调度模块,具体用于:
根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点或所述第二子节点,以得到更新后的搜索树;所述更新后的搜索树包括第三子节点和第四子节点,所述第三子节点和所述第四子节点为求解所述第一子节点或所述第二子节点后新增的待求解的子节点;
所述并行调度模块,还用于:
根据所述更新后的搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第三子节点和所述第四子节点中的至少一个;其中,所述多个并行模式包括如下模式中的至少一种:
在第一设备上通过分支定界,求解所述第三子节点所在的第三子树包括的多个子节点、且在第二设备上求解所述第四子节点所在的第四子树包括的多个子节点;所述第三子树为通过分支定界,对所述第三子节点进行求解得到的,所述第四子树为通过分支定界,对所述第四子节点进行求解得到的;或者,
在第一设备上求解所述第三子节点且不求解除所述第三子节点之外的其他子节点、以及在第二设备上仅求解所述第四子节点且不求解除所述第四子节点之外的其他子节点;或者,
通过第一规划算法,在第一设备上求解所述第三子节点、以及通过第二规划算法,在第二设备上并行求解所述第三子节点。
在一种可能的实现中,所述并行调度模块,具体用于:
求解所述第一子节点所在的第一子树包括的预设数量的子节点;或者,
在预设时间内,求解所述第一子节点所在的第一子树包括的多个子节点。
在一种可能的实现中,所述并行调度模块,具体用于:
求解所述第二子节点所在的第二子树包括的预设数量的子节点;或者,
在预设时间内,求解所述第二子节点所在的第二子树包括的多个子节点。
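“预设数量”与“预设时间”两种截止方式可以合并为一个带预算的子树求解草图(Python,示意性质;expand 为假设的节点求解函数,返回新增子节点列表):

```python
import time
from collections import deque

def solve_subtree(root, expand, max_nodes=None, max_seconds=None):
    """按预设数量或预设时间求解子树:达到节点数上限或时间上限后停止,
    剩余待求解节点原样返回,可交还调度方重新分配。"""
    start, solved = time.monotonic(), 0
    pending = deque([root])
    while pending:
        if max_nodes is not None and solved >= max_nodes:
            break                                  # 预设数量截止
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            break                                  # 预设时间截止
        pending.extend(expand(pending.popleft()))  # 求解一个节点,产生子节点
        solved += 1
    return solved, list(pending)
```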
在一种可能的实现中,所述并行调度模块,具体用于:
根据所述搜索树的信息,通过神经网络模型,从多个并行模式中确定目标并行模式。
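“通过神经网络模型确定目标并行模式”的一种极简示意如下。实际的神经网络模型需经训练得到,此处仅用单个线性层加softmax说明“搜索树信息特征 → 并行模式得分 → 选最高分”的流程;features 的含义与 weights 的取值均为假设的占位:

```python
import math

def select_mode(features, weights):
    """features:由搜索树信息构成的特征向量(如各深度的子节点数量、
    待求解子节点数量);weights:每个并行模式一行 (权重向量, 偏置)。
    返回得分(softmax 概率)最高的并行模式下标。"""
    scores = [sum(w * x for w, x in zip(row, features)) + b for row, b in weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]     # softmax,数值稳定形式
    probs = [e / sum(exps) for e in exps]
    return max(range(len(probs)), key=probs.__getitem__)
```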
在一种可能的实现中,所述第一规划算法和所述第二规划算法为不同的线性规划算法。
本申请实施例还提供一种系统,该系统可以包括终端设备以及服务器。其中,终端设备可以根据待求解模型(混合整数规划任务)执行上述实施例中的步骤401至步骤402,以得到求解结果。
此外,终端设备也可以将待求解模型发送至服务器,由服务器执行上述实施例中的步骤401至步骤402以得到求解结果,并将求解结果发送至终端设备。
接下来介绍本申请实施例提供的一种终端设备,请参阅图12,图12为本申请实施例提供的终端设备的一种结构示意图,终端设备1200具体可以表现为手机、平板、笔记本电脑、智能穿戴设备等,此处不做限定。其中,终端设备1200上可以部署有图11对应实施例中所描述的任务求解装置,用于实现图11对应实施例中任务求解的功能。具体的,终端设备1200包括:接收器1201、发射器1202、处理器1203和存储器1204(其中终端设备1200中的处理器1203的数量可以是一个或多个,图12中以一个处理器为例),其中,处理器1203可以包括应用处理器12031和通信处理器12032。在本申请的一些实施例中,接收器1201、发射器1202、处理器1203和存储器1204可通过总线或其它方式连接。
存储器1204可以包括只读存储器和随机存取存储器,并向处理器1203提供指令和数据。存储器1204的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1204存储有操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。
处理器1203控制终端设备的操作。具体的应用中,终端设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1203中,或者由处理器1203实现。处理器1203可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1203中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1203可以是通用处理器、数字信号处理器(digital signal processing,DSP)、微处理器或微控制器,还可进一步包括专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1203可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1204,处理器1203读取存储器1204中的信息,结合其硬件完成上述方法中关于终端设备的步骤。
接收器1201可用于接收输入的数字或字符信息,以及产生与终端设备的相关设置以及功能控制有关的信号输入。发射器1202可用于通过第一接口输出数字或字符信息;发射器1202还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1202还可以包括显示屏等显示设备。
本申请实施例中,在一种情况下,处理器1203,用于执行上述实施例中的终端设备执行的步骤。
本申请实施例还提供了一种服务器,请参阅图13,图13是本申请实施例提供的服务器的一种结构示意图,具体的,服务器1300由一个或多个服务器实现,服务器1300可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1313(例如,一个或一个以上处理器)和存储器1332,一个或一个以上存储应用程序1342或数据1344的存储介质1330(例如一个或一个以上海量存储设备)。其中,存储器1332和存储介质1330可以是短暂存储或持久存储。存储在存储介质1330的程序可以包括一个或一个以上模块(图中未示出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器1313可以设置为与存储介质1330通信,在服务器1300上执行存储介质1330中的一系列指令操作。
服务器1300还可以包括一个或一个以上电源1326,一个或一个以上有线或无线网络接口1350,一个或一个以上输入输出接口1351;或,一个或一个以上操作系统1341,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中,中央处理器1313,用于执行上述实施例任务求解方法相关的步骤。
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述终端设备所执行的步骤,或者,使得计算机执行如前述服务器所执行的步骤。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述终端设备所执行的步骤,或者,使得计算机执行如前述服务器所执行的步骤。
本申请实施例提供的终端设备或服务器具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使终端设备内的芯片执行上述实施例描述的数据处理方法,或者,以使服务器内的芯片执行上述实施例描述的数据处理方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体的,请参阅图14,图14为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 1400,NPU 1400作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路1403,通过控制器1404控制运算电路1403提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路1403内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路1403是二维脉动阵列。运算电路1403还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1403是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器1402中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器1401中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)1408中。
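运算电路的这一计算过程(矩阵B先被取到各PE上,矩阵A与B相乘,部分结果在累加器中逐步累加)可以用如下纯Python草图示意。该草图与具体硬件实现无关,仅用软件方式说明数据流:

```python
def matmul_accumulate(A, B):
    """示意图14中运算电路的工作方式:C = A × B,
    内积的部分和在“累加器”C 中逐步累加得到最终结果。"""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]   # 累加器,初始为 0
    for i in range(rows):                     # 逐行取输入矩阵 A 的数据
        for k in range(inner):                # 与已“缓存”的 B 做乘加
            for j in range(cols):
                C[i][j] += A[i][k] * B[k][j]  # 部分结果累加
    return C
```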
统一存储器1406用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(Direct Memory Access Controller,DMAC)1405被搬运到权重存储器1402中。输入数据也通过DMAC被搬运到统一存储器1406中。
BIU为Bus Interface Unit,即总线接口单元1410,用于AXI总线与DMAC和取指存储器(Instruction Fetch Buffer,IFB)14014的交互。
总线接口单元1410(Bus Interface Unit,简称BIU),用于取指存储器14014从外部存储器获取指令,还用于存储单元访问控制器1405从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1406,或将权重数据搬运到权重存储器1402中,或将输入数据搬运到输入存储器1401中。
向量计算单元1407包括多个运算处理单元,在需要的情况下,对运算电路1403的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如Batch Normalization(批归一化),像素级求和,对特征平面进行上采样等。
在一些实现中,向量计算单元1407能将经处理的输出的向量存储到统一存储器1406。例如,向量计算单元1407可以将线性函数或非线性函数应用到运算电路1403的输出,例如对卷积层提取的特征平面进行线性插值,再例如对累加值的向量应用非线性函数,用以生成激活值。在一些实现中,向量计算单元1407生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作运算电路1403的激活输入,例如用于神经网络中的后续层。
与控制器1404连接的取指存储器(instruction fetch buffer)14014,用于存储控制器1404使用的指令。
统一存储器1406,输入存储器1401,权重存储器1402以及取指存储器14014均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (17)

  1. 一种任务求解方法,其特征在于,所述方法包括:
    获取搜索树,所述搜索树为通过分支定界,对混合整数规划任务进行求解得到的,所述搜索树包括第一子节点和第二子节点在内的多个子节点,每个子节点对应一个子任务,所述第一子节点和所述第二子节点为所述多个子节点中的待求解子节点;
    根据所述搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点以及所述第二子节点中的至少一个;其中,所述搜索树的信息包括第一信息和第二信息中的至少一种;所述第一信息与所述搜索树上子节点的数量有关,所述第二信息与搜索树上的子节点之间的连接关系有关。
  2. 根据权利要求1所述的方法,其特征在于,所述多个并行模式包括如下模式中的至少一种:
    在第一设备上求解所述第一子节点所在的第一子树包括的多个子节点、且在第二设备上求解所述第二子节点所在的第二子树包括的多个子节点;所述第一子树为通过分支定界,对所述第一子节点进行求解得到的,所述第二子树为通过分支定界,对所述第二子节点进行求解得到的;或者,
    在第一设备上求解所述第一子节点且不求解除所述第一子节点之外的其他子节点、以及在第二设备上求解所述第二子节点且不求解除所述第二子节点之外的其他子节点;或者,通过第一规划算法,在第一设备上求解所述第一子节点、以及通过第二规划算法,在第二设备上并行求解所述第一子节点。
  3. 根据权利要求1或2所述的方法,其特征在于,所述第一信息,包括:
    所述搜索树上多个深度中每个深度的子节点数量、或者所述搜索树上待求解的子节点的数量,其中,所述混合整数规划任务对应于所述搜索树上的根节点,所述深度表示子节点到所述根节点之间的距离。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点或所述第二子节点,包括:
    根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点或所述第二子节点,以得到更新后的搜索树;所述更新后的搜索树包括第三子节点和第四子节点,所述第三子节点和所述第四子节点为求解所述第一子节点或所述第二子节点后新增的待求解的子节点;
    所述方法还包括:
    根据所述更新后的搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第三子节点和所述第四子节点中的至少一个;其中,所述多个并行模式包括如下模式中的至少一种:
    在第一设备上通过分支定界,求解所述第三子节点所在的第三子树包括的多个子节点、且在第二设备上求解所述第四子节点所在的第四子树包括的多个子节点;所述第三子树为通过分支定界,对所述第三子节点进行求解得到的,所述第四子树为通过分支定界,对所述第四子节点进行求解得到的;或者,
    在第一设备上求解所述第三子节点且不求解除所述第三子节点之外的其他子节点、以及在第二设备上仅求解所述第四子节点且不求解除所述第四子节点之外的其他子节点;或者,
    通过第一规划算法,在第一设备上求解所述第三子节点、以及通过第二规划算法,在第二设备上并行求解所述第三子节点。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述求解所述第一子节点所在的第一子树包括的多个子节点,包括:
    求解所述第一子节点所在的第一子树包括的预设数量的子节点;或者,
    在预设时间内,求解所述第一子节点所在的第一子树包括的多个子节点。
  6. 根据权利要求1至5任一所述的方法,其特征在于,所述求解所述第二子节点所在的第二子树包括的多个子节点,包括:
    求解所述第二子节点所在的第二子树包括的预设数量的子节点;或者,
    在预设时间内,求解所述第二子节点所在的第二子树包括的多个子节点。
  7. 根据权利要求1至6任一所述的方法,其特征在于,所述根据所述搜索树的信息,从多个并行模式中确定目标并行模式,包括:
    根据所述搜索树的信息,通过神经网络模型,从多个并行模式中确定目标并行模式。
  8. 一种任务求解装置,其特征在于,所述装置包括:
    获取模块,用于获取搜索树,所述搜索树为通过分支定界,对混合整数规划任务进行求解得到的,所述搜索树包括第一子节点和第二子节点在内的多个子节点,每个子节点对应一个子任务,所述第一子节点和所述第二子节点为所述多个子节点中的待求解子节点;
    并行调度模块,用于根据所述搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点以及所述第二子节点中的至少一个;其中,所述搜索树的信息包括第一信息和第二信息中的至少一种;所述第一信息与所述搜索树上子节点的数量有关,所述第二信息与搜索树上的子节点之间的连接关系有关。
  9. 根据权利要求8所述的装置,其特征在于,所述多个并行模式包括如下模式中的至少一种:
    在第一设备上求解所述第一子节点所在的第一子树包括的多个子节点、且在第二设备上求解所述第二子节点所在的第二子树包括的多个子节点;所述第一子树为通过分支定界,对所述第一子节点进行求解得到的,所述第二子树为通过分支定界,对所述第二子节点进行求解得到的;或者,
    在第一设备上求解所述第一子节点且不求解除所述第一子节点之外的其他子节点、以及在第二设备上求解所述第二子节点且不求解除所述第二子节点之外的其他子节点;或者,
    通过第一规划算法,在第一设备上求解所述第一子节点、以及通过第二规划算法,在第二设备上并行求解所述第一子节点。
  10. 根据权利要求8或9所述的装置,其特征在于,所述第一信息,包括:
    所述搜索树上多个深度中每个深度的子节点数量、或者所述搜索树上待求解的子节点的数量,其中,所述混合整数规划任务对应于所述搜索树上的根节点,所述深度表示子节点到所述根节点之间的距离。
  11. 根据权利要求8至10任一所述的装置,其特征在于,所述并行调度模块,具体用于:
    根据所述目标并行模式在第一设备和第二设备上并行求解所述第一子节点或所述第二子节点,以得到更新后的搜索树;所述更新后的搜索树包括第三子节点和第四子节点,所述第三子节点和所述第四子节点为求解所述第一子节点或所述第二子节点后新增的待求解的子节点;
    所述并行调度模块还用于:
    根据所述更新后的搜索树的信息,从多个并行模式中确定目标并行模式,并根据所述目标并行模式在第一设备和第二设备上并行求解所述第三子节点和所述第四子节点中的至少一个;其中,所述多个并行模式包括如下模式中的至少一种:
    在第一设备上通过分支定界,求解所述第三子节点所在的第三子树包括的多个子节点、且在第二设备上求解所述第四子节点所在的第四子树包括的多个子节点;所述第三子树为通过分支定界,对所述第三子节点进行求解得到的,所述第四子树为通过分支定界,对所述第四子节点进行求解得到的;或者,
    在第一设备上求解所述第三子节点且不求解除所述第三子节点之外的其他子节点、以及在第二设备上仅求解所述第四子节点且不求解除所述第四子节点之外的其他子节点;或者,
    通过第一规划算法,在第一设备上求解所述第三子节点、以及通过第二规划算法,在第二设备上并行求解所述第三子节点。
  12. 根据权利要求8至11任一所述的装置,其特征在于,所述并行调度模块,具体用于:
    求解所述第一子节点所在的第一子树包括的预设数量的子节点;或者,
    在预设时间内,求解所述第一子节点所在的第一子树包括的多个子节点。
  13. 根据权利要求8至12任一所述的装置,其特征在于,所述并行调度模块,具体用于:
    求解所述第二子节点所在的第二子树包括的预设数量的子节点;或者,
    在预设时间内,求解所述第二子节点所在的第二子树包括的多个子节点。
  14. 根据权利要求8至13任一所述的装置,其特征在于,所述并行调度模块,具体用于:
    根据所述搜索树的信息,通过神经网络模型,从多个并行模式中确定目标并行模式。
  15. 一种计算机存储介质,其特征在于,所述计算机存储介质存储有一个或多个指令,所述指令在由一个或多个计算机执行时使得所述一个或多个计算机执行权利要求1-7中任一项所述方法的操作。
  16. 一种计算机程序产品,其特征在于,包括计算机可读指令,当所述计算机可读指令在计算机设备上运行时,使得所述计算机设备执行如权利要求1至7任一所述的方法。
  17. 一种系统,包括至少一个处理器,至少一个存储器以及至少一个通信接口;所述处理器、所述存储器和所述通信接口通过通信总线连接并完成相互间的通信;
    所述至少一个存储器用于存储代码;
    所述至少一个处理器用于执行所述代码,以执行如权利要求1-7任一所述的任务求解方法,以得到求解结果;
    所述至少一个通信接口,用于与设备或通信网络通信,以将所述求解结果发送至所述设备或通信网络。
PCT/CN2023/088333 2022-04-24 2023-04-14 一种任务求解方法及其装置 WO2023207630A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210454870.0 2022-04-24
CN202210454870.0A CN116993063A (zh) 2022-04-24 2022-04-24 一种任务求解方法及其装置

Publications (1)

Publication Number Publication Date
WO2023207630A1 true WO2023207630A1 (zh) 2023-11-02

Family

ID=88517420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/088333 WO2023207630A1 (zh) 2022-04-24 2023-04-14 一种任务求解方法及其装置

Country Status (2)

Country Link
CN (1) CN116993063A (zh)
WO (1) WO2023207630A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110119338A1 (en) * 2009-11-19 2011-05-19 International Business Machines Corporation Email composition and processing
CN107491863A (zh) * 2017-07-28 2017-12-19 东北大学 一种基于直线编码方式采用初始下界剪枝的分支定界方法
CN113255967A (zh) * 2021-04-28 2021-08-13 北京理工大学 信号时序逻辑约束下基于终点回溯的任务规划方法和装置
CN114169573A (zh) * 2021-11-10 2022-03-11 中铁第四勘察设计院集团有限公司 一种物品的装箱方法、装置、设备及可读存储介质
CN114237835A (zh) * 2021-09-30 2022-03-25 华为技术有限公司 一种任务求解方法及其装置

Also Published As

Publication number Publication date
CN116993063A (zh) 2023-11-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795063

Country of ref document: EP

Kind code of ref document: A1