WO2021036936A1 - Method and apparatus for allocating resources and tasks in distributed system, and system - Google Patents


Info

Publication number
WO2021036936A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
resource
tasks
candidate
working node
Application number
PCT/CN2020/110544
Other languages
French (fr)
Chinese (zh)
Inventor
刘一鸣
裴兆友
肖羽
Original Assignee
第四范式(北京)技术有限公司
Application filed by 第四范式(北京)技术有限公司
Publication of WO2021036936A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5011: Allocation of resources, the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016: Allocation of resources, the resource being the memory
    • G06F 9/5022: Mechanisms to release resources
    • G06F 9/5027: Allocation of resources, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06N 20/00: Machine learning
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates to the field of distributed technology and, more specifically, to a method for allocating resources and tasks in a distributed system, an apparatus for allocating resources and tasks in a distributed system, a device for allocating resources and tasks in a distributed system, and a distributed system.
  • An object of the present disclosure is to provide a new technical solution for the allocation of resources and tasks in a distributed system.
  • There is provided a method for allocating resources and tasks in a distributed system, which includes: receiving a job for execution in the distributed system; predicting, according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated for each task executed by a working node; allocating each task to a suitable working node according to the predicted resource demand; and dynamically adjusting resource usage while the working nodes execute the allocated tasks.
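As a rough illustration, the four steps of the claimed method (receive a job, predict per-task demand, allocate to nodes, then adjust at runtime) can be sketched as follows. All class and function names (`Task`, `WorkerNode`, `predict_demand`, `allocate`) and the prediction rules are hypothetical stand-ins, not the patent's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    task_type: str   # e.g. "parameter_server" or "training"
    data_rows: int   # scale of the data the task processes

@dataclass
class WorkerNode:
    name: str
    resource_limit: dict                           # resource upper limits of the node
    allocated: dict = field(default_factory=dict)  # task name -> predicted demand

def predict_demand(task, node_limits):
    """Step 2: predict demand from task-type info and the nodes' upper limits.
    Stand-in rule: training tasks are assumed to need more CPU, and memory
    scales with the number of data rows."""
    cpu = 2.0 if task.task_type == "training" else 1.0
    mem = max(1.0, task.data_rows / 1_000_000)
    max_cpu = max(limits["cpu"] for limits in node_limits)
    return {"cpu": min(cpu, max_cpu), "memory_gb": mem}  # cap at the largest node limit

def allocate(task, demand, nodes):
    """Step 3: place the task on the first node with enough CPU headroom."""
    for node in nodes:
        used_cpu = sum(d["cpu"] for d in node.allocated.values())
        if used_cpu + demand["cpu"] <= node.resource_limit["cpu"]:
            node.allocated[task.name] = demand
            return node
    return None  # no suitable node found

# Steps 1-3 for a tiny job; step 4 (dynamic adjustment) runs later on the nodes.
nodes = [WorkerNode("w1", {"cpu": 4, "memory_gb": 16}),
         WorkerNode("w2", {"cpu": 8, "memory_gb": 32})]
job = [Task("ps-0", "parameter_server", 1_000_000),
       Task("train-0", "training", 5_000_000)]
for t in job:
    demand = predict_demand(t, [n.resource_limit for n in nodes])
    allocate(t, demand, nodes)
```

With these invented rules, both tasks fit on the first node; the second stays idle until its headroom is needed.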
  • The task type includes a parameter server task and/or a training/learning task in machine learning; and the resource-related information includes at least one of the processing-data scale and the processing content of the corresponding task type.
  • The resource demand includes each resource type required by the task and the corresponding resource demand value, where the resource demand value includes at least one of a peak demand value and a general demand value.
  • The step of predicting the resource demand of each task executed by the working node includes: predicting the resource demand of each task executed by the working node according to rules and/or a machine learning model.
  • The method further includes: collecting the actual resource usage of the working node when tasks are executed, so as to derive the rule and/or the machine learning model.
  • Allocating each task to a suitable working node according to the predicted resource demand includes: obtaining the current resource usage, current task running status, and maximum total-resource limit of each working node; and, using a preset allocation algorithm, according to the predicted resource demand combined with the current resource usage, current task running status, and maximum total-resource limit of each working node, selecting from the multiple working nodes in the distributed system a working node suitable for executing each task, and allocating each task to the selected working node.
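The node filtering described here can be sketched as a predicate over per-node state; the dict layout and the least-loaded tie-break are hypothetical choices, since the claim leaves the concrete allocation algorithm open.

```python
def filter_candidate_nodes(demand, nodes):
    """Keep only nodes whose current usage plus the predicted demand stays
    within the node's maximum total-resource limit for every resource type.
    `nodes` is a list of dicts: {"name", "usage", "max_limit", "running_tasks"}."""
    candidates = []
    for node in nodes:
        fits = all(node["usage"].get(r, 0.0) + need <= node["max_limit"].get(r, 0.0)
                   for r, need in demand.items())
        if fits:
            candidates.append(node)
    # Prefer the least-loaded candidate (fewest running tasks) as a simple tie-break.
    candidates.sort(key=lambda n: len(n["running_tasks"]))
    return candidates

demand = {"cpu": 2.0, "memory_gb": 4.0}
nodes = [
    {"name": "w1", "usage": {"cpu": 7.0, "memory_gb": 10.0},
     "max_limit": {"cpu": 8.0, "memory_gb": 32.0}, "running_tasks": ["a"]},
    {"name": "w2", "usage": {"cpu": 2.0, "memory_gb": 4.0},
     "max_limit": {"cpu": 8.0, "memory_gb": 32.0}, "running_tasks": ["b", "c"]},
]
best = filter_candidate_nodes(demand, nodes)  # w1 lacks CPU headroom; only w2 remains
```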
  • The step of dynamically adjusting resource usage while the working node executes the allocated tasks includes: monitoring the resource usage of a task; when the task's usage of a certain resource exceeds the predicted resource demand value, determining whether the total current usage of the certain resource exceeds the maximum total-resource limit of the certain resource; and, when the total current usage of the certain resource exceeds the maximum total-resource limit of the certain resource, dynamically adjusting according to the compressibility of the certain resource.
  • The step of dynamically adjusting according to the compressibility of the certain resource includes: searching the working node for tasks whose usage of the certain resource exceeds the predicted resource demand value and taking them as candidate tasks; selecting among the candidate tasks according to processing priority and/or start time; and dynamically adjusting the selected candidate task according to the compressibility of the certain resource.
  • Dynamically adjusting the selected candidate task according to the compressibility of the certain resource includes: when the certain resource is a compressible resource, limiting the candidate task's usage of the certain resource.
  • Dynamically adjusting the selected candidate task according to the compressibility of the certain resource includes: when the certain resource is an incompressible resource, determining whether the candidate task supports expansion; in the case that the candidate task supports expansion, determining whether there are other working nodes that can execute the candidate task; and, in the case that such other working nodes exist, extracting the uncompleted part of the candidate task and sending the extracted part to the other working nodes.
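The compressibility-driven branch above (throttle compressible resources; for incompressible ones try expansion, then freezing, then killing) can be condensed into a small decision function. The flag names and the exact ordering are a simplified reading of the claims, not the patent's literal control flow; the migration branch is omitted for brevity.

```python
COMPRESSIBLE = {"cpu", "disk_io", "network_io"}  # resources that can be throttled in place

def adjust_overrun(resource, task_flags):
    """Choose an action for a candidate task that overshoots `resource`.
    `task_flags` holds capability flags plus whether another node is available."""
    if resource in COMPRESSIBLE:
        return "throttle"             # limit the task's use of the resource
    if task_flags.get("supports_expansion") and task_flags.get("other_node_available"):
        return "split_to_other_node"  # send the uncompleted part elsewhere
    if task_flags.get("supports_freeze"):
        return "freeze_to_disk"       # write memory data to the local disk
    return "kill"                     # last resort when nothing else is supported
```

For instance, a CPU overrun is always throttled, while a memory overrun on a freezable task that cannot expand is frozen to disk.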
  • The method further includes: when the candidate task does not support expansion, determining whether the candidate task supports freezing; and, in the case that the candidate task supports freezing, writing the memory data of the candidate task to the disk of the working node.
  • The method further includes: in the absence of such other working nodes, determining whether the candidate task supports freezing; and, in the case that the candidate task supports freezing, writing the memory data of the candidate task to the disk of the working node.
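Freezing, as described here, amounts to serializing the task's in-memory state to the working node's local disk so that the memory can be reclaimed and the task resumed later. A minimal sketch, assuming the state is picklable; `freeze_task`, `thaw_task`, and the `.frozen` file layout are hypothetical.

```python
import os
import pickle
import tempfile

def freeze_task(task_state, directory):
    """Write a candidate task's in-memory state to local disk (freeze)."""
    path = os.path.join(directory, f"{task_state['name']}.frozen")
    with open(path, "wb") as f:
        pickle.dump(task_state, f)
    return path

def thaw_task(path):
    """Read a frozen task's state back from disk (resume)."""
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as workdir:
    frozen = freeze_task({"name": "train-3", "step": 1200, "weights": [0.1, 0.2]}, workdir)
    restored = thaw_task(frozen)  # identical state, read back from disk
```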
  • The method further includes: determining whether the candidate task supports migration; in the case that the candidate task supports migration, determining whether there are other working nodes that can execute the candidate task; and sending the memory data to the other working nodes.
  • The method further includes: in the case that the candidate task does not support migration, obtaining the current resource usage of the candidate task in response to a set trigger event, and continuing to execute the candidate task on the working node based on the current resource usage of the candidate task.
  • The trigger event includes at least one of: any one of the allocated tasks in the working node having been completed, and a resource in the working node having been released.
  • the method further includes: directly killing the candidate task when the candidate task does not support freezing.
  • The method further includes: collecting the resource usage of the candidate task sent by the working node; and, based on the resource usage, expanding the resource demand of the candidate task, so as to reallocate the candidate task to a suitable working node according to the expanded resource demand.
  • an apparatus for allocating resources and tasks in a distributed system which includes:
  • the job receiving unit is configured to receive jobs for execution in the distributed system
  • the resource demand prediction unit is configured to predict the resource demand to be allocated for each task executed by the working node based on the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system;
  • the task allocation unit is configured to allocate each task to a suitable working node according to the predicted resource demand
  • the resource scheduling unit is configured to dynamically adjust the resource usage in the process of executing the assigned task by the working node.
  • a device for allocating resources and tasks in a distributed system which includes:
  • the memory is configured to store executable instructions
  • the processor is configured to, under control of the executable instructions, cause the device for allocating resources and tasks in a distributed system to execute the method for allocating resources and tasks in a distributed system as described in the first aspect of the present disclosure.
  • A computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method for allocating resources and tasks in a distributed system as described in the first aspect of the present disclosure.
  • a distributed system which includes:
  • the apparatus for allocating resources and tasks in a distributed system as described in the second aspect of the present disclosure, or the device for allocating resources and tasks in a distributed system as described in the third aspect of the present disclosure.
  • With the method, apparatus, device, and system of the embodiments of the present disclosure, on the one hand, the resources required by each task in the distributed system are not determined manually by business personnel; instead, the system predicts, based on the resource-related information of each task type and the resource upper limit of each working node in the distributed system, the resource demand to be allocated for each task executed by a working node, which can effectively improve the efficiency and accuracy of resource calculation. On the other hand, each task can be allocated to a suitable working node according to the predicted resource demand, and resource usage can be dynamically adjusted while the working nodes execute the allocated tasks, thereby achieving efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization.
  • Fig. 1 is a block diagram showing an example of a hardware configuration of a distributed system that can implement an embodiment of the present disclosure.
  • Fig. 2 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to an embodiment of the present disclosure.
  • Fig. 3 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to another embodiment of the present disclosure.
  • Fig. 4 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to a third embodiment of the present disclosure.
  • Fig. 5 is a functional block diagram of an apparatus for allocating resources and tasks in a distributed system according to an embodiment of the present disclosure.
  • Fig. 6 is a functional block diagram of a device for allocating resources and tasks in a distributed system according to an embodiment of the present disclosure.
  • Fig. 7 is a block diagram of a distributed system according to an embodiment of the present disclosure.
  • Fig. 8 is a schematic flowchart of a method for allocating resources and tasks according to an example of the present disclosure.
  • Figure 1 shows a block diagram of the hardware configuration that can implement the distributed system of this embodiment.
  • the distributed system of this embodiment includes multiple servers 1000.
  • Fig. 1 shows four servers 1000, namely server 1000A, server 1000B, server 1000C, and server 1000D.
  • the number of servers 1000 in the distributed system can be determined according to actual scenarios, and there is no limitation here.
  • these servers 1000 form a distributed system, and each server 1000 can be used as a resource and task allocation device in the distributed system.
  • For example, a job to be executed in the distributed system may be submitted by any server 1000 that has an executor (Executor) node in the distributed system, or by a client connected to the distributed system submitting the job to any server 1000 in the distributed system. The resource prediction (ResourceGuess) node in the server 1000 predicts, based on the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated for each task executed by a working node; the scheduling (Scheduler) node in the server 1000 allocates each task to a suitable working node according to the predicted resource demand; and, in turn, the servers 1000 that have working nodes in the distributed system dynamically adjust resource usage while executing the allocated tasks.
  • The server 1000 provides a service point for processing, database, and communication facilities.
  • the server 1000 may be an integral server or a distributed server that spans multiple computers or computer data centers.
  • the server can be of various types, such as, but not limited to, a web server, a news server, a mail server, a message server, an advertisement server, a file server, an application server, an interactive server, a database server, or a proxy server.
  • each server may include hardware, software, or an embedded logic component or a combination of two or more such components configured to perform suitable functions supported or implemented by the server.
  • the server may be a blade server, a cloud server, etc., or may be a server group composed of multiple servers, and may include one or more of the foregoing types of servers, and so on.
  • the server 1000 may be as shown in FIG. 1 and includes a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, and an input device 1600.
  • the server 1000 may also include a speaker, a microphone, etc., which are not limited herein.
  • the processor 1100 may be a dedicated server processor, or may be a desktop processor or a mobile processor that meets performance requirements, and is not limited herein.
  • the memory 1200 includes, for example, ROM (Read Only Memory), RAM (Random Access Memory), nonvolatile memory such as a hard disk, and the like.
  • the interface device 1300 includes, for example, various bus interfaces, such as a serial bus interface (including a USB interface), a parallel bus interface, and the like.
  • the communication device 1400 can perform wired or wireless communication.
  • The display device 1500 is, for example, a liquid crystal display, an LED display, a touch screen, and the like.
  • the input device 1600 may include, for example, a touch screen, a keyboard, and the like.
  • However, the solution of the present disclosure may involve only some of these devices.
  • For example, the server 1000 may involve only the memory 1200, the communication device 1400, and the processor 1100.
  • the network 2000 may be a wireless communication network or a wired communication network, and may be a local area network or a wide area network. In the distributed system shown in FIG. 1, multiple servers 1000 can communicate through a network 2000. In addition, the network 2000 on which the communication between the multiple servers 1000 is based may be the same or different.
  • The distributed system shown in FIG. 1 is only for explanatory purposes and is by no means intended to limit the present disclosure, its application, or use. In actual applications, other numbers of distributed systems may also be involved, for example, two distributed systems, three distributed systems, five distributed systems, or even more; there is no restriction here.
  • The memory 1200 of the server 1000 is configured to store instructions, and the instructions are configured to control the processor 1100 to execute any method for allocating resources and tasks in a distributed system provided in the embodiments of the present disclosure. Technicians can design the instructions according to the solutions disclosed herein. How instructions control the processor to operate is well known in the art, so it will not be described in detail here.
  • Fig. 2 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to an embodiment.
  • The method for allocating resources and tasks in a distributed system of this embodiment may be implemented by an apparatus for allocating resources and tasks in a distributed system, or by a device for allocating resources and tasks in a distributed system; the apparatus or device may specifically be deployed on the equipment that provides the resources.
  • The resource scheduling method of this embodiment may include the following steps S2100 to S2400:
  • Step S2100 Receive a job for execution in the distributed system.
  • A job is the basic unit for submitting tasks.
  • A job includes multiple tasks, and the multiple tasks are related to each other.
  • A task is the smallest unit of execution; under normal circumstances, a process can be regarded as a task.
  • After the job for execution in the distributed system is received in step S2100, the subsequent steps can predict, based on the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated for each task executed by a working node, and allocate each task to a suitable working node according to the predicted resource demand, so that the working node provides resources to the corresponding task according to the predicted demand, thereby improving resource utilization.
  • step S2200 according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, predict the resource demand to be allocated for each task executed by the working node.
  • Task types include parameter server tasks and/or training/learning tasks in machine learning, where a parameter server task is a task for parameter processing (for example, parameter updates), and a training/learning task is a task for model training (for example, sample calculation).
  • the resource-related information includes at least one of the processing data scale and processing content of the corresponding task type.
  • In practice, a task often requires multiple types of resources.
  • the resource requirements include at least each type of resource required by the task and the corresponding resource demand value.
  • the resource requirements may also include other information about the resources required by the task, which is not limited here.
  • The resource type may include, for example, CPU, memory usage, disk usage, disk input/output (I/O), network I/O, graphics processing unit (GPU), and field-programmable gate array (FPGA).
  • the resource demand value includes at least one of a peak demand value and a general demand value.
  • the predicted resource demand value may be greater than the actual use value, resulting in a waste of resources; on the other hand, the predicted resource demand value may also be less than the actual use value, resulting in insufficient resources.
  • Therefore, during task execution, resource usage can be dynamically adjusted to improve resource utilization; this is not described in detail here in step S2200.
  • In this embodiment, according to each task type in the job, a certain task in the submitted job may be divided into multiple tasks; the resource types and corresponding demand values to be allocated for each task executed by a working node are then predicted, and each task is allocated to a suitable working node so that the working node provides the corresponding resources to execute the task.
  • In one example, predicting the resource demand to be allocated for each task executed by the working node in step S2200 may further include: predicting, according to rules and/or a machine learning model, the resource demand to be allocated for each task executed by the working node.
  • the machine learning model can be a neural network model, such as but not limited to a BP (Back Propagation) neural network model, a convolutional neural network model, etc.
  • The machine learning model can also be a logistic regression model; the machine learning model is not specifically limited here, and any machine learning model that can predict the resource demand to be allocated for each task executed by the working node falls within the scope protected by the embodiments of the present disclosure.
  • When training the machine learning model, the data scale and task type actually involved in a task may be used as features, and each resource actually used by the task and its usage value may be used as labels, so as to form training samples that are input to the machine learning model performing resource demand prediction. The data scale may include, for example, at least one of the number of data rows and the number of data columns, and the machine learning model can be trained to predict, based on the data scale and task type, each resource corresponding to the task to be predicted and its resource demand value.
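As a toy stand-in for such a trained model, a nearest-neighbour lookup over historical (data scale, task type) samples with observed usage as labels already captures the idea; the sample values and the linear scaling rule below are invented for illustration.

```python
# Historical samples: (rows, cols, task_type) with observed peak memory in GB as label.
history = [
    {"rows": 1_000_000, "cols": 10, "task_type": "training", "peak_mem_gb": 4.0},
    {"rows": 2_000_000, "cols": 10, "task_type": "training", "peak_mem_gb": 8.0},
    {"rows": 1_000_000, "cols": 10, "task_type": "parameter_server", "peak_mem_gb": 2.0},
]

def predict_peak_mem(rows, cols, task_type):
    """Nearest-neighbour stand-in for the trained model: scale the closest
    historical sample of the same task type by the ratio of data sizes."""
    same_type = [h for h in history if h["task_type"] == task_type]
    nearest = min(same_type, key=lambda h: abs(h["rows"] * h["cols"] - rows * cols))
    scale = (rows * cols) / (nearest["rows"] * nearest["cols"])
    return nearest["peak_mem_gb"] * scale

# A training task on 4M rows x 10 cols is closest to the 2M-row sample, scaled 2x.
demand = predict_peak_mem(4_000_000, 10, "training")
```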
  • In step S2200, according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated for each task executed by the working node is predicted, and each task is allocated to a suitable working node according to the predicted demand, so that the working node provides resources to the corresponding task according to the predicted demand, thereby improving resource utilization.
  • step S2300 After predicting the resource requirements to be allocated for each task executed by the working node according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, step S2300 is entered.
  • step S2300 each task is allocated to a suitable working node according to the predicted resource demand.
  • The tasks may include multiple tasks contained in the job when it is submitted, or multiple tasks obtained by dividing a certain task according to step S2200 after the job is submitted.
  • In this embodiment, the predicted resource demand may be sent to the working node to control the working node to provide resources to the task according to the predicted resource demand; alternatively, the working node, after receiving the predicted resource information, may provide resources to the task according to the prediction.
  • step S2300 assigning each task to a suitable working node according to the predicted resource demand may further include the following steps S2310 to S2320:
  • Step S2310 Obtain the current resource usage status, current task running status, and total resource maximum limit of each working node.
  • Step S2320: using a preset allocation algorithm, according to the predicted resource demand, combined with the current resource usage, current task running status, and maximum total-resource limit of each working node, select from the multiple working nodes in the distributed system the working nodes suitable for executing each task, and allocate each task to the selected working nodes.
  • any allocation algorithm may be used for task allocation, so there is no limitation here.
  • step S2320 only one task can be allocated to one working node, or multiple tasks can be allocated to one working node.
  • The multiple tasks can be executed at the same time, or executed in an order determined based on the predicted resource demand values; the execution order of multiple tasks in one working node is not limited here.
  • Through step S2300, each task is allocated to a suitable working node according to the predicted resource demand, so that the working node provides resources to the corresponding task according to the predicted demand, thereby improving resource utilization.
  • After each task has been allocated to a suitable working node, step S2400 is entered.
  • Step S2400 during the execution of the assigned task by the working node, dynamic adjustment is made according to the resource usage.
  • the resource demand value predicted according to step S2200 may be greater than the actual use value, resulting in waste of resources; or the predicted resource demand value may also be less than the actual use value, resulting in insufficient resources.
  • Through this step S2400, resource usage can be dynamically adjusted while the working node executes the allocated tasks, so as to improve resource utilization.
  • the step S2400 in the process of executing the assigned task by the working node, dynamically adjusting the resource usage may further include the following steps S2410 to S2440:
  • Step S2410 monitor the resource usage of the task, and determine whether a certain resource usage of the task exceeds the predicted resource demand value, if yes, execute step S2430, if not, execute step S2420.
  • the resource usage of the task can be monitored in real time after the working node starts to execute the assigned task.
  • Taking resources including the CPU and GPU as an example, the CPU and GPU usage of the tasks in the working node can be monitored in real time while the working node executes the allocated tasks, to determine whether a task's use of the CPU or GPU exceeds the predicted CPU or GPU resource demand value. If it does not, no processing is performed according to step S2420 and the task continues to be executed by the working node. If it does, it is further determined according to step S2430 whether the total current usage of the CPU or GPU exceeds the maximum total-resource limit of the CPU or GPU.
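The two threshold checks in steps S2410 to S2450 can be captured in a single function evaluated per (task, resource) pair; the return strings simply name which step fires, and the function itself is an illustrative sketch rather than the patent's implementation.

```python
def monitor_step(task_usage, predicted_demand, node_total_usage, node_max_limit):
    """One monitoring tick for a single resource of a single task."""
    if task_usage <= predicted_demand:
        return "S2420: within prediction, keep running"
    if node_total_usage <= node_max_limit:
        return "S2440: over prediction but the node has headroom, keep running"
    return "S2450: node limit exceeded, adjust by compressibility"
```

For example, a task predicted to need 2 CPU cores but using 3 triggers adjustment only once the node's total CPU usage also exceeds the node's maximum limit.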
  • step S2420 no processing is performed, and the task continues to be executed by the working node.
  • Following step S2410, if the task's use of the CPU or GPU does not exceed the predicted CPU or GPU resource demand value, no processing is performed according to this step S2420; the task continues to be executed by the working node, and only the actual CPU or GPU resource usage is reported to the scheduling node.
  • step S2430 in the case that a certain resource usage of the task exceeds the predicted resource demand value, it is judged whether the current total usage of a certain resource exceeds the total resource maximum limit of a certain resource.
  • Following step S2410, if the task's use of the CPU exceeds the predicted CPU resource demand value, it is further determined according to this step S2430 whether the total current usage of the CPU exceeds the maximum total-resource limit of the CPU. If it does, step S2450 is executed to dynamically adjust according to the compressibility of the CPU; if it does not, no processing is performed according to step S2440, the task continues to be executed by the working node, and only the actual CPU resource usage is reported to the scheduling node.
  • step S2440 when the total current usage of a certain resource does not exceed the maximum total resource limit of a certain resource, no processing is performed, and the task continues to be executed by the working node.
  • step S2440 if the maximum limit of the total resources of the CPU is not exceeded, no processing is performed according to this step S2440, the task is continued to be executed by the working node, and only the scheduling node is notified of the actual CPU resource usage.
  • Step S2450: in the case that the current total usage of a certain resource exceeds the maximum limit of the total resources of that resource, dynamic adjustment is made according to the compressibility of that resource.
  • resources can be divided into compressible resources and incompressible resources.
  • Compressible resources include CPU, disk I/O, and network I/O; incompressible resources include memory, disk space, and GPU.
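The split above can be captured in a small lookup; this is an illustrative sketch, and the resource names simply mirror the patent's examples rather than any real scheduler API:

```python
# Hypothetical classification of resource types following the patent's
# compressible/incompressible split; names are illustrative assumptions.
COMPRESSIBLE = {"cpu", "disk_io", "network_io"}
INCOMPRESSIBLE = {"memory", "disk_space", "gpu"}

def is_compressible(resource: str) -> bool:
    """Compressible resources can be throttled in place; incompressible
    ones can only be reclaimed by expanding, freezing, migrating, or
    killing the task."""
    if resource in COMPRESSIBLE:
        return True
    if resource in INCOMPRESSIBLE:
        return False
    raise ValueError(f"unknown resource type: {resource}")
```

The point of the split is that a CPU-hungry task can simply be slowed down, whereas memory or GPU already consumed cannot be taken back without interrupting the task.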
  • step S2450 is executed to dynamically adjust according to the compressibility of the CPU, thereby realizing efficient resource scheduling and improving resource utilization.
  • According to the method of the embodiment of the present disclosure, on the one hand, the resources required by each task in the distributed system are not determined manually by the user; instead, the system predicts the resource demand to be allocated to each task executed by a working node based on the resource-related information of each task type and the resource upper limit of each working node in the distributed system, which can effectively improve the efficiency and accuracy of resource calculation. On the other hand, each task can be allocated to a suitable working node according to the predicted resource demand, and resource usage can be dynamically adjusted while the working nodes execute their assigned tasks, thereby realizing efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization.
  • the dynamic adjustment according to the compressibility of a certain resource in the above step S2450 may further include the following steps:
  • Step S2451: search for tasks in the working node whose usage of the resource exceeds the predicted resource demand value as candidate tasks, and select among the candidate tasks according to processing priority and/or start time.
  • step S2451: for example, when the current total usage of a certain resource exceeds the maximum limit of the total resources of that resource, the tasks in the working node whose usage of that resource exceeds the predicted resource demand value are first searched for as candidate tasks. Then, sorting the candidate tasks by processing priority in ascending order, the candidate task with the lowest processing priority is selected as the selected candidate task. When there are multiple candidate tasks with the lowest priority, these are further sorted by startup duration in ascending order, and the candidate task that has been running the longest is selected as the selected candidate task.
  • The searched candidate tasks may be, for example, task 1, task 2, and task 3.
  • For example, sorting task 1, task 2, and task 3 by processing priority yields: task 3, task 2, task 1, and task 3, which has the lowest processing priority, is selected as the candidate task. As another example, if task 3 and task 2 have the same processing priority, sorting task 3 and task 2 by startup duration from smallest to largest yields: task 2, task 3, and task 3, which has been running the longest, is selected as the selected candidate task.
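The selection rule of step S2451, lowest processing priority first, then the longest-running task among ties, can be sketched as follows; the field names and the numeric priority convention are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    priority: int     # lower value = lower processing priority (assumption)
    runtime_s: float  # how long the task has been running, in seconds

def select_candidate(candidates):
    """Pick the candidate task per step S2451: among the tasks with the
    lowest processing priority, choose the one running the longest."""
    lowest = min(t.priority for t in candidates)
    tied = [t for t in candidates if t.priority == lowest]
    return max(tied, key=lambda t: t.runtime_s)
```

With the example above, task 2 and task 3 share the lowest priority, and task 3 is chosen because it has been running longer.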
  • Step S2452 according to the compressibility of a certain resource, dynamically adjust the selected candidate task.
  • the selected candidate task can be dynamically adjusted according to the compressibility of a certain resource to improve resource utilization.
  • The step S2452 of dynamically adjusting the selected candidate task according to the compressibility of a certain resource may further include:
  • When the certain resource is a compressible resource, the candidate task's usage of that resource is restricted.
  • Continuing the example of step S2451: since the CPU is a compressible resource, the usage of the CPU by the selected candidate task, that is, task 3, may be restricted.
  • The step S2452 of dynamically adjusting the selected candidate task according to the compressibility of a certain resource may further include:
  • step S2452-1 when a certain resource is an incompressible resource, it is judged whether the candidate task supports capacity expansion, if so, step S2452-2 is executed, otherwise, step S2452-5 is executed.
  • step S2452-2: it is judged whether there are other working nodes that can execute task 3; if not, it is judged according to the following step S2452-5 whether task 3 supports freezing.
  • step S2452-2 it is judged whether there are other working nodes that can perform the candidate task. If so, step S2452-3 is performed, otherwise, step S2452-5 is performed.
  • step S2452-1: in the case that task 3 supports capacity expansion, it is further determined whether there are other working nodes that can execute task 3. If such nodes exist, the following step S2452-3 is performed to extract the uncompleted part of the tasks in task 3; otherwise, step S2452-5 is executed to determine whether the candidate task supports freezing.
  • step S2452-3 when there are other working nodes, extract the uncompleted part of the tasks among the candidate tasks.
  • step S2452-2 if there are other working nodes that can execute task 3, extract the uncompleted part of tasks in task 3, and continue to execute step S2452-4.
  • Step S2452-4 sending some of the extracted tasks to other working nodes.
  • step S2452-3: after extracting the unfinished part of task 3, the extracted partial tasks can be sent to other working nodes according to this step S2452-4, so that the other working nodes continue to execute them.
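The extraction and dispatch of steps S2452-3 and S2452-4 can be sketched as a round-robin split of the not-yet-completed work units; the notion of discrete "work units" and the peer-node names are illustrative assumptions:

```python
def expand_task(pending_units, peers):
    """Split the uncompleted work units of a candidate task across other
    worker nodes (steps S2452-3/S2452-4), round-robin over the peers.
    Returns a mapping of peer node -> list of units to ship there."""
    shipments = {peer: [] for peer in peers}
    for i, unit in enumerate(pending_units):
        shipments[peers[i % len(peers)]].append(unit)
    return shipments
```

A real scheduler would likely weight the split by each peer's free capacity; the round-robin here is only the simplest placement that matches the described flow.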
  • step S2452-5 it is judged whether the candidate task supports freezing, if so, step S2452-6 is executed, otherwise, step S2452-12 is executed.
  • Following step S2452-1 or step S2452-2: in the case that task 3 does not support capacity expansion, or no other working node can execute the candidate task, it is further judged whether task 3 supports freezing. If task 3 supports freezing, step S2452-6 is executed to freeze task 3; otherwise, step S2452-12 is executed.
  • step S2452-6 when the candidate task supports freezing, the memory data of the candidate task is written into the disk of the working node.
  • step S2452-5: in the case that task 3 supports freezing, task 3 is frozen, that is, the memory data of task 3 is written to the disk of the working node. After the memory data of task 3 has been written to the disk of the working node, step S2452-7 can be performed to determine whether task 3 supports migration.
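One minimal way to realize the freeze of step S2452-6 (and the later thaw) is to serialize the task's in-memory state to the working node's disk; `pickle` and the checkpoint filename are assumptions for illustration, not the patent's mechanism:

```python
import os
import pickle
import tempfile

def freeze_task(task_state: dict, directory: str) -> str:
    """Freeze a task by persisting its in-memory state to the worker
    node's disk (step S2452-6); returns the checkpoint path."""
    path = os.path.join(directory, "task.ckpt")
    with open(path, "wb") as f:
        pickle.dump(task_state, f)
    return path

def thaw_task(path: str) -> dict:
    """Restore a frozen task's state when execution resumes
    (step S2452-11) or when the state is shipped to another node."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

After freezing, the same checkpoint file can either wait on the local disk (step S2452-10) or be sent to another working node if migration is supported (step S2452-9).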
  • step S2452-7 it is judged whether the candidate task supports migration, if so, step S2452-8 is executed, otherwise, step S2452-10 is executed.
  • step S2452-10 waits for recovery.
  • step S2452-8 if the candidate task supports migration, it is determined whether there are other working nodes that can execute the candidate task. If so, step S2452-9 is executed, otherwise, S2452-5 is executed.
  • step S2452-7: in the case that task 3 supports migration, it is further determined whether there are other working nodes that can execute task 3. If such nodes exist, step S2452-9 is executed to send the memory data to the other working nodes; otherwise, step S2452-5 is executed to determine whether task 3 supports freezing.
  • step S2452-9 the memory data is sent to other working nodes.
  • step S2452-9 if there are other working nodes that can perform task 3, execute this step S2452-9 to send the memory data to other working nodes.
  • Step S2452-10 in response to the set trigger event, obtain the current resource usage of the candidate task.
  • The set trigger event includes at least one of the following: any assigned task in the working node has been completed, and a resource in the working node has been released.
  • step S2452-10: in the case that task 3 does not support migration, the waiting for recovery of step S2452-10 is performed. For example, when any task assigned to the working node has been completed, or when a GPU in the working node has been released, the current GPU usage of task 3 is acquired.
  • Step S2452-11 based on the current resource usage of the candidate task, continue to execute the candidate task by the working node.
  • step S2452-10 based on the current GPU usage of task 3, task 3 will continue to be executed by the worker node.
  • Step S2452-12 directly kill the candidate task.
  • step S2452-12 if task 3 does not support freezing, execute this step S2452-12 to directly kill task 3, and continue to execute step S2452-13.
  • Step S2452-13 Collect the resource usage status of the candidate tasks sent by the working node.
  • the GPU usage of task 3 sent by the worker node can be collected, and new resource requirements can be derived automatically.
  • step S2452-14 based on the resource usage, the resource requirements of the candidate tasks are expanded, so as to allocate the candidate tasks to suitable working nodes again according to the expanded resource requirements.
  • Accordingly, the resource demand of task 3 can be expanded according to this step S2452-14, so that task 3 can be allocated again to a suitable working node according to the expanded resource demand, and the suitable working node then executes task 3 again.
  • In this way, when the current total usage of a certain resource exceeds the maximum limit of the total resources of that resource, dynamic adjustment can be made according to the compressibility of that resource, thereby improving task processing efficiency and resource utilization.
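The whole decision flow of step S2450 and its sub-steps can be condensed into one function. The attribute names and the returned action labels are hypothetical, and this sketch collapses the freeze/migrate loop of steps S2452-5 through S2452-12 into a single pass:

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    compressible: bool

@dataclass
class CandidateTask:
    supports_expansion: bool
    supports_freezing: bool
    supports_migration: bool

def adjust(resource: Resource, task: CandidateTask, peer_available: bool) -> str:
    """Return the dynamic-adjustment action for the selected candidate
    task, mirroring the branch structure described above."""
    if resource.compressible:
        return "throttle"                # restrict usage of the resource
    if task.supports_expansion and peer_available:
        return "expand"                  # S2452-3/4: ship pending work out
    if task.supports_freezing:
        if task.supports_migration and peer_available:
            return "freeze_and_migrate"  # S2452-6..9: checkpoint, then ship
        return "freeze_and_wait"         # S2452-6, S2452-10/11: wait locally
    return "kill_and_rescale"            # S2452-12..14: kill, expand demand
```

Each label stands in for the corresponding sequence of sub-steps; a real implementation would attach the throttling, checkpointing, and rescheduling routines to these branches.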
  • An apparatus 5000 for allocating resources and tasks in a distributed system includes a job receiving unit 5100, a resource demand prediction unit 5200, a task allocation unit 5300, and a resource scheduling unit 5400.
  • the job receiving unit 5100 is configured to receive jobs for execution in a distributed system.
  • the resource demand prediction unit 5200 is configured to predict the resource demand to be allocated for each task executed by the working node based on the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system.
  • the task allocation unit 5300 is configured to allocate each task to a suitable working node according to the predicted resource demand.
  • the resource scheduling unit 5400 is configured to dynamically adjust the resource usage in the process of executing the assigned task by the working node.
  • the task types include parameter server tasks and/or training learning tasks in machine learning.
  • the resource-related information includes at least one of the processing data scale and processing content of the corresponding task type.
  • the resource requirement includes each resource type required by the task and the corresponding resource demand value;
  • the resource demand value includes at least one of a peak demand value and a general demand value.
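The demand structure implied by these two points can be sketched as a small record type; the field names and the GiB unit are illustrative assumptions, since the patent only requires a resource type plus a peak and/or general demand value:

```python
from dataclasses import dataclass

@dataclass
class ResourceDemand:
    """One predicted demand entry: a resource type plus its demand values."""
    resource_type: str    # e.g. "cpu", "memory", "gpu"
    general_value: float  # typical (steady-state) usage during execution
    peak_value: float     # short-lived maximum usage

# A task's predicted demand is then a list of such entries, one per resource.
demand = ResourceDemand("memory", general_value=2.0, peak_value=3.5)  # GiB
```

Keeping both values lets the scheduler place tasks by their general demand while still checking that a node can absorb their peaks.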
  • the resource demand prediction unit 5200 is further configured to predict the resource demand to be allocated for each task performed by the worker node according to rules and/or machine learning models; and,
  • the task allocation unit 5300 is further configured to obtain the current resource usage status, current task running status, and total resource maximum limit of each working node;
  • a working node suitable for executing each task is selected from the multiple working nodes of the distributed system, and each task is assigned to the selected working node.
  • the resource scheduling unit 5400 is also configured to monitor the resource usage of the task
  • the resource scheduling unit 5400 is further configured to search for tasks in the working node whose usage of the certain resource exceeds the predicted resource demand value as candidate tasks, and to select among the candidate tasks according to processing priority and/or start time;
  • the selected candidate task is dynamically adjusted.
  • the resource scheduling unit 5400 is further configured to limit the resource usage of the candidate task for the certain resource when the certain resource is a compressible resource.
  • the resource scheduling unit 5400 is further configured to determine whether the candidate task supports expansion when the certain resource is an incompressible resource
  • the candidate task supports capacity expansion, determining whether there are other working nodes that can execute the candidate task;
  • the resource scheduling unit 5400 is further configured to determine whether the candidate task supports freezing when the candidate task does not support capacity expansion;
  • the memory data of the candidate task is written into the disk of the working node.
  • the resource scheduling unit 5400 is further configured to determine whether the candidate task supports freezing when the other working node does not exist;
  • the memory data of the candidate task is written into the disk of the working node.
  • the resource scheduling unit 5400 is further configured to determine whether the candidate task supports migration
  • the candidate task supports migration, determining whether there are other working nodes that can execute the candidate task;
  • the resource scheduling unit 5400 is further configured to obtain the current resource usage status of the candidate task in response to a set trigger event when the candidate task does not support migration;
  • the trigger event includes at least one of the following: any one of the assigned tasks in the working node has been completed, and a resource in the working node has been released.
  • the resource scheduling unit 5400 is further configured to directly kill the candidate task when the candidate task does not support freezing.
  • the resource scheduling unit 5400 is further configured to collect resource usage of the candidate task sent by the working node;
  • the resource requirements of the candidate tasks are expanded, so as to allocate the candidate tasks to suitable working nodes again according to the expanded resource requirements.
  • a device 6000 for allocating resources and tasks in a distributed system including:
  • the memory 6100 is configured to store executable instructions
  • the processor 6200 is configured, under control of the executable instructions, to cause the resource and task allocation device in the distributed system to execute the method for allocating resources and tasks in a distributed system as provided in this embodiment.
  • the resource and task allocation device 6000 in a distributed system may be a server.
  • the resource and task allocation device 6000 in a distributed system may be the server 1000 as shown in FIG. 1.
  • the resource and task allocation device 6000 in a distributed system may also include other apparatuses; for example, the server 1000 shown in FIG. 1 may also include an input device, a communication device, an interface device, and a display device.
  • a computer-readable storage medium is also provided, on which a computer program is stored.
  • When the computer program is executed by a processor, the method for allocating resources and tasks in a distributed system as in any embodiment of the present disclosure is implemented.
  • a distributed system 7000 is also provided, as shown in FIG. 7, including:
  • The plurality of devices 7100 configured to provide resources may include, for example, a device 7100A configured to provide resources and a device 7100B configured to provide resources.
  • The plurality of devices 7100 configured to provide resources may belong to the same server cluster, or may belong to different server clusters.
  • the number of devices 7100 configured to provide resources can be determined according to actual scenarios, and there is no limitation here.
  • the distributed system 7000 further includes a device 5000 for allocating resources and tasks in the distributed system or a device 6000 for allocating resources and tasks in the distributed system.
  • the device 5000 for allocating resources and tasks in a distributed system or the device 6000 for allocating resources and tasks in a distributed system may be distributed on a device 7100 that provides resources.
  • the distributed system 7000 is not only applicable to machine learning scenarios, but also applicable to other non-machine learning scenarios that do not impose strict resource restrictions on tasks.
  • The machine learning scenario is one in which, due to the complexity of the system or unknown properties of the data, it is difficult to accurately judge the specific resource usage of a task in advance, for example, feature processing tasks, offline machine learning training tasks, and online machine learning prediction tasks.
  • The non-machine learning scenario is one in which some online services have peak periods. For example, a take-out (food delivery) system sees a large peak at lunch time, when the system requires many resources, but very little usage in the middle of the night; using the system 7000, some resources can be reclaimed during low-peak periods for use by other tasks.
  • the distributed system 7000 includes a plurality of devices 7100 configured to provide resources and a device 5000 for allocating resources and tasks in the distributed system.
  • The device 7100 may be a server having an execution node, a resource prediction node, a scheduling node, and a working node; the device 5000 for allocating resources and tasks in the distributed system can be distributed across multiple devices 7100. For example, the job receiving unit of the device 5000 for allocating resources and tasks in the distributed system can be realized through the execution node.
  • The method for allocating resources and tasks can include:
  • Step S8010 the execution node submits the job executed in the distributed system to the resource prediction node.
  • the job job can include task task1, task2, and task3.
  • step S8020: the resource prediction node predicts, based on the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated to each task executed by a working node, and then returns the predicted resource demand to the execution node.
  • step S8030 the execution node submits the tasks task1, task2, and task3 included in the job job, and the predicted resource requirements for each task executed by the worker node to be allocated to the scheduling node.
  • step S8040 the scheduling node assigns task task1 to the work node 1 according to the predicted resource demand; and assigns the task task2 and the task task3 to the work node 2.
  • For example, a preset allocation algorithm can be used: according to the predicted resource demand, combined with the current resource usage, current task running status, and maximum total resource limit of each working node, the working nodes suitable for executing tasks task1, task2, and task3 are selected from the multiple working nodes of the distributed system; task task1 is assigned to the selected working node 1, and tasks task2 and task3 are assigned to the selected working node 2.
  • step S8050 the worker node 1 receives task task1, starts task task1, and monitors the resource usage of task1 in real time; and, after worker node 2 receives task task2 and task task3, starts task2 and task3, and monitors the task in real time Resource usage of task2 and task3.
  • step S8060: during the execution of task task1, worker node 1 sends the current resource usage and the execution status of task task1 to the execution node; and, during the execution of tasks task2 and task3, worker node 2 sends the current resource usage and the execution status of tasks task2 and task3 to the execution node.
  • step S8070: after task task1 finishes executing, its end status and resource usage are fed back to working node 1; and, after tasks task2 and task3 finish executing, their end status and resource usage are fed back to working node 2.
  • step S8080 the working node 1 reports the information fed back by the task task1 to the scheduling node; and the working node 2 reports the information fed back by the tasks task2 and task3 to the scheduling node.
  • Step S8090: the scheduling node summarizes the information and resource usage of all tasks in the job and submits them to the resource prediction node, so that the resource prediction node continuously collects real resource usage to update the rules and/or machine learning model used to perform resource prediction.
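One simple way the resource prediction node could refresh a prediction rule from the usage reports summarized in step S8090 is an exponential moving average over the observed peak usage of a task type. This is an assumption for illustration; the patent equally allows a machine learning model here:

```python
def update_rule(history, alpha=0.3):
    """Fold a sequence of observed peak-usage reports into a single
    demand estimate via an exponential moving average; `alpha` weights
    the most recent observation (hypothetical smoothing parameter)."""
    estimate = None
    for observed_peak in history:
        if estimate is None:
            estimate = observed_peak
        else:
            estimate = alpha * observed_peak + (1 - alpha) * estimate
    return estimate
```

Each completed job then nudges future predictions toward the real usage, which is the feedback loop that steps S8080 and S8090 describe.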
  • the present disclosure may be a system, method and/or computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions configured to cause a processor to implement various aspects of the present disclosure.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
  • The computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device .
  • The computer program instructions configured to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • The computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to realize various aspects of the present disclosure.
  • These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, thereby producing a machine, such that when these instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction includes one or more executable instructions configured to implement the prescribed logical functions. In some alternative implementations, the functions noted in the blocks may occur in a different order than the order marked in the drawings. For example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagram and/or flowchart, and the combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that realization by hardware, realization by software, and realization by a combination of software and hardware are all equivalent.
  • According to the present disclosure, resource usage is dynamically adjusted during the process in which the working nodes execute the assigned tasks, thereby realizing efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization. Therefore, the present disclosure has strong industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)

Abstract

A method and apparatus for allocating resources and tasks in a distributed system, and a system. The method comprises: receiving a job to be executed in a distributed system (S2100); according to the resource-related information of each task type in the job and the resource upper limit of each worker node in the distributed system, predicting a resource requirement needing to be allocated to each task executed by the worker node (S2200); allocating each task to an appropriate worker node according to the predicted resource requirement (S2300); and in the process of executing the allocated task by the worker node, performing dynamic adjustment according to a resource use situation (S2400).

Description

在分布式系统中资源及任务的分配方法、装置及系统Method, device and system for allocating resources and tasks in a distributed system
本公开要求于2019年08月23日提交中国专利局,申请号为201910783327.3,申请名称为“在分布式系统中资源及任务的分配方法、装置及系统”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。This disclosure requires the priority of a Chinese patent application filed with the Chinese Patent Office on August 23, 2019, the application number is 201910783327.3, and the application title is "Resources and Task Allocation Methods, Devices and Systems in Distributed Systems", all of which The content is incorporated into this disclosure by reference.
技术领域Technical field
本公开涉及分布式技术领域,更具体地,涉及一种在分布式系统中资源及任务的分配方法、在分布式系统中资源及任务的分配装置、在分布式系统中资源及任务的分配设备以及分布式系统。The present disclosure relates to the field of distributed technology, and more specifically, to a method for allocating resources and tasks in a distributed system, a device for allocating resources and tasks in a distributed system, and a device for allocating resources and tasks in a distributed system And distributed systems.
背景技术Background technique
现有的调度系统,例如Hadoop Yarn中,任务是运行在容器内部的,因此,需要先申请容器,申请时需要人为判断并指定容器资源的大小,这个资源大小是固定的,不可改变的。但是,任务运行时的资源使用量是可变的,而不是某一固定数值,因此,为了任务可以安全运行,往往会申请较大的资源,这样会造成一定程度的资源浪费。In the existing scheduling system, such as Hadoop Yarn, tasks are run inside the container. Therefore, you need to apply for the container first. When applying, you need to manually judge and specify the size of the container resource. The resource size is fixed and cannot be changed. However, the amount of resource usage when a task is running is variable, not a fixed value. Therefore, in order for the task to run safely, larger resources are often applied, which will cause a certain degree of waste of resources.
为了避免资源浪费,需要设置合理的资源限制,但这往往是比较困难的。尤其是在一些机器学习任务中,涉及到的数据量会比较大,任务内部流程也会比较复杂,通常情况下都需要多机来运行,所以要划分几个任务来运行,每个任务的资源限制是多少,这是非常困难的。虽然谷歌的Borg会回收已分配出来、但未被使用到的资源,用这些资源来运行一些对资源质量要求低的任务,以实现资源的最大利用,但是这些资源一旦被使用,所有超过资源限制的任务均会被直接杀掉。In order to avoid waste of resources, reasonable resource limits need to be set, but this is often more difficult. Especially in some machine learning tasks, the amount of data involved will be relatively large, and the internal process of the task will be more complicated. Usually, multiple machines are required to run, so it is necessary to divide several tasks to run, and the resources of each task How much is the limit, it is very difficult. Although Google's Borg will reclaim the allocated but unused resources, and use these resources to run some tasks with low resource quality requirements to achieve maximum utilization of resources, once these resources are used, all resources exceed the resource limit. All tasks will be killed directly.
发明内容Summary of the invention
An object of the present disclosure is to provide a new technical solution for allocating resources and tasks in a distributed system.
According to a first aspect of the present disclosure, a method for allocating resources and tasks in a distributed system is provided, including: receiving a job for execution in the distributed system; predicting, according to resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated to each task executed by a working node; allocating each task to a suitable working node according to the predicted resource demand; and dynamically adjusting resource usage while the working nodes execute the allocated tasks.
In a possible implementation of the first aspect, the task types include a parameter server task and/or a training task in machine learning, and the resource-related information includes at least one of the scale of the data processed and the processing content of the corresponding task type.
In a possible implementation of the first aspect, the resource demand includes each resource type required by the task and a corresponding resource demand value, where the resource demand value includes at least one of a peak demand value and a typical demand value.
In a possible implementation of the first aspect, predicting the resource demand of each task executed by a working node includes predicting, according to rules and/or a machine learning model, the resource demand to be allocated to each task executed by the working node, and the method further includes collecting the actual resource usage of the working nodes when executing tasks, so as to obtain the rules and/or the machine learning model.
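As an illustrative sketch only (not part of the claimed subject matter), a rule of the kind mentioned above might estimate a task's memory demand from the scale of its input data and cap it at the working node's upper limit. The per-cell byte count and safety factor below are hypothetical assumptions; a real deployment would derive such rules from collected actual resource usage.

```python
# Minimal rule-based resource-demand predictor (illustrative sketch).
# BYTES_PER_CELL and SAFETY_FACTOR are hypothetical constants, not values
# taken from the disclosure.

BYTES_PER_CELL = 8        # assume ~8 bytes per data cell (hypothetical)
SAFETY_FACTOR = 1.5       # head-room above the raw estimate (hypothetical)

def predict_memory_mb(rows: int, cols: int, node_limit_mb: int) -> int:
    """Estimate the memory (MB) to allocate, capped by the node's upper limit."""
    raw_mb = rows * cols * BYTES_PER_CELL / (1024 * 1024)
    return min(int(raw_mb * SAFETY_FACTOR) + 1, node_limit_mb)
```

For example, a training task over one million rows and 100 columns would be granted roughly 1.1 GB under these assumed constants, while a demand larger than the node's limit is clipped to that limit.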
In a possible implementation of the first aspect, allocating each task to a suitable working node according to the predicted resource demand includes: obtaining the current resource usage, current task status, and maximum total resource limit of each working node; and using a preset allocation algorithm to select, from the multiple working nodes of the distributed system and based on the predicted resource demand together with the obtained current resource usage, current task status, and maximum total resource limit of each working node, a working node suitable for executing each task, and allocating each task to the selected working node.
In a possible implementation of the first aspect, dynamically adjusting resource usage while the working node executes the allocated tasks includes: monitoring the resource usage of the tasks; when a task's usage of a certain resource exceeds the predicted resource demand value, determining whether the current total usage of that resource exceeds the maximum total limit of that resource; and when the current total usage of that resource exceeds its maximum total limit, performing dynamic adjustment according to the compressibility of that resource.
In a possible implementation of the first aspect, performing dynamic adjustment according to the compressibility of the resource includes: taking, as candidate tasks, the tasks on the working node whose usage of the resource exceeds the predicted resource demand value, and selecting a candidate task according to processing priority and/or start time; and dynamically adjusting the selected candidate task according to the compressibility of the resource.
In a possible implementation of the first aspect, dynamically adjusting the selected candidate task according to the compressibility of the resource includes: when the resource is a compressible resource, limiting the candidate task's usage of the resource.
In a possible implementation of the first aspect, dynamically adjusting the selected candidate task according to the compressibility of the resource includes: when the resource is an incompressible resource, determining whether the candidate task supports scale-out; when the candidate task supports scale-out, determining whether there is another working node capable of executing the candidate task; when such a working node exists, extracting the unfinished portion of the candidate task; and sending the extracted portion to the other working node.
In a possible implementation of the first aspect, the method further includes: when the candidate task does not support scale-out, determining whether the candidate task supports freezing; and when the candidate task supports freezing, writing the in-memory data of the candidate task to the disk of the working node.
In a possible implementation of the first aspect, the method further includes: when no such other working node exists, determining whether the candidate task supports freezing; and when the candidate task supports freezing, writing the in-memory data of the candidate task to the disk of the working node.
In a possible implementation of the first aspect, after writing the in-memory data of the candidate task to the disk of the working node, the method further includes: determining whether the candidate task supports migration; when the candidate task supports migration, determining whether there is another working node capable of executing the candidate task; and sending the in-memory data to the other working node.
In a possible implementation of the first aspect, the method further includes: when the candidate task does not support migration, obtaining, in response to a set trigger event, the current resource usage of the candidate task; and continuing to execute the candidate task on the working node based on its current resource usage.
In a possible implementation of the first aspect, the trigger event includes at least one of: completion of any allocated task on the working node, and the presence of released resources on the working node.
In a possible implementation of the first aspect, the method further includes: when the candidate task does not support freezing, killing the candidate task directly.
In a possible implementation of the first aspect, after killing the candidate task directly, the method further includes: collecting the resource usage of the candidate task sent by the working node; and expanding the resource demand of the candidate task based on that resource usage, so as to allocate the candidate task again to a suitable working node according to the expanded resource demand.
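The adjustment logic of the implementations above can be summarized as a decision tree. The following is a minimal sketch under assumed task attributes; the parameter names (`supports_scale_out`, `supports_freeze`, and so on) are invented here for illustration, and the function returns the chosen action rather than performing it — real handlers would throttle the resource, checkpoint state to disk, contact other nodes, and so on.

```python
# Illustrative sketch of the dynamic-adjustment decision tree described above.
# All names are hypothetical and stand in for the checks in the disclosure.

def choose_adjustment(resource_is_compressible: bool,
                      supports_scale_out: bool,
                      supports_freeze: bool,
                      supports_migrate: bool,
                      other_node_available: bool) -> str:
    if resource_is_compressible:
        return "limit-usage"               # throttle the candidate's usage
    if supports_scale_out and other_node_available:
        return "scale-out"                 # send unfinished portion elsewhere
    if supports_freeze:
        if supports_migrate and other_node_available:
            return "freeze-and-migrate"    # write memory to disk, then move
        return "freeze"                    # write memory to disk, resume on a
                                           # trigger event on the same node
    return "kill-and-resubmit"             # kill, expand demand, reallocate
```

Each leaf corresponds to one of the possible implementations above: limiting a compressible resource, scaling out an incompressible one, freezing (with or without migration), or killing the task and reallocating it with an expanded resource demand.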
According to a second aspect of the present disclosure, an apparatus for allocating resources and tasks in a distributed system is also provided, including:
a job receiving unit configured to receive a job for execution in the distributed system;
a resource demand prediction unit configured to predict, according to resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated to each task executed by a working node;
a task allocation unit configured to allocate each task to a suitable working node according to the predicted resource demand; and
a resource scheduling unit configured to dynamically adjust resource usage while the working nodes execute the allocated tasks.
According to a third aspect of the present disclosure, a device for allocating resources and tasks in a distributed system is also provided, including:
a memory configured to store executable instructions; and
a processor configured to, under control of the executable instructions, run the device for allocating resources and tasks in a distributed system to perform the method for allocating resources and tasks in a distributed system according to the first aspect of the present disclosure.
According to another aspect of the present disclosure, a computer-readable storage medium is also provided, storing a computer program that, when executed by a processor, implements the method for allocating resources and tasks in a distributed system according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, a distributed system is also provided, including:
a plurality of devices configured to provide resources; and
the apparatus for allocating resources and tasks in a distributed system according to the second aspect of the present disclosure, or the device for allocating resources and tasks in a distributed system according to the third aspect of the present disclosure.
According to the method, apparatus, device, and system of the embodiments of the present disclosure, on the one hand, the resources required by each task in the distributed system are not judged manually by business personnel; instead, the system predicts, from the resource-related information of each task type and the resource upper limit of each working node in the distributed system, the resource demand to be allocated to each task executed by a working node, which can effectively improve the efficiency and accuracy of resource computation. On the other hand, each task can be allocated to a suitable working node according to the predicted resource demand, and resource usage can be dynamically adjusted while the working nodes execute the allocated tasks, thereby achieving efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization.
Description of the Drawings
The drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a block diagram showing an example of a hardware configuration of a distributed system in which embodiments of the present disclosure can be implemented.
Fig. 2 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to an embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to another embodiment of the present disclosure.
Fig. 4 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to a third embodiment of the present disclosure.
Fig. 5 is a functional block diagram of an apparatus for allocating resources and tasks in a distributed system according to an embodiment of the present disclosure.
Fig. 6 is a functional block diagram of a device for allocating resources and tasks in a distributed system according to an embodiment of the present disclosure.
Fig. 7 is a block diagram of a distributed system according to an embodiment of the present disclosure.
Fig. 8 is a schematic flowchart of a method for allocating resources and tasks according to an example of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
In all of the examples shown and discussed herein, any specific value should be interpreted as merely exemplary rather than limiting. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 1 shows a block diagram of a hardware configuration in which the distributed system of this embodiment can be implemented.
As shown in Fig. 1, the distributed system of this embodiment includes multiple servers 1000. Fig. 1 shows four servers 1000: server 1000A, server 1000B, server 1000C, and server 1000D.
In this embodiment, the number of servers 1000 in the distributed system can be determined according to the actual scenario and is not limited here.
In this embodiment, these servers 1000 form the distributed system, and each server 1000 can serve as a device for allocating resources and tasks in the distributed system.
In this embodiment, a job for execution in the distributed system may be submitted by any server 1000 in the distributed system that has an Executor node, or by a client connected to the distributed system to any server 1000 in the distributed system. A ResourceGuess node in that server 1000 then predicts, according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated to each task executed by a working node; a Scheduler node in that server 1000 allocates each task to a suitable working node according to the predicted resource demand; and the servers 1000 in the distributed system that have worker nodes dynamically adjust resource usage while executing the allocated tasks.
The server 1000 provides points of service for processing, databases, and communication facilities. The server 1000 may be a monolithic server or a distributed server spanning multiple computers or computer data centers. Servers may be of various types, such as, but not limited to, web servers, news servers, mail servers, message servers, advertisement servers, file servers, application servers, interaction servers, database servers, or proxy servers. In some embodiments, each server may include hardware, software, embedded logic components configured to perform the appropriate functions supported or implemented by the server, or a combination of two or more such components. For example, a server may be a blade server, a cloud server, or the like, or may be a server group composed of multiple servers, and may include one or more of the foregoing types of servers.
In one embodiment, as shown in Fig. 1, the server 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, and an input device 1600.
In this embodiment, the server 1000 may also include a speaker, a microphone, and the like, which are not limited here.
The processor 1100 may be a dedicated server processor, or a desktop or mobile processor that meets the performance requirements, which is not limited here. The memory 1200 includes, for example, ROM (read-only memory), RAM (random-access memory), and non-volatile memory such as a hard disk. The interface device 1300 includes, for example, various bus interfaces, such as serial bus interfaces (including USB interfaces) and parallel bus interfaces. The communication device 1400 can perform wired or wireless communication. The display device 1500 is, for example, a liquid crystal display, an LED display, or a touch display. The input device 1600 may include, for example, a touch screen, a keyboard, and the like.
Although multiple components of the server 1000 are shown in Fig. 1, the present disclosure may involve only some of them; for example, the server 1000 may involve only the memory 1200, the communication device 1400, and the processor 1100.
The network 2000 may be a wireless or wired communication network, and may be a local area network or a wide area network. In the distributed system shown in Fig. 1, the multiple servers 1000 can communicate with one another through the network 2000. Furthermore, the networks 2000 over which the servers 1000 communicate may be the same or different.
The distributed system shown in Fig. 1 is merely explanatory and is in no way intended to limit the present disclosure or its application or use. In practical applications, other numbers of distributed systems may be included, for example 2, 3, 5, or even more distributed systems; no limitation is made here. As applied to the embodiments of the present disclosure, the memory 1200 of the server 1000 is configured to store instructions, and the instructions are configured to control the processor 1100 to operate so as to perform any of the methods for allocating resources and tasks in a distributed system provided in the embodiments of the present disclosure. Technicians can design the instructions according to the solutions disclosed herein. How instructions control a processor is well known in the art and is therefore not described in detail here.
Fig. 2 is a schematic flowchart of a method for allocating resources and tasks in a distributed system according to an embodiment.
Referring to Fig. 2, the method for allocating resources and tasks in a distributed system of this embodiment may be implemented by an apparatus for allocating resources and tasks in a distributed system, or by a device for allocating resources and tasks in a distributed system; the apparatus or device may specifically be distributed over the devices that provide resources. The resource scheduling method of this embodiment may include the following steps S2100 to S2400.
Step S2100: receive a job for execution in the distributed system.
A job is the basic unit of submission. A job includes multiple tasks, and the multiple tasks are related to one another.
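The job/task relationship described above can be sketched minimally as follows; the field names are assumptions for illustration only, not structures defined by the disclosure:

```python
# Minimal sketch of a job as the unit of submission and tasks as the unit
# of execution; all field names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    task_type: str          # e.g. "parameter_server" or "training"
    data_rows: int = 0      # scale of the data the task processes
    data_cols: int = 0

@dataclass
class Job:
    job_id: str
    tasks: List[Task] = field(default_factory=list)  # mutually related tasks

job = Job("job-1", [Task("ps-0", "parameter_server"),
                    Task("trainer-0", "training",
                         data_rows=10_000, data_cols=50)])
```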
A task is the smallest unit of execution; typically, a process may be called a task.
After the job for execution in the distributed system is received in step S2100, the subsequent steps can predict, according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated to each task executed by a working node, and allocate each task to a suitable working node according to the predicted resource demand, so that the working node provides resources to the corresponding task according to the predicted demand, thereby improving resource utilization.
After the job for execution in the distributed system is received, the method proceeds to:
Step S2200: predict, according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand to be allocated to each task executed by a working node.
The task types include parameter server tasks and/or training tasks in machine learning, where a parameter server task performs parameter processing (for example, parameter updates) and a training task performs model training (for example, sample computation).
The resource-related information includes at least one of the scale of the data processed and the processing content of the corresponding task type.
In this embodiment, any given task may require multiple kinds of resources. To distinguish between them, the resource demand includes at least each resource type required by the task and the corresponding resource demand value. The resource demand may, of course, also include other information about the resources required by the task, which is not limited here.
The resource types may include, for example, CPU, memory usage, disk usage, disk input/output (I/O), network I/O, graphics processing unit (GPU), and field-programmable gate array (FPGA).
The resource demand value includes at least one of a peak demand value and a typical demand value. In this embodiment, on the one hand, a predicted resource demand value may be greater than the actual usage, causing resource waste; on the other hand, it may be smaller than the actual usage, causing a resource shortage. In the latter case, resource usage can be dynamically adjusted while the working nodes execute the allocated tasks, according to step S2400 below, so as to improve resource utilization; this is not described in detail in step S2200.
In this embodiment, using the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, a task in the submitted job may be divided into multiple tasks, and the resource types to be allocated to each task executed by a working node and the corresponding resource demand values may be further predicted, so that each task can be allocated to a suitable working node, which provides the corresponding resources to execute it.
In this embodiment, predicting the resource demand to be allocated to each task executed by a working node in step S2200 may further include:
predicting, according to rules and/or a machine learning model, the resource demand to be allocated to each task executed by the working node.
The machine learning model may be a neural network model, such as, but not limited to, a BP (back propagation) neural network model or a convolutional neural network model; it may also be a logistic regression model. The machine learning model is not specifically limited here: any machine learning model capable of predicting the resource demand to be allocated to each task executed by a working node falls within the scope protected by the embodiments of the present disclosure.
In this embodiment, the data scale actually involved in a task and the task type may be taken as features, and each resource actually used by the task together with its usage value may be taken as labels, forming training samples that are fed to the machine learning model for resource demand prediction. The data scale may include, for example, at least one of the number of rows and the number of columns of the data. The machine learning model can be trained to predict, from the data scale and task type of a task, each resource corresponding to the task and its resource demand value.
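A minimal sketch of this training setup is given below. It stands in for the rules or machine learning model in the text with the simplest possible learner: a per-task-type memory-per-cell rate averaged over observed samples. A real system could substitute, for example, a regression or neural-network model; the sample format and the "memory" dimension are assumptions for illustration.

```python
# Illustrative sketch: learn a per-task-type memory-per-cell rate from
# observed (features, label) samples, then predict demand for a new task.

def fit_rates(samples):
    """samples: list of (task_type, rows, cols, observed_memory_mb)."""
    totals = {}
    for task_type, rows, cols, mem_mb in samples:
        used, cells = totals.get(task_type, (0.0, 0))
        totals[task_type] = (used + mem_mb, cells + rows * cols)
    return {t: used / cells for t, (used, cells) in totals.items()}

def predict(rates, task_type, rows, cols):
    """Predicted memory demand (MB) for a task of the given type and scale."""
    return rates[task_type] * rows * cols

# Two observed training-task runs give a rate of 0.0002 MB per data cell.
rates = fit_rates([("training", 1000, 10, 2.0),
                   ("training", 2000, 10, 4.0)])
```

Feeding back actual usage after each completed task, as described in the next paragraph, simply means appending new samples and refitting, so the predictions become increasingly accurate.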
在本实施例中,还可以是在工作节点执行完所分配的任务之后,获取任务的实际资源使用情况作为新的训练样本,修正规则和/或机器学习模型,使得资源需求的预测越来越准确。In this embodiment, it is also possible to obtain the actual resource usage of the task as a new training sample after the worker node finishes executing the assigned task, and modify the rules and/or machine learning model to make the resource demand forecast more and more accurate.
通过步骤S2200根据作业中各任务类型的资源相关信息和分布式系统中各工作节点的资源上限,预测由工作节点执行的每个任务需被分配的资源需求,并根据预测的资源需求,将每个任务分配给适合的工作节点,使得工作节点根据预测的资源需求向对应的任务提供资源,从而,提高资源利用率。Through step S2200, according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource demand for each task to be executed by the working node is predicted, and according to the predicted resource demand, each Each task is allocated to a suitable working node, so that the working node provides resources to the corresponding task according to the predicted resource demand, thereby improving resource utilization.
在根据作业中各任务类型的资源相关信息和分布式系统中各工作节点的资源上限,预测由工作节点执行的每个任务需被分配的资源需求之后,进入步骤S2300。After predicting the resource requirements to be allocated for each task executed by the working node according to the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, step S2300 is entered.
步骤S2300,根据预测的资源需求,将每个任务分配给适合的工作节点。In step S2300, each task is allocated to a suitable working node according to the predicted resource demand.
任务可以包括提交工作时,工作中包括的多个任务,也可以包括提交工作后,根据步骤S2200对某个任务进行划分后得到的多个任务。The tasks may include multiple tasks included in the work when the work is submitted, or multiple tasks obtained by dividing a certain task according to step S2200 after the work is submitted.
在本实施例中,可以是将预测的资源需求发送给工作节点,以控制工作节点根据预测的资源需求,向任务提供资源,也可以是工作节点接收到预测的资源信息之后,自行根据预测的资源需求,向任务提供资源。In this embodiment, the predicted resource demand may be sent to the working node to control the working node to provide resources to the task according to the predicted resource demand, or it may be the working node after receiving the predicted resource information, according to the prediction. Resource requirements, provide resources to tasks.
在本实施例中,该步骤S2300根据预测的资源需求,将每个任务分配给适合的工作节点可以进一步包括如下步骤S2310~S2320:In this embodiment, the step S2300 assigning each task to a suitable working node according to the predicted resource demand may further include the following steps S2310 to S2320:
步骤S2310,获取各个工作节点的当前资源使用情况、当前任务运行情况以及总资源最大限制。Step S2310: Obtain the current resource usage status, current task running status, and total resource maximum limit of each working node.
步骤S2320，利用预设的分配算法，根据预测的资源需求，并结合获取的各个工作节点的当前资源使用情况、当前任务运行情况以及总资源最大限制，从分布式系统的多个工作节点中筛选出适合执行每个任务的工作节点，并将每个任务分配给筛选出的工作节点。In step S2320, a preset allocation algorithm is used to screen, from the multiple working nodes of the distributed system, a working node suitable for executing each task, based on the predicted resource demand together with the obtained current resource usage, current task status, and total resource maximum limit of each working node; each task is then assigned to the selected working node.
本公开实施例可以采用任意的分配算法进行任务分配,所以在此不做任何限定。In the embodiments of the present disclosure, any allocation algorithm may be used for task allocation, so there is no limitation here.
本步骤S2320中，可以是一个工作节点中仅分配一个任务，也可以是一个工作节点中分配多个任务，该多个任务可以是同时执行，也可以是按照预测的资源需求值的大小按照从小到大的顺序依次执行，本公开并不限于一个工作节点中多个任务的执行顺序。In step S2320, a working node may be assigned only one task, or multiple tasks; the multiple tasks may be executed concurrently, or sequentially in ascending order of their predicted resource demand values. The present disclosure does not limit the execution order of multiple tasks within one working node.
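Since the patent leaves the concrete allocation algorithm open, the screening in step S2320 can be sketched as below. The free-capacity policy, field names, and single-dimensional capacity model are all illustrative assumptions.

```python
def pick_node(predicted_demand, nodes):
    # Keep only nodes whose remaining capacity (total resource maximum
    # limit minus current usage) covers the task's predicted demand, then
    # choose the node with the most free capacity -- one possible policy.
    candidates = [n for n in nodes
                  if n["max_total"] - n["current_usage"] >= predicted_demand]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n["max_total"] - n["current_usage"])

nodes = [
    {"name": "worker-1", "max_total": 16, "current_usage": 14},
    {"name": "worker-2", "max_total": 16, "current_usage": 6},
]
print(pick_node(4, nodes)["name"])  # worker-2: only node with >= 4 units free
print(pick_node(12, nodes))         # None: no node can host the task
```

A real scheduler would screen over several resource dimensions (CPU, memory, GPU, ...) and also consider the current task status of each node, as the step describes.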
在根据步骤S2300预测由工作节点执行的每个任务需被分配的资源需求，并根据预测的资源需求，将每个任务分配给适合的工作节点，使得工作节点根据预测的资源需求向对应的任务提供资源，从而，提高资源利用率。According to step S2300, after the resource demand to be allocated to each task executed by a working node has been predicted, each task is allocated to a suitable working node according to the predicted demand, so that the working node provides resources to the corresponding task accordingly, thereby improving resource utilization.
根据预测的资源需求,将每个任务分配给适合的工作节点之后,进入步骤S2400。According to the predicted resource demand, after each task is allocated to a suitable working node, step S2400 is entered.
步骤S2400,在工作节点执行分配的任务的过程中,针对资源使用情况进行动态调整。Step S2400, during the execution of the assigned task by the working node, dynamic adjustment is made according to the resource usage.
在本实施例中，由于根据步骤S2200预测出的资源需求值可能会大于实际使用值，从而造成资源浪费；或者，预测出的资源需求值也可能会小于实际使用值，从而造成资源不足，针对于这种情况，可以根据该步骤S2400在工作节点执行分配的任务的过程中，针对资源使用情况进行动态调整，以提高资源利用率。In this embodiment, the resource demand value predicted in step S2200 may be greater than the actual usage, causing waste of resources, or smaller than the actual usage, causing a resource shortage. For such cases, according to step S2400, resource usage can be dynamically adjusted while the working node executes its assigned tasks, so as to improve resource utilization.
在本实施例中，参照图3所示，该步骤S2400在工作节点执行分配的任务的过程中，针对资源使用情况进行动态调整可以进一步包括如下步骤S2410~S2450：In this embodiment, referring to FIG. 3, step S2400 of dynamically adjusting resource usage while the working node executes the assigned task may further include the following steps S2410 to S2450:
步骤S2410,监控任务的资源使用情况,并判断任务的某种资源使用是否超过预测的资源需求值,若是,则执行步骤S2430,若否,则执行步骤S2420。Step S2410: monitor the resource usage of the task, and determine whether a certain resource usage of the task exceeds the predicted resource demand value, if yes, execute step S2430, if not, execute step S2420.
根据该步骤S2410,可以是在工作节点开始执行所分配的任务之后,实时监控任务的资源使用情况。According to this step S2410, the resource usage of the task can be monitored in real time after the working node starts to execute the assigned task.
以资源包括CPU、GPU为例，可以是在工作节点执行所分配的任务的过程中，实时监控该工作节点中任务的CPU、GPU的使用情况，并判断任务对于CPU或GPU的使用是否超过预测的CPU或GPU的资源需求值，如果任务对于CPU或GPU的使用未超过任务的预测的CPU或GPU的资源需求值，则根据步骤S2420不进行任何处理，继续由该工作节点执行任务，如果任务对于CPU或GPU的使用超过预测的CPU或GPU的资源需求值，则会进一步根据步骤S2430判断CPU或GPU的当前使用总量是否超过CPU或GPU的总资源最大限制。Taking CPU and GPU resources as an example: while the working node executes its assigned tasks, the CPU and GPU usage of each task on that node can be monitored in real time, and it is determined whether the task's CPU or GPU usage exceeds its predicted CPU or GPU resource demand value. If it does not, no processing is performed according to step S2420 and the working node continues executing the task; if it does, step S2430 further determines whether the current total CPU or GPU usage exceeds the maximum total CPU or GPU resource limit.
步骤S2420,不进行任何处理,继续由工作节点执行任务。In step S2420, no processing is performed, and the task continues to be executed by the working node.
继续步骤S2410的示例，如果任务对于CPU或GPU的使用未超过任务的预测的CPU或GPU的资源需求值，则根据本步骤S2420不进行任何处理，继续由该工作节点执行任务，仅通知调度节点实际的CPU或GPU的资源使用情况。Continuing the example of step S2410, if the task's CPU or GPU usage does not exceed its predicted CPU or GPU resource demand value, no processing is performed according to this step S2420; the working node continues executing the task, and the scheduling node is merely notified of the actual CPU or GPU resource usage.
步骤S2430,在任务的某种资源使用超过预测的资源需求值的情况下,判断某种资源的当前使用总量是否超过某种资源的总资源最大限制。In step S2430, in the case that a certain resource usage of the task exceeds the predicted resource demand value, it is judged whether the current total usage of a certain resource exceeds the total resource maximum limit of a certain resource.
继续上述步骤S2410的示例，如果任务对于CPU的使用超过预测的CPU的资源需求值，则根据本步骤S2430进一步判断CPU的当前使用总量是否超过CPU的总资源最大限制，如果超过CPU的总资源最大限制，则执行步骤S2450根据CPU的压缩性来进行动态调整，如果未超过CPU的总资源最大限制，则根据步骤S2440不进行任何处理，继续由该工作节点执行任务，仅通知调度节点实际的CPU的资源使用情况。Continuing the example of step S2410 above, if the task's CPU usage exceeds the predicted CPU resource demand value, this step S2430 further determines whether the current total CPU usage exceeds the maximum total CPU resource limit. If it does, step S2450 is executed to perform dynamic adjustment according to the compressibility of the CPU; if it does not, no processing is performed according to step S2440, the working node continues executing the task, and the scheduling node is merely notified of the actual CPU resource usage.
步骤S2440,在某种资源的当前使用总量未超过某种资源的总资源最大限制的情况下,则不进行任何处理,继续由工作节点执行任务。In step S2440, when the total current usage of a certain resource does not exceed the maximum total resource limit of a certain resource, no processing is performed, and the task continues to be executed by the working node.
继续上述步骤S2430的示例,如果未超过CPU的总资源最大限制,则根据本步骤S2440不进行任何处理,继续由该工作节点执行任务,仅通知调度节点实际的CPU的资源使用情况。Continuing the example of step S2430 above, if the maximum limit of the total resources of the CPU is not exceeded, no processing is performed according to this step S2440, the task is continued to be executed by the working node, and only the scheduling node is notified of the actual CPU resource usage.
步骤S2450,在某种资源的当前使用总量超过某种资源的总资源最大限制的情况下,根据某种资源的压缩性来进行动态调整。Step S2450, in the case that the total current usage of a certain resource exceeds the maximum limit of the total resource of a certain resource, dynamic adjustment is made according to the compressibility of the certain resource.
按照资源的压缩性可以是将资源分为可压缩资源和不可压缩资源，其中，可压缩资源包括CPU、磁盘I/O以及网络I/O；以及，不可压缩资源包括内存用量、磁盘用量、GPU以及FPGA。According to their compressibility, resources can be divided into compressible and incompressible resources, where compressible resources include CPU, disk I/O, and network I/O, and incompressible resources include memory usage, disk usage, GPU, and FPGA.
继续上述步骤S2430的示例,如果超过CPU的总资源最大限制,则执行本步骤S2450根据CPU的压缩性来进行动态调整,从而实现有效率的资源调度,提高资源利用率。Continuing the example of step S2430 above, if the maximum limit of the total resources of the CPU is exceeded, this step S2450 is executed to dynamically adjust according to the compressibility of the CPU, thereby realizing efficient resource scheduling and improving resource utilization.
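The decision made across steps S2410 through S2450 can be condensed into one sketch. The function name and action strings are illustrative assumptions; the compressibility sets follow the classification given above.

```python
# Condensed sketch of the per-task check in steps S2410-S2450.
COMPRESSIBLE = {"cpu", "disk_io", "network_io"}
INCOMPRESSIBLE = {"memory", "disk_space", "gpu", "fpga"}

def check_task(resource, task_usage, predicted_demand,
               node_total_usage, node_max_limit):
    if task_usage <= predicted_demand:
        return "continue"                # S2420: usage within prediction
    if node_total_usage <= node_max_limit:
        return "continue_and_report"     # S2440: node still has headroom
    # S2450: node exceeds its total limit -> act by compressibility.
    if resource in COMPRESSIBLE:
        return "throttle_usage"          # compressible: limit the task
    return "adjust_incompressible"       # incompressible: expand/freeze/...

print(check_task("cpu", 2.0, 3.0, 10, 16))  # continue
print(check_task("cpu", 4.0, 3.0, 18, 16))  # throttle_usage
print(check_task("gpu", 2.0, 1.0, 18, 16))  # adjust_incompressible
```

The key asymmetry is that a compressible resource such as CPU can simply be throttled, whereas an incompressible resource such as GPU memory triggers the expand/freeze/migrate/kill handling described below.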
根据本公开实施例的方法，一方面，其并不是由用户人为的去判断分布式系统中各任务所需资源，而是根据各任务类型的资源相关信息和分布式系统中工作节点的资源上限，利用系统去预测由工作节点执行的每个任务需被分配的资源需求，这能够有效提高资源计算的效率和准确性；另一方面，其能够根据预测出的资源需求，将每个任务分配给适合的工作节点，并在工作节点执行分配的任务的过程中，针对资源使用情况进行动态调节，从而实现有效率的任务分配和资源调度，提高任务的执行效率以及资源利用率。According to the method of the embodiments of the present disclosure, on the one hand, the resources required by each task in the distributed system are not judged manually by the user; instead, the system predicts the resource demand to be allocated to each task executed by a working node based on the resource-related information of each task type and the resource upper limits of the working nodes, which can effectively improve the efficiency and accuracy of resource calculation. On the other hand, each task can be allocated to a suitable working node according to the predicted resource demand, and resource usage can be dynamically adjusted while the working nodes execute their assigned tasks, thereby realizing efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization.
在一个实施例中,上述步骤S2450中根据某种资源的压缩性来进行动态调整可以进一步包括如下步骤:In an embodiment, the dynamic adjustment according to the compressibility of a certain resource in the above step S2450 may further include the following steps:
步骤S2451,查找工作节点中针对某种资源超过预测的资源需求值的任务作为备选任务,并按照处理优先级和/或启动时间来选择备选任务。Step S2451: Search for tasks in the working node whose resources exceed the predicted resource demand value as candidate tasks, and select the candidate tasks according to the processing priority and/or start time.
本步骤S2451中，例如可以是在某种资源的当前使用总量超过某种资源的总资源最大限制的情况下，先查找工作节点中针对某种资源超过预测的资源需求值的任务作为备选任务，再按照备选任务的处理优先级的升序排序次序选择处理优先级低级的备选任务作为选择出的备选任务，还可以是在存在多个优先级最低的备选任务的情况下，接续按照多个优先级最低的备选任务的启动时间的升序排序次序选择启动时间最长的备选任务作为所选择出的备选任务。In this step S2451, for example, when the current total usage of a certain resource exceeds its maximum total limit, the tasks on the working node whose usage of that resource exceeds the predicted demand value are first identified as candidate tasks; the candidate task with the lowest processing priority is then selected, in ascending order of processing priority, as the chosen candidate task. Furthermore, if multiple candidate tasks share the lowest priority, the candidate task with the longest start time is selected, in ascending order of start time, as the chosen candidate task.
继续上述步骤S2450的示例，如果超过CPU的总资源最大限制，则先查找该工作节点中针对CPU超过预测的资源需求值的任务作为备选任务，示例性地，查找出的备选任务例如可以是任务1、任务2以及任务3，在此，可以是在将任务1、任务2以及任务3进行处理优先级从小到大的排序之后得到：任务3、任务2、任务1，并选择处理优先级最低的任务3作为所选择出的备选任务；另外，例如还可以是在任务3和任务2的处理优先级相同的情况下，接续将任务3和任务2进行启动时间从小到大的排序以得到：任务2、任务3，并选取启动时间最长的任务3作为所选择出的备选任务。Continuing the example of step S2450 above, if the maximum total CPU limit is exceeded, the tasks on the working node whose CPU usage exceeds the predicted demand value are first identified as candidate tasks; illustratively, the candidates found may be task 1, task 2, and task 3. Sorting task 1, task 2, and task 3 in ascending order of processing priority yields: task 3, task 2, task 1, and task 3, with the lowest processing priority, is selected as the chosen candidate task. Alternatively, if task 3 and task 2 have the same processing priority, they are further sorted in ascending order of start time to yield: task 2, task 3, and task 3, with the longest start time, is selected as the chosen candidate task.
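The selection rule of step S2451 (lowest processing priority first, then longest start time among ties) can be sketched as follows; the dictionary field names are assumptions for illustration.

```python
def select_candidate(candidates):
    # candidates: tasks whose usage of the resource exceeds its prediction.
    # Pick the lowest processing priority (the first entry of the ascending
    # sort described in the text); break ties by the longest start time.
    lowest = min(t["priority"] for t in candidates)
    tied = [t for t in candidates if t["priority"] == lowest]
    return max(tied, key=lambda t: t["start_time"])

tasks = [
    {"name": "task1", "priority": 3, "start_time": 5},
    {"name": "task2", "priority": 1, "start_time": 2},
    {"name": "task3", "priority": 1, "start_time": 9},
]
print(select_candidate(tasks)["name"])  # task3
```

This reproduces the worked example in the text: task 2 and task 3 share the lowest priority, and task 3, with the longest start time, is chosen.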
步骤S2452,根据某种资源的压缩性,针对所选择的备选任务来进行动态调整。Step S2452, according to the compressibility of a certain resource, dynamically adjust the selected candidate task.
本步骤S2452中,根据上述步骤S2451选择出备选任务之后,可以根据某种资源的压缩性,针对所选择的备选任务来进行动态调整,以提高资源利用率。In this step S2452, after the candidate task is selected according to the above step S2451, the selected candidate task can be dynamically adjusted according to the compressibility of a certain resource to improve resource utilization.
在本公开的一个例子中,该步骤S2452根据某种资源的压缩性,针对所选择的备选任务来进行动态调整可以进一步包括:In an example of the present disclosure, the step S2452 dynamically adjusting for the selected candidate task according to the compressibility of a certain resource may further include:
在某种资源为可压缩资源的情况下,限制备选任务对于某种资源的资源使用量。In the case that a certain resource is a compressible resource, the resource usage of the candidate task for a certain resource is restricted.
继续上述步骤S2451的示例,由于CPU为可压缩资源,在此,可以是限制所选择出的备选任务即任务3对于CPU的资源使用量。Continuing the example of step S2451 above, since the CPU is a compressible resource, here, the resource usage of the selected candidate task, that is, task 3, for the CPU may be restricted.
在本公开的一个例子中,参照图4所示,该步骤S2452根据某种资源的压缩性,针对所选择的备选任务来进行动态调整可以进一步包括:In an example of the present disclosure, referring to FIG. 4, the step S2452 dynamically adjusts the selected candidate task according to the compressibility of a certain resource may further include:
步骤S2452-1,在某种资源为不可压缩资源的情况下,判断备选任务是否支持扩容,如是,则执行步骤S2452-2,反之,执行步骤S2452-5。In step S2452-1, when a certain resource is an incompressible resource, it is judged whether the candidate task supports capacity expansion, if so, step S2452-2 is executed, otherwise, step S2452-5 is executed.
示例性地，以该资源为FPGA，所选择出的备选任务仍为任务3为例，由于FPGA为不可压缩资源，在此，可以是判断任务3是否支持扩容，如果任务3支持扩容，则根据以下步骤S2452-2判断是否存在能够执行任务3的其他工作节点，反之，则根据以下步骤S2452-5判断任务3是否支持冻结。Illustratively, taking FPGA as the resource and task 3 as the selected candidate task: since FPGA is an incompressible resource, it can be determined here whether task 3 supports capacity expansion. If it does, the following step S2452-2 determines whether other working nodes capable of executing task 3 exist; otherwise, the following step S2452-5 determines whether task 3 supports freezing.
步骤S2452-2,判断是否存在能够执行备选任务的其他工作节点,如是,则执行步骤S2452-3,反之,执行步骤S2452-5。In step S2452-2, it is judged whether there are other working nodes that can perform the candidate task. If so, step S2452-3 is performed, otherwise, step S2452-5 is performed.
继续上述步骤S2452-1的示例，在任务3支持扩容的情况下，进一步判断是否存在能够执行任务3的其他工作节点，如果存在能够执行任务3的其他工作节点，则执行以下步骤S2452-3提取任务3中未完成的部分任务，反之，执行步骤S2452-5判断备选任务是否支持冻结。Continuing the example of step S2452-1 above, when task 3 supports capacity expansion, it is further determined whether other working nodes capable of executing task 3 exist; if so, the following step S2452-3 extracts the unfinished part of task 3, otherwise step S2452-5 is executed to determine whether the candidate task supports freezing.
步骤S2452-3,在存在其他工作节点的情况下,提取备选任务中未完成的部分任务。In step S2452-3, when there are other working nodes, extract the uncompleted part of the tasks among the candidate tasks.
继续上述步骤S2452-2的示例,在存在能够执行任务3的其他工作节点的情况下,提取任务3中未完成的部分任务,并继续执行步骤S2452-4。Continuing the example of step S2452-2 above, if there are other working nodes that can execute task 3, extract the uncompleted part of tasks in task 3, and continue to execute step S2452-4.
步骤S2452-4,将提取出的部分任务发送到其他工作节点。Step S2452-4, sending some of the extracted tasks to other working nodes.
继续上述步骤S2452-3的示例，在提取出任务3中未完成的部分任务之后，可以根据本步骤S2452-4将提取出的部分任务发送到其他工作节点，以由其他工作节点继续执行该部分任务。Continuing the example of step S2452-3 above, after the unfinished part of task 3 has been extracted, the extracted part can be sent to other working nodes according to this step S2452-4, so that those working nodes continue executing that part of the task.
步骤S2452-5,判断备选任务是否支持冻结,如是,则执行步骤S2452-6,反之,执行步骤S2452-12。In step S2452-5, it is judged whether the candidate task supports freezing, if so, step S2452-6 is executed, otherwise, step S2452-12 is executed.
继续上述步骤S2452-1或者步骤S2452-2的示例，在任务3不支持扩容的情况下，或者，不存在能够执行备选任务的其他工作节点的情况下，进一步判断任务3是否支持冻结，如果任务3支持冻结，则执行步骤S2452-6冻结任务3，反之，执行步骤S2452-12。Continuing the example of step S2452-1 or step S2452-2 above, when task 3 does not support capacity expansion, or when no other working node capable of executing the candidate task exists, it is further determined whether task 3 supports freezing; if it does, step S2452-6 is executed to freeze task 3, otherwise step S2452-12 is executed.
步骤S2452-6，在备选任务支持冻结的情况下，将备选任务的内存数据写入工作节点的磁盘中。In step S2452-6, when the candidate task supports freezing, the memory data of the candidate task is written into the disk of the working node.
继续上述步骤S2452-5的示例，在任务3支持冻结的情况下，冻结任务3，即，将任务3的内存数据写入工作节点的磁盘中，可以是在将任务3的内存数据写入工作节点的磁盘中之后，继续执行步骤S2452-7判断任务3是否支持迁移。Continuing the example of step S2452-5 above, when task 3 supports freezing, task 3 is frozen, i.e., its memory data is written into the working node's disk; after the memory data of task 3 has been written to disk, step S2452-7 continues to determine whether task 3 supports migration.
步骤S2452-7,判断备选任务是否支持迁移,如是,则执行步骤S2452-8,反之,执行步骤S2452-10。In step S2452-7, it is judged whether the candidate task supports migration, if so, step S2452-8 is executed, otherwise, step S2452-10 is executed.
继续上述步骤S2452-6的示例，在任务3支持迁移的情况下，根据以下步骤S2452-8进一步判断是否存在能够执行备选任务的其他工作节点，反之，执行步骤S2452-10的等待恢复。Continuing the example of step S2452-6, when task 3 supports migration, the following step S2452-8 further determines whether other working nodes capable of executing the candidate task exist; otherwise, the waiting-for-recovery of step S2452-10 is performed.
步骤S2452-8,在备选任务支持迁移的情况下,判断是否存在能够执行备选任务的其他工作节点,如是,则执行步骤S2452-9,反之,执行S2452-5。In step S2452-8, if the candidate task supports migration, it is determined whether there are other working nodes that can execute the candidate task. If so, step S2452-9 is executed, otherwise, S2452-5 is executed.
继续上述步骤S2452-7的示例，在任务3支持迁移的情况下，进一步判断是否存在能够执行任务3的其他工作节点，如果存在能够执行任务3的其他工作节点，则执行步骤S2452-9将内存数据发送给其他工作节点，反之，执行步骤S2452-5判断任务3是否支持冻结。Continuing the example of step S2452-7 above, when task 3 supports migration, it is further determined whether other working nodes capable of executing task 3 exist; if so, step S2452-9 is executed to send the memory data to the other working nodes, otherwise step S2452-5 is executed to determine whether task 3 supports freezing.
步骤S2452-9,将内存数据发送给其他工作节点。In step S2452-9, the memory data is sent to other working nodes.
继续上述步骤S2452-8的示例,如果存在能够执行任务3的其他工作节点,则执行本步骤S2452-9将内存数据发送给其他工作节点。Continuing the example of step S2452-8 above, if there are other working nodes that can perform task 3, execute this step S2452-9 to send the memory data to other working nodes.
步骤S2452-10,响应于设定的触发事件,获取备选任务的当前资源使用情况。Step S2452-10, in response to the set trigger event, obtain the current resource usage of the candidate task.
该设定的触发事件包括工作节点中已完成所分配的任意一个任务、工作节点中存在被释放的资源之中的至少一个。The set trigger event includes at least one of: any assigned task on the working node having been completed, or released resources existing on the working node.
继续上述步骤S2452-8的示例，在任务3不支持迁移的情况下，执行本步骤S2452-10的等待恢复，例如可以是在工作节点中已完成所分配的任意一个任务、工作节点中存在被释放的GPU之中的至少一个时，获取任务3的当前GPU使用情况。Continuing the example of step S2452-8 above, when task 3 does not support migration, the waiting-for-recovery of this step S2452-10 is performed; for example, when any assigned task on the working node has been completed or at least one released GPU exists on the working node, the current GPU usage of task 3 is acquired.
步骤S2452-11,基于备选任务的当前资源使用情况,继续由工作节点执行备选任务。Step S2452-11, based on the current resource usage of the candidate task, continue to execute the candidate task by the working node.
继续上述步骤S2452-10的示例,基于任务3的当前GPU使用情况,继续由工作节点执行任务3。Continuing the example of step S2452-10 above, based on the current GPU usage of task 3, task 3 will continue to be executed by the worker node.
步骤S2452-12,直接杀掉备选任务。Step S2452-12, directly kill the candidate task.
继续上述S2452-5的示例,在任务3不支持冻结的情况下,执行本步骤S2452-12直接杀掉任务3,并接续执行步骤S2452-13。Continuing the example of S2452-5 above, if task 3 does not support freezing, execute this step S2452-12 to directly kill task 3, and continue to execute step S2452-13.
步骤S2452-13,收集由工作节点发送的备选任务的资源使用情况。Step S2452-13: Collect the resource usage status of the candidate tasks sent by the working node.
继续上述S2452-12的示例,在杀掉任务3之后,可以收集由工作节点发送的任务3的GPU使用情况,并自动推导新的资源需求。Continuing the example of S2452-12 above, after killing task 3, the GPU usage of task 3 sent by the worker node can be collected, and new resource requirements can be derived automatically.
步骤S2452-14,基于资源使用情况,扩充备选任务的资源需求,以根据扩充的资源需求,再次将备选任务分配给适合的工作节点。In step S2452-14, based on the resource usage, the resource requirements of the candidate tasks are expanded, so as to allocate the candidate tasks to suitable working nodes again according to the expanded resource requirements.
继续上述S2452-13的示例，在收集由工作节点发送的任务3的GPU使用情况之后，便可根据本步骤S2452-14扩充任务3的资源需求，以根据扩充的资源需求，再次将任务3分配给适合的工作节点，进而由该适合的工作节点重新执行任务3。Continuing the example of S2452-13 above, after the GPU usage of task 3 sent by the working node has been collected, the resource demand of task 3 can be expanded according to this step S2452-14, so that task 3 is allocated again to a suitable working node according to the expanded demand, which then re-executes task 3.
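The branch structure of steps S2452-1 through S2452-14 can be condensed into the following sketch. The capability flags and action strings are illustrative assumptions, not the patent's interfaces.

```python
def adjust_incompressible(task, other_node_available):
    # task: capability flags of the selected candidate task (assumed names).
    if task["supports_scale_out"] and other_node_available:
        # S2452-3/4: extract the unfinished part and send it elsewhere.
        return "send_unfinished_part"
    if not task["supports_freeze"]:
        # S2452-12..14: kill the task, collect its actual usage, expand
        # its resource demand, and reschedule it on a suitable node.
        return "kill_and_reschedule"
    # S2452-6: freeze -- write the task's memory data to local disk.
    if task["supports_migrate"] and other_node_available:
        return "migrate_memory_data"          # S2452-9
    return "wait_for_released_resources"      # S2452-10/11

t = {"supports_scale_out": False, "supports_freeze": True,
     "supports_migrate": False}
print(adjust_incompressible(t, other_node_available=True))
# wait_for_released_resources
```

Each branch degrades gracefully: scale out if possible, otherwise freeze and migrate, otherwise wait for resources to be released, and only kill and reschedule as a last resort.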
根据本实施例，其可以在某种资源的当前使用总量超过某种资源的总资源最大限制的情况下，根据某种资源的压缩性来进行动态调整，从而，提高任务处理效率，提高资源利用率。According to this embodiment, when the current total usage of a certain resource exceeds its maximum total resource limit, dynamic adjustment can be performed according to the compressibility of that resource, thereby improving task processing efficiency and resource utilization.
在本实施例中，还提供一种在分布式系统中资源及任务的分配装置5000，如图5所示，其包括作业接收单元5100、资源需求预测单元5200、任务分配单元5300以及资源调度单元5400。In this embodiment, an apparatus 5000 for allocating resources and tasks in a distributed system is also provided. As shown in FIG. 5, it includes a job receiving unit 5100, a resource demand prediction unit 5200, a task allocation unit 5300, and a resource scheduling unit 5400.
该作业接收单元5100,被配置为接收用于在分布式系统中执行的作业。The job receiving unit 5100 is configured to receive jobs for execution in a distributed system.
该资源需求预测单元5200,被配置为根据作业中各任务类型的资源相关信息和分布式系统中各工作节点的资源上限,预测由工作节点执行的每个任务需被分配的资源需求。The resource demand prediction unit 5200 is configured to predict the resource demand to be allocated for each task executed by the working node based on the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system.
该任务分配单元5300,被配置为根据预测的资源需求,将每个任务分配给适合的工作节点。The task allocation unit 5300 is configured to allocate each task to a suitable working node according to the predicted resource demand.
该资源调度单元5400,被配置为在工作节点执行分配的任务的过程中,针对资源使用情况进行动态调整。The resource scheduling unit 5400 is configured to dynamically adjust the resource usage in the process of executing the assigned task by the working node.
在一个实施例中,所述任务类型包括机器学习中的参数服务器任务和/或训练学习任务;以及,In an embodiment, the task types include parameter server tasks and/or training learning tasks in machine learning; and,
所述资源相关信息包括相应任务类型的处理数据规模和处理内容之中的至少一项。The resource-related information includes at least one of the processing data scale and processing content of the corresponding task type.
在一个实施例中,所述资源需求包括任务所需的每种资源类型和相应的资源需求值;In one embodiment, the resource requirement includes each resource type and corresponding resource requirement value required by the task;
其中,所述资源需求值包括峰值需求值和一般需求值之中的至少一项。Wherein, the resource demand value includes at least one of a peak demand value and a general demand value.
在一个实施例中,该资源需求预测单元5200,还被配置为根据规则和/或机器学习模型来预测由工作节点执行的每个任务需被分配的资源需求;以及,In one embodiment, the resource demand prediction unit 5200 is further configured to predict the resource demand to be allocated for each task performed by the worker node according to rules and/or machine learning models; and,
收集工作节点执行任务时的实际资源使用情况,以获取所述规则和/或机器学习模型。Collect the actual resource usage of the working node when performing tasks to obtain the rules and/or machine learning models.
在一个实施例中,该任务分配单元5300,还被配置为获取各个工作节点的当前资源使用情况、当前任务运行情况以及总资源最大限制;In one embodiment, the task allocation unit 5300 is further configured to obtain the current resource usage status, current task running status, and total resource maximum limit of each working node;
利用预设的分配算法，根据预测的资源需求，并结合获取的各个工作节点的当前资源使用情况、当前任务运行情况以及总资源最大限制，从分布式系统的多个工作节点中筛选出适合执行所述每个任务的工作节点，并将所述每个任务分配给筛选出的工作节点。Using a preset allocation algorithm, a working node suitable for executing each task is screened out from the multiple working nodes of the distributed system based on the predicted resource demand together with the obtained current resource usage, current task status, and total resource maximum limit of each working node, and each task is assigned to the selected working node.
在一个实施例中,该资源调度单元5400,还被配置为监控任务的资源使用情况;In an embodiment, the resource scheduling unit 5400 is also configured to monitor the resource usage of the task;
在任务的某种资源使用超过预测的资源需求值的情况下,判断所述某种资源的当前使用总量是否超过所述某种资源的总资源最大限制;In the case where a certain resource usage of the task exceeds the predicted resource demand value, judging whether the total current usage of the certain resource exceeds the maximum total resource limit of the certain resource;
在所述某种资源的当前使用总量超过所述某种资源的总资源最大限制的情况下,根据所述某种资源的压缩性来进行动态调整。In the case that the total current usage of the certain resource exceeds the maximum limit of the total resource of the certain resource, dynamic adjustment is made according to the compressibility of the certain resource.
在一个实施例中，该资源调度单元5400，还被配置为查找所述工作节点中针对所述某种资源超过预测的资源需求值的任务作为备选任务，并按照处理优先级和/或启动时间来选择备选任务；In an embodiment, the resource scheduling unit 5400 is further configured to search for tasks on the working node whose usage of the certain resource exceeds the predicted resource demand value as candidate tasks, and to select a candidate task according to processing priority and/or start time;
根据所述某种资源的压缩性,针对所选择的备选任务来进行动态调整。According to the compressibility of the certain resource, the selected candidate task is dynamically adjusted.
在一个实施例中,该资源调度单元5400,还被配置为在所述某种资源为可压缩资源的情况下,限制所述备选任务对于所述某种资源的资源使用量。In an embodiment, the resource scheduling unit 5400 is further configured to limit the resource usage of the candidate task for the certain resource when the certain resource is a compressible resource.
在一个实施例中,该资源调度单元5400,还被配置为在所述某种资源为不可压缩资源的情况下,判断所述备选任务是否支持扩容;In an embodiment, the resource scheduling unit 5400 is further configured to determine whether the candidate task supports expansion when the certain resource is an incompressible resource;
在所述备选任务支持扩容的情况下,判断是否存在能够执行所述备选任务的其他工作节点;In the case that the candidate task supports capacity expansion, determining whether there are other working nodes that can execute the candidate task;
在存在所述其他工作节点的情况下,提取所述备选任务中未完成的部分任务;In the case where the other working nodes exist, extract the uncompleted part of the task among the candidate tasks;
将提取出的所述部分任务发送到所述其他工作节点。Send the extracted part of the task to the other working nodes.
在一个实施例中,该资源调度单元5400,还被配置为在所述备选任务不支持扩容的情况下,判断所述备选任务是否支持冻结;In an embodiment, the resource scheduling unit 5400 is further configured to determine whether the candidate task supports freezing when the candidate task does not support capacity expansion;
在所述备选任务支持冻结的情况下,将所述备选任务的内存数据写入所述工作节点的磁盘中。In the case that the candidate task supports freezing, the memory data of the candidate task is written into the disk of the working node.
在一个实施例中,该资源调度单元5400,还被配置为在不存在所述其他工作节点的情况下,判断所述备选任务是否支持冻结;In an embodiment, the resource scheduling unit 5400 is further configured to determine whether the candidate task supports freezing when the other working node does not exist;
在所述备选任务支持冻结的情况下,将所述备选任务的内存数据写入所述工作节点的磁盘中。In the case that the candidate task supports freezing, the memory data of the candidate task is written into the disk of the working node.
在一个实施例中,该资源调度单元5400,还被配置为判断所述备选任务是否支持迁移;In an embodiment, the resource scheduling unit 5400 is further configured to determine whether the candidate task supports migration;
在所述备选任务支持迁移的情况下,判断是否存在能够执行所述备选任务的其他工作节点;In the case that the candidate task supports migration, determining whether there are other working nodes that can execute the candidate task;
将所述内存数据发送给所述其他工作节点。Sending the memory data to the other working nodes.
在一个实施例中,该资源调度单元5400,还被配置为在所述备选任务不支持迁移的情况下,响应于设定的触发事件,获取所述备选任务的当前资源使用情况;In one embodiment, the resource scheduling unit 5400 is further configured to obtain the current resource usage status of the candidate task in response to a set trigger event when the candidate task does not support migration;
基于所述备选任务的当前资源使用情况,继续由所述工作节点执行所述备选任务。Based on the current resource usage of the candidate task, continue to execute the candidate task by the working node.
在一个实施例中，所述触发事件包括所述工作节点中已完成所分配的任意一个任务、所述工作节点中存在被释放的所述资源之中的至少一个。In an embodiment, the trigger event includes at least one of: any assigned task on the working node having been completed, or the released resource existing on the working node.
在一个实施例中,该资源调度单元5400,还被配置为在所述备选任务不支持冻结的情况下,直接杀掉所述备选任务。In an embodiment, the resource scheduling unit 5400 is further configured to directly kill the candidate task when the candidate task does not support freezing.
在一个实施例中,该资源调度单元5400,还被配置为收集由所述工作节点发送的所述备选任务的资源使用情况;In an embodiment, the resource scheduling unit 5400 is further configured to collect resource usage of the candidate task sent by the working node;
基于所述资源使用情况,扩充所述备选任务的资源需求,以根据扩充的资源需求,再次将所述备选任务分配给适合的工作节点。Based on the resource usage, the resource requirements of the candidate tasks are expanded, so as to allocate the candidate tasks to suitable working nodes again according to the expanded resource requirements.
在本实施例中,还提供一种在分布式系统中资源及任务的分配设备6000,如图6所示,包括:In this embodiment, a device 6000 for allocating resources and tasks in a distributed system is also provided, as shown in FIG. 6, including:
存储器6100,被配置为存储可执行指令;The memory 6100 is configured to store executable instructions;
处理器6200，被配置为根据所述可执行指令的控制，运行所述在分布式系统中资源及任务的分配设备执行如本实施例中提供的在分布式系统中资源及任务的分配方法。The processor 6200 is configured to, under control of the executable instructions, run the device for allocating resources and tasks in a distributed system to execute the method for allocating resources and tasks in a distributed system provided in this embodiment.
在本实施例中,在分布式系统中资源及任务的分配设备6000可以是服务器。例如,在分布式系统中资源及任务的分配设备6000可以是如图1所示的服务器1000。In this embodiment, the resource and task allocation device 6000 in a distributed system may be a server. For example, the resource and task allocation device 6000 in a distributed system may be the server 1000 as shown in FIG. 1.
在分布式系统中资源及任务的分配设备6000还可以包括其他的装置,例如,如图1所示的服务器1000,还可以包括输入装置、通信装置、接口装置以及显示装置等。The resource and task allocation equipment 6000 in a distributed system may also include other devices, for example, the server 1000 shown in FIG. 1 may also include an input device, a communication device, an interface device, and a display device.
在本实施例中，还提供一种计算机可读存储介质，其上存储有计算机程序，计算机程序在被处理器执行时实现如本公开任意实施例的在分布式系统中任务及资源的分配方法。In this embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the computer program implements the method for allocating tasks and resources in a distributed system according to any embodiment of the present disclosure.
在本实施例中,还提供一种分布式系统7000,如图7所示,包括:In this embodiment, a distributed system 7000 is also provided, as shown in FIG. 7, including:
多个被配置为提供资源的设备7100，例如可以是被配置为提供资源的设备7100A和被配置为提供资源的设备7100B，该多个被配置为提供资源的设备7100可以是一个服务器集群里面的设备，也可以是分别属于不同服务器集群里面的设备。The multiple devices 7100 configured to provide resources may be, for example, a device 7100A configured to provide resources and a device 7100B configured to provide resources; these devices 7100 may be devices within one server cluster, or devices belonging to different server clusters.
在本实施例中,被配置为提供资源的设备7100的数量可以根据实际场景确定,在此不做任何限定。In this embodiment, the number of devices 7100 configured to provide resources can be determined according to actual scenarios, and there is no limitation here.
In this embodiment, the distributed system 7000 further includes the apparatus 5000 for allocating resources and tasks in a distributed system, or the device 6000 for allocating resources and tasks in a distributed system. The apparatus 5000 or the device 6000 may be distributed across the devices 7100 that provide resources.
The distributed system 7000 is applicable not only to machine learning scenarios but also to other, non-machine-learning scenarios that do not impose strict resource limits on tasks. A machine learning scenario is one in which, due to system complexity or unknown characteristics of the data, it is difficult to judge a task's concrete resource usage accurately and produce a correct estimate; examples include feature processing tasks, offline machine learning training tasks, and online machine learning prediction tasks. A non-machine-learning scenario is one in which an online service has pronounced peak periods. A food-delivery system, for instance, sees a large peak at lunchtime and then requires substantial resources, but sees very little usage in the middle of the night; using the system 7000, some resources can be reclaimed during off-peak periods for use by other tasks.
The following further illustrates, by example, the resource and task allocation method implemented by the distributed system 7000 provided in this embodiment.
In this example, the distributed system 7000 includes a plurality of devices 7100 configured to provide resources and the apparatus 5000 for allocating resources and tasks in a distributed system. Each device 7100 may be a server hosting an execution node, a resource prediction node, a scheduling node, and a working node, and the apparatus 5000 may be distributed across multiple devices 7100. For example, the execution node may implement the functions of the job receiving unit 5100 of the apparatus 5000, the resource prediction node may implement the functions of the resource demand prediction unit 5200, the scheduling node may implement the functions of the task allocation unit 5300, and the working node may implement the functions of the resource scheduling unit 5400. As shown in FIG. 8, in this example, the resource and task allocation method may include:
Step S8010: the execution node submits a job to be executed in the distributed system to the resource prediction node.
The job may include tasks task1, task2, and task3.
Step S8020: based on the resource-related information of each task type in the job and the resource upper limit of each working node in the distributed system, the resource prediction node predicts the resource demand to be allocated to each task executed by a working node, and returns the predicted resource demand to the execution node.
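As one illustration of how step S8020 might be realized, the sketch below pairs a rule table with a per-node cap. The rule values, task-type names, and the linear scaling by input size are assumptions introduced for this example; the disclosure itself only requires that prediction rely on rules and/or a machine learning model, bounded by the working nodes' resource upper limits.

```python
# Illustrative sketch of the resource-prediction step (S8020). The rule table
# and linear scaling are assumptions for demonstration only.

# Baseline per-task demand by task type: (CPU cores, memory in MB) per GB of input.
RULES = {
    "parameter_server": (1.0, 2048),
    "trainer": (2.0, 4096),
    "feature_processing": (1.0, 1024),
}

def predict_demand(task_type, data_gb, node_cap):
    """Predict a (cpu, mem) demand for one task, clipped to the node's upper limit."""
    cpu_per_gb, mem_per_gb = RULES[task_type]
    cap_cpu, cap_mem = node_cap
    # A task can never be granted more than a single working node can offer.
    return (min(cpu_per_gb * data_gb, cap_cpu),
            min(mem_per_gb * data_gb, cap_mem))

# Example job with three tasks; the node cap (16 cores, 64 GB) is hypothetical.
demands = {t: predict_demand(ttype, gb, (16, 65536))
           for t, (ttype, gb) in {
               "task1": ("parameter_server", 2),
               "task2": ("trainer", 4),
               "task3": ("trainer", 10),
           }.items()}
```

Note how task3's raw CPU estimate (20 cores) is clipped to the node cap of 16, reflecting that the prediction respects each working node's resource upper limit.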
Step S8030: the execution node submits the tasks task1, task2, and task3 included in the job, together with the predicted resource demand to be allocated to each task executed by a working node, to the scheduling node.
Step S8040: according to the predicted resource demand, the scheduling node assigns task1 to working node 1, and assigns task2 and task3 to working node 2.
In step S8040, a preset allocation algorithm may be used to select, from the multiple working nodes of the distributed system, working nodes suitable for executing task1, task2, and task3, based on the predicted resource demand in combination with each working node's current resource usage, currently running tasks, and maximum total resource limit; task1 is then assigned to the selected working node 1, and task2 and task3 are assigned to the selected working node 2.
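A minimal sketch of the node-filtering idea in step S8040, using a first-fit pass over (CPU, memory) capacities. The data shapes and the first-fit choice are assumptions for illustration; the disclosure leaves the concrete allocation algorithm open.

```python
# Illustrative first-fit sketch of step S8040: for each task, pick a working
# node whose free capacity (maximum total resource limit minus what its
# running tasks already use) covers the task's predicted (cpu, mem) demand.

def free_capacity(node):
    used_cpu = sum(c for c, _ in node["running"].values())
    used_mem = sum(m for _, m in node["running"].values())
    return node["max"][0] - used_cpu, node["max"][1] - used_mem

def assign(tasks, nodes):
    """tasks: {name: (cpu, mem)}; nodes: {name: {"max": (cpu, mem), "running": {}}}."""
    placement = {}
    for task, (cpu, mem) in tasks.items():
        for name, node in nodes.items():
            free_cpu, free_mem = free_capacity(node)
            if cpu <= free_cpu and mem <= free_mem:
                node["running"][task] = (cpu, mem)  # reserve on this node
                placement[task] = name
                break
        else:
            placement[task] = None  # no node fits; the task must wait
    return placement

# Hypothetical cluster: with these capacities, the first-fit pass reproduces
# the placement in the example above (task1 -> node 1; task2, task3 -> node 2).
nodes = {"node1": {"max": (8, 16384), "running": {}},
         "node2": {"max": (32, 65536), "running": {}}}
tasks = {"task1": (2, 4096), "task2": (8, 16384), "task3": (16, 40960)}
placement = assign(tasks, nodes)
```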
Step S8050: working node 1 receives task1, starts it, and monitors its resource usage in real time; working node 2, after receiving task2 and task3, starts them and monitors their resource usage in real time.
Step S8060: while executing task1, working node 1 sends its current resource usage and the execution status of task1 to the execution node; likewise, while executing task2 and task3, working node 2 sends its current resource usage and the execution status of task2 and task3 to the execution node.
Step S8070: after task1 finishes, its end status and resource usage are fed back to working node 1; after task2 and task3 finish, their end status and resource usage are fed back to working node 2.
Step S8080: working node 1 reports the information fed back by task1 to the scheduling node, and working node 2 reports the information fed back by task2 and task3 to the scheduling node.
Step S8090: the scheduling node aggregates the information and resource usage of all tasks in the job and submits them to the resource prediction node, so that the resource prediction node continuously collects real resource usage to update the rules and/or machine learning model used for resource prediction.
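A minimal sketch of the feedback loop in step S8090, assuming the resource prediction node maintains a running per-task-type mean of reported usage as its updated "rule". The incremental-mean update is an assumption for illustration; the disclosure only specifies that collected real usage updates the prediction rules and/or model.

```python
# Illustrative sketch of S8090: the resource prediction node folds reported
# real usage into a per-task-type running mean, which then serves as the
# updated rule for future predictions.

class UsagePredictor:
    def __init__(self):
        self.stats = {}  # task_type -> (count, mean_cpu, mean_mem)

    def observe(self, task_type, cpu, mem):
        n, mc, mm = self.stats.get(task_type, (0, 0.0, 0.0))
        n += 1
        # Incremental mean update, so no per-task history needs to be stored.
        self.stats[task_type] = (n, mc + (cpu - mc) / n, mm + (mem - mm) / n)

    def predict(self, task_type, default=(1.0, 1024.0)):
        n, mc, mm = self.stats.get(task_type, (0, *default))
        return (mc, mm) if n else default

# Hypothetical feedback from two completed trainer tasks.
p = UsagePredictor()
p.observe("trainer", 4, 8192)
p.observe("trainer", 6, 4096)
```

A running mean is the simplest stand-in; a predictor trained on the peak demand value rather than the mean (per claim 3) would follow the same observe/predict shape.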
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions configured to cause a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing. As used herein, a computer-readable storage medium is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions configured to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions so as to implement various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions configured to implement the specified logical functions. In some alternative implementations, the functions noted in a block may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical applications, or technical improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present disclosure is defined by the appended claims.
Industrial applicability
According to the embodiments of the present disclosure, resource usage is dynamically adjusted while the working nodes execute their assigned tasks, thereby achieving efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization. The present disclosure therefore has strong industrial applicability.
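The dynamic adjustment summarized above reduces, in the embodiments, to a decision over the compressibility of the over-used resource: throttle when the resource is compressible; otherwise try moving unfinished work to another node, freezing the task's memory to disk, or, as a last resort, killing and re-scheduling the task with an expanded demand. A simplified sketch of that decision flow follows; the action labels are illustrative, and the freeze-then-migrate sub-branch described in the claims below is collapsed into a single freeze step.

```python
# Illustrative decision flow for an over-budget candidate task, mirroring the
# compressible/incompressible branches of the embodiments: compressible
# resources (e.g. CPU) are throttled; for incompressible resources (e.g.
# memory) the task is scaled out, frozen to disk, or killed as a last resort.

def adjust(resource_compressible, supports_scale_out, other_node_available,
           supports_freeze):
    if resource_compressible:
        return "throttle"            # cap the task's usage of the resource
    if supports_scale_out and other_node_available:
        return "scale_out"           # ship unfinished work to another node
    if supports_freeze:
        return "freeze"              # write in-memory state to local disk
    return "kill"                    # kill, then re-schedule with larger demand
```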

Claims (20)

  1. A method for allocating resources and tasks in a distributed system, the method comprising:
    receiving a job to be executed in the distributed system;
    predicting, based on resource-related information of each task type in the job and a resource upper limit of each working node in the distributed system, a resource demand to be allocated to each task executed by a working node;
    allocating each task to a suitable working node according to the predicted resource demand; and
    dynamically adjusting resource usage while the working node executes the allocated tasks.
  2. The method according to claim 1, wherein:
    the task types comprise a parameter server task and/or a training learning task in machine learning; and
    the resource-related information comprises at least one of a processing data scale and processing content of the corresponding task type.
  3. The method according to claim 1 or 2, wherein the resource demand comprises each resource type required by a task and a corresponding resource demand value;
    wherein the resource demand value comprises at least one of a peak demand value and a general demand value.
  4. The method according to any one of claims 1 to 3, wherein the step of predicting the resource demand of each task executed by a working node comprises: predicting, according to rules and/or a machine learning model, the resource demand to be allocated to each task executed by a working node;
    and the method further comprises: collecting actual resource usage of the working nodes when executing tasks, so as to obtain the rules and/or the machine learning model.
  5. The method according to any one of claims 1 to 4, wherein allocating each task to a suitable working node according to the predicted resource demand comprises:
    obtaining the current resource usage, current task running status, and maximum total resource limit of each working node;
    selecting, using a preset allocation algorithm, a working node suitable for executing each task from the multiple working nodes of the distributed system, according to the predicted resource demand in combination with the obtained current resource usage, current task running status, and maximum total resource limit of each working node, and allocating each task to the selected working node.
  6. The method according to any one of claims 1 to 5, wherein the step of dynamically adjusting resource usage while the working node executes the allocated tasks comprises:
    monitoring resource usage of the tasks;
    when a task's usage of a certain resource exceeds the predicted resource demand value, determining whether the current total usage of the certain resource exceeds the maximum total resource limit for the certain resource;
    when the current total usage of the certain resource exceeds the maximum total resource limit for the certain resource, performing dynamic adjustment according to the compressibility of the certain resource.
  7. The method according to any one of claims 1 to 6, wherein the step of performing dynamic adjustment according to the compressibility of the certain resource comprises:
    searching the working node for tasks whose usage of the certain resource exceeds the predicted resource demand value as candidate tasks, and selecting a candidate task according to processing priority and/or start time;
    performing dynamic adjustment on the selected candidate task according to the compressibility of the certain resource.
  8. The method according to any one of claims 1 to 7, wherein performing dynamic adjustment on the selected candidate task according to the compressibility of the certain resource comprises:
    when the certain resource is a compressible resource, limiting the candidate task's usage of the certain resource.
  9. The method according to any one of claims 1 to 8, wherein performing dynamic adjustment on the selected candidate task according to the compressibility of the certain resource comprises:
    when the certain resource is an incompressible resource, determining whether the candidate task supports capacity expansion;
    when the candidate task supports capacity expansion, determining whether there is another working node capable of executing the candidate task;
    when such another working node exists, extracting the uncompleted part of the candidate task;
    sending the extracted part of the task to the other working node.
  10. The method according to any one of claims 1 to 9, wherein the method further comprises:
    when the candidate task does not support capacity expansion, determining whether the candidate task supports freezing;
    when the candidate task supports freezing, writing the in-memory data of the candidate task to a disk of the working node.
  11. The method according to any one of claims 1 to 10, wherein the method further comprises:
    when no such other working node exists, determining whether the candidate task supports freezing;
    when the candidate task supports freezing, writing the in-memory data of the candidate task to a disk of the working node.
  12. The method according to any one of claims 1 to 11, wherein, after writing the in-memory data of the candidate task to the disk of the working node, the method further comprises:
    determining whether the candidate task supports migration;
    when the candidate task supports migration, determining whether there is another working node capable of executing the candidate task;
    sending the in-memory data to the other working node.
  13. The method according to any one of claims 1 to 12, wherein the method further comprises:
    when the candidate task does not support migration, obtaining the current resource usage of the candidate task in response to a set trigger event;
    continuing to execute the candidate task on the working node based on the current resource usage of the candidate task.
  14. The method according to any one of claims 1 to 13, wherein:
    the trigger event comprises at least one of: completion of any allocated task on the working node, and release of the resource on the working node.
  15. The method according to any one of claims 1 to 14, wherein the method further comprises:
    when the candidate task does not support freezing, directly killing the candidate task.
  16. The method according to any one of claims 1 to 15, wherein, after directly killing the candidate task, the method further comprises:
    collecting the resource usage of the candidate task sent by the working node;
    expanding the resource demand of the candidate task based on the resource usage, so as to allocate the candidate task again to a suitable working node according to the expanded resource demand.
  17. An apparatus for allocating resources and tasks in a distributed system, comprising:
    a job receiving unit configured to receive a job to be executed in the distributed system;
    a resource demand prediction unit configured to predict, based on resource-related information of each task type in the job and a resource upper limit of each working node in the distributed system, a resource demand to be allocated to each task executed by a working node;
    a task allocation unit configured to allocate each task to a suitable working node according to the predicted resource demand; and
    a resource scheduling unit configured to dynamically adjust resource usage while the working node executes the allocated tasks.
  18. A device for allocating resources and tasks in a distributed system, comprising:
    a memory configured to store executable instructions; and
    a processor configured to run, under the control of the executable instructions, the device for allocating resources and tasks in a distributed system to perform the method for allocating resources and tasks in a distributed system according to any one of claims 1 to 16.
  19. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for allocating resources and tasks in a distributed system according to any one of claims 1 to 16.
  20. A distributed system, comprising:
    a plurality of devices configured to provide resources; and
    the apparatus for allocating resources and tasks in a distributed system according to claim 17, or the device for allocating resources and tasks in a distributed system according to claim 18.
PCT/CN2020/110544 2019-08-23 2020-08-21 Method and apparatus for allocating resources and tasks in distributed system, and system WO2021036936A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910783327.3 2019-08-23
CN201910783327.3A CN110597626B (en) 2019-08-23 2019-08-23 Method, device and system for allocating resources and tasks in distributed system

Publications (1)

Publication Number Publication Date
WO2021036936A1 true WO2021036936A1 (en) 2021-03-04

Family

ID=68855493

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110544 WO2021036936A1 (en) 2019-08-23 2020-08-21 Method and apparatus for allocating resources and tasks in distributed system, and system

Country Status (2)

Country Link
CN (2) CN110597626B (en)
WO (1) WO2021036936A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597626B (en) * 2019-08-23 2022-09-06 第四范式(北京)技术有限公司 Method, device and system for allocating resources and tasks in distributed system
CN111190712A (en) * 2019-12-25 2020-05-22 北京推想科技有限公司 Task scheduling method, device, equipment and medium
CN113742053A (en) * 2020-05-29 2021-12-03 中国电信股份有限公司 Container resource allocation method and device
CN111507650B (en) * 2020-07-02 2021-01-05 深圳微品致远信息科技有限公司 Computing power distribution scheduling method and system for edge computing platform
CN111984408B (en) * 2020-08-14 2021-04-20 昆山华泛信息服务有限公司 Data cooperative processing method based on big data and edge computing and edge cloud platform
CN111782626A (en) * 2020-08-14 2020-10-16 工银科技有限公司 Task allocation method and device, distributed system, electronic device and medium
CN112256418B (en) * 2020-10-26 2023-10-24 清华大学深圳国际研究生院 Big data task scheduling method
CN112905350A (en) * 2021-03-22 2021-06-04 北京市商汤科技开发有限公司 Task scheduling method and device, electronic equipment and storage medium
CN113485833B (en) * 2021-07-09 2024-02-06 支付宝(杭州)信息技术有限公司 Resource prediction method and device
CN114265695A (en) * 2021-12-26 2022-04-01 特斯联科技集团有限公司 Energy control device and system based on decision technology
CN114780225B (en) * 2022-06-14 2022-09-23 支付宝(杭州)信息技术有限公司 Distributed model training system, method and device
CN116860723B (en) * 2023-09-04 2023-11-21 合肥中科类脑智能技术有限公司 Cross-computing center data migration method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103763378A * 2014-01-24 2014-04-30 中国联合网络通信集团有限公司 Task processing method, system and nodes based on a distributed computing system
CN105159769A * 2015-09-11 2015-12-16 国电南瑞科技股份有限公司 Distributed job scheduling method for clusters with heterogeneous computing capabilities
CN105610621A * 2015-12-31 2016-05-25 中国科学院深圳先进技术研究院 Method and device for dynamically adjusting task-level parameters of a distributed system architecture
CN107580023A * 2017-08-04 2018-01-12 山东大学 Stream processing job scheduling method and system with dynamic task distribution adjustment
US20180365072A1 (en) * 2017-06-20 2018-12-20 International Business Machines Corporation Optimizing resource usage in distributed computing environments by dynamically adjusting resource unit size
CN110597626A (en) * 2019-08-23 2019-12-20 第四范式(北京)技术有限公司 Method, device and system for allocating resources and tasks in distributed system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9525728B2 (en) * 2013-09-17 2016-12-20 Bank Of America Corporation Prediction and distribution of resource demand
CN104317658B * 2014-10-17 2018-06-12 华中科技大学 Load-adaptive task scheduling method based on MapReduce
CN109478147B (en) * 2016-07-13 2021-12-14 华为技术有限公司 Adaptive resource management in distributed computing systems
CN107066332B (en) * 2017-01-25 2020-03-13 广东神马搜索科技有限公司 Distributed system and scheduling method and scheduling device thereof
CN109117265A * 2018-07-12 2019-01-01 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for scheduling jobs in a cluster
CN109165093B (en) * 2018-07-31 2022-07-19 宁波积幂信息科技有限公司 System and method for flexibly distributing computing node cluster
CN109669820A * 2018-12-24 2019-04-23 广州君海网络科技有限公司 Kettle-based task monitoring and management method and device


Also Published As

Publication number Publication date
CN110597626A (en) 2019-12-20
CN115525438A (en) 2022-12-27
CN110597626B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
WO2021036936A1 (en) Method and apparatus for allocating resources and tasks in distributed system, and system
US11442764B2 (en) Optimizing the deployment of virtual resources and automating post-deployment actions in a cloud environment
US11720408B2 (en) Method and system for assigning a virtual machine in virtual GPU enabled systems
US11088961B2 (en) Monitoring data streams and scaling computing resources based on the data streams
TWI620075B (en) Server and cloud computing resource optimization method thereof for cloud big data computing architecture
WO2016082693A1 (en) Method and device for scheduling computation tasks in cluster
US9483319B2 (en) Job scheduling apparatus and method therefor
US11704123B2 (en) Automated orchestration of containers by assessing microservices
KR20170029263A (en) Apparatus and method for load balancing
CN115373835A (en) Task resource adjusting method and device for Flink cluster and electronic equipment
CN111880914A (en) Resource scheduling method, resource scheduling apparatus, electronic device, and storage medium
CN111782147A (en) Method and apparatus for cluster scale-up
GB2611177A (en) Multi-task deployment method and electronic device
CN111506414B (en) Resource scheduling method, device, equipment, system and readable storage medium
US20220261254A1 (en) Intelligent Partitioning Engine for Cluster Computing
KR20230087316A (en) Apparatus and method for determining ai-based cloud service server
US11360822B2 (en) Intelligent resource allocation agent for cluster computing
EP2828761A1 (en) A method and system for distributed computing of jobs
Wang et al. Model-based scheduling for stream processing systems
Manjaly et al. Various approaches to improve MapReduce performance in Hadoop
US11914586B2 (en) Automated partitioning of a distributed database system
CN114626546A (en) Atmospheric pollution source data analysis method, device, equipment and storage medium
CN113886036A (en) Method and system for optimizing cluster configuration of distributed system
CN114564292A (en) Distributed gridding processing method, device, equipment and medium for data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20857894; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20857894; Country of ref document: EP; Kind code of ref document: A1)