WO2020034646A1 - Resource scheduling method and device - Google Patents

Resource scheduling method and device

Info

Publication number
WO2020034646A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
node
resource
task request
request
Prior art date
Application number
PCT/CN2019/081200
Other languages
French (fr)
Chinese (zh)
Inventor
刘志飘
邓慧财
杰森•T•S• 兰
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2020034646A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Definitions

  • The present application relates to the technical field of cloud computing, and in particular to a resource scheduling method and device.
  • Yet Another Resource Negotiator (YARN) is the general-purpose resource management system of Hadoop 2.x. It is responsible for resource management and task scheduling for the nodes in a cloud server cluster.
  • In a YARN-based resource scheduling scheme, a task is submitted to a resource queue. Resource queues are divided according to logical clusters, so the task is scheduled to the logical cluster that corresponds to its resource queue.
  • For example, the resources of the entire cluster are divided into three resource queues: resource queue 1, resource queue 2, and resource queue 3.
  • Resource queue 1 corresponds to logical cluster A
  • resource queue 2 corresponds to logical cluster B
  • resource queue 3 corresponds to logical cluster C
  • Each resource queue is bound to the nodes in its logical cluster.
  • The node that executes a task can therefore only be selected from the logical cluster that corresponds to the task's resource queue. In prior-art resource scheduling schemes, tasks may consequently be scheduled to nodes with weak processing capabilities. This results in low task execution efficiency and low resource utilization of the nodes in the cluster.
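The prior-art constraint can be sketched in a few lines of Python. This is a minimal illustration, not YARN code. The queue-to-cluster mapping, the node names, and the numeric capability score are all hypothetical.

```python
# Hypothetical illustration of prior-art, queue-bound placement (not YARN code).
QUEUE_TO_CLUSTER = {"queue1": "A", "queue2": "B", "queue3": "C"}

# Hypothetical nodes: node id -> (logical cluster, capability score).
NODES = {
    "n1": ("A", 2),
    "n2": ("A", 3),
    "n3": ("B", 9),  # strongest node, but it sits in logical cluster B
}


def candidates_for_queue(queue: str) -> list[str]:
    """Only nodes in the logical cluster bound to `queue` are eligible."""
    cluster = QUEUE_TO_CLUSTER[queue]
    return sorted(n for n, (c, _score) in NODES.items() if c == cluster)


def pick_node(queue: str) -> str:
    """Pick the most capable node, but only within the queue's own cluster."""
    return max(candidates_for_queue(queue), key=lambda n: NODES[n][1])


print(pick_node("queue1"))  # n2, although n3 is more capable
```

A task in queue1 can never reach n3, however well suited n3 is. The resource-tag approach below is meant to remove exactly this restriction.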
  • The present application provides a resource scheduling method and device that improve task execution efficiency and node resource utilization.
  • In a first aspect, the present application provides a resource scheduling method.
  • The method includes the following. A resource scheduling device receives a task request sent by a client. The task request includes a resource tag corresponding to the task request and, through that resource tag, specifies the resources required to execute the task;
  • the resource scheduling device determines a target node from the node group according to the resource tags that each node in the node group has, where the target node has the resource tag included in the task request; and the resource scheduling device instructs the target node to execute the task specified by the task request.
  • In this solution, the task request includes a resource tag that specifies the resources required to execute the task, and the resource scheduling device determines the target node according to those required resources. The device can therefore find a node whose processing capacity suits the task specified by the task request, which helps improve the execution efficiency of that task.
  • In addition, the target node is selected from the node group according to the resource tags that each node has, so that the target node's resource tags include the resource tag in the task request. The resources of the nodes in the node group can thus be used reasonably, which in turn helps improve node resource utilization.
  • Some tasks require a graphics processing unit (GPU) to execute and others do not. Suppose the resource scheduling device, after receiving a task request, schedules a task that needs no GPU onto a node with a GPU. A task that urgently needs GPU resources may then be unable to find a suitable node with GPU resources, and the GPU resources on the occupied node are wasted. To avoid wasting GPU resources in the node group, several optional implementations based on the above method are provided below.
  • The resource scheduling device may select the target node for executing the task specified by the task request according to whether the resource tags corresponding to the task request include a GPU resource tag.
  • If the resource tags corresponding to the task request do not include a GPU resource tag, a node without a GPU resource tag is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
  • If the resource tags corresponding to the task request include a GPU resource tag, a node with that GPU resource tag is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
  • Selecting the target node in this way schedules GPU resources on demand, according to the resources required by the task specified by the task request. This improves the resource utilization of the nodes in the node group, especially GPU utilization. It also helps avoid the situation in which a node's central processing unit (CPU) resources are exhausted while a large share of its GPU resources sits idle.
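Here is a minimal Python sketch of the GPU-aware selection above. It assumes a hypothetical tag-naming convention in which every GPU-related tag starts with `gpu`. The helper names (`is_gpu_tag`, `select_nodes`) are illustrative and are not taken from the application.

```python
from dataclasses import dataclass, field


def is_gpu_tag(tag: str) -> bool:
    # Hypothetical convention: every GPU-related tag starts with "gpu".
    return tag.lower().startswith("gpu")


@dataclass
class Node:
    node_id: str
    tags: set[str] = field(default_factory=set)

    @property
    def has_gpu(self) -> bool:
        return any(is_gpu_tag(t) for t in self.tags)


def select_nodes(request_tags: set[str], nodes: list[Node]) -> list[Node]:
    """Send GPU requests to GPU nodes and non-GPU requests to non-GPU nodes."""
    wants_gpu = any(is_gpu_tag(t) for t in request_tags)
    # Keep GPU nodes out of the pool unless the request actually needs a GPU.
    pool = [n for n in nodes if n.has_gpu == wants_gpu]
    # Within that pool, the node must carry every requested tag.
    return [n for n in pool if request_tags <= n.tags]


nodes = [Node("cpu-1", {"cpu_freq_high"}), Node("gpu-1", {"cpu_freq_high", "gpu_p100"})]
print([n.node_id for n in select_nodes({"cpu_freq_high"}, nodes)])  # ['cpu-1']
print([n.node_id for n in select_nodes({"gpu_p100"}, nodes)])       # ['gpu-1']
```

With this split, a CPU-only request never lands on `gpu-1`, even though that node also carries the requested CPU tag. GPU nodes stay free for GPU tasks.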
  • The task request may further include the amount of resources required to execute the task.
  • In this case, the resource scheduling device may select the target node according to whether the amount of resources required to execute the task includes an amount of GPU resources.
  • For example, if the amount of resources required to execute the task does not include GPU resources, a node without GPU resources is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
  • The task request may record the amount of resources required to execute the task. In this case, determining the target node from the node group according to each node's resource tags specifically works as follows. The resource scheduling device selects from the node group a node that both satisfies the amount of resources specified by the task request and has the resource tag included in the task request; this node is the target node. The target node determined in this way has the resources that the task request's resource tag specifies. It also satisfies the amount of resources required to execute the task, which helps improve the execution efficiency of the task specified by the task request.
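The combined tag-and-amount check can be sketched as follows. This illustration assumes that resource amounts are plain numbers keyed by hypothetical resource names such as `cpu` and `gpu`. The application does not prescribe a data format.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    node_id: str
    tags: frozenset[str]
    available: dict[str, float]  # hypothetical keys, e.g. {"cpu": 8, "gpu": 1}


@dataclass
class TaskRequest:
    tags: frozenset[str]
    amounts: dict[str, float]  # amount of each resource the task needs


def fits(node: NodeState, req: TaskRequest) -> bool:
    """A node qualifies only if it has every requested tag AND enough of every resource."""
    has_tags = req.tags <= node.tags
    has_amounts = all(node.available.get(k, 0) >= v for k, v in req.amounts.items())
    return has_tags and has_amounts


def find_targets(req: TaskRequest, nodes: list[NodeState]) -> list[str]:
    return [n.node_id for n in nodes if fits(n, req)]


nodes = [
    NodeState("n1", frozenset({"gpu_p100"}), {"cpu": 4, "gpu": 1}),
    NodeState("n2", frozenset({"gpu_p100"}), {"cpu": 16, "gpu": 2}),
    NodeState("n3", frozenset({"cpu_freq_high"}), {"cpu": 32}),
]
req = TaskRequest(frozenset({"gpu_p100"}), {"cpu": 8, "gpu": 1})
print(find_targets(req, nodes))  # ['n2']: n1 lacks CPU, n3 lacks the tag
```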
  • The method may further include the following. The resource scheduling device obtains the resource information of each node in the node group; this information records the node's identifier and the resources the node has. According to this resource information, the device records the resource tags and the amount of resources that each node in the node group has. The resource scheduling device can then accurately determine, from each node's resource tags and amount of resources, the node used to execute the task specified by the task request.
  • In a second aspect, the present application provides a resource scheduling method.
  • The method includes the following. A client generates a task request that includes a resource tag corresponding to the task request and, through that resource tag, specifies the resources required to execute the task. The client then sends the task request to the resource scheduling device. From the task request and the resource tags of each node in the node group, the resource scheduling device determines a target node to execute the task specified by the task request. The target node has the resource tag included in the task request.
  • Based on this solution, the client generates a task request that specifies, through the resource tag it includes, the resources required to execute the task, and sends the task request to the resource scheduling device.
  • The resource scheduling device determines the target node according to the resources required by the task specified by the task request. It can therefore find a node whose processing capacity suits that task, which helps improve the task's execution efficiency.
  • In addition, when determining the target node, the resource scheduling device uses each node's resource tags to select a node whose resource tags include the resource tag in the task request. The resources of the nodes in the node group can thus be used reasonably, which in turn helps improve node resource utilization.
  • The present application further provides a device that has the functions of implementing the embodiments of the first aspect.
  • These functions may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions.
  • The device includes a communication unit and a processing unit, where:
  • the communication unit is configured to receive a task request sent by a client, where the task request includes a resource tag corresponding to the task request and, through that resource tag, specifies the resources required to execute a task; and
  • the processing unit is configured to determine a target node from the node group according to the resource tags of each node in the node group, where the target node has the resource tag included in the task request. It is also configured to instruct the target node to execute the task specified by the task request.
  • The present application further provides a device that has the functions of implementing the embodiments of the second aspect.
  • These functions may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions.
  • The device includes a communication unit and a processing unit, where:
  • the processing unit is configured to generate a task request, where the task request includes a resource tag corresponding to the task request and, through that resource tag, specifies the resources required to execute a task; and
  • the communication unit is configured to send the task request to a resource scheduling device. The resource scheduling device uses the task request to determine, according to the resource tags of each node in the node group, a target node to execute the task specified by the task request. The target node has the resource tag included in the task request.
  • The present application further provides a device that includes a processor and a memory. The memory is configured to store computer instructions. The processor executes the computer instructions stored in the memory, so that the device performs the resource scheduling method provided by the first aspect or any embodiment of the first aspect.
  • The memory may be integrated in the processor or be independent of the processor.
  • The present application further provides a device that includes a processor and a memory. The memory is configured to store computer instructions. The processor executes the computer instructions stored in the memory, so that the device performs the resource scheduling method provided by the second aspect or any embodiment of the second aspect.
  • The memory may be integrated in the processor or be independent of the processor.
  • The present application further provides a system. The system includes the resource scheduling device in the method provided by any embodiment of the first aspect and the client in the method provided by any embodiment of the second aspect.
  • The present application further provides a computer-readable storage medium that stores computer instructions. When a computer executes these instructions, it implements the methods described in the foregoing aspects.
  • The present application further provides a computer program product containing instructions. When the product runs on a computer, it causes the computer to perform the methods described in the foregoing aspects.
  • FIG. 1a is a schematic diagram of a possible system architecture provided by this application.
  • FIG. 1b is a schematic diagram of another possible system architecture provided by this application.
  • FIG. 2 is a schematic diagram of a resource scheduling method provided by the present application.
  • FIG. 3 is a schematic diagram of a mapping relationship between a node identifier and a resource tag provided in this application.
  • FIG. 4 is a schematic diagram of another resource scheduling method provided by the present application.
  • FIG. 5 is a schematic structural diagram of a device provided by this application.
  • FIG. 6 is a schematic structural diagram of another device provided by the present application.
  • FIG. 1a exemplarily illustrates a possible system architecture provided by an embodiment of the present application.
  • The system architecture includes a client, a resource scheduling device, and a node group that includes N nodes.
  • FIG. 1a shows only nodes 1, ..., N by way of example, where N is a positive integer.
  • The client generates a task request according to a job submitted by a user and sends the task request to the resource scheduling device.
  • The task request specifies, through the resource tag it includes, the resources required to execute the task.
  • The resource scheduling device uses the task request to determine, from the node group, the target node that executes the task specified by the task request.
  • The resource scheduling device receives the task request sent by the client. According to the task request and the resources of the N nodes in the node group, it determines the target node that executes the task. It then instructs the target node to execute the task specified by the task request.
  • The N nodes in the node group execute the tasks that the resource scheduling device instructs them to execute.
  • Based on FIG. 1a, FIG. 1b exemplarily shows another possible system architecture provided by an embodiment of the present application.
  • FIG. 1b exemplarily shows five modules included in the resource scheduling apparatus: a task request forwarding module, a task scheduling control module, a label customization module, a scheduling engine (Yarn) module, and a Docker image registry (Docker Registry).
  • The task request forwarding module is responsible for receiving users' algorithm scheduling tasks and provides load balancing and request forwarding.
  • The task scheduling control module performs functions such as task parameter parsing, task scheduling control, and task queue management.
  • For example, the task parameter parsing function may parse the task priority.
  • The task queue management function may place tasks whose parameters have been parsed into the task queue, in the order in which the tasks were received.
  • The task scheduling control function may schedule tasks from the task queue to Yarn for processing, according to each task's priority and the time at which it was placed in the task queue. Higher-priority tasks are scheduled first, and tasks with the same priority are scheduled in the order in which they entered the task queue.
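The scheduling order described above (higher priority first, then queue-arrival order) maps naturally onto a binary heap. The sketch below is one possible implementation using Python's `heapq`. It assumes that larger integers mean higher priority. An arrival counter stands in for the time a task was placed in the queue.

```python
import heapq
import itertools


class TaskQueue:
    """Pops the highest-priority task first; equal priorities pop in arrival (FIFO) order."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()  # monotonically increasing arrival stamp

    def put(self, task_id: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority to pop the largest first;
        # the arrival stamp breaks ties in favour of the earlier task.
        heapq.heappush(self._heap, (-priority, next(self._arrival), task_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = TaskQueue()
q.put("a", priority=1)
q.put("b", priority=5)
q.put("c", priority=5)
print([q.pop() for _ in range(len(q))])  # ['b', 'c', 'a']
```

The arrival counter also guarantees that two entries never compare on `task_id`. The ordering therefore depends only on priority and arrival.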
  • The label customization module mainly includes an information collection and reporting sub-module and a multi-label customization sub-module.
  • The information collection and reporting sub-module collects the resource information of each node in the node group.
  • On one hand, the information collection and reporting sub-module reports the collected resource information to Yarn, so that Yarn can schedule tasks according to the resource information of the nodes.
  • On the other hand, the information collection and reporting sub-module sends the collected resource information to the multi-label customization sub-module. That sub-module generates a resource tag for each type of resource according to the resource information. It then reports to Yarn the correspondence between each node identifier and all the resource tags that the node has.
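One way to picture the multi-label customization sub-module is as a function that turns a node's raw resource information into one tag per resource type. The input keys, the thresholds (3.0 GHz, 1000 Mbit/s), and the tag spellings below are all hypothetical. The application does not specify how tags are named or where the boundaries lie.

```python
def make_tags(info: dict) -> set[str]:
    """Derive one tag per resource type from a node's reported resource information."""
    tags: set[str] = set()
    ghz = info.get("cpu_ghz")
    if ghz is not None:
        # Hypothetical threshold for "high" vs "low" CPU frequency.
        tags.add("cpu_freq_high" if ghz >= 3.0 else "cpu_freq_low")
    model, count = info.get("gpu_model"), info.get("gpu_count", 0)
    if model and count:
        tags.add(f"gpu_{model.lower()}_x{count}")
    mem = info.get("mem_type")
    if mem:
        tags.add(f"mem_{mem.lower()}")
    bw = info.get("bandwidth_mbps")
    if bw is not None:
        tags.add("bw_high" if bw >= 1000 else "bw_low")  # hypothetical threshold
    return tags


def build_node_tag_map(reports: list[dict]) -> dict[str, set[str]]:
    """Node identifier (e.g. an IP address) -> all tags that node has."""
    return {r["node_id"]: make_tags(r) for r in reports}


reports = [
    {"node_id": "10.0.0.1", "cpu_ghz": 3.5, "gpu_model": "P100", "gpu_count": 2,
     "mem_type": "DDR4", "bandwidth_mbps": 100},
    {"node_id": "10.0.0.2", "cpu_ghz": 2.0},
]
print(build_node_tag_map(reports))
```

The resulting mapping corresponds to the node-identifier-to-tags correspondence that the sub-module reports to Yarn.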
  • The scheduling engine (Yarn) module stores the received correspondences between node identifiers and all the resource tags of each node in a node tag database. It performs the actual task scheduling according to all the resource tags of the nodes.
  • The node tag database is located in Yarn.
  • The node tag database may exist in Yarn in the form of a configuration file.
  • When Yarn receives a task scheduled by the task scheduling control module, it queries the resource tags of each node from the node tag database. From the nodes' resource tags and the task request's resource tags, it determines the node that executes the task specified by the task request.
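The application says only that the node tag database may be a configuration file inside Yarn. For illustration, the sketch below assumes a hypothetical `node_id = tag1, tag2` line format. This is not the actual Hadoop YARN node-label configuration. The sketch shows how such a file could be loaded and then queried for the nodes carrying a given set of tags.

```python
def parse_node_tag_db(text: str) -> dict[str, set[str]]:
    """Parse a hypothetical 'node_id = tag1, tag2' file; repeated node ids accumulate tags."""
    db: dict[str, set[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        node_id, sep, tag_list = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        tags = {t.strip() for t in tag_list.split(",") if t.strip()}
        db.setdefault(node_id.strip(), set()).update(tags)
    return db


def nodes_with(db: dict[str, set[str]], required: set[str]) -> list[str]:
    """Nodes whose tags include every tag in `required`."""
    return sorted(node for node, tags in db.items() if required <= tags)


SAMPLE = """
# node tag database (hypothetical format)
10.0.0.1 = cpu_freq_high, bw_high
10.0.0.2 = gpu_p40_x4, mem_ddr1
10.0.0.1 = gpu_p100_x1
"""

db = parse_node_tag_db(SAMPLE)
print(nodes_with(db, {"cpu_freq_high", "gpu_p100_x1"}))  # ['10.0.0.1']
```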
  • Docker Registry is an open-source component for managing private Docker image repositories. It mainly provides image upload and download capabilities.
  • FIG. 2 exemplarily illustrates a schematic flowchart of a resource scheduling method provided by the present application. The method includes the following steps:
  • Step 201: The client generates a task request.
  • The task request includes a resource tag corresponding to the task request and, through that resource tag, specifies the resources required to execute the task.
  • The task request may include one or more resource tags.
  • For example, the resource tags may be carried in the task request in the form of a label list.
  • The resource tag corresponding to the task request is determined according to the amount of resources required by the task. Specifically, the client determines this resource tag either from the amount of resources required by the task, or from that amount together with a preset rule. The preset rule is consistent with the rule the resource scheduling device uses to determine the resource tags corresponding to task requests.
  • Alternatively, the resource tag corresponding to the task request may be obtained from the resource scheduling device.
  • For example, the client sends a query message to the resource scheduling device to obtain a resource tag set, which includes the resource tags of each node in the node group.
  • The resource tags of a node indicate the resources the node has.
  • The client receives a response message from the resource scheduling device, and the response message includes the resource tag set.
  • The client can select one or more resource tags from the resource tag set according to the resources required by the task, and carry the selected resource tags in the task request.
  • The task request may further include some or all of the following information: task name, job identifier, algorithm image identifier, task priority, copy number, the resources required by the task (such as CPU, memory, and GPU resources), and the amount of resources required by the task.
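The following sketch shows a task request carrying the fields listed above, together with the client's choice of tags from the queried tag set. The field names, the simulated query response, and the decision to reject tags absent from the tag set are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRequest:
    # Field names are illustrative; the application lists the information, not a schema.
    name: str
    job_id: str
    image_id: str  # algorithm image identifier
    priority: int = 0
    copy_number: int = 1
    resource_tags: list[str] = field(default_factory=list)  # the label list
    resource_amounts: dict[str, float] = field(default_factory=dict)


def choose_tags(tag_set: set[str], needed: set[str]) -> list[str]:
    """Pick the needed tags from the tag set returned in the scheduler's response.

    Rejecting tags that no node offers is an assumption, made so that the
    request never names a resource the node group cannot provide.
    """
    missing = needed - tag_set
    if missing:
        raise ValueError(f"no node offers: {sorted(missing)}")
    return sorted(needed)


tag_set = {"cpu_freq_high", "bw_high", "gpu_p100_x1", "mem_ddr1"}  # hypothetical response
req = TaskRequest(
    name="video-to-images",
    job_id="job-1",
    image_id="img-decode",
    priority=5,
    resource_tags=choose_tags(tag_set, {"gpu_p100_x1", "bw_high"}),
    resource_amounts={"gpu": 1, "mem_gb": 8},
)
print(req.resource_tags)  # ['bw_high', 'gpu_p100_x1']
```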
  • A job may include multiple tasks.
  • As an example, consider a job, submitted by a user, that checks a traffic video for violations.
  • The client may generate one or more task requests based on the job.
  • A task request may correspond to one task or to multiple tasks.
  • In this example, one task request corresponds to one task.
  • The job is divided into two tasks.
  • Task 1, corresponding to task request 1, converts the video into images.
  • Task 2, corresponding to task request 2, performs image processing, such as face recognition.
  • Task 1 and task 2 belong to the same job, so their job identifiers are the same. That is, the job identifier included in task request 1 is the same as the job identifier included in task request 2.
  • Because tasks in the same job depend on each other, tasks belonging to the same job can be given the same task priority. That is, the task priority included in task request 1 is the same as the task priority included in task request 2.
  • Before task 1 or task 2 is scheduled to execute on a node, a container needs to be generated on that node, and the container includes the resources required to execute the task.
  • For example, task request 1 indicates the algorithm image identifier corresponding to task 1. Docker Registry pushes the image identified by that algorithm image identifier to node 1 to generate a container on node 1.
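For the traffic-video example, splitting the job into two task requests that share a job identifier and priority can be sketched as below. The task names and image identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRequest:
    task_name: str
    job_id: str
    priority: int
    image_id: str  # hypothetical algorithm image identifier


def split_traffic_job(job_id: str, priority: int) -> list[TaskRequest]:
    """One task request per task. Both share the job ID and, because task 2
    depends on task 1's output, the same priority."""
    return [
        TaskRequest("video_to_images", job_id, priority, "img-decode"),
        TaskRequest("face_recognition", job_id, priority, "img-face"),
    ]


for r in split_traffic_job("job-42", priority=3):
    print(r.task_name, r.job_id, r.priority)
```

Giving both requests the same priority means the priority queue never lets an unrelated task of equal priority jump between them merely because of priority.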
  • Step 202: The client sends the task request to the resource scheduling device. Correspondingly, the resource scheduling device receives the task request sent by the client.
  • Step 203: The resource scheduling device determines a target node from the node group according to the resource tags that each node in the node group has.
  • The target node has the resource tag included in the task request.
  • That is, the resource scheduling device compares the resource tag corresponding to the task request with the resource tags of the nodes. It then determines from the node group a target node that has the resource tag included in the task request.
  • Each node in the node group may have one or more resource tags.
  • In a possible implementation, the resource scheduling device obtains the resource information of each node in the node group; this information records the node's identifier and the resources the node has. According to this resource information, the device records the resource tags and the amount of resources that each node in the node group has. The resource scheduling device can then accurately determine, from a node's resource tags and amount of resources, the node used to execute the task specified by the task request.
  • For example, the label customization module may collect the resource information of each node in the node group and generate the resource tags that each node has.
  • The node identifier recorded in the resource information may be the IP address of the node. The recorded resources of the node may include CPU, GPU, memory, and disk resources.
  • The resource tags of each node may be stored in a node tag database.
  • FIG. 3 is a schematic diagram of a mapping relationship between node identifiers and resource tags.
  • A node may correspond to one or more resource tags, and each resource tag may correspond to one or more nodes.
  • This example uses the unique identifier of a node, such as its Internet Protocol (IP) address, to map resource tags. It also supports customizing multiple resource tags for a single node.
  • In existing schemes, only native Yarn interface capabilities can be used to indirectly provide task queues for nodes. The existing scheme of tagging nodes with resource tags does not support multiple resource tags per node.
  • This application can customize multiple resource tags for a node and directly specify the resource tags corresponding to the task request when the task is issued. This optimizes the resource scheduling process and helps improve the resource utilization of the node group.
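The mapping in FIG. 3 is many-to-many. A scheduler can therefore invert it into a tag-to-nodes index and intersect the node sets of the requested tags, instead of scanning every node. This inverted-index approach is a swapped-in technique for illustration, not something the application specifies. The IP addresses and tag names are hypothetical.

```python
from collections import defaultdict


def build_index(node_tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert node -> tags into tag -> nodes."""
    index: dict[str, set[str]] = defaultdict(set)
    for node, tags in node_tags.items():
        for tag in tags:
            index[tag].add(node)
    return dict(index)


def candidates(index: dict[str, set[str]], all_nodes: set[str], request_tags: set[str]) -> set[str]:
    """Nodes carrying every requested tag: the intersection of the per-tag node sets."""
    result = set(all_nodes)
    for tag in request_tags:
        result &= index.get(tag, set())
    return result


node_tags = {
    "10.0.0.1": {"cpu_freq_high", "bw_high"},
    "10.0.0.2": {"cpu_freq_high"},
    "10.0.0.3": {"bw_high", "gpu_p100_x1"},
}
idx = build_index(node_tags)
print(sorted(idx["bw_high"]))                                         # ['10.0.0.1', '10.0.0.3']
print(candidates(idx, set(node_tags), {"cpu_freq_high", "bw_high"}))  # {'10.0.0.1'}
```

Starting from the full node set means a request with no tags matches every node, while a tag that no node has yields an empty result.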
  • The requirement that the target node have the resource tag included in the task request can be implemented in at least the following two manners.
  • Manner 1: The resource tags of the target node include the resource tags corresponding to the task request.
  • When the resource scheduling device receives the task request, it selects as the target node a node in the node group whose resource tags include the resource tags corresponding to the task request.
  • For example, a node may have one or more resource tags corresponding to the following resource types: CPU frequency, GPU model, number of GPU cards, memory module model, bandwidth, and computer room information.
  • GPU models may include P4, P40, P100, and V100.
  • Memory module models may include double data rate (DDR) 1, DDR2, DDR4, and so on.
  • For example, node 1 has one resource tag, which indicates a high CPU clock frequency. As another example, node 2 has two resource tags, which indicate a low CPU clock frequency and a P100 GPU.
  • The resource tags corresponding to task requests are similar to those of the nodes, and no further examples are given here.
  • Table 1 and Table 2 are used below to illustrate the resource tags of the nodes and the resource tags included in task requests.
  • For example, the node group includes six nodes: Node1, Node2, Node3, Node4, Node5, and Node6.
  • The resource tags determined for each node are shown in Table 1 below.
    Resource type | Resource tag       | Resource type | Resource tag
    --------------|--------------------|---------------|------------------
    CPU           | High CPU frequency | Memory        | Memory DDR3
    CPU           | Low CPU frequency  | Memory        | Memory DDR4
    GPU           | GPU P100, 1 card   | GPU           | GPU P40, 4 cards
    GPU           | GPU P100, 2 cards  | GPU           | GPU P4
    Bandwidth     | High bandwidth     | GPU           | GPU V100
    Bandwidth     | Low bandwidth      | Memory        | Memory DDR1
    Computer room | Computer room 1    | Memory        | Memory DDR2
    Computer room | Computer room 2    |               |
  • The client selects the resource tags corresponding to the task request from the resource tag set shown in Table 2, according to the resources required to execute the task specified by the task request.
  • For example, task request 1 includes three resource tags: high CPU frequency, high bandwidth, and GPU P100. According to Table 1, Node3 is the only node whose four resource tags include high CPU frequency, high bandwidth, and GPU P100. Node3 is therefore the target node that can execute the task specified by task request 1.
  • Task request 2 includes two resource tags: GPU P40 (4 cards) and memory DDR1.
  • The resource tags of both Node2 and Node4 include GPU P40 (4 cards) and memory DDR1, so the target node is determined from Node2 and Node4. Specifically, the target node may be chosen at random from Node2 and Node4. Alternatively, it may be chosen according to the resources available on each, for example by selecting the node with more available resources.
  • In this manner, the resource scheduling device determines the target node by checking whether the node group contains a node whose resource tags include the resource tags corresponding to the task request.
  • Determining the target node by matching resource tags is simple and helps save time.
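Manner 1 can be sketched as follows, including the two tie-breaking options for Node2 and Node4. The contents of Table 1 are not reproduced here, so the node tags and the scalar `available` score are hypothetical stand-ins. Node3's fourth tag in particular is invented.

```python
import random
from typing import Optional


def pick_target(
    node_tags: dict[str, set[str]],
    available: dict[str, float],
    request_tags: set[str],
    strategy: str = "most_available",
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Manner 1: keep nodes whose tags include all requested tags, then break ties."""
    cands = sorted(n for n, tags in node_tags.items() if request_tags <= tags)
    if not cands:
        return None
    if strategy == "random":
        return (rng or random.Random()).choice(cands)
    # Otherwise prefer the node with the most available resources (hypothetical score).
    return max(cands, key=lambda n: available.get(n, 0.0))


# Hypothetical data: Table 1 is not reproduced, so these tags are stand-ins.
NODE_TAGS = {
    "Node2": {"cpu_freq_low", "gpu_p40_x4", "mem_ddr1"},
    "Node3": {"cpu_freq_high", "bw_high", "gpu_p100", "mem_ddr4"},
    "Node4": {"gpu_p40_x4", "mem_ddr1", "bw_low"},
}
AVAILABLE = {"Node2": 10.0, "Node3": 20.0, "Node4": 30.0}

print(pick_target(NODE_TAGS, AVAILABLE, {"cpu_freq_high", "bw_high", "gpu_p100"}))  # Node3
print(pick_target(NODE_TAGS, AVAILABLE, {"gpu_p40_x4", "mem_ddr1"}))               # Node4
```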
  • the resources possessed by the node indicated by the resource tag possessed by the target node meet the requirements of the resources indicated by the resource tag corresponding to the task request.
  • the resource scheduling device when it receives the task request, it determines from a node group the resources that the node indicated by the resource label has, and the node that meets the resource requirements indicated by the resource label corresponding to the task request, that is, Target node.
  • a node may have one or more resource tags corresponding to the following resource types: number of CPU cores, memory capacity, disk capacity, and bandwidth.
  • the two resource tags possessed by node 4 respectively indicate the available disk capacity of 201-300GB and the memory capacity of 1000-1500MB.
  • the resource labels corresponding to the task requests are similar to the resource labels that the nodes have, and no further examples are given here.
  • Table 3 and Table 4 are used to illustrate the resource tags that the node has and the resource tags included in the task request.
  • the node group includes 6 nodes: Node1, Node2, Node3, Node4, Node5, and Node6. Based on the resource information of each node, the resource tags of each node are determined as shown in Table 3 below.
  • the client selects the resource tag corresponding to the task request from the resource tag set according to the resource tag set of Table 4 and the resources required to execute the task specified by the task request.
  • the resource label that the node has and the resource label corresponding to the task request may be the same or different.
  • task request 1 includes a resource tag indicating a memory capacity of 500-700 MB.
  • the memory capacity indicated by the resource tags of Node 2 and Node 5 (500-1000MB) covers the memory capacity indicated by the resource tag corresponding to task request 1, so Node 2 and Node 5 can be used as target nodes.
  • the resources indicated by the three resource tags of Node 4 do not include a 500-1000MB memory range. However, the memory capacity indicated by Node 4's resource tag is 1000-1500MB, which is higher than the memory capacity the task requires; that is, Node 4 meets the resource requirements indicated by the resource tag corresponding to task request 1, so Node 4 can also be used as a target node. Therefore, any one of Node 2, Node 5, and Node 4 can be selected to perform the task specified by task request 1.
  • task request 2 includes two resource tags, indicating a memory capacity of 500-900MB and a bandwidth of 21-50M, respectively. Only the resources indicated by the two resource tags of Node 2 meet the resource requirements indicated by the resource tags corresponding to task request 2, so Node 2 can be used as the target node.
  • in this way, a node can be determined as the target node whenever the resources indicated by its resource tags satisfy the resource requirements indicated by the resource tags corresponding to the task request, without requiring its resource tags to be exactly the same as those of the task request, so tasks can be scheduled more flexibly.
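The range-based variant can be sketched the same way. The source does not define exactly when a node's range "satisfies" a task's range; this Python sketch assumes a node qualifies when the low end of its range is at least the low end the task asks for, which reproduces the Node 2 / Node 4 / Node 5 outcomes described above. The node ranges are a hypothetical reconstruction (Table 3 is not reproduced), and `Node6` is purely illustrative.

```python
# Hypothetical range tags: each maps a resource type to an inclusive
# (low, high) range. Values loosely follow the examples in the text.
NODE_RANGES = {
    "Node2": {"memory_mb": (500, 1000), "bandwidth_m": (21, 50)},
    "Node4": {"memory_mb": (1000, 1500), "disk_gb": (201, 300)},
    "Node5": {"memory_mb": (500, 1000)},
    "Node6": {"memory_mb": (100, 499)},
}


def satisfies(node_range, task_range):
    # Assumption: the node is guaranteed to offer at least the task's minimum.
    return node_range[0] >= task_range[0]


def candidates(task_ranges, node_ranges):
    """Nodes that have a satisfying range tag for every resource the task names."""
    result = []
    for node, ranges in node_ranges.items():
        if all(rtype in ranges and satisfies(ranges[rtype], need)
               for rtype, need in task_ranges.items()):
            result.append(node)
    return sorted(result)


task1 = {"memory_mb": (500, 700)}
task2 = {"memory_mb": (500, 900), "bandwidth_m": (21, 50)}
print(candidates(task1, NODE_RANGES))  # ['Node2', 'Node4', 'Node5']
print(candidates(task2, NODE_RANGES))  # ['Node2']
```

A stricter policy (for example, requiring the node's whole range to lie at or above the task's upper bound) would only change `satisfies`.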
  • when the available resources of a node change, the node's resource tags shown in Table 3 also change, so the resource tags in Table 3 and Table 4 need to be updated.
  • specifically, the resource tags of a node may be updated according to the node's available resources.
  • the resource tag possessed by the node may be a combination of any one or more of the resource tags in Table 1 and any one or more of the resource tags in Table 3.
  • the resource tag set may also be a combination of the resource tags in Tables 2 and 4 above.
  • for example, Node 1 has three resource tags, indicating a high CPU clock frequency, 32 CPU cores, and a bandwidth of 21-50M.
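Deriving range tags from a node's available resources, and refreshing them when those resources change, can be sketched as bucketing. In this Python sketch the bucket edges are hypothetical; the text's ranges (500-1000MB, 1000-1500MB) share endpoints, so the sketch assumes half-open buckets `[low, high)` to keep each value in exactly one bucket. The helper names are illustrative.

```python
import bisect

# Hypothetical bucket edges in MB: a value v maps to [edges[i], edges[i + 1]).
MEMORY_EDGES_MB = [0, 500, 1000, 1500, 2000]


def memory_tag(available_mb):
    """Map available memory to a range tag such as 'memory 1000-1500MB'."""
    if not MEMORY_EDGES_MB[0] <= available_mb < MEMORY_EDGES_MB[-1]:
        raise ValueError("available memory is outside the configured buckets")
    i = bisect.bisect_right(MEMORY_EDGES_MB, available_mb) - 1
    return f"memory {MEMORY_EDGES_MB[i]}-{MEMORY_EDGES_MB[i + 1]}MB"


def is_memory_range_tag(tag):
    # Range tags look like 'memory 500-1000MB'; type tags like 'memory DDR1' do not.
    return tag.startswith("memory ") and tag.endswith("MB")


def refresh_tags(node_tags, node, available_mb):
    """Replace a node's memory range tag after its available memory changes."""
    kept = {t for t in node_tags.get(node, set()) if not is_memory_range_tag(t)}
    kept.add(memory_tag(available_mb))
    node_tags[node] = kept
    return kept


tags = {"Node4": {"GPU P40", "memory DDR1", "memory 500-1000MB"}}
print(sorted(refresh_tags(tags, "Node4", 1200)))
# ['GPU P40', 'memory 1000-1500MB', 'memory DDR1']
```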
  • the task request records the amount of resources required to execute the task.
  • the resource scheduling device selects, from the node group, a target node that satisfies the amount of resources specified by the task request.
  • the resource scheduling device selects, from the node group, a node having a resource tag corresponding to the task request and satisfying the amount of resources specified by the task request, that is, a target node.
  • the target node determined in this way not only has the resources, specified by the resource tags in the task request, that are required to execute the task, but also provides the amount of resources the task requires, which helps improve the execution efficiency of the task specified by the task request.
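Combining the tag check with the resource-amount check described above gives a two-condition filter. This Python sketch is an illustration only; the `Node` record, the amount keys (`cpu_cores`, `memory_mb`), and all values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Node:
    name: str
    tags: Set[str] = field(default_factory=set)
    free: Dict[str, int] = field(default_factory=dict)  # e.g. {"cpu_cores": 16}


def select_nodes(nodes, task_tags, task_amounts):
    """Names of nodes that have every task tag AND enough of each requested amount."""
    return [
        n.name
        for n in nodes
        if set(task_tags) <= n.tags
        and all(n.free.get(kind, 0) >= amount for kind, amount in task_amounts.items())
    ]


nodes = [
    Node("Node1", {"CPU high frequency"}, {"cpu_cores": 32, "memory_mb": 4096}),
    Node("Node2", {"CPU high frequency", "GPU P40"}, {"cpu_cores": 4, "memory_mb": 2048}),
    Node("Node3", {"CPU high frequency"}, {"cpu_cores": 8, "memory_mb": 512}),
]
print(select_nodes(nodes, {"CPU high frequency"}, {"cpu_cores": 8, "memory_mb": 1024}))
# ['Node1']
```

Node2 has the tag but too few free cores, and Node3 has too little free memory, so only Node1 passes both checks.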
  • Step 204 The resource scheduling device instructs the target node to execute a task specified by the task request.
  • the client generates a task request.
  • the task request specifies the resources required to perform the task through the resource tag included in the task request.
  • the client sends the task request to the resource scheduling device, and the resource scheduling device receives the task request sent by the client.
  • the resource scheduling device determines a target node from the node group according to the resource tags of each node in the node group and the resource tags included in the task request, where the target node has the resource tags included in the task request; the resource scheduling device then instructs the target node to execute the task specified by the task request.
  • on the one hand, because the task request includes its corresponding resource tags, and these tags specify the resources required to execute the task, the target node is determined according to the resources the task requires. A node whose processing capability suits the task specified by the task request can thus be found, which helps improve the execution efficiency of that task.
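  • on the other hand, when the target node for the task is determined, a node having the resource tags included in the task request is selected from the node group according to the resource tags of the nodes. In this way, the resources of the nodes in the node group can be used reasonably, which in turn helps improve the resource utilization of the nodes.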
  • the task request may further include a task affinity resource tag, which indicates the affinity or anti-affinity between the task specified by the task request and another task, so that tasks can be further scheduled on demand based on the affinity between them.
  • specifically, when the task specified by the task request has affinity with another task, the resource scheduling device instructs the target node to perform the other task; when the task specified by the task request has anti-affinity with another task, the resource scheduling device instructs a node in the node group other than the target node to perform the other task. In this way, the resource scheduling device takes the affinity between tasks into account when determining the node that executes a task, so that a suitable target node can be determined.
  • the task specified by the task request is task 1
  • the other task is task 2
  • the target node specified by the resource scheduling device for task 1 is Node1. If task 1 and task 2 have affinity, Node 1 can be used to perform task 2.
  • the task specified by the task request is task 1
  • the other task is task 3
  • the target node specified by the resource scheduling device for task 1 is Node1. If task 1 and task 3 have anti-affinity, a node other than Node 1 can be used to perform task 3, which avoids executing task 1 and task 3 on the same node and thereby improves the processing efficiency of task 1 and task 3.
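The affinity rule in the two examples above can be sketched as a small placement function. This Python sketch is a simplification under stated assumptions: the relation strings are hypothetical, and for anti-affinity it simply takes the first other node in group order, whereas a real scheduler would also check that node against the other task's own resource tags.

```python
def place_related_task(target_node, relation, nodes):
    """Choose a node for another task, given its relation to a task already placed.

    relation: "affinity" keeps the other task on target_node;
              "anti-affinity" puts it on any other node in the group.
    """
    if relation == "affinity":
        return target_node
    if relation == "anti-affinity":
        others = [n for n in nodes if n != target_node]
        if not others:
            raise RuntimeError("no other node is available for an anti-affine task")
        return others[0]  # hypothetical policy: first other node in group order
    raise ValueError(f"unknown relation: {relation!r}")


group = ["Node1", "Node2", "Node3"]
print(place_related_task("Node1", "affinity", group))       # Node1
print(place_related_task("Node1", "anti-affinity", group))  # Node2
```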
  • a flag bit may be added to the task request to identify whether a resource tag corresponding to the task request is a necessary resource tag. Specifically, if the flag bit identifies the resource tag as necessary, the target node must have that resource tag; if the flag bit identifies the resource tag as unnecessary, the target node need not have it.
  • for example, a flag bit value of 0 indicates that the resource tag it identifies is an unnecessary resource tag, and a value of 1 indicates that it is a necessary resource tag.
  • alternatively, a flag bit value of 1 indicates an unnecessary resource tag, and a value of 0 indicates a necessary resource tag.
  • for example, the resource tags corresponding to the task request include GPU P40 and memory DDR, where GPU P40 is a necessary resource tag and memory DDR is an unnecessary resource tag; the flag bits in the task request can then be expressed as 1, 0. Referring to Table 1, the nodes having the necessary resource tag of the task request are selected from the nodes shown in Table 1.
  • Node 2 and Node 4, which have the necessary resource tag GPU P40, are determined, and either Node 2 or Node 4 can be used to perform the task specified in the task request.
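The flag-bit scheme can be sketched as splitting the request's tags into necessary and optional sets and filtering on the necessary ones. In this Python sketch, `necessary_value=1` follows the first convention above (1 = necessary); pass 0 for the alternative. Using optional tags to rank the eligible nodes is an added assumption, not something the text specifies, and the node tag sets are hypothetical.

```python
def split_by_flags(tags, flags, necessary_value=1):
    """Split task tags into necessary and optional sets using per-tag flag bits."""
    if len(tags) != len(flags):
        raise ValueError("one flag bit is needed per resource tag")
    necessary = {t for t, f in zip(tags, flags) if f == necessary_value}
    optional = {t for t, f in zip(tags, flags) if f != necessary_value}
    return necessary, optional


def eligible_nodes(node_tags, tags, flags, necessary_value=1):
    """Nodes having all necessary tags, ranked by how many optional tags they have."""
    necessary, optional = split_by_flags(tags, flags, necessary_value)
    nodes = [n for n, t in node_tags.items() if necessary <= t]
    # Assumption: optional tags only affect ordering (more matches first).
    return sorted(nodes, key=lambda n: (-len(optional & node_tags[n]), n))


NODE_TAGS = {
    "Node1": {"CPU high frequency"},
    "Node2": {"GPU P40", "memory DDR"},
    "Node3": {"GPU P100"},
    "Node4": {"GPU P40"},
}
print(eligible_nodes(NODE_TAGS, ["GPU P40", "memory DDR"], [1, 0]))  # ['Node2', 'Node4']
```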
  • when the resource scheduling device receives a task request and determines a target node from the node group, some tasks require resources that include a GPU during execution and others do not. If a task with no GPU requirement is scheduled to a node with a GPU, a task that urgently needs GPU resources cannot be scheduled to that node, which wastes the node's GPU resources. Therefore, to avoid wasting GPU resources in the node group, several optional implementation methods based on the foregoing method are provided below.
  • the resource scheduling device may select a target node for executing a task specified by the task request according to whether a resource tag corresponding to the task request includes a resource tag of the GPU.
  • when the resource tags corresponding to the task request do not include a GPU resource tag, the resource scheduling apparatus selects a node without a GPU resource tag from the node group and instructs the selected node to execute the task specified by the task request.
  • the resource scheduling device selects a node without a resource tag of the GPU from the node group, and selects a target node having a resource tag corresponding to the task request from the nodes without the resource tag of the GPU, and schedules the target node to execute a task The task specified in the request.
  • when the resource tags corresponding to the task request include a GPU resource tag, the resource scheduling apparatus selects a node having a GPU resource tag from the node group and instructs the selected node to execute the task specified by the task request.
  • the resource scheduling device selects the nodes having a GPU resource tag from the node group, selects from these nodes a target node having the resource tags corresponding to the task request, and schedules the target node to execute the task specified by the task request.
  • by selecting the target node for the task specified by the task request according to whether the resource tags corresponding to the task request include a GPU resource tag, GPU resources can be scheduled on demand according to the resources the task requires.
  • such scheduling can improve the resource utilization of the nodes in the node group, especially GPU utilization, and helps avoid wasting a node's surplus GPU resources when the node's CPU resources are exhausted.
  • the task request may further include the amount of resources required to execute the task
  • the resource scheduling device may select a target node to execute the task specified by the task request according to whether the amount of resources required to execute the task includes the amount of GPU resources.
  • when the amount of resources required to execute the task does not include GPU resources, a node without GPU resources is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
  • when the amount of resources required to execute the task includes GPU resources, a node with GPU resources is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
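The amount-based GPU split above can be sketched as follows. This Python sketch assumes a node "has GPU resources" when it has any GPUs installed (`gpu_total > 0`), independent of how many are currently free; the keys `gpu_total`, `free`, and `gpu` are hypothetical names, and the data are illustrative.

```python
def gpu_aware_candidates(nodes, request):
    """Keep GPU nodes for GPU tasks and non-GPU nodes for the rest, then check amounts."""
    needs_gpu = request.get("gpu", 0) > 0
    result = []
    for name, info in nodes.items():
        has_gpu = info["gpu_total"] > 0
        if has_gpu != needs_gpu:
            continue  # a CPU-only task never lands on a GPU node, and vice versa
        free = info["free"]
        if all(free.get(k, 0) >= v for k, v in request.items()):
            result.append(name)
    return sorted(result)


NODES = {
    "Node1": {"gpu_total": 0, "free": {"cpu_cores": 16, "memory_mb": 8192}},
    "Node2": {"gpu_total": 4, "free": {"cpu_cores": 16, "memory_mb": 8192, "gpu": 2}},
    "Node3": {"gpu_total": 2, "free": {"cpu_cores": 2, "memory_mb": 1024, "gpu": 2}},
}
print(gpu_aware_candidates(NODES, {"cpu_cores": 4, "memory_mb": 2048}))  # ['Node1']
print(gpu_aware_candidates(NODES, {"cpu_cores": 4, "gpu": 1}))           # ['Node2']
```

Node2 could fit the CPU-only request, but it is deliberately skipped so its CPU capacity stays available for GPU tasks.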
  • FIG. 4 illustrates a schematic flowchart of another resource scheduling method according to an embodiment of the present application. The method includes the following steps:
  • Step 401 The resource scheduling device receives a task request.
  • Step 402 The resource scheduling device determines whether a GPU is required to execute the task specified in the task request; if so, step 403 is performed; if not, step 406 is performed.
  • Step 403 The resource scheduling device traverses the node tag database to check whether any node has a GPU resource tag. If so, step 404 is performed; if not, step 409 is performed.
  • the node tag database stores a correspondence between a node identifier and a resource tag.
  • Step 404 The resource scheduling device determines a target node having a resource label corresponding to the task request from the nodes having a resource label of the GPU.
  • Step 405 The resource scheduling device schedules the target node to execute the task specified by the task request.
  • Step 406 The resource scheduling device traverses the node tag database to check whether any node lacks a GPU resource tag. If so, step 407 is performed; if not, step 409 is performed.
  • Step 407 The resource scheduling device determines a target node having a resource label corresponding to the task request from the nodes without the resource label of the GPU.
  • Step 408 The resource scheduling device schedules the target node to execute the task specified by the task request.
  • Step 409 The resource scheduling device sends a task scheduling failure message to the client, and abandons scheduling the task specified by the task request.
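The FIG. 4 flow (steps 401 to 409) can be sketched as a single function over a node tag database mapping node identifiers to tag sets. This Python sketch makes two assumptions: GPU resource tags are recognizable by a `"GPU"` prefix, and when the GPU or non-GPU pool exists but no node in it has all of the task's tags (a case FIG. 4 does not spell out), scheduling is treated as failed. Names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class ScheduleResult:
    node: Optional[str]  # target node, or None when scheduling failed
    message: str


def is_gpu_tag(tag: str) -> bool:
    # Assumption: GPU resource tags are recognizable by a "GPU" prefix.
    return tag.startswith("GPU")


def schedule(task_tags: Set[str], node_tag_db: Dict[str, Set[str]]) -> ScheduleResult:
    # Step 402: does the task need a GPU?
    needs_gpu = any(is_gpu_tag(t) for t in task_tags)
    # Steps 403 / 406: traverse the node tag database for GPU or non-GPU nodes.
    pool = {
        node: tags
        for node, tags in node_tag_db.items()
        if any(is_gpu_tag(t) for t in tags) == needs_gpu
    }
    if not pool:
        # Step 409: no node of the required kind exists.
        return ScheduleResult(None, "task scheduling failed")
    # Steps 404 / 407: pick a node in the pool that has all of the task's tags.
    matches = sorted(node for node, tags in pool.items() if task_tags <= tags)
    if not matches:
        # Not covered by FIG. 4; treated here as a failure (assumption).
        return ScheduleResult(None, "task scheduling failed")
    # Steps 405 / 408: the caller would now dispatch the task to this node.
    return ScheduleResult(matches[0], f"scheduled on {matches[0]}")


db = {
    "Node1": {"CPU high frequency", "high bandwidth"},
    "Node2": {"GPU P40", "memory DDR1"},
    "Node3": {"CPU high frequency"},
}
print(schedule({"CPU high frequency"}, db).node)  # Node1
print(schedule({"GPU P40"}, db).node)             # Node2
print(schedule({"GPU P100"}, db).message)         # task scheduling failed
```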
  • compared with prior-art solutions that treat all nodes equally without sensing their GPU resources, this optimized combined scheduling, based on multiple user-defined resource tags and awareness of node GPU resources, schedules GPU resources for tasks on demand.
  • this on-demand GPU scheduling can help improve the resource utilization of each node in the node group, especially the utilization of each node's GPU resources.
  • each device in the foregoing embodiments may include a hardware structure and / or a software module corresponding to each function.
  • this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
  • FIG. 5 shows a possible exemplary block diagram of a device involved in the embodiment of the present application, and the device 500 may exist in the form of software.
  • the apparatus 500 may include a communication unit 501 and a processing unit 502.
  • the communication unit 501 may include a receiving unit and a sending unit.
  • the processing unit 502 is configured to control and manage the operations of the device 500.
  • the communication unit 501 is configured to support communication between the device 500 and other devices (such as a client).
  • the device 500 may further include a storage unit 503 for storing the program code and data of the device 500, such as the resource tag of the node in the above method.
  • the processing unit 502 may be a processor or a controller.
  • the processing unit 502 may be a general-purpose CPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA).
  • the processor may also be a combination that realizes a computing function, for example, includes a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the communication unit 501 may be a communication interface, a transceiver, or a transceiver circuit.
  • here, "communication interface" is a collective term; in a specific implementation, the communication interface may include multiple interfaces.
  • the storage unit 503 may be a memory.
  • the device 500 may also be a resource scheduling device involved in this application.
  • the processing unit 502 may support the device 500 to perform the actions of the resource scheduling device in the method examples above. For example, the processing unit 502 may perform steps 203 and 204 in FIG. 2 .
  • the communication unit 501 may support communication between the device 500 and a client. For example, the communication unit 501 is configured to support the device 500 to perform step 202 in FIG. 2.
  • the communication unit 501 may be configured to receive a task request sent by a client, where the task request includes a resource tag corresponding to the task request, and the task request specifies a resource required to perform a task by using the resource tag included in the task request. ;
  • the processing unit 502 may be configured to determine a target node from the node group according to the resource tags of each node in the node group, where the target node has the resource tags included in the task request, and to instruct the target node to execute the task specified by the task request.
  • the resource tags include a resource tag used to identify a GPU; the processing unit 502 is further configured to: when the resource tags corresponding to the task request do not include a GPU resource tag, select from the node group a node that does not have a GPU resource tag, and instruct the selected node to execute the task specified by the task request.
  • the task request records the amount of resources required to perform the task; the processing unit 502 is configured to select from the node group a target node that satisfies the amount of resources specified by the task request.
  • the processing unit 502 is further configured to: obtain resource information of each node in the node group, where the resource information records the node identifier of the node and the resources the node has; and record, according to the resource information of each node in the node group, the resource tags and the amount of resources of each node in the node group.
  • the processing unit 502 is further configured to: when the task specified by the task request has affinity with another task, instruct the target node to execute the other task; and when the task specified by the task request has anti-affinity with another task, instruct a node in the node group other than the target node to execute the other task.
  • the apparatus 500 may also be a client involved in this application.
  • the processing unit 502 may support the device 500 to perform the actions of the client in the method examples above.
  • the processing unit 502 is configured to support the device 500 to perform step 201 in FIG. 2.
  • the communication unit 501 may support communication between the device 500 and other devices (such as a resource scheduling device).
  • the communication unit 501 is used to support the device 500 to perform step 202 in FIG. 2.
  • the processing unit 502 may be configured to generate a task request, where the task request includes a resource tag corresponding to the task request, and the task request specifies a resource required to perform a task through the resource tag included in the task request;
  • the communication unit 501 may be configured to send the task request to a resource scheduling device, where the task request is used by the resource scheduling device to determine a target for executing a task specified by the task request according to a resource label that each node in the node group has. Node, the target node has a resource tag included in the task request.
  • the device may be the resource scheduling device or a client described above.
  • the device 600 includes: a memory 601, a processor 602, and a communication interface 603.
  • the device 600 may further include a bus 604.
  • the communication interface 603, the processor 602, and the memory 601 can be connected to each other through a bus 604.
  • the bus 604 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus 604 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only a thick line is used in FIG. 6, but it does not mean that there is only one bus or one type of bus.
  • the processor 602 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the solution of the present application.
  • the communication interface 603 uses any apparatus such as a transceiver to communicate with other devices or communication networks, such as an Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
  • the memory 601 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently, and is connected to the processor through the bus 604. The memory can also be integrated with the processor.
  • the memory 601 is configured to store a computer execution instruction for executing the solution of the present application, and the processor 602 controls execution.
  • the processor 602 is configured to execute a computer execution instruction stored in the memory 601, so as to implement the method provided by the foregoing embodiment of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • the above product can execute the method provided in the embodiment of the present application, and has the corresponding functional modules and beneficial effects of executing the method.
  • all or part may be implemented by software, hardware, or a combination thereof.
  • when implemented using a software program, all or part of the embodiments may be implemented in the form of a computer program product.
  • a computer program product includes one or more instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the instructions may be stored on a computer storage medium or transmitted from one computer storage medium to another computer storage medium.
  • for example, the instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or twisted pair).
  • the computer storage medium may be any medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more media.
  • the medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape, or a magneto-optical disk (MO)), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a ROM, EPROM, EEPROM, or solid state disk (SSD)), among others.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by instructions. These instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A resource scheduling method and device. The method specifically comprises: a resource scheduling device receives a task request sent by a client, the task request comprising a resource tag corresponding to the task request, and the task request specifying, by means of the resource tag it comprises, the resources required for executing a task; the resource scheduling device determines a target node from a node group according to the resource tags of each node in the node group, the target node having the resource tag comprised in the task request; and the resource scheduling device instructs the target node to execute the task specified in the task request. On one hand, because the target node is determined according to the resources required for the task, as specified by the resource tag corresponding to the task request, a target node whose processing capabilities suit the task can be found, which helps improve the efficiency of executing the task specified by the task request. On the other hand, determining from the node group a target node that has the resource tag comprised in the task request may help improve the resource utilization of the nodes.

Description

Resource scheduling method and device
Technical field
The present application relates to the technical field of cloud computing, and in particular, to a resource scheduling method and device.
Background
In video cloud computing scenarios, for example when traffic video undergoes video encoding and decoding, facial feature extraction, license plate recognition, and congestion event monitoring, the tasks to be processed need to be scheduled to nodes in a cloud server cluster for processing. Yet Another Resource Negotiator (YARN) is the general-purpose resource management system of Hadoop 2.x and is responsible for resource management and task scheduling of the nodes in a cloud server cluster.
In a prior-art YARN-based resource scheduling scheme, a task is submitted to a resource queue, and the resource queues are divided in the same way as the logical clusters; that is, the task is scheduled to run on a node in the logical cluster corresponding to the resource queue. For example, the resources of an entire cluster are divided into three resource queues: resource queue 1, resource queue 2, and resource queue 3, which correspond to logical cluster A, logical cluster B, and logical cluster C, respectively. If task 1 is submitted to resource queue 2, task 1 is scheduled to a node in logical cluster B. In this prior-art scheme, a resource queue is bound to the nodes of a logical cluster, so once a resource queue is determined, a node can only be chosen from the logical cluster corresponding to that queue. As a result, tasks may be scheduled to nodes with weak processing capability, which lowers task execution efficiency and the resource utilization of the nodes in the cluster.
In summary, how to improve task execution efficiency and node resource utilization still requires further research.
Summary
The present application provides a resource scheduling method and device, which are used to improve task execution efficiency and node resource utilization.
In a first aspect, the present application provides a resource scheduling method. The method includes: a resource scheduling device receives a task request sent by a client, where the task request includes a resource tag corresponding to the task request, and the task request specifies, through the resource tag it includes, the resources required to execute a task; the resource scheduling device determines a target node from a node group according to the resource tag of each node in the node group, where the target node has the resource tag included in the task request; and the resource scheduling device instructs the target node to execute the task specified by the task request.
Based on this solution, the resource scheduling device receives the task request sent by the client; determines a target node from the node group according to the resource tags of each node in the node group and the resource tags included in the task request, where the target node has the resource tags included in the task request; and instructs the target node to execute the task specified by the task request. On the one hand, because the task request includes its corresponding resource tags, which specify the resources required to execute the task, the target node is determined according to the resources the task requires, so a node whose processing capability suits the task can be found, which helps improve the execution efficiency of the task specified by the task request. On the other hand, when the target node for the task is determined, a node having the resource tags included in the task request is selected from the node group according to the resource tags of the nodes; in this way, the resources of the nodes in the node group can be used reasonably, which in turn helps improve node resource utilization.
When the resource scheduling device receives a task request and determines a target node from the node group, it must account for the fact that some tasks require a graphics processing unit (GPU) to execute and others do not. If a task with no GPU requirement is scheduled onto a node that has a GPU, a task that urgently needs GPU resources may be unable to find a suitable node with GPU resources, and the GPU resources on that node are wasted. To avoid wasting GPU resources in the node group, several optional implementations based on the above method are provided below.
Implementation method 1: the resource scheduling device may select the target node for the task specified by the task request according to whether the resource tags corresponding to the task request include a GPU resource tag.
As a specific example, when the resource tags corresponding to the task request do not include a GPU resource tag, a node without a GPU resource tag is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
As another specific example, when the resource tags corresponding to the task request include a GPU resource tag, a node with a GPU resource tag is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
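Implementation method 1 can be sketched as a filter that keeps GPU-tagged nodes for GPU tasks and routes non-GPU tasks to nodes without a GPU tag. This sketch makes two assumptions. First, it uses a hypothetical naming convention in which every GPU tag starts with "GPU", so a GPU tag is recognised by prefix. Second, it still requires the chosen node to have all requested tags, as in the base method.

```python
from typing import Optional

# Hypothetical naming convention: every GPU resource tag starts with "GPU".
GPU_TAG_PREFIX = "GPU"


def has_gpu_tag(tags: set[str]) -> bool:
    """True if any tag in the set is a GPU resource tag."""
    return any(tag.startswith(GPU_TAG_PREFIX) for tag in tags)


def select_by_gpu_tag(request_tags: set[str],
                      node_tags: dict[str, set[str]]) -> Optional[str]:
    """Route GPU tasks only to GPU nodes and non-GPU tasks only to non-GPU nodes.

    node_tags maps node id -> set of resource tags the node has.
    """
    wants_gpu = has_gpu_tag(request_tags)
    for node_id, tags in node_tags.items():
        if has_gpu_tag(tags) != wants_gpu:
            continue                       # keep GPU nodes free for GPU tasks
        if request_tags <= tags:           # node must still have every requested tag
            return node_id
    return None
```

With this rule, a CPU-only task lands on a node such as `n2` below even though `n1` also matches its tags.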
With this implementation, the target node is selected according to whether the resource tags corresponding to the task request include a GPU tag, so GPU resources are scheduled on demand according to the resources the task requires. This improves the resource utilization of the nodes in the node group, particularly GPU utilization. It also helps avoid a situation in which a node's central processing unit (CPU) resources are exhausted while its remaining GPU resources sit idle and are wasted.
Implementation method 2: the task request may further include the amount of resources required to execute the task. The resource scheduling device may then select the target node according to whether that amount includes a GPU resource amount.
As a specific example, if the amount of resources required to execute the task does not include GPU resources, a node without GPU resources is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
As another specific example, if the amount of resources required to execute the task includes GPU resources, a node with GPU resources is selected from the node group, and the selected node is instructed to execute the task specified by the task request.
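Implementation method 2 makes the same GPU/non-GPU split, but it reads the requested resource amounts instead of tags. In the sketch below, the resource names (`"cpu"`, `"gpu"`) and the dictionary layout are assumptions made for illustration.

```python
from typing import Optional


def select_by_gpu_amount(required: dict[str, int],
                         capacity: dict[str, dict[str, int]]) -> Optional[str]:
    """Pick a node based on whether the requested amounts include a GPU amount.

    required: resource name -> amount the task needs, e.g. {"cpu": 4, "gpu": 1}
    capacity: node id -> resource name -> amount the node has
    """
    needs_gpu = required.get("gpu", 0) > 0
    for node_id, resources in capacity.items():
        node_has_gpu = resources.get("gpu", 0) > 0
        if node_has_gpu == needs_gpu:      # GPU task -> GPU node, else non-GPU node
            return node_id
    return None
```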
Further, as a possible implementation, the task request records the amount of resources required to execute the task. In this case, determining the target node from the node group according to the resource tags of each node specifically includes selecting, from the node group, a target node that satisfies the resource amount specified by the task request. Specifically, the resource scheduling device selects from the node group a node that both satisfies the resource amount specified by the task request and has the resource tags included in the task request, and that node is the target node. The target node determined this way provides the kinds of resources specified by the task request's resource tags and also has enough of them to execute the task. This helps improve the execution efficiency of the task specified by the task request.
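The combined condition described above (the node has the requested tags and enough free resources) can be sketched as follows. The `(node id, tags, free amounts)` tuple layout and the resource names are illustrative assumptions.

```python
from typing import Optional


def fits(required: dict[str, float], free: dict[str, float]) -> bool:
    """True if the node has at least the required amount of every resource."""
    return all(free.get(name, 0) >= amount for name, amount in required.items())


def select_target(request_tags: set[str], required: dict[str, float],
                  nodes: list[tuple[str, set[str], dict[str, float]]]) -> Optional[str]:
    """nodes: (node id, resource tags, free resource amounts) for each node in the group."""
    for node_id, tags, free in nodes:
        if request_tags <= tags and fits(required, free):
            return node_id
    return None
```

Checking tags first is cheap and prunes most nodes. The amount check then rules out nodes that have the right kind of resource but not enough of it.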
Based on any of the foregoing embodiments, the method may further include the following. The resource scheduling device obtains resource information for each node in the node group, where the resource information records the node's identifier and the resources the node has. Based on that resource information, the resource scheduling device records the resource tags and resource amounts of each node in the node group. In this way, the resource scheduling device can use the nodes' resource tags and resource amounts to accurately determine the node that will execute the task specified in the task request.
Based on any of the foregoing embodiments, the method may further include the following. When the task specified by the task request has affinity with another task, the resource scheduling device instructs the target node to execute the other task as well. When the task specified by the task request has anti-affinity with another task, the resource scheduling device instructs a node in the node group other than the target node to execute the other task. In this way, the resource scheduling device takes the affinity between tasks into account when determining the node that executes a task, so that a suitable target node can be determined.
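The affinity rule above can be sketched as a small placement function. In this sketch, the relation names `"affinity"` and `"anti-affinity"` are assumptions, and so is the choice of the first other node for the anti-affinity case.

```python
from typing import Optional


def place_other_task(target_node: str, node_group: list[str],
                     relation: str) -> Optional[str]:
    """Choose a node for 'another task' related to the requested task.

    relation: "affinity"      -> run on the same target node;
              "anti-affinity" -> run on a different node in the group.
    """
    if relation == "affinity":
        return target_node
    if relation == "anti-affinity":
        for node in node_group:
            if node != target_node:
                return node
        return None                      # the group has no other node
    raise ValueError(f"unknown relation: {relation}")
```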
In a second aspect, this application provides a resource scheduling method. The method includes the following. A client generates a task request, where the task request includes a resource tag corresponding to the task request and uses that resource tag to specify the resources required to execute the task. The client then sends the task request to a resource scheduling device. The resource scheduling device uses the task request to determine, according to the resource tags of each node in a node group, the target node that executes the task specified by the task request, where the target node has the resource tag included in the task request.
With this solution, the client generates a task request whose resource tags specify the resources required to execute the task, and sends it to the resource scheduling device. On the one hand, this allows the resource scheduling device to determine the target node according to the resources the specified task requires. A node whose processing capability suits the task can therefore be found, which helps improve the execution efficiency of the task specified by the task request. On the other hand, when determining the target node for the task, the resource scheduling device can select from the node group, according to the resource tags that the nodes have, a node that has the resource tags included in the task request. In this way, the resources of the nodes in the node group are used rationally, which helps improve node resource utilization.
In a third aspect, this application provides an apparatus that has the functions of implementing the embodiments of the first aspect. These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions.
In a possible design, the apparatus includes a communication unit and a processing unit, where:
the communication unit is configured to receive a task request sent by a client, where the task request includes a resource tag corresponding to the task request and uses that resource tag to specify the resources required to execute a task; and
the processing unit is configured to determine a target node from a node group according to the resource tags of each node in the node group, where the target node has the resource tag included in the task request, and to instruct the target node to execute the task specified by the task request.
In a fourth aspect, this application provides an apparatus that has the functions of implementing the embodiments of the second aspect. These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions.
In a possible design, the apparatus includes a communication unit and a processing unit, where:
the processing unit is configured to generate a task request, where the task request includes a resource tag corresponding to the task request and uses that resource tag to specify the resources required to execute a task; and
the communication unit is configured to send the task request to a resource scheduling device, where the resource scheduling device uses the task request to determine, according to the resource tags of each node in a node group, the target node that executes the task specified by the task request, and the target node has the resource tag included in the task request.
In a fifth aspect, this application provides an apparatus that includes a processor and a memory. The memory is configured to store computer instructions. The processor executes the computer instructions stored in the memory, so that the apparatus performs the resource scheduling method provided in the first aspect or any embodiment of the first aspect. The memory may be integrated into the processor or may be independent of the processor.
In a sixth aspect, this application provides an apparatus that includes a processor and a memory. The memory is configured to store computer instructions. The processor executes the computer instructions stored in the memory, so that the apparatus performs the resource scheduling method provided in the second aspect or any embodiment of the second aspect. The memory may be integrated into the processor or may be independent of the processor.
In a seventh aspect, this application further provides a system that includes the resource scheduling device in the method provided in any embodiment of the first aspect and the client in the method provided in any embodiment of the second aspect.
In an eighth aspect, this application further provides a computer-readable storage medium that stores computer instructions which, when executed by a computer, implement the methods described in the foregoing aspects.
In a ninth aspect, this application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1a is a schematic diagram of a possible system architecture according to this application;
FIG. 1b is a schematic diagram of another possible system architecture according to this application;
FIG. 2 is a schematic diagram of a resource scheduling method according to this application;
FIG. 3 is a schematic diagram of the mapping between node identifiers and resource tags according to this application;
FIG. 4 is a schematic diagram of another resource scheduling method according to this application;
FIG. 5 is a schematic structural diagram of an apparatus according to this application;
FIG. 6 is a schematic structural diagram of another apparatus according to this application.
DETAILED DESCRIPTION
The following describes this application in further detail with reference to the accompanying drawings. The specific operations in the method embodiments may also be applied to the apparatus embodiments or system embodiments. In the descriptions of this application, unless otherwise specified, "a plurality of" means two or more.
FIG. 1a shows a possible system architecture according to an embodiment of this application. The architecture includes a client, a resource scheduling device, and the N nodes of a node group, where N is a positive integer. FIG. 1a shows only node 1, ..., node N as examples.
The client generates a task request based on a job submitted by a user and sends the task request to the resource scheduling device. The task request uses the resource tags it includes to specify the resources required to execute the task. The resource scheduling device uses the task request to determine, from the node group, the target node that executes the task specified by the task request.
The resource scheduling device receives the task request sent by the client. According to the task request and the resources of the N nodes in the node group, it determines from the node group the target node for the task request and instructs that node to execute the specified task.
The N nodes in the node group execute the tasks that the resource scheduling device instructs them to execute.
Based on FIG. 1a, FIG. 1b shows another possible system architecture according to an embodiment of this application.
Referring to FIG. 1b, the resource scheduling device includes five modules: a task request forwarding module, a task scheduling control module, a label customization module, a scheduling engine (Yarn) module, and a Docker image registry (Docker Registry).
The task request forwarding module receives users' algorithm scheduling tasks and provides load balancing and request forwarding.
The task scheduling control module performs task parameter parsing, task scheduling control, and task queue management. For example, task parameter parsing may extract the task priority. Task queue management may place tasks whose parameters have been parsed into the task queue in the order in which they were received. Task scheduling control may then dispatch tasks from the task queue to Yarn for processing according to task priority and the time each task entered the queue: higher-priority tasks are dispatched first, and tasks of equal priority are dispatched in the order in which they entered the queue.
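The dispatch order described above (highest priority first, first-in-first-out among equal priorities) maps naturally onto a binary heap keyed by `(priority, arrival sequence)`. This sketch assumes that a larger number means a higher priority. The `TaskQueue` class is a hypothetical illustration, not the module's actual code.

```python
import heapq
import itertools
from typing import Optional


class TaskQueue:
    """Priority queue: higher priority first, arrival order among equals."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()      # arrival order, used to break ties

    def put(self, task_name: str, priority: int) -> None:
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), task_name))

    def dispatch(self) -> Optional[str]:
        """Pop the next task to hand to Yarn, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, task_name = heapq.heappop(self._heap)
        return task_name
```

The sequence counter guarantees that two tasks with the same priority never compare by name. Among equals, the earlier-enqueued task always wins.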
The label customization module mainly includes an information collection and reporting submodule and a multi-label customization submodule. The information collection and reporting submodule collects the resource information of each node in the node group and does two things with it. First, it reports the collected resource information to Yarn, so that Yarn can allocate to each task the resources required to execute it based on the nodes' resource information. Second, it sends the collected resource information to the multi-label customization submodule. That submodule generates a resource tag for each type of resource based on the resource information, and reports to Yarn the correspondence between each node's identifier and all the resource tags the node has.
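One way the multi-label customization submodule could turn collected resource information into per-type tags is sketched below. The input field names (`cpu_ghz`, `gpu_model`, and so on), the 3.0 GHz and 10 Gbit/s thresholds, and the tag spellings are all hypothetical. The application does not state how "high" and "low" are decided.

```python
# Hypothetical thresholds; the application does not specify them.
CPU_HIGH_GHZ = 3.0
BW_HIGH_GBPS = 10.0


def generate_tags(info: dict) -> set[str]:
    """Generate one resource tag per resource type found in a node's info."""
    tags: set[str] = set()
    if "cpu_ghz" in info:
        tags.add("High CPU frequency" if info["cpu_ghz"] >= CPU_HIGH_GHZ
                 else "Low CPU frequency")
    if info.get("gpu_count", 0) > 0:
        tags.add(f"GPU {info['gpu_model']} x {info['gpu_count']}")
    if "memory_type" in info:
        tags.add(f"Memory {info['memory_type']}")
    if "bandwidth_gbps" in info:
        tags.add("High bandwidth" if info["bandwidth_gbps"] >= BW_HIGH_GBPS
                 else "Low bandwidth")
    if "room" in info:
        tags.add(f"Computer room {info['room']}")
    return tags


def build_label_mapping(collected: list[dict]) -> dict[str, set[str]]:
    """Node identifier (IP) -> all tags of that node, as reported to Yarn."""
    return {info["ip"]: generate_tags(info) for info in collected}
```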
The scheduling engine (Yarn) module stores the received correspondence between node identifiers and all of their resource tags in a node label database, and performs the actual task scheduling based on all the resource tags of the nodes. The node label database resides in Yarn; for example, it may exist in Yarn as a configuration file.
As a specific example, when Yarn receives a task dispatched by the task scheduling control module, it queries the node label database for the resource tags of each node. It then determines, from the nodes' resource tags and the resource tags of the task request, the node that executes the task specified by the task request.
Docker Registry is an open-source component for managing private Docker image repositories. It mainly provides image upload and download.
The resource scheduling method provided in this application is described in detail below with reference to the system architectures shown in FIG. 1a and FIG. 1b.
FIG. 2 shows a schematic flowchart of a resource scheduling method provided by this application. The method includes the following steps.
Step 201: The client generates a task request.
The task request includes the resource tags corresponding to the task request and uses them to specify the resources required to execute the task.
The task request may include one or more resource tags. In one implementation, the resource tags are carried in the task request as a label list.
In one implementation, the resource tags corresponding to the task request are determined according to the amount of resources the task requires. Specifically, the client determines the resource tags from the required resource amount alone, or from the required resource amount together with a preset rule. The preset rule is the same rule that the resource scheduling device uses to determine the resource tags corresponding to task requests.
In another implementation, the resource tags corresponding to the task request are obtained from the resource scheduling device. Specifically, the client sends a query message to the resource scheduling device to obtain a resource tag set. The set includes the resource tags of the nodes in the node group, where, for example, a node's resource tags indicate the resources the node has. The client then receives a response message from the resource scheduling device that contains the resource tag set. After obtaining the set, the client selects one or more resource tags from it according to the resources the task requires and carries the selected tags in the task request.
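The query/response exchange above can be sketched in-process, without networking. `ResourceSchedulerStub`, the `"resource_tag_set"` and `"label_list"` keys, and the decision to reject tags that no node offers are all hypothetical illustration choices.

```python
class ResourceSchedulerStub:
    """Hypothetical in-process stand-in for the resource scheduling device."""

    def __init__(self, node_tags: dict[str, set[str]]) -> None:
        self._node_tags = node_tags        # node id -> resource tags of that node

    def handle_query(self) -> dict:
        """Answer the client's query message with the resource tag set."""
        tag_set: set[str] = set()
        for tags in self._node_tags.values():
            tag_set |= tags                # union of all nodes' tags
        return {"resource_tag_set": sorted(tag_set)}


def build_task_request(scheduler: ResourceSchedulerStub, name: str,
                       needed: list[str]) -> dict:
    """Query the tag set, then carry the needed tags in the request's label list."""
    available = set(scheduler.handle_query()["resource_tag_set"])
    missing = [tag for tag in needed if tag not in available]
    if missing:
        raise ValueError(f"no node offers: {missing}")
    return {"name": name, "label_list": list(needed)}
```

Rejecting unknown tags on the client side surfaces a request that no node could ever satisfy before it reaches the scheduler.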
As a specific example, the task request may further include some or all of the following: a task name, a job identity, an algorithm image identity, a task priority, a copy number, the resources required by the task (such as CPU, memory, and GPU), and the amount of resources required by the task.
For example, a job may include multiple tasks. Suppose a user submits a job that checks traffic video for violations. The client can generate one or more task requests from the job, and each task request may correspond to one task or to multiple tasks. Taking one task per task request as an example, the job may be split into two tasks. Task 1, corresponding to task request 1, converts the video into images. Task 2, corresponding to task request 2, processes the images, for example by face recognition. Because task 1 and task 2 belong to the same job, they have the same job identity; that is, task request 1 and task request 2 include the same job identity.
In the above example, task 1 and task 2 belong to the same job. Because tasks in the same job depend on one another, all tasks belonging to the same job can be given the same task priority; that is, task request 1 and task request 2 include the same task priority.
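The task request fields listed above, together with the two-task split of the traffic-video job, can be sketched as a dataclass. The field names, image identities, tag strings, and resource amounts are hypothetical placeholders. The point shown is that both requests share one job identity and one priority.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRequest:
    name: str
    job_id: str                 # shared by all tasks of the same job
    image_id: str               # algorithm image pushed to the node by Docker Registry
    priority: int               # same for all tasks of the same job
    copy_number: int = 1
    label_list: list[str] = field(default_factory=list)       # resource tags
    resources: dict[str, float] = field(default_factory=dict)  # required amounts


def split_traffic_job(job_id: str, priority: int) -> list[TaskRequest]:
    """Split the traffic-violation job into two dependent task requests."""
    return [
        TaskRequest("video-to-image", job_id, "img-decode", priority,
                    resources={"cpu": 4}),
        TaskRequest("face-recognition", job_id, "img-face", priority,
                    label_list=["GPU P100 x 1"], resources={"cpu": 2, "gpu": 1}),
    ]
```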
Before task 1 or task 2 is scheduled to execute on a node, a container must first be created on that node, and the container includes the resources required to execute the task. Take task 1 scheduled to node 1 as an example. Task request 1 indicates the algorithm image identity corresponding to task 1, and Docker Registry pushes the image identified by that algorithm image identity to node 1 so that a container can be created on node 1.
Step 202: The client sends the task request to the resource scheduling device. Correspondingly, the resource scheduling device receives the task request sent by the client.
Step 203: The resource scheduling device determines a target node from the node group according to the resource tags of each node in the node group.
The target node has the resource tags included in the task request.
Specifically, after receiving the task request, the resource scheduling device compares the resource tags corresponding to the task request with the resource tags of the nodes. It then determines, from the node group, a target node that has the resource tags included in the task request.
Optionally, each node in the node group may have one or more resource tags.
In one implementation, the resource scheduling device obtains resource information for each node in the node group, where the resource information records the node's identifier and the resources the node has. Based on that resource information, the resource scheduling device records the resource tags and resource amounts of each node in the node group. In this way, the resource scheduling device can use the nodes' resource tags and resource amounts to accurately determine the node that will execute the task specified in the task request.
Based on FIG. 1b, the label customization module may collect the resource information of each node in the node group and generate each node's resource tags. The node identifier recorded in the resource information may be the node's IP address, and the recorded resources may include CPU, GPU, memory, and disk. Optionally, the nodes' resource tags may be stored in the node label database. When performing step 203, the resource scheduling device traverses the resource tags of each node in the node label database and selects, based on those tags, a target node for the task specified by the task request.
As an example, FIG. 3 is a schematic diagram of the mapping between node identifiers and resource tags. One node may correspond to one or more resource tags, and each resource tag may correspond to one or more nodes. Referring to FIG. 3, with the Internet Protocol (IP) address used as the node identifier, the mapping is as follows:
Node 1, IP address 192.168.0.10: resource tag 1 (Label 1) and resource tag 3 (Label 3).
Node 2, IP address 192.168.0.11: resource tag 2 (Label 2).
Node 3, IP address 192.168.0.12: Label 2 and resource tag M (Label M).
Node N, IP address 192.168.0.xx: Label 1, Label 3, and Label M.
Here, M is a positive integer.
In this example, resource tags are mapped to a node's unique identifier (such as its IP address), which allows a single node to have multiple custom resource tags. In the prior art, resource tags can be attached to nodes only indirectly, through task queues using the native Yarn interface, and a node cannot have more than one resource tag. By contrast, this application allows each node to have multiple custom resource tags, and the resource tags corresponding to a task request are specified directly when the task is submitted. Together, these optimize the resource scheduling process and help improve the utilization of node group resources.
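The many-to-many mapping of FIG. 3 can be held as a dictionary from node IP to labels, and inverted to show which nodes carry a given label. The sketch encodes only Node 1 to Node 3, because Node N's IP address is given only as 192.168.0.xx. The helper names are hypothetical.

```python
from collections import defaultdict

# Node-to-label mapping for Node 1..3 as listed for FIG. 3 (Node N omitted: its IP is elided).
NODE_LABELS: dict[str, set[str]] = {
    "192.168.0.10": {"Label 1", "Label 3"},
    "192.168.0.11": {"Label 2"},
    "192.168.0.12": {"Label 2", "Label M"},
}


def invert(node_labels: dict[str, set[str]]) -> dict[str, set[str]]:
    """Label -> set of node IPs, showing that one label may map to several nodes."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for ip, labels in node_labels.items():
        for label in labels:
            index[label].add(ip)
    return dict(index)


def nodes_with_all(node_labels: dict[str, set[str]], required: set[str]) -> list[str]:
    """IPs of nodes that carry every required label."""
    return sorted(ip for ip, labels in node_labels.items() if required <= labels)
```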
Further, a target node can have the resource tags included in the task request in at least two ways.
In one implementation, the resource tags of the target node include the resource tags corresponding to the task request.
Specifically, when the resource scheduling device receives the task request, it determines from the node group a node whose resource tags include the resource tags corresponding to the task request, and that node is the target node.
In this implementation, a node may have resource tags for one or more of the following resource types: CPU clock frequency, GPU model, number of GPUs, memory module type, bandwidth, and computer room information. For example, GPU models may include P4, P40, P100, and V100, and memory module types may include double data rate (DDR) 1, DDR2, DDR4, and so on. For instance, node 1 may have one resource tag indicating a high CPU clock frequency. Node 2 may have two resource tags, one indicating a low CPU clock frequency and one indicating a single GPU P100. The resource tags corresponding to task requests take the same form as those of nodes, so they are not listed separately here.
下面结合表1和表2,对节点具有的资源标签和任务请求包括的资源标签进行举例说明。In the following, Table 1 and Table 2 are used to illustrate the resource tags that the node has and the resource tags included in the task request.
假设节点组中包括6个节点,分别为Node 1、Node 2、Node 3、Node 4、Node 5和Node 6,根据每个节点的资源信息,确定出每个节点具有的资源标签如下表1所示。Assume that the node group includes 6 nodes, which are Node1, Node2, Node3, Node4, Node5, and Node6. According to the resource information of each node, the resource label of each node is determined as shown in Table 1 below. Show.
Table 1 Resource tags of nodes
[Table 1 is published as images PCTCN2019081200-appb-000001 and PCTCN2019081200-appb-000002.]
The resource tags of the nodes in Table 1 together form the resource tag set shown in Table 2.
Table 2 Resource tag set
Resource type | Resource tag | Resource type | Resource tag
CPU | High CPU frequency | Memory | Memory DDR3
CPU | Low CPU frequency | Memory | Memory DDR4
GPU | GPU P40 ×1 | GPU | GPU P100 ×1
GPU | GPU P40 ×4 | GPU | GPU P100 ×2
GPU | GPU P4 | Bandwidth | High bandwidth
GPU | GPU V100 | Bandwidth | Low bandwidth
Memory | Memory DDR1 | Equipment room | Equipment room 1
Memory | Memory DDR2 | Equipment room | Equipment room 2
The client selects the resource tags for the task request from the resource tag set in Table 2. It chooses them according to the resources needed to execute the task specified by the task request.
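The client-side selection step can be sketched as follows. This assumes the Table 2 tag set is published to the client as plain strings grouped by resource type. The ASCII tag spellings, the function `build_task_request` and the dictionary shape of the request are hypothetical choices for illustration. The application does not define a request format.

```python
from typing import Dict, List, Set

# Hypothetical encoding of the Table 2 tag set, grouped by resource type.
TAG_SET: Dict[str, Set[str]] = {
    "CPU": {"High CPU frequency", "Low CPU frequency"},
    "GPU": {"GPU P40 x1", "GPU P40 x4", "GPU P4", "GPU V100", "GPU P100 x1", "GPU P100 x2"},
    "Memory": {"Memory DDR1", "Memory DDR2", "Memory DDR3", "Memory DDR4"},
    "Bandwidth": {"High bandwidth", "Low bandwidth"},
    "Equipment room": {"Equipment room 1", "Equipment room 2"},
}


def build_task_request(task_id: str, wanted: List[str]) -> dict:
    """Build a task request whose resource tags are all drawn from TAG_SET."""
    known = set().union(*TAG_SET.values())
    unknown = [t for t in wanted if t not in known]
    if unknown:
        # A tag outside the published set could never match any node.
        raise ValueError(f"tags not in the resource tag set: {unknown}")
    return {"task_id": task_id, "resource_tags": sorted(set(wanted))}


request = build_task_request("task-1", ["High CPU frequency", "High bandwidth", "GPU P100 x2"])
print(request["resource_tags"])  # ['GPU P100 x2', 'High CPU frequency', 'High bandwidth']
```

Rejecting unknown tags on the client is a design choice of this sketch. It surfaces a typo before the request reaches the scheduler.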
For example, task request 1 includes three resource tags: high CPU clock frequency, high bandwidth, and two P100 GPUs. According to Table 1, Node 3 is the only node whose four resource tags include all three of these tags. Node 3 is therefore the target node that can execute the task specified by task request 1.
As another example, task request 2 includes two resource tags: four P40 GPUs and DDR1 memory. According to Table 1, the resource tags of both Node 2 and Node 4 include at least these two tags, so the target node is chosen from Node 2 and Node 4. The target node may be chosen at random, or it may be chosen based on the available resources of Node 2 and Node 4, for example by selecting the node with more available resources.
With this implementation, after receiving a task request, the resource scheduling device checks whether the node group contains a node whose resource tags include the resource tags corresponding to the task request. Such a node is the target node that executes the task. Determining the target node by matching the two sets of resource tags is simple and saves time.
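The tag-inclusion match of this first implementation can be sketched as a subset test, with a tie-break for the case where several nodes qualify. The node data is only loosely modelled on the two examples above. Node 3's fourth tag, Node 4's extra tag and every `free_vcores` figure are invented placeholders, not the contents of Table 1. Measuring "more available resources" as free vcores is also an assumption.

```python
import random
from typing import Dict, List, Optional, Set

# Hypothetical node data; only the tags quoted in the examples come from the text.
NODES: Dict[str, dict] = {
    "Node 2": {"tags": {"GPU P40 x4", "Memory DDR1"}, "free_vcores": 8},
    "Node 3": {"tags": {"High CPU frequency", "High bandwidth", "GPU P100 x2", "Memory DDR4"},
               "free_vcores": 4},
    "Node 4": {"tags": {"GPU P40 x4", "Memory DDR1", "Low bandwidth"}, "free_vcores": 16},
}


def candidates(request_tags: Set[str]) -> List[str]:
    """Nodes whose resource tags include every resource tag of the request."""
    return [name for name, node in NODES.items() if request_tags <= node["tags"]]


def pick_target(request_tags: Set[str], strategy: str = "most_free") -> Optional[str]:
    found = candidates(request_tags)
    if not found:
        return None
    if strategy == "random":
        return random.choice(found)
    # Otherwise prefer the candidate with more available resources.
    return max(found, key=lambda name: NODES[name]["free_vcores"])


print(pick_target({"High CPU frequency", "High bandwidth", "GPU P100 x2"}))  # Node 3
print(pick_target({"GPU P40 x4", "Memory DDR1"}))                            # Node 4
```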
In a second implementation, the target node's resource tags indicate resources that meet the requirements indicated by the resource tags corresponding to the task request.
Specifically, when the resource scheduling device receives the task request, it determines from the node group a node whose resources, as indicated by its resource tags, meet the resource requirements indicated by the resource tags corresponding to the task request. That node is the target node.
In this implementation, a node may have resource tags for one or more of the following resource types: number of CPU cores, memory capacity, disk capacity, and bandwidth. For example, node 4 has two resource tags, indicating an available disk capacity of 201-300 GB and a memory capacity of 1000-1500 MB, respectively. The resource tags of a task request take the same form as those of a node and are not enumerated further here.
Table 3 and Table 4 below illustrate the resource tags of nodes and the resource tags included in task requests.
Assume that the node group contains six nodes, Node 1 through Node 6. The resource tags of each node are determined from that node's resource information, as shown in Table 3.
Table 3 Resource tags of nodes
[Table 3 is published as image PCTCN2019081200-appb-000003.]
The resource tags of the nodes in Table 3 together form the resource tag set shown in Table 4.
Table 4 Resource tag set
[Table 4 is published as image PCTCN2019081200-appb-000004.]
The client selects the resource tags for the task request from the resource tag set in Table 4. It chooses them according to the resources needed to execute the task specified by the task request.
In this implementation, the resource tags of a node and the resource tags corresponding to a task request may or may not be identical.
For example, task request 1 includes one resource tag, indicating a memory capacity of 500-1000 MB. According to Table 3, the resource tags of Node 2 and Node 5 both include this 500-1000 MB memory tag, so Node 2 and Node 5 can serve as target nodes. None of Node 4's three resource tags is the 500-1000 MB memory tag of task request 1. However, Node 4 has a resource tag indicating a memory capacity of 1000-1500 MB, which exceeds the 500-1000 MB the task requires. Node 4 therefore meets the resource requirement indicated by the resource tag of task request 1 and can also serve as a target node. Consequently, one node can be selected from Node 2, Node 5 and Node 4 as the target node to execute the task specified by task request 1.
As another example, task request 2 includes two resource tags, indicating a memory capacity of 500-900 MB and a bandwidth of 21-50M. Only Node 2's two resource tags indicate resources that meet the requirements of task request 2's resource tags, so Node 2 can serve as the target node.
In this way, a node qualifies as the target node as long as the resources indicated by its resource tags meet the requirements indicated by the resource tags of the task request. The node's resource tags do not need to be identical to those of the task request. Tasks can therefore be scheduled on demand more flexibly.
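The text does not define precisely when a node's range tag "meets" a requested range. The sketch below assumes one rule: for every requested resource, the node must carry a tag for the same resource whose lower bound is at least the requested lower bound. That rule reproduces both examples above. Node 5's bandwidth tag and the omission of Node 4's third tag are invented placeholders, not Table 3.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (low, high) of a range-valued resource tag

# Hypothetical Table 3 excerpt; only the ranges quoted in the examples are from the text.
NODE_TAGS: Dict[str, Dict[str, Range]] = {
    "Node 2": {"memory_mb": (500, 1000), "bandwidth": (21, 50)},
    "Node 4": {"memory_mb": (1000, 1500), "disk_gb": (201, 300)},
    "Node 5": {"memory_mb": (500, 1000), "bandwidth": (0, 20)},
}


def satisfies(node: Dict[str, Range], request: Dict[str, Range]) -> bool:
    """Assumed rule: each requested resource must be tagged on the node with a
    lower bound no smaller than the requested lower bound."""
    for resource, (req_low, _req_high) in request.items():
        if resource not in node:
            return False
        node_low, _node_high = node[resource]
        if node_low < req_low:
            return False
    return True


def matching_nodes(request: Dict[str, Range]) -> List[str]:
    return sorted(n for n, tags in NODE_TAGS.items() if satisfies(tags, request))


print(matching_nodes({"memory_mb": (500, 1000)}))                        # ['Node 2', 'Node 4', 'Node 5']
print(matching_nodes({"memory_mb": (500, 900), "bandwidth": (21, 50)}))  # ['Node 2']
```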
Further, in this implementation, a change in a node's available resources also changes that node's resource tags in Table 3. The resource tags in Table 3 and Table 4 therefore need to be updated whenever a node's available resources change. Specifically, a node's resource tags may be updated according to its current available resources.
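One way to keep range tags current is to re-bucket a node's free amounts whenever they change, then rebuild the tag set as the union of all node tags. This is a sketch under assumptions. The bucket boundaries in `BUCKET_EDGES` are invented, because the application does not specify how ranges are drawn. Only the standard-library `bisect` module is used.

```python
import bisect
from typing import Dict, List, Optional, Set, Tuple

Range = Tuple[int, int]

# Hypothetical inclusive upper edges of the buckets for each resource type.
BUCKET_EDGES: Dict[str, List[int]] = {
    "memory_mb": [499, 1000, 1500, 2000],
    "disk_gb": [100, 200, 300, 500],
}


def bucket_for(resource: str, available: int) -> Optional[Range]:
    """Map an available amount to the (low, high) bucket that contains it."""
    edges = BUCKET_EDGES[resource]
    i = bisect.bisect_left(edges, available)  # index of the first edge >= available
    if i == len(edges):
        return None  # larger than the largest bucket
    low = 0 if i == 0 else edges[i - 1] + 1
    return (low, edges[i])


def refresh_node(node_tags: Dict[str, Dict[str, Range]], node: str,
                 available: Dict[str, int]) -> None:
    """Recompute one node's range tags (its Table 3 row) from its free resources."""
    tags: Dict[str, Range] = {}
    for resource, amount in available.items():
        rng = bucket_for(resource, amount)
        if rng is not None:
            tags[resource] = rng
    node_tags[node] = tags


def tag_set(node_tags: Dict[str, Dict[str, Range]]) -> Set[Tuple[str, Range]]:
    """Rebuild the resource tag set (Table 4) as the union of all node tags."""
    return {(res, rng) for tags in node_tags.values() for res, rng in tags.items()}


node_tags: Dict[str, Dict[str, Range]] = {}
refresh_node(node_tags, "Node 4", {"memory_mb": 1200, "disk_gb": 250})
print(node_tags["Node 4"])  # {'memory_mb': (1001, 1500), 'disk_gb': (201, 300)}
refresh_node(node_tags, "Node 4", {"memory_mb": 800, "disk_gb": 250})  # memory consumed
print(node_tags["Node 4"])  # {'memory_mb': (500, 1000), 'disk_gb': (201, 300)}
```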
Combining the two implementations above gives a third implementation. A node's resource tags may be any combination of one or more resource tags from Table 1 and one or more resource tags from Table 3. Correspondingly, the resource tag set may combine the resource tags of Table 2 and Table 4. For example, Node 1 may have three resource tags, indicating a high CPU clock frequency, 32 CPU cores, and a bandwidth of 21-50M.
Further, in a possible implementation, the task request records the amount of resources required to execute the task. In this case, to implement step 203 above, the resource scheduling device selects from the node group a target node that satisfies the resource amount specified by the task request.
Specifically, the resource scheduling device selects from the node group a node that both has the resource tags corresponding to the task request and satisfies the resource amount specified by the task request. That node is the target node. The target node thus provides the kinds of resources the task needs, as specified by the task request's resource tags. It also provides the amount of resources the task needs. This helps improve the execution efficiency of the task specified by the task request.
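Combining the tag check with a resource-amount check can be sketched as a two-part predicate. The `Node` and `TaskRequest` dataclasses, the resource keys (`vcores`, `memory_mb`) and the "first node that fits" selection order are hypothetical choices for illustration, not structures defined by the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    tags: Set[str]
    free: Dict[str, float]  # currently available amounts, e.g. vcores and memory


@dataclass
class TaskRequest:
    tags: Set[str]
    demand: Dict[str, float] = field(default_factory=dict)  # amounts the task needs


def fits(node: Node, req: TaskRequest) -> bool:
    """The node must carry every requested tag AND have enough of each resource."""
    has_tags = req.tags <= node.tags
    has_amount = all(node.free.get(k, 0) >= v for k, v in req.demand.items())
    return has_tags and has_amount


def select_target(nodes: List[Node], req: TaskRequest) -> Optional[Node]:
    """Return the first node that fits, or None if no node does."""
    return next((n for n in nodes if fits(n, req)), None)


nodes = [
    Node("Node 2", {"GPU P40 x4", "Memory DDR1"}, {"vcores": 2, "memory_mb": 4096}),
    Node("Node 4", {"GPU P40 x4", "Memory DDR1"}, {"vcores": 16, "memory_mb": 65536}),
]
req = TaskRequest({"GPU P40 x4"}, {"vcores": 4, "memory_mb": 8192})
print(select_target(nodes, req).name)  # Node 4: Node 2 has the tag but too few vcores
```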
Step 204: The resource scheduling device instructs the target node to execute the task specified by the task request.
Steps 201 to 204 above proceed as follows. The client generates a task request, which uses its resource tags to specify the resources required to execute the task, and sends the task request to the resource scheduling device. The resource scheduling device receives the task request and compares the resource tags of each node in the node group with the resource tags included in the task request. From this comparison it determines a target node that has the resource tags included in the task request, and it instructs that node to execute the task specified by the task request. This brings two benefits. First, the task request's resource tags specify the resources the task requires, so choosing the target node from those required resources finds a node whose processing capability suits the task. This helps improve the execution efficiency of the task specified by the task request. Second, the target node is selected from the node group according to the resource tags the nodes have, so the resources of the nodes in the node group are used rationally. This helps improve node resource utilization.
Based on any of the above embodiments, the task request may further include a task affinity resource tag. This tag indicates whether the task specified by the task request has affinity or anti-affinity with another task, so that tasks can be scheduled on demand according to the affinity between them.
In a possible implementation, when the task specified by the task request has affinity with another task, the resource scheduling device instructs the target node to execute the other task as well. When the task has anti-affinity with another task, the resource scheduling device instructs a node in the node group other than the target node to execute the other task. Because the resource scheduling device takes the affinity between tasks into account when choosing the node that executes a task, it can determine a target node suited to executing the task.
As an example, suppose the task specified by the task request is task 1, the other task is task 2, and the resource scheduling device has assigned Node 1 as the target node for task 1. If task 1 and task 2 have affinity, Node 1 can also be used to execute task 2.
As another example, suppose the task specified by the task request is task 1, the other task is task 3, and the resource scheduling device has assigned Node 1 as the target node for task 1. If task 1 and task 3 have anti-affinity, a node other than Node 1 can be used to execute task 3. This prevents task 1 and task 3 from running on the same node and thus improves the processing efficiency of both tasks.
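The placement rule in the two examples above can be sketched as follows. The text does not say how the affinity tag is encoded, so the relation strings `"affinity"` and `"anti-affinity"` and the function `place_related_task` are hypothetical. Picking the first non-target node is a deliberately simple choice. A real scheduler would still apply the tag matching described earlier to that node.

```python
from typing import List


def place_related_task(target: str, nodes: List[str], relation: str) -> str:
    """Choose a node for the other task, given the first task's target node."""
    if relation == "affinity":
        # Affinity: run the other task on the same target node.
        return target
    if relation == "anti-affinity":
        # Anti-affinity: any node in the group except the target node.
        others = [n for n in nodes if n != target]
        if not others:
            raise RuntimeError("node group has no node other than the target")
        return others[0]  # simplest choice; ranking the others is left out
    raise ValueError(f"unknown relation: {relation!r}")


group = ["Node 1", "Node 2", "Node 3"]
print(place_related_task("Node 1", group, "affinity"))       # Node 1 (task 2 joins task 1)
print(place_related_task("Node 1", group, "anti-affinity"))  # Node 2 (task 3 avoids Node 1)
```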
Based on any of the above embodiments, a flag bit may also be added to the task request to indicate whether a resource tag corresponding to the task request is required. If the flag bit marks a resource tag as required, the target node must have that resource tag. If it marks the tag as optional, the target node need not have it.
In one example, a flag bit value of 0 indicates that the identified resource tag is optional, and a value of 1 indicates that it is required. In another example, the convention is reversed: a value of 1 indicates an optional resource tag, and a value of 0 indicates a required one.
As an example, take the convention that a flag bit of 1 marks a required resource tag and 0 marks an optional one. Assume that the resource tags of a task request are GPU P40 and memory DDR, where GPU P40 is required and memory DDR is optional. The flag bits in the task request can then be expressed as 1, 0. When nodes having the task request's resource tags are selected from the nodes shown in Table 1, these tags and flag bits identify Node 2 and Node 4 as the nodes with the required resource tag GPU P40. The target node for executing the task specified in the task request is then determined from Node 2 and Node 4.
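Required and optional tags can be sketched as (tag, flag) pairs using the first flag convention above (1 = required, 0 = optional). Only "Node 2 and Node 4 carry GPU P40" comes from the text. The other node tags and the concrete "Memory DDR4" tag are hypothetical. The text only says a node may lack an optional tag. Ranking nodes by how many optional tags they match is an added assumption, not part of the described method.

```python
from typing import Dict, List, Set, Tuple

REQUIRED, OPTIONAL = 1, 0  # first flag convention described above

# Hypothetical node tags.
NODES: Dict[str, Set[str]] = {
    "Node 1": {"High CPU frequency"},
    "Node 2": {"GPU P40", "Memory DDR4"},
    "Node 4": {"GPU P40"},
}


def eligible(tags: Set[str], request: List[Tuple[str, int]]) -> bool:
    """A node is eligible if it carries every tag flagged as required."""
    return all(tag in tags for tag, flag in request if flag == REQUIRED)


def ranked_candidates(request: List[Tuple[str, int]]) -> List[str]:
    def optional_hits(name: str) -> int:
        return sum(1 for tag, flag in request if flag == OPTIONAL and tag in NODES[name])

    found = [name for name, tags in NODES.items() if eligible(tags, request)]
    # Assumed tie-break: more optional tags matched first, then by name.
    return sorted(found, key=lambda name: (-optional_hits(name), name))


request = [("GPU P40", REQUIRED), ("Memory DDR4", OPTIONAL)]  # flag bits 1, 0
print(ranked_candidates(request))  # ['Node 2', 'Node 4']
```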
In this application, the resource scheduling device receives a task request and determines a target node from the node group. Some tasks need a GPU to execute, while others do not. If a task with no GPU requirement is scheduled onto a node that has a GPU, a task that urgently needs GPU resources may then be unable to run on that node, and the node's GPU resources are wasted. To avoid wasting GPU resources in the node group, the following optional methods are provided on the basis of the above method.
Method 1: The resource scheduling device may select the target node for the task specified by the task request according to whether the resource tags corresponding to the task request include a GPU resource tag.
As a specific example, when the resource tags corresponding to the task request do not include a GPU resource tag, the resource scheduling device selects from the node group a node that has no GPU resource tag. It then instructs the selected node to execute the task specified by the task request.
Specifically, the resource scheduling device selects from the node group the nodes that have no GPU resource tag. From those nodes it selects a target node that has the resource tags corresponding to the task request, and it schedules the target node to execute the task specified in the task request.
As another specific example, when the resource tags corresponding to the task request include a GPU resource tag, the resource scheduling device selects from the node group a node that has a GPU resource tag. It then instructs the selected node to execute the task specified by the task request.
Specifically, the resource scheduling device selects from the node group the nodes that have a GPU resource tag. From those nodes it selects a target node that has the resource tags corresponding to the task request, and it schedules the target node to execute the task specified in the task request.
With this method, the target node is chosen according to whether the task request's resource tags include a GPU resource tag. GPU resources are thus scheduled on demand according to what the task requires. This improves resource utilization across the nodes in the node group, and GPU utilization in particular. It also helps prevent a node's spare GPU resources from sitting idle because its CPU resources have been exhausted.
Method 2: The task request may also include the amount of resources required to execute the task. The resource scheduling device may then select the target node for the task specified by the task request according to whether that amount includes GPU resources.
As a specific example, if the amount of resources required to execute the task does not include GPU resources, a node that has no GPU resources is selected from the node group. The selected node is instructed to execute the task specified by the task request.
As another specific example, if the amount of resources required to execute the task includes GPU resources, a node that has GPU resources is selected from the node group. The selected node is instructed to execute the task specified by the task request.
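Method 2 can be sketched by keying the GPU decision on the requested amounts rather than on tags. The inventory layout (`has_gpu` plus a `free` dictionary) and the resource keys are hypothetical. The sketch follows the strict split described above. A CPU-only task never lands on a GPU node, even when no GPU-less node has room. Relaxing that split would be a separate design choice that the text does not describe.

```python
from typing import Dict, Optional

# Hypothetical inventory: whether a node has any GPU at all, plus free amounts.
NODES: Dict[str, dict] = {
    "Node 1": {"has_gpu": False, "free": {"vcores": 8, "memory_mb": 16384}},
    "Node 2": {"has_gpu": True, "free": {"vcores": 8, "memory_mb": 16384, "gpus": 2}},
}


def select_by_demand(demand: Dict[str, int]) -> Optional[str]:
    """Pick a node whose GPU presence matches the demand and whose free amounts suffice."""
    needs_gpu = demand.get("gpus", 0) > 0
    for name, node in NODES.items():
        if node["has_gpu"] != needs_gpu:
            continue  # GPU tasks go to GPU nodes only; others go to GPU-less nodes only
        if all(node["free"].get(k, 0) >= v for k, v in demand.items()):
            return name
    return None


print(select_by_demand({"vcores": 4, "memory_mb": 4096}))             # Node 1
print(select_by_demand({"vcores": 4, "memory_mb": 4096, "gpus": 1}))  # Node 2
```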
A possible implementation of the above resource scheduling method is described in detail below with reference to FIG. 4.
FIG. 4 is a schematic flowchart of another resource scheduling method according to an embodiment of this application. The method includes the following steps:
Step 401: The resource scheduling device receives a task request.
Step 402: The resource scheduling device determines whether the task specified in the task request requires a GPU. If so, step 403 is performed; if not, step 406 is performed.
Step 403: The resource scheduling device traverses the node label database to find nodes that have a GPU resource tag. If such nodes are found, step 404 is performed; otherwise, step 409 is performed. The node label database stores the correspondence between node identifiers and resource tags.
Step 404: From the nodes that have a GPU resource tag, the resource scheduling device determines a target node that has the resource tags corresponding to the task request.
Step 405: The resource scheduling device schedules the target node to execute the task specified by the task request.
Step 406: The resource scheduling device traverses the node label database to find nodes that have no GPU resource tag. If such nodes are found, step 407 is performed; otherwise, step 409 is performed. The node label database stores the correspondence between node identifiers and resource tags.
Step 407: From the nodes that have no GPU resource tag, the resource scheduling device determines a target node that has the resource tags corresponding to the task request.
Step 408: The resource scheduling device schedules the target node to execute the task specified by the task request.
Step 409: The resource scheduling device sends a task scheduling failure message to the client and abandons scheduling of the task specified by the task request.
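The FIG. 4 flow can be sketched as a single function over a node label database, modelled here as a dictionary from node identifier to tag set. The sketch rests on several assumptions. GPU tags are recognized by a hypothetical `"GPU"` prefix. The `Scheduled` and `Failed` result types are invented for illustration. The flowchart does not say what happens when step 404 or 407 finds no matching node, so this sketch also reports a failure in that case.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union

GPU_PREFIX = "GPU"  # hypothetical convention: GPU resource tags start with "GPU"


@dataclass
class Scheduled:
    task_id: str
    node: str


@dataclass
class Failed:
    task_id: str
    reason: str


def is_gpu_tag(tag: str) -> bool:
    return tag.startswith(GPU_PREFIX)


def schedule(task_id: str, request_tags: Set[str],
             label_db: Dict[str, Set[str]]) -> Union[Scheduled, Failed]:
    # Step 402: does the task need a GPU?
    needs_gpu = any(is_gpu_tag(t) for t in request_tags)
    # Steps 403 / 406: traverse the database for GPU nodes or GPU-less nodes.
    pool = [n for n, tags in label_db.items()
            if any(is_gpu_tag(t) for t in tags) == needs_gpu]
    if not pool:
        # Step 409: report the failure to the client.
        return Failed(task_id, "no GPU node" if needs_gpu else "no node without a GPU")
    # Steps 404 / 407: find a node in the pool carrying every requested tag.
    for node in pool:
        if request_tags <= label_db[node]:
            return Scheduled(task_id, node)  # steps 405 / 408
    return Failed(task_id, "no node in the pool carries the requested tags")


db = {"Node 1": {"High CPU frequency"}, "Node 2": {"GPU P40 x4", "Memory DDR1"}}
print(schedule("t1", {"High CPU frequency"}, db))  # Scheduled(task_id='t1', node='Node 1')
print(schedule("t2", {"GPU P40 x4"}, db))          # Scheduled(task_id='t2', node='Node 2')
```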
This example combines multiple custom resource tags per node with scheduling that is aware of each node's GPU resources. In the prior art, all nodes are treated equally during scheduling and their GPU resources are not taken into account. The example in this application, by contrast, schedules GPU resources for tasks on demand. This helps improve the resource utilization of each node in the node group, and the utilization of the nodes' GPU resources in particular.
It can be understood that, to implement the corresponding functions, each device in the foregoing embodiments may include hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that this application can implement the example units and algorithm steps described in the embodiments disclosed herein in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
When an integrated unit is used, FIG. 5 shows a possible exemplary block diagram of an apparatus in an embodiment of this application. The apparatus 500 may exist in the form of software. It may include a communication unit 501 and a processing unit 502. In one implementation, the communication unit 501 may include a receiving unit and a sending unit. The processing unit 502 is configured to control and manage the actions of the apparatus 500. The communication unit 501 is configured to support communication between the apparatus 500 and other apparatuses (such as a client). The apparatus 500 may further include a storage unit 503 configured to store program code and data of the apparatus 500, such as the resource tags of the nodes in the above method.
The processing unit 502 may be a processor or a controller. Examples include a general-purpose CPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various example logical blocks, modules and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 501 may be a communication interface, a transceiver, a transceiver circuit, or the like. "Communication interface" is a general term, and in a specific implementation it may include multiple interfaces. The storage unit 503 may be a memory.
The apparatus 500 may be the resource scheduling device in this application. The processing unit 502 can support the apparatus 500 in performing the actions of the resource scheduling device in the above method examples; for example, the processing unit 502 may perform steps 203 and 204 in FIG. 2. The communication unit 501 can support communication between the apparatus 500 and a client; for example, the communication unit 501 is configured to support the apparatus 500 in performing step 202 in FIG. 2.
Specifically, the communication unit 501 may be configured to receive a task request sent by a client. The task request includes its corresponding resource tags and uses them to specify the resources required to execute the task.
The processing unit 502 may be configured to determine a target node from the node group according to the resource tags of each node in the node group, where the target node has the resource tags included in the task request. The processing unit 502 may further be configured to instruct the target node to execute the task specified by the task request.
In a possible implementation, the resource tags include a resource tag that identifies a GPU. The processing unit 502 is further configured to perform two actions when the resource tags corresponding to the task request do not include the GPU resource tag. It selects from the node group a node that does not have the GPU resource tag, and it instructs the selected node to execute the task specified by the task request.
In a possible implementation, the task request records the amount of resources required to execute the task. The processing unit 502 is configured to select from the node group a target node that satisfies the resource amount specified by the task request.
In a possible implementation, the processing unit 502 is further configured to obtain resource information of each node in the node group, where the resource information records the node's identifier and the resources the node has. According to this resource information, the processing unit 502 records the resource tags and the resource amount of each node in the node group.
In a possible implementation, the processing unit 502 is further configured as follows. When the task specified by the task request has affinity with another task, it instructs the target node to execute the other task. When the task specified by the task request has anti-affinity with another task, it instructs a node in the node group other than the target node to execute the other task.
该装置500还可以是本申请所涉及的客户端。处理单元502可以支持装置500执行上文中各方法示例中客户端的动作,例如处理单元502用于支持装置500执行图2中的步骤201。通信单元501可以支持装置500与其它装置(比如资源调度装置)之间的通信,例如,通信单元501用于支持装置500执行图2中的步骤202。The apparatus 500 may also be a client involved in this application. The processing unit 502 may support the device 500 to perform the actions of the client in the method examples above. For example, the processing unit 502 is configured to support the device 500 to perform step 201 in FIG. 2. The communication unit 501 may support communication between the device 500 and other devices (such as a resource scheduling device). For example, the communication unit 501 is used to support the device 500 to perform step 202 in FIG. 2.
Specifically, the processing unit 502 may be configured to generate a task request, where the task request includes the resource tags corresponding to the task request and uses those resource tags to specify the resources required to perform the task.
The communication unit 501 may be configured to send the task request to a resource scheduling apparatus. The resource scheduling apparatus uses the task request to determine, based on the resource tags that each node in a node group has, a target node to execute the task specified by the task request, where the target node has the resource tags included in the task request.
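The client/scheduler exchange described above can be sketched end to end. The JSON wire format, the field names `task_id` and `resource_tags`, and both function names are assumptions made for illustration; the application does not specify a message format. The scheduler side shows the core tag-matching rule: a node qualifies only if its tag set contains every tag named in the request.

```python
import json
from typing import Dict, List, Optional, Set


def build_task_request(task_id: str, resource_tags: List[str]) -> str:
    """Client side: serialize a task request carrying its resource tags (assumed JSON format)."""
    return json.dumps({"task_id": task_id, "resource_tags": sorted(set(resource_tags))})


def determine_target_node(request_json: str, node_tags: Dict[str, Set[str]]) -> Optional[str]:
    """Scheduler side: return a node whose tags include all tags named in the request."""
    request = json.loads(request_json)
    wanted = set(request["resource_tags"])
    # Sorted iteration only makes the sketch deterministic; ranking is left open.
    for node_id in sorted(node_tags):
        if wanted <= node_tags[node_id]:
            return node_id
    return None
```

For example, a request built with tags `["gpu", "cpu"]` matches a node tagged `{"cpu", "gpu"}` but not one tagged only `{"cpu"}`.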
FIG. 6 is a schematic diagram of an apparatus provided in this application. The apparatus may be the resource scheduling apparatus or the client described above. The apparatus 600 includes a memory 601, a processor 602, and a communication interface 603. Optionally, the apparatus 600 may further include a bus 604, through which the communication interface 603, the processor 602, and the memory 601 are connected to one another. The bus 604 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is shown as a single thick line in FIG. 6, which does not mean that there is only one bus or only one type of bus.
The processor 602 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of programs implementing the solutions of this application.
The communication interface 603 uses any transceiver-type apparatus to communicate with other apparatuses or communication networks, such as Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
The memory 601 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto. The memory may exist independently and be connected to the processor through the bus 604, or it may be integrated with the processor.
The memory 601 is configured to store computer-executable instructions for executing the solutions of this application, and their execution is controlled by the processor 602. The processor 602 is configured to execute the computer-executable instructions stored in the memory 601 to implement the methods provided in the foregoing embodiments of this application.
Optionally, the computer-executable instructions in the embodiments of this application may also be referred to as application program code; this is not specifically limited in the embodiments of this application.
The foregoing products can perform the methods provided in the embodiments of this application and have the functional modules and beneficial effects corresponding to performing those methods. For technical details not described exhaustively in this embodiment, refer to the methods provided in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, or a combination thereof. When software is used, the embodiments may be implemented fully or partially in the form of a computer program product, which includes one or more instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The instructions may be stored in a computer storage medium or transmitted from one computer storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or twisted pair) or a wireless manner (for example, infrared, radio, or microwave). The computer storage medium may be any medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more media. The medium may be a magnetic medium (for example, a floppy disk, hard disk, magnetic tape, or magneto-optical disk (MO)), an optical medium (for example, an optical disc), or a semiconductor medium (for example, ROM, EPROM, EEPROM, or a solid state disk (SSD)).
The embodiments of this application are described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and any combination of procedures and/or blocks in them, can be implemented by instructions. These instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by that processor produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various modifications and variations to this application without departing from its spirit and scope. This application is intended to cover such modifications and variations provided that they fall within the scope of the claims of this application and their equivalent technologies.

Claims (14)

  1. A resource scheduling method, comprising:
    receiving a task request sent by a client, wherein the task request includes the resource tags corresponding to the task request and uses those resource tags to specify the resources required to perform a task;
    determining a target node from a node group based on the resource tags that each node in the node group has, wherein the target node has the resource tags included in the task request; and
    instructing the target node to execute the task specified by the task request.
  2. The method according to claim 1, further comprising:
    when the resource tags corresponding to the task request do not include a GPU resource tag, selecting from the node group a node that does not have the GPU resource tag; and
    instructing the selected node to execute the task specified by the task request.
  3. The method according to claim 1 or 2, wherein the task request records the amount of resources required to perform the task; and
    the determining a target node from the node group based on the resource tags that each node in the node group has comprises:
    selecting, from the node group, a target node that satisfies the resource amount specified by the task request.
  4. The method according to any one of claims 1 to 3, further comprising:
    obtaining resource information of each node in the node group, wherein the resource information records the node identifier of the node and the resources the node has; and
    recording, based on the resource information of each node in the node group, the resource tags and the resource amounts that each node in the node group has.
  5. The method according to any one of claims 1 to 4, further comprising:
    when the task specified by the task request has affinity with another task, instructing the target node to execute the other task; and
    when the task specified by the task request has anti-affinity with another task, instructing a node in the node group other than the target node to execute the other task.
  6. A resource scheduling method, comprising:
    generating a task request, wherein the task request includes the resource tags corresponding to the task request and uses those resource tags to specify the resources required to perform a task; and
    sending the task request to a resource scheduling apparatus, wherein the task request is used by the resource scheduling apparatus to determine, based on the resource tags that each node in a node group has, a target node to execute the task specified by the task request, and the target node has the resource tags included in the task request.
  7. An apparatus, comprising:
    a communication unit, configured to receive a task request sent by a client, wherein the task request includes the resource tags corresponding to the task request and uses those resource tags to specify the resources required to perform a task; and
    a processing unit, configured to determine a target node from a node group based on the resource tags that each node in the node group has, wherein the target node has the resource tags included in the task request, and to instruct the target node to execute the task specified by the task request.
  8. The apparatus according to claim 7, wherein the processing unit is further configured to:
    when the resource tags corresponding to the task request do not include a GPU resource tag, select from the node group a node that does not have the GPU resource tag; and
    instruct the selected node to execute the task specified by the task request.
  9. The apparatus according to claim 7 or 8, wherein the task request records the amount of resources required to perform the task, and the processing unit is specifically configured to:
    select, from the node group, a target node that satisfies the resource amount specified by the task request.
  10. The apparatus according to any one of claims 7 to 9, wherein the processing unit is further configured to:
    obtain resource information of each node in the node group, wherein the resource information records the node identifier of the node and the resources the node has; and
    record, based on the resource information of each node in the node group, the resource tags and the resource amounts that each node in the node group has.
  11. The apparatus according to any one of claims 7 to 10, wherein the processing unit is further configured to:
    when the task specified by the task request has affinity with another task, instruct the target node to execute the other task; and
    when the task specified by the task request has anti-affinity with another task, instruct a node in the node group other than the target node to execute the other task.
  12. An apparatus, comprising:
    a processing unit, configured to generate a task request, wherein the task request includes the resource tags corresponding to the task request and uses those resource tags to specify the resources required to perform a task; and
    a communication unit, configured to send the task request to a resource scheduling apparatus, wherein the task request is used by the resource scheduling apparatus to determine, based on the resource tags that each node in a node group has, a target node to execute the task specified by the task request, and the target node has the resource tags included in the task request.
  13. An apparatus, comprising a processor and a memory, wherein the processor executes computer instructions stored in the memory, so that the apparatus performs the method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the computer instructions are executed by a computer to implement the method according to any one of claims 1 to 6.
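Claims 2 and 8 add a rule for requests that do not ask for a GPU. A minimal sketch, assuming nodes are described by tag sets and that the GPU tag is the hypothetical string `"gpu"`, combines that rule with the tag matching of claims 1 and 7. One plausible motivation, not stated in this excerpt, is keeping GPU nodes available for tasks that need them.

```python
from typing import Dict, List, Set

GPU_TAG = "gpu"  # hypothetical tag name for GPU resources


def candidate_nodes(request_tags: Set[str], node_tags: Dict[str, Set[str]]) -> List[str]:
    """Nodes eligible for a request under the tag rule plus the GPU rule of claims 2 and 8.

    A node must carry every requested tag. If the request does not name the GPU tag,
    nodes that DO carry the GPU tag are skipped.
    """
    eligible = []
    for node_id, tags in sorted(node_tags.items()):
        if not request_tags <= tags:
            continue  # node lacks at least one requested tag
        if GPU_TAG not in request_tags and GPU_TAG in tags:
            continue  # non-GPU task is kept off GPU nodes (assumed rationale: preserve GPUs)
        eligible.append(node_id)
    return eligible
```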
PCT/CN2019/081200 2018-08-17 2019-04-03 Resource scheduling method and device WO2020034646A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810943505.XA CN109375992A (en) 2018-08-17 2018-08-17 A kind of resource regulating method and device
CN201810943505.X 2018-08-17

Publications (1)

Publication Number Publication Date
WO2020034646A1 (en) 2020-02-20

Family

ID=65404025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081200 WO2020034646A1 (en) 2018-08-17 2019-04-03 Resource scheduling method and device

Country Status (2)

Country Link
CN (1) CN109375992A (en)
WO (1) WO2020034646A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4123449A4 (en) * 2020-04-21 2023-07-26 Huawei Cloud Computing Technologies Co., Ltd. Resource scheduling method and related device

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
CN109375992A (en) * 2018-08-17 2019-02-22 华为技术有限公司 A kind of resource regulating method and device
CN109922316A (en) * 2019-03-04 2019-06-21 北京旷视科技有限公司 Media resource scheduling and managing medium resource method, apparatus and electronic equipment
CN110968424B (en) * 2019-09-12 2023-04-07 广东浪潮大数据研究有限公司 Resource scheduling method, device and storage medium based on K8s
CN111090511A (en) * 2019-12-24 2020-05-01 北京推想科技有限公司 Task processing method and device and computer readable storage medium
CN111198756A (en) * 2019-12-28 2020-05-26 北京浪潮数据技术有限公司 Application scheduling method and device of kubernets cluster
CN111274012B (en) * 2020-01-16 2022-07-12 珠海格力电器股份有限公司 Service scheduling method, device, electronic equipment and storage medium
CN113344311A (en) * 2020-03-03 2021-09-03 北京国双科技有限公司 Task execution method and device, storage medium, processor and electronic equipment
CN111556126B (en) * 2020-04-24 2023-04-18 杭州浮云网络科技有限公司 Model management method, system, computer device and storage medium
CN111552550A (en) * 2020-04-26 2020-08-18 星环信息科技(上海)有限公司 Task scheduling method, device and medium based on GPU (graphics processing Unit) resources
CN112346859B (en) * 2020-10-26 2023-06-16 北京市商汤科技开发有限公司 Resource scheduling method and device, electronic equipment and storage medium
CN112395061A (en) * 2020-11-17 2021-02-23 广东电科院能源技术有限责任公司 Computing task scheduling device and method
CN112199200B (en) * 2020-12-04 2021-03-02 腾讯科技(深圳)有限公司 Resource scheduling method and device, computer equipment and storage medium
CN112689007B (en) * 2020-12-23 2023-05-05 江苏苏宁云计算有限公司 Resource allocation method, device, computer equipment and storage medium
CN112667378A (en) * 2020-12-28 2021-04-16 紫光云技术有限公司 Computing resource scheduling method based on resource label
CN112861346A (en) * 2021-02-07 2021-05-28 北京润尼尔网络科技有限公司 Data processing system, method and electronic equipment
CN113360284A (en) * 2021-06-04 2021-09-07 深圳前海微众银行股份有限公司 Resource management method, device and equipment
CN117579705B (en) * 2024-01-16 2024-04-02 四川并济科技有限公司 System and method for dynamically scheduling servers based on batch data requests

Citations (6)

Publication number Priority date Publication date Assignee Title
US20160103699A1 (en) * 2014-10-13 2016-04-14 Vmware, Inc. Cloud virtual machine defragmentation for hybrid cloud infrastructure
CN105677467A (en) * 2015-12-31 2016-06-15 中国科学院深圳先进技术研究院 Yarn resource scheduler based on quantified labels
CN106020937A (en) * 2016-07-07 2016-10-12 腾讯科技(深圳)有限公司 Method, device and system for creating virtual machine
CN107038069A (en) * 2017-03-24 2017-08-11 北京工业大学 Dynamic labels match DLMS dispatching methods under Hadoop platform
CN109144710A (en) * 2017-06-16 2019-01-04 中国移动通信有限公司研究院 Resource regulating method, device and computer readable storage medium
CN109375992A (en) * 2018-08-17 2019-02-22 华为技术有限公司 A kind of resource regulating method and device

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
EP3110106B1 (en) * 2014-04-14 2019-11-06 Huawei Technologies Co., Ltd. Disaster recovery data center configuration method and apparatus in cloud computing architecture
CN107515784B (en) * 2016-06-16 2021-07-06 阿里巴巴集团控股有限公司 Method and equipment for calculating resources in distributed system
CN107818013A (en) * 2016-09-13 2018-03-20 华为技术有限公司 A kind of application scheduling method thereof and device
CN107135257A (en) * 2017-04-28 2017-09-05 东方网力科技股份有限公司 Task is distributed in a kind of node cluster method, node and system

Also Published As

Publication number Publication date
CN109375992A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
WO2020034646A1 (en) Resource scheduling method and device
US20230195346A1 (en) Technologies for coordinating disaggregated accelerator device resources
CN107590001B (en) Load balancing method and device, storage medium and electronic equipment
US9971823B2 (en) Dynamic replica failure detection and healing
US10055262B1 (en) Distributed load balancing with imperfect workload information
US20200364608A1 (en) Communicating in a federated learning environment
US10834140B1 (en) Public service network job processing
US9774564B2 (en) File processing method, system and server-clustered system for cloud storage
US10114682B2 (en) Method and system for operating a data center by reducing an amount of data to be processed
WO2020052605A1 (en) Network slice selection method and device
US20180241802A1 (en) Technologies for network switch based load balancing
US20200044881A1 (en) Managing channels in an open data ecosystem
US9853906B2 (en) Network prioritization based on node-level attributes
US20230231825A1 (en) Routing for large server deployments
US9438665B1 (en) Scheduling and tracking control plane operations for distributed storage systems
US10505863B1 (en) Multi-framework distributed computation
US10158709B1 (en) Identifying data store requests for asynchronous processing
CN110226159B (en) Method for performing database functions on a network switch
US20180248772A1 (en) Managing intelligent microservices in a data streaming ecosystem
CN111327651A (en) Resource downloading method, device, edge node and storage medium
WO2018156979A1 (en) Selective distribution of messages in a publish-subscribe system
US20140222959A1 (en) Maximizing data transfer through multiple network devices
KR100985690B1 (en) Method, system and program product for storing downloadable content on a plurality of enterprise storage system ess cells
US20120265801A1 (en) Out of order assembling of data packets
EP3707610B1 (en) Redundant data storage using different compression processes

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19850291; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 EP: PCT application non-entry in European phase
Ref document number: 19850291; Country of ref document: EP; Kind code of ref document: A1