CN112114959B

CN112114959B - Resource scheduling method, distributed system, computer device and storage medium

Info

Publication number: CN112114959B
Application number: CN201910541012.8A
Authority: CN
Inventors: 刘鑫; 龚亚辉; 孙英男; 涂中英; 王炜煜
Original assignee: Shanghai Bilibili Technology Co Ltd
Current assignee: Shanghai Bilibili Technology Co Ltd
Priority date: 2019-06-21
Filing date: 2019-06-21
Publication date: 2024-05-31
Anticipated expiration: 2039-06-21
Also published as: CN112114959A

Abstract

The invention provides a resource scheduling method, a distributed platform, a computer device, and a storage medium. The resource scheduling method comprises the following steps: determining the task type of a task; allocating a super-divided resource to the task according to the task type of the task, wherein the super-divided resources are obtained by super-dividing physical resources according to task type, and the super-divided resource allocated to the task is defined as the task response resource; and binding the physical resource corresponding to the task response resource to the task. The invention can improve resource utilization.

Description

Resource scheduling method, distributed system, computer device and storage medium

Technical Field

The present invention relates to the field of distributed platform technologies, and in particular, to a resource scheduling method, a distributed platform, a computer device, and a storage medium.

Background

To improve task processing capability and reliability beyond what a single node can offer, the prior art proposes distributed platforms, which centrally manage the resources of multiple physical server nodes or virtual machine nodes and respond to task requests. To improve the resource utilization of such platforms, related prior-art work super-divides the resources in a node, so that the logical quantity of resources in the node exceeds the actual quantity, which enables shared scheduling of the resources.

However, the present inventors have found that current approaches super-divide unit resources only in number. For example, where a unit resource is one physical device, the device is super-divided into two logical resources; when the two logical resources are allocated to two tasks, the two tasks share the same unit resource. This shared scheduling still has the following problem: if a physical device contains resources that can execute different types of tasks simultaneously, then while a task of one type occupies the device, the device's resources for processing other types of tasks sit idle, and resource utilization is low.

Therefore, providing a resource scheduling method, a distributed platform, a computer device, and a storage medium that further improve resource utilization is a technical problem that needs to be solved in the field.

Disclosure of Invention

The invention aims to provide a resource scheduling method, a distributed platform, computer equipment and a storage medium, which are used for solving the technical problems in the prior art.

In one aspect, the present invention provides a resource scheduling method for achieving the above object.

The resource scheduling method comprises the following steps: determining the task type of a task; allocating a super-divided resource to the task according to the task type of the task, wherein the super-divided resources are obtained by super-dividing physical resources according to task type, and the super-divided resource allocated to the task is defined as the task response resource; and binding the physical resource corresponding to the task response resource to the task.

Further, the resource scheduling method further comprises: obtaining super-division configuration information of the physical resources, wherein the super-division configuration information comprises information for super-dividing the physical resources according to task type; super-dividing the physical resources according to the super-division configuration information to obtain the super-divided resources; and establishing a correspondence between the physical resources and the super-divided resources. The step of binding the physical resource corresponding to the task response resource to the task comprises: determining the physical resource corresponding to the task response resource according to the correspondence, and binding that physical resource to the task.

Further, the physical resource can process a first task and a second task simultaneously, wherein the task types of the first task and the second task are different; the super-division configuration information comprises a first configuration parameter and a second configuration parameter, wherein the first configuration parameter comprises a type parameter of the first task and a first super-division number, and the second configuration parameter comprises a type parameter of the second task and a second super-division number.

Further, the physical resource is a GPU physical device, the first task is a video transcoding task, and the second task is an artificial intelligence training task.

Further, the first super-division number is the maximum number of first tasks that the physical resource can process simultaneously; the second super-division number is the maximum number of second tasks that the physical resource can process simultaneously.

Further, the step of obtaining the super-division configuration information of the physical resources specifically comprises: reading environment variable parameters in a configuration file, wherein the environment variable parameters comprise the super-division configuration information.

Further, after the step of establishing the correspondence between the physical resources and the super-divided resources, the resource scheduling method further comprises: caching the correspondence in memory. Before the step of determining the physical resource corresponding to the task response resource according to the correspondence, the resource scheduling method further comprises: reading the correspondence cached in memory. The resource scheduling method further comprises: detecting whether the configuration file has been rewritten, wherein, when the configuration file has been rewritten, the environment variable parameters in the rewritten configuration file are read, the correspondence is re-established, and the correspondence cached in memory is updated.

In another aspect, to achieve the above object, the present invention provides a distributed platform.

The distributed platform comprises a management node and a plurality of processing nodes, wherein: each processing node comprises physical resources and a resource scheduling device, the resource scheduling device being configured to execute any one of the resource scheduling methods provided by the invention so as to schedule the physical resources, and to report resource information to the management node, the resource information comprising the types of super-divided resources and the number of super-divided resources of each type; and the management node is configured to schedule tasks to the processing nodes according to the resource information.
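The reporting relationship described above can be sketched as follows. This is only an illustration under stated assumptions: the node names, the resource type names, and the "most available resources wins" policy are hypothetical and are not specified by the invention.

```python
from typing import Dict, Optional

# Resource information reported by each processing node (hypothetical names):
# node name -> {super-divided resource type: number currently available}.
reports: Dict[str, Dict[str, int]] = {
    "node-1": {"video": 0, "ai": 1},
    "node-2": {"video": 2, "ai": 0},
}


def pick_node(reports: Dict[str, Dict[str, int]], resource_type: str) -> Optional[str]:
    """Return the node reporting the most available resources of the given type.

    The selection policy is an assumption made for this sketch.
    """
    best: Optional[str] = None
    best_available = 0
    for node, info in reports.items():
        available = info.get(resource_type, 0)
        if available > best_available:
            best, best_available = node, available
    return best


video_node = pick_node(reports, "video")
ai_node = pick_node(reports, "ai")
```

Because each node reports counts per resource type rather than a single device count, the management node can place a video task and an AI task independently.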

To achieve the above object, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.

The resource scheduling method, the distributed platform, the computer equipment and the storage medium provided by the invention can super-divide the physical resources according to the task types which can be processed by the physical resources when the physical resources are super-divided, so that the same physical resource can be super-divided into a plurality of super-divided resources corresponding to different task types. By adopting the resource scheduling method, the same physical resource is subjected to superdivision according to the task type, so that different task types can request the same physical resource, and the utilization rate of the physical resource can be improved.

Drawings

FIG. 1 is a flowchart of a resource scheduling method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of resource superdivision in a resource scheduling method according to an embodiment of the present invention;

FIG. 3 is a block diagram of a resource scheduling device according to an embodiment of the present invention;

FIG. 4 is a block diagram of a distributed platform provided by an embodiment of the present invention;

FIG. 5 and FIG. 6 are schematic diagrams of a service processing flow of a distributed platform according to an embodiment of the present invention; and

FIG. 7 is a hardware configuration diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Currently, in a distributed platform, graphics processors (Graphics Processing Unit, GPU), as resources deployed in the physical server nodes or virtual machine nodes of a cluster, are mainly scheduled in units of whole devices; that is, when resources are scheduled in response to a task request, at least one GPU physical device is generally allocated to one task.

To improve the utilization of GPU physical resources, the invention provides, in one embodiment, a solution in which the GPU physical resources in the distributed platform are super-divided, so that the number of GPU logical resources in the distributed platform exceeds the number of actual GPU physical devices, which enables shared scheduling of the GPU physical resources. Following the current approach to resource super-division, the GPU physical resources are super-divided in number; for example, one GPU physical device can be super-divided into two, so that the device is treated as two GPU logical resources and the utilization of GPU physical resources improves.

However, the inventors found on further study that, with the development of artificial intelligence technology, machine learning datasets keep growing and the demands on server computing capacity keep rising, so GPUs are widely used for machine learning data processing. In addition, in many video processing scenarios, the encoding and decoding chips in a GPU can be used to accelerate video encoding and decoding, so GPUs are also widely used for video transcoding. That is, a GPU physical device can handle both machine learning data processing and video transcoding, and the two kinds of task require different resources within the GPU; in other words, the GPU physical device contains resources that can execute different types of tasks simultaneously. Under the number-only super-division described above, when a task of one type occupies a GPU physical device, the resources for the other type of task sit idle, and the problem of low resource utilization remains.

To solve this technical problem, the invention provides a resource scheduling method, a distributed platform, a computer device, and a storage medium. In the resource scheduling method, physical resources are super-divided according to the task types that they can process, and the resources obtained are the super-divided resources. When responding to a task, the task type of the task is determined first; a super-divided resource corresponding to that task type is then allocated to the task as its task response resource; and the physical resource corresponding to the task response resource is then bound to the task, so that the task is executed on the bound physical resource. With this resource scheduling method, tasks of different types can request the same physical resource, which further improves the utilization of physical resources.

In the resource scheduling method, when the physical resource is GPU physical equipment, the resource utilization rate of the GPU equipment can be improved.

The resource scheduling method, the distributed platform, the computer device and the storage medium provided by the invention will be described in detail through specific embodiments. It should be noted that, for convenience of description, the detailed description in the following embodiments uses GPU physical device resources as an example, but the resource scheduling method of the present invention is not limited to GPU physical device resources.

Example 1

In one application scenario, the resource scheduling method may be executed by a resource management plug-in in a processing node of a distributed platform: when the management node of the distributed platform schedules a task to a processing node, the resource management plug-in in that processing node binds a physical resource in the processing node to the task according to the task type of the task. Specifically, FIG. 1 is a flowchart of a resource scheduling method according to an embodiment of the present invention. As shown in FIG. 1, the resource scheduling method of this embodiment includes the following steps S101 to S103.

Step S101: the task type of the task is determined.

Specifically, a task allocated to the processing node is one of the tasks that the physical resource can execute, so the task type determined in this step is the type of one such task. For example, the physical resource may be able to execute tasks of three task types simultaneously, namely task A, task B, and task C.

The physical resource may be memory, a CPU, a GPU, or another resource. Taking a GPU physical device as an example, the tasks that the device can execute include video transcoding tasks, artificial intelligence training tasks, and the like; in this step, the task type of the task is determined to be either a video transcoding task or an artificial intelligence training task.

Step S102: super-divided resources are allocated to the task according to the task type of the task.

In the invention, the resource obtained by superdividing the physical resource according to the task type is defined as the superdivided resource, and the superdivided resource allocated to the task is defined as the task response resource.

When a physical resource is super-divided, the basis of the super-division includes task type: if the physical resource can execute tasks of three task types simultaneously, it can correspondingly be super-divided into three types of resource. On this basis, each type can be further super-divided in number. That is, super-dividing a physical resource yields multiple types of super-divided resources, and there may be several super-divided resources of each type.

Optionally, in step S101, the task type of the task may be determined by parsing the request parameters in the task instance; then, in step S102, a super-divided resource is allocated to the task according to the task type determined in step S101 and the types of the super-divided resources. For example, the super-divided resources include super-divided resources corresponding to task A, task B, and task C; when the task type determined in step S101 is task A, a super-divided resource corresponding to task A is allocated to the task in step S102.

Optionally, the request parameters in the task instance include a resource type parameter for the requested super-divided resource, and the resource type parameter is consistent with the task type. When the task type is determined in step S101, the resource type parameter may be obtained directly; obtaining it is equivalent to determining the task type of the task. In step S102, a super-divided resource is allocated to the task according to the resource type parameter. For example, the super-divided resources include those corresponding to task A, task B, and task C, whose resource type parameters are resource.a, resource.b, and resource.c respectively; when the resource type parameter obtained in step S101 is resource.a, a super-divided resource corresponding to task A is allocated to the task in step S102.
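As a minimal sketch of step S101 under this option, the function below reads the resource type parameter from a task's request parameters. The dictionary layout of the request and the mapping table are assumptions made for illustration; only the parameter names resource.a, resource.b, and resource.c come from the example above.

```python
from typing import Dict, Optional

# Hypothetical table: resource type parameter -> task type.
RESOURCE_TYPE_TO_TASK: Dict[str, str] = {
    "resource.a": "A",
    "resource.b": "B",
    "resource.c": "C",
}


def determine_task_type(request_params: Dict[str, int]) -> Optional[str]:
    """Return the task type implied by the first recognised resource type parameter."""
    for name in request_params:
        if name in RESOURCE_TYPE_TO_TASK:
            return RESOURCE_TYPE_TO_TASK[name]
    return None  # the request names no super-divided resource type


# A request that also asks for ordinary resources such as CPU.
task_type = determine_task_type({"cpu": 2, "resource.a": 1})
```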

For example, physical resource 1 in a processing node can execute task A and task B simultaneously. Physical resource 1 is first super-divided according to task type into a resource corresponding to task A and a resource corresponding to task B; each is then further super-divided in number, yielding 3 super-divided resources corresponding to task A and 2 super-divided resources corresponding to task B. If the task type of the task is task A, then in step S102 the task may be allocated one of the super-divided resources corresponding to task A obtained from physical resource 1.
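The two-level super-division in this example (first by task type, then by number) can be sketched as below. The identifier scheme for the super-divided resources is hypothetical; the counts 3 and 2 are taken from the example.

```python
from typing import Dict, List


def super_divide(physical_id: str, counts_by_type: Dict[str, int]) -> Dict[str, List[str]]:
    """Split one physical resource into logical resources grouped by task type.

    The "<physical_id>/<task_type>-<index>" naming is an assumption of this sketch.
    """
    return {
        task_type: [f"{physical_id}/{task_type}-{i}" for i in range(count)]
        for task_type, count in counts_by_type.items()
    }


# Physical resource 1 can execute task A and task B at the same time.
divided = super_divide("physical-1", {"A": 3, "B": 2})
for task_type, logical_ids in divided.items():
    print(task_type, logical_ids)
```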

Step S103: the physical resource corresponding to the task response resource is bound to the task.

When a processing node includes a single physical resource, the task response resources of all tasks allocated to the processing node correspond to that one physical resource, and tasks of different task types can be bound to the physical resource at the same time.

When the processing node includes a plurality of physical resources, the task response resources of the tasks allocated to the processing node may correspond to different physical resources; in this step, the physical resource corresponding to each task response resource is bound to the corresponding task. Any one physical resource can be bound to tasks of different task types at the same time.

For example, the processing node includes physical resource 1 and physical resource 2. Physical resource 1 is super-divided into 1 super-divided resource corresponding to task A and 2 super-divided resources corresponding to task B, and physical resource 2 is likewise super-divided into 1 super-divided resource corresponding to task A and 2 corresponding to task B. If the task type of the task is task B and, in step S102, the task is allocated a super-divided resource corresponding to task B that was obtained from physical resource 1, that super-divided resource is the task response resource; in step S103, because the task response resource corresponds to physical resource 1, physical resource 1 is bound to the task.
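Steps S102 and S103 for this two-resource example might look like the following sketch. All identifiers and the first-free allocation order are assumptions; the point illustrated is that a task of type B and a task of type A can both end up bound to physical resource 1.

```python
from typing import Dict, List, Optional, Tuple

# Free super-divided resources per task type, each paired with its physical
# resource. All identifiers are hypothetical.
free: Dict[str, List[Tuple[str, str]]] = {
    "A": [("p1/A-0", "physical-1"), ("p2/A-0", "physical-2")],
    "B": [("p1/B-0", "physical-1"), ("p1/B-1", "physical-1"),
          ("p2/B-0", "physical-2"), ("p2/B-1", "physical-2")],
}
bindings: Dict[str, str] = {}  # task id -> bound physical resource


def schedule(task_id: str, task_type: str) -> Optional[str]:
    """S102: take a free super-divided resource of the task's type.
    S103: bind the physical resource it corresponds to."""
    pool = free.get(task_type, [])
    if not pool:
        return None  # no super-divided resource of this type is left
    _logical_id, physical_id = pool.pop(0)
    bindings[task_id] = physical_id
    return physical_id


first = schedule("task-1", "B")   # a task of type B
second = schedule("task-2", "A")  # a task of type A shares the same device
```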

With the resource scheduling method of this embodiment, physical resources are super-divided according to the task types that they can process, so that one physical resource can be super-divided into multiple super-divided resources corresponding to different task types. On this basis, when responding to a task, the task type of the task is determined first, a super-divided resource is then allocated to the task according to the task type, and the physical resource corresponding to the allocated super-divided resource (i.e., the task response resource) is then bound to the task, so that the task is executed on the bound physical resource. Because the same physical resource is super-divided by task type, tasks of different types can request the same physical resource, which improves the utilization of physical resources.

Optionally, in an embodiment, the resource scheduling method further includes: obtaining super-division configuration information of the physical resources, wherein the super-division configuration information includes information for super-dividing the physical resources according to task type; super-dividing the physical resources according to the super-division configuration information to obtain the super-divided resources; and establishing a correspondence between the physical resources and the super-divided resources. The step of binding the physical resource corresponding to the task response resource to the task includes: determining the physical resource corresponding to the task response resource according to the correspondence, and binding that physical resource to the task.

Specifically, before step S102, that is, before super-divided resources are allocated to tasks, the super-division configuration information of the physical resources is obtained first. The super-division configuration information specifies which types of resource the physical resource is divided into according to task type, and how many super-divided resources each type is further divided into. After the super-division configuration information is obtained, the physical resources are super-divided to obtain the super-divided resources, and the correspondence between the physical resources and the super-divided resources is established; when a processing node includes multiple physical resources, a correspondence must be established between each physical resource and the super-divided resources obtained from it. On this basis, in step S103, the task response resource (i.e., the super-divided resource allocated to the task) is used to look up the corresponding physical resource in the correspondence, and that physical resource is then bound to the task.
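One way to picture the correspondence is as a lookup table from each super-divided resource to the physical resource it was obtained from. The sketch below assumes a nested-dictionary configuration and a hypothetical identifier scheme; neither is prescribed by the method.

```python
from typing import Dict


def build_correspondence(config: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Map every super-divided resource id to its physical resource.

    `config` maps physical id -> {task type: super-division number}
    (a layout assumed for this sketch).
    """
    table: Dict[str, str] = {}
    for physical_id, counts in config.items():
        for task_type, count in counts.items():
            for index in range(count):
                table[f"{physical_id}/{task_type}-{index}"] = physical_id
    return table


correspondence = build_correspondence({
    "physical-1": {"A": 1, "B": 2},
    "physical-2": {"A": 1, "B": 2},
})

# Step S103: look up the physical resource for a task response resource.
task_response_resource = "physical-2/B-1"
bound_to = correspondence[task_response_resource]
```

Keeping the table keyed by the super-divided resource makes the lookup in step S103 a single dictionary access.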

By adopting the resource scheduling method provided by the embodiment, the physical resources are subjected to superdivision according to the superdivision configuration information, so that a user can rewrite the superdivision configuration information according to the actual change of the physical resources in the distributed platform and the actual change of the processing tasks of the distributed platform, so that different superdivision modes are correspondingly performed on the physical resources, the superdivision control mode of the physical resources is more flexible on the basis of improving the utilization rate of the physical resources, and the resource scheduling can meet the actual needs. After the superdivision, storing the corresponding relation between the physical resource and the superdivision resource, so that when the physical resource corresponding to the task response resource is determined, the corresponding relation can be directly inquired, and the data processing speed is high.

Optionally, in an embodiment, the physical resource can process a first task and a second task simultaneously, wherein the task types of the first task and the second task are different; the super-division configuration information includes a first configuration parameter and a second configuration parameter, wherein the first configuration parameter includes a type parameter of the first task and a first super-division number, and the second configuration parameter includes a type parameter of the second task and a second super-division number.

Specifically, the physical resource can process at least two tasks of different task types, and the super-division configuration information accordingly includes a configuration parameter for each task: a first configuration parameter corresponding to the first task and a second configuration parameter corresponding to the second task. For example, the first configuration parameter is A:2 and the second configuration parameter is B:3, where A is the type parameter of the first task, 2 is the first super-division number, B is the type parameter of the second task, and 3 is the second super-division number.
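Configuration parameters of the form A:2 and B:3 could be parsed as in the sketch below. Joining the parameters with commas into a single string is an assumption made here for illustration; the text only gives the form of each individual parameter.

```python
from typing import Dict


def parse_super_division(config: str) -> Dict[str, int]:
    """Parse comma-separated "TYPE:COUNT" items into {type parameter: super-division number}."""
    result: Dict[str, int] = {}
    for item in config.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        task_type, _, count = item.partition(":")
        result[task_type.strip()] = int(count)
    return result


params = parse_super_division("A:2,B:3")
```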

With the resource scheduling method of this embodiment, a separate configuration parameter is set for each task, and each configuration parameter includes the task's type parameter and super-division number. The super-division configuration information therefore contains both the task types and the super-division number for each task type, so super-division can be controlled along both dimensions, which makes it more flexible and further improves the utilization of physical resources.

Optionally, in an embodiment, the physical resource is a GPU physical device, the first task is a video transcoding task, and the second task is an artificial intelligence training task.

Specifically, when a GPU physical device is super-divided, it may be divided into two types of resource, namely resources corresponding to the video transcoding task and resources corresponding to the artificial intelligence (AI) training task; the resources corresponding to the video transcoding task and/or the resources corresponding to the AI training task may then be super-divided in number to obtain the super-divided resources. FIG. 2 is a schematic diagram of resource super-division in the resource scheduling method according to an embodiment of the present invention. As shown in FIG. 2, after one GPU physical device GPUV is super-divided, the resulting super-divided resources include GPU-AI, GPU-Video0, and GPU-Video1, where GPU-AI is the super-divided resource corresponding to the AI training task, and GPU-Video0 and GPU-Video1 are the super-divided resources corresponding to the video transcoding task.
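The names in FIG. 2 can be reproduced with a small helper, shown here only as an illustration. The rule that a type with a single super-divided resource gets no numeric suffix is inferred from the three names in the figure and is an assumption, as is the function name.

```python
from typing import Dict, List


def gpu_logical_names(counts: Dict[str, int]) -> List[str]:
    """Generate names for the super-divided resources of one GPU.

    Assumed rule: a type with one resource is named "GPU-<type>";
    a type with several is named "GPU-<type><index>".
    """
    names: List[str] = []
    for kind, count in counts.items():
        if count == 1:
            names.append(f"GPU-{kind}")
        else:
            names.extend(f"GPU-{kind}{index}" for index in range(count))
    return names


# One AI training resource and two video transcoding resources, as in FIG. 2.
names = gpu_logical_names({"AI": 1, "Video": 2})
```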

With the resource scheduling method of this embodiment, super-dividing the GPU physical device allows it to execute an artificial intelligence training task while executing a video transcoding task, which improves the utilization of the GPU physical device.

Optionally, in one embodiment, the first super-division number is the maximum number of first tasks that the physical resource can process simultaneously; the second super-division number is the maximum number of second tasks that the physical resource can process simultaneously.

Specifically, the physical resource is divided by task type and then further super-divided in number for each task type, where the super-division number is the maximum number of tasks of that type that the physical resource can process simultaneously.

With the resource scheduling method of this embodiment, both task type and number serve as the basis for super-division, and the super-division number is set to the maximum number of tasks of each type that the physical resource can process simultaneously, which avoids task conflicts while maximizing resource utilization.
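The role of the super-division number as a per-type upper bound can be sketched with a simple admission counter. The class and the limits used below are hypothetical; the sketch only shows that a third video task is refused when the video super-division number is 2, while an AI task is still admitted.

```python
from typing import Dict


class TypeLimiter:
    """Admit a task of a given type only while fewer than its
    super-division number of tasks of that type are running."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = dict(limits)
        self.running = {task_type: 0 for task_type in limits}

    def try_acquire(self, task_type: str) -> bool:
        if self.running.get(task_type, 0) >= self.limits.get(task_type, 0):
            return False  # limit reached, or unknown task type
        self.running[task_type] += 1
        return True

    def release(self, task_type: str) -> None:
        if self.running.get(task_type, 0) > 0:
            self.running[task_type] -= 1


limiter = TypeLimiter({"video": 2, "ai": 1})
video_results = [limiter.try_acquire("video") for _ in range(3)]
ai_ok = limiter.try_acquire("ai")
```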

Optionally, in an embodiment, the step of obtaining the super-division configuration information of the physical resources specifically includes: reading environment variable parameters in a configuration file, wherein the environment variable parameters include the super-division configuration information.

Specifically, the super-division configuration information is stored as environment variable parameters in the configuration file, so reading and modifying the configuration file reads and modifies the super-division configuration information.
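The text does not specify the configuration file format. As a sketch, assuming the file holds environment variable parameters as KEY=VALUE lines and that the parameter is named SUPER_DIVISION (both assumptions), reading it could look like this:

```python
from typing import Dict


def read_env_parameters(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


# Hypothetical file content; the variable name SUPER_DIVISION is an assumption.
sample = """
# super-division configuration for this node
SUPER_DIVISION=video:2,ai:1
"""
env = read_env_parameters(sample)
```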

With the resource scheduling method of this embodiment, the super-division configuration information is loaded as environment variable parameters in the configuration file, which is simple and requires no separate parameter-passing channel for the super-division configuration information.

Optionally, in an embodiment, after the step of establishing the correspondence between the physical resources and the super-divided resources, the resource scheduling method further includes: caching the correspondence in memory. Before the step of determining the physical resource corresponding to the task response resource according to the correspondence, the resource scheduling method further includes: reading the correspondence cached in memory. The resource scheduling method further includes: detecting whether the configuration file has been rewritten, wherein, when the configuration file has been rewritten, the environment variable parameters in the rewritten configuration file are read, the correspondence is re-established, and the correspondence cached in memory is updated.

Specifically, after the correspondence between the physical resources and the super-divided resources is established, it is cached in memory, so that when the physical resource corresponding to a task response resource is determined, the correspondence is read from the in-memory cache, which speeds up data reading. Meanwhile, a user can modify the super-division configuration information in the configuration file. When the configuration file is detected to have been rewritten, the rewritten configuration file is read to obtain the latest environment variable parameters, that is, the latest super-division configuration information; the physical resources are super-divided again, a new correspondence is established, and the cache in memory is updated with the new correspondence. When the cached correspondence is next read from memory, it reflects the latest super-division configuration.
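A minimal sketch of the cache-and-reload behaviour follows. How rewriting is detected is not specified in the text; comparing a hash of the file content is one possible technique chosen for this sketch, and the file format, class name, and identifiers are all hypothetical.

```python
import hashlib
import os
import tempfile
from typing import Dict, Optional


class CorrespondenceCache:
    """Keep the logical-to-physical table in memory and rebuild it
    when the configuration file content changes."""

    def __init__(self, path: str, physical_id: str) -> None:
        self.path = path
        self.physical_id = physical_id
        self.table: Dict[str, str] = {}
        self._digest: Optional[str] = None

    def refresh_if_rewritten(self) -> bool:
        """Return True if the file changed and the cached table was rebuilt."""
        with open(self.path, "rb") as handle:
            data = handle.read()
        digest = hashlib.sha256(data).hexdigest()
        if digest == self._digest:
            return False
        table: Dict[str, str] = {}
        # Assumed format: comma-separated "type:count" items.
        for item in data.decode("utf-8").strip().split(","):
            task_type, _, count = item.partition(":")
            for index in range(int(count)):
                table[f"{task_type.strip()}-{index}"] = self.physical_id
        self.table, self._digest = table, digest
        return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "super_division.conf")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("video:2,ai:1")
    cache = CorrespondenceCache(path, "gpu-0")
    first_load = cache.refresh_if_rewritten()
    unchanged = cache.refresh_if_rewritten()
    before = sorted(cache.table)

    # A user rewrites the configuration file.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("video:3,ai:1")
    reloaded = cache.refresh_if_rewritten()
    after = sorted(cache.table)
```

Lookups read `cache.table` directly from memory; only the refresh check touches the file. A deployment could instead rely on file-system change notifications, which this sketch does not attempt.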

By adopting the resource scheduling method provided by this embodiment, the data processing speed can be improved, and, after the super-division mode is changed, resource scheduling can promptly be performed based on the new super-division mode.

Example two

Corresponding to the first embodiment, the second embodiment of the present invention provides a resource scheduling apparatus. The corresponding technical features and technical effects are not described again in detail in this embodiment; reference may be made to the first embodiment for the relevant points. Specifically, Fig. 3 is a block diagram of the resource scheduling apparatus provided in this embodiment of the present invention. As shown in Fig. 3, the apparatus includes a determining module 301, an allocating module 302, and a binding module 303.

The determining module 301 is configured to determine the task type of a task. The allocating module 302 is configured to allocate super-divided resources to the task according to the task type of the task, where the super-divided resources allocated to the task are defined as task response resources, and the super-divided resources are obtained by super-dividing the physical resources according to task type. The binding module 303 is configured to bind the physical resource corresponding to the task response resource to the task.

Optionally, in one embodiment, the resource scheduling apparatus further comprises an acquisition module, a super-division module, and a processing module.

The acquisition module is used for acquiring the super-division configuration information of the physical resources, where the super-division configuration information comprises information for super-dividing the physical resources according to task type. The super-division module is used for super-dividing the physical resources according to the super-division configuration information to obtain the super-divided resources. The processing module is used for establishing the correspondence between the physical resources and the super-divided resources. The binding module 303 specifically performs the following step: determining the physical resource corresponding to the task response resource according to the correspondence, and binding the physical resource to the task.

Optionally, in one embodiment, the acquisition module is configured to perform the following step: reading environment variable parameters in the configuration file, where the environment variable parameters comprise the super-division configuration information.

Optionally, in an embodiment, the physical resource scheduling apparatus further includes a caching module, a reading module, and a detecting module. The caching module is used for caching the correspondence in the memory after the processing module establishes the correspondence between the physical resources and the super-divided resources. The reading module is used for reading the correspondence cached in the memory before the binding module determines the physical resource corresponding to the task response resource according to the correspondence. The detecting module is used for detecting whether the configuration file has been rewritten; when the configuration file has been rewritten, the acquisition module is further used for reading the environment variable parameters in the rewritten configuration file, the processing module is further used for re-establishing the correspondence, and the caching module is further used for updating the correspondence cached in the memory.

Example three

The third embodiment of the present invention provides a distributed platform. Fig. 4 is a block diagram of the distributed platform provided by this embodiment. As shown in Fig. 4, the distributed platform includes a management node 401 and a plurality of processing nodes 402. Each processing node 402 includes a physical resource 4021 and a resource scheduling device 4022. The resource scheduling device 4022 may be used to execute the resource scheduling method provided in the first embodiment, or may be the resource scheduling apparatus provided in the second embodiment; for details, refer to the first embodiment or the second embodiment.

The resource scheduling device 4022 is configured to schedule the physical resources; for the scheduling process, refer to the description above. The resource scheduling device 4022 is further configured to report resource information to the management node 401, where the resource information includes the types of super-divided resources and the number of super-divided resources of each resource type, and each resource type corresponds to a task type. The management node 401 is configured to schedule tasks to the processing nodes 402 according to the resource information.
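As an illustration of how a management node might use the reported resource information, the sketch below keeps one report per processing node and places each task on a node that still has a free super-divided resource of the requested type. The text does not specify a placement policy; the "most free resources first" rule, the function name `schedule`, and the node names are assumptions made only for this example.

```python
def schedule(task_type, node_reports, in_use):
    """Pick the node with the most free super-divided resources of task_type.

    node_reports: {node: {resource_type: reported_count}}
    in_use:       {(node, resource_type): count already handed out}
    Returns the chosen node name, or None if no node has a free resource.
    """
    best_node, best_free = None, 0
    for node, report in node_reports.items():
        free = report.get(task_type, 0) - in_use.get((node, task_type), 0)
        if free > best_free:
            best_node, best_free = node, free
    if best_node is None:
        return None
    key = (best_node, task_type)
    in_use[key] = in_use.get(key, 0) + 1  # record the placement
    return best_node


# Hypothetical reports from two processing nodes.
reports = {
    "node-a": {"gpu-video": 2, "gpu-ai": 1},
    "node-b": {"gpu-video": 4, "gpu-ai": 2},
}
in_use = {}
placements = [schedule("gpu-ai", reports, in_use) for _ in range(4)]
print(placements)  # ['node-b', 'node-a', 'node-b', None]
```

Because the management node sees only per-type counts, it does not need to know that gpu-video and gpu-ai resources on a node are backed by the same physical devices.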

Specifically, the distributed platform is a cluster composed of a plurality of physical server nodes/virtual machine nodes. The distributed platform integrates various resources, and a business party requests different resources according to business requirements. The management nodes 401 in the distributed platform can uniformly manage and schedule the resources in the cluster, and can effectively improve the utilization rate of the resources.

The physical resource 4021 included in the processing node 402 is a GPU physical device, which contains a codec chip for the video processing field. General AI training does not need the codec chip, and video encoding and decoding tasks make little use of the GPU's computing chip. The GPU physical device can therefore be regarded as having two types of resources, distinguished by usage scenario: computing resources for AI training and computing resources for video processing. For each type of resource, multiple tasks of the same type can run simultaneously. Both types of resources actually correspond to the same physical device, so on the platform the physical resources of the GPU physical device can be super-divided according to the service scenario.

Super-division by resource type: taking the GPU as an example, two different services in the distributed platform mainly use GPU physical resources. One is the video transcoding service gpu-video, for which each GPU physical device can run R_(gpu-video) tasks simultaneously, i.e., the super-division number is R_(gpu-video). The other is the AI training service gpu-ai, for which each GPU physical device can run only R_(gpu-ai) AI tasks, i.e., the super-division number is R_(gpu-ai). If a processing node 402 has n GPUs, then after the GPU physical resources on the processing node 402 are super-divided, the numbers of super-divided resources obtained are as follows:

N_(gpu-video) = R_(gpu-video) × n

N_(gpu-ai) = R_(gpu-ai) × n
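A short worked example of the two formulas, written as a sketch: the super-division numbers 2 and 1 are those used in the configuration example of this embodiment, while n = 4 GPUs and the function name `super_divided_counts` are assumptions chosen only for illustration.

```python
def super_divided_counts(ratios, n):
    """Return N_type = R_type * n for every resource type.

    ratios: {resource_type: super-division number R_type}
    n:      number of physical GPUs on the processing node
    """
    if n < 0 or any(r < 1 for r in ratios.values()):
        raise ValueError("n must be >= 0 and every super-division number >= 1")
    return {name: r * n for name, r in ratios.items()}


counts = super_divided_counts({"gpu-video": 2, "gpu-ai": 1}, n=4)
print(counts)  # {'gpu-video': 8, 'gpu-ai': 4}
```

With these values the node reports 8 gpu-video resources and 4 gpu-ai resources, although only 4 physical GPUs exist.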

After super-division, the processing node 402 has two types of super-divided resources, gpu-video and gpu-ai, which it reports to the management node 401 for task scheduling. When a video transcoding task requests GPU resources, the requested resource type is gpu-video; when an AI training task requests GPU resources, the requested resource type is gpu-ai. In this way, the same physical resource is super-divided according to the service scenario, different services are deployed together on the same device, and the resource utilization rate is improved.

Fig. 5 and Fig. 6 are schematic diagrams of a service processing flow of the distributed platform according to an embodiment of the present invention. As shown in Fig. 5 and Fig. 6, the service processing flow of the distributed platform is as follows:

Step S.1: the device-plugin (i.e., the resource scheduling device 4022) reads the super-division configuration information. The super-division configuration information is written, in the form of environment variables, into the configuration file of the device-plugin module before deployment. The specific environment parameter is as follows:

RESOURCE_NAMES=nvidia.com/gpu-ai:1,nvidia.com/gpu-video:2, i.e., the GPU physical resources are divided into two types: one is nvidia.com/gpu-ai, with a super-division number of 1, i.e., it is not super-divided; the other is nvidia.com/gpu-video, with a super-division number of 2.
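Reading this parameter can be sketched as below. The variable name RESOURCE_NAMES and its `type:count` entries come from the example above; the function name `parse_resource_names`, the tolerance for whitespace around entries, and the validation rules are assumptions added for the illustration.

```python
import os


def parse_resource_names(value):
    """Parse 'type:count,type:count' into an ordered {type: count} dict."""
    ratios = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        # Split on the last ':' so the type name itself is kept intact.
        name, sep, count = item.rpartition(":")
        if not sep or not name.strip():
            raise ValueError(f"malformed entry: {item!r}")
        number = int(count)
        if number < 1:
            raise ValueError(f"super-division number must be >= 1: {item!r}")
        ratios[name.strip()] = number
    return ratios


# Simulate the deployment-time environment, then read it back.
os.environ["RESOURCE_NAMES"] = "nvidia.com/gpu-ai:1,nvidia.com/gpu-video:2"
ratios = parse_resource_names(os.environ["RESOURCE_NAMES"])
print(ratios)  # {'nvidia.com/gpu-ai': 1, 'nvidia.com/gpu-video': 2}
```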

Step S.2: the device-plugin module maps the physical resources to numbered super-divided resources according to the super-division number of each resource type, and caches the mapping in the memory. The specific mapping rule is as follows:

For gpu-video, the super-division number is 2, and each super-divided uuid is the uuid of the physical GPU plus a sequence-number suffix, namely {uuid of physical GPU}-0 and {uuid of physical GPU}-1;

For gpu-ai, the super-division number is 1, and the super-divided uuid is {uuid of physical GPU}-0.
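The mapping rule above can be sketched as a small function. The suffix scheme `{uuid}-{index}` follows the rule just stated; the function name `build_correspondence`, the nested-dictionary layout, and the two physical uuids are hypothetical and serve only the example.

```python
def build_correspondence(ratios, gpu_uuids):
    """Map every super-divided uuid to the physical GPU uuid behind it.

    Returns {resource_type: {super_divided_uuid: physical_uuid}}.
    """
    table = {}
    for resource_type, count in ratios.items():
        table[resource_type] = {
            f"{uuid}-{index}": uuid
            for uuid in gpu_uuids
            for index in range(count)
        }
    return table


table = build_correspondence(
    {"nvidia.com/gpu-ai": 1, "nvidia.com/gpu-video": 2},
    ["GPU-1111", "GPU-2222"],  # hypothetical physical GPU uuids
)
for resource_type, mapping in table.items():
    print(resource_type, list(mapping))
```

The table is keyed by resource type because the same super-divided uuid (for example `GPU-1111-0`) appears under both types, so the uuid alone would not identify which type a resource belongs to.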

Step S.3: the device-plugin module reports the types and numbers of super-divided resources of the processing node 402 to the management node 401.

Step S.4: the management node 401 schedules tasks according to the reported resource information. When the AI training task uses GPU resources, the resource request mode is as follows:

Step S.5: after scheduling is completed, the task is assigned to a processing node 402, and the device-plugin module of that processing node 402 allocates and binds the device resource requested by the task according to the cached correspondence between the physical resources and the super-divided resources.
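Step S.5 can be illustrated with the following sketch, in which the plugin picks the first free super-divided resource of the requested type and resolves it to a physical GPU through the cached correspondence. The class name `DeviceAllocator`, the first-free selection rule, and the task names are assumptions; in an actual device-plugin framework the choice of which resource identifier to hand out may be made by the platform rather than by the plugin.

```python
class DeviceAllocator:
    """Binds tasks to physical GPUs through the cached correspondence."""

    def __init__(self, correspondence):
        # {resource_type: {super_divided_uuid: physical_uuid}}, cached in memory
        self.correspondence = correspondence
        self.bindings = {}  # (resource_type, super_divided_uuid) -> task name

    def allocate(self, task, resource_type):
        """Return (super_divided_uuid, physical_uuid) bound to the task."""
        for divided_id, physical in self.correspondence[resource_type].items():
            if (resource_type, divided_id) not in self.bindings:
                self.bindings[(resource_type, divided_id)] = task
                return divided_id, physical
        raise RuntimeError(f"no free {resource_type} resource on this node")


allocator = DeviceAllocator({
    "nvidia.com/gpu-ai": {"GPU-1111-0": "GPU-1111"},
    "nvidia.com/gpu-video": {"GPU-1111-0": "GPU-1111", "GPU-1111-1": "GPU-1111"},
})
ai = allocator.allocate("train-job", "nvidia.com/gpu-ai")
video_1 = allocator.allocate("transcode-1", "nvidia.com/gpu-video")
video_2 = allocator.allocate("transcode-2", "nvidia.com/gpu-video")
print(ai, video_1, video_2)
```

In this run one AI training task and two transcoding tasks all end up bound to the same physical GPU, which is the mixed deployment the embodiment aims for.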

Example four

The fourth embodiment provides a computer device capable of executing programs, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server, a blade server, or a tower server (including an independent server or a server cluster formed by a plurality of servers). As shown in Fig. 7, the computer device 01 of this embodiment includes at least, but is not limited to, a memory 011 and a processor 012, which can be communicatively connected to each other through a system bus. It is noted that Fig. 7 only shows the computer device 01 with the memory 011 and the processor 012; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

In this embodiment, the memory 011 (i.e., a readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 011 may be an internal storage unit of the computer device 01, such as a hard disk or memory of the computer device 01. In other embodiments, the memory 011 may also be an external storage device of the computer device 01, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 01. Of course, the memory 011 may also include both the internal storage unit of the computer device 01 and its external storage device. In this embodiment, the memory 011 is generally used to store the operating system and the various application software installed on the computer device 01, for example, the program code of the resource scheduling apparatus of the second embodiment. Further, the memory 011 can also be used for temporarily storing various types of data that have been output or are to be output.

In some embodiments, the processor 012 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 012 is typically used to control the overall operation of the computer device 01. In this embodiment, the processor 012 is configured to run the program code stored in the memory 011 or to process data, for example to execute the resource scheduling method.

Example five

The fifth embodiment provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an App application store, on which a computer program is stored; the computer program, when executed by a processor, performs the corresponding functions. The computer-readable storage medium of this embodiment is used for storing a computer program for resource scheduling which, when executed by a processor, implements the resource scheduling method of the first embodiment.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is the preferred implementation.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the invention; rather, the scope is intended to cover any equivalent structure or equivalent process derived from the present disclosure, whether applied directly or indirectly in other related technical fields.

Claims

1. A method for scheduling resources, comprising:

determining the task type of the task;

allocating super-divided resources to the task according to the task type of the task, wherein the super-divided resources allocated to the task are defined as task response resources, and the super-divided resources are obtained by super-dividing physical resources according to task type; and

Binding physical resources corresponding to the task response resources to the task;

The resource scheduling method further comprises the following steps:

acquiring the super-division configuration information of the physical resource, wherein the super-division configuration information comprises information for super-dividing the physical resource according to task type;

super-dividing the physical resource according to the super-division configuration information to obtain the super-divided resources;

Establishing a corresponding relation between the physical resource and the super-divided resource;

after the step of establishing the correspondence between the physical resource and the super-divided resource, the physical resource scheduling method further includes: caching the corresponding relation to a memory;

Before the step of determining the physical resource corresponding to the task response resource according to the correspondence relationship, the physical resource scheduling method further includes: reading the corresponding relation cached in the memory;

The physical resource scheduling method further comprises the following steps: and detecting whether the configuration file is rewritten, wherein when the configuration file is rewritten, the environment variable parameters in the rewritten configuration file are read, the corresponding relation is reestablished, and the corresponding relation cached in the memory is updated.

2. The resource scheduling method of claim 1, wherein the step of binding the physical resource corresponding to the task response resource to the task comprises: and determining physical resources corresponding to the task response resources according to the corresponding relation, and binding the physical resources to the task.

3. The resource scheduling method of claim 2, wherein,

The physical resource can process a first task and a second task simultaneously, wherein the task types of the first task and the second task are different;

the super-division configuration information comprises a first configuration parameter and a second configuration parameter, wherein the first configuration parameter comprises a type parameter of the first task and a first super-division number, and the second configuration parameter comprises a type parameter of the second task and a second super-division number.

4. The resource scheduling method of claim 3, wherein,

The physical resource is a GPU physical device, the first task is a video transcoding task, and the second task is an artificial intelligence training task.

5. The resource scheduling method of claim 3, wherein,

The first super-division number is the maximum number of first tasks that the physical resource can process simultaneously;

The second super-division number is the maximum number of second tasks that the physical resource can process simultaneously.

6. The resource scheduling method of claim 2, wherein,

The step of acquiring the super-division configuration information of the physical resource specifically comprises: reading environment variable parameters in a configuration file, wherein the environment variable parameters comprise the super-division configuration information.

7. A distributed system comprising a management node and a plurality of processing nodes, wherein:

The processing node comprises a physical resource and a resource scheduling device, wherein the resource scheduling device is used for executing the resource scheduling method of any one of claims 1 to 6 so as to schedule the physical resource, and is also used for reporting resource information to the management node, wherein the resource information comprises super-divided resource types and the number of super-divided resources corresponding to each resource type;

and the management node is used for scheduling tasks to the processing node according to the resource information.

8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.

9. A computer-readable storage medium having stored thereon a computer program, characterized by: which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.