CN111580974A - GPU instance allocation method and device, electronic equipment and computer-readable medium

GPU instance allocation method and device, electronic equipment and computer-readable medium

Info

Publication number
CN111580974A
CN111580974A
Authority
CN
China
Prior art keywords
service
gpu
information
group
instances
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010383919.9A
Other languages
Chinese (zh)
Other versions
CN111580974B (en)
Inventor
杨启凡 (Yang Qifan)
罗建勋 (Luo Jianxun)
王长虎 (Wang Changhu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010383919.9A priority Critical patent/CN111580974B/en
Publication of CN111580974A publication Critical patent/CN111580974A/en
Application granted granted Critical
Publication of CN111580974B publication Critical patent/CN111580974B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Power Sources (AREA)

Abstract

The embodiments of the disclosure disclose a GPU instance allocation method and apparatus. One embodiment of the method comprises: for each service in a set of services that require GPU computation, acquiring the service information of that service; determining the GPU computing power required by each service based on its service information; grouping the service set based on the service priority of each service to generate at least one service group; determining the number of GPU instances required by each service group based on the determined GPU computing power; and allocating GPU instances to each service group based on the GPU resources and the number of GPU instances required by each service group. The embodiment allocates GPU instances at the granularity of service groups, providing an effective approach to resource allocation.

Description

GPU instance allocation method and device, electronic equipment and computer-readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for allocating GPU instances, an electronic device, and a computer-readable medium.
Background
In recent years, big data and video understanding technologies have developed continuously. Whether multiple GPU (Graphics Processing Unit) instances are allocated reasonably affects not only the quality of the services but also the utilization rate of resources.
However, related methods can only schedule resources for a single service; a more reasonable resource allocation method is lacking when multiple services must be handled simultaneously.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of a GPU instance allocation method according to some embodiments of the present disclosure;
FIG. 2 is a flowchart of one embodiment of a GPU instance allocation method according to the present disclosure;
FIG. 3 is a flowchart of still further embodiments of a GPU instance allocation method according to the present disclosure;
FIG. 4 is an exemplary flowchart of the step of determining GPU computing power in further embodiments of the GPU instance allocation method according to the present disclosure;
FIG. 5 is a schematic block diagram of some embodiments of a GPU instance allocation apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
FIG. 1 is a schematic diagram 100 of one application scenario of a GPU instance allocation method according to some embodiments of the present disclosure.
As shown in fig. 1, as an example, the electronic device 101 first acquires, for service 1, service 2, service 3, and service 4 in a service set, the corresponding service information 102 of service 1, service information 104 of service 2, service information 106 of service 3, and service information 108 of service 4. Then, based on the acquired service information, it determines the GPU computing power 103 required by service 1, the GPU computing power 105 required by service 2, the GPU computing power 107 required by service 3, and the GPU computing power 109 required by service 4. Next, services 1 to 4 in the service set are grouped according to the service priority of each service, generating, for example, a service group 1 and a service group 2, where service group 1 may include service 1 and service 3, and service group 2 may include service 2 and service 4. Here, the number of GPU instances required by service group 1 is determined from the GPU computing power 103 required by service 1 and the GPU computing power 107 required by service 3; the number of GPU instances required by service group 2 is determined from the GPU computing power 105 required by service 2 and the GPU computing power 109 required by service 4. Finally, the GPU instances corresponding to the existing GPU resources are allocated in turn to service group 1 and service group 2 according to the numbers of GPU instances they require.
It is understood that the GPU instance allocation method may be performed by the electronic device 101 described above. The electronic device 101 may be hardware or software. When the electronic device 101 is hardware, it may be any of various electronic devices with information processing capabilities, including but not limited to smartphones, tablets, e-book readers, laptop computers, desktop computers, servers, and the like. When the electronic device 101 is software, it may be installed in the electronic devices listed above, and may be implemented, for example, as multiple pieces of software or software modules providing distributed services, or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a GPU instance allocation method according to the present disclosure is shown. The GPU instance allocation method comprises the following steps:
Step 201: for each service in a set of services that require GPU computation, acquire the service information of that service.
In some embodiments, the execution subject of the GPU instance allocation method (e.g., the electronic device 101 shown in fig. 1) may obtain the service information of each service in the service set through a wired or wireless connection. Here, a service may be characterized as stateless computing with a heavy demand for computing power. The service may include, but is not limited to, at least one of: an online synchronous service or a batch processing service. The service information may include, but is not limited to, at least one of the following: service type information, service log information, and service CPU utilization information.
Step 202: determine the GPU computing power required by each service based on the service information of that service.
In some embodiments, the execution subject may determine the GPU computing power required by each service according to the service information obtained in step 201. Here, GPU computing power is generally used to characterize the amount of computation a service requires. As an example, various means may be used to determine the required GPU computing power from the service information of each service.
Step 203: group the service set based on the service priority of each service to obtain at least one service group.
In some embodiments, the execution subject groups the services included in the service set in order of service priority, thereby obtaining at least one service group. As an example, when the service set includes 100 services and each group is defined to hold 10 services, the services are divided into 10 groups of 10 in order of service priority from high to low.
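As a minimal, non-limiting sketch of this grouping step (the Python `Service` fields, the `group_size` parameter, and all identifiers are hypothetical illustrations, not part of the claimed method), grouping by priority could look like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Service:
    name: str
    priority: int        # larger value = higher service priority
    required_ncu: float  # required GPU computing power, in universal units (NCU)

def group_by_priority(services: List[Service], group_size: int = 10) -> List[List[Service]]:
    """Sort services from highest to lowest priority, then split the ordered
    list into consecutive groups of at most `group_size` services."""
    ordered = sorted(services, key=lambda s: s.priority, reverse=True)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]
```

With 100 services and a `group_size` of 10, this yields the 10 groups of 10 services described above.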
Step 204: determine the number of GPU instances required by each service group based on the determined GPU computing power.
In some embodiments, the execution subject may determine the number of GPU instances required by each service group from the GPU computing power obtained in step 202 and the service groups determined in step 203. Here, a GPU instance may correspond to a computing capability of a certain specification provided to a service; for example, "1 GPU, 6 CPU cores, 12 GB of memory" may be regarded as one instance.
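Continuing the sketch above (the fixed instance specification and the `ncu_per_instance` parameter are assumptions for illustration only), a group's instance count could be derived by rounding its total computing power demand up to whole instances:

```python
import math
from typing import List

def instances_needed(group: List[Service], ncu_per_instance: float) -> int:
    """Round the group's total GPU computing power demand up to a whole number
    of instances of one fixed specification (e.g. "1 GPU, 6 CPU cores, 12 GB
    of memory" delivering `ncu_per_instance` units of computing power)."""
    total_ncu = sum(s.required_ncu for s in group)
    return math.ceil(total_ncu / ncu_per_instance)
```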
Step 205: allocate GPU instances to each service group based on the GPU resources and the number of GPU instances required by each service group.
In some embodiments, the execution subject may traverse the service groups in turn according to the current GPU resources and the number of GPU instances required by each service group as obtained in step 204. When reaching the current service group, the GPU instances required by that group may be allocated in turn to each service in it. Here, the GPU resources may be resources represented by available GPUs of different models deployed in different regions.
In some optional implementations of some embodiments, the step of allocating the GPU instance to each service group may be as follows:
first, for each service group in the at least one service group, the execution subject generates a priority for each service group according to a service priority level of a service included in the service group. For example, if the priority of the service in the first service group is higher than the priority of the service in the second service group, the priority of the first service group is higher than the priority of the second service group.
And secondly, the execution main body can further distribute the GPU instances to each service group according to the sequence of the service group priorities from high to low.
The implementation mode meets the requirement of the high-priority service on resources, thereby ensuring the service quality of the high-priority service.
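Purely as an illustrative sketch of this optional implementation (the pool model and all names are assumptions, not the claimed method), the allocator could walk the groups in descending group priority and grant each group's request while the instance pool lasts:

```python
from typing import List, Tuple

def allocate_by_group_priority(groups: List[List[Service]],
                               available_instances: int,
                               ncu_per_instance: float = 1.0) -> List[Tuple[List[Service], int]]:
    """`groups` is assumed to be ordered by service group priority, highest
    first. Each group is granted up to the number of instances it requests
    until the pool of available instances is exhausted."""
    grants = []
    for group in groups:
        requested = instances_needed(group, ncu_per_instance)
        granted = min(requested, available_instances)
        available_instances -= granted
        grants.append((group, granted))
    return grants
```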
In some optional implementations of some embodiments, when the execution subject detects that the number of GPU instances required by a target service group is greater than the number of GPU instances corresponding to the remaining GPU resources, it may split the GPU instances corresponding to the remaining GPU resources into a number of sub-GPU instances equal to the number of GPU instances required by the target service group, and then allocate these sub-GPU instances to the target service group.
As an example, assume the services are divided into 10 groups by priority, each group containing 5 services. When the execution subject allocates resources group by group and reaches the 10th group, the 5 services in that group require 10 GPU instances in total, but the remaining GPU resources can supply only 9 GPU instances. Here, the 9 remaining GPU instances may be split evenly into 10 sub-GPU instances, which are then allocated to the 5 services of the 10th group. This implementation lets each service in the group obtain a GPU instance as far as possible, thereby keeping the services running.
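The 9-instances-for-10 example could be sketched as follows. `Fraction` keeps the sub-instance sizes exact; the round-robin hand-out to services is an added assumption, since the disclosure does not fix how sub-instances map to individual services:

```python
from fractions import Fraction
from typing import List

def split_into_sub_instances(remaining: int, required: int,
                             num_services: int) -> List[List[Fraction]]:
    """Split `remaining` whole GPU instances evenly into `required` sub-GPU
    instances, then deal the sub-instances round-robin to the group's services.
    With remaining=9, required=10, num_services=5, each of the 5 services
    receives two sub-instances of 9/10 of a GPU instance each."""
    sub_size = Fraction(remaining, required)   # e.g. 9/10 of one instance
    subs = [sub_size] * required
    return [subs[i::num_services] for i in range(num_services)]
```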
One of the above embodiments of the present disclosure has the following advantageous effects: first, by acquiring the service information of each service in the service set, the category information of each service is obtained; then, based on that service information, the GPU computing power required by each service is generated; by grouping the services in the service set, the services are distinguished by service priority; finally, GPU instances are allocated to each service group according to the instances the groups require and the existing GPU resources. GPU instances are thus allocated at the granularity of groups, providing an effective mode of resource allocation.
With continued reference to fig. 3, a flow 300 of some embodiments of a GPU instance allocation method according to the present disclosure is shown. The GPU instance allocation method comprises the following steps:
Step 301: for each service in the set of services that require GPU computation, acquire the service information of that service.
Here, the specific implementation of step 301 and the technical effect brought by the implementation may refer to step 201 in those embodiments corresponding to fig. 2, and are not described herein again.
Step 302: stress-test the service on a plurality of hardware devices to obtain the stress-test results of the service.
In some embodiments, the service information includes service meta-information and service state information. The service meta-information includes the policy information bound to the service, which may be one of: single-policy information or combined-policy information. A single policy means that only one policy is bound to the service; in contrast, a combined policy means that a service may bind multiple (e.g., 3) policies. The advantage of this implementation is that policies can be combined flexibly to adapt universally to services of different natures.
As an example, the service state information may include, but is not limited to, at least one of: real-time state information of the service, for example, the service's current GPU utilization; and historical state information over a past period, for example, the service's GPU utilization at historical times.
In some embodiments, the execution subject may first stress-test the service on a plurality of hardware devices (e.g., multiple GPUs of different models) to obtain the stress-test results. A stress-test result may be the set of throughput rates achieved by the service on the plurality of hardware devices.
Step 303: determine the GPU computing power corresponding to the upper-bound policy information, the middle-bound policy information, and the lower-bound policy information, respectively.
In some embodiments, the execution subject may determine the GPU computing power corresponding to the middle-bound policy bound to the service by consulting the service state information and the stress-test results obtained in step 302. For example, the middle-bound policy of the service may be "scale the service up or down according to the real-time utilization rate of its computing power". First, the execution subject obtains the computing power utilization rate preset for the service. Next, it extracts the GPU computing power currently required by the service from the service state information. Then, it divides that value by the preset computing power utilization rate, thereby determining the GPU computing power the service requires under the middle-bound policy. Combining this with the stress-test results of the service, the execution subject finally obtains the GPU computing power required by the service according to the middle-bound policy.
By analogy, the execution subject may determine the GPU computing power corresponding to the upper-bound policy bound to the service by consulting the service state information and the stress-test results obtained in step 302. For example, the upper-bound policy of the service may be "scale the service up or down according to the historical utilization rate of its computing power". First, the execution subject obtains the computing power utilization rate preset for the service. Next, it extracts the GPU computing power historically required by the service from the service state information. Dividing the historically required GPU computing power by the preset utilization rate determines the GPU computing power the service requires under the upper-bound policy; combined with the stress-test results, this yields the GPU computing power required by the service according to the upper-bound policy.
Here, the execution subject may likewise determine the GPU computing power corresponding to the lower-bound policy bound to the service by consulting the service state information and the stress-test results obtained in step 302. For example, the lower-bound policy of the service may be "scale the service up or down according to a timed configuration". First, the execution subject extracts, at predetermined time intervals (for example, every 5 minutes), the computing power the service requires at the current point in time from the service state information. That value is taken as the GPU computing power required by the service under the lower-bound policy and, combined with the stress-test results, yields the GPU computing power required by the service according to the lower-bound policy.
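All three utilization-driven computations above share one piece of arithmetic: the computing power to provision equals the measured (real-time or historical) demand divided by the preset utilization rate. A one-function sketch (the function name and the validation are assumptions for illustration):

```python
def provisioned_ncu(measured_ncu: float, preset_utilization: float) -> float:
    """A service measured at 8 NCU with a preset target utilization of 0.8
    should be provisioned 8 / 0.8 = 10 NCU of GPU computing power."""
    if not 0.0 < preset_utilization <= 1.0:
        raise ValueError("preset utilization rate must lie in (0, 1]")
    return measured_ncu / preset_utilization
```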
Here, GPU computing power may be measured in a universal computing power unit (NCU). GPU computing power is a ratio: to allow multiple services to be placed freely across multiple GPU models, the execution subject may define a universal computing power unit for the services. From the stress-test results, the execution subject determines the sample throughput rate of each combination of GPU model and service, and from these throughput rates derives a computing power weight for each service-and-GPU-model combination. This achieves computing power matching across multiple models and multiple services, so that the GPU instances of the same service can mix resources of different GPU models.
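A sketch of deriving the per-model computing power weights from stress-test throughput; the GPU model names and throughput figures below are invented for illustration:

```python
from typing import Dict

def computing_power_weights(throughput_by_model: Dict[str, float],
                            reference_model: str) -> Dict[str, float]:
    """Normalize each GPU model's measured throughput for one service against
    a reference model. A weight of 2.2 means one instance on that model
    delivers 2.2x the reference throughput, giving all models a common
    computing power unit so one service can mix GPU models."""
    base = throughput_by_model[reference_model]
    return {model: qps / base for model, qps in throughput_by_model.items()}

# Hypothetical stress-test throughputs (requests per second) for one service:
# computing_power_weights({"model_a": 100.0, "model_b": 220.0}, "model_a")
# returns {"model_a": 1.0, "model_b": 2.2}
```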
Here, the policy information may include, but is not limited to, at least one of: GPU real-time utilization information, GPU historical utilization information, timed instance specification information, waiting queue length information, and waiting queue duration information. Accordingly, the policies corresponding to the policy information may include, but are not limited to, at least one of the following: scaling the service up or down according to its real-time GPU utilization; scaling the service according to its historical GPU utilization; scaling the service according to the length of its waiting queue; and scaling the service according to the duration of its waiting queue. This implementation provides multiple policies to meet the demands of services of different natures, so that resources can be allocated better; the sketch below enumerates these policy kinds.
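Purely as an illustrative aid (the enumeration names are invented; the disclosure lists the policies only in prose), the five policy kinds could be represented as:

```python
from enum import Enum

class ScalingPolicy(Enum):
    """The five kinds of policy information named above; a service binds one
    (single policy) or several (combined policy) of these."""
    GPU_REALTIME_UTILIZATION = "scale by real-time GPU utilization"
    GPU_HISTORICAL_UTILIZATION = "scale by historical GPU utilization"
    TIMED_INSTANCES = "scale by timed instance specification"
    QUEUE_LENGTH = "scale by waiting queue length"
    QUEUE_DURATION = "scale by waiting queue duration"
```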
Step 304: determine the GPU computing power required by the service.
In some embodiments, reference may further be made to fig. 4, which illustrates an exemplary flow 400 of the determining step 304 of the GPU instance allocation method according to some embodiments of the present disclosure. As shown in fig. 4, the determining step 304 may proceed as follows (an illustrative sketch follows step 403).
Step 401: constrain the computing power corresponding to the middle-bound policy to be no greater than the computing power corresponding to the upper-bound policy.
When the computing power required by the middle-bound policy is greater than that required by the upper-bound policy, the computing power required by the upper-bound policy is taken as the computing power required by the middle-bound policy.
Step 402: constrain the computing power corresponding to the middle-bound policy to be no less than the computing power corresponding to the lower-bound policy.
When the computing power required by the middle-bound policy is less than that required by the lower-bound policy, the computing power required by the lower-bound policy is taken as the computing power required by the middle-bound policy.
Step 403: take the computing power required by the middle-bound policy as the GPU computing power required by the service.
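Steps 401 to 403 amount to clamping the middle-bound value into the interval defined by the lower and upper bounds; a minimal sketch:

```python
def effective_ncu(lower: float, middle: float, upper: float) -> float:
    """Clamp the middle-bound policy's computing power so it is no greater
    than the upper bound (step 401) and no less than the lower bound
    (step 402); the clamped value is the GPU computing power required by
    the service (step 403)."""
    return max(lower, min(middle, upper))
```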
Step 305: obtain at least one service group.
Step 306: determine the number of GPU instances required by each service group.
Step 307: allocate GPU instances to each service group.
Here, the specific implementation and technical effects of steps 305-307 can refer to steps 203-205 in the embodiments corresponding to fig. 2, and are not described herein again.
In some optional implementations of some embodiments, the GPU instances correspond to at least one GPU model, so that GPU instances of the same service can mix resources of different GPU models.
As can be seen from fig. 3, compared with the embodiments corresponding to fig. 2, the flow 300 of the GPU instance allocation method in the embodiments corresponding to fig. 3 further highlights that each service in the service set can bind policies flexibly, and the policies bound to a service can be extended to multiple policies, so that a reasonable scaling policy can be provided for each service and the reasonable allocation of GPU resources is ensured.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a GPU instance allocation apparatus. These apparatus embodiments correspond to the method embodiments described above for fig. 2, and the apparatus may be applied in various electronic devices.
As shown in fig. 5, the GPU instance allocation apparatus 500 of some embodiments comprises: an obtaining unit 501, a first determining unit 502, a generating unit 503, a second determining unit 504, and an allocating unit 505. The obtaining unit 501 is configured to acquire, for each service in a set of services that require graphics processing unit (GPU) computation, the service information of that service; the first determining unit 502 is configured to determine the GPU computing power required by each service based on its service information; the generating unit 503 is configured to group the service set based on the service priority of each service and generate at least one service group; the second determining unit 504 is configured to determine the number of GPU instances required by each service group based on the determined GPU computing power; and the allocating unit 505 is configured to allocate GPU instances to each service group based on the GPU resources and the number of GPU instances required by each group.
In some optional implementations of some embodiments, allocation unit 505 may be further configured to: for each service group in the at least one service group, determining a service group priority of the service group based on service priorities of services included in the service group; and allocating GPU instances to the service groups according to the sequence of the service group priorities from high to low.
In some optional implementations of some embodiments, the service information includes service meta information and service state information, the service meta information includes policy information bound by the service, and the policy information bound by the service is one of: single policy information, combined policy information.
In some optional implementations of some embodiments, the combined policy information includes: upper-bound policy information, middle-bound policy information, and lower-bound policy information; and the first determining unit may be further configured to: stress-test the service on a plurality of hardware devices to obtain the stress-test results of the service; determine the GPU computing power corresponding to the upper-bound policy information, the middle-bound policy information, and the lower-bound policy information, respectively, based on the stress-test results and that policy information; and determine the GPU computing power required by the service based on the GPU computing power corresponding to the upper-bound policy, the middle-bound policy, and the lower-bound policy.
In some optional implementations of some embodiments, the allocation unit 505 may be further configured to: in response to determining that the number of GPU instances required by a target service group is greater than the number of GPU instances corresponding to the remaining GPU resources, split the GPU instances corresponding to the remaining GPU resources into a number of sub-GPU instances equal to the number of GPU instances required by the target service group; and allocate the sub-GPU instances to the target service group.
In some optional implementations of some embodiments, the GPU instances correspond to at least one GPU model.
It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
Referring now to FIG. 6, a block diagram of an electronic device (e.g., the electronic device of FIG. 1) 600 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing means 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients, servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText transfer protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer-readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: for each service in a set of services that require GPU computation, acquire the service information of that service; determine the GPU computing power required by each service based on its service information; group the service set based on the service priority of each service to generate at least one service group; determine the number of GPU instances required by each service group based on the determined GPU computing power; and allocate GPU instances to each service group based on the GPU resources and the number of GPU instances required by each service group.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an obtaining unit, a first determining unit, a generating unit, a second determining unit, and an allocating unit. The names of these units do not in some cases limit the units themselves; for example, the obtaining unit may also be described as "a unit that obtains, for each service in a set of services that require GPU computation, the service information of that service".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
According to one or more embodiments of the present disclosure, there is provided a GPU instance allocation method, including: for each service in a service set needing GPU operation, acquiring service information of the service; determining GPU computing power required by each service based on the service information of the service; grouping the service set based on the service priority of each service to generate at least one service group; determining the number of GPU instances required for each service group based on the determined GPU computing power; and allocating the GPU instances to each service group based on the GPU resources and the number of the GPU instances required by each service group.
According to one or more embodiments of the present disclosure, the allocating GPU instances to each service group includes: for each service group in the at least one service group, determining a service group priority of the service group based on service priorities of services included in the service group; and allocating GPU instances to the service groups according to the sequence of the service group priorities from high to low.
According to one or more embodiments of the present disclosure, the service information includes service meta information and service state information, the service meta information includes policy information to which the service is bound, and the policy information to which the service is bound is one of: single policy information, combined policy information.
According to one or more embodiments of the present disclosure, the combined policy information includes: upper-bound policy information, middle-bound policy information, and lower-bound policy information; and determining the GPU computing power required by each service based on the service information of that service includes: stress-testing the service on a plurality of hardware devices to obtain the stress-test results of the service; determining the GPU computing power corresponding to the upper-bound policy information, the middle-bound policy information, and the lower-bound policy information, respectively, based on the stress-test results and that policy information; and determining the GPU computing power required by the service based on the GPU computing power corresponding to the upper-bound policy, the middle-bound policy, and the lower-bound policy.
According to one or more embodiments of the present disclosure, allocating the GPU instances to the service groups includes: in response to determining that the number of GPU instances required by a target service group is greater than the number of GPU instances corresponding to the remaining GPU resources, splitting the GPU instances corresponding to the remaining GPU resources into a number of sub-GPU instances equal to the number of GPU instances required by the target service group; and allocating the sub-GPU instances to the target service group.
According to one or more embodiments of the present disclosure, the number of GPU models corresponding to the GPU instances is at least one.
According to one or more embodiments of the present disclosure, the GPU instance allocation apparatus includes: an obtaining unit configured to acquire, for each service in a set of services that require graphics processing unit (GPU) computation, the service information of that service; a first determining unit configured to determine the GPU computing power required by each service based on its service information; a generating unit configured to group the service set based on the service priority of each service and generate at least one service group; a second determining unit configured to determine the number of GPU instances required by each service group based on the determined GPU computing power; and an allocating unit configured to allocate GPU instances to each service group based on the GPU resources and the number of GPU instances required by each service group.
In accordance with one or more embodiments of the present disclosure, allocation unit 505 may be further configured to: for each service group in the at least one service group, determining a service group priority of the service group based on service priorities of services included in the service group; and allocating GPU instances to the service groups according to the sequence of the service group priorities from high to low.
According to one or more embodiments of the present disclosure, the service information includes service meta information and service state information, the service meta information includes policy information to which the service is bound, and the policy information to which the service is bound is one of: single policy information, combined policy information.
According to one or more embodiments of the present disclosure, the combined policy information includes: upper-bound policy information, middle-bound policy information, and lower-bound policy information; and the first determining unit may be further configured to: stress-test the service on a plurality of hardware devices to obtain the stress-test results of the service; determine the GPU computing power corresponding to the upper-bound policy information, the middle-bound policy information, and the lower-bound policy information, respectively, based on the stress-test results and that policy information; and determine the GPU computing power required by the service based on the GPU computing power corresponding to the upper-bound policy, the middle-bound policy, and the lower-bound policy.
According to one or more embodiments of the present disclosure, the allocation unit 505 may be further configured to: in response to determining that the number of GPU instances required by a target service group is greater than the number of GPU instances corresponding to the remaining GPU resources, split the GPU instances corresponding to the remaining GPU resources into a number of sub-GPU instances equal to the number of GPU instances required by the target service group; and allocate the sub-GPU instances to the target service group.
According to one or more embodiments of the present disclosure, the number of GPU models corresponding to the GPU instances is at least one.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as described in any of the embodiments above.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method as described in any of the embodiments above.
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by substituting the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (9)

1. A GPU instance allocation method comprises the following steps:
for each service in a service set needing GPU operation, acquiring service information of the service;
determining GPU computing power required by each service based on the service information of the service;
grouping the service set based on the service priority of each service to generate at least one service group;
determining the number of GPU instances required for each service group based on the determined GPU computing power;
and allocating the GPU instances to each service group based on the GPU resources and the number of the GPU instances required by each service group.
2. The method of claim 1, wherein said assigning GPU instances to respective service groups comprises:
for each service group of the at least one service group, determining a service group priority of the service group based on service priorities of services included in the service group;
and allocating GPU instances to the service groups according to the sequence of the service group priorities from high to low.
3. The method of claim 1, wherein the service information comprises service meta-information and service state information, the service meta-information comprising policy information bound by the service, the policy information bound by the service being one of: single policy information, combined policy information.
4. The method of claim 3, wherein the combined policy information comprises: upper bound policy information, middle bound policy information, and lower bound policy information; and
the determining of the GPU computing power required by each service based on the service information of the service comprises:
stress-testing the service on a plurality of hardware devices to obtain a stress-test result of the service;
determining the GPU computing power corresponding to the upper-bound policy information, the middle-bound policy information, and the lower-bound policy information, respectively, based on the stress-test result and the upper-bound policy information, the middle-bound policy information, and the lower-bound policy information;
and determining the GPU computing power required by the service based on the GPU computing power corresponding to the upper-bound policy, the GPU computing power corresponding to the middle-bound policy, and the GPU computing power corresponding to the lower-bound policy.
5. The method of claim 1, wherein the assigning the GPU instance to the service group comprises:
in response to determining that the number of GPU instances required by a target service group is greater than the number of GPU instances corresponding to the remaining GPU resources, splitting the GPU instances corresponding to the remaining GPU resources into a number of sub-GPU instances equal to the number of GPU instances required by the target service group;
and allocating the sub-GPU instances to the target service group.
6. The method of claim 1, wherein the number of types of GPUs corresponding to the GPU instance is at least one.
7. A GPU instance allocation apparatus, comprising:
an acquisition unit configured to acquire service information of a service for each service in a service set requiring a GPU operation;
a first determination unit configured to determine a GPU computation power required for each service based on service information of the service;
a generating unit configured to group the service set based on the service priority of each service to generate at least one service group;
a second determination unit configured to determine a number of GPU instances required for each service group based on the determined GPU computing power;
and the allocation unit is configured to allocate the GPU instances to each service group based on the GPU resources and the number of the GPU instances required by each service group.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202010383919.9A 2020-05-08 2020-05-08 GPU instance allocation method, device, electronic equipment and computer readable medium Active CN111580974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383919.9A CN111580974B (en) 2020-05-08 2020-05-08 GPU instance allocation method, device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010383919.9A CN111580974B (en) 2020-05-08 2020-05-08 GPU instance allocation method, device, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN111580974A (en) 2020-08-25
CN111580974B CN111580974B (en) 2023-06-27

Family

ID=72113442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383919.9A Active CN111580974B (en) 2020-05-08 2020-05-08 GPU instance allocation method, device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN111580974B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113521753A (en) * 2021-07-21 2021-10-22 咪咕互动娱乐有限公司 System resource adjusting method, device, server and storage medium
CN113791908A (en) * 2021-09-16 2021-12-14 脸萌有限公司 Service operation method and device and electronic equipment
WO2022111185A1 (en) * 2020-11-27 2022-06-02 中国移动通信有限公司研究院 Processing method and apparatus for sla policy, server and service node

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110050716A1 (en) * 2009-09-03 2011-03-03 Advanced Micro Devices, Inc. Processing Unit with a Plurality of Shader Engines
US20110050713A1 (en) * 2009-09-03 2011-03-03 Advanced Micro Devices, Inc. Hardware-Based Scheduling of GPU Work
CN102243598A (en) * 2010-05-14 2011-11-16 深圳市腾讯计算机系统有限公司 Task scheduling method and system in distributed data warehouse
US20170220365A1 (en) * 2016-01-29 2017-08-03 International Business Machines Corporation Virtual machine allocation to hosts for data centers
CN107247629A (en) * 2017-07-04 2017-10-13 北京百度网讯科技有限公司 Cloud computing system and cloud computing method and device for controlling server
CN109783224A (en) * 2018-12-10 2019-05-21 平安科技(深圳)有限公司 Method for allocating tasks, device and terminal device based on load allotment
US20190205146A1 (en) * 2018-01-02 2019-07-04 International Business Machines Corporation Reconfiguring processing groups for cascading data workloads
CN110868760A (en) * 2018-08-28 2020-03-06 维沃移动通信有限公司 Transmission method and terminal equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022111185A1 (en) * 2020-11-27 2022-06-02 中国移动通信有限公司研究院 Processing method and apparatus for sla policy, server and service node
CN113521753A (en) * 2021-07-21 2021-10-22 咪咕互动娱乐有限公司 System resource adjusting method, device, server and storage medium
CN113521753B (en) * 2021-07-21 2023-08-15 咪咕互动娱乐有限公司 System resource adjusting method, device, server and storage medium
CN113791908A (en) * 2021-09-16 2021-12-14 脸萌有限公司 Service operation method and device and electronic equipment
CN113791908B (en) * 2021-09-16 2024-03-29 脸萌有限公司 Service running method and device and electronic equipment

Also Published As

Publication number Publication date
CN111580974B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN109033001B (en) Method and apparatus for allocating GPUs
CN111580974B (en) GPU instance allocation method, device, electronic equipment and computer readable medium
CN114020470B (en) Resource allocation method and device, readable medium and electronic equipment
CN112363813A (en) Resource scheduling method and device, electronic equipment and computer readable medium
CN113722055A (en) Data processing method and device, electronic equipment and computer readable medium
CN112749002A (en) Method and device for dynamically managing cluster resources
CN111694670B (en) Resource allocation method, apparatus, device and computer readable medium
CN115237589A (en) SR-IOV-based virtualization method, device and equipment
CN110768861A (en) Method, device, medium and electronic equipment for obtaining overtime threshold
CN111343046B (en) Method and device for generating pressure flow, electronic equipment and computer readable storage medium
CN112561301A (en) Work order distribution method, device, equipment and computer readable medium
CN110716809B (en) Method and device for scheduling cloud resources
CN113792869B (en) Video processing method and device based on neural network chip and electronic equipment
CN109842665B (en) Task processing method and device for task allocation server
CN111694672B (en) Resource allocation method, task submission method, device, electronic equipment and medium
CN112148448B (en) Resource allocation method, apparatus, device and computer readable medium
CN114237902A (en) Service deployment method and device, electronic equipment and computer readable medium
CN111756833B (en) Node processing method, node processing device, electronic equipment and computer readable medium
CN111898061A (en) Method, device, electronic equipment and computer readable medium for searching network
CN112527454A (en) Container group scheduling method and device, electronic equipment and computer readable medium
CN112163176A (en) Data storage method and device, electronic equipment and computer readable medium
CN113138850A (en) Resource allocation method and device
CN114285784B (en) Data transmission and pipeline construction method, device, computing equipment and storage medium
CN113220555B (en) Method, apparatus, device, medium, and article for processing data
CN115705245A (en) Resource allocation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant