CN116893899A - Resource allocation method, device, computer equipment and storage medium - Google Patents
Resource allocation method, device, computer equipment and storage medium
- Publication number
- CN116893899A (application CN202310836274.3A)
- Authority
- CN
- China
- Prior art keywords
- target task
- gpu
- video memory
- target
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
  - G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]:
    - G06F9/5016—the resource being the memory
    - G06F9/5022—Mechanisms to release resources
    - G06F9/5038—considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
  - G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores:
    - G06F9/526—Mutual exclusion algorithms
  - G06F2209/50—Indexing scheme relating to G06F9/50:
    - G06F2209/5011—Pool
    - G06F2209/5021—Priority
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to a resource allocation method, a resource allocation apparatus, a computer device and a storage medium, in the technical field of task scheduling. The method comprises the following steps: in response to a GPU computing request triggered by a target task, migrating video memory data of the target task from a target storage space to a video memory pool, the video memory pool comprising the video memory corresponding to the GPU; and, after the target task finishes executing, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task. By maintaining a video memory pool into and out of which a task's video memory data can migrate, the method actively releases the GPU's video memory resources once the task's computation completes, thereby improving the resource utilization of the GPU.
Description
Technical Field
The present application relates to the field of task scheduling technologies, and in particular to a resource allocation method, apparatus, computer device, and storage medium.
Background
With the continuous growth of application and computation scale in AI (Artificial Intelligence) scenarios, it has become difficult for the conventional CPU (Central Processing Unit) to meet the basic computing requirements of AI training, whereas the GPU (Graphics Processing Unit) has come into wide use thanks to its excellent computing capability for images and graphics.
For periodically triggered tasks such as online training and application development, the time actually spent using the GPU may be short, while the wait before the next trigger may be long. Because the GPU's video memory remains occupied by these periodic tasks, no other task is scheduled to run even when the GPU's computing resources are idle; the GPU therefore stays idle for long periods, and GPU resources are greatly wasted.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a resource allocation method, apparatus, computer device, and storage medium capable of improving GPU resource utilization.
In a first aspect, the present application provides a resource allocation method, including:
in response to a GPU computing request triggered by a target task, migrating video memory data of the target task from a target storage space to a video memory pool, the video memory pool comprising the video memory corresponding to the GPU; and
after the target task finishes executing, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task.
In one embodiment, migrating the video memory data of the target task from the target storage space to the video memory pool in response to the GPU computing request triggered by the target task includes:
in response to the GPU computing request triggered by the target task, providing a global resource lock to the target task; and
when the target task has acquired the global resource lock, migrating the video memory data of the target task from the target storage space to the video memory pool, and locking, with the global resource lock, the GPU video memory resources allocated to the target task.
In one embodiment, providing the global resource lock to the target task includes:
providing the global resource lock to the target task when the amount of GPU computing resources required by the target task's computation is not greater than the amount of idle computing resources of the GPU, the amount of idle computing resources being the amount of GPU computing resources that are not locked.
In one embodiment, the method further comprises:
prohibiting provision of the global resource lock to the target task when the amount of GPU computing resources required by the target task's computation is greater than the amount of idle computing resources.
In one embodiment, providing the global resource lock to the target task includes:
receiving, through a resource lock request interface, a resource lock acquisition request sent by the target task; and
providing the global resource lock to the target task according to the resource lock acquisition request.
In one embodiment, migrating the video memory data of the target task from the video memory pool to the target storage space includes:
when the target task has released the global resource lock, migrating the video memory data of the target task from the video memory pool back to the target storage space.
In one embodiment, the method further comprises:
according to a preset request acquisition mode, acquiring a GPU computing request triggered by a target task from a request queue;
the request queue comprises GPU computing requests triggered by different tasks.
In one embodiment, the method further comprises:
returning the GPU computing request triggered by the target task to the request queue when the target task does not acquire the global resource lock.
In one embodiment, acquiring the GPU computing request triggered by the target task from the request queue according to a preset request acquisition mode includes:
acquiring the GPU computing request triggered by the target task from the request queue according to timing information of the GPU computing requests corresponding to the tasks, the timing information indicating the time at which a task's GPU computing request was added to the request queue; or
acquiring the GPU computing request triggered by the target task from the request queue according to priority information of the GPU computing requests corresponding to the tasks.
In a second aspect, the present application further provides a resource allocation apparatus, including:
a first migration module, configured to migrate video memory data of a target task from a target storage space to a video memory pool in response to a GPU computing request triggered by the target task, the video memory pool comprising the video memory corresponding to the GPU; and
a second migration module, configured to migrate the video memory data of the target task from the video memory pool back to the target storage space after the target task finishes executing, so as to release the GPU video memory resources occupied by the target task.
In a third aspect, the present application further provides a computer device. The computer device comprises a memory and a processor, the memory stores a computer program, and the processor, when executing the computer program, implements the following steps:
in response to a GPU computing request triggered by a target task, migrating video memory data of the target task from a target storage space to a video memory pool, the video memory pool comprising the video memory corresponding to the GPU; and
after the target task finishes executing, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps:
in response to a GPU computing request triggered by a target task, migrating video memory data of the target task from a target storage space to a video memory pool, the video memory pool comprising the video memory corresponding to the GPU; and
after the target task finishes executing, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task.
In a fifth aspect, the application further provides a computer program product comprising a computer program which, when executed by a processor, implements the following steps:
in response to a GPU computing request triggered by a target task, migrating video memory data of the target task from a target storage space to a video memory pool, the video memory pool comprising the video memory corresponding to the GPU; and
after the target task finishes executing, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task.
With the resource allocation method, apparatus, computer device and storage medium described above, the video memory data of the target task is migrated from the target storage space to the video memory pool in response to the GPU computing request triggered by the target task, the video memory pool comprising the video memory corresponding to the GPU; then, after the target task finishes executing, the video memory data of the target task is migrated from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task. Compared with the prior art, in which a task still occupies the GPU's video memory resources after its computation completes, maintaining a video memory pool into and out of which task video memory data can migrate allows the GPU video memory resources to be released as soon as task computation completes, thereby improving the resource utilization of the GPU.
Drawings
FIG. 1 is an application scenario diagram of a resource allocation method in one embodiment;
FIG. 2 is a flow chart of a method of resource allocation in one embodiment;
FIG. 3 is a flow chart of migrating memory data into a memory pool according to one embodiment;
FIG. 4 is a flow diagram of a request to acquire a global resource lock in one embodiment;
FIG. 5 is a flow chart of migrating the memory data out of the memory pool according to one embodiment;
FIG. 6 is a flow diagram of a request for selecting a GPU computation in one embodiment;
FIG. 7 is a flow chart of a method for allocating resources according to another embodiment;
FIG. 8 is a block diagram of a resource allocation device in one embodiment;
FIG. 9 is a block diagram showing the construction of a resource allocation apparatus according to still another embodiment;
FIG. 10 is a block diagram showing the construction of a resource allocation apparatus according to still another embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The resource allocation method provided by the embodiments of the present application can be applied to the application scenario shown in fig. 1. A system process responds to a task-triggered GPU computing request by accessing a global resource lock interface, designed and packaged in a programming language such as C, C++ or Python, to apply for the global resource lock. The GPU's physical video memory stores the graphics information, i.e., the rendering data that has been or is about to be processed by the graphics chip; a video memory pool for the GPU is maintained in the physical address space of the GPU video memory, and the pool supports migrating video memory data in and out. The memory temporarily holds the operating data of the CPU and the data it exchanges with external storage such as a hard disk, acting as a bridge to the CPU. A disk is a storage device that records data using magnetic recording technology.
Specifically, the system process responds to a GPU computing request triggered by a target task by accessing a preset global resource lock interface to acquire the global resource lock and lock the resources; the video memory data of the target task is then migrated from the disk or memory space into the video memory pool. After the target task's computation completes, the global resource lock interface is accessed again to release the global resource lock, and the video memory data of the target task is migrated from the video memory pool back to its original storage space, i.e., the disk or memory space, so as to release the GPU video memory resources occupied by the target task. In this way, once the target task's computation completes, the system process actively releases the global resource lock, the video memory data of the target task is migrated out of the GPU's video memory pool, and the overall utilization of GPU resources is effectively improved.
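As a rough illustration only (the class and method names below are assumptions of this sketch, not the patent's actual interface), such a packaged global resource lock interface might be wrapped in Python as follows:

```python
import threading

class GlobalResourceLockInterface:
    """Hypothetical packaged global resource lock interface; the names are
    illustrative assumptions, not a real vendor or patent API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def request_lock(self, timeout: float = -1) -> bool:
        # Send a resource lock acquisition request; timeout=-1 waits
        # indefinitely, mirroring threading.Lock.acquire semantics.
        return self._lock.acquire(timeout=timeout)

    def release_lock(self) -> None:
        # Accessed again after the task's computation completes.
        self._lock.release()
```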
In one embodiment, as shown in fig. 2, there is provided a resource allocation method, including the steps of:
s201, in response to a GPU computing request triggered by a target task, migrating video memory data of the target task from a target storage space to a video memory pool.
The target task is the currently running task; it may in particular be a periodic task, such as an online training task for an AI algorithm or an application development task.
When the target task needs the GPU for computation, it triggers a corresponding GPU computing request. Accordingly, the system process responds to the GPU computing request by migrating the video memory data of the target task from the target storage space to the video memory pool. The video memory pool comprises the video memory corresponding to the GPU, and the target storage space may be a disk or a memory space.
Once the video memory data of the target task has been migrated into the GPU's video memory pool, the target task can be computed with GPU computing resources, such as the GPU's cores.
S202, after the target task finishes executing, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task.
It will be appreciated that while a periodic task is running, the time during which it actually uses GPU computing resources may be short, yet at all other times the GPU's video memory remains occupied by the task's video memory data. Therefore, after the target task's computation completes, its video memory data needs to be actively migrated out of the GPU's video memory pool to release the video memory resources it occupies, so that other tasks' video memory data can be migrated into the pool and computation can continue on the GPU's computing resources.
Compared with the prior art, in which a task still occupies the GPU's video memory resources after its computation completes, this scheme migrates the video memory data of the target task from the target storage space to the video memory pool in response to the GPU computing request triggered by the target task, the video memory pool comprising the video memory corresponding to the GPU; then, after the target task finishes executing, it migrates the video memory data back from the video memory pool to the target storage space, releasing the GPU video memory resources the task occupied. By maintaining a video memory pool into and out of which task video memory data can migrate, GPU video memory resources are released as soon as task computation completes, improving the GPU's resource utilization.
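To make the two-step flow concrete, the following is a minimal, simulated sketch: the pool, the copy helpers and the byte-level bookkeeping are assumptions for illustration, standing in for a vendor API such as CUDA's memory copies.

```python
def copy_host_to_device(host_data: bytes) -> bytearray:
    # Stand-in for a host-to-device copy; a bytearray simulates the GPU buffer.
    return bytearray(host_data)

def copy_device_to_host(buf: bytearray) -> bytes:
    # Stand-in for the reverse device-to-host copy.
    return bytes(buf)

class VideoMemoryPool:
    """Video memory pool maintained in the GPU's physical video memory."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.resident = {}  # task_id -> simulated device buffer

    def swap_in(self, task_id: str, host_data: bytes) -> None:
        # S201: migrate the task's video memory data into the pool.
        if self.used + len(host_data) > self.capacity:
            raise MemoryError("video memory pool exhausted")
        self.resident[task_id] = copy_host_to_device(host_data)
        self.used += len(host_data)

    def swap_out(self, task_id: str) -> bytes:
        # S202: migrate the data back out, releasing the GPU video memory.
        buf = self.resident.pop(task_id)
        self.used -= len(buf)
        return copy_device_to_host(buf)
```

A task is then processed as swap_in (S201), compute, swap_out (S202), so the pool's occupancy reflects only tasks that are actually computing.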
Because the video memory pool corresponding to the GPU cannot itself sense when GPU computing resources need to be used for computation, in one embodiment a global resource lock corresponding to the GPU may be designed, and the global resource lock is used to indicate when to trigger the migration of video memory data; S201 is refined accordingly. As shown in fig. 3, the method specifically includes the following steps:
s301, responding to a GPU computing request triggered by a target task, and providing a global resource lock for the target task.
In a multitasking operating system, multiple concurrently running tasks may all need the same resources, such as GPU video memory resources and GPU computing resources. A global resource lock, including but not limited to a mutex lock, a condition lock, or a spin lock, may be used to control access by the threads corresponding to the tasks to the GPU computing resources.
With a mutex lock, if one thread already holds the lock, a waiting thread blocks and is woken up when the state of the lock changes. With a condition lock, a thread that finds the lock unavailable waits until another thread wakes it to retry. With a spin lock, a thread that fails to acquire the lock keeps retrying the locking operation. It will be appreciated that mutex, condition and spin locks are merely optional global resource lock types in this embodiment and do not limit the global resource lock.
In an alternative embodiment, a resource lock acquisition request sent by the target task through a resource lock request interface is received, and the global resource lock is provided to the target task according to the resource lock acquisition request.
That is, the system process responds to the GPU computing request corresponding to the target task by accessing the preset global resource lock interface and sending a resource lock acquisition request to acquire the global resource lock. Correspondingly, the global resource lock is provided to the target task according to the resource lock acquisition request. The global resource lock interface may be designed and packaged in a programming language such as C, C++ or Python.
S302, when the target task has acquired the global resource lock, migrating the video memory data of the target task from the target storage space to the video memory pool, and locking, with the global resource lock, the GPU video memory resources allocated to the target task.
The step of migrating the video memory data of the target task is triggered only when the global resource lock is acquired successfully. Specifically, the video memory data of the target task is migrated from the target storage space, such as a disk or memory space, to the video memory pool corresponding to the GPU.
Furthermore, GPU computing resources are allocated to the target task, and the allocated GPU computing resources are locked with the global resource lock. For ease of description, the amount of GPU computing resources is quantified below as a percentage. For example, if 20% of the GPU computing resources are allocated to the target task and locked, other parallel tasks in the system cannot use that 20%, so that tasks hold GPU computing resources exclusively on a time-shared basis.
Locking the GPU video memory resources with the global resource lock may be implemented by locking the thread corresponding to the target task. The shared resources, namely the GPU video memory resources and the GPU computing resources, can then be accessed by only one thread at a time, which avoids the data disorder caused by simultaneous access from multiple threads.
In an alternative embodiment, when the target task does not acquire the global resource lock, the GPU computing request triggered by the target task is returned to the request queue.
In other words, if the acquisition is unsuccessful, the GPU computing request triggered by the target task goes back to the request queue. Specifically, when the global resource lock is not acquired, the step of migrating the target task's video memory data is not triggered; instead, the GPU computing request triggered by the target task is returned to the request queue to await another attempt to acquire the global resource lock. The request queue includes the GPU computing requests of the tasks that need to compute with GPU computing resources.
It can be understood that a packaged global resource lock interface is preset, so that the thread corresponding to the target task can directly call the interface to trigger the swap-in of video memory into the video memory pool. If the global resource lock is acquired successfully, the swap-in is triggered; if not, the thread keeps requesting the lock until it succeeds. Using the global resource lock to indicate when to migrate video memory data effectively improves the reliability of resource allocation and further safeguards the GPU's resource utilization.
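The acquire-or-requeue behaviour can be pictured with a short sketch; the request layout (a dict with a task id, host data and a compute callback) is an assumption for illustration, and Python's standard threading lock stands in for the packaged interface.

```python
import queue
import threading

global_resource_lock = threading.Lock()  # stands in for the packaged interface

def handle_request(request: dict, request_queue: queue.Queue, pool) -> None:
    # Non-blocking attempt: on failure the request goes back to the queue.
    if global_resource_lock.acquire(blocking=False):
        try:
            pool.swap_in(request["task_id"], request["host_data"])  # swap-in
            request["compute"]()  # compute with the locked GPU resources
        finally:
            global_resource_lock.release()
            # the swap-out of S202 would follow the release here
    else:
        request_queue.put(request)  # wait for another chance to take the lock
```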
Because the threads corresponding to the tasks in the system run in parallel, multiple tasks may need to compute with GPU computing resources at the same time. Building on the above, in one embodiment the global resource lock may be requested according to the amount of GPU computing resources required by the target task's computation; S301 is refined accordingly. As shown in fig. 4, the method specifically includes the following steps:
s401, responding to a GPU computing request triggered by a target task, and determining the amount of GPU computing resources required by target task computing.
The amount of GPU computing resources required by the target task's computation may be carried in the GPU computing request; its size depends on the complexity of the target task. For example, for an online training task for an AI algorithm, the amount of GPU computing resources required depends on the size of the model, the number of training samples, the desired training duration, and so on.
S402, according to the amount of GPU computing resources required by the target task's computation, accessing a preset global resource lock interface to request the global resource lock.
The global resource lock is requested according to the amount of GPU computing resources required by the target task's computation. If that amount satisfies the condition for granting the global resource lock, the migration of the target task's video memory data into the GPU's video memory pool is accepted, and GPU computing resources are then allocated to the target task. If it does not satisfy the condition, the migration is refused; otherwise, although the GPU lacks the capacity to compute the target task, the task's video memory data would still be moved into the video memory pool, uselessly occupying video memory while GPU computing resources cannot be allocated to other tasks.
In an alternative embodiment, a global resource lock may be provided to the target task in the event that the amount of GPU computing resources required for the target task computation is not greater than the amount of idle computing resources of the GPU; and prohibiting the global resource lock from being provided for the target task under the condition that the amount of GPU computing resources required for computing the target task is larger than the amount of idle computing resources.
In other words, if the amount of GPU computing resources required for the target task computation is not greater than the amount of idle computing resources of the GPU, the global resource lock is successfully acquired; if the amount of GPU computing resources required by target task computing is greater than the amount of idle computing resources of the GPU, the global resource lock acquisition is unsuccessful.
The amount of idle computing resources of the GPU is the amount of GPU computing resources that are not locked. For example, suppose the target task's computation requires 20% of the GPU's total computing resources. If the GPU's idle computing resources amount to 30% of the total, the target task can be computed and the global resource lock is acquired successfully; if they amount to only 10%, the remaining resources cannot complete the target task's computation and the lock acquisition fails.
It will be appreciated that the threads corresponding to the tasks in the system run in parallel, so multiple tasks may compute with GPU computing resources at the same time. For example, task A computes with 20% of the GPU computing resources, task B with 35%, and task C with 40%; the three computations may proceed simultaneously. For each task, once its computation completes, the GPU video memory resources it occupies are released promptly, so that other waiting tasks can migrate their video memory data into the video memory pool and compute with the computing resources allocated to them.
By requesting the global resource lock according to the amount of GPU computing resources required by the target task's computation, multi-task parallel computation can be achieved without conflict among the GPU video memory resources and GPU computing resources actually used by each task, which ensures that resources are allocated reasonably and improves the resource utilization of the GPU.
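As a sketch of the percentage-based admission check (the single locked-percent counter is a simplification assumed here, not the patent's concrete bookkeeping):

```python
import threading

class GpuComputeLedger:
    """Tracks what share of the GPU's computing resources is locked."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.locked_percent = 0  # sum of shares locked by running tasks

    def try_grant(self, required_percent: int) -> bool:
        with self._lock:
            idle = 100 - self.locked_percent  # amount of idle computing resources
            if required_percent > idle:
                return False  # prohibit providing the global resource lock
            self.locked_percent += required_percent
            return True

    def release(self, required_percent: int) -> None:
        with self._lock:
            self.locked_percent -= required_percent

# e.g. tasks A (20%), B (35%) and C (40%) can all be admitted, since
# 20 + 35 + 40 = 95 <= 100, and they may then compute in parallel.
```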
Because the video memory pool corresponding to the GPU cannot sense when a task's computation is complete, in one embodiment, building on the above, the global resource lock interface may be accessed again to trigger the migration of the video memory data out of the pool; S202 is refined accordingly. As shown in fig. 5, the method specifically includes the following steps:
s501, accessing a preset global resource lock interface to release a global resource lock.
The step of releasing the global resource lock is triggered only when the target task's computation is complete. Specifically, after the target task finishes computing, the thread corresponding to the target task accesses the preset global resource lock interface again to release the global resource lock. The global resource lock interface may be designed and packaged in a programming language such as C, C++ or Python.
S502, when the target task has released the global resource lock, migrating the video memory data of the target task from the video memory pool back to the target storage space.
The target storage space is the original storage space, such as a disk or a memory space. For example, if the video memory data of the target task was migrated from disk space into the video memory pool in response to the GPU computing request triggered by the target task, then when the global resource lock is released the data is migrated back to the original disk space. Correspondingly, if the video memory data was migrated from a memory space into the video memory pool, it is migrated back to the original memory space when the global resource lock is released.
It can be understood that a packaged global resource lock interface is preset, so that the thread corresponding to the target task can directly call the interface to trigger the swap-out of video memory from the video memory pool. Once the target task's computation completes, the swap-out is triggered. Using the global resource lock to indicate when to migrate video memory data effectively improves the reliability of resource allocation and further safeguards the GPU's resource utilization.
In order to allocate GPU computing resources in an orderly way to the tasks awaiting computation in the system, in one embodiment the GPU computing request triggered by the target task may be determined from a request queue; the resource allocation method above is refined accordingly. As shown in fig. 6, the method specifically includes the following steps:
s601, according to a preset request acquisition mode, acquiring a GPU computing request triggered by a target task from a request queue.
In response to GPU computing requests triggered by one or more tasks, each request is added to a request queue. The request queue contains GPU computing requests triggered by different tasks.
Because the threads corresponding to the tasks in the system run in parallel, several task-triggered GPU computing requests may exist at the same time; each of them is added to the request queue. In other words, the request queue includes at least one task-triggered GPU computing request.
Specifically, the request queue may be implemented with a FIFO (First In First Out) memory. As a buffering link in the system, a FIFO can buffer a continuous data stream to prevent data loss during input and storage; it can also batch data for input and storage, avoiding frequent bus operations and reducing the load on the CPU and/or the GPU; and it allows the system to perform DMA (Direct Memory Access) operations, increasing the data transfer speed.
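As a sketch, the queue itself can be as simple as a thread-safe FIFO; Python's queue.Queue already provides first-in-first-out ordering and blocking reads, and is assumed here to be an acceptable stand-in for the FIFO memory described above.

```python
import queue

request_queue: queue.Queue = queue.Queue()  # requests leave in arrival order

# Each task thread enqueues its GPU computing request.
request_queue.put({"task": "A", "required_percent": 20})
request_queue.put({"task": "B", "required_percent": 35})

next_request = request_queue.get()  # the scheduler pops task A's request first
```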
S602, determining the amount of GPU computing resources required by the target task's computation.
A task's GPU computing request is selected from the request queue; that task is taken as the target task and its request as the GPU computing request triggered by the target task, and the amount of GPU computing resources required by the computation is then determined.
In an alternative embodiment, the GPU computing request triggered by the target task may be obtained from the request queue according to the timing information of the GPU computing request corresponding to each task; or according to the priority information of the GPU calculation requests corresponding to the tasks, acquiring the GPU calculation requests triggered by the target tasks from the request queue.
For example, the request queue includes a GPU computing request triggered by task a, a GPU computing request triggered by task B, and a GPU computing request triggered by task C.
The timing information indicates the time at which the GPU computing request corresponding to a task was added to the request queue. If task A's GPU computing request was added to the request queue earlier than task B's, and task B's earlier than task C's, then the GPU computing request triggered by task A is selected from the request queue as the GPU computing request triggered by the target task.
Further, when the request queue still contains the GPU computing requests triggered by tasks B and C, the GPU computing request triggered by task B is selected next, as the new GPU computing request triggered by the target task.
If the priority of the GPU computing request corresponding to task C is higher than that of task B, and task B's is higher than task A's, then the GPU computing request triggered by task C is selected from the request queue as the GPU computing request triggered by the target task.
Further, when the request queue still contains the GPU computing requests triggered by tasks A and B, the GPU computing request triggered by task B is selected next, as the new GPU computing request triggered by the target task.
By setting up a request queue for the GPU computing requests and determining the GPU computing request triggered by the target task from it in turn, GPU computing resources are allocated in an orderly way to the tasks awaiting computation in the system, which effectively improves the reliability of resource allocation and further safeguards the GPU's resource utilization.
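Both acquisition modes can be sketched with a single heap: a timestamp key reproduces the timing-based order, while a priority key reproduces the priority-based order. The tuple layout below is an illustrative assumption.

```python
import heapq
import itertools
import time

_counter = itertools.count()  # tie-breaker keeping equal keys in insertion order
_heap = []

def enqueue(request, priority=None):
    # Timing mode: the key is the enqueue time; priority mode: the priority.
    key = priority if priority is not None else time.monotonic()
    heapq.heappush(_heap, (key, next(_counter), request))

def dequeue():
    _, _, request = heapq.heappop(_heap)  # smallest key = earliest or highest
    return request

enqueue({"task": "C"}, priority=0)  # highest priority (smallest value)
enqueue({"task": "B"}, priority=1)
enqueue({"task": "A"}, priority=2)
assert dequeue()["task"] == "C"
```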
In one embodiment, as shown in FIG. 7, an alternative example of a resource allocation method is provided. The specific process is as follows:
s701, responding to at least one GPU computing request triggered by a task, and adding the at least one GPU computing request triggered by the task to a request queue.
S702, selecting the GPU computing request triggered by a target task from the request queue, and determining the amount of GPU computing resources required by the target task's computation.
Optionally, according to the time sequence information of the GPU computing request corresponding to each task, acquiring the GPU computing request triggered by the target task from a request queue; or according to the priority information of the GPU calculation requests corresponding to the tasks, acquiring the GPU calculation requests triggered by the target tasks from the request queue.
The timing information indicates the time at which the GPU computing request corresponding to a task was added to the request queue.
S703, according to the amount of GPU computing resources required by the target task's computation, accessing the preset global resource lock interface to request the global resource lock.
S704, judging whether the amount of GPU computing resources required by the target task's computation is larger than the amount of idle computing resources of the GPU.
The amount of idle computing resources of the GPU is the amount of GPU computing resources that are not locked.
If not, execute S705 and then continue with S707; if so, execute S706 and return to S702.
And S705, migrating the video memory data of the target task from the target storage space to the video memory pool, and locking the GPU video memory resources allocated to the target task by adopting a global resource lock.
The video memory pool comprises video memories corresponding to the GPU.
S706, returning the GPU computing request triggered by the target task to the request queue.
S707, after the target task finishes computing, accessing the preset global resource lock interface to release the global resource lock.
S708, when the global resource lock has been released, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release the GPU video memory resources occupied by the target task.
For the specific processes of S701 to S708, reference may be made to the descriptions in the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
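Tying fig. 7 together: one possible reading, which is an assumption of this sketch, is that the global resource lock guards the allocation bookkeeping rather than the computation itself, so that tasks such as A, B and C above can still compute in parallel. Reusing the VideoMemoryPool and GpuComputeLedger sketches and assuming a request object with the fields used below, a scheduler step might read:

```python
def scheduler_step(request_queue, ledger, pool, alloc_lock):
    req = request_queue.get()                             # S702: select a request
    with alloc_lock:                                      # S703: global resource lock
        granted = ledger.try_grant(req.required_percent)  # S704: admission check
        if granted:
            pool.swap_in(req.task_id, req.host_data)      # S705: migrate into pool
    if not granted:
        request_queue.put(req)                            # S706: back to the queue
        return
    req.compute()                                         # tasks may run in parallel
    with alloc_lock:                                      # S707: release phase
        req.save(pool.swap_out(req.task_id))              # S708: free video memory
        ledger.release(req.required_percent)
```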
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include several sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with at least part of the other steps, sub-steps, or stages.
Based on the same inventive concept, an embodiment of the application further provides a resource allocation apparatus for implementing the resource allocation method described above. The implementation of the solution provided by the apparatus is similar to the implementation described in the method above, so for the specific limitations in the one or more resource allocation apparatus embodiments provided below, reference may be made to the limitations of the resource allocation method above, which are not repeated here.
In one embodiment, as shown in fig. 8, there is provided a resource allocation apparatus 1, including a first migration module 10 and a second migration module 20, wherein:
the first migration module is configured to migrate the video memory data of the target task from the target storage space to the video memory pool in response to the GPU computing request triggered by the target task.
The video memory pool comprises the video memory corresponding to the GPU.
The second migration module is configured to migrate the video memory data of the target task from the video memory pool back to the target storage space after the target task finishes executing, so as to release the GPU video memory resources occupied by the target task.
In one embodiment, on the basis of fig. 8, as shown in fig. 9, the first migration module 10 may include:
a request unit 11, configured to provide a global resource lock to a target task in response to a GPU computing request triggered by the target task.
The first migration unit 12 is configured to: when the target task acquires the global resource lock, migrate the video memory data of the target task from the target storage space to the video memory pool, and lock, with the global resource lock, the GPU video memory resources allocated to the target task; and, when the acquisition is unsuccessful, return the GPU computing request triggered by the target task to the request queue.
In one embodiment, the request unit 11 may include:
and the determining subunit is used for determining the amount of GPU computing resources required by target task computing in response to the GPU computing request triggered by the target task.
And the request subunit is used for accessing a preset global resource lock interface according to the GPU computing resource quantity required by target task computing so as to request to acquire the global resource lock.
Specifically, if the amount of GPU computing resources required by the target task's computation is not greater than the GPU's amount of idle computing resources, the global resource lock is acquired successfully; if it is greater, the acquisition fails. The amount of idle computing resources of the GPU is the amount of GPU computing resources that are not locked.
In one embodiment, on the basis of fig. 8, as shown in fig. 10, the second migration module 20 may include:
and a releasing unit 21, configured to access a preset global resource lock interface to release the global resource lock.
And the second migration unit 22 is configured to migrate the video memory data of the target task from the video memory pool to the target storage space under the condition that the global resource lock is released.
In one embodiment, the resource allocation apparatus 1 may further include:
the acquisition module is used for acquiring the GPU computing request triggered by the target task from the request queue according to a preset request acquisition mode.
The request queue comprises GPU computing requests triggered by different tasks.
In one embodiment, the acquiring module is specifically configured to:
according to the time sequence information of the GPU computing requests corresponding to the tasks, obtaining GPU computing requests triggered by target tasks from a request queue; or according to the priority information of the GPU calculation requests corresponding to the tasks, acquiring the GPU calculation requests triggered by the target tasks from the request queue.
The timing information indicates the time at which the GPU computing request corresponding to a task was added to the request queue.
The respective modules in the above-described resource allocation apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data such as task names. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a resource allocation method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory having a computer program stored therein and a processor that when executing the computer program performs the steps of a method of resource allocation.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of a resource allocation method.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, implements the steps of a resource allocation method.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by instructing the relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The foregoing embodiments illustrate only a few implementations of the present application; although they are described in specific detail, they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications may be made by those of ordinary skill in the art without departing from the spirit of the application, and all of them fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.
Claims (13)
1. A method for resource allocation, comprising:
in response to a GPU computing request triggered by a target task, migrating video memory data of the target task from a target storage space to a video memory pool, wherein the video memory pool comprises a video memory corresponding to a GPU; and
after the target task finishes executing, migrating the video memory data of the target task from the video memory pool back to the target storage space, so as to release GPU video memory resources occupied by the target task.
2. The method of claim 1, wherein migrating the video memory data of the target task from the target storage space to the video memory pool in response to the GPU computing request triggered by the target task comprises:
in response to the GPU computing request triggered by the target task, providing a global resource lock to the target task; and
when the target task has acquired the global resource lock, migrating the video memory data of the target task from the target storage space to the video memory pool, and locking, with the global resource lock, GPU video memory resources allocated to the target task.
3. The method of claim 2, wherein providing the global resource lock to the target task comprises:
providing the global resource lock to the target task when an amount of GPU computing resources required by the computation of the target task is not greater than an amount of idle computing resources of the GPU, wherein the amount of idle computing resources is an amount of GPU computing resources that are not locked.
4. The method of claim 3, wherein the method further comprises:
prohibiting provision of the global resource lock to the target task when the amount of GPU computing resources required by the computation of the target task is greater than the amount of idle computing resources.
5. The method of claim 2, wherein providing the global resource lock to the target task comprises:
receiving, through a resource lock request interface, a resource lock acquisition request sent by the target task; and
providing the global resource lock to the target task according to the resource lock acquisition request.
6. The method of claim 2, wherein migrating the video memory data of the target task from the video memory pool to the target storage space comprises:
when the target task has released the global resource lock, migrating the video memory data of the target task from the video memory pool back to the target storage space.
7. The method of claim 2, wherein, before migrating the video memory data of the target task from the target storage space to the video memory pool in response to the GPU computing request triggered by the target task, the method further comprises:
acquiring, according to a preset request acquisition mode, the GPU computing request triggered by the target task from a request queue,
wherein the request queue comprises GPU computing requests triggered by different tasks.
8. The method of claim 7, wherein the method further comprises:
returning the GPU computing request triggered by the target task to the request queue when the target task does not acquire the global resource lock.
9. The method of claim 7, wherein acquiring the GPU computing request triggered by the target task from the request queue according to the preset request acquisition mode comprises:
acquiring the GPU computing request triggered by the target task from the request queue according to timing information of the GPU computing requests corresponding to the tasks, wherein the timing information indicates the time at which the GPU computing request corresponding to a task was added to the request queue; or
acquiring the GPU computing request triggered by the target task from the request queue according to priority information of the GPU computing requests corresponding to the tasks.
10. A resource allocation apparatus, comprising:
a first migration module, configured to migrate video memory data of a target task from a target storage space to a video memory pool in response to a GPU computing request triggered by the target task, wherein the video memory pool comprises a video memory corresponding to a GPU; and
a second migration module, configured to migrate the video memory data of the target task from the video memory pool back to the target storage space after the target task finishes executing, so as to release GPU video memory resources occupied by the target task.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310836274.3A CN116893899A (en) | 2023-07-07 | 2023-07-07 | Resource allocation method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310836274.3A CN116893899A (en) | 2023-07-07 | 2023-07-07 | Resource allocation method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116893899A true CN116893899A (en) | 2023-10-17 |
Family
ID=88312985
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310836274.3A Pending CN116893899A (en) | 2023-07-07 | 2023-07-07 | Resource allocation method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116893899A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117455750A (en) * | 2023-12-26 | 2024-01-26 | 芯瞳半导体技术(山东)有限公司 | Video memory management method, device, system, medium and equipment |
CN117455750B (en) * | 2023-12-26 | 2024-04-02 | 芯瞳半导体技术(山东)有限公司 | Video memory management method, device, system, medium and equipment |
CN117593172A (en) * | 2024-01-16 | 2024-02-23 | 北京趋动智能科技有限公司 | Process management method, device, medium and equipment |
CN117593172B (en) * | 2024-01-16 | 2024-04-23 | 北京趋动智能科技有限公司 | Process management method, device, medium and equipment |
CN118132273A (en) * | 2024-04-29 | 2024-06-04 | 阿里云计算有限公司 | Data processing method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220027210A1 (en) | Work Stealing in Heterogeneous Computing Systems | |
US10990561B2 (en) | Parameter server and method for sharing distributed deep learning parameter using the same | |
CN116893899A (en) | Resource allocation method, device, computer equipment and storage medium | |
US8904068B2 (en) | Virtual memory structure for coprocessors having memory allocation limitations | |
TWI531974B (en) | Method and system for managing nested execution streams | |
US9430388B2 (en) | Scheduler, multi-core processor system, and scheduling method | |
TWI488118B (en) | Signaling, ordering, and execution of dynamically generated tasks in a processing system | |
WO2017131187A1 (en) | Accelerator control device, accelerator control method and program | |
TW201413456A (en) | Method and system for processing nested stream events | |
JP2006515690A (en) | Data processing system having a plurality of processors, task scheduler for a data processing system having a plurality of processors, and a corresponding method of task scheduling | |
CN110209493B (en) | Memory management method, device, electronic equipment and storage medium | |
US9507633B2 (en) | Scheduling method and system | |
US10860352B2 (en) | Host system and method for managing data consumption rate in a virtual data processing environment | |
US10795722B2 (en) | Compute task state encapsulation | |
US8566532B2 (en) | Management of multipurpose command queues in a multilevel cache hierarchy | |
WO2022121866A1 (en) | Acceleration card-based service running method, apparatus, electronic device, and computer-readable storage medium | |
WO2019028682A1 (en) | Multi-system shared memory management method and device | |
CN113485832B (en) | Method and device for carrying out distribution management on physical memory pool and physical memory pool | |
CN116996449A (en) | Data message processing system, method, computer device and storage medium | |
CN114237505A (en) | Batch processing method and device of business data and computer equipment | |
CN117632516A (en) | Resource allocation method and device and computer equipment | |
US20180373573A1 (en) | Lock manager | |
CN109491785B (en) | Memory access scheduling method, device and equipment | |
EP4432087A1 (en) | Lock management method, apparatus and system | |
CN116010093A (en) | Data processing method, apparatus, computer device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |