CN113268356B - LINUX system-based multi-GPU board binding system, method and medium - Google Patents

LINUX system-based multi-GPU board binding system, method and medium

Info

Publication number
CN113268356B
CN113268356B CN202110821406.6A
Authority
CN
China
Prior art keywords
gpu
master
board card
gpu board
gbound
Prior art date
Legal status
Active
Application number
CN202110821406.6A
Other languages
Chinese (zh)
Other versions
CN113268356A (en)
Inventor
王世凯
陈伟
张凡路
兰琦
冯立彬
Current Assignee
Xi'an Xintong Semiconductor Technology Co ltd
Original Assignee
Xi'an Xintong Semiconductor Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xi'an Xintong Semiconductor Technology Co ltd filed Critical Xi'an Xintong Semiconductor Technology Co ltd
Priority to CN202110821406.6A priority Critical patent/CN113268356B/en
Publication of CN113268356A publication Critical patent/CN113268356A/en
Application granted granted Critical
Publication of CN113268356B publication Critical patent/CN113268356B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining

Abstract

The invention discloses a system, a method and a medium for binding multiple GPU boards based on the LINUX system. The system comprises a gbound kernel driver and a gbound_slave user tool; the gbound kernel driver comprises a gbound module and a scheduling module. The gbound module is configured to create a virtual GPU Master in response to a user request and to add a GPU board to the board list managed by a target GPU Master. The scheduling module is configured to allocate, at process granularity, a target GPU board from the boards managed by the target GPU Master to a processing task of an application program, and to issue the processing task through the gbound_slave user tool to the target GPU board for execution according to the board list managed by the target GPU Master.

Description

LINUX system-based multi-GPU board binding system, method and medium
Technical Field
The embodiments of the present invention relate to the technical field of computer hardware, and in particular to a system, a method and a medium for binding multiple GPU (Graphics Processing Unit) boards based on the LINUX system.
Background
Owing to its massive computing power, the Graphics Processing Unit (GPU) plays an important role in fields that demand high-performance computing, such as image and video processing, physics, bioscience, chemistry, and artificial intelligence. At present, multiple GPU boards are often used together to complete increasingly complex graphics processing and general-purpose computing tasks while reducing computing time. It is therefore necessary to provide a scheme for managing multiple GPU boards and allocating their resources.
Disclosure of Invention
In view of this, embodiments of the present invention provide a system, a method and a medium for binding multiple GPU boards based on the LINUX system. The scheme allows multiple GPU boards to process tasks concurrently while the boards and their resources are managed flexibly, improving thread concurrency and computing performance as far as possible and achieving high throughput.
The technical scheme of the embodiment of the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a LINUX system-based multi-GPU board binding system, the system comprising: a gbound kernel driver and a gbound_slave user tool; the gbound kernel driver comprises a gbound module and a scheduling module; the gbound module is configured to create a virtual GPU Master based on a user request and to add a GPU board to be added to the GPU board list managed by a target GPU Master;
the scheduling module is configured to allocate, at process granularity, a target GPU board from the GPU boards managed by the target GPU Master to a processing task of an application program, and to issue the processing task through the gbound_slave user tool to the target GPU board for execution according to the GPU board list managed by the target GPU Master.
In a second aspect, an embodiment of the present invention provides a LINUX system-based multi-GPU board binding method, applied to the LINUX system-based multi-GPU board binding system of the first aspect, the method comprising:
creating, through the gbound module, a virtual GPU Master based on a user request and adding a GPU board to be added to the GPU board list managed by a target GPU Master;
allocating, through the scheduling module and at process granularity, a target GPU board from the GPU boards managed by the target GPU Master to a processing task of an application program; and issuing the processing task through the gbound_slave user tool to the target GPU board for execution according to the GPU board list managed by the target GPU Master.
In a third aspect, an embodiment of the present invention provides a computer storage medium storing a LINUX system-based multi-GPU board binding program which, when executed by at least one processor, implements the steps of the LINUX system-based multi-GPU board binding method of the second aspect.
The embodiments of the present invention provide a system, a method and a medium for binding multiple GPU boards based on the LINUX system. By adding a gbound kernel driver and a gbound_slave user tool to the LINUX system, a user is assisted in flexibly managing the currently bound GPU devices, improving thread concurrency as much as possible and achieving high throughput. In addition, the scheme lets a user flexibly and dynamically adjust the usage scenario of the current multi-GPU setup instead of being limited to a fixed binding, so a GPU board can be repurposed as required. Finally, GPU board binding at process granularity ensures that storage and task instructions execute on the same GPU, avoiding the NUMA problem and improving computing performance.
Drawings
Fig. 1 is a system framework diagram of a computing device to which the technical solution of the embodiment of the present invention can be applied.
FIG. 2 is a block diagram of an example implementation of the software module and the slave system provided by an embodiment of the present invention.
Fig. 3 is a schematic flowchart of task allocation according to an embodiment of the present invention.
Fig. 4 is a schematic flowchart of a LINUX system-based multi-GPU board binding method according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
At present, when multiple GPU boards are used to execute specific computing or processing tasks, the conventional scheme usually manages their data through a mapping between Ringbuffer page space and the boards. It does not, however, provide an overall management scheme for multiple GPU boards: the configuration cannot be adjusted flexibly for a given service scenario, which limits the applicability of multi-GPU setups. For example, once a binding of three GPU boards has been set up during normal operation, the user is forced to occupy all three boards; to remove one of them for another purpose, the machine must be restarted so the binding configuration can be re-initialized, and the binding cannot be adjusted dynamically according to service requirements. In addition, the conventional scheme does not consider matching storage allocation to task instructions, so the storage space and the task instructions may end up on different GPUs; a GPU then has to access the stored data across the device bus, producing the Non-Uniform Memory Access (NUMA) problem and seriously degrading computing performance. Based on this, embodiments of the present invention provide a dynamic management scheme for multiple GPU boards in a LINUX system environment, in which multiple GPU boards are bound into one logical GPU board for the user, also referred to as multi-GPU binding, so that the usage scenario of the current boards can be adjusted flexibly and dynamically; in addition, the scheme ensures that storage and task instructions reside on the same GPU.
Referring to fig. 1, which shows a schematic diagram of a system framework 100 of a computing device to which the technical solution of the embodiments of the present invention can be applied. As shown in fig. 1, the framework 100 may include a host system 1 and a slave system 2, connected through a BUS 3 that supports data transmission and access between the two. The host system 1 may include a hardware module 11, such as a CPU, host memory, and Direct Memory Access (DMA) hardware, and a software module 12 executed on that hardware. The slave system 2 may include a plurality of GPU boards, represented in fig. 1 as GPU 1, GPU 2, ..., GPU N. The software module 12 in the host system 1 may include a plurality of applications (for example, M applications, denoted application 1, application 2, ..., application M), the DRM and GPU kernel modules, and, in the LINUX-based system environment, the gbound kernel driver 121 and the gbound_slave user tool 122. In some examples, the software module 12 may be executed by the CPU in the hardware module 11 calling a program stored in host memory.
Referring to the block diagram of fig. 2, which illustrates an example implementation of the software module 12 and the slave system 2 of fig. 1 in further detail. With reference to fig. 1 and fig. 2, an embodiment of the present invention provides a LINUX system-based multi-GPU board binding system that adopts a Client-Server (C/S) architecture to bind multiple GPU boards to a virtual GPU host (Master), so that the boards belonging to a GPU Master are dynamically and flexibly managed through that Master. The system may include at least: a gbound kernel driver 121 and a gbound_slave user tool 122; the gbound kernel driver 121 includes a gbound module 1211 and a scheduling module 1212; the gbound module 1211 is configured to create a virtual GPU Master based on a user request and to add a GPU board to be added to the GPU board list managed by a target GPU Master;
the scheduling module 1212 is configured to allocate, at process granularity, a target GPU board from among the GPU boards managed by the target GPU Master to the processing task of an application program, and to issue the processing task through the gbound_slave user tool 122 to the target GPU board for execution according to the GPU board list managed by the target GPU Master.
With reference to fig. 2 and the multi-GPU binding system described above, it should be noted that the gbound kernel driver 121 and the gbound_slave user tool 122 cooperate: the gbound kernel driver 121 completes the binding and allocation operations between GPU boards and a GPU Master and between an application's processing-task processes and the GPU boards, while the gbound_slave user tool 122 carries out the concrete execution of those operations.
In some possible implementations, the gbound module 1211 is configured to create the GPU Master by using the DRM framework of the LINUX system based on a user request to create a virtual GPU Master, and to mount the GPU Master in a global Master list after interface initialization is completed.
Specifically, when a user issues a request to create a virtual GPU Master, the gbound module 1211 in the gbound kernel driver 121 may create the GPU Master using the DRM framework of LINUX. To distinguish it from GPU boards added later, an embodiment of the present invention names the created GPU Master bcard, identified for example as bcard 1. After creation, the interfaces related to DRM, storage, and GPU task processing may be initialized, and the Master is mounted to the global Master list once initialization completes. For example, the gbound module 1211 may manage a global GPU Master list containing multiple GPU Masters, including other virtual GPU Masters already created, such as those identified as bcard 0, bcard 2, and so on.
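To make this bookkeeping concrete, the following is a minimal, hypothetical Python sketch of the global Master list maintained by the gbound module. The names `create_master` and `GLOBAL_MASTER_LIST` are illustrative and do not appear in the patent; the real implementation is a LINUX kernel driver built on the DRM framework.

```python
# Illustrative sketch only: simulates the gbound module's global Master list.
# create_master() stands in for DRM-based creation plus interface
# initialization; none of these names come from the actual driver.

GLOBAL_MASTER_LIST = {}  # master id (e.g. "bcard 1") -> its bcard slave list

def create_master(master_id):
    """Create a virtual GPU Master and mount it in the global Master list."""
    if master_id in GLOBAL_MASTER_LIST:
        raise ValueError(f"GPU Master {master_id} already exists")
    # The DRM, storage, and GPU task-processing interfaces would be
    # initialized here before mounting (elided in this sketch).
    GLOBAL_MASTER_LIST[master_id] = []  # starts with an empty slave list
    return master_id
```

A freshly created Master such as bcard 1 thus appears in the global list with an empty slave list, ready to accept board-add requests.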
In some possible implementations, the gbound module 1211 is configured to add, based on a GPU board add request issued by a user, the GPU board to be added to the GPU board list managed by the target GPU Master according to the current working state of the board specified in the request.
Specifically, for each virtual GPU Master, the gbound module 1211 also maintains a list representing the GPU boards that Master manages; in an embodiment of the present invention, this list may be named the bcard slave list. Taking a target GPU Master identified as bcard 1 as an example, when a user requests that a GPU board be added to bcard 1, the gbound module 1211 in the gbound kernel driver 121 may check whether the board to be added is in an idle state. It should be noted that, to distinguish it from the GPU Master, the GPU board may be registered through the DRM framework and named card, identified for example as card 1; the gbound_slave user tool 122 may use the card device provided by the DRM framework and send the binding information of the relevant card device to the gbound module 1211. The idle state of the board to be added is characterized, for example, by whether the board (card 1) is waiting on pending tasks, or by whether it has already been added to another GPU Master for management. If all such checks are negative, the board (card 1) is determined to be idle and can be added to the bcard slave list of the target GPU Master (bcard 1); otherwise the addition is refused. Concretely, the Master pointer in the GPU board (card 1) is set to point to the target GPU Master (bcard 1), completing the add request.
It can be understood that, once the GPU board (card 1) has been added to the target GPU Master (bcard 1), the board only processes task requests issued by bcard 1 and no longer processes task requests issued directly by the user. In addition, considering the performance differences among GPU boards (i.e., card devices), an embodiment of the present invention may also adjust the size of the task buffer corresponding to each board through user-defined weights.
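The add path can be sketched as follows. This is a hypothetical Python simulation of the idle-state check and the weight-scaled task buffer described above, not the driver's actual code; the field names and the base buffer size of 64 are assumptions.

```python
# Illustrative sketch: add a GPU board (a "card") to a Master's slave list
# only when the card is idle, i.e. neither waiting on pending tasks nor
# already bound to another Master. Field names are assumptions.

def add_slave(master_list, master_id, card_id,
              pending_tasks=0, bound_to=None, weight=1):
    """Bind card_id to master_id if the card is idle; True on success."""
    if pending_tasks > 0 or bound_to is not None:
        return False  # not idle: the add request is rejected
    master_list[master_id].append({
        "card": card_id,
        "master": master_id,             # the card's Master pointer
        "task_buffer_size": 64 * weight  # buffer scaled by user-defined weight
    })
    return True
```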
Based on the foregoing, after the GPU Master has been created and GPU boards have been added to it, in some examples the gbound module 1211 is further configured to delete, based on a user request to delete a GPU board, the board to be deleted from the GPU board list managed by its GPU Master according to the board's working state. Specifically, when a user issues such a request, the gbound module 1211 in the gbound kernel driver 121 detects, by the means set forth above, whether the specified board is idle. Following the earlier example, the board to be deleted is identified as card 1 and the GPU Master it belongs to as bcard 1: when card 1 is determined to be idle, it may be deleted from the bcard slave list of bcard 1, so that card 1 no longer belongs to any GPU Master and may later be added to the board list of another GPU Master upon a user's add request.
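The corresponding delete path, again as a hypothetical Python sketch with assumed field names rather than actual driver code:

```python
# Illustrative sketch: remove an idle card from its Master's slave list so
# that the card no longer belongs to any GPU Master and may be re-added to
# another Master later. Field names are assumptions.

def delete_slave(master_list, master_id, card_id):
    """Unbind card_id from master_id if the card is idle; True on success."""
    for slave in master_list[master_id]:
        if slave["card"] == card_id:
            if slave.get("pending_tasks", 0) > 0:
                return False            # busy card: deletion is refused
            slave["master"] = None      # card belongs to no Master now
            master_list[master_id].remove(slave)
            return True
    return False                        # card not managed by this Master
```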
In some examples, the gbound module 1211 is further configured to, based on a user request to delete a virtual GPU Master, delete in sequence the GPU boards in the board list managed by the Master to be deleted, clean up the Master's private resources, and remove the Master from the global Master list. Specifically, following the earlier example with the Master to be deleted identified as bcard 1, the gbound module 1211 in the gbound kernel driver 121 may clear all GPU boards from the bcard slave list of bcard 1 in sequence, then clean up the private resources corresponding to bcard 1, and finally delete bcard 1 from the global Master list.
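The Master teardown can be sketched in the same hypothetical style; the resource cleanup is only simulated here, since the real driver frees kernel-side state:

```python
# Illustrative sketch: tearing down a virtual GPU Master. All boards in its
# slave list are detached in turn, its private resources are cleaned up
# (simulated by clearing the list), and the Master is removed from the
# global Master list. Names are assumptions, not driver code.

def delete_master(master_list, master_id):
    """Delete a GPU Master after detaching every board it manages."""
    for slave in master_list[master_id]:
        slave["master"] = None          # detach each board in sequence
    master_list[master_id].clear()      # stand-in for freeing private resources
    del master_list[master_id]          # unmount from the global Master list
```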
The implementations and examples above describe how the gbound module 1211 in the gbound kernel driver 121 creates, populates, and deletes virtual GPU Masters and GPU boards. In addition, after a virtual GPU Master has been created and boards have been added, embodiments of the present invention must also handle the storage application, storage release, and task-processing requests that a user issues against the boards. Since storage and task instructions that end up on different GPUs cause the NUMA condition and degrade performance, as shown in fig. 2, the gbound kernel driver 121 may carry out task allocation through its scheduling module 1212, using the application's task processes as the basic scheduling unit. Based on this, in some examples, the scheduling module 1212 is configured to allocate, according to the performance of the GPU boards managed by the GPU Master specified by the application program, the application's processing task to a target GPU board managed by that Master, using the process as the basic unit;
allocate a storage space from the target GPU board according to the correspondence between the process identifier and the GPU board;
and issue the processing task to the target GPU board and storage space for execution according to the processing task issued by the user and the correspondence between the process identifier and the GPU board.
In addition, the scheduling module 1212 is further configured to, after the issued processing task is completed, notify the application program to release the storage space and close the GPU Master specified by the application program.
For the above example, the task-allocation flow is described as steps S1 to S6 shown in fig. 3:
S1: the application program opens a designated GPU Master, for example the one labeled bcard 1.
S2: the scheduling module 1212 allocates a GPU board to the current process according to the performance of the boards managed by bcard 1, such as each board's resources, storage, and processing capability.
For example, if the board obtained from bcard 1 is GPU 2, the correspondence between the process ID and the GPU ID is bound, so that the subsequent storage allocation and task issue of this process of the application are executed on GPU 2.
S3: the application program finds the target GPU board according to the current process ID and allocates a storage space from that board.
S4: the application program issues the computing task to the target GPU board and storage space, so that the target board executes the issued task.
S5: after the computing task is completed, the application program is notified to release the storage space.
S6: the GPU Master specified by the application program (i.e., bcard 1) is closed.
Through the technical solution above and its implementations and examples, embodiments of the present invention assist a user in flexibly managing the currently bound GPU devices by adding the gbound kernel driver 121 and the gbound_slave user tool 122 to the LINUX system, improving thread concurrency as much as possible and achieving high throughput. In addition, the scheme lets a user flexibly and dynamically adjust the usage scenario of the current multi-GPU setup instead of being limited to a fixed binding, so a GPU board can be repurposed as required. Finally, GPU board binding at process granularity ensures that storage and task instructions execute on the same GPU, avoiding the NUMA problem and improving computing performance.
Based on the technical solution above and the same inventive concept, referring to fig. 4, a LINUX system-based multi-GPU board binding method according to an embodiment of the present invention is shown. The method may be applied to the LINUX system-based multi-GPU board binding system set forth above and may include:
S401: creating, through the gbound module 1211, a virtual GPU Master based on a user request and adding a GPU board to be added to the GPU board list managed by a target GPU Master;
S402: allocating, through the scheduling module 1212 and at process granularity, a target GPU board from the boards managed by the target GPU Master to the processing task of an application program; and issuing the processing task through the gbound_slave user tool 122 to the target GPU board for execution according to the GPU board list managed by the target GPU Master.
In some possible implementations, creating, through the gbound module 1211, a virtual GPU Master based on the user request includes:
creating, by the gbound module 1211 and using the DRM framework of the LINUX system, the GPU Master based on a user request to create a virtual GPU Master, and mounting the GPU Master in the global Master list after interface initialization is completed.
In some possible implementations, adding, through the gbound module 1211, the GPU board to be added to the GPU board list managed by the target GPU Master based on the user request includes:
adding, by the gbound module 1211 and based on a GPU board add request issued by a user, the board to be added to the GPU board list managed by the target GPU Master according to the current working state of the board specified in the request.
Based on the foregoing implementations, after the GPU Master has been created and GPU boards have been added to it, in some examples the method further includes:
deleting, by the gbound module 1211 through the gbound_slave user tool 122 and based on a user request to delete a GPU board, the board to be deleted from the GPU board list managed by its GPU Master according to the board's working state.
In some examples, the method further includes:
deleting in sequence, by the gbound module 1211 and based on a user request to delete a virtual GPU Master, the GPU boards in the board list managed by the Master to be deleted, then cleaning up the Master's private resources and removing the Master from the global Master list.
Based on the foregoing implementations and examples, in some examples, allocating, through the scheduling module 1212 and at process granularity, a target GPU board from the boards managed by the target GPU Master to the processing task of the application program, and issuing the processing task through the gbound_slave user tool 122 to the target GPU board for execution according to the GPU board list managed by the target GPU Master, includes:
allocating, through the scheduling module 1212 and according to the performance of the GPU boards managed by the GPU Master specified by the application program, the application's processing task to a target GPU board managed by that Master, using the process as the basic unit;
allocating, through the scheduling module 1212, a storage space from the target GPU board according to the correspondence between the process identifier and the GPU board;
and issuing, through the scheduling module 1212, the processing task to the target GPU board and storage space for execution according to the processing task issued by the user and the correspondence between the process identifier and the GPU board.
Furthermore, the method further includes: when the issued processing task is completed, notifying, through the scheduling module 1212, the application program to release the storage space; and closing the GPU Master specified by the application program.
It can be understood that the exemplary technical solution of the LINUX system-based multi-GPU board binding method above belongs to the same concept as the technical solution of the LINUX system-based multi-GPU board binding system; therefore, for details not described in the method, reference may be made to the description of the system above, and they will not be repeated here.
It is to be understood that, in the embodiments of the present invention, the gbound kernel driver 121 and the gbound_slave user tool 122 may be integrated into one processing unit, each may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be realized in the form of hardware or in the form of a software functional module.
Based on this understanding, the technical solution of this embodiment, in essence or in the part that contributes to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Therefore, this embodiment provides a computer storage medium storing a LINUX system-based multi-GPU board binding program which, when executed by at least one processor, implements the steps of the LINUX system-based multi-GPU board binding method of the foregoing technical solution.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
It should be noted that: the technical schemes described in the embodiments of the present invention can be combined arbitrarily without conflict.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A LINUX system-based multi-GPU board card binding system, characterized in that the system comprises: a gbound kernel driver and a gbound_slave user tool; the gbound kernel driver comprises a gbound module and a scheduling module; the gbound module is configured to create a virtual GPU Master based on a user request and to add a GPU board card to be added to a GPU board card list managed by a target GPU Master;
the scheduling module is configured to allocate, at process granularity, a target GPU board card from the GPU board cards managed by the target GPU Master for a processing task of an application program, and to issue the processing task to the target GPU board card through the gbound_slave user tool according to the GPU board card list managed by the target GPU Master, so that the processing task is executed;
wherein the scheduling module is configured to allocate, taking a process as the basic unit, processing tasks of the application program to a target GPU board card managed by the GPU Master specified by the application program, based on the performance of the GPU board cards managed by that GPU Master;
to allocate a storage space from the target GPU board card according to the correspondence between the process identifier and the GPU board card;
and to issue the processing task to the target GPU board card and the storage space for execution according to the correspondence between the process identifier included in the processing task issued by the user and the GPU board card.
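The structure recited in claim 1 can be illustrated with a minimal userspace sketch. All names below (GpuBoard, GpuMaster, Scheduler, create_master, add_board) are hypothetical stand-ins; the actual gbound driver described by the claims is LINUX kernel code, not Python:

```python
# Userspace model of claim 1: a virtual GPU Master manages a list of
# boards, and a scheduler maps processes (pids) to boards.
from dataclasses import dataclass, field

@dataclass
class GpuBoard:
    name: str
    perf: int           # relative performance weight (assumed metric)
    busy: bool = False

@dataclass
class GpuMaster:
    """Virtual GPU Master: owns a list of physical GPU board cards."""
    name: str
    boards: list = field(default_factory=list)

master_list = {}        # global Master list, as in claim 2

def create_master(name):
    master = GpuMaster(name)
    master_list[name] = master
    return master

def add_board(master, board):
    master.boards.append(board)

@dataclass
class Scheduler:
    # process-granularity correspondence: pid -> board
    pid_to_board: dict = field(default_factory=dict)

    def allocate(self, master, pid):
        # pick the highest-performance idle board managed by this Master
        board = max((b for b in master.boards if not b.busy),
                    key=lambda b: b.perf)
        board.busy = True
        self.pid_to_board[pid] = board
        return board
```

The pid_to_board table is the "correspondence between the process identifier and the GPU board card" used later when storage is allocated and tasks are issued.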
2. The system according to claim 1, wherein the gbound module is configured to create the GPU Master using the DRM framework of the LINUX system based on a user request for creating a virtual GPU Master, and to mount the GPU Master in a global Master list after interface initialization is completed.
3. The system according to claim 1, wherein the gbound module is configured to add, based on a GPU board card add request issued by a user, the GPU board card to be added to the GPU board card list managed by the target GPU Master according to a current working state of the GPU board card to be added specified in the request.
4. The system according to claim 1, wherein the gbound module is further configured to delete, based on a GPU board card delete request issued by a user, the GPU board card to be deleted from the GPU board card list managed by the GPU Master according to the working state of the GPU board card to be deleted.
5. The system according to claim 1, wherein the gbound module is further configured to, based on a request for deleting a virtual GPU Master issued by a user, sequentially delete the GPU board cards in the GPU board card list managed by the GPU Master to be deleted, clean up the private resources of that GPU Master, and delete it from the global Master list.
6. The system according to claim 1, wherein the scheduling module is further configured to notify the application program to release the storage space and close the GPU Master specified by the application program when the issued processing task is completed.
7. A LINUX system-based multi-GPU board card binding method, applied to the LINUX system-based multi-GPU board card binding system of any one of claims 1 to 6, the method comprising:
creating, by a gbound module, a virtual GPU Master based on a user request and adding a GPU board card to be added to a GPU board card list managed by a target GPU Master;
allocating, by a scheduling module at process granularity, a target GPU board card from the GPU board cards managed by the target GPU Master for a processing task of an application program; and issuing, by the gbound_slave user tool, the processing task to the target GPU board card for execution according to the GPU board card list managed by the target GPU Master;
the method comprises the steps that a target GPU board card is distributed in a GPU board card managed by a target GPU Master for processing tasks of an application program through a scheduling module based on process granularity; and issuing a processing task to a target GPU board card through the gbound _ slave user tool according to the GPU board card list managed by the target GPU Master so as to execute the processing task, wherein the processing task comprises the following steps:
allocating, by the scheduling module taking a process as the basic unit, the processing tasks of the application program to a target GPU board card managed by the GPU Master specified by the application program, based on the performance of the GPU board cards managed by that GPU Master;
allocating, by the scheduling module, a storage space from the target GPU board card according to the correspondence between the process identifier and the GPU board card;
and issuing, by the scheduling module, the processing task to the target GPU board card and the storage space for execution according to the correspondence between the process identifier included in the processing task issued by the user and the GPU board card.
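The task-issue path of claim 7 can be sketched as a correspondence table keyed by process identifier, with the storage release of claim 6 at the end. The byte buffer stands in for GPU board memory, and all names are assumed for illustration:

```python
# Sketch of claim 7's dispatch path: pid -> (board, storage) drives
# where each processing task lands, and release() frees the storage
# when the task completes (claim 6 behaviour).
allocations = {}   # pid -> (board name, buffer)

def allocate_storage(pid, board, size):
    buffer = bytearray(size)          # stand-in for GPU board memory
    allocations[pid] = (board, buffer)
    return buffer

def issue_task(pid, payload):
    board, buffer = allocations[pid]  # look up by process identifier
    buffer[:len(payload)] = payload   # "issue" the task to that storage
    return board

def release(pid):
    # free the storage space once the issued task has completed
    allocations.pop(pid, None)
```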
8. A computer storage medium, wherein the computer storage medium stores a LINUX system-based multi-GPU board card binding program, and the program, when executed by at least one processor, implements the steps of the LINUX system-based multi-GPU board card binding method of claim 7.
CN202110821406.6A 2021-07-20 2021-07-20 LINUX system-based multi-GPU board card bounding system, method and medium Active CN113268356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110821406.6A CN113268356B (en) 2021-07-20 2021-07-20 LINUX system-based multi-GPU board card bounding system, method and medium


Publications (2)

Publication Number Publication Date
CN113268356A CN113268356A (en) 2021-08-17
CN113268356B true CN113268356B (en) 2021-10-29

Family

ID=77236932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110821406.6A Active CN113268356B (en) 2021-07-20 2021-07-20 LINUX system-based multi-GPU board card bounding system, method and medium

Country Status (1)

Country Link
CN (1) CN113268356B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450055B (en) * 2023-06-15 2023-10-27 支付宝(杭州)信息技术有限公司 Method and system for distributing storage area between multi-processing cards

Citations (4)

Publication number Priority date Publication date Assignee Title
CN111324558A (en) * 2020-02-05 2020-06-23 苏州浪潮智能科技有限公司 Data processing method and device, distributed data stream programming framework and related components
US10761821B1 (en) * 2019-03-27 2020-09-01 Sap Se Object oriented programming model for graphics processing units (GPUS)
CN112231049A (en) * 2020-09-28 2021-01-15 苏州浪潮智能科技有限公司 Computing equipment sharing method, device, equipment and storage medium based on kubernets
CN112905331A (en) * 2019-11-19 2021-06-04 上海商汤智能科技有限公司 Task processing system, method and device, electronic device and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN102075354A (en) * 2010-12-30 2011-05-25 瑞斯康达科技发展股份有限公司 Monitoring method and monitoring device of equipment in distributed system
CN106909522B (en) * 2015-12-22 2020-03-20 中国电信股份有限公司 Delay control method and device for GPU write request data and cloud computing system
CN107391432B (en) * 2017-08-11 2020-07-28 中国计量大学 Heterogeneous parallel computing device and operation node interconnection network
US11372683B2 (en) * 2019-07-12 2022-06-28 Vmware, Inc. Placement of virtual GPU requests in virtual GPU enabled systems using a requested memory requirement of the virtual GPU request
CN110618744A (en) * 2019-09-20 2019-12-27 浪潮电子信息产业股份有限公司 Novel GPU Carrier board card
CN111078356A (en) * 2019-11-22 2020-04-28 北京达佳互联信息技术有限公司 GPU cluster resource control system, method, device, equipment and storage medium
CN111679911B (en) * 2020-06-04 2024-01-16 建信金融科技有限责任公司 Management method, device, equipment and medium of GPU card in cloud environment




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 301, Building D, Yeda Science and Technology Park, No. 300 Changjiang Road, Yantai Area, China (Shandong) Pilot Free Trade Zone, Yantai City, Shandong Province, 265503

Patentee after: Xi'an Xintong Semiconductor Technology Co.,Ltd.

Address before: Room 21101, 11 / F, unit 2, building 1, Wangdu, No. 3, zhangbayi Road, Zhangba Street office, hi tech Zone, Xi'an City, Shaanxi Province

Patentee before: Xi'an Xintong Semiconductor Technology Co.,Ltd.