WO2022028061A1 - GPU management apparatus and method based on detection adjustment module, and GPU server - Google Patents

GPU management apparatus and method based on detection adjustment module, and GPU server

Info

Publication number
WO2022028061A1
Authority
WO
WIPO (PCT)
Prior art keywords
gpu
module
task
management
cpu
Prior art date
Application number
PCT/CN2021/096546
Other languages
French (fr)
Chinese (zh)
Inventor
滕学军 (Teng Xuejun)
Original Assignee
苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.)
Publication of WO2022028061A1 publication Critical patent/WO2022028061A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16 Constructional details or arrangements
    • G06F1/20 Cooling means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F11/3062 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations where the monitored property is the power consumption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to the field of GPU management design, and in particular to a GPU management device and method based on a detection and adjustment module, and a GPU server.
  • GPU: Graphics Processing Unit
  • AI: Artificial Intelligence
  • A server often includes both GPU processors and CPU (Central Processing Unit) processors. CPU processors are better at integer operations, while GPU processors are better at floating-point operations.
  • CPU: Central Processing Unit
  • To solve the problems in the prior art, the present invention proposes a GPU management device and method based on a detection and adjustment module, and a GPU server, which address the prior art's inability to adjust the interconnection topology between the CPU and the GPU according to different application scenarios so as to achieve a reasonable allocation of floating-point and integer operations, and which effectively improve CPU and GPU utilization and task-processing efficiency.
  • A first aspect of the present invention provides a GPU management device based on a detection and adjustment module, including: a CPU module, a CPU management module, a conversion module, a GPU module, a GPU management module, and the detection and adjustment module. The adjustment control terminal of the detection and adjustment module is communicatively connected to the control terminals of the GPU management module and the CPU management module, respectively, and is used to detect the type of the data to be processed and to select the corresponding GPU module and/or CPU module for processing according to that type. The CPU management module is communicatively connected with the CPU module to manage the CPU module; the GPU management module is communicatively connected with the GPU module to manage the GPU module and to distribute the tasks to be processed in a balanced manner; and the CPU module is communicatively connected with the GPU module through the conversion module.
  • The GPU module includes a plurality of GPU sub-modules connected in parallel; each GPU sub-module includes several GPUs and an accelerator card, arranged in parallel. The GPU sub-modules, and the GPUs within them, all communicate through the GPU management module and jointly complete the data processing tasks issued by the GPU management module.
  • The GPU management module includes a plurality of GPU management sub-modules connected in parallel, and each GPU management sub-module is communicatively connected with a plurality of GPU sub-modules connected in parallel.
  • The device also includes a power consumption monitoring module and a fan control module. The monitoring end of the power consumption monitoring module is connected to the GPU module to monitor the power consumption of the GPU module in real time, and its output end is connected to the input end of the fan control module; once the monitored power consumption of the GPU module exceeds a set threshold, the fan control module increases the running speed of the fan.
  • A second aspect of the present invention provides a GPU management method based on a detection and adjustment module, implemented on the basis of the GPU management device based on the detection and adjustment module described in the first aspect of the present invention, including:
  • The detection and adjustment module detects the type of each task. If it is a floating-point operation task, the GPU module is preferentially called through the GPU management module to process the data; if it is an integer operation task, the CPU module is preferentially called through the CPU management module to process the data; if the task includes both an integer operation part and a floating-point operation part, the floating-point part is preferentially handled by the GPU module through the GPU management module and the integer part by the CPU module through the CPU management module.
  • When the GPU management module receives a task assigned by the detection and adjustment module, it acquires the highest-priority task in the task queue and schedules the GPU cluster resources in the GPU module according to the priority of the task to be processed.
  • Scheduling the GPU cluster resources in the GPU module according to the priority of the task to be processed specifically includes: the GPU management module traverses the GPU cluster resources; if the idle computing power of a current GPU cluster meets the minimum computing-power requirement of the user corresponding to the pending task, the task is allocated to the GPU cluster that meets that requirement with the fewest GPUs; if the idle computing power of the current GPU clusters cannot meet that requirement, the currently executing tasks are traversed in ascending order of task priority, and the pending task is scheduled according to the priorities of the currently executing tasks and the pending task.
  • Scheduling the pending task according to the priorities of the currently executing tasks and the pending task specifically includes: if the priorities of all currently executing tasks are greater than or equal to the priority of the pending task, the pending task waits for the next scheduling round; if a currently executing task has a lower priority than the pending task, the sum of the idle computing power of the GPU cluster executing that task and the computing power that would be released is calculated in turn; if that sum does not meet the minimum computing-power requirement of the user corresponding to the pending task, the pending task waits for the next scheduling round; if it does, the pending task is allocated to the GPU cluster that meets the minimum computing-power requirement with the fewest GPUs, and the currently executing task whose computing power is to be released is saved and then suspended.
  • The power consumption monitoring module obtains the power consumption of the GPU module in real time and compares the current value with a set value; if the current power consumption of the GPU module is greater than the set value, it directs the fan control module to increase the fan speed.
  • A third aspect of the present invention provides a GPU server, including the GPU management device based on the detection and adjustment module described in the first aspect.
  • The present invention effectively solves the prior art's inability to adjust the interconnection topology between the CPU and the GPU according to different application scenarios so as to achieve a reasonable allocation of floating-point and integer operations, and effectively improves CPU and GPU utilization and task-processing efficiency.
  • The GPU sub-modules, and the GPUs within them, communicate through the GPU management module and jointly complete the data processing tasks issued by the GPU management module, avoiding the low communication efficiency that results when inter-GPU communication must pass through the CPU module's conversion, and thereby improving the communication efficiency between GPUs.
  • Each GPU management sub-module is communicatively connected with a plurality of GPU sub-modules connected in parallel, which increases the bandwidth available for parallel processing and allows the interconnection bandwidth between GPUs to reach its best performance.
  • With the fan control module and the separately provided power consumption monitoring module, the power consumption monitoring module monitors the power consumption of the GPU module in real time; once the monitored power consumption exceeds the set threshold, the fan speed is promptly increased through the fan control module, avoiding the heating problems that arise when drastic changes in GPU module power consumption outpace the fan control module's cooling and degrade GPU usage efficiency.
  • Fig. 1 is a schematic structural diagram of the device of Embodiment 1 of the present invention.
  • Fig. 2 is a schematic flow chart of the method of Embodiment 2 of the present invention.
  • Fig. 3 is a schematic flow chart of the method of Embodiment 3 of the present invention.
  • Fig. 4 is a schematic flow chart of step S6 in the method of Embodiment 3 of the present invention.
  • Fig. 5 is a schematic flow chart of step S64 in the method of Embodiment 3 of the present invention.
  • Fig. 6 is a schematic flow chart of the method of Embodiment 4 of the present invention.
  • Fig. 7 is a schematic structural diagram of the GPU server of Embodiment 5 of the present invention.
  • The present invention provides a GPU management device based on a detection and adjustment module, including: a CPU module 1, a CPU management module 2, a conversion module 3, a GPU module 4, a GPU management module 5, and a detection and adjustment module 6. The adjustment control terminal of the detection and adjustment module 6 is communicatively connected to the control terminals of the GPU management module 5 and the CPU management module 2, respectively, and is used to detect the type of the data to be processed and to select the corresponding GPU module 4 and/or CPU module 1 for processing according to that type. The CPU management module 2 is communicatively connected with the CPU module 1 to manage the CPU module 1; the GPU management module 5 is communicatively connected with the GPU module 4 to manage the GPU module 4 and to distribute the tasks to be processed in a balanced manner; and the CPU module 1 is communicatively connected with the GPU module 4 through the conversion module 3.
  • The GPU module 4 includes a plurality of GPU sub-modules 41 connected in parallel; each GPU sub-module 41 includes several GPUs 411 and an accelerator card 412, arranged in parallel. The GPU sub-modules 41, and the GPUs 411 within them, all communicate through the GPU management module 5 and jointly complete the data processing tasks issued by the GPU management module 5.
  • CPUs execute efficiently in compute-intensive application fields such as digital media processing and scientific computing, while GPUs execute efficiently in parallel computing over large-scale data. Efficient GPU-based parallel computing mainly relies on a cooperative CPU-GPU computing mode in a hybrid architecture, which improves the execution performance of programs. On a hybrid multi-CPU, multi-GPU platform, however, data cannot be transmitted directly between GPUs: a GPU must first transmit the data to a CPU through the conversion module, and the CPU then transmits it to the receiving GPU, a communication pattern that incurs enormous overhead. In the present invention, the GPU sub-modules and the GPUs communicate through the GPU management module 5 (which plays the dual roles of switching and management), which distributes tasks to the GPUs in a balanced manner so that high inter-GPU communication overhead does not affect the overall performance of data-flow programs. Because the GPUs jointly complete the data processing tasks issued by the GPU management module without routing their communication through the CPU module's conversion, the communication efficiency between GPUs is improved.
  • The CPU module 1 includes at least two CPUs 11, namely CPU0 and CPU1. The conversion module 3 includes a Retimer chip and a PCIe Switch chip: the Retimer chip is connected in series between the CPU and the PCIe Switch chip, one end to the CPU and the other to the PCIe Switch chip, and is mainly used for signal relay to ensure lossless signal transmission, while the PCIe Switch chip mainly performs channel conversion. Each CPU 11 is connected to two conversion modules 3, and each conversion module 3 is connected to a corresponding GPU sub-module 41. Correspondingly, there are four GPU sub-modules 41, each including two GPUs 411 and one accelerator card 412, i.e., GPU0-GPU7 and accelerator card 0-accelerator card 3. One PCIe x16 link from CPU0 is expanded by a Retimer chip and a PCIe Switch chip into three PCIe x16 lanes connected to GPU0, GPU1, and accelerator card 0; the other PCIe x16 link from CPU0 is expanded into three PCIe x16 lanes connected to GPU2, GPU3, and accelerator card 1; one PCIe x16 link from CPU1 is expanded into three PCIe x16 lanes connected to GPU4, GPU5, and accelerator card 2; and the other PCIe x16 link from CPU1 is expanded into three PCIe x16 lanes connected to GPU6, GPU7, and accelerator card 3.
  • The GPU management module 5 includes a plurality of GPU management sub-modules 51 connected in parallel, and each GPU management sub-module 51 is communicatively connected with a plurality of GPU sub-modules 41 connected in parallel. To match the GPU sub-modules 41, the number of GPU management sub-modules 51 can be several (one is also possible, but the bandwidth performance is then not optimal); specifically, it can be six. Connecting each GPU management sub-module 51 to multiple GPU sub-modules in parallel increases the bandwidth available for parallel processing and allows the interconnection bandwidth between GPUs to reach its best performance.
  • The device further includes a power consumption monitoring module 7 and a fan control module 8. The monitoring end of the power consumption monitoring module 7 is connected to the GPU module 4 for real-time monitoring of the power consumption of the GPU module 4, and its output end is connected to the input end of the fan control module 8; once the monitored power consumption of the GPU module 4 exceeds the set threshold, the fan control module 8 increases the running speed of the fan.
  • The fan control module 8 may include a BMC 81 (Baseboard Management Controller), a CPLD 82 (Complex Programmable Logic Device), and a fan 83. The control output of the BMC 81 is connected to the control input of the fan 83, the control output of the CPLD 82 is connected to the control input of the fan, and the monitoring terminal of the CPLD 82 is connected to the fault output of the BMC 81. Under normal conditions the BMC 81 controls the fan; once the CPLD 82 detects a BMC fault, the CPLD 82 takes over from the BMC 81 to control the fan.
  • The purpose of separately providing the power consumption monitoring module 7 in the present invention is to shorten the power-consumption monitoring and alarm time of the GPU module. When the BMC monitors the power consumption of the GPU module 4, it generally polls for the value, with a polling period of roughly 1 s, whereas the power consumption of the GPU module 4 often changes at the microsecond level. Using the BMC to monitor GPU power consumption directly therefore easily leads to late alarms and overheating of the GPU module 4. In the technical solution of the present invention, the separately provided power consumption monitoring module 7 can promptly notify the BMC to adjust the fan speed when the power consumption of the GPU module 4 changes sharply, so that the GPU module 4 is cooled in time and its operation is not affected by heat dissipation problems.
  • The present invention effectively solves the prior art's inability to adjust the interconnection topology between the CPU and the GPU according to different application scenarios so as to achieve a reasonable allocation of floating-point and integer operations, and effectively improves CPU and GPU utilization and task-processing efficiency.
  • The technical solution of the present invention also provides a GPU management method based on a detection and adjustment module, implemented on the basis of Embodiment 1 of the present invention, including: the detection and adjustment module detects the task type; if it is a floating-point operation task, the GPU module is preferentially called through the GPU management module to process the data; if it is an integer operation task, the CPU module is preferentially called through the CPU management module to process the data; if the task includes both an integer operation part and a floating-point operation part, the floating-point part is preferentially handled by the GPU module through the GPU management module and the integer part by the CPU module through the CPU management module.
  • The present invention effectively solves the prior art's inability to adjust the interconnection topology between the CPU and the GPU according to different application scenarios so as to achieve a reasonable allocation of floating-point and integer operations, and effectively improves CPU and GPU utilization and task-processing efficiency.
  • The technical solution of the present invention further provides a GPU management method based on a detection and adjustment module, implemented on the basis of Embodiment 1 of the present invention, including: the detection and adjustment module detects the task type; if it is a floating-point operation task, the GPU module is preferentially called through the GPU management module to process the data; if it is an integer operation task, the CPU module is preferentially called through the CPU management module to process the data; if the task includes both an integer operation part and a floating-point operation part, the floating-point part is preferentially handled by the GPU module through the GPU management module and the integer part by the CPU module through the CPU management module.
  • Step S6 specifically includes:
  • The GPU management module traverses the GPU cluster resources.
  • S64: traverse the currently executing tasks in ascending order of task priority, and schedule the pending task according to the priorities of the currently executing tasks and the pending task.
  • In step S63, if at least four GPUs can meet the minimum computing-power requirement, the pending task is allocated to the corresponding four GPUs for processing.
  • S64 specifically includes:
  • Step S644: determine whether the sum of the idle computing power of the GPU cluster currently executing a task and the computing power to be released meets the minimum computing-power requirement of the user corresponding to the pending task; if so, execute step S645; if not, execute step S646.
  • The GPU sub-modules and the GPUs communicate through the GPU management module and jointly complete the data processing tasks issued by the GPU management module, avoiding the low communication efficiency that results when inter-GPU communication must pass through the CPU module's conversion, and thereby improving the communication efficiency between GPUs.
  • Each GPU management sub-module is communicatively connected with a plurality of GPU sub-modules connected in parallel, which increases the bandwidth available for parallel processing and allows the interconnection bandwidth between GPUs to reach its best performance.
  • The GPU management module distributes tasks evenly to the GPUs, preventing high inter-GPU communication overhead from affecting the overall performance of data-flow programs, achieving load balancing between GPUs, and ensuring that the GPUs run efficiently.
  • The technical solution of the present invention further provides a GPU management method based on a detection and adjustment module, implemented on the basis of Embodiment 1 of the present invention, including: the detection and adjustment module detects the task type; if it is a floating-point operation task, the GPU module is preferentially called through the GPU management module to process the data; if it is an integer operation task, the CPU module is preferentially called through the CPU management module to process the data; if the task includes both an integer operation part and a floating-point operation part, the floating-point part is preferentially handled by the GPU module through the GPU management module and the integer part by the CPU module through the CPU management module.
  • The power consumption monitoring module obtains the power consumption of the GPU module in real time and compares the current value with a set value; if the current power consumption of the GPU module is greater than the set value, it directs the fan control module to increase the fan speed.
  • With the fan control module and the separately provided power consumption monitoring module, the power consumption monitoring module monitors the power consumption of the GPU module in real time; once the monitored power consumption exceeds the set threshold, the fan speed is promptly increased through the fan control module, avoiding the heating problems that arise when drastic changes in GPU module power consumption outpace the fan control module's cooling and degrade GPU efficiency.
  • The technical solution of the present invention further provides a GPU server, including the GPU management device based on the detection and adjustment module according to Embodiment 1 of the present invention.
  • The height of the GPU server may be 4U.
  • In addition to the GPU management device based on the detection and adjustment module of Embodiment 1 of the present invention, the GPU server may also include a CPU Board (which can integrate 2 CPUs), a GPU Board (which can integrate 8 GPUs), a Bridge Board (the interconnection connector between the CPU board and the GPU board), a Riser Board (expansion board), a PDB Board (power backplane), redundant power supplies (4+4 or 3+3 PSUs), and so on; other GPU server structures are also possible, and the present invention is not limited in this respect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Power Sources (AREA)
  • Multi Processors (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present invention provides a GPU management apparatus based on a detection adjustment module, comprising: a CPU module, a CPU management module, a conversion module, a GPU module, a GPU management module, and the detection adjustment module. An adjustment control end of the detection adjustment module is communicatively connected to control ends of the GPU management module and the CPU management module, separately, and the detection adjustment module is used for detecting a data type to be processed and selecting the corresponding GPU module and/or CPU module for processing according to said data type; the GPU management module is communicatively connected to the GPU module and is used for realizing management of the GPU module and balanced allocation of tasks to be processed. The present invention further provides a GPU management method based on the detection adjustment module, and a GPU server, and effectively improves the utilization rate and the task processing efficiency of a CPU and a GPU.

Description

A GPU management device and method based on a detection and adjustment module, and a GPU server
This application claims priority to Chinese patent application No. CN202010767363.3, filed with the China National Intellectual Property Administration on August 3, 2020 and entitled "A GPU management device and method based on a detection and adjustment module, and a GPU server", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of GPU management design, and in particular to a GPU management device and method based on a detection and adjustment module, and a GPU server.
Background Art
With the rapid development of GPU (Graphics Processing Unit) server technology, more and more machine learning and AI (Artificial Intelligence) applications have come into widespread use, and GPU servers have been deployed at large scale in services such as deep-learning training.
In the prior art, applications in fields such as graphic design, artificial intelligence, and scientific research require a very large number of GPU processors, and a single server often contains both GPU processors and CPU (Central Processing Unit) processors. CPU processors are better at integer operations, while GPU processors are better at floating-point operations.
However, existing task processing cannot adjust the interconnection topology between the CPU and the GPU according to different application scenarios so as to achieve a reasonable allocation of floating-point operations (a GPU strength) and integer operations (a CPU strength), which is detrimental to CPU and GPU utilization and task-processing efficiency.
Summary of the Invention
To solve the problems in the prior art, the present invention proposes a GPU management device and method based on a detection and adjustment module, and a GPU server, which address the prior art's inability to adjust the interconnection topology between the CPU and the GPU according to different application scenarios so as to achieve a reasonable allocation of floating-point and integer operations, and which effectively improve CPU and GPU utilization and task-processing efficiency.
A first aspect of the present invention provides a GPU management device based on a detection and adjustment module, including: a CPU module, a CPU management module, a conversion module, a GPU module, a GPU management module, and the detection and adjustment module. The adjustment control terminal of the detection and adjustment module is communicatively connected to the control terminals of the GPU management module and the CPU management module, respectively, and is used to detect the type of the data to be processed and to select the corresponding GPU module and/or CPU module for processing according to that type. The CPU management module is communicatively connected with the CPU module to manage the CPU module; the GPU management module is communicatively connected with the GPU module to manage the GPU module and to distribute the tasks to be processed in a balanced manner; and the CPU module is communicatively connected with the GPU module through the conversion module.
Optionally, the GPU module includes a plurality of GPU sub-modules connected in parallel; each GPU sub-module includes several GPUs and an accelerator card, arranged in parallel. The GPU sub-modules, and the GPUs within them, all communicate through the GPU management module and jointly complete the data processing tasks issued by the GPU management module.
Further, the GPU management module includes a plurality of GPU management sub-modules connected in parallel, and each GPU management sub-module is communicatively connected with a plurality of GPU sub-modules connected in parallel.
Optionally, the device also includes a power consumption monitoring module and a fan control module. The monitoring end of the power consumption monitoring module is connected to the GPU module to monitor the power consumption of the GPU module in real time, and its output end is connected to the input end of the fan control module; once the monitored power consumption of the GPU module exceeds a set threshold, the fan control module increases the running speed of the fan.
A second aspect of the present invention provides a GPU management method based on a detection and adjustment module, implemented on the basis of the GPU management device based on the detection and adjustment module described in the first aspect of the present invention, including:
dividing the tasks to be processed into integer operations and floating-point operations;
the detection and adjustment module detects the type of each task: if it is a floating-point operation task, the GPU module is preferentially called through the GPU management module to process the data; if it is an integer operation task, the CPU module is preferentially called through the CPU management module to process the data; if the task includes both an integer operation part and a floating-point operation part, the floating-point part is preferentially handled by the GPU module through the GPU management module and the integer part by the CPU module through the CPU management module.
Optionally, when the GPU management module receives a task assigned by the detection and adjustment module, it acquires the highest-priority task in the task queue and schedules the GPU cluster resources in the GPU module according to the priority of the task to be processed.
Further, scheduling the GPU cluster resources in the GPU module according to the priority of the task to be processed specifically includes:
the GPU management module traverses the GPU cluster resources; if the idle computing power of a current GPU cluster meets the minimum computing-power requirement of the user corresponding to the pending task, the task is allocated to the GPU cluster that meets that requirement with the fewest GPUs; if the idle computing power of the current GPU clusters cannot meet that requirement, the currently executing tasks are traversed in ascending order of task priority and the pending task is scheduled according to the priorities of the currently executing tasks and the pending task.
Further, scheduling the pending task according to the priorities of the currently executing tasks and the pending task specifically includes:
if the priorities of all currently executing tasks are greater than or equal to the priority of the pending task, the pending task waits for the next scheduling round; if a currently executing task has a lower priority than the pending task, the sum of the idle computing power of the GPU cluster executing that task and the computing power that would be released is calculated in turn; if that sum does not meet the minimum computing-power requirement of the user corresponding to the pending task, the pending task waits for the next scheduling round; if it does, the pending task is allocated to the GPU cluster that meets the minimum computing-power requirement with the fewest GPUs, and the currently executing task whose computing power is to be released is saved and then suspended.
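For illustration only, the scheduling described in the preceding two paragraphs can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the patent's implementation: the names `Task`, `Cluster`, and `suspend` are introduced here, computing power is reduced to a single number per task and cluster, and saving a preempted task's state is elided.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    priority: int          # higher value = higher priority (assumed convention)
    min_power: float       # minimum computing power required by the task's user
    power: float = 0.0     # power a running task would release if suspended

@dataclass
class Cluster:
    gpu_count: int
    idle_power: float
    running: list = field(default_factory=list)

def suspend(cluster, task):
    """Save the task's state and suspend it, releasing its computing power."""
    cluster.running.remove(task)          # state saving itself is omitted here
    cluster.idle_power += task.power

def schedule(pending, clusters):
    # Step 1: prefer a cluster whose idle power already meets the requirement;
    # "requires the fewest GPUs" is approximated here by the smallest cluster.
    fitting = [c for c in clusters if c.idle_power >= pending.min_power]
    if fitting:
        return min(fitting, key=lambda c: c.gpu_count)

    # Step 2: traverse running tasks in ascending priority order; only tasks
    # with a strictly lower priority than the pending task may be preempted.
    for cluster in clusters:
        for task in sorted(cluster.running, key=lambda t: t.priority):
            if task.priority >= pending.priority:
                break                     # nothing preemptible in this cluster
            if cluster.idle_power + task.power >= pending.min_power:
                suspend(cluster, task)    # save, suspend, release the power
                return cluster            # first fit; tie-breaking simplified

    return None   # requirement met nowhere: wait for the next scheduling round
```

As a usage sketch, a pending task with `min_power=10.0` offered a cluster holding `idle_power=4.0` plus a running priority-1 task with `power=8.0` would take the preemption path: the low-priority task is suspended and that cluster is returned.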
Optionally, the method also includes:
the power consumption monitoring module obtains the power consumption of the GPU module in real time and compares the current value with a set value; if the current power consumption of the GPU module is greater than the set value, it directs the fan control module to increase the fan speed.
A third aspect of the present invention provides a GPU server, including the GPU management device based on the detection and adjustment module described in the first aspect.
The technical solution adopted by the present invention has the following technical effects:
1. The present invention effectively solves the prior art's inability to adjust the interconnection topology between the CPU and the GPU according to different application scenarios so as to achieve a reasonable allocation of floating-point and integer operations, and effectively improves CPU and GPU utilization and task-processing efficiency.
2. In the technical solution of the present invention, the GPU sub-modules, and the GPUs within them, communicate through the GPU management module and jointly complete the data processing tasks issued by the GPU management module, avoiding the low communication efficiency that results when inter-GPU communication must pass through the CPU module's conversion, and thereby improving the communication efficiency between GPUs.
3. In the technical solution of the present invention, each GPU management sub-module is communicatively connected with a plurality of GPU sub-modules connected in parallel, which increases the bandwidth available for parallel processing and allows the interconnection bandwidth between GPUs to reach its best performance.
4. In the technical solution of the present invention, the fan control module works with a separately provided power consumption monitoring module that monitors the power consumption of the GPU module in real time; once the monitored power consumption exceeds the set threshold, the fan speed is promptly increased through the fan control module, avoiding the heating problems that arise when drastic changes in GPU module power consumption outpace the fan control module's cooling and degrade GPU usage efficiency.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of the device of Embodiment 1 of the present invention;
Fig. 2 is a schematic flow chart of the method of Embodiment 2 of the present invention;
Fig. 3 is a schematic flow chart of the method of Embodiment 3 of the present invention;
Fig. 4 is a schematic flow chart of step S6 in the method of Embodiment 3 of the present invention;
Fig. 5 is a schematic flow chart of step S64 in the method of Embodiment 3 of the present invention;
Fig. 6 is a schematic flow chart of the method of Embodiment 4 of the present invention;
Fig. 7 is a schematic structural diagram of the GPU server of Embodiment 5 of the present invention.
Detailed Description
To clearly illustrate the technical features of this solution, the present invention is described in detail below through specific embodiments in conjunction with the drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in different examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed. It should be noted that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components and processing techniques are omitted to avoid unnecessarily limiting the present invention.
Embodiment 1
As shown in Fig. 1, the present invention provides a GPU management device based on a detection and adjustment module, including: a CPU module 1, a CPU management module 2, a conversion module 3, a GPU module 4, a GPU management module 5, and a detection and adjustment module 6. The adjustment control terminal of the detection and adjustment module 6 is communicatively connected to the control terminals of the GPU management module 5 and the CPU management module 2, respectively, and is used to detect the type of the data to be processed and to select the corresponding GPU module 4 and/or CPU module 1 for processing according to that type. The CPU management module 2 is communicatively connected with the CPU module 1 to manage the CPU module 1; the GPU management module 5 is communicatively connected with the GPU module 4 to manage the GPU module 4 and to distribute the tasks to be processed in a balanced manner; and the CPU module 1 is communicatively connected with the GPU module 4 through the conversion module 3.
Specifically, the GPU module 4 includes a plurality of GPU sub-modules 41 connected in parallel; each GPU sub-module 41 includes several GPUs 411 and an accelerator card 412, arranged in parallel. The GPU sub-modules 41, and the GPUs 411 within them, all communicate through the GPU management module 5 and jointly complete the data processing tasks issued by the GPU management module 5.
CPUs execute efficiently in compute-intensive application fields such as digital media processing and scientific computing, while GPUs execute efficiently in parallel computing over large-scale data. Efficient GPU-based parallel computing mainly relies on a cooperative CPU-GPU computing mode in a hybrid architecture, which improves the execution performance of programs. On a hybrid multi-CPU, multi-GPU platform, however, data cannot be transmitted directly between GPUs: a GPU must first transmit the data to a CPU through the conversion module, and the CPU then transmits it to the receiving GPU, a communication pattern that incurs enormous overhead. In the present invention, the GPU sub-modules and the GPUs communicate through the GPU management module 5 (which plays the dual roles of switching and management), which distributes tasks to the GPUs in a balanced manner so that high inter-GPU communication overhead does not affect the overall performance of data-flow programs; because the GPUs jointly complete the data processing tasks issued by the GPU management module without routing their communication through the CPU module's conversion, the communication efficiency between GPUs is improved.
The CPU module 1 includes at least two CPUs 11, namely CPU0 and CPU1. The conversion module 3 includes a Retimer chip and a PCIe Switch chip: the Retimer chip is connected in series between the CPU and the PCIe Switch chip, one end to the CPU and the other to the PCIe Switch chip, and is mainly used for signal relay to ensure lossless signal transmission, while the PCIe Switch chip mainly performs channel conversion. Each CPU 11 is connected to two conversion modules 3, and each conversion module 3 is connected to a corresponding GPU sub-module 41. Correspondingly, there are four GPU sub-modules 41, each including two GPUs 411 and one accelerator card 412, i.e., GPU0-GPU7 and accelerator card 0-accelerator card 3. One PCIe x16 link from CPU0 is expanded by a Retimer chip and a PCIe Switch chip into three PCIe x16 lanes connected to GPU0, GPU1, and accelerator card 0; the other PCIe x16 link from CPU0 is expanded into three PCIe x16 lanes connected to GPU2, GPU3, and accelerator card 1; one PCIe x16 link from CPU1 is expanded into three PCIe x16 lanes connected to GPU4, GPU5, and accelerator card 2; and the other PCIe x16 link from CPU1 is expanded into three PCIe x16 lanes connected to GPU6, GPU7, and accelerator card 3. GPU0 through GPU7 and accelerator cards 0 through 3 are each also connected to the GPU management module 5.
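For readability, the fan-out just described can be restated as plain data. The mapping below only transcribes the preceding paragraph; the per-CPU link indices 0 and 1 are labels introduced here for illustration, not designations from the patent.

```python
# (CPU, PCIe x16 link) -> the three PCIe x16 lanes produced by that link's
# Retimer chip and PCIe Switch chip.
PCIE_FANOUT = {
    ("CPU0", 0): ("GPU0", "GPU1", "accelerator card 0"),
    ("CPU0", 1): ("GPU2", "GPU3", "accelerator card 1"),
    ("CPU1", 0): ("GPU4", "GPU5", "accelerator card 2"),
    ("CPU1", 1): ("GPU6", "GPU7", "accelerator card 3"),
}
```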
The GPU management module 5 includes a plurality of GPU management sub-modules 51 connected in parallel, and each GPU management sub-module 51 is communicatively connected with a plurality of GPU sub-modules 41 connected in parallel.
To match the GPU sub-modules 41 of the present invention, the number of GPU management sub-modules 51 can be several (one is also possible, but the bandwidth performance is then not optimal); specifically, it can be six. Connecting each GPU management sub-module 51 to multiple GPU sub-modules in parallel increases the bandwidth available for parallel processing and allows the interconnection bandwidth between GPUs to reach its best performance.
Further, the device also includes a power consumption monitoring module 7 and a fan control module 8. The monitoring end of the power consumption monitoring module 7 is connected to the GPU module 4 for real-time monitoring of the power consumption of the GPU module 4, and its output end is connected to the input end of the fan control module 8; once the monitored power consumption of the GPU module 4 exceeds the set threshold, the fan control module 8 increases the running speed of the fan.
Specifically, the fan control module 8 may include a BMC 81 (Baseboard Management Controller), a CPLD 82 (Complex Programmable Logic Device), and a fan 83. The control output of the BMC 81 is connected to the control input of the fan 83, the control output of the CPLD 82 is connected to the control input of the fan, and the monitoring terminal of the CPLD 82 is connected to the fault output of the BMC 81. Under normal conditions the BMC 81 controls the fan; once the CPLD 82 detects a BMC fault, the CPLD 82 takes over from the BMC 81 to control the fan.
In the technical solution of the present invention, the fan control module 8 works with a separately provided power consumption monitoring module 7 that monitors the power consumption of the GPU module 4 in real time; once the monitored power consumption exceeds the set threshold, the fan speed is promptly increased through the fan control module 8, avoiding the heating problems that arise when drastic changes in the power consumption of the GPU module 4 outpace the cooling of the fan control module 8 and degrade GPU usage efficiency. The purpose of separately providing the power consumption monitoring module 7 is to shorten the power-consumption monitoring and alarm time of the GPU module: when the BMC monitors the power consumption of the GPU module 4, it generally polls for the value, with a polling period of roughly 1 s, whereas the power consumption of the GPU module 4 often changes at the microsecond level. Using the BMC to monitor GPU power consumption directly therefore easily leads to late alarms and overheating of the GPU module 4. With the separately provided power consumption monitoring module 7, the BMC can be notified in time to adjust the fan speed when the power consumption of the GPU module 4 changes sharply, so that the GPU module 4 is cooled promptly and its operation is not affected by heat dissipation problems.
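The division of labor between the fast power monitor, the BMC, and the CPLD can be pictured as a simple control loop. The following is a hypothetical sketch: the threshold value, the loop interval, and all of the hardware-access functions are placeholders standing in for the paths described above, not a real BMC or CPLD API.

```python
import time

POWER_THRESHOLD_W = 2000.0   # assumed value; the patent only speaks of a set threshold

def read_gpu_power_w() -> float:
    """Fast reading via the dedicated power consumption monitoring module 7."""
    raise NotImplementedError   # hardware-specific placeholder

def bmc_alive() -> bool:
    """Health check, mirroring the CPLD 82 watching the BMC 81 fault output."""
    raise NotImplementedError   # hardware-specific placeholder

def set_fan_duty(duty_percent: int, via: str) -> None:
    """Drive fan 83 through the BMC 81, or through the CPLD 82 on BMC failure."""
    raise NotImplementedError   # hardware-specific placeholder

def monitor_loop() -> None:
    while True:
        controller = "BMC" if bmc_alive() else "CPLD"   # CPLD takeover path
        if read_gpu_power_w() > POWER_THRESHOLD_W:
            # The dedicated monitor reacts as soon as the threshold is crossed,
            # instead of waiting out the BMC's roughly 1 s polling cycle.
            set_fan_duty(100, via=controller)
        time.sleep(0.0001)   # far shorter than 1 s: GPU power swings at the microsecond level
```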
本发明有效解决由于现有技术造成无法根据不同的应用场景调整CPU和GPU之间的互联拓扑,以达到一个浮点运算和整数运算的合理配置的问题,有效的提高了CPU以及GPU的利用率以及任务处理效率。The invention effectively solves the problem that the interconnection topology between the CPU and the GPU cannot be adjusted according to different application scenarios due to the prior art, so as to achieve a reasonable configuration of the floating point operation and the integer operation, and effectively improves the utilization rate of the CPU and the GPU and task processing efficiency.
Embodiment 2
As shown in FIG. 2, the technical solution of the present invention further provides a GPU management method based on a detection and adjustment module, implemented on the basis of Embodiment 1 of the present invention, comprising:
S1, dividing the tasks to be processed into integer operations and floating-point operations;
S2, the detection and adjustment module detecting the task type;
S3, if the task is a floating-point operation task, preferentially invoking the GPU module through the GPU management module to perform the data processing;
S4, if the task is an integer operation task, preferentially invoking the CPU module through the CPU management module to perform the data processing;
S5, if the task to be processed includes both an integer-operation part and a floating-point-operation part, preferentially dispatching the floating-point part to the GPU module through the GPU management module and the integer part to the CPU module through the CPU management module, as in the dispatch sketch below.
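The patent does not give reference code, but the routing in S1-S5 can be summarized in a minimal sketch. The Task fields and the ModuleManager interface are hypothetical stand-ins for the detection and adjustment, GPU management, and CPU management modules:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    has_float_part: bool  # contains floating-point work
    has_int_part: bool    # contains integer work

class ModuleManager:
    """Hypothetical stand-in for the GPU/CPU management modules."""
    def __init__(self, label):
        self.label = label
    def submit(self, task, part):
        print(f"{task.name} ({part}) -> {self.label}")

def dispatch(task, gpu_mgr, cpu_mgr):
    # S2: the detection and adjustment module inspects the task type.
    if task.has_float_part:
        gpu_mgr.submit(task, "float part")  # S3/S5: floating point -> GPU module
    if task.has_int_part:
        cpu_mgr.submit(task, "int part")    # S4/S5: integer -> CPU module

gpu_mgr = ModuleManager("GPU management module")
cpu_mgr = ModuleManager("CPU management module")
dispatch(Task("render", True, False), gpu_mgr, cpu_mgr)   # S3 path only
dispatch(Task("mixed-job", True, True), gpu_mgr, cpu_mgr) # S5: split across both
```

For a mixed task (S5), a real implementation would split the workload into its two parts before submission; the sketch simply routes each part's handle to the preferred compute module.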
The invention effectively solves the prior-art problem that the interconnection topology between the CPU and the GPU cannot be adjusted for different application scenarios to reach a reasonable allocation of floating-point and integer operations, and effectively improves CPU and GPU utilization as well as task-processing efficiency.
Embodiment 3
As shown in FIG. 3, the technical solution of the present invention further provides a GPU management method based on a detection and adjustment module, implemented on the basis of Embodiment 1 of the present invention, comprising:
S1, dividing the tasks to be processed into integer operations and floating-point operations;
S2, the detection and adjustment module detecting the task type;
S3, if the task is a floating-point operation task, preferentially invoking the GPU module through the GPU management module to perform the data processing;
S4, if the task is an integer operation task, preferentially invoking the CPU module through the CPU management module to perform the data processing;
S5, if the task to be processed includes both an integer-operation part and a floating-point-operation part, preferentially dispatching the floating-point part to the GPU module through the GPU management module and the integer part to the CPU module through the CPU management module;
S6, when the GPU management module receives a task assigned by the detection and adjustment module, fetching the highest-priority task from the task queue and scheduling the GPU cluster resources in the GPU module according to the priority of the task to be processed.
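As a minimal sketch of the queue step in S6 (the queue discipline is not specified by the patent; a max-priority heap with FIFO tie-breaking is assumed here):

```python
import heapq

# Python's heapq is a min-heap, so priority is negated; the arrival index
# breaks ties so equal-priority tasks are served in submission order.
task_queue = []
for order, (prio, name) in enumerate([(2, "train"), (5, "infer"), (1, "etl")]):
    heapq.heappush(task_queue, (-prio, order, name))

_, _, next_task = heapq.heappop(task_queue)
print(next_task)  # "infer": priority 5 is the highest pending priority
```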
As shown in FIG. 4, step S6 specifically includes:
S61, the GPU management module traverses the GPU cluster resources;
S62, judge whether the idle computing capability of the current GPU cluster meets the minimum computing-capability requirement of the user corresponding to the task to be processed; if yes, execute step S63; if no, execute step S64;
S63, assign the task to be processed to the GPU cluster that meets the minimum computing-capability requirement with the fewest GPUs;
S64, traverse the currently executing tasks in ascending order of priority, and schedule the task to be processed according to the priorities of the currently executing tasks and the task to be processed.
For example, in step S63, if 4 GPUs are the fewest that can meet the minimum computing-capability requirement, the task to be processed is assigned to the corresponding 4 GPUs for processing.
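A minimal sketch of the S62/S63 selection rule, assuming each cluster advertises its idle computing capability and a per-GPU capability from which the number of GPUs needed can be derived (both hypothetical fields):

```python
import math

def gpus_needed(cluster, min_capability):
    # Idle GPUs this cluster must dedicate to reach the user's requirement.
    return math.ceil(min_capability / cluster["per_gpu_capability"])

def pick_cluster(clusters, min_capability):
    # S62: keep clusters whose idle capability meets the minimum requirement,
    # then S63: choose the qualifying cluster that needs the fewest GPUs.
    ok = [c for c in clusters if c["idle_capability"] >= min_capability]
    if not ok:
        return None  # no fit -> fall through to the S64 preemption path
    return min(ok, key=lambda c: gpus_needed(c, min_capability))

clusters = [
    {"name": "A", "idle_capability": 80,  "per_gpu_capability": 10},
    {"name": "B", "idle_capability": 100, "per_gpu_capability": 25},
]
print(pick_cluster(clusters, 50)["name"])  # "B": 2 GPUs vs. 5 on "A"
```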
Further, as shown in FIG. 5, S64 specifically includes:
S641, judge whether the priorities of all currently executing tasks are greater than or equal to the priority of the task to be processed; if yes, execute step S642; if no, execute step S643;
S642, the task to be processed waits for the next scheduling round;
S643, compute in turn, for each GPU cluster executing tasks, the sum of its idle computing capability and the computing capability to be released;
S644, judge whether the sum of the GPU cluster's idle computing capability and the computing capability to be released meets the minimum computing-capability requirement of the user corresponding to the task to be processed; if yes, execute step S645; if no, execute step S646;
S645, assign the task to be processed to the GPU cluster that meets the minimum computing-capability requirement with the fewest GPUs, and save and then suspend the currently executing tasks corresponding to the computing capability to be released in that GPU cluster;
S646, wait for the next scheduling round.
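The S64 preemption path can be sketched as follows, assuming each cluster tracks its running tasks with a priority and a releasable capability (hypothetical fields). For brevity the sketch returns the first cluster that can be freed; per S645, the full method would also prefer the qualifying cluster needing the fewest GPUs, as in the S63 sketch above:

```python
def schedule_with_preemption(pending, clusters):
    # Sketch of S641-S646; `pending` and all cluster/task fields are
    # hypothetical ({"priority": ..., "min_capability": ...} etc.).
    running = [t for c in clusters for t in c["running"]]
    # S641/S642: every running task outranks (or ties) the pending one -> wait.
    if all(t["priority"] >= pending["priority"] for t in running):
        return "wait for next scheduling round"
    for c in clusters:
        releasable, victims = 0, []
        # S643: walk this cluster's lower-priority tasks in ascending order.
        for t in sorted(c["running"], key=lambda t: t["priority"]):
            if t["priority"] >= pending["priority"]:
                break
            releasable += t["capability"]
            victims.append(t)
            # S644/S645: idle + to-be-released capability meets the minimum.
            if c["idle_capability"] + releasable >= pending["min_capability"]:
                for v in victims:
                    v["state"] = "saved+suspended"  # checkpoint, then preempt
                return f"run on {c['name']}"
    return "wait for next scheduling round"  # S646

clusters = [{"name": "A", "idle_capability": 20,
             "running": [{"priority": 1, "capability": 40, "state": "running"}]}]
print(schedule_with_preemption({"priority": 3, "min_capability": 50}, clusters))
# -> "run on A": the priority-1 task is saved and suspended to free 40 units
```

Saving before suspension (rather than killing the victim outright) is what lets the preempted task resume in a later scheduling round without losing work.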
In the technical solution of the present invention, the multiple GPU sub-modules, and the GPUs within them, all communicate through the GPU management module and jointly complete the data-processing tasks issued by the GPU management module. This avoids the low communication efficiency that results when inter-GPU communication must pass through the CPU module for conversion, and thereby improves the communication efficiency between GPUs.
In the technical solution of the present invention, each GPU management sub-module is communicatively connected to multiple GPU sub-modules connected in parallel, which raises the parallel-processing bandwidth so that the interconnection bandwidth between GPUs reaches its best performance.
In the embodiment of the present invention, the GPU management module distributes tasks evenly across the GPUs to prevent high inter-GPU communication overhead from degrading the overall performance of a dataflow program, achieving load balancing between GPUs and ensuring efficient GPU operation.
Embodiment 4
As shown in FIG. 6, the technical solution of the present invention further provides a GPU management method based on a detection and adjustment module, implemented on the basis of Embodiment 1 of the present invention, comprising:
S1, dividing the tasks to be processed into integer operations and floating-point operations;
S2, the detection and adjustment module detecting the task type;
S3, if the task is a floating-point operation task, preferentially invoking the GPU module through the GPU management module to perform the data processing;
S4, if the task is an integer operation task, preferentially invoking the CPU module through the CPU management module to perform the data processing;
S5, if the task to be processed includes both an integer-operation part and a floating-point-operation part, preferentially dispatching the floating-point part to the GPU module through the GPU management module and the integer part to the CPU module through the CPU management module;
S6, when the GPU management module receives a task assigned by the detection and adjustment module, fetching the highest-priority task from the task queue and scheduling the GPU cluster resources in the GPU module according to the priority of the task to be processed;
S7, the power consumption monitoring module obtaining the power consumption of the GPU module in real time and comparing the current power-consumption value of the GPU module with the set value; if the current value is greater than the set value, directing the fan control module to increase the fan speed.
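Step S7 amounts to a tight compare-and-react loop. A minimal sketch follows, with read_gpu_power_watts() and FanController as hypothetical stand-ins for the dedicated monitoring hardware and the fan control module:

```python
import random
import time

POWER_THRESHOLD_W = 300.0  # the "set value"; an assumed figure

def read_gpu_power_watts():
    # Placeholder telemetry standing in for the power consumption monitor.
    return random.uniform(250.0, 350.0)

class FanController:
    def __init__(self):
        self.duty = 40
    def increase_speed(self):
        self.duty = min(100, self.duty + 20)
        print(f"fan duty raised to {self.duty}%")

fan = FanController()
for _ in range(5):
    if read_gpu_power_watts() > POWER_THRESHOLD_W:
        fan.increase_speed()  # S7: over threshold -> spin the fans up
    time.sleep(0.001)         # far shorter than the BMC's ~1 s polling period
```

The short sleep illustrates the point of the dedicated monitor: reacting on a millisecond-or-faster cadence rather than waiting out a ~1 s BMC polling cycle.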
In the technical solution of the present invention, with the fan control module and the separately provided power consumption monitoring module, the power consumption monitoring module monitors the power consumption of the GPU module in real time. Once the monitored power consumption exceeds the set threshold, the fan control module promptly increases the fan speed, avoiding the heating problem that arises when sharp changes in GPU-module power consumption leave the fan control module unable to dissipate heat in time, which would reduce GPU efficiency.
Embodiment 5
As shown in FIG. 7, the technical solution of the present invention further provides a GPU server including the GPU management apparatus based on a detection and adjustment module of Embodiment 1. The GPU server may have a height of 4U and, in addition to the apparatus of Embodiment 1, may include a CPU Board (which can integrate 2 CPUs), a GPU Board (which can integrate 8 GPUs), a Bridge Board (the interconnection connector between the CPU board and the GPU board), a Riser Board (expansion board), a PDB Board (power backplane), redundant power supplies (4+4 or 3+3 PSUs), and so on; other GPU server structures are also possible, and the present invention is not limited in this respect.
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative effort still fall within the protection scope of the present invention.

Claims (10)

1. A GPU management apparatus based on a detection and adjustment module, characterized by comprising: a CPU module, a CPU management module, a conversion module, a GPU module, a GPU management module, and a detection and adjustment module, wherein the adjustment control end of the detection and adjustment module is communicatively connected to the control ends of the GPU management module and the CPU management module respectively, and is configured to detect the type of data to be processed and, according to that type, select the corresponding GPU module and/or CPU module for processing; the CPU management module is communicatively connected to the CPU module to manage the CPU module; the GPU management module is communicatively connected to the GPU module to manage the GPU module and to distribute tasks to be processed evenly; and the CPU module is communicatively connected to the GPU module through the conversion module.
2. The GPU management apparatus based on a detection and adjustment module according to claim 1, characterized in that the GPU module comprises multiple GPU sub-modules connected in parallel, each GPU sub-module comprises several GPUs and an accelerator card, the several GPUs are arranged in parallel with the accelerator card, and the multiple GPU sub-modules, and the GPUs within them, all communicate through the GPU management module to jointly complete the data-processing tasks issued by the GPU management module.
3. The GPU management apparatus based on a detection and adjustment module according to claim 2, characterized in that the GPU management module comprises multiple GPU management sub-modules connected in parallel, and each GPU management sub-module is communicatively connected to multiple GPU sub-modules connected in parallel.
4. The GPU management apparatus based on a detection and adjustment module according to any one of claims 1-3, characterized by further comprising: a power consumption monitoring module and a fan control module, wherein the monitoring end of the power consumption monitoring module is connected to the GPU module to monitor the power consumption of the GPU module in real time, the output end of the power consumption monitoring module is connected to the input end of the fan control module, and once the monitored power consumption of the GPU module exceeds a set threshold, the fan control module increases the fan speed.
5. A GPU management method based on a detection and adjustment module, characterized in that it is implemented on the basis of the GPU management apparatus based on a detection and adjustment module according to any one of claims 1-4, comprising:
dividing the tasks to be processed into integer operations and floating-point operations;
the detection and adjustment module detecting the task type: if the task is a floating-point operation task, invoking the GPU module through the GPU management module to perform the data processing; if the task is an integer operation task, invoking the CPU module through the CPU management module to perform the data processing; and if the task to be processed includes both an integer-operation part and a floating-point-operation part, dispatching the floating-point part to the GPU module through the GPU management module and the integer part to the CPU module through the CPU management module.
6. The GPU management method based on a detection and adjustment module according to claim 5, characterized in that when the GPU management module receives a task assigned by the detection and adjustment module, it fetches the highest-priority task from the task queue and schedules the GPU cluster resources in the GPU module according to the priority of the task to be processed.
7. The GPU management method based on a detection and adjustment module according to claim 6, characterized in that scheduling the GPU cluster resources in the GPU module according to the priority of the task to be processed specifically comprises:
the GPU management module traversing the GPU cluster resources: if the idle computing capability of the current GPU cluster meets the minimum computing-capability requirement of the user corresponding to the task to be processed, assigning the task to the GPU cluster that meets the minimum computing-capability requirement with the fewest GPUs; and if the idle computing capability of the current GPU cluster cannot meet that minimum computing-capability requirement, traversing the currently executing tasks in ascending order of priority and scheduling the task to be processed according to the priorities of the currently executing tasks and the task to be processed.
8. The GPU management method based on a detection and adjustment module according to claim 7, characterized in that scheduling the task to be processed according to the priorities of the currently executing tasks and the task to be processed specifically comprises:
if the priorities of all currently executing tasks are greater than or equal to the priority of the task to be processed, the task to be processed waiting for the next scheduling round; if the priority of a currently executing task is lower than that of the task to be processed, computing in turn, for each GPU cluster executing tasks, the sum of its idle computing capability and the computing capability to be released: if that sum does not meet the minimum computing-capability requirement of the user corresponding to the task to be processed, waiting for the next scheduling round; and if that sum meets the minimum computing-capability requirement, assigning the task to be processed to the GPU cluster that meets the minimum computing-capability requirement with the fewest GPUs, and saving and then suspending the currently executing tasks corresponding to the computing capability to be released in that GPU cluster.
9. The GPU management method based on a detection and adjustment module according to any one of claims 5-8, characterized by further comprising:
the power consumption monitoring module obtaining the power consumption of the GPU module in real time and comparing the current power-consumption value of the GPU module with the set value; if the current value is greater than the set value, directing the fan control module to increase the fan speed.
10. A GPU server, characterized by comprising the GPU management apparatus based on a detection and adjustment module according to any one of claims 1-4.