CN116775283A - GPGPU resource allocation management method and system - Google Patents

GPGPU resource allocation management method and system

Info

Publication number
CN116775283A
Authority
CN
China
Prior art keywords
resources
resource
gpgpu
application
running
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310687298.7A
Other languages
Chinese (zh)
Inventor
赵先明 (Zhao Xianming)
林昀 (Lin Yun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongshan Information Technology Research Institute Co Ltd
Original Assignee
Beijing Hongshan Information Technology Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongshan Information Technology Research Institute Co Ltd filed Critical Beijing Hongshan Information Technology Research Institute Co Ltd
Priority to CN202310687298.7A priority Critical patent/CN116775283A/en
Publication of CN116775283A publication Critical patent/CN116775283A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses a GPGPU resource allocation management method and system in the technical field of GPGPUs. The method comprises the following steps. S100: number the whole pool of resources, and allocate and manage all resources by number. S200: request the required resources on demand, and feed the request information back to the management end. S300: the management end allocates numbered resources according to the request for the required resources and feeds back information. S400: use the resources, and recover the resources after use. S500: sort the recovered resources, delete the reference resources, and recover and group the sorted resources a second time for reuse. The beneficial effects of the application are as follows: the GPGPU operating resources are uniformly divided and managed in groups, realizing unified management of the GPGPU operating resources; when external image information is processed, the corresponding resources are applied for according to the requirements of the image information to be processed, realizing on-demand allocation of the GPGPU operating resources.

Description

GPGPU resource allocation management method and system
Technical Field
The application relates to the technical field of GPGPU, in particular to a GPGPU resource allocation management method and a GPGPU resource allocation management system.
Background
A general-purpose graphics processor (GPGPU) is a graphics processor used to compute general-purpose tasks that would otherwise be handled by the central processing unit; such computations often have nothing to do with graphics. Thanks to the powerful parallel processing capability and programmable pipeline of modern graphics processors, their stream processors can operate on non-graphics data. In particular, for single-instruction multiple-data (SIMD) workloads, where the cost of data processing far exceeds the cost of data scheduling and transfer, a general-purpose graphics processor far outperforms a conventional CPU application.
A general-purpose graphics processor is a computer chip that has emerged in recent years and has enabled significant breakthroughs in high-performance embedded computing for aerospace and defense applications. This powerful chip, a massively parallel processor, was introduced in the last decade as a graphics processing engine for high-end computer games. It not only handles complex floating-point computation but is also easy to program, making it attractive for a wide range of embedded systems.
When a GPGPU performs information processing, its operating resources are substantial; if those resources cannot be managed, they are not mobilized in time during operation, and resources that are not managed properly degrade the GPGPU's operating efficiency. The present GPGPU resource allocation management method and system are therefore provided.
Disclosure of Invention
The application aims to provide a GPGPU resource allocation management method and system. The GPGPU operating resources are uniformly divided and managed in groups, realizing unified management of the GPGPU operating resources. When external image information is processed, the corresponding resources are applied for according to the requirements of the image information to be processed, realizing on-demand allocation of the GPGPU operating resources. Meanwhile, after the resource end directs the GPGPU operating resources to process the image information, the processed image information is fed back to the application end, so that the information is returned over its original channel in time; the GPGPU operating resources are then recovered, cleaned, reordered, and placed into the progressive queue. This solves the problem described in the background art that, because the operating resources are substantial, failure to manage the GPGPU operating resources leads to untimely mobilization and misplaced resources that degrade the GPGPU's operating efficiency.
In order to achieve the above purpose, the present application provides the following technical solution: a GPGPU resource allocation management method, comprising the following steps.
S100: numbering the whole pool of resources, and allocating and managing all the resources according to the numbers.
S200: requesting the required resources on demand, and feeding the request information back to the management end.
S300: the management end allocating the numbered resources according to the request for the required resources, and feeding back information.
S400: using the resources, and recovering the resources after use.
S500: sorting the recovered resources, deleting the reference resources, and recovering and grouping the sorted resources a second time for reuse.
In step S100, the GPGPU operating resources are divided according to requirements and numbered according to the division result, realizing batch management of the GPGPU operating resources.
As a further scheme of the application: after the requested resources are allocated in step S300, the allocated resources are marked in the management unit as in operation; when resources are allocated for a second request, the in-operation resources are disregarded and the subsequent resources are arranged progressively.
As a still further scheme of the application: in step S400, after a running resource finishes its use, the resource is recovered and the run residue is cleaned, ensuring the operating efficiency and operating state of the running resources; the finished resources are recovered through a recovery unit and placed at the tail of the queue, and the resources are arranged and ordered progressively.
Preferably: when the GPGPU operating resources are divided, the first group of operating resources is numbered A1, A2, ..., Ax. During division, the resources are divided according to the operating bytes of the GPGPU operating resources and numbered sequentially; operating resources with the corresponding numbers are then matched on demand according to the application information from the application end. While a correspondingly numbered operating resource processes image information, its number automatically changes to Aa; after the run finishes, the numbers of that batch of operating resources change to B1, B2, B3, ..., Bx.
A GPGPU resource allocation management system comprises a control end, a recovery end, a resource end, and an application end. The control end is configured as the system control unit, and its signal priority is greater than that of the recovery end, the resource end, and the application end. The control end is signal-connected with the recovery end, the resource end, and the application end, and the recovery end, the resource end, and the application end are signal-connected with one another. The resource end is configured as the GPGPU operating resource management unit; the application end is configured as the GPGPU operating resource application unit and comprises a plurality of wired external signal connection ports and wireless signal connection chips; the recovery end is configured as the operating resource recovery and reuse unit.
As still further aspects of the application: when the application end receives the GPGPU image processing application, the application end submits an operation resource use application to the resource end and the control end according to the operation resource occupied by the image to be processed.
As still further aspects of the application: when the control end receives the operation resources, the control end matches the applied operation resources according to the applied image information, if the matching is passed, a signal is given to the resource end, after the resource end receives the control end signal, the control end applies for the operation resources with corresponding numbers and units according to the operation resources provided by the application end, and the image information is processed according to the allocated operation resources.
As still further aspects of the application: when the corresponding number and the running resource of the unit process the image information, the running resource of the number unit enters a queuing stopping state, and meanwhile, the running resource of the subsequent number unit sequentially progresses, so that resource allocation management is realized.
As still further aspects of the application: and after the operation resources are allocated to the application end, feeding back the processed image information to the application end, giving out a signal to the control end, after the signal is received, giving out a signal to the recovery end, after the recovery end receives the signal, cleaning and recovering the operation resources after the operation of the resource end is finished, numbering again, and entering progressive ordering.
Compared with the prior art, the beneficial effects of the application are as follows:
1. The GPGPU operating resources are uniformly divided and managed in groups, realizing unified management of the GPGPU operating resources. When external image information is processed, the corresponding resources are applied for according to the requirements of the image information to be processed, realizing on-demand allocation of the GPGPU operating resources.
2. After the resource end directs the GPGPU operating resources to process the image information, the processed image information is fed back to the application end, so that the information is returned over its original channel in time; meanwhile, the GPGPU operating resources are recovered, cleaned, and reordered for progressive queuing, improving the mobilization and operating efficiency of the GPGPU operating resources.
Drawings
FIG. 1 is a schematic diagram of the structure of the present application;
FIG. 2 is a schematic diagram of a system structure of a resource end in the present application;
FIG. 3 is a schematic diagram of a recovery end system according to the present application;
FIG. 4 is a schematic diagram of a system structure of an application end in the present application;
FIG. 5 is a schematic diagram of a management method according to the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings; the embodiments described are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application as claimed, but merely represents selected embodiments of the application. All other embodiments obtained by a person skilled in the art based on the embodiments of the application, without making any inventive effort, fall within the scope of the application.
General-purpose graphics processor technology is increasingly widely used in aerospace and defense digital signal processing, where software programming languages play a considerable role, including the Open Graphics Library (OpenGL) language, NVIDIA's parallel programming language CUDA, and the more recent Open Computing Language (OpenCL). Before the advent of languages such as OpenGL, CUDA, and OpenCL, programming massively parallel computers was a difficult task that only a small number of experts could accomplish, using arcane programming languages. These emerging languages, OpenCL in particular, have made general-purpose graphics processor technology accessible to programmers familiar with C and C++. Moreover, OpenCL is still developing and may eventually be common across general-purpose graphics processors, CPUs, and FPGAs. Such development facilitates future embedded computing architectures that combine CPUs, FPGAs, and general-purpose graphics processors, all programmed and maintained in the same software language. Kurar believes that general-purpose graphics processors and FPGAs will not change directly and rapidly, so the CPU can play an important role among them, allowing the general-purpose graphics processor, CPU, and FPGA to be programmed as one open chip. The content of the general-purpose graphics processor's open software libraries is also growing, broadening the application of its software, and many Linux operating systems are now available for downloading and adding material to general-purpose graphics processors. Devices such as FPGAs and DSPs have long been used to develop various kinds of embedded computing work, and those systems employ specialized processing techniques; the open programming languages used by general-purpose graphics processors vary greatly. While general-purpose graphics processors are programmed similarly to FPGAs and DSPs, programming them with OpenCL requires a lesser degree of specialization. Furthermore, the programming software of the general-purpose graphics processor also contributes to its development in embedded computing: in a general-purpose graphics processor, multiple processing cores are arranged together regularly, so the number of processing cores per device will grow over time, yet the software does not have to be rewritten as the core count increases.
Referring to fig. 1 to 4, in an embodiment of the present application, a GPGPU resource allocation management system comprises a control end, a recovery end, a resource end, and an application end. The control end is configured as the system control unit, and its signal priority is greater than that of the recovery end, the resource end, and the application end. The control end is signal-connected with the recovery end, the resource end, and the application end, and the recovery end, the resource end, and the application end are signal-connected with one another. The resource end is configured as the GPGPU operating resource management unit; the application end is configured as the GPGPU operating resource application unit and comprises a plurality of wired external signal connection ports and wireless signal connection chips; the recovery end is configured as the operating resource recovery and reuse unit.
When the application end receives a GPGPU image processing application, it submits an operating resource use application to the resource end and the control end according to the operating resources occupied by the image to be processed.
When the control end receives the application, it matches the requested operating resources against the applied image information; if the match passes, a signal is given to the resource end. After receiving the control end's signal, the resource end allocates operating resources of the corresponding numbers and units according to the operating resource figure provided by the application end, and the image information is processed with the allocated operating resources.
While the operating resources of a correspondingly numbered unit process the image information, that numbered unit's operating resources enter a stopped, out-of-queue state, and the operating resources of the subsequent numbered units move up in sequence, realizing resource allocation management.
After the operating resources are allocated to the application end and the processed image information is fed back to it, a signal is given to the control end; upon receiving the signal, the control end signals the recovery end. After the recovery end receives the signal, it cleans and recovers the operating resources on the resource end whose run has finished, numbers them again, and enters them into the progressive ordering.
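To make this signal flow concrete, the following minimal Python sketch models the four ends as methods of one class; every name here (GPGPUSystem, application_end, the capacity check) is an illustrative assumption, not a structure defined by the patent:
```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class GPGPUSystem:
    """Toy model of the four ends; all names are illustrative assumptions."""
    pool: deque = field(default_factory=lambda: deque([1, 2, 3, 4]))

    def application_end(self, image_job: str) -> str:
        # The application end estimates the resources the image occupies
        # and submits a use application to the control and resource ends.
        demand = len(image_job)                # stand-in for byte demand
        if not self.control_end(demand):       # control end verifies first
            return "application rejected"
        unit = self.pool.popleft()             # resource end allocates a unit
        result = f"{image_job!r} processed on unit {unit}"
        self.recovery_end(unit)                # recovery end cleans and requeues
        return result

    def control_end(self, demand: int) -> bool:
        # Control end matches the applied resources against the image info;
        # this capacity check is a hypothetical stand-in for that matching.
        return demand <= 8 * len(self.pool)

    def recovery_end(self, unit: int) -> None:
        # Recovery end clears run residue and appends the unit to the tail
        # of the progressive ordering, where it awaits renumbering.
        self.pool.append(unit)

print(GPGPUSystem().application_end("frame-001"))
```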
Embodiment one:
specifically, when the GPGPU operating resources are divided, the first group of operating resources is numbered A1, A2, ..., Ax. During division, the resources are divided according to the operating bytes of the GPGPU operating resources and numbered in sequence; operating resources with the corresponding numbers are then matched on demand according to the application information from the application end. While a correspondingly numbered operating resource processes image information, its number automatically changes to Aa. When the run finishes, the default numbers of that batch of operating resources change to B1, B2, B3, ..., Bx, and the batch is ordered by default as the second group, the B sequence, which by default is placed after the A sequence; subsequent runs follow the default round-robin ordering, so the GPGPU operating resources are divided reasonably.
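As an illustration of this round-robin renumbering, here is a minimal Python sketch; the class and method names are hypothetical, and the prefix cycle beyond B (C, D, ...) is an extrapolation from the description:
```python
from collections import deque

class GPGPUResourcePool:
    """Sketch of the numbering scheme above; names are hypothetical."""

    PREFIXES = "ABCDEFGH"   # batch prefix advances once per completed round

    def __init__(self, unit_count: int):
        # Division step: number the first group of operating resources
        # A1..Ax, in order of their operating bytes.
        self.queue = deque(range(1, unit_count + 1))
        self.round = dict.fromkeys(self.queue, 0)

    def number_of(self, unit: int) -> str:
        return f"{self.PREFIXES[self.round[unit]]}{unit}"

    def allocate(self) -> int:
        # Matching step: grant the head of the progressive queue; its
        # number reads "Aa" (in-use marker) while the image job runs.
        unit = self.queue.popleft()
        print(f"{self.number_of(unit)} -> Aa (processing image information)")
        return unit

    def recycle(self, unit: int) -> None:
        # After the run, the unit is cleaned, renumbered into the next
        # batch (B1, B2, ...), and requeued at the tail.
        self.round[unit] += 1
        self.queue.append(unit)
        print(f"requeued as {self.number_of(unit)}")

pool = GPGPUResourcePool(4)
u = pool.allocate()   # A1 -> Aa
pool.recycle(u)       # requeued as B1
```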
When multiple channels apply simultaneously for image-information processing of GPGPU operating data, a structure-routing allocation principle is adopted to divide the same sequence of GPGPU operating data into multiple independent computing unit groups, each provided with an independent L1 data cache, L1 instruction cache, shared memory, last-level cache, and global memory space. The global memory is globally visible to the on-chip DMA. By configuring the structure routing rules, the GPGPU chip is divided into several independent computing unit groups, which improves software flexibility and offers clear advantages for multi-user computing.
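A configuration-style sketch of this partitioning follows; the cache and memory sizes are placeholders (the patent gives no figures), and the granularity of one group per applying channel is an assumption:
```python
from dataclasses import dataclass
from typing import List

@dataclass
class ComputeUnitGroup:
    """One independent group carved out by the routing configuration;
    the cache/memory sizes are placeholders, not figures from the patent."""
    group_id: int
    units: int
    l1_data_kb: int = 16
    l1_instr_kb: int = 8
    shared_mem_kb: int = 48
    llc_share_kb: int = 512

def partition(chip_units: int, channels: int) -> List[ComputeUnitGroup]:
    # Structure-routing principle: one independent group per channel that
    # applies concurrently, each with its own caches and memory space.
    per_group = chip_units // channels
    return [ComputeUnitGroup(group_id=g, units=per_group)
            for g in range(channels)]

groups = partition(chip_units=64, channels=4)   # four independent groups of 16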
Digital signal processing on a general-purpose graphics processor may seem an unlikely use of its graphics capabilities, but the graphics nature of such devices bears fundamentally on signal processing in imaging devices, radar, sonar, signals intelligence, and other equipment that performs complex calculations. One way to put it is that applying a general-purpose graphics processor to signal processing runs the graphics card in reverse. According to Franklin, a general-purpose graphics processor can be used to parse things and extract usable information, delivering useful material about the surrounding environment.
In this embodiment, the GPGPU has a processor array composed of a certain number of SPs, grouped into what NVIDIA calls TPCs; each TPC contains a certain number of SMs, and each SM contains 8 SPs. The main structure of an SP is an ALU (arithmetic logic unit), an FPU (floating-point unit), and a register file. An SM contains an instruction unit, a constant memory unit, a texture memory, 8192 registers, 16 KB of shared memory, 8 stream processors (SPs), and two special function units (SFUs). (The GeForce 9300M GS has only 1 SM.) The thread is the most basic operation unit in the CUDA model and executes the most basic program instructions.
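As a worked reading of those figures, the snippet below totals the hierarchy for an assumed GeForce 8800-class layout of 8 TPCs with 2 SMs each; only the per-SM figures (8 SPs, 8192 registers, 16 KB shared memory) come from the text above:
```python
# Worked totals for the hierarchy above. The TPC and SM counts are an
# assumption (a GeForce 8800-class layout); the per-SM figures of 8 SPs,
# 8192 registers and 16 KB shared memory come from the text.
TPCS, SMS_PER_TPC, SPS_PER_SM = 8, 2, 8

total_sms = TPCS * SMS_PER_TPC            # 16 streaming multiprocessors
total_sps = total_sms * SPS_PER_SM        # 128 stream processors
total_regs = total_sms * 8192             # 131072 registers chip-wide
total_shared_kb = total_sms * 16          # 256 KB shared memory chip-wide
print(total_sps, total_regs, total_shared_kb)   # 128 131072 256
```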
General-purpose graphics processors are adept at two things: representing things and parsing things. The general-purpose graphics processor gives designers of military signal processing application systems usable graphics processing technology, with enormous embedded parallel processing capability almost for free. The growth of general-purpose graphics processors in aerospace and defense applications is an example of off-the-shelf commercial technology entering the military technology field. Franklin states that the primary use of graphics processors is still graphics processing; while graphics processor manufacturers fight over a computer games market worth billions of dollars, a company such as NVIDIA spends on the order of two billion dollars developing each series of graphics processors. The application field of the general-purpose graphics processor chip has now extended from graphics processing equipment alone to signal processing equipment, and its software programming languages are likewise extending toward signal processing and general-purpose processing; graphics processing languages like the Open Graphics Library (OpenGL) can be used for general-purpose processing.
Referring to fig. 5, a GPGPU resource allocation management method includes the following steps.
S100: numbering the whole pool of resources, and allocating and managing all the resources according to the numbers.
S200: requesting the required resources on demand, and feeding the request information back to the management end.
S300: the management end allocating the numbered resources according to the request for the required resources, and feeding back information.
S400: using the resources, and recovering the resources after use.
S500: sorting the recovered resources, deleting the reference resources, and recovering and grouping the sorted resources a second time for reuse.
In step S100, the GPGPU operating resources are divided according to requirements and numbered according to the division result, realizing batch management of the GPGPU operating resources.
After the requested resources are allocated in step S300, the allocated resources are marked in the management unit as in operation; when resources are allocated for a second request, the in-operation resources are disregarded and the subsequent resources are arranged progressively.
In step S400, after a running resource finishes its use, the resource is recovered and the run residue is cleaned, ensuring the operating efficiency and operating state of the running resources; the finished resources are recovered through a recovery unit and placed at the tail of the queue, and the resources are arranged and ordered progressively.
Meanwhile, in this embodiment, while the GPGPU operating resources are managed in batches, the power supply of the GPGPU device is also divided into units: the input power is split into a plurality of power units matched to the GPGPU operating resource units according to operating efficiency. When the power is apportioned, only the GPGPU operating resource units at the front of the A-sequence ordering, that is, only the resource units actually running, are matched with partitioned power units.
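A small sketch of this power-pairing rule, under the stated assumption that only running A-sequence units receive a dedicated power unit (the data layout and field names are invented for illustration):
```python
def match_power_units(resource_units, power_units):
    """Pair partitioned power units with running A-sequence resource units
    only; field names and the data layout are invented for illustration."""
    running = [u for u in resource_units
               if u["state"] == "running" and u["number"].startswith("A")]
    # Idle units and recycled (B/C/...) units draw no dedicated power unit.
    return list(zip((u["number"] for u in running), power_units))

units = [{"number": "Aa", "state": "running"},   # in use, front of A sequence
         {"number": "A2", "state": "idle"},
         {"number": "B1", "state": "idle"}]
print(match_power_units(units, ["P1", "P2"]))    # [('Aa', 'P1')]
```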
In this embodiment:
A graphics processing unit (GPU) can significantly accelerate the computation of deep learning. GPUs are an important component of modern artificial intelligence infrastructure, and new GPUs are developed and optimized specifically for deep learning. A GPU is a specialized processing core that can be used to accelerate computation. These cores were originally designed to process image and visual data, but GPUs are now used to accelerate other computations as well, such as deep learning, because they can effectively parallelize large-scale distributed computing processes.
The main benefit of a GPU is that it handles the individual parts of a whole in parallel, i.e., simultaneously. There are four architectures for parallel processing: single instruction single data (SISD), single instruction multiple data (SIMD), multiple instruction single data (MISD), and multiple instruction multiple data (MIMD). The central processing unit (CPU) also has parallel processing capability: most CPUs employ a multiple-instruction multiple-data architecture, while most GPUs employ a single-instruction multiple-data architecture. Compared with MIMD, the SIMD architecture is better suited to distributed computation, and in turn to training deep learning models, which is why GPUs are widely used for deep learning model training.
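The contrast can be illustrated in Python with NumPy, whose vectorized operations apply one instruction across many data elements in the SIMD spirit (this is an analogy on the CPU, not GPU code):
```python
import numpy as np

data = np.arange(1_000_000, dtype=np.float32)

# Loop style: one instruction stream visits one data element at a time.
scaled_loop = [x * 2.0 + 1.0 for x in data]

# SIMD style: one instruction is applied to many data elements at once,
# which is the execution model that makes GPUs fit distributed training.
scaled_simd = data * 2.0 + 1.0
```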
CUDA is a framework proposed by NVIDIA for accessing the GPU from programs written in the C language; using CUDA, the resources in the GPU can be accessed directly.
Although CUDA provides the ability to access GPU resources, it was not designed specifically for deep learning, so building a deep learning model with CUDA is still low-level work: it does not provide sufficient abstraction of the general structure of deep learning models, and constructing a CUDA-based deep learning model from scratch remains considerably difficult for users who are not specialists in the field.
In view of this, frameworks built on CUDA and dedicated to training deep learning models have been proposed, for example TensorFlow and PyTorch. The bottom layers of these frameworks call the APIs provided by CUDA directly and abstract the deep learning model above them. When training a deep learning model with these frameworks, one need focus only on the network structure and the loss function; once they are defined, TensorFlow or PyTorch can train the model directly until the loss function converges, without the developer attending to the underlying GPU structure. This greatly lowers the threshold for training deep learning models and is an important reason for the vigorous development of deep learning in recent years.
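For example, a minimal PyTorch training loop shows the division of labor: the user supplies only the network and the loss, while the framework invokes CUDA kernels underneath (the model and data here are toy placeholders):
```python
import torch
from torch import nn

# Toy model and data; the user specifies only the network and the loss.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 8, device=device)
y = torch.randn(64, 1, device=device)

for _ in range(100):            # iterate until the loss (roughly) converges
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()             # CUDA kernels run here; no GPU code written
    optimizer.step()
```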
Among existing algorithms are classical methods such as the first-come-first-served (FCFS) algorithm and the shortest-job-first (SJF) algorithm. These classical algorithms require that task requests carry estimated resource demands (the estimates are often inaccurate: users typically over-estimate in order to claim more resources than the task actually needs), and they allocate CPU computing resources to different tasks according to those demands. FCFS allocates part of the computing resources of a single CPU core to task requests in order of arrival, according to each task's resource demand. SJF orders the task requests by resource demand (requests with smaller CPU demand get higher priority) and allocates part of the computing resources of a single CPU core accordingly. However, these algorithms make allocation decisions based only on the demand stated in the request: when a task request lies about its demand, CPU resources sit idle while subsequent tasks pile up. Moreover, because the image model's resource-utilization efficiency diminishes at the margin, the relationship between the image model's task-processing efficiency and CPU resources is nonlinear: the more CPU resources are allocated, the more slowly the processing efficiency improves.
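A compact Python sketch of the two classical policies, using the user-supplied demand estimate described above (the task records and field names are invented for illustration):
```python
def fcfs(requests):
    """First-come-first-served: grant resources in arrival order."""
    return list(requests)

def sjf(requests):
    """Shortest-job-first: the request with the smallest (user-estimated)
    CPU demand runs first; an inflated estimate demotes honest tasks."""
    return sorted(requests, key=lambda r: r["estimated_cpu"])

queue = [{"id": 1, "estimated_cpu": 8},   # field names are illustrative
         {"id": 2, "estimated_cpu": 2},
         {"id": 3, "estimated_cpu": 5}]
print([r["id"] for r in fcfs(queue)])  # [1, 2, 3]
print([r["id"] for r in sjf(queue)])   # [2, 3, 1]
```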
In this embodiment, the control end verifies the application data against the application, ensuring the proportion in which the data are distributed on demand; meanwhile, numbering the GPGPU operating data in batches reduces the operating resources occupied by different pieces of processing information.
The foregoing is only a preferred embodiment of the present application, and the scope of the present application is not limited thereto; any equivalent substitution or modification that a person skilled in the art could make within the technical scope disclosed by the present application, according to its technical scheme and inventive concept, shall be covered by the scope of the present application.
It should be noted that: like reference numerals and letters designate like items in the drawings, and thus once an item is defined in one drawing, no further definition or explanation thereof is necessary in the subsequent drawings.
In the foregoing description of the application, it should be noted that orientation or positional terms such as "one side" and "the other side" are based on the orientations or positional relationships shown in the drawings, or on the orientation in which the inventive product is conventionally used; they serve only to make the description convenient and simple, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation, so they should not be construed as limiting the application. Furthermore, terms such as "first" and "second" are used merely to distinguish descriptions and should not be understood as indicating or implying relative importance.
Furthermore, the terms "identical" and the like do not denote that the components are identical, but rather that there may be minor differences. The term "perpendicular" merely means that the positional relationship between the components is more perpendicular than "parallel" and does not mean that the structure must be perfectly perpendicular, but may be slightly tilted.

Claims (8)

1. A GPGPU resource allocation management method, characterized in that it comprises the following steps:
s100: numbering the whole pool of resources, and allocating and managing all the resources according to the numbers;
s200: requesting the required resources on demand, and feeding the request information back to a management end;
s300: the management end allocating the numbered resources according to the request for the required resources, and feeding back information;
s400: using the resources, and recovering the resources after use;
s500: sorting the recovered resources, deleting the reference resources, and recovering and grouping the sorted resources a second time for reuse;
in the step S100, the GPGPU operating resources are divided according to requirements and numbered according to the division result, realizing batch management of the GPGPU operating resources.
2. The GPGPU resource allocation management method of claim 1, characterized in that: after the requested resources are allocated in step S300, the allocated resources are marked in the management unit as in operation; when resources are allocated for a second request, the in-operation resources are disregarded and the subsequent resources are arranged progressively.
3. The GPGPU resource allocation management method of claim 1, characterized in that: in step S400, after a running resource finishes its use, the resource is recovered and the run residue is cleaned, ensuring the operating efficiency and operating state of the running resources; the finished resources are recovered through a recovery unit and placed at the tail of the queue, and the resources are arranged and ordered progressively.
4. A GPGPU resource allocation management system, comprising a control end, a recovery end, a resource end, and an application end, characterized in that: the control end is configured as the system control unit, and its signal priority is greater than that of the recovery end, the resource end, and the application end; the control end is signal-connected with the recovery end, the resource end, and the application end; the resource end is configured as the GPGPU operating resource management unit; the application end is configured as the GPGPU operating resource application unit and comprises a plurality of wired external signal connectors and wireless signal connection chips; and the recovery end is configured as the operating resource recovery and reuse unit.
5. The GPGPU resource allocation management system of claim 4, characterized in that: when the application end receives a GPGPU image processing application, it submits an operating resource use application to the resource end and the control end according to the operating resources occupied by the image to be processed.
6. The GPGPU resource allocation management system of claim 4, characterized in that: when the control end receives the application, it matches the requested operating resources against the applied image information; if the match passes, a signal is given to the resource end; after receiving the control end's signal, the resource end allocates operating resources of the corresponding numbers and units according to the operating resource figure provided by the application end, and the image information is processed with the allocated operating resources.
7. The GPGPU resource allocation management system of claim 6, characterized in that: while the operating resources of a correspondingly numbered unit process the image information, that numbered unit's operating resources enter a stopped, out-of-queue state, and the operating resources of the subsequent numbered units move up in sequence, realizing resource allocation management.
8. The GPGPU resource allocation management system of claim 4, characterized in that: after the operating resources are allocated to the application end and the processed image information is fed back to it, a signal is given to the control end; upon receiving the signal, the control end signals the recovery end; after the recovery end receives the signal, it cleans and recovers the operating resources on the resource end whose run has finished, numbers them again, and enters them into the progressive ordering.
CN202310687298.7A 2023-06-12 2023-06-12 GPGPU resource allocation management method and system Pending CN116775283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310687298.7A CN116775283A (en) 2023-06-12 2023-06-12 GPGPU resource allocation management method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310687298.7A CN116775283A (en) 2023-06-12 2023-06-12 GPGPU resource allocation management method and system

Publications (1)

Publication Number Publication Date
CN116775283A true CN116775283A (en) 2023-09-19

Family

ID=87990665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310687298.7A Pending CN116775283A (en) 2023-06-12 2023-06-12 GPGPU resource allocation management method and system

Country Status (1)

Country Link
CN (1) CN116775283A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination