WO2024055168A1 - Resource allocation method, processor, and computing platform - Google Patents

Resource allocation method, processor, and computing platform

Info

Publication number
WO2024055168A1
WO2024055168A1 (PCT/CN2022/118522)
Authority
WO
WIPO (PCT)
Prior art keywords
computing
application
computing unit
units
allocated
Prior art date
Application number
PCT/CN2022/118522
Other languages
French (fr)
Chinese (zh)
Inventor
陈清龙
毕舒展
项能武
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2022/118522 priority Critical patent/WO2024055168A1/en
Publication of WO2024055168A1 publication Critical patent/WO2024055168A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L27/00Modulated-carrier systems

Definitions

  • the present application relates to the field of computer technology, and in particular, to a resource allocation method, processor and computing platform.
  • GPU graphics processing unit
  • NPU neural network processor
  • Figure 1 is a schematic diagram of the system on chip (SoC) of the Xavier chip, including a GPU, CPU, vision accelerator, deep learning accelerator (DLA), video encoder, video decoder, camera ingest, image signal processor (ISP), etc.
  • the GPU includes multiple independent stream multiprocessors (SM); each SM contains compute unified device architecture (CUDA) cores, and each CUDA core contains a variety of computing units.
  • SM stream multiprocessor
  • CUDA compute unified device architecture
  • FIG. 2 is a schematic diagram of the SoC of Huawei's Ascend chip, including an NPU, CPU, task scheduler, network card, universal serial bus (USB) interface, external memory interface, peripheral component interconnect express (PCIe) interface, general-purpose input/output (GPIO) interface, etc.
  • the NPU includes multiple AI Cores, and each AI Core stacks identical computing resources, such as the same types and numbers of tensor computing units, vector computing units, etc.
  • the present application provides a resource allocation method, a processor, and a computing platform to improve the resource utilization of chips including GPUs, NPUs, etc., and reduce chip area and static power consumption losses.
  • In a first aspect, a processor is provided that includes a computing unit pool, and the computing unit pool includes multiple computing units; each idle computing unit among the multiple computing units can be called on demand, and the number of computing units of each type among the multiple computing units is positively related to the number of times that type of computing unit is called.
  • the processor may be a processing chip used to implement parallel computing, such as an NPU or a GPU or other processing chips used to implement parallel computing.
  • This application pools the computing units in processing chips used for large-scale parallel computing, including NPUs and GPUs, so that each computing unit can be called on demand by upper-layer applications when it is idle, which can improve resource utilization, reduce the static power loss of the computing units, and reduce the chip area.
  • all computing units in the processor are pooled, that is, all computing units in the processor are included in the computing unit pool.
  • All computing units of the processor are pooled, which can minimize the static power loss of the processor and reduce the chip area.
  • When only some computing units of the processor are pooled, static power loss and chip area can still be reduced; at the same time, keeping a small number of computing units out of the pool (for example, in the form of an AI Core) gives the processor good forward compatibility.
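As an illustrative sketch of such a pool (not the patent's implementation; all class and method names here are hypothetical), idle units of each type could be tracked per type and handed out on demand, then returned when the caller is done:

```python
from collections import defaultdict

class ComputingUnitPool:
    """Hypothetical sketch of a computing unit pool: idle units of each
    type can be acquired on demand and released back when no longer used."""

    def __init__(self, unit_counts):
        # unit_counts: e.g. {"tensor": 8, "vector": 16, "scalar": 4}
        self.idle = {t: list(range(n)) for t, n in unit_counts.items()}
        self.busy = defaultdict(set)

    def acquire(self, unit_type, count=1):
        """Hand out `count` idle units of `unit_type`, or None if unavailable."""
        free = self.idle.get(unit_type, [])
        if len(free) < count:
            return None  # caller may retry, wait, or adjust its request
        units = [free.pop() for _ in range(count)]
        self.busy[unit_type].update(units)
        return units

    def release(self, unit_type, units):
        """Return units to the idle list so other applications can call them."""
        for u in units:
            self.busy[unit_type].discard(u)
            self.idle[unit_type].append(u)

pool = ComputingUnitPool({"tensor": 2, "vector": 4, "scalar": 2})
got = pool.acquire("vector", 3)   # three idle vector units
pool.release("vector", got)       # back to idle for other callers
```

A partially pooled design would simply construct the pool over a subset of the chip's units and leave the rest wired into fixed AI Cores.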
  • the types correspond to the algorithms executed by the computing units.
  • the types of computing units can be divided according to the algorithms executed by the computing units.
  • the type of the computing unit used to perform the addition (add) algorithm is the add computing unit
  • the type of the computing unit used to perform the logarithm (log) algorithm is the log computing unit, and so on.
  • each of the multiple computing units corresponds to a low-level computing instruction.
  • the add calculation unit corresponds to the addition instruction
  • the log computing unit corresponds to the log instruction. It is understandable that, in actual applications, different computing units can correspond to the same underlying operation instruction; the scale of the underlying operations varies with the capabilities of the processor and is specifically designed according to its chip specifications. The same underlying operation instruction may be executed in parallel by multiple computing units at the bottom layer.
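To illustrate that last point with a toy sketch (the chunking scheme and function names are assumptions, not the chip's actual dispatch logic): one logical add instruction over a large array can be split across several add computing units working in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def add_unit(a_chunk, b_chunk):
    # One hypothetical "add computing unit": element-wise addition on its chunk.
    return [x + y for x, y in zip(a_chunk, b_chunk)]

def dispatch_add(a, b, num_units):
    """Split one logical add instruction across `num_units` add units."""
    size = (len(a) + num_units - 1) // num_units  # ceil(len / num_units)
    chunks = [(a[i:i + size], b[i:i + size]) for i in range(0, len(a), size)]
    with ThreadPoolExecutor(max_workers=num_units) as ex:
        parts = list(ex.map(lambda ab: add_unit(*ab), chunks))
    return [v for part in parts for v in part]

result = dispatch_add(list(range(8)), [1] * 8, num_units=4)
# Each of the 4 units handled a 2-element chunk of the same add instruction.
```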
  • the computing unit pool includes one or more of a tensor computing unit pool, a vector computing unit pool, and a scalar computing unit pool.
  • the processor is configured to: receive a request to execute the first application; determine the computing resource requirements of the first application, and allocate computing units to the first application from the computing unit pool according to the computing resource requirements to implement the first application; the computing resource requirements include one or more of tensor computing units, vector computing units, and scalar computing units, as well as the quantity of each type of computing unit.
  • resources can be dynamically scheduled for upper-layer applications according to their requests, which can improve the flexibility of resource allocation and improve resource utilization.
  • the processor is also used to: determine multiple processes corresponding to the first application; determine the computing resource requirements of each of the multiple processes, and allocate computing units to each process from the computing unit pool according to the computing resource requirements of each process to implement the first application; wherein the computing units allocated to each process are different from the computing units allocated to other processes.
  • the processor is further configured to: determine whether the computing units allocated to the first application meet the needs of the first application; when they cannot meet those needs, adjust the computing resource allocation result of the first application, and allocate computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
  • the processor is further configured to establish a correspondence between the first application and the computing unit allocated to the first application.
  • a resource allocation method is provided, which is applied to a processor.
  • the processor includes a computing unit pool, and the computing unit pool includes multiple computing units; each idle computing unit among the multiple computing units can be called on demand, and the number of computing units of each type among the multiple computing units is positively correlated with the number of times that type of computing unit is called; the method includes: receiving a request to execute the first application; determining the computing resource requirements of the first application, and allocating computing units to the first application from the computing unit pool according to the computing resource requirements to implement the first application.
  • the computing resource requirements include one or more of tensor computing units, vector computing units, and scalar computing units, as well as the number of each type of computing unit.
  • determining the computing resource requirements of the first application and allocating computing units to the first application from the computing unit pool according to the computing resource requirements to implement the first application includes: determining multiple processes corresponding to the first application; determining the computing resource requirements of each process in the multiple processes, and allocating computing units to each process from the computing unit pool according to the computing resource requirements of each process to implement the first application; wherein the computing units allocated to each process are different from those allocated to other processes.
  • the method further includes: determining whether the computing units allocated to the first application meet the needs of the first application; when they cannot meet those needs, adjusting the computing resource allocation result of the first application, and allocating computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
  • the method further includes: establishing a correspondence between the first application and the computing unit allocated to the first application.
  • In a third aspect, a processing device is provided, which includes modules/units/technical means for executing the method described in the second aspect or any possible design of the second aspect.
  • the device may include: a transceiver module, configured to receive a request to execute the first application; and a processing module, configured to determine the computing resource requirements of the first application and allocate computing units to the first application from the computing unit pool of the processor according to the computing resource requirements to implement the first application.
  • the computing resource requirements include one or more of tensor computing units, vector computing units, and scalar computing units, and the number of each type of computing unit; wherein the computing unit pool includes multiple computing units; each idle computing unit among the multiple computing units can be called on demand, and the number of computing units of each type among the multiple computing units is positively related to the number of times that type of computing unit is called.
  • the processing module can also be used to: determine multiple processes corresponding to the first application; determine the computing resource requirements of each of the multiple processes, and allocate computing units to each process from the computing unit pool according to the computing resource requirements of each process to implement the first application; wherein the computing units allocated to each process are different from the computing units allocated to other processes.
  • the processing module may also be used to: determine whether the computing unit allocated to the first application meets the needs of the first application; when the computing unit allocated to the first application cannot meet the needs of the first application, Adjust the computing resource allocation result of the first application, and allocate computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
  • the processing module may also be used to establish a correspondence between the first application and the computing unit allocated to the first application.
  • a fourth aspect provides a computing platform, including the processor described in the first aspect and the processing device described in the third aspect.
  • a computer-readable storage medium stores computer-executable instructions.
  • when called by a computer, the computer-executable instructions cause the method described in the second aspect, or any possible design of the second aspect, to be implemented.
  • Figure 1 is a schematic diagram of a SoC
  • FIG. 2 is a schematic diagram of another SoC
  • Figure 3 is a schematic diagram of a processor provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram showing that all computing units in the NPU are pooled
  • Figure 5 is a schematic diagram showing that some computing units in the NPU are pooled
  • Figure 6 is a statistical chart of the usage frequency of operators
  • Figure 7 is a flow chart of a resource allocation method provided by an embodiment of the present application.
  • Figure 8A is a schematic diagram of a possible resource allocation provided by an embodiment of the present application.
  • Figure 8B is a schematic diagram of another possible resource allocation provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a possible computing unit pool 11 provided by the embodiment of the present application.
  • Figure 10 is a schematic diagram of a processing device provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of a computing platform provided by an embodiment of the present application.
  • the processor 01 includes a computing unit pool 11 , and the computing unit pool 11 includes a plurality of computing units 111 .
  • the computing units in the computing unit pool 11 are pooled computing units, that is, computing units that can be shared and called by different upper-layer applications.
  • when a computing unit in the computing unit pool 11 is not called (or not running), it is an idle computing unit and can be called by upper-layer applications.
  • each idle computing unit among the plurality of computing units 111 can be called on demand, that is, according to the needs of upper-layer applications: if any upper-layer application needs a computing unit and there are idle computing units in the computing unit pool 11, an idle computing unit can be called by that upper-layer application.
  • the processor 01 may be a processing chip used to implement parallel computing, such as an NPU or a GPU or other processing chips used to implement parallel computing.
  • the multiple computing units 111 in the computing unit pool 11 may be computing units in a chip used to implement parallel computing, such as computing units in an NPU or a GPU.
  • the computing unit pool 11 can be pooled by computing units in an NPU or a GPU or other processing chips used to implement parallel computing.
  • the above solution pools the computing units in processing chips used for large-scale parallel computing, including NPUs and GPUs, so that each computing unit can be called on demand by upper-layer applications when it is idle.
  • in this way, resource utilization can be improved; under the same resource usage, the static power consumption loss of the computing units and the chip area of the processor 01 can be reduced.
  • the following mainly uses the NPU as an example, but the same implementation can be applied to other chips such as GPUs.
  • all computing units in the NPU are pooled, that is, all computing units in the NPU are included in the computing unit pool 11 .
  • all computing units 111 in the NPU are in the computing unit pool 11 .
  • every idle computing unit among all computing units of the NPU can be called on demand.
  • only some of the computing units in the NPU are pooled, that is, only some of the computing units are included in the computing unit pool 11 .
  • some computing units 111 in the NPU are in the computing unit pool 11, and other computing units 111 are not in the computing unit pool 11, but are configured in the AI Core.
  • the computing unit pool 11 may include different types of computing units, where the computing unit type corresponds to the algorithm executed by the computing unit.
  • the number of computing units 111 of each type among the plurality of computing units 111 is positively related to the number of times that type of computing unit 111 is called; in other words, the more times a type of computing unit 111 is called, the more units of that type are placed in the computing unit pool 11.
  • each computing unit 111 corresponds to a low-level operation instruction of the processor 01.
  • each computing unit 111 among the plurality of computing units 111 corresponds to a low-level operation instruction.
  • different computing units 111 can correspond to the same underlying operation instructions, and the scale of the underlying operations varies according to the capabilities of the processor 01, and is specifically designed according to the chip specifications of the processor 01.
  • for an add instruction, when the amount of data to be processed is large, multiple add computing units can be used at the bottom layer to compute different data in parallel in order to improve the processing capability (that is, the degree of parallelism) of the processor 01; thus, for the same add instruction, the bottom layer may perform multiple additions in parallel.
  • the number of each type of computing unit in the computing unit pool 11 can be configured to meet actual requirements by analyzing and counting the proportions of the various computing units used by the operators in the models (such as neural network models) that the processor 01 runs in actual applications. It can be understood that one operator can correspond to one or more computing units 111.
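A toy sketch of that sizing idea (the trace, numbers, and headroom percentage are hypothetical; the patent only states the principle): count how often each unit type is used in a representative operator trace, then provision the pool proportionally, with a small margin for bursts:

```python
from collections import Counter

def size_pool(operator_trace, total_units, headroom_pct=10):
    """Provision unit counts proportional to observed call frequency,
    with a configurable headroom margin so the pool can absorb bursts."""
    calls = Counter(operator_trace)        # unit_type -> number of calls
    total_calls = sum(calls.values())
    scale = 100 + headroom_pct
    return {
        # Integer ceiling division avoids floating-point rounding surprises.
        t: -(-(total_units * n * scale) // (total_calls * 100))
        for t, n in calls.items()
    }

# Hypothetical trace: which unit type each operator invocation used.
trace = ["add"] * 50 + ["mul"] * 30 + ["log"] * 15 + ["sqrt"] * 5
plan = size_pool(trace, total_units=100)
# add is called most often, so the pool holds the most add units.
```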
  • Figure 6 is a statistical chart of the frequency of use of various operators in the model. The functions of each operator in Figure 6 are explained as follows:
  • Slice slice
  • BatchNormalization batch normalization
  • ConstantOfShape generates a tensor with a given value and shape
  • LeakyRelu leaky rectified linear unit
  • GlobalAveragePool global average pooling
  • ConvTranspose transposed convolution, also known as Deconvolution
  • Equal determines whether elements are equal
  • ReduceMean reduction layer in convolutional neural network
  • the number of each type of computing unit 111 may be slightly greater than the number corresponding to the calls of that type of computing unit 111, to ensure that the processor 01 has sufficient performance to cope with bursts.
  • in this way, the static power loss of the processor 01 can be reduced while ensuring its running performance.
  • the computing unit 111 is classified into types.
  • the multiple computing units 111 may include one or more types such as a tensor computing unit (or matrix computing unit) 111A, a vector computing unit 111B, and a scalar computing unit 111C.
  • the tensor calculation unit 111A is used to perform matrix calculations
  • the vector calculation unit 111B is used to perform vector calculations
  • the scalar calculation unit 111C is used to perform scalar calculations.
  • the multiple computing units 111 may include an eight-bit integer (int8) computing unit and a 16-bit floating point (fp16) computing unit.
  • Int8 an eight-bit integer
  • fp16 16-bit floating point
  • the multiple computing units 111 may also include one or more types such as a 32-bit floating point (fp32) computing unit and a 4-bit integer (int4) computing unit.
  • all computing units 111 are first divided into tensor computing units 111A, vector computing units 111B, and scalar computing units 111C, and the tensor computing units 111A are then further subdivided according to mathematical operation types and/or data types.
  • the processor 01 can schedule resources for the upper-layer application from the computing unit pool 11 when receiving a request from the upper-layer application.
  • FIG. 7 is a flow chart of a resource allocation method provided by an embodiment of the present application.
  • the method may be executed by the processor 01 in FIG. 3, or by a processing device other than the processor 01.
  • the method includes:
  • S701. Receive a request to execute the first application.
  • S702. Determine the computing resource requirements of the first application, and allocate computing units to the first application from the computing unit pool according to the computing resource requirements to implement the first application.
  • the computing resource requirements include one or more types of computing units among tensor computing units, vector computing units, and scalar computing units, and the quantity of each type of computing unit.
  • the NPU in the processor 01 or other specially configured processing unit for resource allocation may be used to execute the above method, which is not limited in this application.
  • the computing unit allocated to each process is different from the computing units allocated to other processes.
  • different processes among the plurality of processes are allocated different computing units.
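A minimal sketch of that per-process allocation (function and structure names are hypothetical): each process states its requirements, and units handed to one process are removed from the idle set so no other process receives them:

```python
def allocate_per_process(idle_units, process_requirements):
    """idle_units: {unit_type: [unit ids]}; process_requirements:
    {process: {unit_type: count}}. Returns disjoint per-process allocations."""
    allocations = {}
    for proc, req in process_requirements.items():
        grant = {}
        for unit_type, count in req.items():
            free = idle_units.get(unit_type, [])
            if len(free) < count:
                raise RuntimeError(f"not enough idle {unit_type} units for {proc}")
            # Popping from the shared idle list guarantees disjointness.
            grant[unit_type] = [free.pop() for _ in range(count)]
        allocations[proc] = grant
    return allocations

idle = {"tensor": [0, 1, 2, 3], "vector": [0, 1]}
allocs = allocate_per_process(idle, {
    "proc0": {"tensor": 2, "vector": 1},
    "proc1": {"tensor": 1, "vector": 1},
})
# proc0 and proc1 never share a unit id of the same type.
```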
  • FIG. 8A is a schematic diagram of a possible resource allocation.
  • resources such as add, sub, mul, and exp are each placed once in the processor 01, and matrix multiplication is placed in multiple copies.
  • application 0 and application n are to be executed at the same time.
  • the computing resources required by application 0 include matrix multiplication, add, and sub, and the computing resources required by application n include matrix multiplication, mul, and exp.
  • the resources allocated to application 0 can be shown as resource group 0, including matrix multiplication, add, and sub.
  • the resources allocated to application n can be shown as resource group n, including matrix multiplication, mul, and exp. In this way, different types of computing resources can be allocated to different applications, which reduces chip area, cost, and static power consumption without reducing the processing performance of the chip.
  • FIG. 8B is a schematic diagram of another possible resource allocation.
  • resources such as add, sub, mul, and exp are each placed in two copies in the processor 01, and matrix multiplication is placed in multiple copies.
  • application 0 and application n are to be executed at the same time.
  • Application 0 has a high demand for add, while application n has a low demand for add.
  • the resources allocated to application 0 can be as shown in resource group 0.
  • the resources allocated to application n can be as shown in resource group n, including matrix multiplication, sub, mul, and exp. In this way, different quantities of computing resources can be allocated to different applications, which reduces chip area, cost, and static power consumption without reducing the processing performance of the chip.
  • resources can be dynamically scheduled for upper-layer applications based on their requests, improving the flexibility of resource allocation.
  • after computing units are allocated to the first application, it may also be determined whether they meet the needs of the first application; when the allocated computing units cannot meet the needs of the first application, the computing resource allocation result of the first application is adjusted, and computing units are allocated to the first application from the computing unit pool 11 according to the adjusted computing resource allocation result.
  • for example, the computing resource allocation result of the first application is adjusted to add more computing units, such as additionally assigning the computing units 111b and 111c to the first application to meet the indicators required by the first application.
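That adjust-and-reallocate step might be sketched like this (purely illustrative; the requirement check, unit granularity, and round limit are assumptions):

```python
def allocate_with_adjustment(idle_units, needed, meets_requirement, max_rounds=3):
    """Allocate `needed` units, then grow the allocation one unit at a time
    until the application's requirement check passes or the pool runs dry."""
    granted = [idle_units.pop() for _ in range(min(needed, len(idle_units)))]
    for _ in range(max_rounds):
        if meets_requirement(len(granted)):
            return granted
        if not idle_units:
            break  # pool exhausted; return the best allocation we have
        granted.append(idle_units.pop())  # adjust: assign one more unit
    return granted

idle = list(range(8))
# Hypothetical check: the first application needs at least 5 units to hit its metric.
got = allocate_with_adjustment(idle, needed=3, meets_requirement=lambda n: n >= 5)
# Starts at 3 units, grows to 5 once the check reports the need is unmet.
```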
  • a corresponding relationship between the first application and the computing unit allocated to the first application can also be established.
  • the computing units allocated to the first application are fixed (that is, changed from idle computing units to non-idle computing units) and are used only by the first application; while the first application is using them, other applications cannot call this part of the resources.
  • afterwards, this part of the computing units may be released (for example, by deleting the corresponding relationship), so that they may be called by other applications.
  • resource utilization can be further improved.
  • alternatively, the first application may not release this part of the computing units (for example, by retaining the corresponding relationship), so that they are not called by other applications but remain reserved for the first application; in this way, when the first application runs next time, it can directly use this part of the computing units without allocating resources again.
  • the performance of the first application can be guaranteed first.
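A sketch of the correspondence table and the release-versus-retain choice described above (class and method names are hypothetical):

```python
class AllocationTable:
    """Tracks which computing units each application currently owns."""

    def __init__(self, idle_units):
        self.idle = list(idle_units)
        self.owned = {}  # app -> [unit ids]: the correspondence relationship

    def allocate(self, app, count):
        # Retained units are reused first: no re-allocation on the next run.
        if app in self.owned:
            return self.owned[app]
        units = [self.idle.pop() for _ in range(count)]
        self.owned[app] = units  # establish the correspondence
        return units

    def release(self, app):
        # Delete the correspondence so other applications can call the units.
        self.idle.extend(self.owned.pop(app, []))

table = AllocationTable(range(4))
first = table.allocate("app0", 2)
again = table.allocate("app0", 2)   # retained: same units, no new allocation
table.release("app0")               # now idle again for other applications
```

Releasing favors overall resource utilization; retaining favors the first application's own performance on its next run, which is the trade-off the two options above describe.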
  • the computing unit pool 11 includes one or more of a tensor computing unit pool, a vector computing unit pool, and a scalar computing unit pool.
  • the computing unit pool 11 includes a tensor computing unit pool 11A, a vector computing unit pool 11B, and a scalar computing unit pool 11C.
  • the tensor computing unit pool 11A includes one or more tensor computing units 111A.
  • the vector calculation unit pool 11B includes one or more vector calculation units 111B, and the scalar calculation unit pool 11C includes one or more scalar calculation units 111C.
  • when allocating a computing unit, the corresponding computing unit pool may be found first according to the algorithm dimension of the computation, and then the computing unit may be determined from that pool.
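That two-step lookup could be sketched as follows (the algorithm-to-dimension mapping and unit names are hypothetical):

```python
# Hypothetical mapping: algorithm -> which typed sub-pool serves it.
ALGO_TO_POOL = {"matmul": "tensor", "add": "vector", "loop_counter": "scalar"}

sub_pools = {
    "tensor": ["t0", "t1"],
    "vector": ["v0", "v1", "v2"],
    "scalar": ["s0"],
}

def find_unit(algorithm):
    """Step 1: pick the sub-pool by the algorithm's dimension;
    step 2: take an idle unit from that sub-pool."""
    pool = sub_pools[ALGO_TO_POOL[algorithm]]
    return pool.pop() if pool else None

unit = find_unit("matmul")   # served from the tensor sub-pool
```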
  • the embodiment of the present application also provides a processing device 100.
  • the processing device 100 includes modules/units/technical means for executing the method shown in Figure 7.
  • the processing device 100 may include:
  • Transceiver module 1001 configured to receive a request to execute the first application
  • the processing module 1002 is used to determine the computing resource requirements of the first application, and allocate computing units to the first application from the computing unit pool of the processor according to the computing resource requirements to implement the first application.
  • the computing resource requirements include one or more of tensor computing units, vector computing units, and scalar computing units, and the number of each type of computing unit; wherein the computing unit pool includes multiple computing units; each idle computing unit among the multiple computing units can be called on demand, and the number of computing units of each type among the multiple computing units is positively correlated with the number of times that type of computing unit is called.
  • an embodiment of the present application also provides a computing platform 1100 , including the above-mentioned processor 01 and the processing device 100 .
  • embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores computer-executable instructions; when called by a computer, the computer-executable instructions cause the method shown in Figure 7 to be executed.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multi Processors (AREA)

Abstract

Disclosed in the present application are a resource allocation method, a processor, and a computing platform, which are used for improving the resource utilization rate of a chip, such as a GPU and an NPU, and reducing the area of the chip and static power consumption loss. The processor comprises a computing unit pool, which comprises a plurality of computing units, wherein each idle computing unit among the plurality of computing units can be called as required, and the numbers of computing units of different types among the plurality of computing units are positively correlated with the numbers of instances of the computing units of the types being called. In the present application, by means of performing pooling on computing units in a processing chip, such as an NPU and a GPU, each computing unit can be called by means of an upper-layer application as required when the computing unit is idle, thereby improving the resource utilization rate, reducing static power consumption loss of the computing units, and reducing the area of the chip.

Description

A resource allocation method, processor and computing platform

Technical field
The present application relates to the field of computer technology, and in particular, to a resource allocation method, processor and computing platform.
Background
With the rise of machine learning, artificial intelligence (AI), autonomous driving, industrial simulation and other fields, general-purpose processors (such as the central processing unit (CPU)) encounter more and more performance bottlenecks when processing massive computations and massive data/images, such as low parallelism, insufficient bandwidth, and high latency. To cope with the demand for diversified computing, more and more scenarios introduce chips such as the graphics processing unit (GPU) or neural network processing unit (NPU), which together with general-purpose processors form heterogeneous computing platforms. There are many kinds of computing resources in NPU, GPU and other chips, such as tensor computing units, vector computing units, and scalar computing units. In the NPUs and GPUs produced by existing chip manufacturers, the proportion of each type of computing resource is fixed, and the same computing resources are replicated many times.
For example, Figure 1 is a schematic diagram of the system on chip (SoC) of the Xavier chip, which includes a GPU, a CPU, a vision accelerator, a deep learning accelerator (DLA), a video encoder, a video decoder, camera ingest, an image signal processor (ISP), etc. The GPU includes multiple independent streaming multiprocessors (SM); each SM contains compute unified device architecture (CUDA) cores, and each CUDA core contains a variety of computing units. Moreover, the various mathematical operations (such as exponent (exp), reciprocal (1/x), logarithm (log), square root (sqrt), addition (add), subtraction (sub), multiplication (mul), etc.) each correspond to a separate physical computing unit; computing units of multiple specifications are even placed to support different data types (such as 8-bit integer (int8), 16-bit floating point (fp16), 32-bit floating point (fp32), 4-bit integer (int4), etc.). All of this requires stacking multiple kinds of resources in each CUDA core.
For example, Figure 2 is a schematic diagram of the SoC of Huawei's Ascend chip, which includes an NPU, a CPU, a task scheduler, a network card, a universal serial bus (USB) interface, an external memory interface, a peripheral component interconnect express (PCIe) interface, a general-purpose input/output (GPIO) interface, etc. The NPU includes multiple AI Cores, and each AI Core internally stacks the same computing resources, such as the same types and the same numbers of tensor computing units, vector computing units, etc.
However, in actual applications, not all computing resources of chips such as NPUs and GPUs are used at the same time while they run. Therefore, the existing design of chips such as NPUs and GPUs not only wastes chip area but also suffers from static power consumption loss.
How to improve the resource utilization of chips, including NPUs and GPUs, is the problem to be solved by the present application.
Summary
The present application provides a resource allocation method, a processor and a computing platform, so as to improve the resource utilization of chips including GPUs and NPUs, and to reduce the chip area and static power consumption loss.
According to a first aspect, a processor is provided. The processor includes a computing unit pool, and the computing unit pool includes a plurality of computing units. Each idle computing unit among the plurality of computing units can be called on demand, and the number of computing units of each type among the plurality of computing units is positively correlated with the number of times computing units of that type are called.
It can be understood that the processor may be a processing chip used to implement parallel computing, for example an NPU, a GPU, or another processing chip used to implement parallel computing.
In the present application, by pooling the computing units in processing chips used to implement massively parallel computing, including NPUs and GPUs, each computing unit can be called on demand by an upper-layer application when it is idle, which can improve resource utilization, reduce the static power consumption loss of the computing units, and reduce the chip area.
In one possible design, all computing units in the processor are pooled, that is, all computing units in the processor are included in the computing unit pool.
When all computing units of the processor are pooled, the static power consumption loss of the processor can be minimized and the chip area reduced.
In another possible design, only some of the computing units in the processor are pooled, that is, only some of the computing units are included in the computing unit pool.
When only some computing units of the processor are pooled, static power consumption loss can still be reduced and the chip area of the processor decreased. At the same time, keeping a small number of computing units outside the pool (for example, in the form of AI Cores) gives the processor good forward compatibility.
In one possible design, the types correspond to the algorithms executed by the plurality of computing units. In other words, the computing units can be typed according to the algorithms they execute. For example, a computing unit used to perform the addition (add) algorithm is of the add type, a computing unit used to perform the logarithm (log) algorithm is of the log type, and so on.
In one possible design, each of the plurality of computing units corresponds to one low-level operation instruction. For example, the add computing unit corresponds to the add instruction, and the log computing unit corresponds to the log instruction. It can be understood that, in actual applications, different computing units may correspond to the same low-level operation instruction; the scale of the low-level operations varies with the capability of the processor and is designed according to the processor's chip specifications, and the same low-level operation instruction may be executed in parallel by multiple computing units at the bottom layer.
In one possible design, the computing unit pool includes one or more of a tensor computing unit pool, a vector computing unit pool, and a scalar computing unit pool.
Pooling tensor computing units, vector computing units and scalar computing units separately can improve the efficiency of resource management and the efficiency with which upper-layer applications schedule resources.
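As a non-authoritative illustration of the pooling idea described above, a computing unit pool grouped by type, with on-demand acquisition and release of idle units, might be sketched as follows (the class name, unit-id scheme and interface are assumptions made for this sketch, not taken from the present application):

```python
from collections import defaultdict

class ComputeUnitPool:
    """Units are grouped by type (e.g. "tensor", "vector", "scalar");
    any idle unit of the requested type can be handed to an upper-layer
    application, and returns to the idle set when released."""

    def __init__(self, counts):
        # counts: mapping such as {"tensor": 8, "vector": 16, "scalar": 4}
        self.idle = defaultdict(list)
        for unit_type, n in counts.items():
            self.idle[unit_type] = [f"{unit_type}-{i}" for i in range(n)]
        self.busy = defaultdict(list)

    def acquire(self, unit_type):
        """Return an idle unit of the given type, or None if all are busy."""
        if not self.idle[unit_type]:
            return None
        unit = self.idle[unit_type].pop()
        self.busy[unit_type].append(unit)
        return unit

    def release(self, unit):
        """Mark a previously acquired unit as idle again."""
        unit_type = unit.rsplit("-", 1)[0]
        self.busy[unit_type].remove(unit)
        self.idle[unit_type].append(unit)

pool = ComputeUnitPool({"tensor": 2, "vector": 4, "scalar": 1})
u = pool.acquire("tensor")  # a tensor unit is now busy
pool.release(u)             # and idle again after use
```

Keeping one sub-pool per type means an allocation request never has to scan unrelated units, which is one way the per-type pooling can improve scheduling efficiency.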
In one possible design, the processor is configured to: receive a request to execute a first application; determine the computing resource requirements of the first application; and allocate computing units to the first application from the computing unit pool according to the computing resource requirements so as to implement the first application, where the computing resource requirements include one or more of tensor computing units, vector computing units and scalar computing units, as well as the quantity of each kind of computing unit.
In this way, resources can be dynamically scheduled for upper-layer applications according to their requests, which improves the flexibility of resource allocation and improves resource utilization.
In one possible design, the processor is further configured to: determine a plurality of processes corresponding to the first application; determine the computing resource requirements of each of the plurality of processes; and allocate computing units to each process from the computing unit pool according to the computing resource requirements of that process so as to implement the first application, where the computing units allocated to each process are different from the computing units allocated to the other processes.
In this way, computing units can be allocated to each process from the computing unit pool according to the computing resource requirements of each process of the first application, which improves resource utilization.
In one possible design, the processor is further configured to: determine whether the computing units allocated to the first application meet the needs of the first application; and when the computing units allocated to the first application cannot meet the needs of the first application, adjust the computing resource allocation result of the first application, and allocate computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
In one possible design, the processor is further configured to establish a correspondence between the first application and the computing units allocated to the first application.
In this way, it can be ensured that the resources allocated to the first application meet its needs, which improves the reliability of resource allocation.
According to a second aspect, a resource allocation method is provided, applied to a processor. The processor includes a computing unit pool, and the computing unit pool includes a plurality of computing units; each idle computing unit among the plurality of computing units can be called on demand, and the number of computing units of each type among the plurality of computing units is positively correlated with the number of times computing units of that type are called. The method includes: receiving a request to execute a first application; determining the computing resource requirements of the first application; and allocating computing units to the first application from the computing unit pool according to the computing resource requirements so as to implement the first application, where the computing resource requirements include one or more of tensor computing units, vector computing units and scalar computing units, as well as the quantity of each kind of computing unit.
In one possible design, determining the computing resource requirements of the first application and allocating computing units to the first application from the computing unit pool according to the computing resource requirements so as to implement the first application includes: determining a plurality of processes corresponding to the first application; determining the computing resource requirements of each of the plurality of processes; and allocating computing units to each process from the computing unit pool according to the computing resource requirements of that process so as to implement the first application, where the computing units allocated to each process are different from the computing units allocated to the other processes.
In one possible design, the method further includes: determining whether the computing units allocated to the first application meet the needs of the first application; and when the computing units allocated to the first application cannot meet the needs of the first application, adjusting the computing resource allocation result of the first application, and allocating computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
In one possible design, the method further includes: establishing a correspondence between the first application and the computing units allocated to the first application.
According to a third aspect, a processing apparatus is provided, which includes modules/units/technical means for executing the method described in the second aspect or any possible design of the second aspect.
Exemplarily, the apparatus may include: a transceiver module, configured to receive a request to execute a first application; and a processing module, configured to determine the computing resource requirements of the first application and allocate computing units to the first application from the computing unit pool of a processor according to the computing resource requirements so as to implement the first application, where the computing resource requirements include one or more of tensor computing units, vector computing units and scalar computing units, as well as the quantity of each kind of computing unit. The computing unit pool includes a plurality of computing units; each idle computing unit among the plurality of computing units can be called on demand, and the number of computing units of each type among the plurality of computing units is positively correlated with the number of times computing units of that type are called.
In one possible design, the processing module may be further configured to: determine a plurality of processes corresponding to the first application; determine the computing resource requirements of each of the plurality of processes; and allocate computing units to each process from the computing unit pool according to the computing resource requirements of that process so as to implement the first application, where the computing units allocated to each process are different from the computing units allocated to the other processes.
In one possible design, the processing module may be further configured to: determine whether the computing units allocated to the first application meet the needs of the first application; and when the computing units allocated to the first application cannot meet the needs of the first application, adjust the computing resource allocation result of the first application, and allocate computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
In one possible design, the processing module may be further configured to establish a correspondence between the first application and the computing units allocated to the first application.
According to a fourth aspect, a computing platform is provided, including the processor described in the first aspect and the processing apparatus described in the third aspect.
According to a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions which, when called by a computer, cause the method described in the second aspect or any possible design of the second aspect to be executed.
Brief Description of Drawings
Figure 1 is a schematic diagram of an SoC;
Figure 2 is a schematic diagram of another SoC;
Figure 3 is a schematic diagram of a processor provided by an embodiment of the present application;
Figure 4 is a schematic diagram in which all computing units in an NPU are pooled;
Figure 5 is a schematic diagram in which some computing units in an NPU are pooled;
Figure 6 is a statistical chart of the usage frequency of operators;
Figure 7 is a flow chart of a resource allocation method provided by an embodiment of the present application;
Figure 8A is a schematic diagram of a possible resource allocation provided by an embodiment of the present application;
Figure 8B is a schematic diagram of another possible resource allocation provided by an embodiment of the present application;
Figure 9 is a schematic diagram of a possible computing unit pool 11 provided by an embodiment of the present application;
Figure 10 is a schematic diagram of a processing apparatus provided by an embodiment of the present application;
Figure 11 is a schematic diagram of a computing platform provided by an embodiment of the present application.
Detailed Description of Embodiments
The embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to Figure 3, which is a schematic diagram of a processor 01 provided by an embodiment of the present application, the processor 01 includes a computing unit pool 11, and the computing unit pool 11 includes a plurality of computing units 111.
It can be understood that the computing units in the computing unit pool 11 are pooled computing units, or in other words shareable computing units that can, for example, be called by different upper-layer applications. In a specific implementation, each computing unit in the computing unit pool 11 is an idle computing unit when it is not called (that is, not running), and can then be called by an upper-layer application. Each idle computing unit among the plurality of computing units 111 can be called on demand, for example according to the needs of upper-layer applications; that is, if any upper-layer application needs a computing unit and an idle computing unit exists in the computing unit pool 11, that idle computing unit can be called by the upper-layer application.
The processor 01 may be a processing chip used to implement parallel computing, for example an NPU or a GPU or another processing chip used to implement parallel computing. The plurality of computing units 111 in the computing unit pool 11 may be computing units in a chip used to implement parallel computing, for example computing units in an NPU or a GPU. In other words, the computing unit pool 11 may be obtained by pooling the computing units in an NPU, a GPU, or another processing chip used to implement parallel computing.
In the above solution, by pooling the computing units in processing chips used to implement massively parallel computing, including NPUs and GPUs, each computing unit can be called on demand by an upper-layer application when it is idle. Compared with the prior-art practice of stacking multiple copies of the same computing resources in designs such as SMs or AI Cores, this can improve resource utilization. For the same amount of resource usage, the static power consumption loss of the computing units can be reduced, and the chip area of the processor 01 can be reduced.
For ease of description, the NPU is mainly used as an example below, but the same implementations can be applied to other chips such as GPUs.
In one possible design, all computing units in the NPU are pooled, that is, all computing units in the NPU are included in the computing unit pool 11. For example, as shown in Figure 4, all computing units 111 in the NPU are in the computing unit pool 11. In other words, each idle computing unit among all the computing units of the NPU can be called on demand.
In this way, all computing units of the NPU are pooled, which can minimize the static power consumption loss of the processor 01 and reduce the chip area of the processor 01.
In another possible design, only some of the computing units in the NPU are pooled, that is, only some of the computing units are included in the computing unit pool 11. For example, as shown in Figure 5, some computing units 111 in the NPU are in the computing unit pool 11, while the other computing units 111 are not in the computing unit pool 11 but are configured in AI Cores.
In this way, some computing units of the NPU are pooled, which can reduce static power consumption loss and reduce the chip area of the processor 01. At the same time, keeping a small number of computing units in the form of AI Cores gives the processor 01 good forward compatibility.
In one possible design, the computing unit pool 11 may include different types of computing units, where the computing unit type corresponds to the algorithm executed by the computing unit. The number of computing units 111 of each type among the plurality of computing units 111 is positively correlated with the number of times computing units 111 of that type are called. In other words, the more often a type of computing unit 111 is called, the more units of that type exist in the computing unit pool 11; or, the more often an algorithm is called, the more computing units 111 corresponding to that algorithm exist in the computing unit pool 11. For example, if add computing units are called frequently, the computing unit pool 11 may contain more add computing units; if log computing units are called rarely, the computing unit pool 11 may contain fewer log computing units.
In a specific implementation, each computing unit 111 corresponds to one low-level operation instruction of the processor 01. In other words, each computing unit 111 among the plurality of computing units 111 corresponds to one low-level operation instruction. It can be understood that, in actual applications, different computing units 111 may correspond to the same low-level operation instruction; the scale of the low-level operations varies with the capability of the processor 01 and is designed according to the chip specifications of the processor 01. For example, for one add instruction, when the amount of data to be processed by the instruction is large, multiple add computing units may perform parallel computation on different parts of the data at the bottom layer in order to improve the processing capability (that is, the degree of parallelism) of the processor 01 at a given moment; therefore, for the same add instruction, the bottom layer may perform multiple additions.
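As a hedged sketch of the parallelism just described (the chunking scheme, chunk count and function name are assumptions made for illustration, not the processor's actual dispatch logic), one logical add instruction over a long vector can be split into chunks, each of which would be handled by a separate add computing unit:

```python
def parallel_add(a, b, units=4):
    """Simulate one logical "add" instruction executed by several add units.

    The data is split into `units` roughly equal chunks; on real hardware
    each chunk would be dispatched to a different idle add computing unit,
    so one instruction triggers multiple low-level additions in parallel.
    (Here the chunks are processed sequentially for illustration.)
    """
    chunk = -(-len(a) // units)  # ceiling division: elements per unit
    out = []
    for i in range(0, len(a), chunk):
        # each slice stands for the work of one add computing unit
        out.extend(x + y for x, y in zip(a[i:i + chunk], b[i:i + chunk]))
    return out

result = parallel_add([1, 2, 3, 4, 5, 6, 7, 8],
                      [10, 20, 30, 40, 50, 60, 70, 80])
```

The number of chunks, and hence the achievable parallelism, is bounded by how many add units the chip specification provides.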
As an example, the proportions of the various computing units used by the various operators in the models (such as neural network models) run by the processor 01 in actual applications can be analyzed statistically, and the numbers of the various computing units in the computing unit pool 11 can be configured according to the resource levels that meet actual requirements. It can be understood that one operator may correspond to one or more computing units 111.
For example, Figure 6 is a statistical chart of the usage frequency of various operators in a model. The functions of the operators in Figure 6 are explained as follows:
Conv: convolution;
Relu: rectified linear unit;
Dequant: dequantization;
Mul: multiplication;
Slice: slicing;
Gather: gathering (collects entries by index);
Shape: returns the shape of a tensor;
BatchNormalization: batch normalization;
Resize: scaling;
Pad: data padding;
ConstantOfShape: generates a tensor with a given value and shape;
Tanh: activation function;
LeakyRelu: leaky rectified linear unit;
GlobalAveragePool: global average pooling;
Floor: rounding down;
ConvTranspose: transposed convolution, also known as deconvolution;
Softmax: normalization;
Exp: exponent;
Flatten: dimensionality reduction;
Equal: determines whether sequences are equal;
Expand: expansion;
MatMul: matrix multiplication;
And: AND operation;
ReduceMean: mean reduction;
ReduceMax: takes the maximum value.
It can be understood that Figure 6 shows only some operator types; other operator types may exist in actual applications, which is not limited by the present application.
As can be seen from Figure 6, operators such as Conv, Relu and AscendDequant are used frequently, so more computing units 111 can be configured for them in the computing unit pool 11, while operators such as Softmax, Exp, Flatten, Equal, Expand, MatMul, And, ReduceMean and ReduceMax are used less frequently, so fewer computing units 111 can be configured for them in the computing unit pool 11.
It can be understood that, in a specific implementation, the actual number of computing units 111 of each type may be slightly greater than the number corresponding to the number of times that type of computing unit 111 is called, so as to ensure that the processor 01 has sufficient performance to cope with bursts.
In this design, by configuring the number of computing units 111 of each type in the computing unit pool 11 to be positively correlated with the number of times that type of computing unit 111 is called, the static power consumption loss of the processor 01 can be reduced while the running performance of the processor 01 is guaranteed.
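The sizing rule described above can be illustrated with a short sketch (the proportional formula, headroom factor and example call counts are assumptions for illustration, not the patent's own formula): each unit type receives a share of the pool proportional to how often it is called, with a small headroom margin and at least one unit per type.

```python
def size_pool(call_counts, total_units, headroom=1.1):
    """Illustrative sizing rule: allocate unit counts proportionally to
    call frequency, multiplied by a headroom factor to absorb bursts,
    with a minimum of one unit per type."""
    total_calls = sum(call_counts.values())
    return {
        unit_type: max(1, round(calls / total_calls * total_units * headroom))
        for unit_type, calls in call_counts.items()
    }

# Frequently used operators (e.g. Conv, Relu) end up with many more units
# than rarely used ones (e.g. Exp, MatMul), mirroring the Figure 6 trend.
sizes = size_pool({"conv": 900, "relu": 700, "exp": 30, "matmul": 20},
                  total_units=100)
```

The headroom factor plays the role of the "slightly greater than" margin mentioned above; its exact value would be a chip-specification decision.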
It should be noted that the above distinguishes the types of computing units 111 by low-level operation instruction (that is, by the type of algorithm, or the type of mathematical operation, executed by the computing unit 111). In actual applications, the computing units 111 can also be typed along other dimensions.
In one possible example, divided along the algorithm dimension, the plurality of computing units 111 may include one or more of the following types: tensor computing units (also called matrix computing units) 111A, vector computing units 111B, scalar computing units 111C, etc. The tensor computing unit 111A is used to perform matrix computation, the vector computing unit 111B is used to perform vector computation, and the scalar computing unit 111C is used to perform scalar computation.
In another possible example, divided by the data type that the algorithm operates on, the plurality of computing units 111 may include one or more of the following types: 8-bit integer (int8) computing units, 16-bit floating point (fp16) computing units, 32-bit floating point (fp32) computing units, 4-bit integer (int4) computing units, etc.
It can be understood that the above ways of dividing computing unit types are only examples rather than limitations; other ways of dividing exist in practice.
In actual applications, the above type divisions can be combined with one another. For example, all computing units 111 may first be divided into tensor computing units 111A, vector computing units 111B and scalar computing units 111C, and the tensor computing units 111A may then be further subdivided by mathematical operation type and/or data type.
In one possible design, upon receiving a request from an upper-layer application, the processor 01 can schedule resources for that application from the computing unit pool 11.
For example, see Figure 7, a flowchart of a resource allocation method provided by an embodiment of the present application. The method may be executed by the processor 01 in Figure 3, or by a processing device other than the processor 01. The method includes:
S701. Receive a request to execute a first application.
S702. Determine the computing resource requirements of the first application, and allocate computing units to the first application from the computing unit pool according to those requirements to implement the first application. The computing resource requirements include one or more of tensor computing units, vector computing units, and scalar computing units, as well as the quantity of each type of computing unit.
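As a rough illustration, steps S701 and S702 can be sketched as a pool keyed by unit type. The class and method names below are illustrative assumptions, not part of the embodiment:

```python
class ComputingUnitPool:
    """Hypothetical pool of typed computing units (tensor/vector/scalar)."""

    def __init__(self, units):
        # units: dict mapping unit type -> list of free unit ids, e.g.
        # {"tensor": ["t0", "t1"], "vector": ["v0"], "scalar": ["s0"]}
        self.idle = {k: list(v) for k, v in units.items()}

    def allocate(self, requirements):
        """S702: grant units per {type: count}, or fail if the pool cannot."""
        granted = {}
        for unit_type, count in requirements.items():
            free = self.idle.get(unit_type, [])
            if len(free) < count:
                raise RuntimeError(f"not enough idle {unit_type} units")
            granted[unit_type] = [free.pop() for _ in range(count)]
        return granted


def execute_first_application(pool, requirements):
    """S701 + S702: receive the request, then allocate from the pool."""
    return pool.allocate(requirements)
```

For instance, a request asking for one tensor unit and two vector units would return a grant drawn from the idle lists, leaving the remaining units free for other applications.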
When the above method is executed by the processor 01, it may specifically be executed by the NPU in the processor 01 or by another processing unit specially configured for resource allocation; this application does not limit this.
Optionally, multiple processes corresponding to the first application may first be determined; then the computing resource requirements of each of these processes are determined, and computing units are allocated to each process from the computing unit pool according to that process's requirements, so as to implement the first application. The computing units allocated to each process differ from those allocated to the other processes; in other words, different processes among the multiple processes are allocated different computing units.
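The per-process variant can be sketched as follows. The dictionary shapes are assumptions made for illustration; popping from shared idle lists is what guarantees that no two processes receive the same unit:

```python
def allocate_per_process(idle, process_requirements):
    """Give each process of an application a disjoint set of units.

    idle: dict mapping unit type -> list of free unit ids (mutated in place).
    process_requirements: dict mapping process id -> {unit type: count}.
    Because each grant removes units from the shared idle lists, no unit
    can be handed to two processes.
    """
    allocations = {}
    for pid, reqs in process_requirements.items():
        grant = {}
        for unit_type, count in reqs.items():
            free = idle.get(unit_type, [])
            if len(free) < count:
                raise RuntimeError(f"not enough idle {unit_type} units for {pid}")
            grant[unit_type] = [free.pop() for _ in range(count)]
        allocations[pid] = grant
    return allocations
```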
As an example, see Figure 8A, a schematic diagram of one possible resource allocation. At the chip design stage, one copy each of resources such as add, sub, mul, and exp is placed in the processor 01, while multiple copies of the matrix multiplication unit are placed. Suppose application 0 and application n are to be executed simultaneously, where application 0 requires matrix multiplication, add, and sub, and application n requires matrix multiplication, mul, and exp. When allocating resources, the resources allocated to application 0 may be as shown in resource group 0 (matrix multiplication, add, sub), and the resources allocated to application n may be as shown in resource group n (matrix multiplication, mul, exp). In this way, different types of computing resources can be allocated to different applications, reducing chip area, cost, and static power consumption without degrading the chip's processing performance.
As an example, see Figure 8B, a schematic diagram of another possible resource allocation. At the chip design stage, two copies each of resources such as add, sub, mul, and exp are placed in the processor 01, while multiple copies of the matrix multiplication unit are placed. Suppose application 0 and application n are to be executed simultaneously, where application 0 has a high demand for add and application n does not. When allocating resources, the resources allocated to application 0 may be as shown in resource group 0 (matrix multiplication, add, sub, mul, exp, with two copies of add), and the resources allocated to application n may be as shown in resource group n (matrix multiplication, sub, mul, exp). In this way, different quantities of computing resources can be allocated to different applications, reducing chip area, cost, and static power consumption without degrading the chip's processing performance.
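The Figure 8B scenario can be made concrete with unit counts. The numbers below are illustrative assumptions consistent with the description (two copies of each elementary unit, several matrix-multiply units), not taken from the figure:

```python
# Hypothetical pool sized at chip-design time (Figure 8B): two copies of
# each elementary unit, several matrix-multiply units.
idle = {"matmul": 4, "add": 2, "sub": 2, "mul": 2, "exp": 2}

def build_resource_group(idle, demand):
    """Carve a resource group out of the idle counts; demand is {unit: count}."""
    group = {}
    for unit, count in demand.items():
        if idle.get(unit, 0) < count:
            raise RuntimeError(f"not enough idle {unit} units")
        idle[unit] -= count
        group[unit] = count
    return group

# Application 0 has a high demand for add, so its group takes both copies;
# application n's group takes no add unit at all.
group0 = build_resource_group(idle, {"matmul": 1, "add": 2, "sub": 1, "mul": 1, "exp": 1})
groupn = build_resource_group(idle, {"matmul": 1, "sub": 1, "mul": 1, "exp": 1})
```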
Through this design, resources can be dynamically scheduled for upper-layer applications according to their requests, improving the flexibility of resource allocation.
In one possible design, after computing units are allocated to the first application from the computing unit pool according to its computing resource requirements, it may further be determined whether the allocated computing units meet the needs of the first application. When they do not, the computing resource allocation result of the first application is adjusted, and computing units are allocated to the first application from the computing unit pool 11 according to the adjusted allocation result.
For example, if the computing unit 111a allocated to the first application cannot, during computation, meet an indicator required by the first application (such as latency), the allocation result is adjusted and more computing units, such as computing units 111b and 111c, are allocated to the first application to meet the required indicator.
In this way, it can be ensured that the resources allocated to the first application meet its needs, improving the reliability of resource allocation.
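This adjust-and-retry design can be read as a feedback loop: allocate, check the indicator, and grow the allocation until it is met. A hedged sketch, where the latency measurement callable is a stand-in for real profiling of the workload:

```python
def allocate_until_satisfied(idle_units, needed, measure_latency, target_latency, step=1):
    """Grow an allocation until the measured latency meets the target.

    idle_units: list of free unit ids (mutated in place).
    needed: initial unit count from the first allocation result.
    measure_latency: callable taking the allocation and returning a latency
    figure; in a real system this would come from running the application.
    """
    if len(idle_units) < needed:
        raise RuntimeError("not enough idle units")
    allocation = [idle_units.pop() for _ in range(needed)]
    # Adjust the allocation result while the required indicator is not met.
    while measure_latency(allocation) > target_latency:
        if not idle_units:
            raise RuntimeError("pool exhausted before meeting the target")
        allocation.extend(idle_units.pop() for _ in range(min(step, len(idle_units))))
    return allocation
```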
In one possible design, a correspondence between the first application and the computing units allocated to it can also be established. In other words, after the computing units are allocated to the first application, they are pinned to it (that is, they change from idle computing units to non-idle computing units) and serve only the first application; while the first application is using them, other applications cannot call this portion of resources.
In this way, the problem of the first application being interrupted because its allocated resources are reassigned to other applications can be avoided, ensuring stable operation of the first application.
Optionally, after the first application has finished using its allocated computing units, this portion of computing units may be released (for example, by deleting the correspondence), so that they can be called by other applications. In this way, resource utilization can be further improved.
Optionally, after the first application has finished using its allocated computing units, this portion of computing units may instead not be released (for example, the correspondence is retained), so that they are not called by other applications but remain reserved for the first application. The next time the first application runs, it can use these computing units directly without allocating resources again. In this way, the performance of the first application is guaranteed with priority.
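The bind/release correspondence described above can be sketched as a small registry. The class name and structure are illustrative assumptions; retaining an entry in `bound` models the "do not release" option, and `release` models deleting the correspondence:

```python
class AllocationRegistry:
    """Tracks the correspondence between applications and their pinned units."""

    def __init__(self, idle):
        self.idle = list(idle)   # free unit ids
        self.bound = {}          # application -> list of pinned unit ids

    def bind(self, app, count):
        """Pin `count` idle units to `app`; they become unavailable to others."""
        if app in self.bound:
            return self.bound[app]       # reuse a retained allocation
        if len(self.idle) < count:
            raise RuntimeError("not enough idle units")
        self.bound[app] = [self.idle.pop() for _ in range(count)]
        return self.bound[app]

    def release(self, app):
        """Delete the correspondence so the units may be called by other apps."""
        self.idle.extend(self.bound.pop(app, []))
```

A second `bind` for the same application returns the retained units directly, matching the "no need to allocate resources again" behavior.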
In one possible design, the computing unit pool 11 includes one or more of a tensor computing unit pool, a vector computing unit pool, and a scalar computing unit pool.
For example, as shown in Figure 9, the computing unit pool 11 includes a tensor computing unit pool 11A, a vector computing unit pool 11B, and a scalar computing unit pool 11C. The tensor computing unit pool 11A includes one or more tensor computing units 111A, the vector computing unit pool 11B includes one or more vector computing units 111B, and the scalar computing unit pool 11C includes one or more scalar computing units 111C.
When allocating computing units to the first application, the corresponding computing unit pool may first be found according to the algorithm dimension involved, and the computing units are then selected from that pool.
Pooling tensor, vector, and scalar computing units separately can improve resource management efficiency as well as the efficiency with which upper-layer applications schedule resources.
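The lookup-then-allocate order can be sketched as follows, with one pool per algorithm dimension as in Figure 9 (the unit ids are illustrative):

```python
# Hypothetical per-dimension pools (Figure 9): algorithm dimension -> free units.
pools = {
    "tensor": ["111A-0", "111A-1"],
    "vector": ["111B-0"],
    "scalar": ["111C-0", "111C-1"],
}

def allocate_by_dimension(pools, dimension, count=1):
    """Find the pool for the requested algorithm dimension, then take units."""
    pool = pools.get(dimension)
    if pool is None or len(pool) < count:
        raise RuntimeError(f"cannot satisfy {count} {dimension} unit(s)")
    return [pool.pop() for _ in range(count)]
```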
It can be understood that the above designs can be implemented individually or in combination with one another.
Based on the same technical concept, an embodiment of the present application further provides a processing device 100, which includes modules/units/technical means for executing the method shown in Figure 7.
For example, referring to Figure 10, the processing device 100 may include:
a transceiver module 1001, configured to receive a request to execute a first application; and
a processing module 1002, configured to determine the computing resource requirements of the first application and allocate computing units to the first application from the computing unit pool of the processor according to those requirements to implement the first application, the computing resource requirements including one or more of tensor computing units, vector computing units, and scalar computing units, as well as the quantity of each type of computing unit. The computing unit pool includes multiple computing units; each idle computing unit among them can be called on demand, and the number of computing units of a given type is positively correlated with the number of times that type of computing unit is called.
It should be understood that, for all relevant details of the steps involved in the above method embodiments, reference may be made to the functional descriptions of the corresponding functional modules, which are not repeated here.
Based on the same technical concept, referring to Figure 11, an embodiment of the present application further provides a computing platform 1100, including the processor 01 and the processing device 100 described above.
Based on the same technical concept, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions that, when called by a computer, cause the method shown in Figure 7 to be performed.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Obviously, those skilled in the art can make various changes and variations to the present application without departing from its scope of protection. If these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.

Claims (17)

  1. A processor, characterized in that the processor comprises a computing unit pool, the computing unit pool comprising a plurality of computing units;
    each idle computing unit among the plurality of computing units can be called on demand, and the number of computing units of a given type among the plurality of computing units is positively correlated with the number of times that type of computing unit is called.
  2. The processor according to claim 1, characterized in that each computing unit among the plurality of computing units corresponds to one low-level operation instruction.
  3. The processor according to claim 1 or 2, characterized in that the computing unit pool comprises one or more of a tensor computing unit pool, a vector computing unit pool, and a scalar computing unit pool.
  4. The processor according to any one of claims 1-3, characterized in that the processor is configured to:
    receive a request to execute a first application; and
    determine the computing resource requirements of the first application, and allocate computing units to the first application from the computing unit pool according to the computing resource requirements to implement the first application, the computing resource requirements comprising one or more of tensor computing units, vector computing units, and scalar computing units, as well as the quantity of each type of computing unit.
  5. The processor according to claim 4, characterized in that the processor is further configured to:
    determine a plurality of processes corresponding to the first application; and
    determine the computing resource requirements of each process among the plurality of processes, and allocate computing units to each process from the computing unit pool according to that process's computing resource requirements to implement the first application, wherein the computing units allocated to each process differ from the computing units allocated to the other processes.
  6. The processor according to claim 4 or 5, characterized in that the processor is further configured to:
    determine whether the computing units allocated to the first application meet the needs of the first application; and
    when the computing units allocated to the first application cannot meet the needs of the first application, adjust the computing resource allocation result of the first application, and allocate computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
  7. The processor according to claim 4 or 5, characterized in that the processor is further configured to:
    establish a correspondence between the first application and the computing units allocated to the first application.
  8. The processor according to any one of claims 1-7, characterized in that the type corresponds to an algorithm executed by the plurality of computing units.
  9. A resource allocation method, characterized in that it is applied to a processor, the processor comprising a computing unit pool, the computing unit pool comprising a plurality of computing units, wherein each idle computing unit among the plurality of computing units can be called on demand, and the number of computing units of a given type among the plurality of computing units is positively correlated with the number of times that type of computing unit is called;
    the method comprising:
    receiving a request to execute a first application; and
    determining the computing resource requirements of the first application, and allocating computing units to the first application from the computing unit pool according to the computing resource requirements to implement the first application, the computing resource requirements comprising one or more of tensor computing units, vector computing units, and scalar computing units, as well as the quantity of each type of computing unit.
  10. The method according to claim 9, characterized in that determining the computing resource requirements of the first application and allocating computing units to the first application from the computing unit pool according to the computing resource requirements to implement the first application comprises:
    determining a plurality of processes corresponding to the first application; and
    determining the computing resource requirements of each process among the plurality of processes, and allocating computing units to each process from the computing unit pool according to that process's computing resource requirements to implement the first application, wherein the computing units allocated to each process differ from the computing units allocated to the other processes.
  11. The method according to claim 9 or 10, characterized in that the method further comprises:
    determining whether the computing units allocated to the first application meet the needs of the first application; and
    when the computing units allocated to the first application cannot meet the needs of the first application, adjusting the computing resource allocation result of the first application, and allocating computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
  12. The method according to claim 9 or 10, characterized in that the method further comprises:
    establishing a correspondence between the first application and the computing units allocated to the first application.
  13. A processing device, characterized in that it comprises:
    a transceiver module, configured to receive a request to execute a first application; and
    a processing module, configured to determine the computing resource requirements of the first application, and allocate computing units to the first application from a computing unit pool of a processor according to the computing resource requirements to implement the first application, the computing resource requirements comprising one or more of tensor computing units, vector computing units, and scalar computing units, as well as the quantity of each type of computing unit;
    wherein the computing unit pool comprises a plurality of computing units, each idle computing unit among the plurality of computing units can be called on demand, and the number of computing units of a given type among the plurality of computing units is positively correlated with the number of times that type of computing unit is called.
  14. The device according to claim 13, characterized in that the processing module is further configured to:
    determine a plurality of processes corresponding to the first application; and
    determine the computing resource requirements of each process among the plurality of processes, and allocate computing units to each process from the computing unit pool according to that process's computing resource requirements to implement the first application, wherein the computing units allocated to each process differ from the computing units allocated to the other processes.
  15. The device according to claim 13 or 14, characterized in that the processing module is further configured to:
    determine whether the computing units allocated to the first application meet the needs of the first application; and
    when the computing units allocated to the first application cannot meet the needs of the first application, adjust the computing resource allocation result of the first application, and allocate computing units to the first application from the computing unit pool according to the adjusted computing resource allocation result.
  16. The device according to claim 13 or 14, characterized in that the processing module is further configured to:
    establish a correspondence between the first application and the computing units allocated to the first application.
  17. A computing platform, characterized by comprising the processor according to any one of claims 1-3 and the device according to any one of claims 13-16.
Publications (1)

Publication Number Publication Date
WO2024055168A1 (en) 2024-03-21

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176852A (en) * 2020-01-15 2020-05-19 上海依图网络科技有限公司 Resource allocation method, device, chip and computer readable storage medium
US20200342292A1 (en) * 2019-04-24 2020-10-29 Baidu Usa Llc Hardware-software co-design for accelerating deep learning inference
CN113791906A (en) * 2021-08-09 2021-12-14 戴西(上海)软件有限公司 Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields
CN114661482A (en) * 2022-05-25 2022-06-24 成都索贝数码科技股份有限公司 GPU computing power management method, medium, equipment and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342292A1 (en) * 2019-04-24 2020-10-29 Baidu Usa Llc Hardware-software co-design for accelerating deep learning inference
CN111176852A (en) * 2020-01-15 2020-05-19 上海依图网络科技有限公司 Resource allocation method, device, chip and computer readable storage medium
CN113791906A (en) * 2021-08-09 2021-12-14 戴西(上海)软件有限公司 Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields
CN114661482A (en) * 2022-05-25 2022-06-24 成都索贝数码科技股份有限公司 GPU computing power management method, medium, equipment and system
