CN113918290A - API calling method and device
- Publication number: CN113918290A (application number CN202010656379.7A)
- Authority: CN (China)
- Prior art keywords: api, processing engine, target, heterogeneous, call
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues (under G06F9/48, Program initiating; program switching, e.g. by interrupt)
- G06F9/544: Buffers; shared memory; pipes (under G06F9/54, Interprogram communication)
Abstract
An API calling method and device are applied to a heterogeneous system. The heterogeneous system may include a plurality of processing engines, among them a first processing engine and a second processing engine. The method may be executed by a CPU in the heterogeneous system, where the CPU may be the first processing engine or the second processing engine, or any processing engine in the heterogeneous system other than those two. The CPU determines a target API to be called and then, based on API calling information, selects the first processing engine to call the target API, where the API calling information indicates the efficiency with which the first processing engine and the second processing engine each call the target API. The CPU in the heterogeneous system can thus select a suitable processing engine according to the efficiency with which different processing engines call the target API, realizing efficient API calling.
Description
Technical Field
The present application relates to the field of communications technologies, and in particular, to an API calling method and apparatus.
Background
With the deep popularization of computers and intelligent devices in different application fields, many processing engines besides the central processing unit (CPU) have emerged to meet the data processing requirements of those fields, such as the graphics processing unit (GPU), image processor (IP), digital signal processor (DSP), neural network processing unit (NPU), and field programmable gate array (FPGA). For different data processing scenarios, different types of processing engines provide better data processing capability in their corresponding scenarios.
In order to improve the overall data processing capability of a computing system, the computing system may include processing engines such as a GPU, an IP, a DSP, and an NPU in addition to the CPU. Such computing systems with two or more types of processors may also be referred to as heterogeneous systems.
In the heterogeneous system, the CPU can be used as a dispatcher to perform data interaction with other processing engines and assist the other processing engines in data processing.
At present, because different processing engines in a heterogeneous system use different types of program instructions, a compiler needs to send the program instructions to be executed by each processing engine (including the CPU), such as application programming interface (API) functions, to the CPU in advance. The CPU then sends the program instructions to be executed by the other processing engines to the corresponding engines.
Before sending the program instructions to the CPU, the compiler needs to be configured in advance, during the development of the program instructions, as to which program instructions are executed by which processing engine.

Such a manner of configuring the processing engine that executes the program instructions depends on the developer, and it cannot be guaranteed that the configured processing engine is a suitable one, i.e., one that can execute the program instructions more efficiently than the other processing engines.
Disclosure of Invention
The application provides an API calling method and device, which are used for determining a processing engine capable of efficiently calling an API.
In a first aspect, an embodiment of the present application provides an application programming interface (API) calling method. The method is applied to a heterogeneous system that may include multiple processing engines, among them a first processing engine and a second processing engine. The method may be executed by a CPU in the heterogeneous system, where the CPU may be the first processing engine or the second processing engine, or a processing engine in the heterogeneous system other than those two. The CPU first determines a target API that needs to be called; then, based on API call information, it selects the first processing engine to call the target API, where the API call information indicates the efficiency with which the first processing engine and the second processing engine each call the target API. The target API may be a heterogeneous API or another type of API.

Through this method, the CPU in the heterogeneous system can select a suitable processing engine to call the target API according to the efficiency with which different processing engines call it, thereby realizing efficient API calling.
In a possible implementation, when selecting the first processing engine to call the target API based on the API call information, the CPU may first determine the target parameter scale of the target API. For example, the CPU may first obtain a signature of the target API, where the signature indicates the target parameter scale. The CPU then selects the first processing engine according to the API call information and the target parameter scale, where the API call information indicates the efficiency with which the first processing engine and the second processing engine call the target API at candidate parameter scales, and the candidate parameter scales include the target parameter scale.

Through this method, the CPU in the heterogeneous system can select a suitable processing engine to call the target API at the target parameter scale according to the efficiency with which different processing engines call the target API at different candidate parameter scales.
In a possible implementation, the embodiment of the present application does not limit the manner in which the CPU obtains the signature of the target API. For example, the CPU may first obtain the identifier of the target API and then determine the signature of the target API from a preconfigured API signature set according to that identifier, where the signature of the target API includes the identifier of the target API.
By the method, the CPU can conveniently determine the signature of the target API through the pre-configured API signature set.
In a possible implementation manner, after selecting the first processing engine to call the target API, the CPU may obtain, from a preconfigured API function library of the first processing engine, a program instruction required by the first processing engine to call the target API, where the API function library of the first processing engine includes program instructions required by the first processing engine to call one or more APIs, respectively, and the one or more APIs include the target API; the program instructions required by the first processing engine to call the target API are then sent to the first processing engine.
By this method, since the preconfigured API function library of the first processing engine includes the program instructions the first processing engine needs to call one or more APIs, the CPU can obtain the program instructions for calling the target API more quickly through that library, which improves the efficiency with which the first processing engine calls the target API.
In a possible implementation manner, after selecting the first processing engine to call the target API, the CPU may also obtain an intermediate representation of the target API stored in advance; compiling the intermediate representation of the target API into program instructions required by the first processing engine to call the target API; and then sending program instructions required by the first processing engine to call the target API to the first processing engine.
By this method, the CPU can quickly compile, from the intermediate representation of the target API, the program instructions the first processing engine needs to call it, improving call efficiency. In addition, the method suits a variety of heterogeneous systems: a heterogeneous system only needs to include a processing engine capable of compiling the intermediate representation of the target API, which effectively broadens the applicable range.
In one possible implementation, the API call information may further indicate a storage address of the intermediate representation of the target API, and the CPU may determine the storage address of the intermediate representation of the target API from the API call information when acquiring the pre-stored intermediate representation of the target API, and then acquire the intermediate representation of the target API according to the storage address of the intermediate representation of the target API.
By this method, the CPU can conveniently obtain the intermediate representation of the target API through the API call information, which shortens the time needed to generate the program instructions the first processing engine requires and improves the efficiency with which the first processing engine calls the target API.
In a possible implementation, the API call information is stored as a table; the tabular form is more intuitive and makes it easier for the CPU to obtain the relevant information from the API call information.
In a possible implementation, the API call information may further indicate the cache addresses of the program instructions the first processing engine and the second processing engine each need to call the target API. After selecting the first processing engine to call the target API, the CPU may obtain the program instructions the first processing engine needs according to the corresponding cache address in the API call information, and then send those program instructions to the first processing engine.

By this method, the CPU can conveniently obtain, through the API call information, the cache address of the program instructions the first processing engine needs to call the target API, and can fetch those program instructions quickly, enabling the first processing engine to call the target API efficiently.
In one possible implementation, the cache addresses of the program instructions the first processing engine needs to call the target API include the cache addresses for the target API at candidate parameter scales, and the candidate parameter scales include the target parameter scale.

When obtaining the program instructions according to the cache addresses in the API call information, the CPU may, according to the target parameter scale of the target API, obtain from the API call information the cache address of the program instructions the first processing engine needs to call the target API at the target parameter scale, and then fetch those program instructions from that cache address.

By this method, the API call information can indicate the cache addresses of the program instructions the first processing engine needs to call the target API at each candidate parameter scale, which helps the CPU select the program instructions matching the target parameter scale.
In a second aspect, an API calling apparatus is further provided in the embodiments of the present application, and for beneficial effects, reference may be made to the description of the first aspect and details are not described here again. The apparatus has functionality to implement the actions in the method instance of the first aspect described above. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions. In a possible design, the structure of the device includes a determining unit, a selecting unit, and optionally, an instruction determining unit and a sending unit, where these units may perform corresponding functions in the method example of the first aspect, for which specific reference is made to detailed description in the method example, and details are not repeated here.
In a third aspect, an embodiment of the present application further provides a computing apparatus; for beneficial effects, reference may be made to the description of the first aspect, which is not repeated here. The computing apparatus includes a processor and a memory, where the processor is configured to support the apparatus in performing the corresponding functions of the method of the first aspect. The memory is coupled to the processor and holds the program instructions and data necessary for the computing apparatus. The computing apparatus also includes a communication interface for communicating with other devices.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a fifth aspect, the present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a sixth aspect, the present application further provides a computer chip, where the chip is connected to a memory, and the chip is used to read and execute a software program stored in the memory, and execute the method of the first aspect.
Drawings
FIG. 1A is a block diagram of a system according to the present application;
FIG. 1B is a block diagram of a system according to the present application;
FIG. 1C is a block diagram of a system according to the present application;
FIG. 1D is a block diagram of a system according to the present application;
FIG. 2 is a schematic diagram of a heterogeneous API call method provided herein;
FIG. 3 is a schematic structural diagram of a heterogeneous system provided herein;
FIG. 4 is a schematic structural diagram of a heterogeneous system provided herein;
FIG. 5 is a schematic structural diagram of a heterogeneous API calling apparatus provided in the present application;
FIG. 6 is a schematic structural diagram of an apparatus provided in the present application.
Detailed Description
Fig. 1A is a schematic structural diagram of a system suitable for the embodiment of the present application; the system includes a compiler 100 and a heterogeneous system 200. The heterogeneous system 200 includes a plurality of processing engines 210. In the embodiment of the present application, a processing engine 210 is a unit capable of performing data processing operations; the embodiment does not limit the specific type and form of the processing engines 210, and any unit capable of performing data processing operations may serve as a processing engine 210.
The plurality of processing engines 210 may include at least one CPU, and the remaining processing engines 210 may be processing engines 210 of a different type than the CPU, for example, the remaining processing engines 210 may include some or all of the following:
CPU, GPU, IP, DSP, NPU, or FPGA.
One processing engine 210 of the plurality of processing engines 210 can act as a dispatcher that interacts with the remaining processing engines 210 and assists them in data processing. The embodiment of the present application takes a CPU as the dispatcher by way of example; for convenience of description, the CPU acting as the dispatcher is referred to as the dispatching CPU or the host CPU.
The embodiment of the present application does not limit the deployment manner of the heterogeneous system 200, for example, the heterogeneous system 200 may be deployed on one computing node in a centralized deployment manner, or may be deployed on a plurality of computing nodes in a distributed deployment manner.
Compiler 100 is capable of compiling a source program that includes a heterogeneous API into program instructions that may run in heterogeneous system 200 (e.g., host CPU) or into an Intermediate Representation (IR) of the heterogeneous API. Compiler 100 may be deployed independently of heterogeneous system 200 and at a different compute node, or may be deployed co-located with heterogeneous system 200 and at the same compute node.
Specifically, after obtaining the prototype declaration files of one or more heterogeneous APIs, the compiler 100 may parse those prototype declaration files and generate a heterogeneous API header file, where the header file includes the signature of each heterogeneous API. For ease of description, the set of signatures of one or more heterogeneous APIs is referred to as a heterogeneous API signature set.
The prototype declaration file of each heterogeneous API may indicate the parameter scale of the heterogeneous API, where the parameter scale describes the size (e.g., the type and number) of the parameters required to invoke the heterogeneous API.
The prototype declaration file of each heterogeneous API may also indicate relevant information of the heterogeneous API, such as some information other than the functional body of the heterogeneous API, such as the number of parameters, types of parameters, and parameter names required to call the heterogeneous API.
It should be noted that, in the embodiment of the present application, the size of the parameter does not refer to the numerical size of the parameter, but refers to the memory occupied by the parameter or the number of bytes occupied in the processing engine 210.
The embodiment of the present application does not limit the manner in which the prototype declaration file of a heterogeneous API indicates its parameter scale. For example, identifiers corresponding to different parameter scales may be preset, different heterogeneous APIs having different parameter scales; identifier A may correspond to 2 to 4 parameters with a size of 2 to 4 bits, and identifier B to 5 to 7 parameters with a size of 5 to 10 bits. The compiler 100 determines the parameter scale of a heterogeneous API by recognizing, in its prototype declaration file, the identifier corresponding to the parameter scale.

As another example, an expression for calculating the parameter scale may be predefined, with the parameter information required to call the heterogeneous API substituted into the expression to obtain the parameter scale expression of that API. The compiler 100 determines the parameter scale of a heterogeneous API by recognizing the parameter scale expression in its prototype declaration file.
For example, the following statements may be added to the prototype declaration file for the heterogeneous API:
#pragma HAPI_PARA_SIZE(hapi_para_size_expr)
Here, #pragma HAPI_PARA_SIZE declares the parameter scale of the heterogeneous API, and hapi_para_size_expr is the expression of the parameter scale.

For example, the expression of the parameter scale may be max(A.size(), B.size(), C.size()), which takes the largest of the sizes of the three parameters A, B, and C as the parameter scale.

The embodiment of the present application does not limit the specific content of the parameter scale expression; in general, an operand appearing in the expression may be a positive-integer constant, a positive-integer parameter from the corresponding heterogeneous API prototype declaration file, a positive-integer member variable of a parameter, or a member function whose return value is a positive integer.
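As an illustration, a prototype declaration file might carry the pragma as follows. This is a minimal sketch: the API name hapi_matrix_add, the Tensor type, and the parameter names are assumptions for illustration, not examples taken from this application.

// Hypothetical prototype declaration file for a heterogeneous API.
// Tensor and hapi_matrix_add are illustrative assumptions; the pragma
// and expression syntax follow the example above.
#pragma HAPI_PARA_SIZE(max(A.size(), B.size(), C.size()))
Tensor hapi_matrix_add(const Tensor& A, const Tensor& B, const Tensor& C);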
The compiler 100 is capable of generating a signature of the heterogeneous API, where in this embodiment, the signature of the heterogeneous API includes an identifier of the heterogeneous API (for uniquely identifying the heterogeneous API), and may further include related contents of a prototype declaration of the heterogeneous API, such as the number of parameters required to call the heterogeneous API, the type of each parameter, and the size of the parameter, the type of a return value of the heterogeneous API (i.e., a result value generated after calling the heterogeneous API), and the number of bytes occupied by the return value of the heterogeneous API in the host CPU. The signature of the heterogeneous API may also indicate a parameter size of the heterogeneous API.
The method for indicating the parameter size of the heterogeneous API in the signature of the heterogeneous API is similar to the method for indicating the parameter size of the heterogeneous API in the prototype declaration file of the heterogeneous API, and specific reference may be made to the foregoing contents, and no further description is given here. The signature of the heterogeneous API may indicate the parameter size of the heterogeneous API in the same manner as in the prototype declaration file of the heterogeneous API, or may also indicate the parameter size of the heterogeneous API in a different manner, which is not limited in the embodiment of the present application.
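As a minimal C++ sketch of the fields enumerated above (the struct and field names are assumptions; the application does not prescribe a concrete layout), a heterogeneous API signature could look like this:

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sketch of a heterogeneous API signature; all names are assumptions.
struct HapiSignature {
    std::uint32_t api_id;                 // uniquely identifies the heterogeneous API
    std::vector<std::string> param_types; // type of each parameter
    std::vector<std::size_t> param_bytes; // bytes each parameter occupies
    std::string return_type;              // type of the return value
    std::size_t return_bytes;             // bytes the return value occupies in the host CPU
    std::string para_size_expr;           // expression indicating the parameter scale
};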
The heterogeneous API signature set may be pre-deployed in the heterogeneous system 200, such as pre-loaded in the host CPU. In addition to the heterogeneous API signature set, in the embodiment of the present application, the heterogeneous API function libraries of the processing engines 210 in the heterogeneous system 200 may be pre-deployed in the heterogeneous system 200, for example, pre-loaded in the host CPU.
The heterogeneous API function library of any processing engine 210 includes the program instructions that processing engine 210 needs to call heterogeneous APIs; the types of program instructions used by different processing engines 210 may vary. The program instructions in the function library are obtained by converting a source program including a heterogeneous API into a form the processing engine 210 can directly call, and the heterogeneous API function libraries of different types of processing engines 210 may differ.
The source program including the heterogeneous API is the original code of the heterogeneous API, i.e., the most original program instructions written for it, and generally cannot be directly called by a processing engine.
If the heterogeneous system 200 does not pre-deploy the heterogeneous API function libraries of the processing engines 210, the heterogeneous system 200 may have a dynamic compiling function, for example, a host CPU has a dynamic compiling function. The host CPU may compile the intermediate representation of one or more heterogeneous APIs obtained from compiler 100 into the program instructions needed by the different processing engines 210 to call the heterogeneous APIs.
Wherein the intermediate representation of the heterogeneous API is compiled by the compiler 100 based on a heterogeneous API source program, and is capable of being recognized by different heterogeneous systems 200 (e.g., host CPUs in the heterogeneous systems 200).
In the embodiment of the present application, the code generated by the compiler 100 triggers the host CPU in the heterogeneous system 200 to handle the heterogeneous API call: for example, a statement calling the heterogeneous API in the generated code is changed into a call into the host CPU, and the host CPU then determines the processing engine in the heterogeneous system that executes the API. Such a heterogeneous API may be referred to as a target heterogeneous API.
After determining that the target heterogeneous API needs to be called, the host CPU in the heterogeneous system 200 may select a processing engine 210 (the processing engine 210 may also be referred to as a target processing engine) corresponding to the target heterogeneous API from the processing engines 210 in the heterogeneous system 200 to call the target heterogeneous API.
In the embodiment of the present application, the host CPU in the heterogeneous system 200 decides, without human involvement, the target processing engine that calls the target heterogeneous API, which can improve the efficiency of heterogeneous API calls in the heterogeneous system 200.
Two possible deployment scenarios of the system in a practical scenario are listed below.
Fig. 1B is a schematic structural diagram of another system to which the embodiment of the present application is applicable. In this system, the compiler 100 may run on a development machine, and the heterogeneous system 200 is deployed on a computing node; the heterogeneous system 200 includes a plurality of processing engines 210, such as a CPU, a GPU, and a DSP, where the CPU is the host CPU.

The compiler 100 can compile a source program including a heterogeneous API to generate program instructions that can be executed in the host CPU, identified herein as "application executable code", and deploy the application executable code in the heterogeneous system 200. The host CPU in the heterogeneous system can then determine, according to the application executable code, the processing engine (i.e., the target processing engine) that calls the heterogeneous API.
It should be noted that the host CPU may include a module for determining an operation of a target processing engine that calls the heterogeneous API, for example, the module may be a Heterogeneous Runtime (HRT). The heterogeneous runtime may be a module in the host CPU that is responsible for executing a heterogeneous program (the heterogeneous program is a program executed by one or more processing engines in the multiple processing engines in the heterogeneous system, such as a heterogeneous API, etc.), and implementing heterogeneous program scheduling (e.g., determining a processing engine that executes the heterogeneous program, sending the heterogeneous program to a corresponding processing engine to trigger the execution of the heterogeneous program, etc.). The name of the module is not limited herein, and "heterogeneous runtime" is merely an example.
Fig. 1C is a schematic structural diagram of another system to which the embodiment of the present application is applicable. In this system, the compiler 100 may run on a development machine, and the heterogeneous system 200 is deployed on a computing node; the heterogeneous system 200 includes a plurality of processing engines 210, such as a CPU, a GPU, and a DSP, where the CPU is the host CPU.

The compiler 100 may compile a source program including a heterogeneous API to generate an intermediate representation of the heterogeneous API and deploy (or store) it in the heterogeneous system 200 (e.g., in the host CPU). When determining that the heterogeneous API needs to be called, the host CPU in the heterogeneous system 200 may first determine the processing engine (i.e., the target processing engine) that calls the heterogeneous API, and then compile the intermediate representation into the program instructions the target processing engine needs to call the heterogeneous API.

In contrast to the host CPU in the system shown in Fig. 1B, the host CPU (or the heterogeneous runtime in the host CPU) in the system shown in Fig. 1C can not only execute heterogeneous programs and implement heterogeneous program scheduling, but also has a dynamic compiling function, compiling the intermediate representation of a heterogeneous API in real time into the program instructions a given processing engine needs to call it.
Fig. 1D is a schematic structural diagram of another system applicable to the embodiment of the present application; the system includes a compiler 100 and a heterogeneous cluster 20. The heterogeneous cluster 20 includes a plurality of heterogeneous systems 200; for the structure of a heterogeneous system 200, reference may be made to the heterogeneous system 200 shown in Fig. 1A, which is not described here again.

In the heterogeneous cluster 20, one heterogeneous system 200, or one processing engine 210 in a heterogeneous system 200, can serve as a scheduler that performs data interaction with the processing engines 210 of the remaining heterogeneous systems 200 (and with the remaining processing engines 210 in its own heterogeneous system 200) and assists them in data processing.
The functions of the compiler 100 can be referred to the above description, and are not described herein. The operations executed by the heterogeneous system 200 or the processing engine 210 in the heterogeneous system 200 as the scheduler may refer to the operations executed by the host CPU shown in fig. 1B, which may specifically refer to the foregoing contents and are not described herein again.
Taking the systems shown in Figs. 1A to 1C as an example, the heterogeneous API calling method provided in this embodiment is described with reference to Fig. 2. When a processing engine 210 other than the CPU acts as the dispatcher in the heterogeneous system 200, the method of this embodiment still applies, with only the executing entity differing. The method includes the following steps:
step 201: the host CPU determines the target heterogeneous API that needs to be called.
In this embodiment, the manner in which the host CPU determines the target heterogeneous API that needs to be called is not limited. For example, the compiler 100 may transfer the right to call the target heterogeneous API to the host CPU. Specifically, the compiler 100 may change the call statement of the target heterogeneous API so that its caller becomes the host CPU (the call statement may be, for example, invokeHRT(…)); the compiler 100 may also direct the call to one module in the host CPU, for example the heterogeneous runtime or another module.
Optionally, the host CPU may further determine information of a target heterogeneous API, which may be notified to the host CPU by the compiler 100. The information of the target heterogeneous API may include an identification of the target heterogeneous API, and may further include a cache address of a parameter required to call the target heterogeneous API and a storage address of a return value of the heterogeneous API.
If the heterogeneous API has no return value, the storage address of the return value of the heterogeneous API can be represented by a null address.
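A minimal C++ sketch of this target API information, with field names assumed for illustration (the application does not define a concrete layout); a null return address models an API without a return value, as described above:

#include <cstdint>

// Sketch of the target heterogeneous API information held by the host CPU.
struct TargetHapiInfo {
    std::uint32_t api_id;      // identification of the target heterogeneous API
    void*         param_cache; // cache address of the parameters needed for the call
    void*         return_addr; // storage address of the return value; nullptr if none
};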
Step 202: the host CPU selects a target processing engine from the plurality of processing engines 210 to call a target heterogeneous API based on the heterogeneous API call information.
The heterogeneous API call information can indicate the efficiency of the processing engines 210 in the heterogeneous system to respectively call the target heterogeneous API, and there are various ways to characterize the efficiency of calling the target heterogeneous API.
For example, the heterogeneous API call information may characterize the efficiency of the processing engine 210 for the target heterogeneous API by a call time (which may be a relative time or an absolute time).
For another example, the heterogeneous API call information may also characterize the efficiency of the processing engine 210 for the target heterogeneous API through the sequence numbers of the plurality of processing engines 210. The sequence number of the processing engine 210 may be determined according to the time when the plurality of processing engines 210 call the target heterogeneous API, may be determined according to the preset call sequence of the plurality of processing engines 210, or may be determined according to the performance of the plurality of processing engines 210, and the setting manner of the sequence number of the processing engine 210 is not limited herein.
As another example, the heterogeneous API call information may also characterize the efficiency of the processing engine 210 for the target heterogeneous API by call speed.
All the ways of representing the efficiency of the plurality of processing engines 210 for respectively calling the target heterogeneous API are applicable to the embodiments of the present application, and the ways of representing the efficiency of the processing engines for calling the target heterogeneous API are not limited in the present application.
It should be noted that, in the embodiment of the present application, the efficiency that the heterogeneous API call information can indicate all processing engines 210 in the heterogeneous system to call the target heterogeneous API is taken as an example for description. In some application scenarios, the heterogeneous API call information may also indicate the efficiency of only a portion of the processing engines 210 in the plurality of processing engines 210 in the heterogeneous system calling the target heterogeneous API.
The embodiment of the present application is described by taking as an example heterogeneous API call information that characterizes the efficiency of a processing engine 210 calling the target heterogeneous API by call time. Other ways of characterizing efficiency can refer to this manner; the only difference is that where this embodiment uses a time value, other efficiency-characterizing parameters can be substituted for it.
The heterogeneous API call information includes a time required for each processing engine 210 of the plurality of processing engines 210 to respectively call one or more heterogeneous APIs. The time required for the processing engine 210 to call a heterogeneous API may also be understood as the time for the processing engine 210 to execute the heterogeneous API. The one or more heterogeneous APIs include a target heterogeneous API.
Based on the heterogeneous API call information, the host CPU can select as the target processing engine the processing engine 210 that requires the shortest time to call the target heterogeneous API and is currently available. The host CPU may also determine the target processing engine from the one or more processing engines 210 whose time required to call the target heterogeneous API is below a time threshold and which are currently available. A currently available processing engine 210 is one that is currently in an idle state and capable of executing the target heterogeneous API.
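A minimal C++ sketch of this selection step, assuming the call information has already been reduced to one expected call time and an availability flag per processing engine (the struct and field names are assumptions):

#include <cstdint>
#include <limits>
#include <vector>

struct EngineEntry {
    int           engine_id;
    std::uint64_t call_time_us; // efficiency expressed as a call time
    bool          idle;         // currently available for the target heterogeneous API
};

// Returns the id of the idle engine with the shortest call time, or -1
// if no engine is currently available.
int select_target_engine(const std::vector<EngineEntry>& entries) {
    int best = -1;
    std::uint64_t best_time = std::numeric_limits<std::uint64_t>::max();
    for (const EngineEntry& e : entries) {
        if (e.idle && e.call_time_us < best_time) { // shortest time among idle engines
            best_time = e.call_time_us;
            best = e.engine_id;
        }
    }
    return best;
}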
The heterogeneous API call information may be pre-configured in the heterogeneous system 200, and the time required for each processing engine 210 included in the heterogeneous API call information to call one or more heterogeneous APIs may be determined according to an empirical value, or may be determined by means of test statistics, or the like. In the embodiment of the present application, specific information included in the heterogeneous API call information is not limited, and information capable of indicating a time required for each processing engine 210 of the plurality of processing engines 210 to call one or more heterogeneous APIs may be used as the heterogeneous API call information.
Depending on the specific information included in the heterogeneous API call information, the host CPU selects the target processing engine in different manners; the following manners are listed:

(1) The heterogeneous API call information includes an identifier of the heterogeneous API and the time required for each processing engine 210 of the plurality of processing engines 210 to call each of one or more heterogeneous APIs.
The information included in the heterogeneous API call information may be found in table 1:
TABLE 1

API identifier | Parameter scale | Processing engine 1 | Processing engine 2 | … | Processing engine N
Heterogeneous API 1 | S1 | T11 | T21 | … | TN1
Heterogeneous API 2 | S2 | T12 | T22 | … | TN2
… | … | … | … | … | …
Heterogeneous API M | SM | T1M | T2M | … | TNM
As can be seen from Table 1, the heterogeneous API call information includes the time each processing engine 210 requires to call each of the M heterogeneous APIs. Taking processing engine 1 as an example, the information includes the times processing engine 1 requires to call the M heterogeneous APIs, which are T11, T12, …, T1M, respectively.
The heterogeneous APIs may be of different types and the processing engine 210 may also require different time to call the heterogeneous APIs, for example, for a heterogeneous API for implementing convolution operations, the time for the GPU to call the heterogeneous API is shorter and the time for the CPU to call the heterogeneous API is longer. For the heterogeneous API for realizing the logic operation, the GPU takes longer to call the heterogeneous API, and the CPU takes shorter to call the heterogeneous API.
As can be seen from table 1, the heterogeneous API call information may also include a parameter size of each heterogeneous API. In this way, the parameter scale of each heterogeneous API may be only for reference and not used as a basis for selecting the target processing engine, for example, the parameter scales of the heterogeneous APIs in table 1 may all be set to the same value, that is, the time difference that the processing engine 210 calls the same heterogeneous API under different parameter scales is not considered.
Based on the heterogeneous API call information shown in table 1, the host CPU may determine, according to the identifier of the target heterogeneous API, time information corresponding to the target heterogeneous API in the heterogeneous API call information, for example, may locate a row in table 1 where the identifier of the target heterogeneous API is located.
After determining the time information corresponding to the target heterogeneous API in the heterogeneous API call information, the host CPU selects a target processing engine according to the time information, and may select, for example, the processing engine 210 with the shortest time required to call the target heterogeneous API as the target processing engine. For example, if the target heterogeneous API is used to implement convolution operations, the host CPU may select, as the target processing engine, a GPU that requires less time to call the target heterogeneous API. If the target heterogeneous API is used for realizing logic operation, the host CPU can select the CPU with shorter time required for calling the target heterogeneous API as the target processing engine.
The host CPU may also select the target processing engine based on other selection policies (e.g., a load balancing policy that keeps the number of heterogeneous API call operations performed by the processing engines 210, or the time spent on them, consistent).
(2) The heterogeneous API call information includes an identifier of the heterogeneous API and the time required for each processing engine 210 of the plurality of processing engines 210 to call each of one or more heterogeneous APIs, where the time required to call any one heterogeneous API includes the time required to call it at one or more parameter scales (which may also be understood as candidate parameter scales). The number of parameter scales is not limited and may be one or more, and the candidate parameter scales include the target parameter scale of the target heterogeneous API.
The information included in the heterogeneous API call information may be found in table 2:
TABLE 2

API identifier | Parameter scale | Processing engine 1 | … | Processing engine N
Heterogeneous API 1 | Scale 1 | T111 | … | TN11
Heterogeneous API 1 | … | … | … | …
Heterogeneous API 1 | Scale S | T11S | … | TN1S
… | … | … | … | …
Heterogeneous API M | Scale S | T1MS | … | TNMS
As can be seen from Table 2, the heterogeneous API call information includes the time each processing engine 210 requires to call each of the M heterogeneous APIs. Taking processing engine 1 as an example, the information includes the times processing engine 1 requires to call the M heterogeneous APIs, where the time for any one heterogeneous API includes the time to call it at each of one or more parameter scales; the required times are T111, T112, …, T1MS, respectively.

As can be seen from Table 2, the heterogeneous API call information is labeled with one or more parameter scales for each heterogeneous API. In this manner, the parameter scale of each heterogeneous API can serve as a basis for selecting the target processing engine, and the times shown in Table 2 take into account the difference in time for a processing engine 210 to call the same heterogeneous API at different parameter scales.

In Table 2, the parameter scales corresponding to a heterogeneous API may be all, or only some, of the possible values its parameter scale can take when the API is actually called; for example, the parameter scale of a heterogeneous API in Table 2 may take S values, and the time a processing engine 210 requires to call the heterogeneous API at each value is noted in the table. In Table 2, every heterogeneous API has S possible parameter scale values; in fact, the number of possible values may differ from API to API.

Based on the heterogeneous API call information shown in Table 2, the host CPU may obtain the signature of the target heterogeneous API according to its identifier, determine the target parameter scale of the target heterogeneous API from the signature, and determine the time information corresponding to the target heterogeneous API in the call information according to the identifier and the target parameter scale, for example by locating the row of Table 2 containing the identifier of the target heterogeneous API and the target parameter scale. After determining that time information, the host CPU selects the target processing engine according to it; reference may be made to the relevant description in manner (1), and details are not repeated here.
When the host CPU obtains the signature of the target heterogeneous API, the signature of the target heterogeneous API may be determined from a pre-configured heterogeneous API signature set according to the identifier of the target heterogeneous API.
In Table 2, the parameter scale corresponding to a heterogeneous API may also be expressed as value ranges: all possible values of the parameter scale when the heterogeneous API is actually called are classified into S classes, each class corresponding to one value range of the parameter scale. The time a processing engine 210 requires to call the heterogeneous API is then noted for each value range. In Table 2, the possible parameter scale values of every heterogeneous API are divided into S classes; in fact, they may be divided into different numbers of classes according to the specific scenario.

Based on the heterogeneous API call information shown in Table 2, the host CPU may obtain the signature of the target heterogeneous API according to its identifier, determine the target parameter scale of the target heterogeneous API from the signature, and determine the time information corresponding to the target heterogeneous API in the call information according to the identifier and the class to which the target parameter scale belongs, for example by locating the corresponding row of Table 2. After determining that time information, the host CPU selects the target processing engine according to it; reference may be made to the relevant description in manner (1), and details are not repeated here.
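A minimal C++ sketch of this range-based lookup, assuming call times are keyed by the API identifier and the class into which the target parameter scale falls (the key layout and class boundaries are assumptions):

#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using EngineTimes = std::vector<std::uint64_t>; // one call time per processing engine

// (API identifier, parameter scale class) -> call times of the engines.
std::map<std::pair<std::uint32_t, int>, EngineTimes> call_info;

// Classify the target parameter scale into one of S value ranges.
int scale_class(std::size_t param_scale, const std::vector<std::size_t>& upper_bounds) {
    for (int c = 0; c < static_cast<int>(upper_bounds.size()); ++c)
        if (param_scale <= upper_bounds[c]) return c;
    return static_cast<int>(upper_bounds.size()) - 1; // largest class as fallback
}

const EngineTimes& times_for(std::uint32_t api_id, std::size_t param_scale,
                             const std::vector<std::size_t>& upper_bounds) {
    return call_info.at({api_id, scale_class(param_scale, upper_bounds)});
}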
For a way of the host CPU obtaining the signature of the target heterogeneous API, reference may be made to the foregoing description, which is not described herein again.
(3) The heterogeneous API call information includes the time each processing engine 210 of the plurality of processing engines 210 requires to call a heterogeneous API, expressed as a single time value per processing engine.
The information included in the heterogeneous API call information may be found in table 3:
TABLE 3

Processing engine 1 | Processing engine 2 | … | Processing engine N
T1 | T2 | … | TN
As can be seen from table 3, the heterogeneous API call information includes the time required by each processing engine 210 to call a heterogeneous API, and taking processing engine 1 as an example, the time required by processing engine 1 to call any of the heterogeneous APIs is T1. T1 may be a mean or empirical value.
As can be seen from table 3, the heterogeneous API call information may not include the parameter size of each heterogeneous API and the identification of the heterogeneous API. In this way, the parameter size of each heterogeneous API and the identifier of the heterogeneous API are not used as the basis for selecting the target processing engine, that is, the time difference of the processing engine 210 calling the heterogeneous API under different heterogeneous APIs and different parameter sizes is not considered in each time shown in table 3.
Based on the heterogeneous API call information shown in Table 3, the host CPU may select as the target processing engine the processing engine 210 that requires the shortest time to call a heterogeneous API, or may select the target processing engine based on the call information and another selection policy (e.g., a load balancing policy that keeps the number of heterogeneous API call operations performed by the processing engines 210, or the time spent on them, consistent).
It should be noted that the time required for each processing engine 210 to call one or more heterogeneous APIs included in the heterogeneous API call information may be absolute time or relative time, for example, the relative time required for another processing engine 210 to call one or more heterogeneous APIs is determined based on the absolute time required for a certain processing engine 210 to call one heterogeneous API. The time required for each processing engine 210 to call one or more heterogeneous APIs is substantially used for characterizing the efficiency of the processing engine 210 to call the heterogeneous APIs, and all time values capable of characterizing the efficiency of the processing engine 210 to call the heterogeneous APIs can be used as the time required for the processing engine 210 to call the heterogeneous APIs, so as to construct heterogeneous API call information.
The host CPU, upon selecting the target processing engine, may perform step 203.
Optionally, when the host CPU selects the target processing engine, it may mark the state of the target processing engine as unavailable so that the engine is not subsequently selected to call other heterogeneous APIs; the host CPU marks the state as available again once the target processing engine finishes calling the target heterogeneous API.
Step 203: the host CPU triggers the target processing engine to call the target heterogeneous API.
When executing step 203, the host CPU may perform the processing operations required for the target processing engine to call the target heterogeneous API, such as sending the target processing engine the program instructions it needs. The host CPU may also notify the target processing engine of the cache address of the parameters required to call the target heterogeneous API and the storage address of its return value, so that the target processing engine can obtain the parameters from the cache address and store the return value at the corresponding storage address.

Before the host CPU can send the target processing engine the program instructions it needs to call the target heterogeneous API, the host CPU must determine those program instructions. The embodiment of the present application provides two ways of doing so:
in the first method, the heterogeneous API function libraries of the processing engines 210 are pre-configured in the heterogeneous system 200.
As shown in fig. 3, a signature set of heterogeneous APIs and a heterogeneous API function library of each processing engine 210 are pre-configured in the heterogeneous system 200.
The host CPU may select program instructions from the target processing engine's heterogeneous API function library that are needed by the target processing engine to invoke the target heterogeneous API.
The heterogeneous API function library of a processing engine 210 includes identifiers of heterogeneous APIs and the program instructions that processing engine 210 needs to call them; the host CPU selects, according to the identifier of the target heterogeneous API, the program instructions the target processing engine needs from the target processing engine's heterogeneous API function library.
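A minimal C++ sketch of this library lookup, assuming one function library per processing engine mapping API identifiers to ready-made program instructions (all names are assumptions, not the application's own data structures):

#include <cstdint>
#include <unordered_map>
#include <vector>

using ProgramInstructions = std::vector<std::uint8_t>; // engine-specific binary

// Heterogeneous API function library of one processing engine: maps an
// API identifier to the program instructions that engine needs.
struct HapiFunctionLibrary {
    std::unordered_map<std::uint32_t, ProgramInstructions> by_api_id;
};

// Look up the instructions the target engine needs for the target API;
// returns nullptr if the library or the entry is missing.
const ProgramInstructions* lookup_instructions(
        const std::unordered_map<int, HapiFunctionLibrary>& libraries,
        int target_engine, std::uint32_t api_id) {
    auto lib = libraries.find(target_engine);
    if (lib == libraries.end()) return nullptr;
    auto it = lib->second.by_api_id.find(api_id);
    return it == lib->second.by_api_id.end() ? nullptr : &it->second;
}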
In the second method, the heterogeneous API function libraries of the processing engines 210 are not configured in the heterogeneous system 200, and the host CPU stores an intermediate representation of the target heterogeneous API in advance.
As shown in Fig. 4, the compiler 100 may compile a source program including the target heterogeneous API in advance to generate an intermediate representation of the target heterogeneous API, and the host CPU in the heterogeneous system 200 may store this intermediate representation in advance. The embodiment of the present application does not limit the manner in which the host CPU is preconfigured with the intermediate representation of the target heterogeneous API. For example, the host CPU stores in advance the intermediate representations of one or more heterogeneous APIs, among them the intermediate representation of the target heterogeneous API. As another example, the host CPU configures the intermediate representations of one or more heterogeneous APIs, including that of the target heterogeneous API, under the trigger of the user.
When determining that the target heterogeneous API needs to be called, the host CPU may compile the intermediate representation of the target heterogeneous API into a program instruction required by the target processing engine to call the target heterogeneous API, and send the program instruction to the target processing engine.
In this manner, the host CPU (which may also be understood as the heterogeneous runtime) has a dynamic compiling function. In the embodiment of the present application, the heterogeneous API call information may further indicate the storage address of the intermediate representation of the target heterogeneous API; the following takes the first characterization manner of the heterogeneous API call information described above as an example.
Referring to table 4, the heterogeneous API call information further includes a storage address of the intermediate representation of the one or more heterogeneous APIs.
TABLE 4

API identifier | Parameter scale | Processing engine 1 | … | Processing engine N | Intermediate representation storage address
Heterogeneous API 1 | S1 | T11 | … | TN1 | Storage address 1
Heterogeneous API 2 | S2 | T12 | … | TN2 | Storage address 2
… | … | … | … | … | …
Heterogeneous API M | SM | T1M | … | TNM | Storage address M
As can be seen from table 4, the heterogeneous API call information includes the time required for each processing engine 210 to call M heterogeneous APIs, respectively, and the storage addresses of the intermediate representations of the M heterogeneous APIs. The memory addresses represented in the middle of the M heterogeneous APIs are memory address 1, memory address 2, … …, and memory address M, respectively.
After determining the target processing engine that is to call the target heterogeneous API, the host CPU may obtain the storage address of the intermediate representation of the target heterogeneous API from the heterogeneous API call information, and then obtain the intermediate representation of the target heterogeneous API according to that storage address.
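For illustration, a table 4 entry may be pictured as the following record, in which the per-engine call times sit alongside the storage address of the intermediate representation; the host CPU can then select the engine whose recorded time is smallest and fetch the IR from the recorded address. All field and function names are assumptions made for this sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct ApiCallInfoEntry {
    std::uint32_t api_id;                // identifier of the heterogeneous API
    std::vector<double> time_per_engine; // time_per_engine[e]: call time on engine e
    std::uintptr_t ir_storage_address;   // where the API's IR is stored
};

// Select the engine whose recorded call time is smallest, i.e. the most
// efficient engine for this API according to the call information.
// Assumes the entry records at least one engine.
std::size_t SelectEngine(const ApiCallInfoEntry& entry) {
    std::size_t best = 0;
    for (std::size_t e = 1; e < entry.time_per_engine.size(); ++e) {
        if (entry.time_per_engine[e] < entry.time_per_engine[best]) {
            best = e;
        }
    }
    return best;
}
```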
It should be noted that the manner of indicating the storage address of the intermediate representation of the heterogeneous API in the heterogeneous API call information is merely an example, and the embodiment of the present application does not limit the manner of indicating the storage address of the intermediate representation of the heterogeneous API in the heterogeneous API call information.
Generally, for the same heterogeneous API, the program instructions with which the same processing engine calls the API may also differ across parameter scales. This is because, when compiling the intermediate representation of the heterogeneous API, the host CPU may adjust the compilation according to the parameter scale, applying different memory layouts and loop optimizations to the program instructions; the instructions produced for the same processing engine therefore may differ from one parameter scale to another.
In this embodiment of the present application, after compiling the intermediate representation of the target heterogeneous API into the program instructions required by the target processing engine to call the target heterogeneous API, the host CPU may further cache those program instructions, so that when it subsequently determines that the target processing engine needs to call the target heterogeneous API again, it can obtain the cached program instructions directly.
Each time the host CPU compiles the intermediate representation of a heterogeneous API into the program instructions required by a processing engine to call that API, it may store those program instructions; optionally, when storing them, the host CPU may also record the identifier and the parameter scale of the heterogeneous API.
Alternatively, the host CPU may record, in the heterogeneous API call information, the cache addresses of the compiled program instructions required by the different processing engines 210 to call heterogeneous APIs. That is, the heterogeneous API call information may indicate the cache addresses of the program instructions required by the plurality of processing engines to call heterogeneous APIs including the target heterogeneous API, and the cache addresses recorded for each processing engine cover the program instructions required by that processing engine to call a heterogeneous API with different parameter scales.
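For illustration, the cache addresses described above can be modeled as nested maps keyed by processing engine, API identifier, and parameter scale; this layout is an assumption of the sketch, not a structure prescribed by the embodiments:

```cpp
#include <cstdint>
#include <map>

// Key: parameter scale of the call; value: cache address of the program
// instructions compiled for that scale.
using ScaleToCacheAddress = std::map<std::uint64_t, std::uintptr_t>;

// Per processing engine: for each heterogeneous API (by identifier), the
// cache addresses of its compiled instructions at each parameter scale.
using EngineCacheAddresses = std::map<std::uint32_t, ScaleToCacheAddress>;

// The heterogeneous API call information can then keep one such record per
// processing engine, alongside the per-engine call times of table 5.
using CallInfoCacheAddresses = std::map<int /*engine id*/, EngineCacheAddresses>;

// Record the cache address after a compile, mirroring the optional step of
// noting the identifier and parameter scale alongside the stored instructions.
void RecordCacheAddress(CallInfoCacheAddresses& info, int engine,
                        std::uint32_t api_id, std::uint64_t scale,
                        std::uintptr_t address) {
    info[engine][api_id][scale] = address;
}
```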
Taking the second characterization manner of the heterogeneous API call information in the foregoing description as an example, referring to table 5, the heterogeneous API call information further includes cache addresses of program instructions required by the processing engines to call the heterogeneous API.
TABLE 5

Processing engine | Time required to call heterogeneous APIs 1 to M | Cache addresses of required program instructions
---|---|---
Processing engine 1 | T11, T12, ……, T1M | Cache address 11, cache address 12, ……, cache address 1M
…… | …… | ……
Processing engine N | TN1, TN2, ……, TNM | Cache address N1, cache address N2, ……, cache address NM
As can be seen from table 5, the heterogeneous API call information includes the time required by each processing engine 210 to call the M heterogeneous APIs and the cache addresses of the program instructions required by each processing engine to call the M heterogeneous APIs. For example, the entry for processing engine 1 includes the times required by processing engine 1 to call the M heterogeneous APIs, namely T11, T12, ……, T1M, where the time recorded for any one heterogeneous API covers the times required to call it with different parameter scales. The entry likewise includes the cache addresses of the program instructions required by processing engine 1 to call the M heterogeneous APIs, namely cache address 11, cache address 12, ……, cache address 1M, where the cache addresses recorded for any one heterogeneous API include the cache addresses of the program instructions required to call it with different parameter scales.
The above description takes as an example the case in which the heterogeneous API call information includes the cache address of the program instructions required by each processing engine to call each heterogeneous API. In practical applications, the heterogeneous API call information may cover only part of this: it may include only the cache addresses of the program instructions required by some of the processing engines to call the heterogeneous APIs, or only those required to call some of the heterogeneous APIs. In addition, the information in the heterogeneous API call information (e.g., the time required by each processing engine 210 to call different heterogeneous APIs, and the cache addresses of the program instructions required by each processing engine 210 to call different heterogeneous APIs) may be updated in real time.
When the heterogeneous API call information includes the cache addresses of the program instructions required by each processing engine to call a plurality of heterogeneous APIs, the host CPU, after determining the target processing engine that is to call the target heterogeneous API, determines whether the heterogeneous API call information contains the cache address of the program instructions required by the target processing engine to call the target heterogeneous API, that is, the cache address of the program instructions required to call the target heterogeneous API with the target parameter scale. If it exists, the host CPU obtains the cached program instructions directly. If it does not exist, the host CPU compiles the intermediate representation of the target heterogeneous API into the program instructions required by the target processing engine to call the target heterogeneous API; it may also cache those program instructions and store their cache address in the heterogeneous API call information.
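For illustration, this path amounts to a compile-on-miss cache lookup, sketched below under the same assumptions as the previous examples (illustrative names, stubbed compiler):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

using ProgramBlob = std::vector<std::uint8_t>;
struct IntermediateRepresentation { std::string text; };

// Cache key: (engine id, API identifier, parameter scale).
using CacheKey = std::tuple<int, std::uint32_t, std::uint64_t>;
using InstructionCache = std::map<CacheKey, ProgramBlob>;

// Stub for the dynamic compiler, which tailors its output to the engine and
// the parameter scale (memory layout, loop optimizations, and so on).
ProgramBlob CompileIr(const IntermediateRepresentation& ir, int engine,
                      std::uint64_t scale) {
    (void)ir; (void)engine; (void)scale;
    return ProgramBlob{};
}

// Reuse cached instructions when the call information records them;
// otherwise compile the IR, cache the result, and return it.
const ProgramBlob& GetOrCompile(InstructionCache& cache, int engine,
                                std::uint32_t api_id, std::uint64_t scale,
                                const IntermediateRepresentation& ir) {
    CacheKey key{engine, api_id, scale};
    auto it = cache.find(key);
    if (it == cache.end()) {
        // Miss: compile for this engine and parameter scale, then record it,
        // mirroring the step of storing the new cache address in the call info.
        it = cache.emplace(key, CompileIr(ir, engine, scale)).first;
    }
    return it->second;
}
```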
Based on the same inventive concept as the method embodiment, an embodiment of the present application further provides an API calling apparatus, configured to execute the method executed by the host CPU in the method embodiment; for related features, reference may be made to the method embodiment, and details are not repeated here. As shown in fig. 5, the API calling apparatus 500 includes a determining unit 501 and a selecting unit 502, and optionally may further include an instruction determining unit 503 and a sending unit 504.
A determining unit 501, configured to determine a target API that needs to be called.
A selecting unit 502, configured to select, based on API call information, the first processing engine to call the target API, where the API call information is used to indicate the efficiency of calling the target API by the first processing engine and the second processing engine respectively.
The API calling apparatus 500 may be configured to execute the method executed by the host CPU shown in fig. 2, wherein the determining unit 501 may execute step 201 in the embodiment shown in fig. 2; the selection unit 502 may perform step 202 in the embodiment shown in fig. 2; the instruction determining unit 503 may perform a method of determining a program instruction required by the target processing engine to call the target heterogeneous API in step 203 in the embodiment shown in fig. 2, and the sending unit 504 may perform a method of sending the program instruction required by the target processing engine to call the target heterogeneous API in step 203 in the embodiment shown in fig. 2.
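For illustration only, the apparatus can be pictured as a thin composition of these units, with each unit reduced to a stub and annotated with the step of fig. 2 it carries out; the type names are assumptions of this sketch, not terms of the embodiments:

```cpp
#include <cstdint>

// Illustrative stand-ins for the units of the API calling apparatus 500.
struct DeterminingUnit {            // step 201: pick the target API
    std::uint32_t DetermineTargetApi() { return 0; }
};
struct SelectingUnit {              // step 202: pick the engine by call info
    int SelectEngine(std::uint32_t api) { (void)api; return 0; }
};
struct InstructionDeterminingUnit { // step 203: resolve program instructions
};
struct SendingUnit {                // step 203: ship instructions to the engine
};

struct ApiCallingApparatus {
    DeterminingUnit determining_unit;            // unit 501
    SelectingUnit selecting_unit;                // unit 502
    InstructionDeterminingUnit instruction_unit; // unit 503 (optional)
    SendingUnit sending_unit;                    // unit 504 (optional)
};
```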
It should be noted that the division of units in the embodiments of the present application is schematic and is merely a division of logical functions; there may be other division manners in actual implementation. The functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In a simple embodiment, those skilled in the art will recognize that the API calling apparatus 500 in the above embodiment may take the form shown in fig. 6.
As shown in fig. 6, the computing device 600 includes at least one processor 610 and a memory 620, and optionally may further include a communication interface 630. The embodiments of the present application do not limit the specific connection medium between the processor 610 and the memory 620. The processor 610 may be a processing engine, such as a CPU, in the heterogeneous system 200. When the computing device includes the communication interface 630, the processor 610 may transmit data through the communication interface 630 when communicating with other devices, such as the compiler 100 or another processing engine 210.
When the API calling apparatus 500 takes the form shown in fig. 6, the processor 610 in fig. 6 may execute the computer-executable instructions stored in the memory 620, so that the computing device 600 can execute the method executed by the host CPU in any of the above method embodiments, for example the method performed by the host CPU in steps 201-203 of the embodiment shown in fig. 2.
Specifically, the functions/implementation processes of the determining unit 501, the selecting unit 502, the instruction determining unit 503, and the sending unit 504 in fig. 5 may all be implemented by the processor 610 in fig. 6 calling the computer-executable instructions stored in the memory 620. Alternatively, the functions/implementation processes of the determining unit 501, the selecting unit 502, and the instruction determining unit 503 in fig. 5 may be implemented by the processor 610 in fig. 6 calling the computer-executable instructions stored in the memory 620, while the function/implementation process of the sending unit 504 in fig. 5 may be implemented by the communication interface 630 in fig. 6.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.
Claims (19)
1. A heterogeneous application program interface (API) calling method, applied to a heterogeneous system, wherein the heterogeneous system comprises a first processing engine and a second processing engine, and the method comprises the following steps:
determining a target API needing to be called;
and selecting the first processing engine to call the target API based on API call information, wherein the API call information is used for indicating the efficiency of calling the target API by the first processing engine and the second processing engine respectively.
2. The method of claim 1, wherein the selecting the first processing engine to call the target API based on API call information comprises:
acquiring a signature of the target API, wherein the signature of the target API is used for indicating the target parameter scale of the target API;
selecting the first processing engine according to the API call information and a target parameter scale of the target API, wherein the API call information indicates the efficiency with which the first processing engine and the second processing engine call the target API with a candidate parameter scale, and the candidate parameter scale comprises the target parameter scale.
3. The method of claim 2, wherein the obtaining the signature of the target API comprises:
determining an identity of the target API;
and determining the signature of the target API from a pre-configured API signature set according to the identifier of the target API.
4. The method of any one of claims 1-3, further comprising:
acquiring program instructions required by the first processing engine to call the target API from a preconfigured API function library of the first processing engine, wherein the API function library of the first processing engine comprises the program instructions required by the first processing engine to call one or more APIs, and the one or more APIs comprise the target API;
and sending program instructions required by the first processing engine to call the target API to the first processing engine.
5. The method of any one of claims 1-3, further comprising:
acquiring a prestored intermediate representation of the target API;
compiling the intermediate representation of the target API into program instructions required by the first processing engine to invoke the target API;
and sending program instructions required by the first processing engine to call the target API to the first processing engine.
6. The method of claim 5, wherein the API call information is further used for indicating a storage address of the intermediate representation of the target API, and the acquiring a prestored intermediate representation of the target API comprises:
and acquiring the intermediate representation of the target API according to the storage address of the intermediate representation of the target API.
7. The method of any of claims 1 to 6, wherein the API call information is stored in a tabular form.
8. The method of any of claims 1 to 3, wherein the API call information is further used to indicate cache addresses of program instructions required by the first processing engine and the second processing engine to respectively call the target API, and the method further comprises:
acquiring a program instruction required by the first processing engine to call the target API according to a cache address of the program instruction required by the first processing engine to call the target API in the API call information;
sending the program instructions to the first processing engine.
9. The method of claim 8, wherein the retrieving the program instructions required by the first processing engine to call the target API according to the cache address of the program instructions required by the first processing engine to call the target API in the API call information comprises:
and acquiring the program instruction required by the first processing engine to call the target API with the target parameter scale according to the cache address of the program instruction required by the first processing engine to call the target API and the target parameter scale of the target API.
10. An API calling apparatus, configured to select, from a heterogeneous system, a processing engine to call a target API, wherein the heterogeneous system comprises a first processing engine and a second processing engine, and the apparatus comprises:
a determining unit, configured to determine a target API that needs to be called;
and a selecting unit, configured to select the first processing engine to call the target API based on API call information, wherein the API call information is used for indicating the efficiency of calling the target API by the first processing engine and the second processing engine respectively.
11. The apparatus according to claim 10, wherein the selecting unit, when selecting the first processing engine to call the target API based on API call information, is specifically configured to:
acquiring a signature of the target API, wherein the signature of the target API is used for indicating the target parameter scale of the target API;
selecting the first processing engine according to the API call information and a target parameter scale of the target API, wherein the API call information indicates the efficiency with which the first processing engine and the second processing engine call the target API with a candidate parameter scale, and the candidate parameter scale comprises the target parameter scale.
12. The apparatus of claim 11, wherein the selecting unit, when obtaining the signature of the target API, is specifically configured to:
determining an identity of the target API;
and determining the signature of the target API from a pre-configured API signature set according to the identifier of the target API, wherein the target API signature comprises the identifier of the target API.
13. The apparatus according to any one of claims 10-12, wherein the apparatus further comprises an instruction determining unit and a sending unit:
the instruction determining unit is configured to obtain, from a preconfigured API function library of the first processing engine, the program instructions required by the first processing engine to call the target API, wherein the API function library of the first processing engine includes the program instructions required by the first processing engine to call one or more APIs, and the one or more APIs include the target API;
and the sending unit is used for sending a program instruction required by the first processing engine to call the target API to the first processing engine.
14. The apparatus of any one of claims 10-12, wherein the apparatus further comprises the instruction determining unit and the sending unit:
the instruction determining unit is used for acquiring a prestored intermediate representation of the target API; compiling the intermediate representation of the target API into program instructions required by the first processing engine to call the target API;
and the sending unit is used for sending a program instruction required by the first processing engine to call the target API to the first processing engine.
15. The apparatus of claim 14, wherein the API call information is further configured to indicate a storage address of the intermediate representation of the target API, and the instruction determining unit, when obtaining the pre-stored intermediate representation of the target API, is specifically configured to:
and acquiring the intermediate representation of the target API according to the storage address of the intermediate representation of the target API.
16. The apparatus of any one of claims 10 to 15, wherein the API call information is stored in a table form.
17. The apparatus of any one of claims 10 to 12, wherein the API call information indicates cache addresses of program instructions required by the first processing engine and the second processing engine to call the target API, respectively, and the apparatus further comprises an instruction determining unit and a sending unit:
the instruction determining unit is configured to obtain, according to the cache address of the program instruction required by the first processing engine to invoke the target API in the API invocation information, the program instruction required by the first processing engine to invoke the target API;
and the sending unit is used for sending a program instruction required by the first processing engine to call the target API to the first processing engine.
18. The apparatus of claim 17, wherein the instruction determination unit is specifically configured to:
and acquiring the program instruction required by the first processing engine to call the target API with the target parameter scale according to the cache address of the program instruction required by the first processing engine to call the target API and the target parameter scale of the target API.
19. A computing device, comprising a memory to store computer instructions and a processor; the processor invokes the memory-stored computer instructions to perform the method of any of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410804859.1A CN118642826A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
CN202010656379.7A CN113918290A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010656379.7A CN113918290A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410804859.1A Division CN118642826A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113918290A (en) | 2022-01-11
Family
ID=79231945
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410804859.1A Pending CN118642826A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
CN202010656379.7A Pending CN113918290A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410804859.1A Pending CN118642826A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN118642826A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115915457A (en) * | 2023-01-30 | 2023-04-04 | 阿里巴巴(中国)有限公司 | Resource scheduling method, vehicle control method, device and system |
CN115915457B (en) * | 2023-01-30 | 2023-05-23 | 阿里巴巴(中国)有限公司 | Resource scheduling method, vehicle control method, device and system |
Also Published As
Publication number | Publication date |
---|---|
CN118642826A (en) | 2024-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20190324772A1 (en) | Method and device for processing smart contracts | |
JP2020509443A (en) | System and method for implementing a native contract on a blockchain | |
CN111078323B (en) | Data processing method and device based on coroutine, computer equipment and storage medium | |
CN109669772B (en) | Parallel execution method and equipment of computational graph | |
WO2016151398A1 (en) | System and method for configuring a platform instance at runtime | |
CN111223036B (en) | GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium | |
US11699073B2 (en) | Network off-line model processing method, artificial intelligence processing device and related products | |
EP3032413B1 (en) | Code generation method, compiler, scheduling method, apparatus and scheduling system | |
WO2012142798A1 (en) | Method and apparatus for loading application program | |
US11321090B2 (en) | Serializing and/or deserializing programs with serializable state | |
US11893367B2 (en) | Source code conversion from application program interface to policy document | |
CN116382880B (en) | Task execution method, device, processor, electronic equipment and storage medium | |
US20210304068A1 (en) | Data processing system and data processing method | |
CN104794095A (en) | Distributed computation processing method and device | |
CN113918290A (en) | API calling method and device | |
CN117369975A (en) | Loading method of general computing engine | |
CN117762423A (en) | Java intelligent contract compiling method and device, storage medium and electronic equipment | |
CN113254022A (en) | Distributed compilation system and method | |
JP2018081592A (en) | Compile program, compile method, and compiler | |
US9537931B2 (en) | Dynamic object oriented remote instantiation | |
EP3495960A1 (en) | Program, apparatus, and method for communicating data between parallel processor cores | |
CN112764897B (en) | Task request processing method, device and system and computer readable storage medium | |
US9442782B2 (en) | Systems and methods of interface description language (IDL) compilers | |
CN108845953B (en) | Interface testing method and device | |
US20220283789A1 (en) | Methods and apparatuses for providing a function as a service platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |