CN115309407B - Method and system capable of realizing calculation power abstraction - Google Patents


Info

Publication number: CN115309407B
Application number: CN202211243920.7A
Authority: CN (China)
Prior art keywords: code, hardware, acceleration, codes, interface
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN115309407A
Inventors: 王晓云, 罗馨玥, 王升, 刘景磊
Assignees: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute

Events:
- Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
- Priority to CN202211243920.7A
- Publication of CN115309407A
- Application granted
- Publication of CN115309407B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 — Arrangements for software engineering
    • G06F 8/40 — Transformation of program code
    • G06F 8/41 — Compilation
    • G06F 8/60 — Software deployment
    • G06F 8/70 — Software maintenance or management
    • G06F 8/71 — Version control; Configuration management
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02D — Climate change mitigation technologies in information and communication technologies [ICT]
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a method and a system for realizing computing power abstraction, relating to the technical field of electric digital data processing. The method comprises the following steps: parsing development code to identify control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code; generating stream code based on the native code; and performing resource matching based on the stream code, and performing hardware linking according to the resource matching result to generate an executable file. The application provides a unified, cross-vendor development system for heterogeneous hardware: by offering a unified development environment and a loosely coupled development framework, a single set of developer code can be deployed across heterogeneous hardware, which resolves problems such as development-ecosystem isolation and difficult code migration between hardware vendors and improves resource utilization.

Description

Method and system capable of realizing calculation power abstraction
Technical Field
The application relates to the technical field of electric digital data processing, in particular to a method and a system capable of realizing computing power abstraction.
Background
In recent years, computing power has become a core productive force, exhibiting ubiquitous and heterogeneous characteristics. Industrial digital transformation has placed higher demands on computational efficiency, so heterogeneous chips such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) have appeared, and computing resources have evolved from the general-purpose computing power of traditional central processing units (CPUs) to heterogeneous computing that mixes various hardware. Currently, most application development systems for heterogeneous computing follow a siloed ("chimney") model: hardware vendors maintain their own development ecosystems and do not interoperate. A single vendor's development system is shown in fig. 1, and is mainly divided into 3 layers:
Application development layer: develops code based on business needs.
Compilation layer: the compiler front end analyzes the code to generate an intermediate representation; the back end optimizes it into machine code the hardware can understand; the linker generates an executable file.
Computing resource layer: provides the runtime library and executes the target program.
In heterogeneous computing power scenarios in the prior art, problems such as development-ecosystem isolation between hardware vendors, difficult code migration, and low resource utilization are common.
Disclosure of Invention
At least one embodiment of the present application provides a method and a system for realizing computing power abstraction, used to solve at least one of the problems of development-ecosystem isolation, difficult code migration, and low resource utilization between different hardware vendors in prior-art heterogeneous computing power scenarios.
In order to solve the technical problem, the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides a method for realizing computing power abstraction, including:
parsing the development code, identifying control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
generating stream code based on the native code;
and performing resource matching based on the stream code, and performing hardware linking according to the resource matching result to generate an executable file.
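The three steps above (parse, generate stream code, match resources and link) can be pictured as a minimal pipeline. The sketch below is illustrative only: the function names, the `@accel` marker, and the default priority list are assumptions for the example, not part of the patent.

```python
# Hypothetical sketch of the three-step pipeline; all names are assumptions.

def parse(dev_code: str) -> dict:
    """Split development code into control code and acceleration code (native code)."""
    lines = dev_code.splitlines()
    control = [l for l in lines if not l.startswith("@accel")]
    accel = [l[len("@accel "):] for l in lines if l.startswith("@accel")]
    return {"control": control, "accel": accel}

def generate_stream_code(native: dict) -> dict:
    """Merge control code, acceleration code and an execution priority into stream code."""
    return {**native, "priority": ["GPU", "FPGA", "CPU"]}  # assumed default priority

def match_and_link(stream: dict, available: set) -> str:
    """Pick the highest-priority available hardware and 'link' an executable."""
    for hw in stream["priority"]:
        if hw in available:
            return f"executable[{hw}]"
    raise RuntimeError("no matching hardware resource")

native = parse("x = load()\n@accel y = matmul(x, x)\nsave(y)")
stream = generate_stream_code(native)
exe = match_and_link(stream, {"FPGA", "CPU"})  # GPU unavailable -> falls back to FPGA
```

The point of the sketch is the control flow, not the data structures: the real system would operate on compiler intermediate representations rather than text lines.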
Optionally, the operator interface is used for defining a unified expression of computation-type operations;
the device management interface is used for registering and managing hardware resources, and for acquiring, initializing and configuring hardware information;
the kernel scheduling interface is used for uniformly describing the schedule type, scheduling characteristics and attributes of a computation;
the memory management interface is used for standardizing memory allocation/release, copy/migration and priority description.
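One way to read the four interface families is as an abstract, vendor-neutral API that each hardware vendor implements. The sketch below is an assumption about what such interfaces could look like; none of the class or method names appear in the patent.

```python
# Hypothetical vendor-neutral interface families; all names are illustrative.
from abc import ABC, abstractmethod

class OperatorInterface(ABC):
    """Unified expression of computation-type operations (math, rendering, deep learning)."""
    @abstractmethod
    def matmul(self, a, b): ...

class DeviceManagementInterface(ABC):
    """Register/manage hardware resources; acquire, initialize, configure hardware info."""
    @abstractmethod
    def register(self, device_info: dict) -> str: ...

class KernelSchedulingInterface(ABC):
    """Uniform description of schedule type, characteristics and attributes."""
    @abstractmethod
    def submit(self, kernel, priority: int): ...

class MemoryManagementInterface(ABC):
    """Standardized allocation/release, copy/migration and priority description."""
    @abstractmethod
    def alloc(self, nbytes: int): ...
```

A vendor runtime would subclass these; application code would program only against the abstract base classes, which is what decouples it from the underlying hardware.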
Optionally, the control code includes parameter configuration code and/or execution flow code in the development code;
the acceleration code includes the portion of the development code that can be processed by heterogeneous hardware.
Optionally, configuring at least one of the device management interface, the kernel scheduling interface and the memory management interface includes at least one of the following:
mapping memory usage and address configuration in the control code to a unified expression on the memory management interface;
mapping device resource application and use in the control code to a unified expression on the device management interface;
mapping process management and timing control in the control code to a unified expression on the kernel scheduling interface.
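These three mappings can be thought of as rewriting vendor-specific calls in the control code into the unified interfaces. A hedged sketch: the translation table below uses real CUDA entry-point names purely as an example of a vendor vocabulary, and the unified names on the right are assumptions.

```python
# Illustrative mapping from vendor-specific calls to assumed unified interface calls.
UNIFIED = {
    "cudaMalloc":       "memory.alloc",    # memory usage / address configuration
    "cudaSetDevice":    "device.select",   # device resource application and use
    "cudaLaunchKernel": "kernel.submit",   # process management / timing control
}

def to_unified(call: str) -> str:
    """Rewrite a single call site; unknown calls pass through unchanged."""
    name = call.split("(")[0]
    return call.replace(name, UNIFIED.get(name, name), 1)
```

For example, `to_unified("cudaMalloc(1024)")` yields `"memory.alloc(1024)"`, while a call not in the table is left as-is.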
Optionally, generating the stream code based on the native code includes:
optimizing the execution efficiency of the native code to obtain optimized control code and optimized acceleration code; generating execution priorities of the different heterogeneous hardware for executing the acceleration code according to predicted execution efficiency and performance analysis results of the acceleration code on the different hardware;
and merging the optimized control code, the optimized acceleration code and the execution priority into stream code that can be uniformly dispatched and allocated.
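The merge step can be pictured as packaging three segments into one schedulable record, with the priority derived from predicted per-hardware execution efficiency. A minimal sketch, assuming predicted execution times in milliseconds as the performance-analysis result:

```python
# All names and the use of predicted milliseconds are assumptions for this sketch.

def rank_hardware(predicted_ms: dict) -> list:
    """Order hardware types by predicted execution time of the acceleration code."""
    return sorted(predicted_ms, key=predicted_ms.get)  # fastest first

def merge_stream_code(control: str, accel: str, predicted_ms: dict) -> dict:
    """Combine optimized control code, optimized acceleration code and the
    execution priority into one uniformly dispatchable stream-code record."""
    return {"control": control, "accel": accel, "priority": rank_hardware(predicted_ms)}

stream = merge_stream_code("cfg(); run()", "matmul_kernel",
                           {"GPU": 1.2, "FPGA": 3.5, "CPU": 40.0})
```

Here the priority list is simply the hardware ordering implied by the predictions; the patent leaves the exact scoring method open.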
Optionally, performing resource matching based on the stream code includes:
matching the optimized control code and the optimized acceleration code against the hardware resources registered in the computing power abstraction runtime library, and completing the related hardware registration and resource matching according to the matching result to obtain a resource matching result. When resource matching is performed for the acceleration code, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
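The highest-priority-available selection can be sketched against a toy registry. The `RuntimeLibrary` class and its data layout below are assumptions; the patent does not specify the runtime library's data structures.

```python
# Toy model of the runtime library's hardware registry; illustrative only.
class RuntimeLibrary:
    def __init__(self):
        self.registered = {}                  # hardware type -> device handle

    def register(self, hw_type: str, handle: str):
        """Receive hardware registration information and update the registry."""
        self.registered[hw_type] = handle

    def match_acceleration(self, priority: list) -> str:
        """Select the highest-priority hardware that is actually registered."""
        for hw in priority:
            if hw in self.registered:
                return self.registered[hw]
        raise LookupError("no registered hardware matches the priority list")

lib = RuntimeLibrary()
lib.register("FPGA", "fpga:0")
lib.register("CPU", "cpu:0")
```

With the priority list `["GPU", "FPGA", "CPU"]`, the unregistered GPU is skipped and the FPGA handle is chosen, mirroring the fall-through described above.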
Optionally, the method further includes:
receiving hardware registration information, and updating the hardware resources in the computing power abstraction runtime library.
In a second aspect, an embodiment of the present application provides a system for realizing computing power abstraction, including:
a computing power abstraction parser for parsing the development code, identifying control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
a computing power abstraction streamer for generating stream code based on the native code;
and a computing power abstraction runtime library for performing resource matching based on the stream code, performing hardware linking according to the resource matching result, and generating an executable file.
Optionally, the operator interface is used for defining a unified expression of computation-type operations;
the device management interface is used for registering and managing hardware resources, and for acquiring, initializing and configuring hardware information;
the kernel scheduling interface is used for uniformly describing the schedule type, scheduling characteristics and attributes of a computation;
the memory management interface is used for standardizing memory allocation/release, copy/migration and priority description.
Optionally, the control code includes parameter configuration code and/or execution flow code in the development code;
the acceleration code includes the portion of the development code that can be processed by heterogeneous hardware.
Optionally, the computing power abstraction parser is further configured to perform at least one of the following:
mapping memory usage and address configuration in the control code to a unified expression on the memory management interface;
mapping device resource application and use in the control code to a unified expression on the device management interface;
mapping process management and timing control in the control code to a unified expression on the kernel scheduling interface.
Optionally, the system further includes:
a computing power abstraction code generator for optimizing the execution efficiency of the native code to obtain optimized control code and optimized acceleration code, and for generating execution priorities of the different heterogeneous hardware for executing the acceleration code according to predicted execution efficiency and performance analysis results of the acceleration code on the different hardware;
the computing power abstraction streamer is further used for merging the optimized control code, the optimized acceleration code and the execution priority into stream code that can be uniformly dispatched and allocated.
Optionally, the computing power abstraction runtime library is further configured to match the optimized control code and the optimized acceleration code against the hardware resources registered in the computing power abstraction runtime library, and to complete the related hardware registration and resource matching according to the matching result to obtain a resource matching result. When resource matching is performed for the acceleration code, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
Optionally, the computing power abstraction runtime library is further configured to receive hardware registration information and update the hardware resources in the computing power abstraction runtime library.
Compared with the prior art, the method and system for realizing computing power abstraction provided by the embodiments of the application offer a unified, cross-vendor development system for heterogeneous hardware. Through a unified development environment and a loosely coupled development framework, a single set of developer code can be deployed across heterogeneous hardware, which resolves problems such as development-ecosystem isolation and difficult code migration between hardware vendors and improves resource utilization.
Drawings
FIG. 1 is a schematic diagram of a development system of a single vendor;
FIG. 2 is a schematic illustration of the development-ecosystem barriers between different vendors in the prior art;
FIG. 3 is a block diagram illustrating an architecture of a system capable of implementing computational power abstraction according to an embodiment of the present application;
fig. 4 is a flowchart illustrating a method for implementing computation power abstraction according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. In the description and in the claims "and/or" means at least one of the connected objects.
In prior-art heterogeneous development, different hardware vendors commonly face the following problems:
1. Isolated development ecosystems
As shown in FIG. 2, different hardware vendors provide customized development environments for their own hardware and are ecologically isolated from one another. Vendors' chip architectures and software stacks are fragmented into chimney-style technology stacks; the software and hardware stacks of different vendors differ greatly, application code is tightly coupled to hardware, update and maintenance costs are high, and rapid adaptation and optimization are difficult when new hardware appears.
2. Difficult code migration
Due to differences in hardware architectures and base development languages, implementing the same function on different chip architectures requires code written in different languages and built in different development environments. For example, to implement matrix multiplication, the CPU, FPGA and GPU require 3 languages and 3 sets of code. When a service migrates between different heterogeneous hardware, code is hard to migrate and reuse and development efficiency is low; for example, data-intensive C++/Fortran applications often exceed a million lines of code, making migration to a heterogeneous platform a large undertaking.
3. Low resource utilization
The current application development model limits computing power scheduling and reduces the utilization of infrastructure-layer computing resources. Because application code is coupled to hardware, a user must designate a specific computing resource when scheduling. For a computing power provider this means, on one hand, that in the planning and construction stage it is difficult to build and plan a computing resource pool in advance according to demand, which may waste investment; on the other hand, during operation, if the computing power type requested by the user does not match the type provided by the resource pool, peripheral computing resources cannot be called upon, computing power islands may form, and resource utilization drops.
The core of the technical problem is the lack of a unified development system spanning vendors and heterogeneous hardware that provides a unified development environment and a loosely coupled development framework, so that a single set of developer code can be deployed across heterogeneous hardware and resource utilization improved.
To solve at least one of the above problems, embodiments of the present application provide a method and a system for realizing computing power abstraction, applied to heterogeneous computing power scenarios that include multiple different types of hardware, for example several of CPU, GPU and ASIC, possibly of different types and/or from different device vendors. The system provided by the embodiments of the application may also be called a computing power abstraction system; it is a unified development system for the computing power network field that spans vendors and heterogeneous hardware.
As shown in fig. 3, the system provided in the embodiment of the present application includes a computing power abstraction parser, a computing power abstraction streamer, and a computing power abstraction runtime library.
The computing power abstraction parser is used for parsing development code, identifying control code and acceleration code, mapping the control code and the acceleration code to operator interfaces respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code.
The development code may be code produced by a developer using any software and hardware stack. The embodiment of the present application does not limit the programming language, so the developer may choose any language, such as C, C++ or Python, and any development framework, such as TensorFlow, PyTorch, OpenGL or CUDA. The control code generally comprises the parameter configuration code and/or execution flow code in the development code, and the acceleration code generally comprises the portion of the development code that can be processed by heterogeneous hardware. The control code is generally executed by a CPU; the acceleration code is executed by heterogeneous hardware other than the CPU, such as ASICs, FPGAs and GPUs, although it may also be executed by the CPU when heterogeneous hardware resources are insufficient.
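The control/acceleration split resembles the host/device split in existing heterogeneous frameworks. As an illustrative assumption (the patent does not prescribe this heuristic), a parser might classify statements by whether they invoke a known offloadable compute operation:

```python
# Hypothetical classifier: statements calling known accelerable operators are
# acceleration code; parameter setup and flow control stay on the CPU as control code.
ACCEL_OPS = {"matmul", "conv2d", "fft"}   # assumed operator vocabulary

def classify(stmt: str) -> str:
    return "accel" if any(op + "(" in stmt for op in ACCEL_OPS) else "control"

program = ["batch = 32", "y = conv2d(x, w)", "if done: stop()"]
labels = [classify(s) for s in program]
```

In this toy program only the `conv2d` call is labelled as acceleration code; the parameter assignment and the flow-control statement remain control code.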
In the embodiment of the application, the operator interface is used for defining a unified expression of computation-type operations, standardizing operations such as basic math libraries, image rendering and recognition, and deep learning. The device management interface is used for registering and managing hardware resources, and for acquiring, initializing and configuring hardware information. The kernel scheduling interface is used for uniformly describing the schedule type, scheduling characteristics and attributes of a computation. The memory management interface is used for standardizing memory allocation/release, copy/migration and priority description.
The computing power abstraction streamer is configured to generate stream code based on the native code.
The computing power abstraction runtime library is used for performing resource matching based on the stream code, performing hardware linking according to the resource matching result, and generating an executable file.
Through the units or the modules, the embodiment of the application provides a unified development environment and a loosely-coupled development framework, so that heterogeneous deployment of development codes can be realized, the problems of development ecological isolation, code migration difficulty and the like among hardware manufacturers are solved, and the resource utilization rate is improved.
In an embodiment of the present application, the computing power abstraction parser is further configured to perform at least one of the following:
mapping memory usage and address configuration in the control code to a unified expression on the memory management interface;
mapping device resource application and use in the control code to a unified expression on the device management interface;
mapping process management and timing control in the control code to a unified expression on the kernel scheduling interface.
Through the above mapping, control code developed in different programming languages can be supported, and a loosely coupled structure of the computing network resource layer is realized to support access by multi-vendor heterogeneous hardware.
As shown in fig. 3, the above-described system of the embodiment of the present application may further include a computing power abstraction code generator, used for optimizing the execution efficiency of the native code to obtain optimized control code and optimized acceleration code, and for generating the execution priorities of the different heterogeneous hardware for executing the acceleration code according to predicted execution efficiency and performance analysis results of the acceleration code on the different hardware. The computing power abstraction streamer is then further configured to merge the optimized control code, the optimized acceleration code and the execution priority into stream code that can be uniformly dispatched and allocated. In this way, the optimization improves code execution efficiency; in addition, because the execution priority is derived from the acceleration code's execution efficiency and performance analysis on heterogeneous hardware, the available hardware with the best execution efficiency and performance can be selected to execute the corresponding code.
For example, the computing power abstraction runtime library is further configured to match the optimized control code and acceleration code against the hardware resources registered in it, and to complete the related hardware registration and resource matching according to the matching result to obtain a resource matching result; when resource matching is performed for the acceleration code, the available hardware with the highest priority is selected according to the execution priority for registration and resource matching. Selecting the highest-priority available hardware in this way improves the execution efficiency and performance of the acceleration code.
In the embodiment of the application, the computing power abstraction runtime library is further configured to receive hardware registration information and update its hardware resources, so that access by multiple vendors and heterogeneous hardware can be supported.
In the system shown in fig. 3, the system may specifically comprise three layers: a computing application development layer, a computing power abstraction layer, and a computing network resource layer. The computing application development layer generates development code for various applications (such as AI applications, big data applications, cloud games, and scientific computing). The computing network resource layer includes the hardware resources used for computing, and may also include network and storage resources. The computing power abstraction interface, the computing power abstraction parser, the computing power abstraction code generator, the computing power abstraction streamer and the computing power abstraction runtime library are all located in the computing power abstraction layer.
The computing power abstraction layer is used for allocating upper-layer service code to concrete hardware for execution according to the operators defined by the computing power abstraction interface. The allocation process comprises 3 steps: parse the code and split it into operators (computing tasks such as Fourier transform and matrix multiplication); optimize the operators to improve execution efficiency (for example, fusion of similar operators), determine the hardware types suitable for running each operator and the corresponding runtimes, and link the operators to a specific runtime library; when several runtime libraries are suitable for running an operator, sort them by priority, determine the available computing resources of each in that order, and finally link against a runtime that is both suitable for running the operator and has sufficient computing resources.
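The 3-step allocation described above can be sketched end to end: split into operators, fuse similar ones, then link each operator to a suitable runtime with enough free resources. All names and the fusion rule below are assumptions for the sketch.

```python
# Toy allocation: operator fusion plus priority-ordered runtime linking.

def fuse(ops: list) -> list:
    """Toy optimization: merge adjacent identical operators (similar-operator fusion)."""
    out = []
    for op in ops:
        if out and out[-1] == op:
            out[-1] = op + "_fused"
        else:
            out.append(op)
    return out

def link(op: str, runtimes: list) -> str:
    """runtimes: (name, supported_ops, free_units) tuples, sorted by priority."""
    for name, supported, free in runtimes:
        if op.split("_")[0] in supported and free > 0:
            return name
    raise LookupError(f"no runtime for {op}")

ops = fuse(["matmul", "matmul", "fft"])
runtimes = [("gpu_rt", {"matmul"}, 0),           # highest priority, but no free units
            ("fpga_rt", {"matmul", "fft"}, 4)]
placement = {op: link(op, runtimes) for op in ops}
```

Here the GPU runtime is skipped despite its higher priority because it has no free capacity, so both operators land on the FPGA runtime, illustrating the "suitable and sufficient resources" rule.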
The computing power abstraction interface defines a unified specification that decouples development code from software and hardware technology stacks in the computing power abstraction system. Specifically:
(1) Operator interface: defines unified expressions of computation, standardizing computation operations such as basic math libraries, image rendering and recognition, and deep learning.
(2) Device management interface: registers and manages hardware resources; acquires, initializes and configures hardware information.
(3) Kernel scheduling interface: a unified description of the schedule type, scheduling characteristics and attributes of a computation.
(4) Memory management interface: standardizes memory allocation/release, copy/migration and priority description within the computing power abstraction system.
It can be seen that the computing power abstraction interface defines a standard, cross-vendor, cross-heterogeneous-resource programming specification, so code developers need not be concerned with the vendor or type of the underlying hardware. The computing power abstraction interface also defines standard access methods for the underlying hardware (such as a hardware registration method and a runtime library access method), which the computing power abstraction layer uses to invoke computing power after the hardware is connected.
The computing power abstraction parser: performs explicit and implicit analysis of the development code, identifies the control code and acceleration code within it, maps them to the operator interface, and realizes the splitting and unified description of control code and acceleration code.
The computing power abstraction code generator: optimizes the native code and analyzes its performance (predicting the execution efficiency of computing tasks on different hardware and empirically ranking the execution priority of the different hardware) to improve execution efficiency, generating abstract code divided into three segments: a control segment, an acceleration segment, and a priority. The control code runs mainly on the CPU, the acceleration code runs on heterogeneous hardware, and the priority is an ordered queue of the heterogeneous devices best suited for execution.
The computing power abstraction streamer: merges the three-segment abstract code into stream code that can be uniformly dispatched and allocated.
The computing power abstraction runtime library: a loosely coupled architecture that integrates each hardware vendor's runtime library and compiler back end, actually deploys and executes the stream code, contains the dynamic link libraries with which specific hardware can execute the stream code, and manages devices, kernels and memory while the program runs.
An embodiment of the present application further provides a method capable of implementing computation power abstraction, as shown in fig. 4, including:
and step 41, analyzing the development code, identifying a control code and an acceleration code, mapping the control code and the acceleration code with an operator interface respectively, and configuring at least one of an equipment management interface, a kernel scheduling interface and a memory management interface to obtain a native code.
Here, the operator interface is used to define a unified expression of computing operations. The device management interface is used to register and manage hardware resources and to acquire, initialize, and configure hardware information. The kernel scheduling interface is used to uniformly describe the scheduling type, scheduling characteristics, and attributes of a computation. The memory management interface is used to standardize memory allocation and release, copy and migration, and priority description. The control code comprises the parameter configuration code and/or execution flow code in the development code. The acceleration code comprises the portion of the development code that can be processed by heterogeneous hardware.
Configuring at least one of the device management interface, the kernel scheduling interface, and the memory management interface comprises at least one of the following: uniformly expressing and mapping the memory usage and address configuration in the control code to the memory management interface; uniformly expressing and mapping the device resource application and use in the control code to the device management interface; and uniformly expressing and mapping the process management and timing control in the control code to the kernel scheduling interface.
In addition, the control code and the acceleration code in the development code can be identified in different ways. For example, when the attributes of different code sections are defined or annotated in advance in the development code (say, one section is designated as host code, i.e., control code, and another as device code, i.e., acceleration code), the control code and the acceleration code can each be determined from that predefined attribute information. Alternatively, the embodiment of the present application may train a neural network model in advance to identify control code and acceleration code, and use that model to classify the code sections.
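As a non-limiting illustration, the annotation-based identification described above may be sketched as follows; the `# @host` and `# @device` tags are hypothetical markers, not defined by the present application:

```python
# Hypothetical sketch of annotation-based code splitting: lines following a
# "# @host" tag are collected as control code, lines following "# @device"
# as acceleration code. Tag names are illustrative assumptions.
def split_development_code(lines):
    control, acceleration = [], []
    current = control  # untagged leading code defaults to control code
    for line in lines:
        tag = line.strip()
        if tag == "# @host":
            current = control
        elif tag == "# @device":
            current = acceleration
        else:
            current.append(line)
    return control, acceleration

source = ["# @host", "n = 1024", "# @device", "c = a @ b"]
control, acceleration = split_development_code(source)
```

A neural-network-based identifier, as also contemplated above, would replace this tag lookup with a learned classifier over code sections.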
Step 42: generate flow code based on the native code.
Here, the embodiment of the present application may optimize the execution efficiency of the native code to obtain optimized control code and optimized acceleration code; generate the execution priorities with which different heterogeneous hardware executes the acceleration code, according to the predicted execution efficiency and performance analysis results of the acceleration code on the different hardware; and merge the optimized control code, the optimized acceleration code, and the execution priority into flow code that can be uniformly scheduled and dispatched.
Step 43: perform resource matching based on the flow code, and perform hardware linking according to the resource matching result to generate an executable file.
Here, in the embodiment of the present application, the optimized control code and the optimized acceleration code are each matched against the hardware resources registered in the computing power abstraction runtime library, and the relevant hardware registration and resource matching are completed according to the matching result to obtain a resource matching result. When the acceleration code undergoes resource matching, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
In addition, the computing power abstraction runtime library may receive registration information from various hardware and update the hardware resources it holds.
Through the above steps, the development environment is unified under a loosely coupled development framework, a single set of development code can be deployed across heterogeneous hardware, problems such as the ecosystem isolation between hardware manufacturers and the difficulty of code migration are solved, and resource utilization is improved.
A more specific example of a method based on the above system that can implement computational power abstraction is provided below, and the above method is further described.
Examples of this include:
Step S1: a developer selects any software and hardware stack, develops a program, and generates development code.
In a specific implementation, the developer selects any program development language, such as C, C++, or Python, and the software and hardware stack may be any development framework, such as TensorFlow, PyTorch, OpenGL, or CUDA, for program development, thereby generating the development code.
Step S2: the computing power abstraction parser performs explicit parsing and implicit parsing of the development code, identifies the control code and the code to be accelerated in parallel, maps them to the operator interfaces, and configures the device management interface, the kernel scheduling interface, and the memory management interface to generate native code.
In one specific implementation, explicit parsing may be the conversion of code identified by specific programming interface definitions (for example, in CUDA or OpenGL) or by macro annotations.
In one implementation, implicit parsing may be the intelligent identification and conversion of the code portions that the user has not specially annotated or defined through a programming interface.
In one particular implementation, control code and parallel acceleration code are identified based on the explicit and implicit parsing: the control code may be the portions covering parameter configuration, execution flow, and the like, while the parallel acceleration code may be the portions that are specially optimized for the heterogeneous hardware in a computing power network.
In one specific implementation, operators in parallel acceleration code such as OpenGL code can be uniformly expressed and mapped to the image rendering operations in the operator interface, and operators in parallel acceleration code such as TensorFlow code can be uniformly expressed and mapped to the deep learning operations in the operator interface.
In one embodiment, the memory usage and address allocation in the control code may be uniformly expressed and mapped to the memory management interface; the device resource application and use in the control code may be uniformly expressed and mapped to the device management interface; and the process management and timing control in the control code may be uniformly expressed and mapped to the kernel scheduling interface.
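As a non-limiting sketch of the unified expression mapping above, a lookup from control-code operations to the abstraction interfaces might look as follows; every operation and interface name here is an illustrative assumption, not part of the present application:

```python
# Hypothetical lookup table from control-code operations to the unified
# computing power abstraction interfaces; all names are illustrative.
INTERFACE_MAP = {
    "malloc":       "memory_management",   # memory usage / address allocation
    "free":         "memory_management",
    "memcpy":       "memory_management",
    "device_open":  "device_management",   # device resource application and use
    "device_query": "device_management",
    "enqueue":      "kernel_scheduling",   # process management / timing control
    "stream_sync":  "kernel_scheduling",
}

def map_to_interface(operation):
    # Operations not covered by the three management interfaces are assumed
    # to be computing operations expressed through the operator interface.
    return INTERFACE_MAP.get(operation, "operator")
```

Under this sketch, a memory copy maps to the memory management interface, while a compute operator such as a matrix multiply falls through to the operator interface.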
In one particular implementation, the native code comprises an intermediate representation file that is a unified expression over the computing power abstraction interfaces.
Step S3: the computing power abstraction code generator optimizes the native code and analyzes its performance (predicting the execution efficiency of the computation on different hardware and empirically ranking the execution priority of the different hardware) so as to improve execution efficiency, and generates abstract code divided into three segments: control code, acceleration code, and execution priority. The control code generally runs on a CPU, the acceleration code generally runs on heterogeneous hardware, and the execution priority is an ordered queue of the heterogeneous devices best suited to execute the code.
In a specific implementation, the optimization specifically refers to the optimization of the execution flow and operation structure, such as the merging of operators and the replacement (folding) of constants.
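As a non-limiting illustration of the constant-replacement optimization mentioned above, a source-level fold may be sketched as follows. A real implementation would operate on the operator graph of the native code rather than on Python source; this sketch is only illustrative and requires Python 3.9+ for `ast.unparse`:

```python
import ast

def fold_constants(expr: str) -> str:
    """Fold constant sub-expressions, e.g. '2 * 3 + x' becomes '6 + x'.
    Illustrative sketch only; operates on a Python expression string."""
    class Folder(ast.NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)  # fold children first (bottom-up)
            if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
                # Both operands are constants: evaluate the sub-expression now.
                wrapper = ast.Expression(body=node)
                ast.fix_missing_locations(wrapper)
                value = eval(compile(wrapper, "<fold>", "eval"))
                return ast.copy_location(ast.Constant(value=value), node)
            return node

    tree = Folder().visit(ast.parse(expr, mode="eval"))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Operator merging, the other optimization named above, would analogously rewrite adjacent operator nodes into a single fused node.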
In a specific implementation, the performance analysis predicts, for the portions uniformly expressed through the operator interfaces, their execution on heterogeneous hardware such as CPUs and GPUs according to each hardware's execution capability, and generates the execution priority according to the predicted execution efficiency.
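A minimal sketch of turning predicted execution efficiency into an execution priority, assuming the prediction yields a per-device cost estimate (the device names and cost figures below are illustrative, not measured values):

```python
def rank_devices(predicted_cost):
    """Order candidate devices by predicted execution cost, lowest (i.e.
    most efficient) first; the result is the execution priority queue."""
    return sorted(predicted_cost, key=predicted_cost.get)

# Illustrative cost estimates for one acceleration segment.
priority = rank_devices({"CPU": 9.0, "GPU": 1.5, "FPGA": 0.8})
```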
Step S4: the abstract code is input into the computing power abstraction streamer and merged to generate flow code that can be uniformly scheduled and dispatched.
In one specific implementation, the computing power abstraction streamer merges the control code, acceleration code, and execution priority in the abstract code into a single flow code file.
In one embodiment, the flow code is a file arranged as a data pool, a variable pool, a data segment, a control segment, an acceleration segment, and an execution priority configuration data segment.
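A non-limiting sketch of such a flow code file, serialized as JSON for readability (the encoding and the field values are assumptions; the present application does not prescribe a concrete format):

```python
import json

# Illustrative flow code layout with the six parts named above: data pool,
# variable pool, data segment, control segment, acceleration segment, and
# execution priority configuration data segment. All values are examples.
flow_code = {
    "data_pool": {"weights": [0.1, 0.2]},
    "variable_pool": {"n": 1024},
    "data_segment": ["input_buffer"],
    "control_segment": ["configure_params", "launch_kernel"],
    "acceleration_segment": ["matmul", "relu"],
    "execution_priority": ["FPGA", "GPU", "CPU"],
}

blob = json.dumps(flow_code)   # one file that can be uniformly dispatched
restored = json.loads(blob)
```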
Step S5: the computing power abstraction runtime library receives the hardware registration information and completes the update of the runtime library.
In one particular implementation, the runtime library may be a computing power abstraction runtime running on CPU hardware, on GPU hardware, on FPGA hardware, on ASIC hardware, or the like.
In one specific implementation, the computing power abstraction runtime running on the CPU receives the CPU-related code in the flow code, such as the control segment, the data segment, and the execution priority configuration data segment, and completes information registration.
In a specific implementation, a computing power abstraction runtime running on heterogeneous hardware such as a GPU receives the code in the flow code related to that hardware, such as the acceleration segment and the data segment, and completes information registration.
Step S6: after the flow code is input into the runtime library, resource matching is performed according to the execution priority and the hardware registration information.
In a specific implementation, suppose the execution priority in the flow code assigns first priority to the FPGA, second priority to the GPU, and third priority to the CPU.
In a specific implementation, when the FPGA runtime, the GPU runtime, and the CPU runtime are all present, the flow code completes the FPGA-related hardware registration and resource matching according to the FPGA's first priority.
In a specific implementation, when only the GPU runtime and the CPU runtime are present, the flow code completes the related hardware registration and resource matching according to the GPU's second priority.
In one specific implementation, when only the CPU runtime is present, the flow code completes the related hardware registration and resource matching according to the CPU's third priority.
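The priority-driven fallback walked through in the three implementations above may be sketched as follows; the function and argument names are illustrative assumptions, not part of the present application:

```python
def match_hardware(priority, registered_runtimes):
    """Return the highest-priority device type that has a registered
    runtime, mirroring the FPGA > GPU > CPU fallback described above."""
    for device in priority:
        if device in registered_runtimes:
            return device
    raise RuntimeError("no registered runtime matches the flow code")

priority = ["FPGA", "GPU", "CPU"]  # execution priority carried in the flow code
```

With all three runtimes registered the FPGA is chosen; with only the GPU and CPU runtimes the GPU is chosen; with only the CPU runtime the CPU is chosen.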
Step S7: in combination with the computing type of the computing resource pool, just-in-time linking of the specific hardware is performed according to the matching result, and an executable file is generated.
In a specific implementation, after registration and resource matching with hardware such as a CPU, GPU, FPGA, or ASIC are completed, the mapped code is dispatched, dynamic translation into the machine language of the specific hardware architecture and linking against its runtime library are completed, and an executable file that can run on that specific hardware is generated.
As can be seen from the above, the computing power abstraction layer provided in the embodiment of the present application builds, on the basis of the computing power abstraction interfaces, a loosely coupled development system whose northbound side connects to the computing network application development layer and whose southbound side connects to the computing network resource layer. The northbound side can support multiple types of application development, and the southbound side can support access by multiple manufacturers and heterogeneous hardware, so that differences between heterogeneous hardware are shielded, services are developed without hardware awareness and scheduled flexibly, and the problems of ecosystem isolation and difficult migration are solved. In addition, the embodiment of the present application also provides a method capable of implementing computing power abstraction, which achieves flexible migration of a single set of code, improves code execution efficiency, and raises the utilization of computing power resources.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the method embodiment for implementing computation power abstraction, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the present embodiments are not limited to those precise embodiments, which are intended to be illustrative rather than restrictive, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope of the appended claims.

Claims (8)

1. A method for implementing computing power abstraction, comprising:
parsing development code, identifying control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
generating a flow code based on the native code;
performing resource matching based on the flow code, and performing hardware linkage according to a resource matching result to generate an executable file;
wherein configuring at least one of the device management interface, the kernel scheduling interface and the memory management interface comprises at least one of the following:
uniformly expressing and mapping the memory usage and address configuration in the control code to the memory management interface;
uniformly expressing and mapping the device resource application and use in the control code to the device management interface;
uniformly expressing and mapping the process management and timing control in the control code to the kernel scheduling interface;
generating a flow code based on the native code, comprising:
optimizing execution efficiency of the native code to obtain an optimized control code and an optimized acceleration code; generating execution priorities of different heterogeneous hardware for executing the acceleration codes according to the predicted execution efficiency and performance analysis results of the acceleration codes on different hardware;
merging the optimized control codes, the optimized acceleration codes and the execution priority into a flow code which can be uniformly dispatched and distributed;
performing resource matching based on the flow code, including:
matching the optimized control code and the optimized acceleration code respectively against the hardware resources registered in the computing power abstraction runtime library, and completing the related hardware registration and resource matching according to the matching result to obtain a resource matching result, wherein, when the acceleration code undergoes resource matching, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
2. The method of claim 1,
the operator interface is used to define a unified expression of computing operations;
the device management interface is used to register and manage hardware resources and to acquire, initialize and configure hardware information;
the kernel scheduling interface is used to uniformly describe the scheduling type, scheduling characteristics and attributes of a computation;
the memory management interface is used to standardize memory allocation and release, copy and migration, and priority description.
3. The method of claim 1,
the control code comprises parameter configuration code and/or execution flow code in the development code;
the acceleration code includes a portion of code in the development code that is capable of being processed by heterogeneous hardware.
4. The method of claim 1, further comprising:
receiving hardware registration information, and updating the hardware resources in the computing power abstraction runtime library.
5. A system for implementing computing power abstraction, comprising:
a computing power abstraction parser, used to parse development code, identify control code and acceleration code, map the control code and the acceleration code to an operator interface respectively, and configure at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
a computing power abstraction streamer, used to generate flow code based on the native code;
a computing power abstraction runtime library, used to perform resource matching based on the flow code and to perform hardware linking according to the resource matching result to generate an executable file;
wherein the computing power abstraction parser is further configured to perform at least one of the following:
uniformly expressing and mapping the memory usage and address configuration in the control code to the memory management interface;
uniformly expressing and mapping the device resource application and use in the control code to the device management interface;
uniformly expressing and mapping the process management and timing control in the control code to the kernel scheduling interface;
the system further comprises:
a computing power abstraction code generator, used to optimize the execution efficiency of the native code to obtain optimized control code and optimized acceleration code, and to generate, according to the predicted execution efficiency and performance analysis results of the acceleration code on different hardware, the execution priorities with which different heterogeneous hardware executes the acceleration code;
the computing power abstraction streamer is further used to merge the optimized control code, the optimized acceleration code and the execution priority into flow code that can be uniformly scheduled and dispatched;
and the computing power abstraction runtime library is further used to match the optimized control code and the optimized acceleration code respectively against the hardware resources registered in the computing power abstraction runtime library, and to complete the related hardware registration and resource matching according to the matching result to obtain a resource matching result, wherein, when the acceleration code undergoes resource matching, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
6. The system of claim 5,
the operator interface is used to define a unified expression of computing operations;
the device management interface is used to register and manage hardware resources and to acquire, initialize and configure hardware information;
the kernel scheduling interface is used to uniformly describe the scheduling type, scheduling characteristics and attributes of a computation;
the memory management interface is used to standardize memory allocation and release, copy and migration, and priority description.
7. The system of claim 5,
the control code comprises parameter configuration code and/or execution flow code in the development code;
the acceleration code includes a portion of code in the development code that is capable of being processed by heterogeneous hardware.
8. The system of claim 5,
the computing power abstraction runtime library is further used to receive hardware registration information and update the hardware resources in the computing power abstraction runtime library.
CN202211243920.7A 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction Active CN115309407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211243920.7A CN115309407B (en) 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211243920.7A CN115309407B (en) 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction

Publications (2)

Publication Number Publication Date
CN115309407A CN115309407A (en) 2022-11-08
CN115309407B true CN115309407B (en) 2023-03-31

Family

ID=83868009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211243920.7A Active CN115309407B (en) 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction

Country Status (1)

Country Link
CN (1) CN115309407B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10140099B2 (en) * 2016-06-01 2018-11-27 The Mathworks, Inc. Systems and methods for generating code from executable models with floating point data
CN110209627A (en) * 2019-06-03 2019-09-06 山东浪潮人工智能研究院有限公司 A kind of hardware-accelerated method of SSD towards intelligent terminal
CN113469360B (en) * 2020-03-31 2023-10-20 杭州海康威视数字技术股份有限公司 Reasoning method and device
CN113687816B (en) * 2020-05-19 2023-09-01 杭州海康威视数字技术股份有限公司 Method and device for generating executable code of operator
CN112784959A (en) * 2021-01-13 2021-05-11 鹏城实验室 Deep learning model rapid building system compatible with multiple frames

Also Published As

Publication number Publication date
CN115309407A (en) 2022-11-08

Similar Documents

Publication Publication Date Title
US9286042B2 (en) Control flow graph application configuration
US20220188086A1 (en) Off-load servers software optimal placement method and program
US10768916B2 (en) Dynamic generation of CPU instructions and use of the CPU instructions in generated code for a softcore processor
US10579349B2 (en) Verification of a dataflow representation of a program through static type-checking
Zatsarinny et al. Toward high performance solutions as services of research digital platform
Taura et al. Design and implementation of GXP make—A workflow system based on make
US11321090B2 (en) Serializing and/or deserializing programs with serializable state
CN110555550A (en) Online prediction service deployment method, device and equipment
CN114115841A (en) Method, apparatus, device, medium and program product for dynamically arranging data stream interface
US10530892B2 (en) Processing request for multi-versioned service
CN112132530B (en) Visual dynamic flow arranging method and system
EP1993038A1 (en) Data processing system and data processing method
CN115309407B (en) Method and system capable of realizing calculation power abstraction
CN112199184A (en) Cross-language task scheduling method, device, equipment and readable storage medium
US20230048399A1 (en) Offload server, offload control method, and offload program
Requeno et al. Towards the performance analysis of Apache Tez applications
Deb et al. Towards autonomic distribution of existing object oriented programs
Andrade et al. ParallelME: A parallel mobile engine to explore heterogeneity in mobile computing architectures
CN116861359A (en) Operator fusion method and system for deep learning reasoning task compiler
Benoit et al. Scheduling skeleton-based grid applications using PEPA and NWS
Medeiros et al. A gpu-accelerated molecular docking workflow with kubernetes and apache airflow
WO2022097245A1 (en) Offload server, offload control method, and offload program
Chang et al. Support NNEF execution model for NNAPI
Quiroz-Fabián et al. VPPE: A novel visual parallel programming environment
Psomopoulos et al. Bioinformatics algorithm development for Grid environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant