CN115309407B - Method and system capable of realizing calculation power abstraction - Google Patents


Info

Publication number: CN115309407B
Application number: CN202211243920.7A
Authority: CN (China)
Prior art keywords: code, hardware, acceleration, codes, interface
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN115309407A
Inventors: 王晓云, 罗馨玥, 王升, 刘景磊
Assignees: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute

Events:
- Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
- Priority to CN202211243920.7A
- Publication of CN115309407A
- Application granted
- Publication of CN115309407B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 — Arrangements for software engineering
    • G06F 8/40 — Transformation of program code
    • G06F 8/41 — Compilation
    • G06F 8/60 — Software deployment
    • G06F 8/70 — Software maintenance or management
    • G06F 8/71 — Version control; Configuration management
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02D — Climate change mitigation technologies in information and communication technologies [ICT]
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a method and a system for realizing computing power abstraction, relating to the technical field of electric digital data processing. The method comprises the following steps: parsing development code to identify control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code; generating stream code based on the native code; and performing resource matching based on the stream code, and performing hardware linking according to the resource matching result to generate an executable file. The application provides a unified, cross-vendor development system for heterogeneous hardware: by offering a unified development environment and a loosely coupled development framework, a single set of developer code can be deployed across heterogeneous hardware, which resolves problems such as development-ecosystem isolation and difficult code migration between hardware vendors and improves resource utilization.

Description

Method and system capable of realizing calculation power abstraction
Technical Field
The application relates to the technical field of electric digital data processing, in particular to a method and a system capable of realizing computing power abstraction.
Background
In recent years, computing power has become a core productive force, exhibiting ubiquitous and heterogeneous characteristics. Industrial digital transformation has placed higher demands on computational efficiency, so heterogeneous chips such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) have appeared, and computing resources have evolved from the general-purpose computing power of traditional central processing units (CPUs) to heterogeneous computing that mixes various hardware. Currently, most application development systems for heterogeneous computing follow a siloed ("chimney") model: hardware vendors maintain their own development ecosystems and do not interoperate. A single vendor's development system is shown in fig. 1, and is mainly divided into 3 layers:
Application development layer: develops code based on business needs.
Compilation layer: the compiler front end analyzes the code to generate an intermediate representation; the back end optimizes it into machine code the hardware can understand; the linker generates an executable file.
Computing resource layer: provides the runtime library and executes the target program.
In heterogeneous computing power scenarios in the prior art, problems such as development-ecosystem isolation between hardware vendors, difficult code migration, and low resource utilization are common.
Disclosure of Invention
At least one embodiment of the present application provides a method and a system for realizing computing power abstraction, used to solve at least one of the problems of development-ecosystem isolation, difficult code migration, and low resource utilization between different hardware vendors in prior-art heterogeneous computing power scenarios.
In order to solve the technical problem, the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides a method for realizing computing power abstraction, including:
parsing the development code, identifying control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
generating stream code based on the native code;
and performing resource matching based on the stream code, and performing hardware linking according to the resource matching result to generate an executable file.
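The three steps above (parse, generate stream code, match resources and link) can be pictured as a minimal pipeline. The sketch below is illustrative only: the function names, the `@accel` marker, and the default priority list are assumptions for the example, not part of the patent.

```python
# Hypothetical sketch of the three-step pipeline; all names are assumptions.

def parse(dev_code: str) -> dict:
    """Split development code into control code and acceleration code (native code)."""
    lines = dev_code.splitlines()
    control = [l for l in lines if not l.startswith("@accel")]
    accel = [l[len("@accel "):] for l in lines if l.startswith("@accel")]
    return {"control": control, "accel": accel}

def generate_stream_code(native: dict) -> dict:
    """Merge control code, acceleration code and an execution priority into stream code."""
    return {**native, "priority": ["GPU", "FPGA", "CPU"]}  # assumed default priority

def match_and_link(stream: dict, available: set) -> str:
    """Pick the highest-priority available hardware and 'link' an executable."""
    for hw in stream["priority"]:
        if hw in available:
            return f"executable[{hw}]"
    raise RuntimeError("no matching hardware resource")

native = parse("x = load()\n@accel y = matmul(x, x)\nsave(y)")
stream = generate_stream_code(native)
exe = match_and_link(stream, {"FPGA", "CPU"})  # GPU unavailable -> falls back to FPGA
```

The point of the sketch is the control flow, not the data structures: the real system would operate on compiler intermediate representations rather than text lines.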
Optionally, the operator interface is used for defining a unified expression of computation-type operations;
the device management interface is used for registering and managing hardware resources, and for acquiring, initializing and configuring hardware information;
the kernel scheduling interface is used for uniformly describing the schedule type, scheduling characteristics and attributes of a computation;
the memory management interface is used for standardizing memory allocation/release, copy/migration and priority description.
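One way to read the four interface families is as an abstract, vendor-neutral API that each hardware vendor implements. The sketch below is an assumption about what such interfaces could look like; none of the class or method names appear in the patent.

```python
# Hypothetical vendor-neutral interface families; all names are illustrative.
from abc import ABC, abstractmethod

class OperatorInterface(ABC):
    """Unified expression of computation-type operations (math, rendering, deep learning)."""
    @abstractmethod
    def matmul(self, a, b): ...

class DeviceManagementInterface(ABC):
    """Register/manage hardware resources; acquire, initialize, configure hardware info."""
    @abstractmethod
    def register(self, device_info: dict) -> str: ...

class KernelSchedulingInterface(ABC):
    """Uniform description of schedule type, characteristics and attributes."""
    @abstractmethod
    def submit(self, kernel, priority: int): ...

class MemoryManagementInterface(ABC):
    """Standardized allocation/release, copy/migration and priority description."""
    @abstractmethod
    def alloc(self, nbytes: int): ...
```

A vendor runtime would subclass these; application code would program only against the abstract base classes, which is what decouples it from the underlying hardware.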
Optionally, the control code includes parameter configuration code and/or execution flow code in the development code;
the acceleration code includes the portion of the development code that can be processed by heterogeneous hardware.
Optionally, configuring at least one of the device management interface, the kernel scheduling interface and the memory management interface includes at least one of the following:
mapping memory usage and address configuration in the control code to a unified expression on the memory management interface;
mapping device resource application and use in the control code to a unified expression on the device management interface;
mapping process management and timing control in the control code to a unified expression on the kernel scheduling interface.
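These three mappings can be thought of as rewriting vendor-specific calls in the control code into the unified interfaces. A hedged sketch: the translation table below uses real CUDA entry-point names purely as an example of a vendor vocabulary, and the unified names on the right are assumptions.

```python
# Illustrative mapping from vendor-specific calls to assumed unified interface calls.
UNIFIED = {
    "cudaMalloc":       "memory.alloc",    # memory usage / address configuration
    "cudaSetDevice":    "device.select",   # device resource application and use
    "cudaLaunchKernel": "kernel.submit",   # process management / timing control
}

def to_unified(call: str) -> str:
    """Rewrite a single call site; unknown calls pass through unchanged."""
    name = call.split("(")[0]
    return call.replace(name, UNIFIED.get(name, name), 1)
```

For example, `to_unified("cudaMalloc(1024)")` yields `"memory.alloc(1024)"`, while a call not in the table is left as-is.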
Optionally, generating the stream code based on the native code includes:
optimizing the execution efficiency of the native code to obtain optimized control code and optimized acceleration code; generating execution priorities of the different heterogeneous hardware for executing the acceleration code according to predicted execution efficiency and performance analysis results of the acceleration code on the different hardware;
and merging the optimized control code, the optimized acceleration code and the execution priority into stream code that can be uniformly dispatched and allocated.
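The merge step can be pictured as packaging three segments into one schedulable record, with the priority derived from predicted per-hardware execution efficiency. A minimal sketch, assuming predicted execution times in milliseconds as the performance-analysis result:

```python
# All names and the use of predicted milliseconds are assumptions for this sketch.

def rank_hardware(predicted_ms: dict) -> list:
    """Order hardware types by predicted execution time of the acceleration code."""
    return sorted(predicted_ms, key=predicted_ms.get)  # fastest first

def merge_stream_code(control: str, accel: str, predicted_ms: dict) -> dict:
    """Combine optimized control code, optimized acceleration code and the
    execution priority into one uniformly dispatchable stream-code record."""
    return {"control": control, "accel": accel, "priority": rank_hardware(predicted_ms)}

stream = merge_stream_code("cfg(); run()", "matmul_kernel",
                           {"GPU": 1.2, "FPGA": 3.5, "CPU": 40.0})
```

Here the priority list is simply the hardware ordering implied by the predictions; the patent leaves the exact scoring method open.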
Optionally, performing resource matching based on the stream code includes:
matching the optimized control code and the optimized acceleration code against the hardware resources registered in the computing power abstraction runtime library, and completing the related hardware registration and resource matching according to the matching result to obtain a resource matching result. When resource matching is performed for the acceleration code, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
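The highest-priority-available selection can be sketched against a toy registry. The `RuntimeLibrary` class and its data layout below are assumptions; the patent does not specify the runtime library's data structures.

```python
# Toy model of the runtime library's hardware registry; illustrative only.
class RuntimeLibrary:
    def __init__(self):
        self.registered = {}                  # hardware type -> device handle

    def register(self, hw_type: str, handle: str):
        """Receive hardware registration information and update the registry."""
        self.registered[hw_type] = handle

    def match_acceleration(self, priority: list) -> str:
        """Select the highest-priority hardware that is actually registered."""
        for hw in priority:
            if hw in self.registered:
                return self.registered[hw]
        raise LookupError("no registered hardware matches the priority list")

lib = RuntimeLibrary()
lib.register("FPGA", "fpga:0")
lib.register("CPU", "cpu:0")
```

With the priority list `["GPU", "FPGA", "CPU"]`, the unregistered GPU is skipped and the FPGA handle is chosen, mirroring the fall-through described above.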
Optionally, the method further includes:
receiving hardware registration information, and updating the hardware resources in the computing power abstraction runtime library.
In a second aspect, an embodiment of the present application provides a system for realizing computing power abstraction, including:
a computing power abstraction parser for parsing the development code, identifying control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
a computing power abstraction streamer for generating stream code based on the native code;
and a computing power abstraction runtime library for performing resource matching based on the stream code, performing hardware linking according to the resource matching result, and generating an executable file.
Optionally, the operator interface is used for defining a unified expression of computation-type operations;
the device management interface is used for registering and managing hardware resources, and for acquiring, initializing and configuring hardware information;
the kernel scheduling interface is used for uniformly describing the schedule type, scheduling characteristics and attributes of a computation;
the memory management interface is used for standardizing memory allocation/release, copy/migration and priority description.
Optionally, the control code includes parameter configuration code and/or execution flow code in the development code;
the acceleration code includes the portion of the development code that can be processed by heterogeneous hardware.
Optionally, the computing power abstraction parser is further configured to perform at least one of the following:
mapping memory usage and address configuration in the control code to a unified expression on the memory management interface;
mapping device resource application and use in the control code to a unified expression on the device management interface;
mapping process management and timing control in the control code to a unified expression on the kernel scheduling interface.
Optionally, the system further includes:
a computing power abstraction code generator for optimizing the execution efficiency of the native code to obtain optimized control code and optimized acceleration code, and for generating execution priorities of the different heterogeneous hardware for executing the acceleration code according to predicted execution efficiency and performance analysis results of the acceleration code on the different hardware;
the computing power abstraction streamer is further used for merging the optimized control code, the optimized acceleration code and the execution priority into stream code that can be uniformly dispatched and allocated.
Optionally, the computing power abstraction runtime library is further configured to match the optimized control code and the optimized acceleration code against the hardware resources registered in the computing power abstraction runtime library, and to complete the related hardware registration and resource matching according to the matching result to obtain a resource matching result. When resource matching is performed for the acceleration code, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
Optionally, the computing power abstraction runtime library is further configured to receive hardware registration information and update the hardware resources in the computing power abstraction runtime library.
Compared with the prior art, the method and system for realizing computing power abstraction provided by the embodiments of the application offer a unified, cross-vendor development system for heterogeneous hardware. Through a unified development environment and a loosely coupled development framework, a single set of developer code can be deployed across heterogeneous hardware, which resolves problems such as development-ecosystem isolation and difficult code migration between hardware vendors and improves resource utilization.
Drawings
FIG. 1 is a schematic diagram of a development system of a single vendor;
FIG. 2 is a schematic illustration of the development-ecosystem barriers between different vendors in the prior art;
FIG. 3 is a block diagram illustrating an architecture of a system capable of implementing computational power abstraction according to an embodiment of the present application;
fig. 4 is a flowchart illustrating a method for implementing computation power abstraction according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. In the description and in the claims "and/or" means at least one of the connected objects.
In prior-art heterogeneous development, different hardware vendors commonly face the following problems:
1. Isolated development ecosystems
As shown in FIG. 2, different hardware vendors provide customized development environments for their own hardware and are ecologically isolated from one another. Vendors' chip architectures and software stacks are fragmented into chimney-style technology stacks; the software and hardware stacks of different vendors differ greatly, application code is tightly coupled to hardware, update and maintenance costs are high, and rapid adaptation and optimization are difficult when new hardware appears.
2. Difficult code migration
Due to differences in hardware architectures and base development languages, implementing the same function on different chip architectures requires code written in different languages and built in different development environments. For example, to implement matrix multiplication, the CPU, FPGA and GPU require 3 languages and 3 sets of code. When a service migrates between different heterogeneous hardware, code is hard to migrate and reuse and development efficiency is low; for example, data-intensive C++/Fortran applications often exceed a million lines of code, making migration to a heterogeneous platform a large undertaking.
3. Low resource utilization
The current application development model limits computing power scheduling and reduces the utilization of infrastructure-layer computing resources. Because application code is coupled to hardware, a user must designate a specific computing resource when scheduling. For a computing power provider this means, on one hand, that in the planning and construction stage it is difficult to build and plan a computing resource pool in advance according to demand, which may waste investment; on the other hand, during operation, if the computing power type requested by the user does not match the type provided by the resource pool, peripheral computing resources cannot be called upon, computing power islands may form, and resource utilization drops.
The core of the technical problem is the lack of a unified development system spanning vendors and heterogeneous hardware that provides a unified development environment and a loosely coupled development framework, so that a single set of developer code can be deployed across heterogeneous hardware and resource utilization improved.
To solve at least one of the above problems, embodiments of the present application provide a method and a system for realizing computing power abstraction, applied to heterogeneous computing power scenarios that include multiple different types of hardware, for example several of CPU, GPU and ASIC, possibly of different types and/or from different device vendors. The system provided by the embodiments of the application may also be called a computing power abstraction system; it is a unified development system for the computing power network field that spans vendors and heterogeneous hardware.
As shown in fig. 3, the system provided in the embodiment of the present application includes a computing power abstraction parser, a computing power abstraction streamer, and a computing power abstraction runtime library.
The computing power abstraction parser is used for parsing development code, identifying control code and acceleration code, mapping the control code and the acceleration code to operator interfaces respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code.
The development code may be code produced by a developer using any software and hardware stack. The embodiment of the present application does not limit the programming language, so the developer may choose any language, such as C, C++ or Python, and any development framework, such as TensorFlow, PyTorch, OpenGL or CUDA. The control code generally comprises the parameter configuration code and/or execution flow code in the development code, and the acceleration code generally comprises the portion of the development code that can be processed by heterogeneous hardware. The control code is generally executed by a CPU; the acceleration code is executed by heterogeneous hardware other than the CPU, such as ASICs, FPGAs and GPUs, although it may also be executed by the CPU when heterogeneous hardware resources are insufficient.
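The control/acceleration split resembles the host/device split in existing heterogeneous frameworks. As an illustrative assumption (the patent does not prescribe this heuristic), a parser might classify statements by whether they invoke a known offloadable compute operation:

```python
# Hypothetical classifier: statements calling known accelerable operators are
# acceleration code; parameter setup and flow control stay on the CPU as control code.
ACCEL_OPS = {"matmul", "conv2d", "fft"}   # assumed operator vocabulary

def classify(stmt: str) -> str:
    return "accel" if any(op + "(" in stmt for op in ACCEL_OPS) else "control"

program = ["batch = 32", "y = conv2d(x, w)", "if done: stop()"]
labels = [classify(s) for s in program]
```

In this toy program only the `conv2d` call is labelled as acceleration code; the parameter assignment and the flow-control statement remain control code.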
In the embodiment of the application, the operator interface is used for defining a unified expression of computation-type operations, standardizing operations such as basic math libraries, image rendering and recognition, and deep learning. The device management interface is used for registering and managing hardware resources, and for acquiring, initializing and configuring hardware information. The kernel scheduling interface is used for uniformly describing the schedule type, scheduling characteristics and attributes of a computation. The memory management interface is used for standardizing memory allocation/release, copy/migration and priority description.
The computing power abstraction streamer is configured to generate stream code based on the native code.
The computing power abstraction runtime library is used for performing resource matching based on the stream code, performing hardware linking according to the resource matching result, and generating an executable file.
Through the units or the modules, the embodiment of the application provides a unified development environment and a loosely-coupled development framework, so that heterogeneous deployment of development codes can be realized, the problems of development ecological isolation, code migration difficulty and the like among hardware manufacturers are solved, and the resource utilization rate is improved.
In an embodiment of the present application, the computing power abstraction parser is further configured to perform at least one of the following:
mapping memory usage and address configuration in the control code to a unified expression on the memory management interface;
mapping device resource application and use in the control code to a unified expression on the device management interface;
mapping process management and timing control in the control code to a unified expression on the kernel scheduling interface.
Through the above mapping, control code developed in different programming languages can be supported, and a loosely coupled structure of the computing network resource layer is realized to support access by multi-vendor heterogeneous hardware.
As shown in fig. 3, the above-described system of the embodiment of the present application may further include a computing power abstraction code generator, used for optimizing the execution efficiency of the native code to obtain optimized control code and optimized acceleration code, and for generating the execution priorities of the different heterogeneous hardware for executing the acceleration code according to predicted execution efficiency and performance analysis results of the acceleration code on the different hardware. The computing power abstraction streamer is then further configured to merge the optimized control code, the optimized acceleration code and the execution priority into stream code that can be uniformly dispatched and allocated. In this way, the optimization improves code execution efficiency; in addition, because the execution priority is derived from the acceleration code's execution efficiency and performance analysis on heterogeneous hardware, the available hardware with the best execution efficiency and performance can be selected to execute the corresponding code.
For example, the computing power abstraction runtime library is further configured to match the optimized control code and acceleration code against the hardware resources registered in it, and to complete the related hardware registration and resource matching according to the matching result to obtain a resource matching result; when resource matching is performed for the acceleration code, the available hardware with the highest priority is selected according to the execution priority for registration and resource matching. Selecting the highest-priority available hardware in this way improves the execution efficiency and performance of the acceleration code.
In the embodiment of the application, the computing power abstraction runtime library is further configured to receive hardware registration information and update its hardware resources, so that access by multiple vendors and heterogeneous hardware can be supported.
In the system shown in fig. 3, the system may specifically comprise three layers: a computing application development layer, a computing power abstraction layer, and a computing network resource layer. The computing application development layer generates development code for various applications (such as AI applications, big data applications, cloud games, and scientific computing). The computing network resource layer includes the hardware resources used for computing, and may also include network and storage resources. The computing power abstraction interface, the computing power abstraction parser, the computing power abstraction code generator, the computing power abstraction streamer and the computing power abstraction runtime library are all located in the computing power abstraction layer.
The computing power abstraction layer is used for allocating upper-layer service code to concrete hardware for execution according to the operators defined by the computing power abstraction interface. The allocation process comprises 3 steps: parse the code and split it into operators (computing tasks such as Fourier transform and matrix multiplication); optimize the operators to improve execution efficiency (for example, fusion of similar operators), determine the hardware types suitable for running each operator and the corresponding runtimes, and link the operators to a specific runtime library; when several runtime libraries are suitable for running an operator, sort them by priority, determine the available computing resources of each in that order, and finally link against a runtime that is both suitable for running the operator and has sufficient computing resources.
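The 3-step allocation described above can be sketched end to end: split into operators, fuse similar ones, then link each operator to a suitable runtime with enough free resources. All names and the fusion rule below are assumptions for the sketch.

```python
# Toy allocation: operator fusion plus priority-ordered runtime linking.

def fuse(ops: list) -> list:
    """Toy optimization: merge adjacent identical operators (similar-operator fusion)."""
    out = []
    for op in ops:
        if out and out[-1] == op:
            out[-1] = op + "_fused"
        else:
            out.append(op)
    return out

def link(op: str, runtimes: list) -> str:
    """runtimes: (name, supported_ops, free_units) tuples, sorted by priority."""
    for name, supported, free in runtimes:
        if op.split("_")[0] in supported and free > 0:
            return name
    raise LookupError(f"no runtime for {op}")

ops = fuse(["matmul", "matmul", "fft"])
runtimes = [("gpu_rt", {"matmul"}, 0),           # highest priority, but no free units
            ("fpga_rt", {"matmul", "fft"}, 4)]
placement = {op: link(op, runtimes) for op in ops}
```

Here the GPU runtime is skipped despite its higher priority because it has no free capacity, so both operators land on the FPGA runtime, illustrating the "suitable and sufficient resources" rule.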
The computing power abstraction interface defines a unified specification that decouples development code from software and hardware technology stacks in the computing power abstraction system. Specifically:
(1) Operator interface: defines unified expressions of computation, standardizing computation operations such as basic math libraries, image rendering and recognition, and deep learning.
(2) Device management interface: registers and manages hardware resources; acquires, initializes and configures hardware information.
(3) Kernel scheduling interface: a unified description of the schedule type, scheduling characteristics and attributes of a computation.
(4) Memory management interface: standardizes memory allocation/release, copy/migration and priority description within the computing power abstraction system.
It can be seen that the computing power abstraction interface defines a standard, cross-vendor, cross-heterogeneous-resource programming specification, so code developers need not be concerned with the vendor or type of the underlying hardware. The computing power abstraction interface also defines standard access methods for the underlying hardware (such as a hardware registration method and a runtime library access method), which the computing power abstraction layer uses to invoke computing power after the hardware is connected.
The computing power abstraction parser: performs explicit and implicit analysis of the development code, identifies the control code and acceleration code within it, maps them to the operator interface, and realizes the splitting and unified description of control code and acceleration code.
The computing power abstraction code generator: optimizes the native code and analyzes its performance (predicting the execution efficiency of computing tasks on different hardware and empirically ranking the execution priority of the different hardware) to improve execution efficiency, generating abstract code divided into three segments: a control segment, an acceleration segment, and a priority. The control code runs mainly on the CPU, the acceleration code runs on heterogeneous hardware, and the priority is an ordered queue of the heterogeneous devices best suited for execution.
The computing power abstraction streamer: merges the three-segment abstract code into stream code that can be uniformly dispatched and allocated.
The computing power abstraction runtime library: a loosely coupled architecture that integrates each hardware vendor's runtime library and compiler back end, actually deploys and executes the stream code, contains the dynamic link libraries with which specific hardware can execute the stream code, and manages devices, kernels and memory while the program runs.
An embodiment of the present application further provides a method capable of implementing computation power abstraction, as shown in fig. 4, including:
and step 41, analyzing the development code, identifying a control code and an acceleration code, mapping the control code and the acceleration code with an operator interface respectively, and configuring at least one of an equipment management interface, a kernel scheduling interface and a memory management interface to obtain a native code.
Here, the operator interface is used to define a unified expression of computing operations. The device management interface is used to register and manage hardware resources and to acquire, initialize, and configure hardware information. The kernel scheduling interface is used to uniformly describe the scheduling type, scheduling characteristics, and attributes of a computation. The memory management interface is used to standardize memory allocation and release, copy and migration, and priority description. The control code comprises the parameter configuration code and/or execution flow code in the development code. The acceleration code comprises the portion of the development code that can be processed by heterogeneous hardware.
Configuring at least one of the device management interface, the kernel scheduling interface, and the memory management interface comprises at least one of the following: uniformly expressing and mapping the memory usage and address configuration in the control code to the memory management interface; uniformly expressing and mapping the device resource application and use in the control code to the device management interface; and uniformly expressing and mapping the process management and timing control in the control code to the kernel scheduling interface.
In addition, the control code and the acceleration code in the development code can be identified in different ways. For example, when the attributes of different code sections are defined or annotated in advance in the development code (say, one section is designated as host code, i.e., control code, and another as device code, i.e., acceleration code), the control code and the acceleration code can each be determined from that predefined attribute information. Alternatively, the embodiment of the present application may train a neural network model in advance to identify control code and acceleration code, and use that model to classify the code sections.
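As a non-limiting illustration, the annotation-based identification described above may be sketched as follows; the `# @host` and `# @device` tags are hypothetical markers, not defined by the present application:

```python
# Hypothetical sketch of annotation-based code splitting: lines following a
# "# @host" tag are collected as control code, lines following "# @device"
# as acceleration code. Tag names are illustrative assumptions.
def split_development_code(lines):
    control, acceleration = [], []
    current = control  # untagged leading code defaults to control code
    for line in lines:
        tag = line.strip()
        if tag == "# @host":
            current = control
        elif tag == "# @device":
            current = acceleration
        else:
            current.append(line)
    return control, acceleration

source = ["# @host", "n = 1024", "# @device", "c = a @ b"]
control, acceleration = split_development_code(source)
```

A neural-network-based identifier, as also contemplated above, would replace this tag lookup with a learned classifier over code sections.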
Step 42: generate flow code based on the native code.
Here, the embodiment of the present application may optimize the execution efficiency of the native code to obtain optimized control code and optimized acceleration code; generate the execution priorities with which different heterogeneous hardware executes the acceleration code, according to the predicted execution efficiency and performance analysis results of the acceleration code on the different hardware; and merge the optimized control code, the optimized acceleration code, and the execution priority into flow code that can be uniformly scheduled and dispatched.
Step 43: perform resource matching based on the flow code, and perform hardware linking according to the resource matching result to generate an executable file.
Here, in the embodiment of the present application, the optimized control code and the optimized acceleration code are each matched against the hardware resources registered in the computing power abstraction runtime library, and the relevant hardware registration and resource matching are completed according to the matching result to obtain a resource matching result. When the acceleration code undergoes resource matching, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
In addition, the computing power abstraction runtime library may receive registration information from various hardware and update the hardware resources it holds.
Through the above steps, the development environment is unified under a loosely coupled development framework, a single set of development code can be deployed across heterogeneous hardware, problems such as the ecosystem isolation between hardware manufacturers and the difficulty of code migration are solved, and resource utilization is improved.
A more specific example of a method based on the above system that can implement computational power abstraction is provided below, and the above method is further described.
Examples of this include:
Step S1: a developer selects any software and hardware stack, develops a program, and generates development code.
In a specific implementation, the developer selects any program development language, such as C, C++, or Python, and the software and hardware stack may be any development framework, such as TensorFlow, PyTorch, OpenGL, or CUDA, for program development, thereby generating the development code.
Step S2: the computing power abstraction parser performs explicit parsing and implicit parsing of the development code, identifies the control code and the code to be accelerated in parallel, maps them to the operator interfaces, and configures the device management interface, the kernel scheduling interface, and the memory management interface to generate native code.
In one specific implementation, explicit parsing may be the conversion of code identified by specific programming interface definitions (for example, in CUDA or OpenGL) or by macro annotations.
In one implementation, implicit parsing may be the intelligent identification and conversion of the code portions that the user has not specially annotated or defined through a programming interface.
In one particular implementation, control code and parallel acceleration code are identified based on the explicit and implicit parsing: the control code may be the portions covering parameter configuration, execution flow, and the like, while the parallel acceleration code may be the portions that are specially optimized for the heterogeneous hardware in a computing power network.
In one specific implementation, operators in parallel acceleration code such as OpenGL code can be uniformly expressed and mapped to the image rendering operations in the operator interface, and operators in parallel acceleration code such as TensorFlow code can be uniformly expressed and mapped to the deep learning operations in the operator interface.
In one embodiment, the memory usage and address allocation in the control code may be uniformly expressed and mapped to the memory management interface; the device resource application and use in the control code may be uniformly expressed and mapped to the device management interface; and the process management and timing control in the control code may be uniformly expressed and mapped to the kernel scheduling interface.
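As a non-limiting sketch of the unified expression mapping above, a lookup from control-code operations to the abstraction interfaces might look as follows; every operation and interface name here is an illustrative assumption, not part of the present application:

```python
# Hypothetical lookup table from control-code operations to the unified
# computing power abstraction interfaces; all names are illustrative.
INTERFACE_MAP = {
    "malloc":       "memory_management",   # memory usage / address allocation
    "free":         "memory_management",
    "memcpy":       "memory_management",
    "device_open":  "device_management",   # device resource application and use
    "device_query": "device_management",
    "enqueue":      "kernel_scheduling",   # process management / timing control
    "stream_sync":  "kernel_scheduling",
}

def map_to_interface(operation):
    # Operations not covered by the three management interfaces are assumed
    # to be computing operations expressed through the operator interface.
    return INTERFACE_MAP.get(operation, "operator")
```

Under this sketch, a memory copy maps to the memory management interface, while a compute operator such as a matrix multiply falls through to the operator interface.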
In one particular implementation, the native code comprises an intermediate representation file that is a unified expression over the computing power abstraction interfaces.
Step S3: the computing power abstraction code generator optimizes the native code and analyzes its performance (predicting the execution efficiency of the computation on different hardware and empirically ranking the execution priority of the different hardware) so as to improve execution efficiency, and generates abstract code divided into three segments: control code, acceleration code, and execution priority. The control code generally runs on a CPU, the acceleration code generally runs on heterogeneous hardware, and the execution priority is an ordered queue of the heterogeneous devices best suited to execute the code.
In a specific implementation, the optimization specifically refers to the optimization of the execution flow and operation structure, such as the merging of operators and the replacement (folding) of constants.
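As a non-limiting illustration of the constant-replacement optimization mentioned above, a source-level fold may be sketched as follows. A real implementation would operate on the operator graph of the native code rather than on Python source; this sketch is only illustrative and requires Python 3.9+ for `ast.unparse`:

```python
import ast

def fold_constants(expr: str) -> str:
    """Fold constant sub-expressions, e.g. '2 * 3 + x' becomes '6 + x'.
    Illustrative sketch only; operates on a Python expression string."""
    class Folder(ast.NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)  # fold children first (bottom-up)
            if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
                # Both operands are constants: evaluate the sub-expression now.
                wrapper = ast.Expression(body=node)
                ast.fix_missing_locations(wrapper)
                value = eval(compile(wrapper, "<fold>", "eval"))
                return ast.copy_location(ast.Constant(value=value), node)
            return node

    tree = Folder().visit(ast.parse(expr, mode="eval"))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Operator merging, the other optimization named above, would analogously rewrite adjacent operator nodes into a single fused node.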
In a specific implementation, the performance analysis predicts, for the portions uniformly expressed through the operator interfaces, their execution on heterogeneous hardware such as CPUs and GPUs according to each hardware's execution capability, and generates the execution priority according to the predicted execution efficiency.
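A minimal sketch of turning predicted execution efficiency into an execution priority, assuming the prediction yields a per-device cost estimate (the device names and cost figures below are illustrative, not measured values):

```python
def rank_devices(predicted_cost):
    """Order candidate devices by predicted execution cost, lowest (i.e.
    most efficient) first; the result is the execution priority queue."""
    return sorted(predicted_cost, key=predicted_cost.get)

# Illustrative cost estimates for one acceleration segment.
priority = rank_devices({"CPU": 9.0, "GPU": 1.5, "FPGA": 0.8})
```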
Step S4: the abstract code is input into the computing power abstraction streamer and merged to generate flow code that can be uniformly scheduled and dispatched.
In one specific implementation, the computing power abstraction streamer merges the control code, acceleration code, and execution priority in the abstract code into a single flow code file.
In one embodiment, the flow code is a file arranged as a data pool, a variable pool, a data segment, a control segment, an acceleration segment, and an execution priority configuration data segment.
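A non-limiting sketch of such a flow code file, serialized as JSON for readability (the encoding and the field values are assumptions; the present application does not prescribe a concrete format):

```python
import json

# Illustrative flow code layout with the six parts named above: data pool,
# variable pool, data segment, control segment, acceleration segment, and
# execution priority configuration data segment. All values are examples.
flow_code = {
    "data_pool": {"weights": [0.1, 0.2]},
    "variable_pool": {"n": 1024},
    "data_segment": ["input_buffer"],
    "control_segment": ["configure_params", "launch_kernel"],
    "acceleration_segment": ["matmul", "relu"],
    "execution_priority": ["FPGA", "GPU", "CPU"],
}

blob = json.dumps(flow_code)   # one file that can be uniformly dispatched
restored = json.loads(blob)
```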
Step S5: the computing power abstraction runtime library receives the hardware registration information and completes the update of the runtime library.
In one particular implementation, the runtime library may be a computing power abstraction runtime running on CPU hardware, on GPU hardware, on FPGA hardware, on ASIC hardware, or the like.
In one specific implementation, the computing power abstraction runtime running on the CPU receives the CPU-related code in the flow code, such as the control segment, the data segment, and the execution priority configuration data segment, and completes information registration.
In a specific implementation, a computing power abstraction runtime running on heterogeneous hardware such as a GPU receives the code in the flow code related to that hardware, such as the acceleration segment and the data segment, and completes information registration.
Step S6: after the flow code is input into the runtime library, resource matching is performed according to the execution priority and the hardware registration information.
In a specific implementation, suppose the execution priority in the flow code assigns first priority to the FPGA, second priority to the GPU, and third priority to the CPU.
In a specific implementation, when the FPGA runtime, the GPU runtime, and the CPU runtime are all present, the flow code completes the FPGA-related hardware registration and resource matching according to the FPGA's first priority.
In a specific implementation, when only the GPU runtime and the CPU runtime are present, the flow code completes the related hardware registration and resource matching according to the GPU's second priority.
In one specific implementation, when only the CPU runtime is present, the flow code completes the related hardware registration and resource matching according to the CPU's third priority.
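The priority-driven fallback walked through in the three implementations above may be sketched as follows; the function and argument names are illustrative assumptions, not part of the present application:

```python
def match_hardware(priority, registered_runtimes):
    """Return the highest-priority device type that has a registered
    runtime, mirroring the FPGA > GPU > CPU fallback described above."""
    for device in priority:
        if device in registered_runtimes:
            return device
    raise RuntimeError("no registered runtime matches the flow code")

priority = ["FPGA", "GPU", "CPU"]  # execution priority carried in the flow code
```

With all three runtimes registered the FPGA is chosen; with only the GPU and CPU runtimes the GPU is chosen; with only the CPU runtime the CPU is chosen.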
Step S7: in combination with the computing type of the computing resource pool, just-in-time linking of the specific hardware is performed according to the matching result, and an executable file is generated.
In a specific implementation, after registration and resource matching with hardware such as a CPU, GPU, FPGA, or ASIC are completed, the mapped code is dispatched, dynamic translation into the machine language of the specific hardware architecture and linking against its runtime library are completed, and an executable file that can run on that specific hardware is generated.
As can be seen from the above, the computing power abstraction layer provided in the embodiment of the present application builds, on the basis of the computing power abstraction interfaces, a loosely coupled development system whose northbound side connects to the computing network application development layer and whose southbound side connects to the computing network resource layer. The northbound side can support multiple types of application development, and the southbound side can support access by multiple manufacturers and heterogeneous hardware, so that differences between heterogeneous hardware are shielded, services are developed without hardware awareness and scheduled flexibly, and the problems of ecosystem isolation and difficult migration are solved. In addition, the embodiment of the present application also provides a method capable of implementing computing power abstraction, which achieves flexible migration of a single set of code, improves code execution efficiency, and raises the utilization of computing power resources.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the method embodiment for implementing computation power abstraction, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the present embodiments are not limited to those precise embodiments, which are intended to be illustrative rather than restrictive, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope of the appended claims.

Claims (8)

1. A method for implementing computing power abstraction, comprising:
parsing development code, identifying control code and acceleration code, mapping the control code and the acceleration code to an operator interface respectively, and configuring at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
generating a flow code based on the native code;
performing resource matching based on the flow code, and performing hardware linkage according to a resource matching result to generate an executable file;
wherein configuring at least one of the device management interface, the kernel scheduling interface and the memory management interface comprises at least one of the following:
uniformly expressing and mapping the memory usage and address configuration in the control code to the memory management interface;
uniformly expressing and mapping the device resource application and use in the control code to the device management interface;
uniformly expressing and mapping the process management and timing control in the control code to the kernel scheduling interface;
generating a flow code based on the native code, comprising:
optimizing execution efficiency of the native code to obtain an optimized control code and an optimized acceleration code; generating execution priorities of different heterogeneous hardware for executing the acceleration codes according to the predicted execution efficiency and performance analysis results of the acceleration codes on different hardware;
merging the optimized control codes, the optimized acceleration codes and the execution priority into a flow code which can be uniformly dispatched and distributed;
performing resource matching based on the flow code, including:
matching the optimized control code and the optimized acceleration code respectively against the hardware resources registered in the computing power abstraction runtime library, and completing the related hardware registration and resource matching according to the matching result to obtain a resource matching result, wherein, when the acceleration code undergoes resource matching, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
2. The method of claim 1,
the operator interface is used to define a unified expression of computing operations;
the device management interface is used to register and manage hardware resources and to acquire, initialize and configure hardware information;
the kernel scheduling interface is used to uniformly describe the scheduling type, scheduling characteristics and attributes of a computation;
the memory management interface is used to standardize memory allocation and release, copy and migration, and priority description.
3. The method of claim 1,
the control code comprises parameter configuration code and/or execution flow code in the development code;
the acceleration code includes a portion of code in the development code that is capable of being processed by heterogeneous hardware.
4. The method of claim 1, further comprising:
receiving hardware registration information, and updating the hardware resources in the computing power abstraction runtime library.
5. A system for implementing computing power abstraction, comprising:
a computing power abstraction parser, used to parse development code, identify control code and acceleration code, map the control code and the acceleration code to an operator interface respectively, and configure at least one of a device management interface, a kernel scheduling interface and a memory management interface to obtain native code;
a computing power abstraction streamer, used to generate flow code based on the native code;
a computing power abstraction runtime library, used to perform resource matching based on the flow code and to perform hardware linking according to the resource matching result to generate an executable file;
wherein the computing power abstraction parser is further configured to perform at least one of the following:
uniformly expressing and mapping the memory usage and address configuration in the control code to the memory management interface;
uniformly expressing and mapping the device resource application and use in the control code to the device management interface;
uniformly expressing and mapping the process management and timing control in the control code to the kernel scheduling interface;
the system further comprises:
a computing power abstraction code generator, used to optimize the execution efficiency of the native code to obtain optimized control code and optimized acceleration code, and to generate, according to the predicted execution efficiency and performance analysis results of the acceleration code on different hardware, the execution priorities with which different heterogeneous hardware executes the acceleration code;
the computing power abstraction streamer is further used to merge the optimized control code, the optimized acceleration code and the execution priority into flow code that can be uniformly scheduled and dispatched;
and the computing power abstraction runtime library is further used to match the optimized control code and the optimized acceleration code respectively against the hardware resources registered in the computing power abstraction runtime library, and to complete the related hardware registration and resource matching according to the matching result to obtain a resource matching result, wherein, when the acceleration code undergoes resource matching, the available hardware with the highest priority is selected from the computing power abstraction runtime library, according to the execution priority, for registration and resource matching.
6. The system of claim 5,
the operator interface is used to define a unified expression of computing operations;
the device management interface is used to register and manage hardware resources and to acquire, initialize and configure hardware information;
the kernel scheduling interface is used to uniformly describe the scheduling type, scheduling characteristics and attributes of a computation;
the memory management interface is used to standardize memory allocation and release, copy and migration, and priority description.
7. The system of claim 5,
the control code comprises parameter configuration code and/or execution flow code in the development code;
the acceleration code includes a portion of code in the development code that is capable of being processed by heterogeneous hardware.
8. The system of claim 5,
the computing power abstraction runtime library is further used to receive hardware registration information and update the hardware resources in the computing power abstraction runtime library.
CN202211243920.7A 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction Active CN115309407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211243920.7A CN115309407B (en) 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211243920.7A CN115309407B (en) 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction

Publications (2)

Publication Number Publication Date
CN115309407A CN115309407A (en) 2022-11-08
CN115309407B true CN115309407B (en) 2023-03-31

Family

ID=83868009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211243920.7A Active CN115309407B (en) 2022-10-12 2022-10-12 Method and system capable of realizing calculation power abstraction

Country Status (1)

Country Link
CN (1) CN115309407B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10140099B2 (en) * 2016-06-01 2018-11-27 The Mathworks, Inc. Systems and methods for generating code from executable models with floating point data
CN110209627A (en) * 2019-06-03 2019-09-06 山东浪潮人工智能研究院有限公司 A kind of hardware-accelerated method of SSD towards intelligent terminal
CN113469360B (en) * 2020-03-31 2023-10-20 杭州海康威视数字技术股份有限公司 Reasoning method and device
CN113687816B (en) * 2020-05-19 2023-09-01 杭州海康威视数字技术股份有限公司 Method and device for generating executable code of operator
CN112784959A (en) * 2021-01-13 2021-05-11 鹏城实验室 Deep learning model rapid building system compatible with multiple frames

Also Published As

Publication number Publication date
CN115309407A (en) 2022-11-08

Similar Documents

Publication Publication Date Title
US9286042B2 (en) Control flow graph application configuration
US20220188086A1 (en) Off-load servers software optimal placement method and program
US10768916B2 (en) Dynamic generation of CPU instructions and use of the CPU instructions in generated code for a softcore processor
US10579349B2 (en) Verification of a dataflow representation of a program through static type-checking
Zatsarinny et al. Toward high performance solutions as services of research digital platform
Taura et al. Design and implementation of GXP make—A workflow system based on make
US11321090B2 (en) Serializing and/or deserializing programs with serializable state
CN110555550A (en) Online prediction service deployment method, device and equipment
CN114115841A (en) Method, apparatus, device, medium and program product for dynamically arranging data stream interface
US10530892B2 (en) Processing request for multi-versioned service
CN112132530B (en) Visual dynamic flow arranging method and system
EP1993038A1 (en) Data processing system and data processing method
CN115309407B (en) Method and system capable of realizing calculation power abstraction
CN112199184A (en) Cross-language task scheduling method, device, equipment and readable storage medium
US20230048399A1 (en) Offload server, offload control method, and offload program
Requeno et al. Towards the performance analysis of Apache Tez applications
Deb et al. Towards autonomic distribution of existing object oriented programs
Andrade et al. ParallelME: A parallel mobile engine to explore heterogeneity in mobile computing architectures
CN116861359A (en) Operator fusion method and system for deep learning reasoning task compiler
Benoit et al. Scheduling skeleton-based grid applications using PEPA and NWS
Medeiros et al. A gpu-accelerated molecular docking workflow with kubernetes and apache airflow
WO2022097245A1 (en) Offload server, offload control method, and offload program
Chang et al. Support NNEF execution model for NNAPI
Quiroz-Fabián et al. VPPE: A novel visual parallel programming environment
Psomopoulos et al. Bioinformatics algorithm development for Grid environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant