CN116560666A - AI front end unified computing method, device and medium based on multi-level code generation


Info

Publication number: CN116560666A
Authority: CN (China)
Prior art keywords: node, parameters, computing, intermediate representation, shape
Legal status: Granted
Application number: CN202310834277.3A
Other languages: Chinese (zh)
Other versions: CN116560666B (en)
Inventors: 鲍国庆 (Bao Guoqing), 石恒 (Shi Heng), 张亚林 (Zhang Yalin), 姚建国 (Yao Jianguo)
Current Assignee: Shanghai Suiyuan Technology Co., Ltd.
Original Assignee: Shanghai Enflame Technology Co., Ltd.
Application filed by Shanghai Enflame Technology Co., Ltd.
Priority to CN202310834277.3A
Publication of CN116560666A
Application granted
Publication of CN116560666B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G06F 8/44 Encoding
    • G06F 8/447 Target code generation
    • G06F 8/70 Software maintenance or management
    • G06F 8/76 Adapting program code to run in a different environment; porting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G06N 5/041 Abduction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an AI front-end unified computing method, device and medium based on multi-level code generation. The method comprises the following steps: parsing a network model written or defined with a mainstream AI computing framework or model standard to obtain the node parameters of each computing node, and calling a unified API interface according to the node parameters to generate a consistent front-end computational graph; inferring the output type and shape of each operation instance in the front-end computational graph in an iterative manner, and representing and integrating each operation instance in static single assignment form to generate a high-level intermediate representation of the front-end computational graph; and translating (lowering) the high-level intermediate representation of the front-end computational graph according to a multi-level lowering standard to generate a standard intermediate representation of the front-end computational graph that is compatible with existing AI compilation back ends. The invention solves problems such as computing front-end software fragmentation and low compatibility in the AI compilation field, and exhibits higher compatibility and faster end-to-end compilation and execution speed across different hardware platforms and mainstream AI models.

Description

AI front end unified computing method, device and medium based on multi-level code generation
Technical Field
The embodiments of the invention relate to the technical field of artificial intelligence, and in particular to an AI front-end unified computing method, device and medium based on multi-level code generation.
Background
With the continuous development of deep learning technology, demand for large-scale deep neural network models (large models) keeps increasing. Training and inference with large models require powerful computing capability and an efficient computing framework. Mainstream AI computing frameworks such as PyTorch, TensorFlow and Keras provide effective tools for building, training and deploying deep learning models. However, existing AI frameworks rely excessively on hand-written operator libraries, which limits their use on new and specialized hardware architectures (e.g., in-house DSAs, FPGAs, GPGPUs, etc.); moreover, hand-writing operator libraries for specific hardware platforms is extremely expensive and struggles to keep pace with the further development of large models.
With the advent of new AI compilation technologies such as multi-level intermediate representation (MLIR), multi-level code generation has become an effective remedy for the weaknesses of traditional AI computing frameworks and has been widely adopted in the deep learning community, providing a bridge between AI computing frameworks and the underlying hardware. AI computing front ends based on multi-level code generation, such as IREE-TF, Torch-MLIR and ONNX-MLIR, have emerged over the past two years and drawn the attention of the deep learning community. Their goal is to convert network models represented in existing mainstream AI computing frameworks into an intermediate representation by way of code translation, and to generate execution instructions for the corresponding models on different hardware devices through multi-level code lowering.
Disclosure of Invention
The embodiments of the invention provide an AI front-end unified computing method, device and medium based on multi-level code generation, which convert a network model under a mainstream AI computing framework into a consistent high-level intermediate representation that can be further compiled into executable code for different hardware platforms, avoiding the various limitations of hand-written operator libraries and solving problems such as software fragmentation and low compatibility in existing AI computing front ends based on multi-level code generation.
In a first aspect, an embodiment of the present invention provides an AI front-end unified computing method based on multi-level code generation, including: parsing a network model written or defined with a mainstream AI computing framework or model standard to obtain the node parameters of each computing node, and calling a unified API interface according to the node parameters to generate a consistent front-end computational graph, where the front-end computational graph contains the operation instances corresponding to the computing nodes in the network model;
inferring the output type and shape of each operation instance in the front-end computational graph in an iterative manner, and representing and integrating each operation instance in static single assignment form to generate a high-level intermediate representation of the front-end computational graph;
and translating the high-level intermediate representation of the front-end computational graph according to a multi-level lowering standard to generate a standard intermediate representation of the front-end computational graph, so that an AI compiler can generate execution instructions compatible with a specific hardware platform from the standard intermediate representation.
In a second aspect, an embodiment of the present invention provides an AI front-end unified computing device based on multi-level code generation, including:
a front-end computational graph acquisition module, configured to parse a network model written or defined with a mainstream AI computing framework or model standard to obtain the node parameters of each computing node, and to call a unified API interface according to the node parameters to generate a consistent front-end computational graph, where the front-end computational graph contains the operation instances corresponding to the computing nodes in the network model;
a high-level intermediate representation generation module, configured to infer the output type and shape of each operation instance in the front-end computational graph in an iterative manner, and to represent and integrate each operation instance in static single assignment form to generate the high-level intermediate representation of the front-end computational graph;
and a standard intermediate representation generation module, configured to translate the high-level intermediate representation of the front-end computational graph according to the multi-level lowering standard to generate the standard intermediate representation of the front-end computational graph, so that an AI compiler can generate execution instructions compatible with a specific hardware platform from the standard intermediate representation.
In a third aspect, an embodiment of the present invention provides a computer device including a memory, a processor, an AI accelerator, and a computer program stored in the memory and executable on the processor and the AI accelerator, where the processor implements the method described above when executing the program.
In a fourth aspect, an embodiment of the present invention provides a storage medium storing a computer program which, when executed by a processor, implements the method described above.
With the AI front-end unified computing method of the embodiments, a network model written or defined with a mainstream computing framework or model standard is parsed, a unified API interface is called to generate a consistent high-level intermediate representation, and a standard intermediate representation of the front-end computational graph is generated according to the multi-level lowering standard, so that network models under different AI computing frameworks can be converted into an intermediate representation compatible with existing AI compilers, from which executable instructions adapted to different hardware platforms can be generated.
Drawings
FIG. 1 is a flowchart of the AI front-end unified computing method based on multi-level code generation according to Embodiment 1 of the invention;
FIG. 2 is a schematic diagram of an application scenario of the AI front-end unified computing method based on multi-level code generation according to Embodiment 1 of the present invention;
FIG. 3 compares the inference speed of UFront (an embodiment of the invention) with that of IREE-TF, the current state-of-the-art MLIR TensorFlow/Keras computing front end, after each TensorFlow/Keras model is compiled;
FIG. 4 compares the inference speed of UFront (an embodiment of the invention) with that of Torch-MLIR, the current state-of-the-art MLIR PyTorch computing front end, after each PyTorch model is compiled;
FIG. 5 compares the inference speed of UFront (an embodiment of the invention) with that of ONNX-MLIR, the current state-of-the-art MLIR ONNX computing front end, after each ONNX model is compiled;
FIG. 6 is a flowchart of the AI front-end unified computing method based on multi-level code generation according to Embodiment 2 of the invention;
FIG. 7 is a schematic structural diagram of the AI front-end unified computing device based on multi-level code generation according to Embodiment 3 of the invention;
FIG. 8 is a schematic structural diagram of a computer device according to Embodiment 4 of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, only the structures related to the present invention, rather than all structures, are shown in the drawings.
Embodiment 1
Fig. 1 is a flowchart of the AI front-end unified computing method based on multi-level code generation according to Embodiment 1 of the present invention. This embodiment is applicable to translating network models written or defined with mainstream AI computing frameworks and model standards so that they are compatible with different hardware back ends.
Step S101: parse a network model written or defined with a mainstream AI computing framework or model standard to obtain the node parameters of each computing node, and call a unified API interface according to the node parameters to generate a consistent front-end computational graph.
Fig. 2 is a schematic diagram of an application scenario in an embodiment of the present invention. The focus of the present application is the process by which the AI computing front end (frontend) processes a network model to obtain a standard intermediate representation (TOSA IR), so that, after further processing by the back end (backend), the generated standard intermediate representation can be adapted to the specific hardware architectures beneath it. As shown in Fig. 2, the AI unified computing front end includes: a model converter, a unified front-end computing API interface, a type and shape inference module, a graph and high-level representation generator, and a standard representation converter. Of course, this embodiment is merely an example, and the specific components of the AI computing front end are not limited to these. In addition to converting network models written or defined with mainstream AI computing frameworks and model standards, such as TensorFlow, Keras, PyTorch and ONNX, into an MLIR standard representation, the AI unified computing front end (Unified Computing Frontend, UFront) also provides a native computing front-end interface with which a user can define various network models.
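For orientation, the cooperation of these components can be pictured as a simple pipeline. The following minimal, runnable Python sketch is illustrative only: every class, function and field name in it (trace_model, unified_api_call, and so on) is an assumption made for exposition, not the actual UFront API.

# Minimal sketch of the UFront pipeline stages described above; all names
# are illustrative assumptions, not the real UFront interfaces.
def trace_model(model):
    """Model converter: trace/traverse the source model into computing nodes,
    each parsed as (node name, node computation parameters, node inputs)."""
    return [("conv2d", {"kernel": [7, 7], "stride": [2, 2], "pad": [3, 3],
                        "out_channels": 64}, []),
            ("relu", {}, ["conv2d_0"])]

def infer_type_and_shape(name, params, in_types, in_shapes):
    """Type and shape inference module (stubbed; elaborated under Step S101)."""
    return in_types[0], in_shapes[0]

def unified_api_call(name, params, inputs, env):
    """Unified front-end API: triggers inference, returns an operation instance."""
    in_types = [env[i]["out_type"] for i in inputs] or ["f16"]
    in_shapes = [env[i]["out_shape"] for i in inputs] or [[1, 3, 224, 224]]
    out_type, out_shape = infer_type_and_shape(name, params, in_types, in_shapes)
    return {"op": name, "params": params, "inputs": inputs,
            "out_type": out_type, "out_shape": out_shape}

def build_frontend_graph(model):
    """Front-end computational graph: one operation instance per computing node."""
    graph, env = [], {}
    for idx, (name, params, inputs) in enumerate(trace_model(model)):
        inst = unified_api_call(name, params, inputs, env)
        env[name + "_" + str(idx)] = inst
        graph.append(inst)
    return graph

print(build_frontend_graph(model=None))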
Optionally, parsing the network model written or defined with a mainstream AI computing framework or model standard to obtain the node parameters of each computing node, and calling a unified API interface according to the node parameters to generate a consistent front-end computational graph, includes: tracing and traversing the network model with the model converter to obtain the computing nodes it contains, and parsing each computing node to obtain its node parameters, where the node parameters include the node name, the node computation parameters and the node input data; determining, by the model converter, a target API interface within the unified front-end computing interface according to the node name; and triggering the type and shape inference module by calling the target API interface, generating the operation instance of each computing node according to the call result, and constructing the front-end computational graph from the operation instances.
Specifically, in this embodiment, after the AI unified computing front end obtains the network model, the network model is processed by the model converter, the unified front-end computing interface, and the type and shape inference module to generate a front-end computational graph matching the network model, where the front-end computational graph contains the operation instances corresponding to the computing nodes in the network model. The model converter mainly traces and traverses the network model to obtain the computing nodes it contains. After the computing nodes are obtained by traversal, each is parsed to obtain its node parameters; during parsing, a predefined processing branch is selected according to the type of the computing node. For example, when a computing node is determined to be a convolution node, the Conv2dNode processing branch is used to parse it, yielding node computation parameters such as the convolution kernel size, stride, boundary padding and number of output channels, as well as the node name and node input data. The node computation parameters generally include optional parameters and necessary parameters: necessary parameters are those that the computing node must carry to be translated and executed correctly, while optional parameters are those that do not affect correct translation and execution. In addition, the model converter determines the target API interface within the unified front-end computing interface according to the node name so as to convert the parsed node parameters into a front-end computing API call, triggers the type and shape inference module by calling the target API interface, and generates the operation instance of each computing node according to the call result, as sketched below.
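As a concrete illustration of these per-node-type processing branches, the sketch below parses a convolution node into its node parameters. The branch class and every field name are assumptions for exposition, not the patent's actual implementation.

# Illustrative sketch of per-node-type parsing branches in the model converter;
# names (Conv2dNode, parse_node, ...) are assumptions, not the actual code.
class Conv2dNode:
    """Predefined processing branch for convolution computing nodes."""
    def parse(self, raw):
        return {"name": raw["name"],                      # node name
                "compute_params": {
                    "kernel": raw["kernel_size"],         # convolution kernel size
                    "stride": raw["stride"],              # stride (step size)
                    "pad": raw["padding"],                # boundary padding
                    "out_channels": raw["out_channels"],  # number of output channels
                },
                "inputs": raw["inputs"]}                  # node input data

BRANCHES = {"conv2d": Conv2dNode()}  # one predefined branch per node type

def parse_node(raw):
    """Dispatch a traversed node to its predefined processing branch."""
    return BRANCHES[raw["type"]].parse(raw)

print(parse_node({"type": "conv2d", "name": "conv1", "kernel_size": [7, 7],
                  "stride": [2, 2], "padding": [3, 3], "out_channels": 64,
                  "inputs": ["input1"]})["compute_params"])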
Optionally, triggering the type and shape inference module by calling the target API interface and generating the operation instance of each computing node according to the call result includes: transmitting the node parameters to the type and shape inference module through the target API interface, so that the type and shape inference module determines the node output data of the computing node according to the node parameters and feeds the node output data back to the target API interface as the call result, where the node output data includes the output shape and the output type; and having the target API interface generate the operation instance of each computing node according to that node's parameters and call result.
Optionally, determining the node output data of a computing node according to its node parameters includes: classifying the node according to the node name to obtain a classification result, where the classification result is one of variable-type node, variable-shape node, create-tensor node, and type-and-shape-invariant node; when the classification result is a variable-type node, taking the input shape in the node input data as the output shape, obtaining the target type from the call parameters, taking the target type as the output type, and determining the node output data from the output shape and output type; when the classification result is a variable-shape node, taking the input type in the node input data as the output type, retrieving the matching calculation formula from a predefined template according to the node type, computing the output shape from the formula and the node parameters, and determining the node output data from the output shape and output type; when the classification result is a create-tensor node, receiving the initial shape, initial type and initial value given in the node parameters, taking the initial shape as the output shape and the initial type as the output type, and determining the node output data from the output shape, output type and initial value; and when the classification result is a type-and-shape-invariant node, obtaining the input shape and input type from the input parameters, taking the input shape as the output shape and the input type as the output type, and determining the node output data from the output shape and output type.
In a specific implementation, when the model converter determines the target API interface within the unified front-end computing interface and calls it, the called target API interface first checks the node computation parameters parsed by the model converter. Specifically, it judges whether the computation parameters include the necessary parameters; for example, for a convolution node, it checks whether necessary parameters such as the convolution kernel size, stride and number of output channels are present. When a necessary parameter is determined to be absent, the API call triggers an exception return and prompts corresponding error information, indicating that the computing node was parsed incorrectly. It then further determines whether the optional parameters are present: when the parsed node parameters include an optional parameter, it is used directly; when one is absent, a default value is used to complete it. When the target API interface checks the computation parameters and finds no problem, it triggers the type and shape inference module and transmits the parsed node parameters to it. After receiving the node parameters, which include the node name, the node computation parameters and the node input data, the type and shape inference module determines the output shape and output type of the node from the content of the node parameters, and feeds the node output data, consisting of the output shape and output type, back to the target API interface that triggered it as the call result. A sketch of the parameter check follows.
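The check can be sketched as follows; the parameter tables, default values and error text are illustrative assumptions, not the patent's actual lists.

# Illustrative sketch of the necessary/optional parameter check performed by
# the target API interface before inference is triggered (names assumed).
REQUIRED = {"conv2d": ["kernel", "stride", "out_channels"]}
OPTIONAL_DEFAULTS = {"conv2d": {"pad": [0, 0], "groups": 1, "dilation": [1, 1]}}

def check_compute_params(op, params):
    # A missing necessary parameter triggers an exception return with error
    # information, indicating that the computing node was parsed incorrectly.
    for key in REQUIRED[op]:
        if key not in params:
            raise ValueError(op + ": missing necessary parameter '" + key + "'")
    # Optional parameters are used directly when present and completed with
    # default values when absent.
    checked = dict(OPTIONAL_DEFAULTS[op])
    checked.update(params)
    return checked

print(check_compute_params("conv2d", {"kernel": [7, 7], "stride": [2, 2],
                                      "out_channels": 64}))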
It should be noted that, after the type and shape inference module in this embodiment obtains the node parameters, it classifies the node according to the node name to obtain a classification result; the classification results in this embodiment fall into the four categories listed above: variable-type nodes, variable-shape nodes, create-tensor nodes, and type-and-shape-invariant nodes. For a create-tensor node, since the initial shape, initial type and initial value are included in the node parameters, the type and shape inference module directly uses this initial information as the node output data, so the node output data of a create-tensor node includes the output shape, output type and initial value. For the remaining three categories, the task is mainly to determine the output type and output shape and derive the node output data from them. For a variable-type node, the output shape is the same as the input shape, so the input shape in the node input data is used directly as the output shape; since the output type differs from the input type, the target type in the call parameters is obtained and used as the output type. For a variable-shape node, the output type is the same as the input type, so the input type in the node input data is used as the output type; since the output shape differs from the input shape, the matching calculation formula is retrieved from the predefined template and the output shape is computed from the formula and the node parameters. The predefined template stores in advance the output-shape calculation formula corresponding to each kind of computing node; this embodiment does not limit the specific content of the formulas. For a type-and-shape-invariant node, since the output shape is the same as the input shape and the output type is the same as the input type, both can be determined directly from the node parameters. The four branches are sketched below.
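The following sketch shows the four inference branches under assumed names; the conv2d shape formula stands in for the patent's predefined template of per-operator formulas.

# Illustrative sketch of the four node categories used by the type and shape
# inference module (category names follow the text; code names are assumed).
def infer_output(category, compute_params, in_type=None, in_shape=None):
    p = compute_params
    if category == "variable_type":    # e.g. cast: shape kept, target type used
        return p["target_dtype"], in_shape
    if category == "variable_shape":   # e.g. conv2d: type kept, shape from formula
        n, _, h, w = in_shape
        kh, kw = p["kernel"]; sh, sw = p["stride"]; ph, pw = p["pad"]
        return in_type, [n, p["out_channels"],
                         (h + 2 * ph - kh) // sh + 1,   # template formula
                         (w + 2 * pw - kw) // sw + 1]   # for conv2d output
    if category == "create_tensor":    # initial shape/type/value given in params
        return p["init_dtype"], p["init_shape"]
    # type-and-shape-invariant (e.g. relu): pass the input type and shape through
    return in_type, in_shape

# conv2d on a 1x3x224x224 input yields the shape [1, 64, 112, 112]
print(infer_output("variable_shape",
                   {"kernel": [7, 7], "stride": [2, 2], "pad": [3, 3],
                    "out_channels": 64},
                   in_type="f16", in_shape=[1, 3, 224, 224]))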
It should also be noted that the target API interface obtains the node output data returned by the type and shape inference module and generates the operation instance of the computing node from the node parameters parsed by the model converter and the node output data sent by the type and shape inference module. The operation instances of the computing nodes are fed back to the model converter through the target API interface, and the model converter uses the output data of the operation instance corresponding to the previous node as the input data of the next adjacent node, so that the operation instance corresponding to the next adjacent node can be computed in turn, as in the short sketch below.
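That chaining step can be pictured in a few lines; the field names are assumptions.

# Illustrative sketch: the output data of the previous node's operation
# instance becomes the input data of the next adjacent node (names assumed).
def chain_instances(instances):
    for prev, nxt in zip(instances, instances[1:]):
        nxt["in_type"], nxt["in_shape"] = prev["out_type"], prev["out_shape"]
    return instances

graph = chain_instances([
    {"op": "conv2d", "out_type": "f16", "out_shape": [1, 64, 112, 112]},
    {"op": "relu", "out_type": "f16", "out_shape": [1, 64, 112, 112]},
])
print(graph[1])  # relu now carries conv2d's output type/shape as its input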
It should be noted that the model converter in this embodiment also performs operator fusion to ensure that network models written or defined with different computing frameworks and/or model standards have consistent high-level representations. For example, multi-head attention (Multihead Attention) is a key operator of the Transformer model; in the ONNX format (model standard) it is represented as a set of sub-operators including slice, matrix multiplication (matmul), reshape and the normalized exponential function (softmax), rather than as the single high-level operator (multihead_attention) defined in PyTorch, TensorFlow or Keras. In this case, the model converter fuses the sub-operators into a multi-head attention operator, links it to the input and output of the sub-operator sequence, and removes the redundant parts. After operator fusion, the network model is parsed as described above to obtain the node parameters of each computing node. A sketch of such pattern-based fusion follows.
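The fusion can be pictured as pattern matching over the traversed node sequence. The simplified sub-operator pattern below, and all names, are illustrative assumptions, not UFront's actual matcher.

# Illustrative sketch of operator fusion in the model converter: a sequence of
# ONNX sub-operators is fused into one multi-head attention operator and
# linked to the input and output of the sub-operator sequence.
MHA_PATTERN = ["slice", "matmul", "reshape", "softmax", "matmul", "reshape"]

def fuse_multihead_attention(nodes):
    ops, fused, i = [n["op"] for n in nodes], [], 0
    while i < len(ops):
        if ops[i:i + len(MHA_PATTERN)] == MHA_PATTERN:
            seq = nodes[i:i + len(MHA_PATTERN)]
            fused.append({"op": "multihead_attention",    # single high-level op
                          "inputs": seq[0]["inputs"],     # link to sequence input
                          "outputs": seq[-1]["outputs"]}) # redundant sub-ops removed
            i += len(MHA_PATTERN)
        else:
            fused.append(nodes[i]); i += 1
    return fused

nodes = [{"op": op, "inputs": ["t" + str(k)], "outputs": ["t" + str(k + 1)]}
         for k, op in enumerate(MHA_PATTERN)]
print([n["op"] for n in fuse_multihead_attention(nodes)])  # ['multihead_attention']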
Step S102: infer the output type and shape of each operation instance in the front-end computational graph in an iterative manner, and represent and integrate each operation instance in static single assignment form to generate the high-level intermediate representation of the front-end computational graph.
Optionally, inferring the output type and shape of each operation instance in the front-end computational graph in an iterative manner, and representing and integrating each operation instance in static single assignment form to generate the high-level intermediate representation of the front-end computational graph, includes: obtaining the instance parameters of each operation instance in the front-end computational graph through the graph and high-level representation generator, and converting each operation instance according to its instance parameters in static single assignment form to obtain the corresponding high-level intermediate representation, where the instance parameters include the node parameters, the node input data and the node output data, and the node output data includes the type and shape inference results; for a weighted operation instance, obtaining the corresponding weight parameters from the computation parameters, creating a create-tensor operation instance from the shape, type and initial value given in the weight parameters, and using the output of that create-tensor operation instance as the weight representation in the high-level intermediate representation of the current operation instance so as to update the high-level intermediate representation corresponding to the weighted operation instance; determining a head intermediate representation and a tail intermediate representation from the input and output types and shapes of the front-end computational graph, where the head intermediate representation contains an additional function signature; and constructing the high-level intermediate representation of the front-end computational graph from the head intermediate representation, the tail intermediate representation, and the high-level intermediate representations of the operation instances.
Specifically, in this embodiment, after the model converter generates the front-end computational graph by calling the target API interfaces in the unified front-end computing interface to trigger the type and shape inference module, the front-end computational graph containing the operation instances of the computing nodes is sent to the graph and high-level representation generator, which converts the format of each operation instance in the front-end computational graph to generate the high-level intermediate representation of the front-end computational graph. For example, when the network model is a dual-input convolutional neural network, the high-level intermediate representation of the corresponding front-end computational graph is as follows:
func.func @forward(%input1: tensor<1x3x224x224xf16>) -> tensor<1x1000xf16> {
  %1 = "ufront.parameter"() {dtype = "Half", initializer = "0x558b4a523310", requires_grad = true} : () -> tensor<64x3x7x7xf16>
  %2 = "ufront.conv2d"(%input1, %1) {groups = 1, kernel = [7, 7], operand_segment_sizes = array<i32: 1, 1, 0>, pad = [3, 3], stride = [2, 2]} : (tensor<1x3x224x224xf16>, tensor<64x3x7x7xf16>) -> tensor<1x64x112x112xf16>
  ...
  ...
  %163 = "ufront.parameter"() {dtype = "Half", initializer = "0x558b47d99870", requires_grad = false} : () -> tensor<1x512x1x1xf16>
  %164 = "ufront.batchnorm"(%159, %160, %161, %162, %163) {affine = true, eps = 0.00001, momentum = 0.1, operand_segment_sizes = array<i32: 1, 1, 1, 1, 1>, track_running_stats = true} : (tensor<1x512x7x7xf16>, tensor<1x512x1x1xf16>, tensor<1x512x1x1xf16>, tensor<1x512x1x1xf16>, tensor<1x512x1x1xf16>) -> tensor<1x512x7x7xf16>
  %165 = "ufront.add"(%164, %149) : (tensor<1x512x7x7xf16>, tensor<1x512x7x7xf16>) -> tensor<1x512x7x7xf16>
  %166 = "ufront.relu"(%165) : (tensor<1x512x7x7xf16>) -> tensor<1x512x7x7xf16>
  %167 = "ufront.pool2d"(%166) {output_size = [1, 1], pool_type = "POOL_ADAPTIVE"} : (tensor<1x512x7x7xf16>) -> tensor<1x512x1x1xf16>
  %168 = "ufront.flat"(%167) {end_dim = -1, start_dim = 1} : (tensor<1x512x1x1xf16>) -> tensor<1x512xf16>
  %169 = "ufront.parameter"() {dtype = "Half", initializer = "0x558b4895d1a0", requires_grad = true} : () -> tensor<1000x512xf16>
  %170 = "ufront.parameter"() {dtype = "Half", initializer = "0x558b473f3a00", requires_grad = true} : () -> tensor<1000xf16>
  %171 = "ufront.linear"(%168, %169, %170) {operand_segment_sizes = array<i32: 1, 1, 1>} : (tensor<1x512xf16>, tensor<1000x512xf16>, tensor<1000xf16>) -> tensor<1x1000xf16>
  return %171 : tensor<1x1000xf16>
}
However, due to space limitations, this embodiment is merely illustrative, and the specific content of the high-level intermediate representation of the front-end computational graph is not limited to this. In this embodiment, the instance parameters of each operation instance in the front-end computational graph are obtained by the graph and high-level representation generator, where the instance parameters include the node parameters and the node output data, and each operation instance is converted according to its instance parameters in static single assignment form to obtain the corresponding high-level intermediate representation; for example, %165 = "ufront.add"(%164, %149) : (tensor<1x512x7x7xf16>, tensor<1x512x7x7xf16>) -> tensor<1x512x7x7xf16> is the high-level intermediate representation corresponding to one operation instance in the front-end computational graph. The input and output types and shapes in the high-level intermediate representation of each operation instance are determined from its computation parameters, input types and shapes, and the type and shape inference results.
In addition, a head intermediate representation, e.g., func.func @forward(%input1: tensor<1x3x224x224xf16>) -> tensor<1x1000xf16>, and a tail intermediate representation, e.g., return %171: tensor<1x1000xf16>, are determined from the input and output types and shapes of the front-end computational graph; the head intermediate representation also contains an additional function signature, e.g., func.func @forward. The high-level intermediate representation of the front-end computational graph shown above is thus constructed from the head intermediate representation, the tail intermediate representation, and the high-level intermediate representations of the operation instances, as in the sketch below.
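A toy emitter for this head-body-tail assembly is sketched below; the helper name and formatting are assumptions for exposition, not the patent's actual generator.

# Illustrative sketch: assemble the high-level IR module text from the head
# (with function signature), the per-instance body lines, and the tail.
def emit_high_level_ir(in_type, out_type, body_lines):
    head = "func.func @forward(%input1: " + in_type + ") -> " + out_type + " {"
    tail = "  return %" + str(len(body_lines)) + ": " + out_type + "\n}"
    return "\n".join([head] + ["  " + line for line in body_lines] + [tail])

print(emit_high_level_ir(
    "tensor<1x512xf16>", "tensor<1x512xf16>",
    ['%1="ufront.relu"(%input1):(tensor<1x512xf16>) -> tensor<1x512xf16>']))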
In the present embodiment, when the high-level intermediate representation corresponding to an operation instance is generated, for a weighted operation instance the corresponding weight parameters are obtained from the computation parameters, a create-tensor operation instance is created from the shape, type and initial value given in the weight parameters, and the output of that create-tensor operation is used as the weight representation in the high-level intermediate representation of the current operation instance, thereby updating the high-level intermediate representation corresponding to the weighted operation instance. For example:
%1="ufront.parameter"(){dtype="Float", initializer="0x558b4a523310", requires_grad=true}:() -> tensor<64x3x7x7xf32>
%2="ufront.conv2d"(%input1, %1){groups=1, kernel=[7, 7], operand
segment_sizes=array<i32:1, 1, 0>, pad=[3, 3], stride=[2, 2]}:(tensor<1x3x224x224x
f32>, tensor<64x3x7x7xf32>) -> tensor<1x64x112x112xf32>
Parsing the above network model yields the weight parameters, including the memory address "0x558b4a523310" where the weight values reside; a tensor is created by the "ufront.parameter" operator, where the type (dtype) "Float" (single precision) and the type and shape tensor<64x3x7x7xf32> are the results of translating the weight parameters. After the weight representation is obtained, it is updated into the high-level intermediate representation of the operation instance, i.e., in the ufront.conv2d above the input weight representation is %1 with weight type and shape tensor<64x3x7x7xf32>.
Step S103: translate (lower) the high-level intermediate representation of the front-end computational graph according to the multi-level lowering standard to generate the standard intermediate representation of the front-end computational graph, so that an AI compiler can generate execution instructions compatible with a specific hardware platform from the standard intermediate representation.
Optionally, translating the high-level intermediate representation of the front-end computational graph according to the multi-level lowering standard to generate the standard intermediate representation of the front-end computational graph includes: obtaining the high-level intermediate representation of each operation instance in the front-end computational graph through the standard representation converter, where the high-level intermediate representation of an operation instance contains the high-level operator, the original input data, the computation parameters, and the input and output types and shapes; determining a target operator or target operator sequence according to the high-level operator, where the target operator is an operator of the multi-level lowering standard; obtaining the target data format adapted to the standard intermediate representation, and inserting a data conversion operator into the target operator or target operator sequence when the original input data format does not match the target data format, so as to achieve correct translation of the high-level intermediate representation; translating the computation parameters according to the multi-level lowering standard to obtain translated parameters; when it is determined that the high-level intermediate representation of an operation instance that should carry weights does not carry weight parameters, determining the type and shape of the weights from the input and output types and shapes, generating the weights accordingly based on the multi-level lowering standard, and initializing the generated weights to obtain the weight representation; and generating the standard intermediate representation of the front-end computational graph from the target operator or target operator sequence, the input and output data types and shapes, the weight representation, and the translated parameters.
Specifically, after the graph and high-level representation generator converts the format of each operation instance in the front-end computational graph to generate the high-level intermediate representation of the front-end computational graph, the high-level intermediate representation is passed to the standard representation converter, which lowers the high-level representation to the standard intermediate representation of the front-end computational graph so as to be compatible with the MLIR ecosystem. An example of the resulting standard intermediate representation of the front-end computational graph is shown below:
func.func @forward(%arg0: tensor<1x3x224x224xf32>) -> tensor<1x1000xf32> {
  ...
  %1 = "tosa.const"() {value = dense<[0, 2, 3, 1]> : tensor<4xi64>} : () -> tensor<4xi64>
  %2 = "tosa.const"() {value = dense<0.000000e+00> : tensor<64xf32>} : () -> tensor<64xf32>
  %3 = "tosa.const"() {value = dense<[0, 3, 1, 2]> : tensor<4xi64>} : () -> tensor<4xi64>
  ...
  %107 = "tosa.transpose"(%arg0, %1) : (tensor<1x3x224x224xf32>, tensor<4xi64>) -> tensor<1x224x224x3xf32>
  %108 = "tosa.transpose"(%0, %1) : (tensor<64x3x7x7xf32>, tensor<4xi64>) -> tensor<64x7x7x3xf32>
  %109 = "tosa.conv2d"(%107, %108, %2) {dilation = array<i64: 1, 1>, pad = array<i64: 3, 3, 3, 3>, stride = array<i64: 2, 2>} : (tensor<1x224x224x3xf32>, tensor<64x7x7x3xf32>, tensor<64xf32>) -> tensor<1x112x112x64xf32>
  %110 = "tosa.transpose"(%109, %3) : (tensor<1x112x112x64xf32>, tensor<4xi64>) -> tensor<1x64x112x112xf32>
  %111 = "tosa.sub"(%110, %6) : (tensor<1x64x112x112xf32>, tensor<1x64x1x1xf32>) -> tensor<1x64x112x112xf32>
  ...
  %336 = "tosa.avg_pool2d"(%335) {kernel = array<i64: 7, 7>, pad = array<i64: 0, 0, 0, 0>, stride = array<i64: 1, 1>} : (tensor<1x7x7x512xf32>) -> tensor<1x1x1x512xf32>
  ...
  %344 = "tosa.reshape"(%343) {new_shape = array<i64: 1, 1000>} : (tensor<1x1x1000xf32>) -> tensor<1x1000xf32>
  return %344 : tensor<1x1000xf32>
}
However, due to space limitations, this embodiment is merely illustrative, and the specific content of the standard intermediate representation of the front-end computational graph is not limited to this. When obtaining the standard intermediate representation of the front-end computational graph, specifically, the standard representation converter obtains the high-level intermediate representation of each operation instance in the front-end computational graph; since the high-level intermediate representation of an operation instance contains its high-level operator, original input data, computation parameters, and input and output types and shapes, the translation is performed according to this information. The objects translated are, respectively, the operators, data formats, computation parameters, weights, etc. of the high-level intermediate representation of each operation instance. For operators, because a correspondence between high-level operators and the operators of the multi-level lowering standard is preset, once the high-level operator in the high-level intermediate representation of an operation instance is obtained, the target operator or target operator sequence is found by looking up this correspondence, as sketched below.
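That preset correspondence can be sketched as a lookup table from high-level operators to TOSA target operators or operator sequences; the entries below are illustrative assumptions, not an exhaustive or authoritative table.

# Illustrative sketch of the preset correspondence between high-level
# operators and operators of the multi-level lowering standard (TOSA).
LOWERING_TABLE = {
    "ufront.relu":   ["tosa.clamp"],                    # single target operator
    "ufront.linear": ["tosa.matmul", "tosa.add"],       # target operator sequence
    "ufront.conv2d": ["tosa.transpose", "tosa.conv2d",  # data conversion operators
                      "tosa.transpose"],                # inserted around conv2d
}

def lower_operator(high_level_op):
    """Look up the target operator or target operator sequence."""
    if high_level_op not in LOWERING_TABLE:
        raise NotImplementedError("no lowering rule for " + high_level_op)
    return LOWERING_TABLE[high_level_op]

print(lower_operator("ufront.conv2d"))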
It should be noted that, in this embodiment, the computation parameters are adapted to the multi-level lowering standard to obtain translated parameters that conform to the multi-level lowering standard. And when it is determined that the high-level intermediate representation of an operation instance that should carry weights does not carry weight parameters, the type and shape of the weights are determined from the input and output types and shapes, the weights are generated accordingly based on the multi-level lowering standard, and the generated weights are initialized to obtain the weight representation. For example, the high-level intermediate representation of an operation instance is %16 = "ufront.linear"(%15) : (tensor<1x512xf32>) -> tensor<1x10xf32>, where the linear layer has 512 inputs and 10 outputs; it can be partially lowered to the TOSA representation by weight completion (ufront.elided), as follows:
%46="ufront.elided"(){init="linear", linear_output_shape=[1,10]}:() -> tensor<1x512x10xf32>
%47="tosa.matmul"(%45,%46)(tensor<1x1x512xf32>,tensor<1x512x10xf32>) -> tensor<1x1x10xf32>
The temporary weights are then converted into Const form to lower completely to the TOSA representation:
%4 = "tosa.const"() {value = dense<"0x85B5D8BD22F3913EE3B2503E2..."> : tensor<1x512x10xf32>} : () -> tensor<1x512x10xf32>  // partial weight values shown
...
%47 = "tosa.matmul"(%46, %4) : (tensor<1x1x512xf32>, tensor<1x512x10xf32>) -> tensor<1x1x10xf32>
If no weight initialization parameter is specified in the user-defined model, or the weights are uninitialized, linear initialization is applied by default. In addition, the target data format adapted to the standard intermediate representation is also obtained; when the original input data format does not match the target data format, a data conversion operator is inserted into the target operator or target operator sequence to achieve correct translation of the high-level intermediate representation. As with the linear layer, the conversion of other operation instances, such as convolution layers, may involve data format matching issues; UFront ensures that the data format matches the standard MLIR (TOSA) representation during the lowering of the high-level representation. Where necessary, UFront inserts a transpose operator to convert the channel-first format to the channel-last format, and back to the channel-first format after the corresponding computation is completed, as follows:
% 107= "tosa. Transfer" (% arg0,% 1) ((tensor <1x3x224x224xf32 >), tensor <4xi64 >) - > tensor <1x224x224x3xf32> # converts input data into channel post format
% 108= "tosa. Transfer" (% 0,% 1) ((tensor <64x3x7x7xf32 >), tensor <4xi64 >) - > tensor <64x7x7x3xf32> # converts weights to channel post format
%109 = "tosa.conv2d"(%107, %108, %2) {dilation = array<i64: 1, 1>, pad = array<i64: 3, 3, 3, 3>, stride = array<i64: 2, 2>} : (tensor<1x224x224x3xf32>, tensor<64x7x7x3xf32>, tensor<64xf32>) -> tensor<1x112x112x64xf32>
% 110= "tosa" (% 109,% 3) ((tensor <1x112x112x64xf32 >), tensor <4xi64 >) - > tensor <1x64x112x112xf32> # converts output back to channel-forward format
The standard intermediate representation of the front-end computational graph is thus generated from the target operator or target operator sequence, the input and output data types and shapes, the weight representation, and the translated parameters.
It is worth mentioning that the AI unified computing front end UFront of this embodiment shows marked improvement over the three state-of-the-art MLIR/AI computing front ends (IREE-TF, Torch-MLIR and ONNX-MLIR), not only in end-to-end compilation performance and compatibility but also in inference speed. Table 1 below compares the end-to-end compilation speed (in seconds) of UFront against IREE-TF, Torch-MLIR and ONNX-MLIR:
TABLE 1
The 8 common network models ResNet18, ResNet50, MobileNetV3, ShuffleNetV2, SqueezeNet, DenseNet121, InceptionV3 and Vision Transformer (ViT) were written in TensorFlow/Keras, PyTorch and ONNX respectively, and each AI computing front end was used for end-to-end compilation, i.e., compiling the model into an executable file.
Table 2 and FIG. 3 below show the results of compiling each TensorFlow/Keras model with UFront and IREE-TF and performing ImageNet-1k inference on CPU (left in FIG. 3) and GPU (right in FIG. 3) hardware platforms, measuring the average time (in milliseconds) per sample inference. The results show that the code generated by this embodiment significantly outperforms IREE-TF, the current state-of-the-art MLIR TensorFlow/Keras computing front end, on each hardware platform.
TABLE 2
Table 3 and FIG. 4 below show the results of compiling each PyTorch model with UFront and Torch-MLIR and performing ImageNet-1k inference on CPU and GPU hardware platforms, measuring the average time (in milliseconds) per sample inference. The results show that the code generated by this embodiment significantly outperforms Torch-MLIR, the current state-of-the-art MLIR PyTorch computing front end, on each hardware platform. Torch-MLIR could not generate executable files for four of the models: ShuffleNetV2, DenseNet121, InceptionV3 and Vision Transformer (ViT).
TABLE 3
Table 4 and FIG. 5 below show the results of compiling each ONNX model with UFront and ONNX-MLIR and performing ImageNet-1k inference on CPU and GPU hardware platforms, measuring the average time (in milliseconds) per sample inference. The results show that the code generated by this embodiment significantly outperforms ONNX-MLIR, the current state-of-the-art MLIR ONNX computing front end, on each hardware platform. ONNX-MLIR could not generate executable files for the models DenseNet121, InceptionV3 and Vision Transformer (ViT); moreover, ONNX-MLIR could not generate GPU-side executable files for the ONNX models, so the performance comparison on GPU is not shown.
TABLE 4
With the AI front-end unified computing method of this embodiment, a network model written or defined with a mainstream computing framework or model standard is parsed, a unified API interface is called to generate a consistent high-level intermediate representation, and the standard intermediate representation of the front-end computational graph is generated according to the multi-level lowering standard, so that network models under different AI computing frameworks can be converted into an intermediate representation compatible with existing AI compilers, from which executable instructions adapted to different hardware platforms can be generated.
Embodiment 2
Fig. 6 is a schematic diagram of the AI front-end unified computing method based on multi-level code generation according to Embodiment 2 of the present invention. After the high-level intermediate representation of the front-end computational graph is translated according to the multi-level lowering standard to generate the standard intermediate representation of the front-end computational graph, the method further includes verifying the standard intermediate representation of the front-end computational graph. As shown in Fig. 6, the method includes:
step S201, analyzing the network model written or defined by the main stream AI computing framework and the model standard to obtain node parameters of each computing node, and calling a unified API interface to generate a consistent front-end computing graph according to the node parameters.
Optionally, parsing a network model written or defined with a mainstream AI computing framework or model standard to obtain the node parameters of each computing node, and calling a unified API interface according to the node parameters to generate a consistent front-end computational graph, includes: tracing and traversing the network model with the model converter to obtain the computing nodes it contains, and parsing each computing node to obtain its node parameters, where the node parameters include the node name, the node computation parameters and the node input data; determining, by the model converter, a target API interface within the unified front-end computing interface according to the node name; and triggering the type and shape inference module by calling the target API interface, generating the operation instance of each computing node according to the call result, and constructing the front-end computational graph from the operation instances.
Optionally, triggering the type and shape inference module by calling the target API interface and generating the operation instance of each computing node according to the call result includes: transmitting the node parameters to the type and shape inference module through the target API interface, so that the type and shape inference module determines the node output data of the computing node according to the node parameters and feeds the node output data back to the target API interface as the call result, where the node output data includes the output shape and the output type; and having the target API interface generate the operation instance of each computing node according to that node's parameters and call result.
Step S202: infer the output type and shape of each operation instance in the front-end computational graph in an iterative manner, and represent and integrate each operation instance in static single assignment form to generate the high-level intermediate representation of the front-end computational graph.
Optionally, inferring the output type and shape of each operation instance in the front-end computational graph in an iterative manner, and representing and integrating each operation instance in static single assignment form to generate the high-level intermediate representation of the front-end computational graph, includes: obtaining the instance parameters of each operation instance in the front-end computational graph through the graph and high-level representation generator, and converting each operation instance according to its instance parameters in static single assignment form to obtain the corresponding high-level intermediate representation, where the instance parameters include the node parameters, the node input data and the node output data, and the node output data includes the type and shape inference results; for a weighted operation instance, obtaining the corresponding weight parameters from the computation parameters, creating a create-tensor operation instance from the shape, type and initial value given in the weight parameters, and using the output of that create-tensor operation instance as the weight representation in the high-level intermediate representation of the current operation instance so as to update the high-level intermediate representation corresponding to the weighted operation instance; determining a head intermediate representation and a tail intermediate representation from the input and output types and shapes of the front-end computational graph, where the head intermediate representation contains an additional function signature; and constructing the high-level intermediate representation of the front-end computational graph from the head intermediate representation, the tail intermediate representation, and the high-level intermediate representations of the operation instances.
Step S203: translate the high-level intermediate representation of the front-end computational graph according to the multi-level lowering standard to generate the standard intermediate representation of the front-end computational graph, so that the AI compiler can further generate execution instructions compatible with a specific hardware platform from the standard intermediate representation.
Optionally, translating the high-level intermediate representation of the front-end computational graph according to the multi-level lowering standard to generate the standard intermediate representation of the front-end computational graph includes: obtaining the high-level intermediate representation of each operation instance in the front-end computational graph through the standard representation converter, where the high-level intermediate representation of an operation instance contains the high-level operator, the original input data, the computation parameters, and the input and output types and shapes; determining a target operator or target operator sequence according to the high-level operator, where the target operator is an operator of the multi-level lowering standard; obtaining the target data format adapted to the standard intermediate representation, and inserting a data conversion operator into the target operator or target operator sequence when the original input data format does not match the target data format, so as to achieve correct translation of the high-level intermediate representation; translating the computation parameters according to the multi-level lowering standard to obtain translated parameters; when it is determined that the high-level intermediate representation of an operation instance that should carry weights does not carry weight parameters, determining the type and shape of the weights from the input and output types and shapes, generating the weights accordingly based on the multi-level lowering standard, and initializing the generated weights to obtain the weight representation; and generating the standard intermediate representation of the front-end computational graph from the target operator or target operator sequence, the input and output data types and shapes, the weight representation, and the translated parameters.
Step S204: verify the standard intermediate representation of the front-end computational graph.
Specifically, after the standard intermediate representation of the front-end computational graph is obtained, it can be further compiled by a back-end AI compiler into an executable program for a specific hardware platform. When an error is found in the obtained standard intermediate representation during back-end compilation, it is determined that an execution error occurred in the translation process of the standard representation converter or in the processing of the graph and high-level representation generator.
When this occurs, alarm information is issued to prompt the front-end user to correct the written network model, thereby ensuring the correctness of the high-level and standard intermediate representations obtained for the network model. The alarm information may take the form of an image or voice; this embodiment does not limit its specific form. A sketch of such a verify-and-alert step follows.
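The verification step can be sketched as running a structural check over the standard intermediate representation before backend compilation and alerting the user on failure; the toy verifier and the alarm hook below are assumptions for exposition, not a real MLIR verifier API.

# Illustrative sketch of Step S204: verify the standard intermediate
# representation and alert the front-end user when it is malformed.
def verify_standard_ir(ir_text):
    """Toy structural check standing in for a real MLIR/TOSA verifier."""
    return ir_text.count("{") == ir_text.count("}") and "return" in ir_text

def compile_with_check(ir_text, backend_compile, alarm):
    if not verify_standard_ir(ir_text):
        # The error originated in the standard representation converter or in
        # the graph and high-level representation generator: raise the alarm.
        alarm("standard IR verification failed; please correct the network model")
        return None
    return backend_compile(ir_text)

print(compile_with_check("func.func @forward() { return }",
                         backend_compile=lambda ir: "<executable>",
                         alarm=print))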
With the AI front-end unified computing method of this embodiment, a network model written or defined with a mainstream AI computing framework or model standard is translated by way of multi-level intermediate representation to generate the standard intermediate representation of the front-end computational graph, ensuring that network models under different AI computing frameworks or model standards can be translated into an intermediate representation compatible with existing AI compilers, from which executable instructions adapted to different hardware platforms can be generated.
Embodiment 3
Fig. 7 is a schematic structural diagram of the AI front-end unified computing device based on multi-level code generation according to Embodiment 3 of the present invention. As shown in Fig. 7, the device includes: a front-end computational graph acquisition module 310, a high-level intermediate representation generation module 320 for the front-end computational graph, and a standard intermediate representation generation module 330 for the front-end computational graph.
The front-end computation graph obtaining module 310 is configured to parse a network model written or defined by a mainstream AI computation framework and a model standard to obtain node parameters of each computation node, and call a unified API interface according to the node parameters to generate a consistent front-end computation graph, where the front-end computation graph includes computation instances corresponding to each computation node in the network model;
the high-level intermediate representation generating module 320 of the front-end computation graph is configured to infer, in an iterative manner, an output type and a shape of each computation instance for the front-end computation graph, and perform high-level intermediate representation and integration on each computation instance according to a static single assignment form to generate a high-level intermediate representation of the front-end computation graph;
the standard intermediate representation generation module 330 of the front-end computational graph, which is configured to generate the standard intermediate representation of the front-end computational graph by escaping the high-level intermediate representation of the front-end computational graph according to the multi-level descent standard, further enables the AI compiler to generate execution instructions compatible with the specific hardware platform according to the standard intermediate representation.
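Read together, the three modules form a pipeline. The short driver below strings them into one call chain purely for illustration; the parameter names and method names are hypothetical placeholders for the modules described above.

```python
# Hypothetical end-to-end driver for the three modules above; all names
# are illustrative placeholders, not the device's actual interfaces.
def unified_frontend_compute(network_model, acquire, hl_gen, std_gen):
    graph = acquire.build_front_end_graph(network_model)   # module 310
    high_level_ir = hl_gen.generate(graph)                 # module 320
    standard_ir = std_gen.lower(high_level_ir)             # module 330
    return standard_ir  # handed to the back-end AI compiler
```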
Optionally, the AI computing front-end includes a model converter, a unified front-end computing interface, a type and shape inference module, a graph and high-level representation generator, and a standard representation converter.
Optionally, the front-end computational graph acquisition module includes:
a parsing unit, configured to track and traverse the network model through the model converter to obtain the computing nodes contained in the network model, and to parse each computing node to obtain its node parameters, where the node parameters include the node name, node calculation parameters, and node input data;
a target API interface determination unit, configured to determine, through the model converter, a target API interface in the unified front-end computing interface according to the node name;
a front-end computational graph construction unit, configured to trigger the type and shape inference module by calling the target API interface, generate an operation instance of each computing node according to the call result, and construct the front-end computational graph from all operation instances.
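The following sketch illustrates one way the model converter's traversal and API dispatch could look in Python. The node dictionary layout and the registry-based unified interface are assumptions made for illustration.

```python
# Hypothetical model-converter traversal; the node layout and the unified
# API registry are assumptions, not the patented interfaces.
UNIFIED_API = {}  # node name -> target API callable

def register(name):
    def deco(fn):
        UNIFIED_API[name] = fn
        return fn
    return deco

@register("relu")
def relu_api(calc_params, inputs):
    # A stand-in target API: it would trigger type/shape inference and
    # return an operation instance.
    return {"instance": ("relu", calc_params, inputs)}

def build_front_end_graph(model_nodes):
    """Traverse the model, parse node parameters, and dispatch each node
    to its target API interface to build the front-end computational graph."""
    graph = []
    for node in model_nodes:                      # tracking and traversal
        name = node["name"]                       # node name
        params = node.get("params", {})           # node calculation parameters
        inputs = node.get("inputs", [])           # node input data
        target_api = UNIFIED_API[name]            # determined by node name
        graph.append(target_api(params, inputs))  # operation instance
    return graph

if __name__ == "__main__":
    print(build_front_end_graph([{"name": "relu", "inputs": ["x"]}]))
```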
Optionally, the device further includes a calculation parameter checking module, configured to check the node calculation parameters used to call the target API interface, where the calculation parameters are divided into optional parameters and required parameters;
trigger an exception return and prompt a corresponding error message when a required parameter is determined to be absent according to the checking result;
use the provided optional parameters when they are determined to be present according to the checking result;
and supplement absent optional parameters with default values when they are determined to be absent according to the checking result.
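The required/optional check described above could be sketched as follows; the parameter specification format (a required list plus a defaults dictionary) is a hypothetical example, not the module's actual schema.

```python
# Hypothetical parameter check for a target API call. The spec format
# (required list, optional dict with defaults) is an assumption.
def check_calc_params(params, required, optional_defaults):
    missing = [k for k in required if k not in params]
    if missing:
        # Required parameter absent: trigger an exception return with a
        # corresponding error message.
        raise ValueError(f"missing required parameters: {missing}")
    checked = dict(params)
    for key, default in optional_defaults.items():
        if key not in checked:
            checked[key] = default  # absent optional: fill with the default
    return checked  # present optionals are used as provided

if __name__ == "__main__":
    spec_required = ["kernel_size"]
    spec_optional = {"stride": 1, "padding": 0}
    print(check_calc_params({"kernel_size": 3}, spec_required, spec_optional))
    # {'kernel_size': 3, 'stride': 1, 'padding': 0}
```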
Optionally, the front-end computational graph construction unit is configured to transmit the node parameters to the type and shape inference module through the target API interface, so that the type and shape inference module determines the node output data of each computing node according to the node parameters and feeds the node output data back to the target API interface as the call result, where the node output data includes an output shape and an output type;
and the target API interface generates an operation instance of each computing node according to that node's parameters and call result.
Optionally, the device further includes an operation instance feedback module, configured to feed the operation instance of each computing node back to the model converter through the target API interface;
and, through the model converter, take the output data of the operation instance corresponding to the previous node as the input data of the next adjacent node.
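A sketch of this call-and-feedback flow, with each instance's output wired into the next node's input, is shown below; `infer_output` is a hypothetical stand-in for the type and shape inference module.

```python
# Hypothetical call flow: the target API passes node parameters to the
# type/shape inference module, and the converter chains outputs to inputs.
def infer_output(node_params, input_type, input_shape):
    # Placeholder inference: identity type/shape, standing in for the
    # real type and shape inference module.
    return {"type": input_type, "shape": input_shape}

def run_converter(nodes, first_input):
    instances, current = [], first_input
    for node in nodes:
        out = infer_output(node["params"], current["type"], current["shape"])
        instances.append({"node": node["name"], "input": current, "output": out})
        current = out  # previous node's output feeds the next node's input
    return instances

if __name__ == "__main__":
    nodes = [{"name": "relu", "params": {}}, {"name": "tanh", "params": {}}]
    print(run_converter(nodes, {"type": "f32", "shape": (1, 16)}))
```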
Optionally, the front-end computational graph construction unit is further configured to classify nodes according to their node names and obtain classification results, where the classification results include variable-type nodes, variable-shape nodes, tensor-creation nodes, and type-and-shape-invariant nodes;
when the classification result is determined to be a variable-type node, take the input shape in the node input data as the output shape, obtain the target type from the call parameters, take the target type as the output type, and determine the node output data according to the output shape and output type;
when the classification result is determined to be a variable-shape node, take the input type in the node input data as the output type, retrieve a matching calculation formula from a predefined template according to the node type, calculate the output shape from the formula and the node parameters, and determine the node output data according to the output shape and output type;
when the classification result is determined to be a tensor-creation node, receive the initial shape, initial type, and initial value given in the node parameters, take the initial shape as the output shape and the initial type as the output type, and determine the node output data according to the output shape, output type, and initial value;
when the classification result is determined to be a type-and-shape-invariant node, obtain the input shape and input type from the input parameters, take the input shape as the output shape and the input type as the output type, and determine the node output data according to the output shape and output type.
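As a concrete illustration of this four-way classification, the sketch below dispatches on an assumed category label; the category names, the `reshape` formula template, and the field names are all illustrative assumptions.

```python
# Hypothetical four-way type/shape inference keyed on node classification;
# category names, the formula template, and field names are assumptions.
SHAPE_FORMULAS = {
    # Predefined templates for variable-shape nodes, keyed by node type.
    "reshape": lambda params, in_shape: tuple(params["new_shape"]),
}

def infer_node_output(category, node):
    if category == "variable_type":
        # Shape passes through; the type comes from the call parameters.
        return {"shape": node["in_shape"], "type": node["params"]["target_type"]}
    if category == "variable_shape":
        # Type passes through; the shape comes from a predefined formula.
        formula = SHAPE_FORMULAS[node["node_type"]]
        return {"shape": formula(node["params"], node["in_shape"]),
                "type": node["in_type"]}
    if category == "create_tensor":
        # Shape, type, and initial value are given in the node parameters.
        p = node["params"]
        return {"shape": p["shape"], "type": p["type"], "value": p["value"]}
    # type-and-shape-invariant node: both pass through unchanged.
    return {"shape": node["in_shape"], "type": node["in_type"]}

if __name__ == "__main__":
    node = {"node_type": "reshape", "in_shape": (2, 8), "in_type": "f32",
            "params": {"new_shape": [4, 4]}}
    print(infer_node_output("variable_shape", node))
    # {'shape': (4, 4), 'type': 'f32'}
```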
Optionally, the high-level intermediate representation generation module of the front-end computational graph is configured to obtain, through the graph and high-level representation generator, the instance parameters of each operation instance in the front-end computational graph, and to convert each operation instance into a corresponding high-level intermediate representation in static single assignment form according to the instance parameters, where the instance parameters include: the node parameters, node input data, and node output data, and the node output data includes the type and shape inference results;
for a weighted operation instance, obtain the corresponding weight parameters from the calculation parameters, create a tensor operation instance according to the shape, type, and initial value given in the weight parameters, and take the output of the tensor operation instance as the weight representation of the current operation instance's high-level intermediate representation, so as to update the high-level intermediate representation corresponding to the weighted operation instance;
determine a head intermediate representation and a tail intermediate representation according to the input and output types and shapes in the front-end computational graph, where the head intermediate representation includes an additional function signature;
and construct the high-level intermediate representation of the front-end computational graph from the head intermediate representation, the tail intermediate representation, and the high-level intermediate representation of each operation instance.
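The head/body/tail assembly could look like the text-emitting sketch below, with one SSA value per operation instance. The textual IR syntax shown is invented for illustration and is not the actual high-level dialect of this embodiment.

```python
# Hypothetical SSA-form high-level IR emitter; the textual syntax is
# invented purely for illustration.
def emit_high_level_ir(func_name, inputs, instances, output_var):
    args = ", ".join(f"%{n}: {t}{s}" for n, t, s in inputs)
    head = f"func @{func_name}({args}) {{"        # head: function signature
    body = []
    for i, inst in enumerate(instances):          # one SSA value per instance
        operands = ", ".join(f"%{a}" for a in inst["args"])
        body.append(f"  %v{i} = hl.{inst['op']}({operands})")
    tail = [f"  return %{output_var}", "}"]       # tail: return and close
    return "\n".join([head, *body, *tail])

if __name__ == "__main__":
    print(emit_high_level_ir(
        "main",
        [("x", "f32", "[1,16]")],
        [{"op": "relu", "args": ["x"]}, {"op": "tanh", "args": ["v0"]}],
        "v1"))
```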
Optionally, the standard intermediate representation generation module of the front-end computational graph is configured to obtain, through the standard representation converter, the high-level intermediate representation of each operation instance in the front-end computational graph, where the high-level intermediate representation of each operation instance includes a high-level operator, original input data, calculation parameters, and input/output types and shapes;
determine a target operator or a target operator sequence according to the high-level operator, where the target operator is an operator of the multi-level lowering standard;
obtain a target data format adapted to the standard intermediate representation, and insert a data conversion operator before the target operator or target operator sequence when the original input data format does not match the target data format, so that the high-level intermediate representation is lowered correctly;
lower the calculation parameters according to the multi-level lowering standard to obtain lowered parameters;
when it is determined that the high-level intermediate representation of a weighted operation instance does not carry weight parameters, determine the type and shape of the weight according to the input/output types and shapes, generate the weight according to that type and shape based on the multi-level lowering standard, and initialize the generated weight to obtain a weight representation;
and generate the standard intermediate representation of the front-end computational graph from the target operator or target operator sequence, the input/output data types and shapes, the weight representation, and the lowered parameters.
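The weight-recovery branch above, for weighted instances whose high-level representation carries no weight parameters, might be sketched as follows. The dense-layer shape rule and the zero initialization are illustrative assumptions only.

```python
# Hypothetical weight synthesis for a weighted instance that is missing its
# weight parameters; the dense-layer shape rule and zero-init are assumptions.
def synthesize_weight(in_shape, out_shape, dtype="f32"):
    # Infer the weight shape from the input/output shapes (dense-layer
    # example: the weight is [in_features, out_features]).
    w_shape = (in_shape[-1], out_shape[-1])
    # Generate and initialize the weight per the lowering standard (zeros
    # here purely for illustration).
    w_value = [[0.0] * w_shape[1] for _ in range(w_shape[0])]
    return {"shape": w_shape, "type": dtype, "value": w_value}

if __name__ == "__main__":
    w = synthesize_weight((1, 16), (1, 8))
    print(w["shape"], w["type"])  # (16, 8) f32
```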
Example IV
Fig. 8 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in Fig. 8, the computer device includes a processor 610, a memory 620, an input device 630, an output device 640, and an AI accelerator 650; the number of processors 610 and AI accelerators 650 in the computer device may each be one or more, with one processor 610 and one AI accelerator 650 illustrated in Fig. 8; the processor 610, memory 620, input device 630, output device 640, and AI accelerator 650 in the computer device may be connected by a bus or by other means, connection by a bus being taken as the example in Fig. 8.
The memory 620, as a computer-readable storage medium, is used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the AI front end unified computing method based on multi-level code generation in the embodiments of the present invention. The processor 610 executes the various functional applications and data processing of the computer device by running the software programs, instructions, and modules stored in the memory 620, i.e., implements the above AI front end unified computing method based on multi-level code generation.
Memory 620 may include primarily a program storage area and a data storage area, wherein the program storage area may store an operating system, at least one application program required for functionality; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 620 may further include memory remotely located relative to processor 610, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The AI accelerator 650 provides acceleration instructions for general AI computation. The AI front end unified computing method described above performs model parsing and lowering after the network model is obtained; the final lowering result is compiled by the AI compiler into an executable program and saved in the data storage area of the memory 620, where the executable program is divided into a scheduler executable on the processor 610 and compute-kernel instructions executable on the AI accelerator 650. In the AI-accelerated computation process, the scheduler is loaded from the data storage area of the memory 620 into the program storage area and executed by the processor 610; the scheduler then loads the compute-kernel instructions and the computation data onto the AI accelerator 650; after completing the accelerated computation, the AI accelerator 650 returns the result data to the scheduler, which saves it in the data storage area of the memory 620.
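The load-dispatch-return loop described here can be summarized in a short host-side sketch; every interface name below (`memory.load`, `processor.execute`, `accelerator.launch`) is an assumed placeholder, not a real device API.

```python
# Hypothetical host-side scheduling loop mirroring the description above;
# all device, memory, and scheduler interfaces are assumed placeholders.
def run_accelerated(memory, processor, accelerator):
    scheduler = memory.load("scheduler")        # data area -> program area
    processor.execute(scheduler)                # scheduler runs on the CPU
    kernels = memory.load("compute_kernels")    # compiled core instructions
    data = memory.load("compute_data")
    result = accelerator.launch(kernels, data)  # offload to the AI accelerator
    memory.store("result", result)              # scheduler saves the results
    return result
```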
The input device 630 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. The output device 640 may include a display device such as a display screen.
Example V
The fifth embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform an AI front end unified computing method based on multi-level code generation.
Of course, the storage medium containing the computer executable instructions provided by the embodiments of the present invention is not limited to the above-described method operations, but may also perform the related operations in the AI front end unified computing method based on multi-level code generation provided by any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that, in the embodiment of the AI front end unified computing device based on multi-level code generation described above, each unit and module included are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (12)

1. An AI front end unified computing method based on multi-level code generation, characterized in that the method comprises:
parsing a network model written or defined by a mainstream AI computing framework and model standard to obtain node parameters of each computing node, and calling a unified API (application program interface) according to the node parameters to generate a consistent front-end computational graph, wherein the front-end computational graph comprises operation instances corresponding to each computing node in the network model;
inferring the output type and shape of each operation instance of the front-end computational graph in an iterative manner, and representing and integrating each operation instance in static single assignment form to generate a high-level intermediate representation of the front-end computational graph;
and lowering the high-level intermediate representation of the front-end computational graph according to a multi-level lowering standard to generate a standard intermediate representation of the front-end computational graph, so that an AI compiler generates execution instructions compatible with a specific hardware platform according to the standard intermediate representation.
2. The method of claim 1, wherein the AI computing front-end comprises a model converter, a unified front-end computing interface, a type and shape inference module, a graph and high-level representation generator, and a standard representation converter.
3. The method of claim 2, wherein parsing the network model written or defined by the mainstream AI computing framework and the model standard to obtain node parameters of each computing node, and calling a unified API interface to generate a consistent front-end computational graph according to the node parameters, comprises:
Tracking and traversing the network model through the model converter to obtain computing nodes contained in the network model, and analyzing each computing node to obtain the node parameters, wherein the node parameters comprise node names, node computing parameters and node input data;
determining a target API interface in the unified front-end computing interface according to the node name through the model converter;
and triggering a type and shape inference module by calling the target API interface, generating an operation instance of each computing node according to the call result, and constructing the front-end computational graph from all operation instances.
4. The method of claim 3, further comprising, prior to said triggering a type and shape inference module by invoking said target API interface:
checking the node calculation parameters used to call the target API interface, wherein the calculation parameters are divided into optional parameters and required parameters;
triggering an exception return and prompting corresponding error information when a required parameter is determined to be absent according to the checking result;
using the provided optional parameters when they are determined to be present according to the checking result;
and supplementing absent optional parameters with default values when they are determined to be absent according to the checking result.
5. The method according to claim 3, wherein the triggering the type and shape inference module by calling the target API interface and generating the operation instance of each computing node according to the calling result comprises:
transmitting the node parameters to the type and shape inference module through the target API interface, so that the type and shape inference module determines node output data of a computing node according to the node parameters, and feeds back the node output data to the target API interface as a calling result, wherein the node output data comprises an output shape and an output type;
and the target API interface generates an operation instance of each computing node according to the node parameters of each computing node and the calling result.
6. The method of claim 5, wherein after the target API interface generates an operation instance of each computing node according to the node parameters of each computing node and the call result, further comprising:
feeding back the operation instance of each calculation node to the model converter through the target API interface;
and taking the output data of the operation example corresponding to the previous node as the input data of the next adjacent node through the model converter.
7. The method of claim 5, wherein determining node output data of a computing node based on the node parameters comprises:
classifying the nodes according to the node names and obtaining classification results, wherein the classification results comprise variable-type nodes, variable-shape nodes, tensor-creation nodes, and type-and-shape-invariant nodes;
when the classification result is determined to be a variable-type node, taking the input shape in the node input data as the output shape, obtaining the target type from the call parameters, taking the target type as the output type, and determining the node output data according to the output shape and the output type;
when the classification result is determined to be a variable-shape node, taking the input type in the node input data as the output type, retrieving a matching calculation formula from a predefined template according to the node type, calculating the output shape according to the calculation formula and the node parameters, and determining the node output data according to the output shape and the output type;
when the classification result is determined to be a tensor-creation node, receiving the initial shape, initial type, and initial value given in the node parameters, taking the initial shape as the output shape and the initial type as the output type, and determining the node output data according to the output shape, the output type, and the initial value;
when the classification result is determined to be a type-and-shape-invariant node, obtaining the input shape and input type from the input parameters, taking the input shape as the output shape and the input type as the output type, and determining the node output data according to the output shape and the output type.
8. The method of claim 2, wherein iteratively inferring the output type and shape of each operation instance of the front-end computational graph and representing and integrating each operation instance in static single assignment form to generate a high-level intermediate representation of the front-end computational graph comprises:
obtaining, through the graph and high-level representation generator, the instance parameters of each operation instance in the front-end computational graph, and converting each operation instance into a corresponding high-level intermediate representation in static single assignment form according to the instance parameters, wherein the instance parameters comprise: the node parameters, the node input data, and the node output data, and the node output data comprises the type and shape inference results;
obtaining, for a weighted operation instance, the corresponding weight parameters from the calculation parameters, creating a tensor operation instance according to the shape, type, and initial value given in the weight parameters, and taking the output of the tensor operation instance as the weight representation of the current operation instance's high-level intermediate representation, so as to update the high-level intermediate representation corresponding to the weighted operation instance;
determining a head intermediate representation and a tail intermediate representation according to the input and output types and shapes in the front-end computational graph, wherein the head intermediate representation comprises an additional function signature;
and constructing the high-level intermediate representation of the front-end computational graph from the head intermediate representation, the tail intermediate representation, and the high-level intermediate representation of each operation instance.
9. The method of claim 2, wherein said lowering the high-level intermediate representation of the front-end computational graph according to a multi-level lowering standard to generate a standard intermediate representation of the front-end computational graph comprises:
obtaining, through the standard representation converter, the high-level intermediate representation of each operation instance in the front-end computational graph, wherein the high-level intermediate representation of each operation instance comprises a high-level operator, original input data, calculation parameters, and input/output types and shapes;
determining a target operator or a target operator sequence according to the high-level operator, wherein the target operator is an operator of the multi-level lowering standard;
obtaining a target data format adapted to the standard intermediate representation, and inserting a data conversion operator before the target operator or target operator sequence when the original input data format does not match the target data format, so that the high-level intermediate representation is lowered correctly;
lowering the calculation parameters according to the multi-level lowering standard to obtain lowered parameters;
when it is determined that the high-level intermediate representation of a weighted operation instance does not carry weight parameters, determining the type and shape of the weight according to the input/output types and shapes, generating the weight according to that type and shape based on the multi-level lowering standard, and initializing the generated weight to obtain a weight representation;
and generating the standard intermediate representation of the front-end computational graph according to the target operator or target operator sequence, the input/output data types and shapes, the weight representation, and the lowered parameters.
10. An AI front end unified computing device based on multi-level code generation, comprising:
the front-end computational graph acquisition module, configured to parse a network model written or defined by a mainstream AI computing framework and model standard to obtain node parameters of each computing node, and to call a unified API (application program interface) according to the node parameters to generate a consistent front-end computational graph, wherein the front-end computational graph comprises operation instances corresponding to each computing node in the network model;
the high-level intermediate representation generation module, configured to infer the output type and shape of each operation instance of the front-end computational graph in an iterative manner, and to represent and integrate each operation instance in static single assignment form to generate a high-level intermediate representation of the front-end computational graph;
and the standard intermediate representation generation module, configured to lower the high-level intermediate representation of the front-end computational graph according to the multi-level lowering standard to generate the standard intermediate representation of the front-end computational graph, so that the AI compiler generates execution instructions compatible with a specific hardware platform according to the standard intermediate representation.
11. A computer device comprising a memory, a processor, an AI accelerator, and a computer program stored on the memory and executable on the processor and AI accelerator, wherein the processor implements the method of any of claims 1-9 when executing the program.
12. A storage medium having computer program instructions stored thereon which, when executed by a processor, perform the method of any one of claims 1-9.