CN117355819A - Processing method and device of calculation model - Google Patents

Processing method and device of calculation model

Info

Publication number
CN117355819A
Authority
CN
China
Prior art keywords
computing
data
computing node
data dimension
node
Prior art date
Legal status
Pending
Application number
CN202180098203.7A
Other languages
Chinese (zh)
Inventor
林惠敏
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN117355819A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The application discloses a method and an apparatus for processing a computing model. The method includes: determining whether the data dimension of the input data and/or the output data of a target computing node in a computing model is fixed, where the computing model includes a plurality of computing nodes, the target computing node is any one of the plurality of computing nodes, and the target computing node represents a plurality of computing operations in the computing model; and, when the data dimension is determined to be variable, compiling a first computing instruction among a plurality of computing instructions corresponding to the target computing node. The plurality of computing instructions correspond one-to-one to the plurality of computing operations represented by the target computing node, each computing instruction instructs execution of its corresponding computing operation, and the first computing instruction instructs a computing operation that is independent of the data dimension of the target computing node. The solution provided by the application can compile a computing model whose data dimensions are dynamic while increasing the compilation speed, reducing the compilation resource overhead, and thereby improving model compilation efficiency.

Description

Processing method and device of calculation model
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a method and apparatus for processing a computing model.
Background
In the fields of artificial intelligence (AI), machine learning (ML), and the like, different functions may be implemented by designing or training different computing models. In a computing model, computing nodes and a corresponding multi-layer operation structure are established by design, and suitable inputs and outputs are selected; a functional relationship from input to output can then be established through learning and tuning of the network, so as to represent the true association between the input and the output.
A computing model includes a plurality of computing nodes. A computing node may represent a computing operation performed on data in the computing model, and each computing node in the model has corresponding computing instructions that indicate the computing operations the node represents. Connections between different computing nodes represent data-transfer relationships among them, and at each computing node the corresponding computing operation can be performed on the data input to that node.
Currently, a computing model needs to be deployed onto a hardware device to be put into practical use. To improve the computing model's adaptability to various hardware types and its performance, a compiler may be used to compile and optimize the computing model, converting the model definition file of the designed or trained computing model into machine instructions that can run on a hardware device, so that the computing model can be deployed there. After the computing model is deployed to the hardware device, the device can, by executing the machine instructions obtained from compilation, feed actual data into the computing model for computation and obtain the corresponding actual output data.
Currently mainstream model compilers, such as the accelerated linear algebra (XLA) compiler, the gli compiler, and the tensor virtual machine (TVM) compiler, all complete the compilation and instruction optimization of a computing model based on determined data dimensions (shapes), which means that the actual data dimensions of the input data must be known before compilation can be completed.
However, in actual services, the data dimension of the data input to the computing model may change dynamically, and its range of variation may be very large. Moreover, with the rapid development of artificial intelligence services, deep learning algorithms and models emerge in endless succession, the sources of training data grow ever broader, and the specifications of data to be processed in real time become more diverse, so efficiently compiling and running computing models that support dynamic dimensions (dynamic shapes) is an urgent requirement. Existing compiling methods, such as compile-and-cache, compiling in advance over a range of data dimensions, or stripping out the dynamically changing part of the data dimension, can be used to compile a computing model with dynamic dimensions, but all of them suffer from long compilation time and large resource consumption, and they may also cause reduced model usability, performance jitter, increased warm-up overhead, and other problems.
Disclosure of Invention
The application provides a method and an apparatus for processing a computing model, which are used to compile a computing model whose data dimensions are dynamic while increasing the compilation speed, reducing the resource overhead of model compilation, and thereby improving model compilation efficiency.
In a first aspect, an embodiment of the present application provides a method for processing a computing model, where the method includes:
determining whether the data dimension of the input data and/or the output data of a target computing node in a computing model is fixed, where the computing model includes a plurality of computing nodes, the target computing node is any one of the plurality of computing nodes, and the target computing node represents a plurality of computing operations in the computing model; and, when the data dimension is determined to be variable, compiling a first computing instruction among a plurality of computing instructions corresponding to the target computing node, where the plurality of computing instructions correspond one-to-one to the plurality of computing operations represented by the target computing node, each computing instruction instructs execution of its corresponding computing operation, and the first computing instruction instructs a computing operation that is independent of the data dimension of the target computing node.
In this method, when the computing model is compiled, if the data dimension of the input data and/or the output data of a computing node in the model changes dynamically, only the computing instructions of that node that are independent of the data dimension are compiled. When the compiled computing model is subsequently run, only the computing instructions of the node that are related to the data dimension need to be compiled, and the instructions already compiled for the node need not be compiled again. This method of compiling a computing model can therefore adapt to model-processing scenarios in which the data dimension changes dynamically, improving the generality and usability of the compiled computing model, while also increasing the compilation speed, reducing compilation time and resource consumption, and thereby improving model compilation efficiency.
In one possible design, the method further includes: compiling the plurality of computing instructions when the data dimension is determined to be fixed.
In this method, when the data dimension of a computing node is fixed, the computing instructions corresponding to the node can be compiled directly, and compilation is fast.
In one possible design, after the plurality of computing instructions are compiled, the method further includes: during the running of the computing model, executing the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions.
In this method, because the computing instructions corresponding to the computing nodes of the model are compiled first, the computing operations corresponding to the compiled instructions can be executed directly when the model is run, so the model executes more efficiently.
In one possible design, after the first computing instruction among the plurality of computing instructions corresponding to the target computing node is compiled, the method further includes: during the running of the computing model, determining the current data dimension of the target computing node, and compiling a second computing instruction among the plurality of computing instructions according to the current data dimension, where the second computing instruction is a computing instruction among the plurality of computing instructions other than the first computing instruction; and executing the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions.
In this method, when the data dimension of a computing node changes dynamically, the computing instructions of the node that are independent of the data dimension are compiled first, so repeated compilation of those computing instructions can be avoided when the computing model is run, improving the overall compilation and execution efficiency of the computing model.
In one possible design, determining whether the data dimension of the input data and/or the output data of the target computing node in the computing model is fixed includes: determining whether the data dimension is fixed according to dimension indication information of the computing model, where the dimension indication information indicates whether the data dimension is fixed; or, when the data dimension is the data dimension of the input data of the target computing node, determining whether the data dimension is fixed according to whether the data dimension of the output data of a first computing node is fixed, where the first computing node is at least one computing node located before the target computing node; or, when the data dimension is the data dimension of the output data of the target computing node, determining whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed.
In this method, whether the data dimension of a computing node in the computing model is fixed can be determined in multiple ways, and the corresponding compilation strategy can then be selected according to whether the node's data dimension is fixed.
In one possible design, determining whether the data dimension is fixed according to whether the data dimension of the output data of the first computing node is fixed includes: if the data dimension of the output data of the first computing node is fixed, determining that the data dimension is fixed; otherwise, determining that the data dimension is variable.
In this method, because the input data of a computing node is the output data of the preceding computing node, the data dimension information of the current computing node can be determined simply, quickly, and accurately from the data dimension information of the output data of the computing nodes before it.
In one possible design, determining whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed includes: when the data dimension of the input data of the target computing node is variable, determining that the data dimension is variable; and when the data dimension of the input data of the target computing node is fixed, determining whether the data dimension is fixed according to a set dimension analysis function and the data dimension of the input data of the target computing node, where the set dimension analysis function determines whether the data dimension of the output data of a computing node is fixed according to the data dimension of the input data of that node.
In this method, because the output data of a computing node is obtained by computing on the node's input data, a definite relationship exists between the node's input data and output data; determining the data dimension information of the output data with reference to the data dimension information of the input data therefore ensures the accuracy of the determined data dimension.
In one possible design, executing the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions includes: executing, through a first thread, the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions. The method further includes: while the computing operations corresponding to the plurality of computing instructions are being executed through the first thread, performing the following steps through a second thread: determining the current data dimension of a second computing node, where the second computing node is at least one computing node located after the target computing node, and the data dimension of the second computing node is variable; and compiling, according to the current data dimension of the second computing node, a third computing instruction among the plurality of computing instructions corresponding to the second computing node, where the third computing instruction instructs a computing operation related to the data dimension of the second computing node.
In this method, for a computing node whose data dimension changes dynamically, while the computing instructions of the preceding computing nodes are being executed, the node's data dimension information can already be determined and the compilation of its dimension-related computing instructions completed. The node's preparation overhead is thus hidden in the running of the preceding nodes, and the node can start running as soon as they finish, ensuring the continuity of its execution and further improving the execution efficiency of the computing model. In addition, because compilation and execution of the computing instructions take place on different threads, the continuity of the execution flow of the computing operations in the model can be guaranteed, avoiding extra overhead that cannot be hidden.
In one possible design, the method further includes: executing, through the first thread, the compiled plurality of computing instructions corresponding to the second computing node.
In this method, the thread that executes the computing instructions of a computing node is the same thread that executed the computing instructions of the preceding computing nodes, which ensures the continuity of the execution flow of the computing operations corresponding to the computing nodes.
In a second aspect, the present application provides a computing device comprising:
a determining unit, configured to determine whether the data dimension of the input data and/or the output data of a target computing node in a computing model is fixed, where the computing model includes a plurality of computing nodes, the target computing node is any one of the plurality of computing nodes, and the target computing node represents a plurality of computing operations in the computing model; and a processing unit, configured to compile a first computing instruction among a plurality of computing instructions corresponding to the target computing node when the data dimension is determined to be variable, where the plurality of computing instructions correspond one-to-one to the plurality of computing operations represented by the target computing node, each computing instruction instructs execution of its corresponding computing operation, and the first computing instruction instructs a computing operation that is independent of the data dimension of the target computing node.
In one possible design, the processing unit is further configured to compile the plurality of computing instructions when the data dimension is determined to be fixed.
In one possible design, after compiling the first computing instruction among the plurality of computing instructions corresponding to the target computing node, the processing unit is further configured to: during the running of the computing model, determine the current data dimension of the target computing node, and compile a second computing instruction among the plurality of computing instructions according to the current data dimension, where the second computing instruction is a computing instruction among the plurality of computing instructions other than the first computing instruction; and execute the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions.
In one possible design, when determining whether the data dimension of the input data and/or the output data of the target computing node in the computing model is fixed, the determining unit is specifically configured to: determine whether the data dimension is fixed according to dimension indication information of the computing model, where the dimension indication information indicates whether the data dimension is fixed; or, when the data dimension is the data dimension of the input data of the target computing node, determine whether the data dimension is fixed according to whether the data dimension of the output data of a first computing node is fixed, where the first computing node is at least one computing node located before the target computing node; or, when the data dimension is the data dimension of the output data of the target computing node, determine whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed.
In one possible design, when determining whether the data dimension is fixed according to whether the data dimension of the output data of the first computing node is fixed, the determining unit is specifically configured to: if the data dimension of the output data of the first computing node is fixed, determine that the data dimension is fixed; otherwise, determine that the data dimension is variable.
In one possible design, when determining whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed, the determining unit is specifically configured to: when the data dimension of the input data of the target computing node is variable, determine that the data dimension is variable; and when the data dimension of the input data of the target computing node is fixed, determine whether the data dimension is fixed according to a set dimension analysis function and the data dimension of the input data of the target computing node, where the set dimension analysis function determines whether the data dimension of the output data of a computing node is fixed according to the data dimension of the input data of that node.
In one possible design, when executing the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions, the processing unit is specifically configured to: execute, through a first thread, the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions. The processing unit is further configured to: while the computing operations corresponding to the plurality of computing instructions are being executed through the first thread, perform the following steps through a second thread: determining the current data dimension of a second computing node, where the second computing node is at least one computing node located after the target computing node, and the data dimension of the second computing node is variable; and compiling, according to the current data dimension of the second computing node, a third computing instruction among the plurality of computing instructions corresponding to the second computing node, where the third computing instruction instructs a computing operation related to the data dimension of the second computing node.
In one possible design, the processing unit is further configured to execute, through the first thread, the compiled plurality of computing instructions corresponding to the second computing node.
In a third aspect, the present application provides a computing device including a memory and a processor; the memory is configured to store a computer program; and the processor is configured to execute the computer program stored in the memory to implement the method described in the first aspect or any possible design of the first aspect.
In a fourth aspect, the present application provides a computing device including at least one processor and an interface; the interface is configured to provide program instructions or data to the at least one processor; and the at least one processor is configured to execute the program instructions to implement the method described in the first aspect or any possible design of the first aspect.
In a fifth aspect, the present application provides a computer-readable storage medium storing a computer program which, when run on a computing device, causes the computing device to perform the method described in the first aspect or any possible design of the first aspect.
In a sixth aspect, the present application provides a computer program product including a computer program or instructions which, when executed by a computing device, implement the method described in the first aspect or any possible design of the first aspect.
In a seventh aspect, the present application provides a chip system including at least one processor and an interface, where the interface is configured to provide program instructions or data to the at least one processor, and the at least one processor is configured to execute the program instructions to implement the method described in the first aspect or any possible design of the first aspect.
In one possible design, the chip system further includes a memory configured to store program instructions and data.
In one possible design, the chip system may consist of a chip, or may include a chip and other discrete devices.
For the advantages of the third to seventh aspects, refer to the descriptions of the first and second aspects; details are not repeated here.
Drawings
FIG. 1 is a schematic architecture diagram of a system to which a processing method of a computing model according to an embodiment of the present application may be applied;
FIG. 2 is a schematic diagram of a processing flow of a compiler and an executor according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a processing method of a computing model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another processing method of a computing model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a processing flow of a compiler and an executor according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a computing instruction corresponding to an execution node according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a method for refreshing a computing instruction based on an object address mirror table according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings. In the description of the embodiments of the present application, the terms "first", "second", and the like are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features.
For ease of understanding, a description of concepts related to the present application is given by way of example for reference.
1) Compilation: compilation is the process of converting a program written in one programming language (the source language) into a program in another language (the target language). The source language may be the language used by a user when writing the source program, and the target language may be the language used by the device on which the user wishes to run the program. For example, compilation may convert the high-level language used to write a source program into a binary language recognizable by a machine (e.g., a computer or an executor) so that the machine can recognize and execute it.
2) Computational graph: by defining a computing model and solving for the model's parameters (which may be referred to as model training), a unique piece of computational logic can be determined; transformed and applied to inference computation (which may also be referred to as model inference or model use), this computational logic can be represented graphically, that is, as a computational graph.
A computational graph is expressed as a directed graph that defines how data flows, how data is computed, and the interdependencies among the various computations. The computational graph of a computing model consists of nodes and edges. A node represents an applied mathematical operation, an endpoint where data is fed in or pushed out, or an endpoint for reading/writing a persistent variable. Edges represent the input/output relationships between nodes and carry multi-dimensional data arrays whose size can be adjusted dynamically, that is, tensors. The tensor data structure can be used to represent the data in the model: a tensor corresponds to an n-dimensional array or list, where n is an integer greater than or equal to zero. A tensor has two attributes, dimension (shape) and rank, and tensors flow between the nodes of the computational graph.
Computational graphs can be divided into static computational graphs and dynamic computational graphs. The data dimensions of the input or output data of the computing nodes in a static computational graph are fixed, while the data dimensions of the input or output data of some or all of the computing nodes in a dynamic computational graph change dynamically.
In the embodiments of the present application, a node may also be referred to as a computing node, a task node, a computing task, an operator, an operation (OP), an operation layer, and the like, and a data dimension may also be referred to as a dimension, a shape, and the like. In addition, the computing model described in the embodiments of the present application may be a deep learning model, a neural network model, or the like.
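For illustration only, the following minimal Python sketch (not part of the described embodiments) shows how a computational graph might record tensor data dimensions, using -1 to mark a dynamically changing dimension, the same convention as in the example of computing node D later in this description; the class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str
    shape: Tuple[int, ...]          # output data dimension; -1 marks a dynamic dimension
    inputs: List["Node"] = field(default_factory=list)

    def shape_is_fixed(self) -> bool:
        # A node's data dimension is fixed only if every dimension is known.
        return all(d >= 0 for d in self.shape)

# Static case: every dimension is known at compile time (a "known" node).
a = Node("A", shape=(32, 128))
# Dynamic case: the first (batch) dimension varies at run time (an "unknown" node).
d = Node("D", shape=(-1, 128), inputs=[a])

print(a.shape_is_fixed())  # True
print(d.shape_is_fixed())  # False
```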
It should be understood that in the embodiments of the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one (item) of the following" or similar expressions refers to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one (item) of a, b, or c may indicate: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be singular or plural.
At present, in the fields of artificial intelligence, machine learning, and the like, the following two main modes are adopted when compiling a computing model:
mode one: runtime compilation (JIT)
In this mode, each time the computing model is run, model compilation is completed according to the dimensions of the actual input data of the computing model, and the model is then executed. Therefore, in this mode, recompilation is required every time the computing model is run.
The compilation cost of a computing model is generally far higher than its execution cost, sometimes by a factor of several hundred. Consequently, when the data dimensions in the computing model vary over a wide range and change frequently, the time and resource overhead caused by recompilation can far exceed the execution overhead, and the compilation and execution efficiency of the model is low. Although techniques such as compile-and-cache can reduce the number of recompilations to some extent, they introduce additional cache resources, and when the range of data-dimension variation is too large, or even approaches infinity, the hardware device that compiles and executes the computing model cannot cope. In addition, in some edge-side scenarios, a computing model already deployed on a hardware device needs to support raw input data of multiple specifications (e.g., picture resolution, sentence length, and the like), and after recompilation the computing model needs to be redeployed to the hardware device.
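As an illustration of the compile-and-cache technique mentioned above, the following Python sketch caches one compiled artifact per concrete input shape; every previously unseen shape still pays the full compilation cost, and the cache grows without bound when shapes vary widely. All names here are hypothetical stand-ins, not part of any real compiler's API.

```python
from typing import Dict, Tuple

_compile_cache: Dict[Tuple[int, ...], tuple] = {}

def compile_model(model: str, shape: Tuple[int, ...]) -> tuple:
    # Stand-in for an expensive full compilation for one concrete shape.
    print(f"compiling {model} for shape {shape}")
    return ("binary", model, shape)

def run(model: str, input_shape: Tuple[int, ...]) -> tuple:
    binary = _compile_cache.get(input_shape)
    if binary is None:                        # cache miss: pay the full compile cost
        binary = compile_model(model, input_shape)
        _compile_cache[input_shape] = binary  # grows without bound if shapes vary widely
    return binary

run("my_model", (8, 224, 224))    # compiles
run("my_model", (8, 224, 224))    # cache hit, no compilation
run("my_model", (16, 224, 224))   # new shape: compiles again
```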
Mode two: pre-run compilation (ahead-of-time)
In this mode, the computing models corresponding to all possible data dimensions may be compiled in advance according to the possible range of data dimensions of the input data of the computing model. In scenarios such as inference that are not particularly sensitive to precision, some inference engines divide a large data-dimension range into several gears; when the computing model is executed, the data dimension of the actual input data is padded to the nearest gear, and the precompiled computing model corresponding to the data dimension of that gear is used for the computation.
This mode incurs no extra compilation overhead at run time, and once the data dimension has been assigned to a specific gear, the data-dimension information in the computing model is fixed, so fusion over the model's dynamic computational graph can obtain the same benefits as over a static computational graph. However, this mode requires the service to prepare all data-dimension ranges in advance. For some services, such as the sparse models common in recommendation services or the detection models of detection-type services, there are hundreds or even tens of thousands of data dimensions, so the time and resources consumed by ahead-of-time compilation are a non-negligible extra cost. This mode therefore still suffers from large compilation time and resource overhead and low compilation and execution efficiency of the model.
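The gear scheme described above can be illustrated with the following Python sketch, in which the input is zero-padded up to the nearest precompiled gear; the gear values and the padding rule are illustrative assumptions for this example only.

```python
import bisect

BUCKETS = [32, 64, 128, 256]            # precompiled sequence lengths ("gears")

def pick_bucket(actual_len: int) -> int:
    # Choose the smallest precompiled gear that can hold the actual input.
    i = bisect.bisect_left(BUCKETS, actual_len)
    if i == len(BUCKETS):
        raise ValueError(f"length {actual_len} exceeds the largest gear")
    return BUCKETS[i]

def pad_to_bucket(seq: list) -> list:
    gear = pick_bucket(len(seq))
    return seq + [0] * (gear - len(seq))    # zero-pad up to the gear size

padded = pad_to_bucket(list(range(100)))
print(len(padded))   # 128: the input now runs on the precompiled 128-length model
```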
In view of this, an embodiment of the present application provides a method for processing a computing model, which can compile a computing model whose data dimensions are dynamic while increasing the compilation speed, reducing the resource overhead of model compilation, and thereby improving model compilation efficiency.
The following describes in detail the processing method of a computing model provided in the embodiments of the present application with reference to the accompanying drawings.
FIG. 1 is a schematic architecture diagram of a system to which the processing method of a computing model provided in an embodiment of the present application may be applied. The system may be used to compile and run a computing model. By way of example, the system may include a compiler 101 and an executor (or execution engine) 102.
The computing model in the embodiments of the present application includes a plurality of computing nodes, and any computing node may represent one or more computing operations in the computing model.
By way of example, the computing model may be represented by the computational graph shown in FIG. 1, where computing node A, computing node B, computing node C, computing node D, computing node E, computing node F, and computing node G each represent computing operations in the computing model. The connections between the computing nodes represent the flow of data between them and the data dependencies among them. For example, if computing node A is connected to computing node B, computing node C, and computing node D respectively, the output data of node A may be input to nodes B, C, and D; nodes B, C, and D can then be considered data-dependent nodes that all depend on node A.
When putting the computing model into practical use, the model file of the designed computing model may be loaded into the compiler 101, which compiles the computing model, converting the model file into a binary file that the executor 102 can recognize, and transmits the binary file to the executor 102. The executor 102 can then run the computing model according to the binary file, thereby performing computation on the data input to the computing model.
Fig. 2 is a schematic diagram of a processing flow of a compiler and an executor according to an embodiment of the present application.
As shown in diagram (a) of FIG. 2, when compiling the computing model, the compiler splits the computing nodes in the model into two parts: a known part and an unknown part. The data dimension of the computing nodes in the known part is fixed, while the data dimension of the computing nodes in the unknown part changes dynamically, where the data dimension includes the data dimension of the input data and/or the output data of a computing node. That is, the data dimensions of the input data and/or output data of the known-part nodes are static and can each be uniquely determined as a specific value, while those of the unknown-part nodes are dynamic and cannot be uniquely determined.
For the computing nodes in the known part, the compiler may perform full compilation, that is, compile all the computing instructions corresponding to those nodes. The specific processing flow may include compilation preparation, optimization/fusion, node compilation, memory allocation, data stream allocation, task generation, and so on.
For the computing nodes in the unknown part, the compiler may perform partial compilation; specifically, it may compile the computing instructions of those nodes that are independent of the data dimensions of the input and output data. Here, a computing instruction that is independent of the data dimensions of the input and output data can be understood as one whose indicated computing operation involves no information about the data dimension of the input data and/or the output data. The specific processing flow may include compilation preparation, optimization/fusion, node compilation, task generation, and so on.
It should be understood that the compiler's actual compilation need not strictly follow the flow described above; the operations are merely examples, and in practice the compiler may reasonably adjust the flow, for example by adding or removing some operations or changing their execution order. For the implementation of these operations, reference may be made to the compilation methods of model compilers in the machine-learning field, which are not detailed here.
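The split into a known part and an unknown part, with full compilation for the former and partial compilation for the latter, can be illustrated by the following Python sketch; the node and instruction structures, and the way an instruction is flagged as dimension-dependent, are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Instr:
    op: str
    depends_on_shape: bool    # True if the operation needs concrete dimensions

@dataclass
class Node:
    name: str
    shape: Tuple[int, ...]    # -1 marks a dynamically changing dimension
    instrs: List[Instr] = field(default_factory=list)

    def shape_is_fixed(self) -> bool:
        return all(d >= 0 for d in self.shape)

def compile_node(node: Node) -> List[str]:
    # Full compilation for known nodes; partial compilation for unknown nodes.
    compiled = []
    for ins in node.instrs:
        if node.shape_is_fixed() or not ins.depends_on_shape:
            compiled.append(f"task[{node.name}:{ins.op}]")   # task generation
        # dimension-dependent instructions of unknown nodes stay reserved
    return compiled

a = Node("A", (32, 128), [Instr("load_weights", False), Instr("matmul", True)])
d = Node("D", (-1, 128), [Instr("load_weights", False), Instr("matmul", True)])
print(compile_node(a))   # both instructions compiled (full compilation)
print(compile_node(d))   # only the dimension-independent instruction compiled
```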
As shown in diagram (b) of FIG. 2, while running the computing model compiled by the compiler, the executor may, for the known-part computing nodes, directly execute the computing operations corresponding to the compiled computing instructions. For the unknown-part computing nodes, an update thread (update worker) of the executor may first compile, according to the current actual data dimension of the node's input data and/or output data, the computing instructions of the node that are related to the data dimension (i.e., the instructions not compiled in advance by the compiler), and then execute the node's compiled computing instructions.
For example, in the computing model shown in FIG. 1, assume that the data dimensions of the input and output data of computing nodes A, B, C, and F are all fixed, while those of computing nodes D, E, and G are all variable. At compile time, the compiler compiles all the computing instructions corresponding to nodes A, B, C, and F, and compiles the dimension-independent computing instructions of nodes D, E, and G. When the executor runs the compiled model, it can directly execute the compiled instructions of nodes A, B, C, and F, and it executes the instructions of nodes D, E, and G after their remaining instructions have been compiled.
By combining pre-compilation with updating before running in this way, on the one hand, the compiler needs to compile the computing model only once, so the model need not be recompiled every time the data dimension changes; on the other hand, while the model is running, the executor can drive the compilation of the content related to the dynamic data dimension according to the computing node's current actual input data, ensuring that the node runs normally.
It should be noted that the compiler and the executor shown in FIG. 1 or FIG. 2 may be deployed in different devices or apparatuses, or may be integrated into the same device or apparatus; this is not specifically limited in the embodiments of the present application.
For example, the compiler and the executor may be integrated into the same computing device. In the embodiments of the present application, a computing device may also be referred to as a computer system. From a logical hierarchy, the computing device may include a hardware layer, an operating system layer running above the hardware layer, and an application layer running above the operating system layer. The hardware layer may include hardware such as a processor, a memory, and a memory resource manager. The operating system may be any one or more computer operating systems that implement business processing via processes. The application layer may include application programs.
In the embodiments of the present application, the computing device may be a terminal device such as a smartphone or a personal computer. The present application imposes no particular limitation, as long as the device can read and run the program code of the methods of the embodiments of the present application and thereby process a computing model by the processing method of a computing model according to the embodiments of the present application.
The processing method of a computing model, or the system shown in FIG. 1, provided by the embodiments of the present application can be applied to the runtime environments of related equipment such as data centers, where accelerated execution of computing models with dynamic data-dimension requirements can be achieved through software modification.
For ease of explanation, the following describes the processing method of a computing model provided in the embodiments of the present application by taking a computing device that executes it as an example, where the computing device may include a compiler and an executor: the compiler performs the method of compiling the computing model, and the executor performs the method of running the computing model.
FIG. 3 is a schematic diagram of a processing method of a computing model according to an embodiment of the present application. As shown in FIG. 3, the method includes:
S301: The compiler determines whether the data dimension of the input data and/or the output data of a target computing node in a computing model is fixed, where the computing model includes a plurality of computing nodes, the target computing node is any one of the plurality of computing nodes, and the target computing node represents a plurality of computing operations in the computing model.
S302: When the data dimension is determined to be variable, the compiler compiles a first computing instruction among a plurality of computing instructions corresponding to the target computing node, where the plurality of computing instructions correspond one-to-one to the plurality of computing operations represented by the target computing node, each computing instruction instructs execution of its corresponding computing operation, and the first computing instruction instructs a computing operation that is independent of the data dimension of the target computing node.
Optionally, the method may further include the following step S303:
S303: When the data dimension is determined to be fixed, the compiler compiles the plurality of computing instructions.
In step S301, when determining whether the data dimension of the input data and/or the output data of the target computing node in the computing model is fixed, the compiler may adopt at least one of the following modes:
Mode 1: determine whether the data dimension of the target computing node is fixed according to dimension indication information of the computing model, where the dimension indication information indicates whether the data dimension of the input data and/or the output data of the target computing node is fixed.
In this mode, the dimension indication information may be preset or entered by a user.
As one optional implementation, the dimension indication information may directly indicate that the data dimension of the target computing node is fixed, or directly indicate that it is variable.
As another optional implementation, when the data dimension of the target computing node is fixed, the dimension indication information may indicate the specific data dimension of the target computing node; from it, the compiler can determine both that the data dimension of the target computing node is fixed and the specific data dimension of the node's input data and/or output data. When the data dimension of the target computing node is variable, the dimension indication information may indicate that the data dimension of the node's input data is variable.
In existing methods of compiling a computing model, the data dimension of the input data of the first computing node of the model is generally known and is fed to the compiler together with the computing model to be compiled. Thus, in some embodiments of the present application, this mode may be used to determine the data dimension of the first computing node in the computing model. Of course, the data dimension information of other computing nodes in the model can also be determined in this way, which speeds up the compiler's determination.
Mode 2: when the data dimension is the data dimension of the input data of the target computing node, determine whether it is fixed according to whether the data dimension of the output data of a first computing node is fixed, where the first computing node is at least one computing node located before the target computing node, and the input data of the target computing node is the output data of the first computing node.
This mode may be used to determine the data dimension of the input data of every computing node from the second to the last in the computing model.
In this mode, if the compiler determines that the data dimension of the output data of the first computing node is fixed, it determines that the data dimension of the input data of the target computing node is fixed; otherwise, it determines that the data dimension of the input data of the target computing node is not fixed.
For example, in the computing model shown in FIG. 1, assume that the data dimensions of both the input data and the output data of computing node A are fixed, so node A is a computing node whose data dimension can be determined. When the target computing node is node B, the first computing node is node A. Since the output data of node A is input to node B as its input data, the data dimension of node A's output data is the data dimension of node B's input data. It can thereby be determined that the data dimension of node B's input data is also fixed, and node B likewise belongs to the computing nodes whose data dimension can be determined.
In some embodiments of the present application, when a plurality of first computing nodes are connected before the target computing node, the data dimension of the target computing node's input data is determined to be fixed only if the data dimensions of the input and output data of all of those first computing nodes are fixed; if the data dimension of the input or output data of at least one of them changes dynamically, the data dimension of the target computing node's input data is determined to change dynamically.
Mode 3: when the data dimension is the data dimension of the output data of the target computing node, determine whether it is fixed according to whether the data dimension of the input data of the target computing node is fixed.
This mode may be used to determine the data dimension of the output data of any computing node in the computing model.
In this mode, if the compiler determines that the data dimension of the input data of the target computing node is fixed, it then determines whether the data dimension of the node's output data is fixed according to a set data-dimension analysis function and the data dimension of the input data, where the set data-dimension analysis function determines whether the data dimension of a node's output data is fixed according to the data dimension of that node's input data. If the data dimension of the input data of the target computing node is determined not to be fixed, the data dimension of the node's output data is determined not to be fixed.
Specifically, after determining the data dimension of the target computing node's input data, the compiler may use the set data-dimension analysis function to derive the data dimension of the output data from that of the input data. If a determinate output data dimension can be derived, the data dimension of the output data is determined to be fixed; otherwise, it is determined not to be fixed. The set data-dimension analysis function may be a data-dimension derivation function such as InferShape().
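For illustration, the following Python sketch shows how such a dimension derivation function might behave, returning None when no determinate output dimension can be derived; the per-operator rules, the -1 convention for a variable dimension, and the helper for multiple preceding nodes (Mode 2) are assumptions made for this example, not the patent's implementation.

```python
from typing import Iterable, Optional, Tuple

Shape = Tuple[int, ...]   # -1 marks a dimension that cannot be determined

def infer_shape(op: str, in_shape: Shape) -> Optional[Shape]:
    # Derive the output data dimension from the input data dimension;
    # None means no determinate output dimension can be derived.
    if any(d < 0 for d in in_shape):
        return None                     # variable input blocks derivation
    if op == "relu":
        return in_shape                 # elementwise: dimension preserved
    if op == "flatten":
        n = 1
        for d in in_shape:
            n *= d
        return (n,)                     # derivable from the input dimension
    if op == "unique":
        return None                     # data-dependent output: not derivable
    raise NotImplementedError(op)

def input_fixed(pred_out_shapes: Iterable[Optional[Shape]]) -> bool:
    # Mode 2: a node's input dimension is fixed only if the output
    # dimensions of all of its preceding nodes are fixed.
    return all(s is not None and all(d >= 0 for d in s) for s in pred_out_shapes)

print(infer_shape("relu", (32, 128)))    # (32, 128): output dimension fixed
print(infer_shape("unique", (32, 128)))  # None: mark the node unknown
print(infer_shape("relu", (-1, 128)))    # None: blocked by a preceding node
```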
In some embodiments of the present application, when the compiler determines that the data dimension of any input data or output data of the target computing node changes, it determines that the target computing node does not belong to the computing nodes whose data dimension can be determined and may mark the node unknown; otherwise, it determines that the node belongs to the computing nodes whose data dimension can be determined and may mark it known.
In some embodiments of the present application, when the compiler determines that the data dimension of the target computing node's output data is fixed, it passes this information to the next computing node, whose input data is the output data of the target computing node.
For example, in the computing model shown in FIG. 1, if the data dimension of the input data of computing node A is fixed, the data dimension of node A's output data is derived using the InferShape() function; if a determinate output data dimension can be derived, the data dimension of node A's output data can be determined to be fixed. Node A thus belongs to the computing nodes whose data dimension can be determined and may be marked known.
For another example, when the data dimension of node A's output data is fixed, it can be determined that the data dimension of node D's input data is fixed and equal to the data dimension of node A's output data. The data dimension of node D's output data is then derived using the InferShape() function; if no determinate output data dimension can be derived, the data dimension of node D's output data is determined to change dynamically. Node D thus does not belong to the computing nodes whose data dimension can be determined and may be marked unknown.
For another example, upon determining that the data dimension of node D's output data changes, node D outputs -1, where -1 indicates that the data dimension of the output data is not fixed. The next computing node E of node D receives the -1, from which it can be determined that the data dimension of node E's input data also changes. Node E is thus not a computing node whose data dimension can be determined and may be marked unknown; node E belongs to the computing nodes whose data dimension cannot be determined because a preceding node blocks it.
In some embodiments of the present application, if another data-transmission path exists between two computing nodes that do not belong to those whose data dimension can be determined, then the data dimensions of all computing nodes on that transmission path are not fixed.
For example, in the computing model shown in FIG. 1, assume that computing nodes A, D, E, and G on the transmission path ADEG have all been marked unknown. Since another transmission path, ACFG, exists between node A and node G, it can be determined that computing nodes C and F on that path between node A and node G are also computing nodes whose data dimension cannot be determined, and nodes C and F may be marked unknown. The sketch below illustrates this path rule.
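A minimal Python sketch of this path rule follows; the adjacency list mirrors the graph of FIG. 1, the traversal is an ordinary depth-first enumeration of paths, and the data structures are illustrative only.

```python
# Edges mirror the computational graph of FIG. 1.
EDGES = {"A": ["B", "C", "D"], "B": [], "C": ["F"],
         "D": ["E"], "E": ["G"], "F": ["G"], "G": []}

def paths(src: str, dst: str, path=None):
    # Depth-first enumeration of all transmission paths from src to dst.
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in EDGES[src]:
        yield from paths(nxt, dst, path)

unknown = {"A", "D", "E", "G"}     # path A-D-E-G is already marked unknown
for p in paths("A", "G"):
    unknown.update(p)              # mark every node on each A-to-G path

print(sorted(unknown))             # C and F are now marked unknown as well
```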
In steps S302 and S303 above, for a computing node whose data dimension can be determined (marked known), the compiler compiles all of the node's computing instructions to generate a complete execution-period task, that is, the compiled computing instructions. For a computing node whose data dimension cannot be determined (marked unknown), the compiler precompiles it to generate an execution-period task, but the content in the task that depends on the data dimensions of the node's input and output data is reserved rather than generated. That is, among the plurality of computing instructions corresponding to the node, the first computing instructions, which indicate computing operations independent of the node's data dimension, are selected and compiled, while the remaining, uncompiled second computing instructions are reserved.
In the above embodiment, when compiling the computing model, the compiler may directly compile the computing instructions of the nodes whose data dimensions are fixed, and, for the nodes whose data dimensions change dynamically, compile only the computing instructions that are independent of the data-dimension information. With this compilation method, when the compiled computing model is run, only the uncompiled computing instructions of a node need to be compiled; the already-compiled instructions need not be recompiled. The method adapts to scenarios in which the data dimension changes dynamically and places no limit on the range of that change, so the compiled model has high generality and usability. Meanwhile, the computing instructions that do not depend on the data dimension are compiled only once, during compilation before running, and need not be recompiled at run time, which markedly reduces compilation time and resource consumption and improves model compilation efficiency.
The above embodiments describe a method for compiling a computing model by a compiler of a computing device, and a method for running the compiled computing model by an executor of the computing device is described below.
Fig. 4 is a schematic diagram of another processing method of a calculation model according to an embodiment of the present application. As shown in fig. 4, after performing the method shown in fig. 3 in the above embodiment, the method for processing a computing model provided in the embodiment of the present application may further include the following steps:
S401: if the executor determines that the plurality of computing instructions corresponding to the target computing node are in the full compiling state, the executor executes, in the process of running the computing model, the computing operations corresponding to the plurality of computing instructions according to the plurality of compiled computing instructions.
When it is determined that the data dimensions of the input data and the output data of the target computing node are fixed, or that all computing instructions contained in the target computing node have been compiled, the plurality of computing instructions corresponding to the target computing node are determined to be in the full compiling state, and the executor can directly execute the corresponding computing operations according to all the computing instructions corresponding to the target computing node.
S402: if the executor determines that the plurality of computing instructions corresponding to the target computing node are in a partial compiling state, the executor determines, in the process of running the computing model, the current data dimension of the target computing node, compiles the second computing instructions among the plurality of computing instructions according to the current data dimension, and executes the computing operations corresponding to the plurality of computing instructions according to the plurality of compiled computing instructions; the second computing instructions are the computing instructions other than the first computing instructions among the plurality of computing instructions.
When it is determined that the data dimension of the input data or the output data of the target computing node changes, or that the target computing node contains uncompiled computing instructions, the executor determines that the computing instructions corresponding to the target computing node are not in the full compiling state but in the partial compiling state.
When compiling the second computing instructions among the plurality of computing instructions, the executor may determine the data dimensions of the input data and the output data of the target computing node according to the input data of the target computing node, and compile the second computing instructions according to those data dimensions.
Further, while executing the computing operations corresponding to the plurality of computing instructions according to the compiled computing instructions, the executor may synchronously determine the data dimensions of the input data and the output data of the next computing node of the target computing node, and compile the computing instructions of the next computing node that are related to the data dimension. Specifically, the executor may execute the computing operations corresponding to the plurality of computing instructions through a first thread, and, through a second thread, determine the current data dimension of a second computing node and compile a third computing instruction among the plurality of computing instructions corresponding to the second computing node according to that data dimension. The second computing node is at least one computing node located after the target computing node, the input data of the second computing node is the output data of the target computing node, and the data dimension of the second computing node is variable; the third computing instruction indicates a computing operation related to the data dimension of the second computing node. After the computing operations corresponding to the plurality of computing instructions have been executed by the first thread, the first thread continues to execute the plurality of compiled computing instructions corresponding to the second computing node.
The above method is described below with reference to specific examples.
Fig. 5 is a schematic diagram of a processing flow of a compiler and an executor according to an embodiment of the present application. As shown in fig. 5, when the compiler compiles an input computing model, the processing of the known computing nodes may include operations such as optimization, fusion, task generation, and resource allocation (allocate resource), and the processing of the unknown computing nodes may include operations such as optimization, fusion, and partial task generation. The compiler compiles the input computing model and passes the compiled model to the executor; in the compiled model, each computing node may correspond to a single pre-compilation computing node (operator) or to several pre-compilation nodes fused during optimization.
As shown in fig. 5, when the executor receives the compiled computing model, it sorts the computing nodes (computing tasks) in the compiled computing model and determines the compiling state of each computing node in turn according to the sorting. If a computing node is determined to be in the complete compiling state, its computing task is pushed into the queue to be executed (execute queue); if it is determined to be in the incomplete (partial) compiling state, its computing task is pushed into the queue to be updated (update queue). The executor may sort the computing nodes in the compiled computing model according to the dependency relationships among them, following a depth-first or breadth-first principle.
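A compact sketch of this sorting and dispatch step follows, assuming each compiled task carries the state recorded by the compile pass sketched earlier; the breadth-first ordering shown here (Kahn's algorithm) is one of the two ordering principles mentioned above, and the task and node attributes are illustrative.

```python
from collections import deque

def topo_sort(nodes):
    """Breadth-first ordering of nodes by their dependency relationships."""
    indeg = {n.name: len(n.inputs) for n in nodes}
    succ = {n.name: [] for n in nodes}
    for n in nodes:
        for p in n.inputs:
            succ[p.name].append(n)
    ready = deque(n for n in nodes if indeg[n.name] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n.name]:
            indeg[m.name] -= 1
            if indeg[m.name] == 0:
                ready.append(m)
    return order

def dispatch(sorted_tasks):
    execute_queue, update_queue = deque(), deque()
    for task in sorted_tasks:
        if task.state == "full":          # complete compiling state
            execute_queue.append(task)    # push into the execute queue
        else:                             # partial compiling state
            update_queue.append(task)     # push into the update queue
    return execute_queue, update_queue
```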
For the computing nodes in the queue to be executed, the execution thread of the executor may pop their computing tasks from the queue and issue them for execution directly. Specifically, the execution thread sequentially performs operations such as load, execute, and execute callback, thereby completing the execution of the compiled computing instructions corresponding to the computing node.
The computing nodes in the queue to be updated are in the incompletely compiled state, so the computing instructions not yet compiled by the compiler are compiled first to obtain the complete set of compiled computing instructions corresponding to the computing node, which are then executed. For a computing node in the queue to be updated, the update thread of the executor may obtain the output data of the previous computing node, take it as the input data of the target computing node, derive the data dimension of the output data of the target computing node from the data dimension of the input data, compile the uncompiled computing instructions corresponding to the target computing node according to the determined data dimensions of the input data and the output data, and finally execute all the compiled computing instructions corresponding to the target computing node on its input data.
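The update-thread flow for one partially compiled task might be sketched as below, reusing the hypothetical Instruction objects from the compile-pass sketch after steps S302 and S303; infer_shape again stands for the node's dimension derivation, and execute is a stub for issuing a compiled instruction.

```python
def update_and_execute(task, prev_output_dims, prev_output_data):
    """Finish compiling a partially compiled node, then execute it."""
    in_dims = prev_output_dims                    # output of the previous node
    out_dims = task.node.infer_shape([in_dims])   # derive the output dimension
    for ins in task.instructions:
        if ins.binary is None:                    # reserved second instructions
            ins.compile(dims=(in_dims, out_dims))
    for ins in task.instructions:                 # now fully compiled
        execute(ins, prev_output_data)

def execute(ins, data):
    print("executing", ins.binary, "on", data)    # stand-in for real issuing
```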
If the next computing node connected to the target computing node is also a computing node in the queue to be executed, the executor directly executes the computing instructions corresponding to that next computing node after executing the computing instructions corresponding to the target computing node.
If the next computing node connected to the target computing node is a computing node in the queue to be updated, then after executing the computing instructions corresponding to the target computing node, the executor determines the data dimensions of the input data and the output data of that next computing node according to the data actually input into it, compiles its uncompiled computing instructions according to these data dimensions, and finally executes all of its compiled computing instructions on its input data.
Fig. 6 is a schematic diagram of executing the computing instructions corresponding to computing nodes according to an embodiment of the present application. As shown in fig. 6, assume that the eight computing nodes in the computing model before compiling are, after sorting, computing node A, computing node B, computing node C, computing node D, computing node E, computing node F, computing node G, and computing node H in sequence. The data dimensions of the input data and output data of computing nodes A, B, G, and H are fixed, so they can be fully compiled; the data dimensions of the input data and output data of computing nodes C, D, E, and F are not fixed, so they require partial compiling.
The computing nodes in the compiled computing model comprise computing node K1, computing node C, computing node D, computing node E, computing node F, and computing node K2, where computing node A and computing node B can be fused into one computing node K1 during compiling, and computing node G and computing node H are fused into one computing node K2. Node fusion avoids frequent reads and writes of intermediate data in registers and memory, thereby improving overall computing performance.
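As a toy illustration of the benefit, assuming NumPy arrays as a stand-in for device buffers: the unfused form materializes the intermediate result in a separate buffer and reads it back, while a fused kernel makes a single pass over the data. The loop below only sketches the single-pass idea at Python level.

```python
import numpy as np

def unfused(a, b):
    t = a + b                  # intermediate written to a separate buffer
    return np.maximum(t, 0.0)  # re-read to apply the next operator

def fused(a, b):
    out = np.empty_like(a)
    for i in range(a.size):    # single pass: no intermediate buffer
        s = a.flat[i] + b.flat[i]
        out.flat[i] = s if s > 0.0 else 0.0
    return out
```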
When the execution thread executes the compiled computing instructions corresponding to a computing node, if the target computing node is a computing node in the full compiling state, the computing instructions corresponding to the target computing node are executed directly. If the target computing node is a computing node whose compilation depends on the data dimension, then while the executor starts executing the compiled computing instructions corresponding to the previous computing node of the target computing node, it also starts the update preparation and partial-instruction compiling actions corresponding to the target computing node, such as data dimension derivation, resource preparation, and policy refreshing.
In the ideal case, by the time the compiled computing instructions corresponding to the previous computing node of the target computing node have finished executing, the update preparation and partial-instruction compiling actions corresponding to the target computing node have also completed. The target computing node then starts executing its compiled computing instructions, while the update preparation and partial-instruction compiling actions of the next computing node, such as data dimension derivation, resource preparation, and policy refreshing, are started at the same time. In this way, the overhead of preparing the update of the computing nodes can be completely hidden in the computation of the computing model, allowing utilization to reach as much as 100%.
For example, as shown in fig. 6, computing nodes C, D, E, and F are dimension-dependent computing nodes. If, while running the computing model, the executor executed the computing instructions corresponding to each computing node sequentially on the execution thread alone, it would have to perform the compiling of the dimension-related computing instructions before executing the computing instructions corresponding to computing node C, which would cause extra overhead and make the execution of computing operations between computing nodes discontinuous.
Therefore, in the method provided by the embodiment of the present application, the execution thread executes only the computing instructions corresponding to the computing nodes, while the compiling of the dimension-related computing instructions is performed by other threads. Specifically, as shown in fig. 6, while executing the computing instructions corresponding to computing node K1, the executor performs, on thread 1 corresponding to computing node C, preparation processes such as the data dimension derivation of node C and the compiling of its dimension-related computing instructions. By the time the execution thread finishes executing the computing instructions corresponding to computing node K1, the preparation process for computing node C on thread 1 has also finished; at this point all computing instructions corresponding to computing node C have been compiled, so the execution thread can directly execute them according to the processing result of thread 1. And so on: the execution thread executes only the computing instructions corresponding to each computing node, while threads 1 to 5 respectively perform the preparation processes for the corresponding computing nodes. In this way, the extra preparation overhead during the running of the computing model can be hidden, and the continuity of executing the computing instructions of the computing nodes can be guaranteed. Moreover, when the execution thread executes the computing instructions corresponding to computing node C, the data dimension of the output data of computing node C has already been determined, so the data dimension of the input data of computing node D can be obtained. Thread 2 corresponding to computing node D can therefore start the preparation process for computing node D at the same time: it derives the data dimension of the output data of computing node D from the data dimension of its input data, and compiles the previously uncompiled computing instructions according to the determined dimensions. When the execution of the computing instructions of computing node C is complete, the compiling of the previously uncompiled computing instructions of computing node D has also completed, and the execution of the computing instructions corresponding to computing node D can begin.
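A minimal sketch of this overlap follows, assuming Python threads stand in for the execution thread and the per-node preparation threads; task.prepare() bundles the dimension derivation, resource preparation, and compiling of the reserved instructions, task.execute() runs the compiled instructions, and the first task is assumed fully compiled, as with the fused node K1 above.

```python
import threading

def run_pipeline(tasks):
    """Execution thread runs node i while a helper thread prepares node i+1."""
    pending = None                                   # prep thread in flight
    for i, task in enumerate(tasks):
        nxt = tasks[i + 1] if i + 1 < len(tasks) else None
        if nxt is not None and nxt.state == "partial":
            t = threading.Thread(target=nxt.prepare)
            t.start()                                # hide prep behind execute()
            pending = t
        task.execute()                               # execution thread's only job
        if pending is not None:
            pending.join()                           # normally already finished
            pending = None
```

In the ideal case described above, the join returns immediately because the preparation completed during execute(), so the execution thread never stalls between nodes.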
In some embodiments of the present application, when running the computing model, the executor may also issue the computing instructions corresponding to the computing nodes in the computing model to other devices, which perform the corresponding computing operations and return the computing results, thereby reducing the processing load of the executor and improving processing efficiency.
Specifically, the computing device may further include at least one hardware device (device), which may be configured to execute specific computing instructions. Based on this, the executor may include a host central processing unit (host CPU) for running the compiled computing model. The host CPU may be connected through an interface to at least one hardware device, which may be a graphics processing unit (GPU), an embedded neural-network processing unit (NPU), a coprocessor, or the like. The host CPU may copy the computing instructions corresponding to a computing node in the compiled computing model to the hardware device and notify the device task schedule of the hardware device, thereby triggering the hardware device to execute the computing instructions.
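The host-side hand-off could be sketched as below; the Device class, its copy_to_device method, and notify_task_schedule are hypothetical stand-ins for a real driver interface, not an actual API.

```python
class Device:
    """Hypothetical stand-in for a GPU/NPU driver handle."""
    def __init__(self):
        self.task_schedule = []

    def copy_to_device(self, instructions):
        return list(instructions)            # stand-in for a DMA copy

    def notify_task_schedule(self, dev_task):
        self.task_schedule.append(dev_task)  # triggers execution on the device

def offload(host_task, device):
    dev_task = device.copy_to_device(host_task.instructions)
    device.notify_task_schedule(dev_task)    # device runs it asynchronously
```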
Because the data dimensions of the input data and output data of a completely compiled computing node do not change during the execution of the whole computing model, the resources occupied by such a node can remain resident; when the allocator allocates memory resources for the computing node, only the storage addresses of its input data and output data need to be refreshed.
The allocator may also be referred to as a memory resource manager and is configured to manage data storage addresses in the hardware device. It should be noted that the allocator may be disposed on the hardware device or on the executor, or may be disposed separately, which is not specifically limited in the embodiments of the present application.
In some embodiments of the present application, the computing instruction corresponding to a computing node includes header control information and computing operation information, where the header control information indicates the data storage addresses of the input data and output data of the computing node, and the computing operation information indicates the computing operation corresponding to the computing node.
In some embodiments of the present application, the host CPU may add a two-layer base address to the header control information of the computing instruction corresponding to a computing node, where the two-layer base address is used to determine the data storage address of the input data of the computing node. Meanwhile, the host CPU generates an object address mirror table (args table mirror) to represent the correspondence between the computing node and the data storage addresses of its input data and output data.
Fig. 7 is a schematic diagram of a method for refreshing a computing instruction based on an object address mirror table (args table mirror) according to an embodiment of the present application.
In some embodiments of the present application, in the process of running the computing model, when the executor determines that a computing node is completely compiled, it executes the computing instructions corresponding to the computing node and, after execution ends, stores the output data to the data memory address of the output data.
When the executor executes the computing instructions corresponding to a computing node, the computing instructions are written into the hardware device by means of a kernel launch, triggering the hardware device to execute them directly and obtain the corresponding output data.
For example, as shown in schematic diagram (b) in fig. 7, if the current computing node is computing node 1 (op 1), then after executing the computing instructions corresponding to computing node 1, the executor and the hardware device write the result back to the data memory address of the output data (output ptr).
In the process of running the computing model, when determining that the computing node is not completely compiled, the executor executes the following steps:
Step 1: the executor derives the data dimension of the output data of the computing node from the data dimension of the data currently input to the computing node, and compiles the computing instructions not compiled by the compiler according to the obtained data dimensions.
As shown in schematic diagram (a) in fig. 7, the executor may use a dimension derivation function (infershape funclib) to derive the data dimension of the output data of the computing node, use a policy optimization function (tiling function) to compile, according to the data dimensions of the input data and output data of the computing node, the computing instructions not compiled by the compiler, and so on, performing the various preparation operations required before the instructions.
Step 2: the executor issues the compiled calculation instruction to the hardware device, and the hardware device executes the corresponding calculation operation.
Step 3: when the data dimension of the input data of the computing node changes, the host CPU refreshes the address information in the object address mirror table according to the acquired data memory address of the new input data.
When the executor executes the computing node for the first time, the host CPU applies for a data memory address for the input data of the computing node, writes the correspondence between the obtained data memory address and the computing node into the object address mirror table, and writes the computing instructions corresponding to the computing node into the device, which then executes them.
When the data dimension of the input data of the computing node changes, the host CPU re-determines the size of the data memory required by the changed input data and applies to the allocator for a new data memory address. After obtaining the new address, it looks up the two-layer base address of the computing node in the object address mirror table and refreshes that base address with the new data memory address; the computing instructions corresponding to the computing node do not need to be written into the device again. After the addresses in the object address mirror table are refreshed, the device can determine the corresponding data memory address from the refreshed two-layer base address, obtain the changed input data of the computing node from that address, and execute the computing instructions on the input data to obtain the corresponding output data. This avoids the resource consumption of frequent data copying, instruction computation, and memory address transfer between the host CPU and the hardware device.
For example, if the target computing node is computing node 2 shown in schematic diagram (b) in fig. 7, the input data of computing node 2 includes the output data obtained by the previous computing node 1 from input data 1, together with the other input data 2. If the executor executes computing node 2 for the first time, it applies for the memory addresses corresponding to computing node 2, and the allocator allocates data memory addresses for computing node 2 in the two-layer base address manner: each allocated data memory address is expressed as a combination of an offset and a two-layer base address, and the actual address can be determined by combining the base address with the corresponding offset. For example, the data memory addresses of computing node 2 may include data memory address 1, data memory address 2, and data memory address 3. Data memory address 1 may be used to store the data input to computing node 2 by computing node 1, and is obtained from the input 0 offset and base address 1 corresponding to that data. Data memory address 2 may include different memory spaces for storing input data 2, the output data of computing node 2, and the data of some intermediate computing operations in computing node 2; it is obtained from the input 1 offset and base address 2 corresponding to input data 2, the output 0 offset and base address 2 corresponding to the output data of computing node 2, and the workspace offset and base address 2 corresponding to the data of the intermediate computing operations. Data memory address 3 may be used to store various data related to the preparation operations, such as data produced while compiling the dimension-related computing instructions corresponding to computing node 2; it is obtained from the tiling data offset and base address 3 corresponding to the preparation operations.
It should be noted that, in the allocation and representation of data memory addresses, the same two-layer base address may correspond to different offsets, and different combinations of two-layer base address and offset may correspond to different data memory addresses, or to different memory spaces within the same data memory address.
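The address arithmetic of this scheme reduces to a lookup plus an addition, as in the following sketch; the base identifiers and offset values are illustrative, not the actual layout.

```python
# args table mirror: base-address id -> current base address
args_table_mirror = {1: 0x1000, 2: 0x8000, 3: 0xA000}

def resolve(base_id, offset):
    """Data memory address = two-layer base address + offset."""
    return args_table_mirror[base_id] + offset

input0_addr    = resolve(1, 0x040)  # data fed in by computing node 1
input1_addr    = resolve(2, 0x000)  # input data 2
output0_addr   = resolve(2, 0x100)  # output data of computing node 2
workspace_addr = resolve(2, 0x300)  # intermediate computing operations
tiling_addr    = resolve(3, 0x000)  # preparation (tiling) data
```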
After the executor obtains the data memory addresses corresponding to computing node 2, it writes them into the object address mirror table and then sends the computing instructions corresponding to computing node 2 to the hardware device through a kernel launch process; the obtained data memory addresses corresponding to computing node 2 are written into the hardware device through a host-to-device (H2D) process. If the executor is not executing computing node 2 for the first time, then when it determines that the data dimension of the computing node has changed, it only needs to re-determine the offsets corresponding to the data memory addresses and notify the hardware device of the address change, without sending the computing instructions corresponding to computing node 2 to the hardware device through another kernel launch process.
In this embodiment of the present application, the object address mirror table is used to represent a correspondence between a memory address of input data and/or output data of a computing node and the computing node.
For example, in the object address mirror table shown in schematic diagram (a) in fig. 7, conv_1, bn_1, and relu_1 respectively represent three consecutive computing nodes, and args_1ptr, args_2ptr, and args_3ptr are three data memory addresses respectively allocated to these three computing nodes. When the data dimension of the input data of a computing node changes, the memory address allocated to that node (for example, the size of its memory space) needs to be adjusted accordingly, and the memory addresses of the subsequent computing nodes then also need to be adjusted. Therefore, when the data dimension of the input data of a computing node changes, the executor needs to apply for a new memory address for the node according to the changed data dimension, and refresh the memory addresses in the object address mirror table according to the new memory address and the tiling result, thereby updating the memory addresses corresponding to the computing nodes in the computing model.
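Continuing the address sketch above, the refresh path only rewrites a base entry and notifies the device; the instruction, which stores (base id, offset) pairs, is reused without another kernel launch. allocator.alloc and notify_device are hypothetical stand-ins.

```python
def on_input_dims_changed(mirror, base_id, new_size, allocator, notify_device):
    """Re-apply for memory and refresh only the two-layer base address."""
    new_base = allocator.alloc(new_size)  # sized for the changed input data
    mirror[base_id] = new_base            # refresh the mirror-table entry
    notify_device(base_id, new_base)      # device re-resolves base + offset
```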
Step 4: after the data dimension of the input data of the computing node is changed, the executor triggers the hardware device to execute the computing operation corresponding to the latest computing instruction.
After step 3 is completed, the host CPU has refreshed the address information in the object address mirror table and notifies the hardware device of the address change, so the hardware device can obtain the changed input data of the computing node from the changed data memory address and execute the computing instructions on it.
In the above embodiments, when the compiler of the computing device compiles the computing model, for a computing node with a fixed data dimension, all computing instructions corresponding to the node are compiled directly, while for a computing node with a dynamically changing data dimension, only the computing instructions unrelated to the data dimension are compiled. When the compiled computing model is run, the previously uncompiled computing instructions corresponding to a computing node are compiled according to the data actually input into the node, after which all computing instructions corresponding to the node can be executed. This adapts to computing model processing scenarios in which the data dimensions of computing nodes change dynamically, with no restriction on the range of dynamic change, so the compiled model has high universality and usability. Meanwhile, repeated compiling of computing instructions is avoided, which can improve the compiling speed of the model, greatly reduce compiling time and resource consumption, and further improve the compiling efficiency of the computing model.
Based on the above embodiments and the same concept, the present application further provides a computing device. As shown in fig. 8, the computing device 800 may include:
a determining unit 801, configured to determine whether a data dimension of input data and/or output data of a target computing node in a computing model is fixed, where the computing model includes a plurality of computing nodes, the target computing node is any one of the plurality of computing nodes, and the target computing node is used to represent a plurality of computing operations in the computing model; a processing unit 802, configured to compile a first calculation instruction of a plurality of calculation instructions corresponding to the target calculation node when determining the data dimension change; the plurality of computing instructions are in one-to-one correspondence with a plurality of computing operations represented by the target computing node, any one computing instruction is used for indicating to execute the corresponding computing operation, and the first computing instruction is used for indicating the computing operation which is irrelevant to the data dimension of the target computing node.
In one possible design, the processing unit 802 is further configured to: the plurality of computing instructions are compiled when it is determined that the data dimension is fixed.
In one possible design, the processing unit 802 is further configured to, after compiling a first computing instruction of the plurality of computing instructions corresponding to the target computing node: in the process of running the calculation model, determining the current data dimension of the target calculation node, and compiling a second calculation instruction in the plurality of calculation instructions according to the current data dimension, wherein the second calculation instruction is a calculation instruction except the first calculation instruction in the plurality of calculation instructions; and executing the computing operation corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions.
In one possible design, the determining unit 801 is specifically configured to, when determining whether the data dimension of the input data and/or the output data of the target computing node in the computing model is fixed: judging whether the data dimension is fixed or not according to dimension indication information of the calculation model; the dimension indication information is used for indicating whether the data dimension is fixed or not; or when the data dimension is the data dimension of the input data of the target computing node, judging whether the data dimension is fixed according to whether the data dimension of the output data of the first computing node is fixed; wherein the first computing node is at least one computing node located before the target computing node; or when the data dimension is the data dimension of the output data of the target computing node, judging whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed.
In one possible design, the determining unit 801 is specifically configured to determine, according to whether the data dimension of the output data of the first computing node is fixed, whether the data dimension is fixed, when: and if the data dimension of the output data of the first computing node is fixed, determining that the data dimension is fixed, otherwise, determining that the data dimension is changed.
In one possible design, the determining unit 801 is specifically configured to, when determining whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed: determine that the data dimension is changing when the data dimension of the input data of the target computing node is changing; and, when the data dimension of the input data of the target computing node is fixed, determine whether the data dimension is fixed according to a set dimension analysis function and the data dimension of the input data of the target computing node, where the set dimension analysis function is used to determine whether the data dimension of a computing node is fixed according to the data dimension of the input data of the computing node.
In one possible design, the processing unit 802 is specifically configured to, when executing the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions: executing the computing operation corresponding to the computing instructions through a first thread according to the compiled computing instructions; the processing unit 802 is further configured to: when executing the computing operation corresponding to the computing instructions through the first thread, executing the following steps through the second thread: determining a current data dimension of a second computing node, wherein the second computing node is at least one computing node located after the target computing node, the data dimension of the second computing node being variable; compiling a third computing instruction in a plurality of computing instructions corresponding to the second computing node according to the current data dimension of the second computing node, wherein the third computing instruction is used for indicating computing operation related to the data dimension of the second computing node.
In one possible design, the processing unit 802 is further configured to: executing a plurality of compiled computing instructions corresponding to the second computing node through the first thread.
The division of units in the embodiments of the present application is schematic and is merely a division by logical function; there may be other division manners in actual implementation. In addition, the functional units in the embodiments of the present application may be integrated in one processor or may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or as software functional units.
One or more of the units in fig. 8 may be implemented in software, hardware, firmware, or a combination thereof. The software or firmware includes, but is not limited to, computer program instructions or code and may be executed by a hardware processor. The hardware includes, but is not limited to, various types of integrated circuits such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).
Based on the above embodiments and the same concept, the embodiments of the present application further provide a computing device, which is configured to implement the processing method of the computing model provided by the embodiments of the present application. As shown in fig. 9, the computing device 900 may include: one or more processors 901, a memory 902, and one or more computer programs (not shown). As one implementation, the above components may be coupled by one or more communication lines 903. The memory 902 stores one or more computer programs comprising instructions; the processor 901 invokes the instructions stored in the memory 902 to cause the computing device 900 to execute the processing method of the computing model provided in the embodiments of the present application.
In the embodiments of the present application, the processor may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
In embodiments of the present application, the memory may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory. The memory in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function.
As an implementation, the computing device 900 may further include a communication interface 904 for communicating with other devices via a transmission medium, for example, the computing device 900 may interact with information or data via the communication interface 904 with a first server, a second server, or a database, etc. In embodiments of the present application, the communication interface may be a transceiver, a circuit, a bus, a module, or other type of communication interface. In the embodiment of the application, when the communication interface is a transceiver, the transceiver may include a stand-alone receiver and a stand-alone transmitter; a transceiver or interface circuit integrating the transceiver function is also possible.
In some embodiments of the present application, the processor 901, the memory 902, and the communication interface 904 may be interconnected by a communication line 903; the communication line 903 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication line 903 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
The method provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by the computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital video disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)).
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (18)

  1. A method of processing a computational model, the method comprising:
    judging whether the data dimension of input data and/or output data of a target computing node in a computing model is fixed or not, wherein the computing model comprises a plurality of computing nodes, the target computing node is any one of the computing nodes, and the target computing node is used for representing a plurality of computing operations in the computing model;
    when the data dimension change is determined, compiling a first calculation instruction in a plurality of calculation instructions corresponding to the target calculation node; the plurality of computing instructions are in one-to-one correspondence with a plurality of computing operations represented by the target computing node, any one computing instruction is used for indicating to execute the corresponding computing operation, and the first computing instruction is used for indicating the computing operation which is irrelevant to the data dimension of the target computing node.
  2. The method according to claim 1, wherein the method further comprises:
    the plurality of computing instructions are compiled when it is determined that the data dimension is fixed.
  3. The method of claim 1 or 2, wherein after compiling a first computing instruction of a plurality of computing instructions corresponding to the target computing node, the method further comprises:
    in the process of running the calculation model, determining the current data dimension of the target calculation node, and compiling a second calculation instruction in the plurality of calculation instructions according to the current data dimension, wherein the second calculation instruction is a calculation instruction except the first calculation instruction in the plurality of calculation instructions;
    and executing the computing operation corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions.
  4. A method according to any one of claims 1 to 3, wherein determining whether the data dimensions of the input data and/or the output data of the target computing node in the computing model are fixed comprises:
    judging whether the data dimension is fixed or not according to dimension indication information of the calculation model; the dimension indication information is used for indicating whether the data dimension is fixed or not; or alternatively
    When the data dimension is the data dimension of the input data of the target computing node, judging whether the data dimension is fixed according to whether the data dimension of the output data of the first computing node is fixed; wherein the first computing node is at least one computing node located before the target computing node; or alternatively
    And when the data dimension is the data dimension of the output data of the target computing node, judging whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed.
  5. The method of claim 4, wherein determining whether the data dimension is fixed according to whether the data dimension of the output data of the first computing node is fixed comprises:
    and if the data dimension of the output data of the first computing node is fixed, determining that the data dimension is fixed, otherwise, determining that the data dimension is variable.
  6. The method of claim 4 or 5, wherein determining whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed comprises:
    determining that the data dimension is changing when the data dimension of the input data of the target computing node is changing;
    when the data dimension of the input data of the target computing node is fixed, judging whether the data dimension is fixed or not according to a set dimension analysis function and the data dimension of the input data of the target computing node, wherein the set dimension analysis function is used for judging whether the data dimension of the computing node is fixed or not according to the data dimension of the input data of the computing node.
  7. A method according to any one of claims 3 to 6, wherein,
    executing the computing operation corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions, wherein the computing operation comprises the following steps:
    executing the computing operation corresponding to the computing instructions through a first thread according to the compiled computing instructions;
    the method further comprises the steps of:
    when executing the computing operation corresponding to the computing instructions through the first thread, executing the following steps through the second thread:
    determining a current data dimension of a second computing node, wherein the second computing node is at least one computing node located after the target computing node, the data dimension of the second computing node being variable;
    Compiling a third computing instruction in a plurality of computing instructions corresponding to the second computing node according to the current data dimension of the second computing node, wherein the third computing instruction is used for indicating computing operation related to the data dimension of the second computing node.
  8. The method of claim 7, wherein the method further comprises:
    executing a plurality of compiled computing instructions corresponding to the second computing node through the first thread.
  9. A computing device, comprising:
    a judging unit, configured to judge whether a data dimension of input data and/or output data of a target computing node in a computing model is fixed, where the computing model includes a plurality of computing nodes, the target computing node is any one of the plurality of computing nodes, and the target computing node is used to represent a plurality of computing operations in the computing model;
    the processing unit is used for compiling a first calculation instruction in a plurality of calculation instructions corresponding to the target calculation node when the data dimension change is determined; the plurality of computing instructions are in one-to-one correspondence with a plurality of computing operations represented by the target computing node, any one computing instruction is used for indicating to execute the corresponding computing operation, and the first computing instruction is used for indicating the computing operation which is irrelevant to the data dimension of the target computing node.
  10. The apparatus of claim 9, wherein the processing unit is further configured to:
    the plurality of computing instructions are compiled when it is determined that the data dimension is fixed.
  11. The apparatus according to claim 9 or 10, wherein the processing unit, after compiling a first computing instruction of a plurality of computing instructions corresponding to the target computing node, is further configured to:
    in the process of running the calculation model, determining the current data dimension of the target calculation node, and compiling a second calculation instruction in the plurality of calculation instructions according to the current data dimension, wherein the second calculation instruction is a calculation instruction except the first calculation instruction in the plurality of calculation instructions;
    and executing the computing operation corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions.
  12. The apparatus according to any one of claims 9 to 11, wherein the determining unit is configured to determine whether a data dimension of input data and/or output data of a target computing node in the computing model is fixed, when:
    judging whether the data dimension is fixed or not according to dimension indication information of the calculation model; the dimension indication information is used for indicating whether the data dimension is fixed or not; or alternatively
    When the data dimension is the data dimension of the input data of the target computing node, judging whether the data dimension is fixed according to whether the data dimension of the output data of the first computing node is fixed; wherein the first computing node is at least one computing node located before the target computing node; or alternatively
    And when the data dimension is the data dimension of the output data of the target computing node, judging whether the data dimension is fixed according to whether the data dimension of the input data of the target computing node is fixed.
  13. The apparatus according to claim 12, wherein the determining unit is configured to determine whether the data dimension is fixed according to whether the data dimension of the output data of the first computing node is fixed, and is specifically configured to:
    and if the data dimension of the output data of the first computing node is fixed, determining that the data dimension is fixed, otherwise, determining that the data dimension is variable.
  14. The apparatus according to claim 12 or 13, wherein the determining unit is configured to determine, according to whether a data dimension of the input data of the target computing node is fixed, whether the data dimension is fixed, specifically for:
    determine that the data dimension is changing when the data dimension of the input data of the target computing node is changing;
    when the data dimension of the input data of the target computing node is fixed, judging whether the data dimension is fixed or not according to a set dimension analysis function and the data dimension of the input data of the target computing node, wherein the set dimension analysis function is used for judging whether the data dimension of the computing node is fixed or not according to the data dimension of the input data of the computing node.
  15. The device according to any one of claims 11 to 14, wherein,
    the processing unit is specifically configured to, when executing the computing operations corresponding to the plurality of computing instructions according to the compiled plurality of computing instructions:
    executing the computing operation corresponding to the computing instructions through a first thread according to the compiled computing instructions;
    the processing unit is further configured to:
    when executing the computing operation corresponding to the computing instructions through the first thread, executing the following steps through the second thread:
    determining a current data dimension of a second computing node, wherein the second computing node is at least one computing node located after the target computing node, the data dimension of the second computing node being variable;
    Compiling a third computing instruction in a plurality of computing instructions corresponding to the second computing node according to the current data dimension of the second computing node, wherein the third computing instruction is used for indicating computing operation related to the data dimension of the second computing node.
  16. The apparatus of claim 15, wherein the processing unit is further configured to:
    executing a plurality of compiled computing instructions corresponding to the second computing node through the first thread.
  17. A computing device comprising a memory and at least one processor;
    the memory is used for storing a computer program;
    the processor is configured to execute a computer program stored in the memory to implement the method of any one of claims 1-8.
  18. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when run on a computing device, causes the computing device to perform the method of any one of claims 1-8.
CN202180098203.7A 2021-05-31 2021-05-31 Processing method and device of calculation model Pending CN117355819A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/097511 WO2022252091A1 (en) 2021-05-31 2021-05-31 Model processing method and apparatus

Publications (1)

Publication Number Publication Date
CN117355819A 2024-01-05

Family

ID=84322666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180098203.7A Pending CN117355819A (en) 2021-05-31 2021-05-31 Processing method and device of calculation model

Country Status (2)

Country Link
CN (1) CN117355819A (en)
WO (1) WO2022252091A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116382209B (en) * 2023-04-19 2023-10-03 扬州市管件厂有限公司 Process optimization method and system for seamless elbow machining

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7478363B2 (en) * 2001-09-18 2009-01-13 Asahi Kasei Kabushiki Kaisha Method for translating a given source program into an object program containing computing expressions
CN109901840B (en) * 2019-02-14 2020-10-27 中国科学院计算技术研究所 Heterogeneous compilation optimization method for inter-thread redundancy deletion
CN110647981B (en) * 2019-09-23 2021-01-26 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2022252091A1 (en) 2022-12-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination